Dietrich and Kearns (1986) divided statistics into two broad areas: descriptive and inferential statistics. Descriptive statistics is the science of summarizing or describing data, while inferential statistics is the science of interpreting data in order to make estimates, test hypotheses, make predictions, or reach decisions about the targeted population based on samples.
In clinical trials, data are usually collected through case report forms, which are designed to capture clinical information from the studies. The information on the case report forms is then entered into the database. The raw database is usually messy, though it does contain valuable clinical information from the study. In practice, it is often of interest to summarize the raw database by a graphical presentation (e.g., a data plot) or by descriptive (or summary) statistics. Descriptive statistics are simple sample statistics such as means and standard deviations (or standard errors) of clinical variables or endpoints. Note that the standard deviation describes the variability of a distribution, either a population distribution or a sample distribution, whereas the standard error describes the variability of a sample statistic (e.g., the sample mean or sample variance) across repeated samples. Descriptive statistics are often used to describe the targeted population before and after the study. For example, at baseline, descriptive statistics are often employed to assess the comparability between treatment groups. After the completion of the study, descriptive statistics are useful tools to reveal possible clinical differences (or effects) or trends of study drugs.
As an example, Table 2.5.1 provides a partial listing of individual patient demographics and baseline characteristics from a study comparing the effects of captopril and enalapril on quality of life in older hypertensive patients (Testa et al., 1993). As can be seen from Table 2.5.1, although the patient listing as a whole gives a detailed description of the characteristics of individual patients, it does not provide much summary information regarding the study population. In contrast, descriptive statistics for demographic and baseline information describe not only the characteristics of the study population but also the comparability between treatment groups (see Table 2.5.2).
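The distinction between the standard deviation and the standard error can be made concrete with a short calculation. The following is a minimal Python sketch, not part of the original study: the blood pressure values are hypothetical and chosen only for illustration, and the variable names are assumptions.

```python
import math
import statistics

# Hypothetical baseline diastolic blood pressures (mmHg).
# Illustrative values only; these are NOT data from Testa et al. (1993).
dbp = [96, 102, 98, 94, 100, 104, 97, 99, 101, 99]

n = len(dbp)
mean = statistics.mean(dbp)
sd = statistics.stdev(dbp)   # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(n)       # standard error of the sample mean

print(f"n = {n}, mean = {mean:.1f} mmHg, SD = {sd:.2f} mmHg, SE = {se:.2f} mmHg")
```

The standard deviation summarizes how much individual patients vary, so it does not shrink systematically as more patients are enrolled. The standard error summarizes how much the sample mean would vary from sample to sample, so it decreases as the sample size grows.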
In addition, for descriptive purposes, Table 2.5.3 groups patients into low, medium, and high categories according to the ranking of their scores on the baseline quality of life scale. It can be seen that there is a potential difference in treatment effect among the three groups with regard to the change from baseline in quality of life. These differences were confirmed to be statistically significant by valid statistical tests. Therefore, a preliminary investigation of descriptive statistics of primary clinical endpoints may reveal a potential drug effect.
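The kind of grouping described for Table 2.5.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not give the cut points used in the study, so the sketch assumes three equal-sized groups by rank (tertiles), and both the function name `tertile_labels` and all scores are hypothetical.

```python
import statistics

def tertile_labels(scores):
    """Label each score 'low', 'medium', or 'high' by its rank among all scores.

    Assumption: three roughly equal-sized groups by rank. Tied scores are
    ranked in their original order because sorted() is stable.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices, lowest score first
    labels = [""] * n
    for rank, i in enumerate(order):
        if 3 * rank < n:
            labels[i] = "low"
        elif 3 * rank < 2 * n:
            labels[i] = "medium"
        else:
            labels[i] = "high"
    return labels

# Hypothetical baseline quality-of-life scores and changes from baseline.
# Illustrative values only; these are NOT data from the study.
baseline_qol = [62, 75, 48, 81, 55, 70, 90, 66, 58]
change_qol = [3, -1, 6, -2, 5, 1, -3, 2, 4]

labels = tertile_labels(baseline_qol)
mean_change = {
    group: statistics.mean([c for c, g in zip(change_qol, labels) if g == group])
    for group in ("low", "medium", "high")
}
print(mean_change)
```

A summary of this kind only suggests where differences may lie. Whether they are more than chance variation is a question for inferential statistics.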
When we observe some potential differences (effects) or trends, it is necessary to further confirm with certain assurance that the differences (effects) or trends indeed exist and are not due to chance alone. For this purpose, it is necessary to provide inferential statistics for the observed differences (effects) or trends. Inferential statistics such as confidence intervals and hypothesis tests are often used to provide statistical inference on the possible differences (effects) or trends suggested by the descriptive statistics. For the rest of this section, we will focus on confidence intervals (or interval estimates). Hypothesis testing will be discussed in more detail in the following section.
Clinical endpoints are often used to assess the efficacy and safety of drug products. For example, diastolic blood pressure is one of the primary clinical endpoints for the study of ACE inhibitor agents in the treatment of hypertensive patients. The purpose of measuring diastolic blood pressure in hypertensive patients is to compare their average diastolic blood pressure with the norm for ordinary healthy subjects. However, the average diastolic blood pressure for the hypertensive patients is unknown. We need to estimate it from the diastolic blood pressures observed in the hypertensive patients in the study. The observed diastolic blood pressures and their average are, respectively, the sample and the sample mean of the study. The sample mean is a point estimate of the unknown population average diastolic blood pressure. A point estimate alone, however, may not be of practical use. For example, suppose that the sample mean is 98 mmHg. It is then important to know whether the population average for the hypertensive patients could reasonably be 90 mmHg given that the sample average turned out to be 98 mmHg. Answering this kind of question requires knowledge of the standard error, not merely of the point estimate itself.
The observed diastolic blood pressures are usually scattered around the sample mean. Based on these observed diastolic blood pressures, the standard error of the sample mean can be obtained. If the distribution of the diastolic blood pressure appears to be bell-shaped and the sample size is moderate, then there is about a 95% chance that the interval extending approximately two (more precisely, 1.96) standard errors below and above the sample mean will cover the unknown average diastolic blood pressure of the targeted population. The lower and upper limits of this interval constitute an interval estimate for the unknown population average diastolic blood pressure. An interval estimate is usually referred to as a confidence interval with a desired confidence level (about 95% in this case).
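The interval of roughly two standard errors around the sample mean can be computed directly. Below is a minimal Python sketch under stated assumptions: the function name `mean_ci_95` is hypothetical, the blood pressure values are invented so that the sample mean equals the 98 mmHg used in the example above, and the normal multiplier 1.96 is applied as described in the text. With a sample this small, a t-distribution multiplier would normally be used instead and would give a somewhat wider interval.

```python
import math
import statistics

def mean_ci_95(values):
    """Approximate 95% confidence interval for the population mean.

    Uses sample mean +/- 1.96 standard errors (normal approximation).
    """
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    z = 1.96                                      # normal multiplier for 95% confidence
    return mean - z * se, mean + z * se

# Hypothetical diastolic blood pressures (mmHg) with a sample mean of 98.
# Illustrative values only; these are NOT data from any study.
dbp = [92, 104, 98, 95, 101, 99, 97, 103, 94, 100, 96, 97]

lower, upper = mean_ci_95(dbp)
print(f"sample mean = {statistics.mean(dbp):.1f} mmHg")
print(f"approximate 95% CI = ({lower:.2f}, {upper:.2f}) mmHg")
print("90 mmHg inside interval:", lower <= 90 <= upper)
```

For these hypothetical values the interval lies entirely above 90 mmHg, so a population average of 90 mmHg would not be considered reasonable given the data. A sample with the same mean but a larger standard error could lead to the opposite conclusion, which is why the point estimate alone is not enough.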