The outcomes of an experiment are also described mathematically by their central tendency and their dispersion. Central tendency is a measure of the center of the distribution. This can be characterized by the mean (the arithmetic average) of the outcomes or by the median, which is the value above and below which the number of outcomes is the same. The mean of 3, 4, and 8 is 5, whereas the median is 4. The median length of response to a gene therapy trial might be 30 days, meaning as many people had less than 30 days' benefit as had more than that. The mean might be considerably more if, for instance, one person benefited for 180 days.
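The arithmetic above can be checked with a short Python sketch. The first list is the example from the text. The second list, `durations`, is a made-up set of response times in days, chosen only to show how one long responder pulls the mean above the median.

```python
from statistics import mean, median

# Example from the text: mean 5, median 4.
outcomes = [3, 4, 8]
outcomes_mean = mean(outcomes)
outcomes_median = median(outcomes)
print(outcomes_mean, outcomes_median)

# Hypothetical response durations in days (illustrative values only).
# One patient benefits for 180 days; the rest cluster around 30.
durations = [10, 20, 30, 40, 180]
durations_mean = mean(durations)
durations_median = median(durations)
print(durations_mean, durations_median)
```

In the hypothetical list the median stays at 30 days while the single 180-day response raises the mean well above it.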

Dispersion is a measure of how spread out the outcomes of the random variable are from their mean. It is characterized by the variance or standard deviation. The spread of the data can often be as important as the central tendency in estimating the value of the results. For instance, suppose the median number of errors in a gene-sequencing procedure was 3 per 10,000 bases sequenced. This error rate might be acceptable if the range that was found in 100 trials was between 0 and 5 errors, but it would be unacceptable if the range was between 0 and 150 errors. The occasional large number of errors makes the data from any particular procedure suspect.
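As a rough illustration of why spread matters, the sketch below compares two made-up sets of error counts that share the same median of 3 but differ sharply in dispersion. The lists `tight` and `wide` are hypothetical and are not data from any real sequencing run.

```python
from statistics import median, stdev, variance

# Hypothetical error counts per 10,000 bases (illustrative values only).
tight = [0, 1, 2, 3, 3, 4, 5]     # all runs fall between 0 and 5 errors
wide = [0, 1, 2, 3, 3, 4, 150]    # one run has an occasional large error count

for name, counts in (("tight", tight), ("wide", wide)):
    spread = max(counts) - min(counts)  # the range of the data
    print(
        name,
        "median:", median(counts),
        "range:", spread,
        "variance:", round(variance(counts), 1),
        "std dev:", round(stdev(counts), 1),
    )
```

Both lists report the same median, so only the range, variance, and standard deviation reveal that the second procedure is far less reliable.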

Another important concept in statistics is that of populations and samples. The population represents every possible experimental unit that could be measured. For example, every zebra on the continent of Africa might represent a population. If we were interested in the mean genetic diversity of zebras in Africa, it would be nearly impossible to actually analyze the DNA of every single zebra; neither can we sequence the entire DNA of any individual. Therefore we must take a random selection of some smaller number of zebras and some smaller amount of DNA, and then use the mean differences among these zebras to make inferences about the mean diversity in the entire population.

Any summary measure of the data, such as the mean or variance in a subset of the population, is called a sample statistic. The corresponding summary measure of the entire population is called a population parameter. Therefore, we use statistics to estimate parameters. Much of statistics is concerned with the accuracy of parameter estimates. This is the statistical science of point estimation.
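The relationship between a sample statistic and a population parameter can be sketched with a small simulation. Everything here is assumed for illustration: the population is 10,000 simulated scores drawn from a normal distribution, standing in for a diversity measure per zebra, and the sample size of 200 is arbitrary. In a real study the population mean would not be available for comparison.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch gives the same numbers each run

# Hypothetical population: one simulated score per animal (made-up values).
population = [random.gauss(50, 10) for _ in range(10_000)]
population_mean = mean(population)  # the parameter, normally unknown

# A random sample of 200 units drawn without replacement.
sample = random.sample(population, 200)
sample_mean = mean(sample)  # the statistic, used as a point estimate

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
print(f"estimation error:            {sample_mean - population_mean:.2f}")
```

Rerunning with a different seed or sample size shows how the estimate varies from sample to sample, which is the accuracy question that point estimation addresses.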

The final major discipline of statistics is hypothesis testing. All scientific investigations begin with a motivating question. For example, do identical twins have a higher likelihood than fraternal twins of both developing alcoholism?

From the question, two types of hypotheses are derived. The first is called the null hypothesis. This is generally a theory about the value of one or more population parameters and is the status quo, or what is commonly believed or accepted. In the case of the twins, the null hypothesis might be that the rates of concordance (i.e., both twins are or are not alcoholic) are the same for identical and fraternal twins. The alternative hypothesis is generally what you are trying to show. This might be that identical twins have a higher concordance rate for alcoholism, supporting a genetic basis for this disorder. It is important to note that statistics cannot prove either hypothesis. Rather, statistics provides evidence from the data that supports one hypothesis or the other.
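One way to weigh the evidence for such a pair of hypotheses is a one-sided Fisher exact test on the concordance counts. This is a common choice for comparing two proportions, not a method the text itself names. The counts below are invented for illustration and do not come from any twin study, and `upper_tail` is a hypothetical helper written for this sketch.

```python
from math import comb

def upper_tail(k, n_drawn, successes, total):
    """P(X >= k) when n_drawn units are drawn without replacement from
    `total` units, of which `successes` are concordant (hypergeometric)."""
    top = min(n_drawn, successes)
    favorable = sum(
        comb(successes, x) * comb(total - successes, n_drawn - x)
        for x in range(k, top + 1)
    )
    return favorable / comb(total, n_drawn)

# Hypothetical counts (illustrative only): concordant pairs out of all pairs.
identical_concordant, identical_pairs = 30, 50
fraternal_concordant, fraternal_pairs = 18, 50

# Under the null hypothesis, concordance is unrelated to twin type, so the
# identical-twin group is like a random draw from all pairs combined.
p_value = upper_tail(
    identical_concordant,
    identical_pairs,
    identical_concordant + fraternal_concordant,
    identical_pairs + fraternal_pairs,
)
print(f"one-sided p-value: {p_value:.4f}")
```

A small p-value means the observed difference would be unlikely if the null hypothesis were true. It counts as evidence for the alternative hypothesis, not as proof of it.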
