Statistics 101

A basic guide to the numbers game behind research
by Amy G. Cutrell

Have you ever stood around a cocktail party discussing statistics? I didn’t think so. But we hear about them all the time, especially when we’re absorbing clinical trial results—or election year polls. Understanding the basics of statistics is not that hard to do, despite how it seemed in school!

There are a handful of concepts that serve as the building blocks for learning statistics. I’m skipping past mean, median, and mode; those will not be covered here (see sidebar). We will look first at standard deviation, standard error, and confidence interval, and then at how they all tie in to the p-value.

Building Block #1: Standard Deviation

The standard deviation is simply a measure of the amount of variability in your particular sample. Because we can’t practically measure everyone in a population that we are interested in studying, we have to take a sample of the population. The standard deviation describes how much the measurements vary from one individual data point to the next within the sample.

Below is a picture of what the standard deviation (SD) tells us. The mean, μ, which is the average, sits at the highest point in the center of the bell-shaped curve. For data that follow this bell shape, the area within plus or minus (±) 1 standard deviation (1σ) of the mean captures about 68% of all measurements in your sample; the area within ± 2 standard deviations (2σ) captures about 95% of all measurements in your sample. In other words, not very many data points (approximately 5%) will lie more than 2 standard deviations from the mean.
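The 68% and 95% figures are easy to check on a computer. Here is a minimal sketch in Python, assuming a made-up sample of 10,000 bell-shaped measurements (the mean of 120 and standard deviation of 15 are illustration values, not data from any study):

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical sample: 10,000 draws from a bell-shaped (normal) distribution.
# The mean of 120 and SD of 15 are assumptions chosen only for illustration.
sample = [random.gauss(120, 15) for _ in range(10_000)]

mean = statistics.mean(sample)
sd = statistics.stdev(sample)  # sample standard deviation


def share_within(k):
    """Fraction of measurements within k standard deviations of the mean."""
    return sum(abs(x - mean) <= k * sd for x in sample) / len(sample)


within_1sd = share_within(1)
within_2sd = share_within(2)
print(f"within 1 SD: {within_1sd:.1%}, within 2 SD: {within_2sd:.1%}")
```

The two shares should land close to 68% and 95%, though the exact figures shift a little from one random sample to the next.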

Building Block #2: Standard Error

The standard error (SE) is important in describing how well the sample mean represents the true population mean. Remember, because we can’t practically measure everyone in the population, we take a random sample. Every random sample will give a slightly different estimate of the whole population. The standard error gives you a measure of how precise your sample mean is compared to the true population mean. It is calculated as the standard deviation divided by the square root of the sample size, so it depends on the size of your sample. As the sample size gets larger, the variability of the sample mean gets smaller and we get a more precise measurement of the truth. If we measure every member of the population, then it is no longer a sample. There is only one value that can be computed by measuring every member of the population, thus there is no variability and the truth is known.
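To see how the sample size drives precision, here is a small sketch of the standard error calculation, taken as the standard deviation divided by the square root of the sample size. The standard deviation of 15 and the three sample sizes are assumed values for illustration only:

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of the sample size n."""
    return sd / math.sqrt(n)


sd = 15.0  # hypothetical standard deviation, assumed for illustration
for n in (25, 100, 400):
    print(f"n = {n:3d}  SE = {standard_error(sd, n)}")
```

Each time the sample size is quadrupled, the standard error is cut in half.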

Building Block #3: Confidence Interval

Since every random sample will give a slightly different estimate of the whole population, it makes sense to try to describe what the true population looks like with more than a single number. The confidence interval (CI) estimates a range of values within which we are pretty sure the truth lies. The confidence interval depends on the standard error. It is calculated as the sample mean plus or minus approximately 2 times the standard error. For example, a 95% CI is built so that, if we repeated the sampling many times, about 95% of the resulting intervals would contain the true population mean.
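Here is a rough sketch of that calculation, assuming ten made-up blood pressure readings. The multiplier comes from the normal curve (about 1.96 for 95%); with a sample this small a statistician would usually use a slightly larger multiplier from the t-distribution, so treat this as an approximation:

```python
import math
import statistics


def confidence_interval(data, level=0.95):
    """Approximate CI for the mean: sample mean plus or minus z times the standard error."""
    n = len(data)
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)
    # Normal-curve multiplier: about 1.96 for a 95% interval.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * se, mean + z * se


# Hypothetical readings, invented for illustration only.
readings = [118, 125, 130, 110, 122, 135, 128, 115, 120, 127]
low, high = confidence_interval(readings)
print(f"sample mean = {statistics.mean(readings)}, 95% CI = ({low:.1f}, {high:.1f})")
```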

Values outside the confidence interval are unlikely candidates for the true population mean. With a 95% CI, intervals built this way will miss the true mean about 5% of the time. In other words, we’re pretty confident in our confidence interval! The area outside the interval corresponds to a parameter called alpha, α. Usually we split α so that half is in the upper tail (2.5%) and half is in the lower tail (2.5%). This is what we mean by a two-tailed confidence interval.

The CI tells us a lot about our sample. We can draw conclusions about statistical significance based on the location of the CI. For example, suppose we have a CI that estimates the difference between two groups: a value of zero corresponds to no difference (that is, the null value). Therefore a CI that excludes zero denotes a statistically significant finding. Beware, though, that the null value is not always zero! It depends upon your null hypothesis, which will be described below.

Combining Building Blocks 1, 2, & 3: Describing a Sample

Before moving on to new concepts, let’s put the three summary statistics of standard deviation, standard error, and confidence interval together by considering an example from blood pressure measurements on 50 individuals. The scatter plot in the picture below shows each of the 50 individual measurements from this hypothetical sample. Our sample mean is represented by the large dot in the center of the 4 vertical lines beside the scatter plot. In the first green line, we can see that about 2/3 of our sample results are contained within ± 1 standard deviation. In the second green line, 95% of our data points are covered by ± 2 standard deviations. Once we know the standard error, we can construct the confidence interval. The 95% CI, depicted by the second blue line, gives us the range within which we are 95% confident that the true population mean lies.
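The same comparison can be sketched in a few lines of code. The 50 readings below are simulated, not real measurements, and the mean of 120 and standard deviation of 12 are assumptions made only for this illustration:

```python
import math
import random
import statistics

rng = random.Random(50)  # fixed seed so the sketch is repeatable

# 50 hypothetical blood pressure readings; mean 120 and SD 12 are assumed values.
readings = [rng.gauss(120, 12) for _ in range(50)]

mean = statistics.mean(readings)
sd = statistics.stdev(readings)
se = sd / math.sqrt(len(readings))

# Range expected to cover roughly 95% of the individual readings.
sd_range = (mean - 2 * sd, mean + 2 * sd)
# Approximate 95% CI: the range for the population mean, not for individuals.
ci_95 = (mean - 1.96 * se, mean + 1.96 * se)

print(f"mean = {mean:.1f}, SD = {sd:.1f}, SE = {se:.1f}")
print(f"mean ± 2 SD: {sd_range[0]:.1f} to {sd_range[1]:.1f}")
print(f"95% CI:      {ci_95[0]:.1f} to {ci_95[1]:.1f}")
```

Notice that the confidence interval is much narrower than the ± 2 SD range. The first describes how sure we are about the mean; the second describes how spread out the individuals are.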

Four Steps in Conducting an Experiment or Study


Step 1

The first step of the experiment is to state your hypothesis. The null hypothesis, H0, is pre-defined and represents a statement which is the reverse of what we hope the experiment will show. It is named the null hypothesis because it is the statement that we want the data to reject. The alternative hypothesis, Ha, is also predefined and represents a statement of what we hope the experiment will show. Ha is the hypothesis that there is a real effect.

Suppose we design a study to test if Optimized Background Treatment (OBT) + New Drug is better than OBT alone. Our null hypothesis is that there is no difference between the groups. Our alternative hypothesis is that the two groups are not the same. (We hope that OBT + New Drug is better!) Depending on the data gathered from the study, we will either reject the null hypothesis or not.

Step 2

The next step is to design your experiment and select the test statistic. The test statistic defines the method that will be used to compare the two groups and help interpret the outcome at the end of the study.

Sometimes the comparison may be based on the differences in means, and use continuous data analysis methods. Sometimes the comparison may be based on proportions, and use categorical data analysis methods. There are many possibilities.

For our example from Step 1, the test statistic will be based on the comparison of the proportion of patients with HIV viral load (VL) less than (<) 50 copies/mL in each treatment group of our study sample.

Many other important decisions go into designing the experiment besides selecting the test statistic. We also calculate the sample size, agree on the power of the study, and establish parameters like α and β (described below).

Step 3

After we generate a random study sample, we conduct the study and collect the data. The third step is to investigate the hypotheses stated in Step 1 and compare the groups. In our hypothetical example, we find that 75% of subjects in the OBT + New Drug group achieve VL <50 copies/mL compared to 35% of patients in the control group. This produces a p-value <0.0001.

The p-value (the “p” stands for “probability”) helps us decide whether the data from the random study sample support the null hypothesis or the alternative hypothesis. The p-value is the probability that results at least as extreme as these would occur if there were truly no difference between the groups; that is, how likely such results would be to arise purely by chance. The closer the p-value is to 0, the stronger the evidence that the observed difference in viral load is real and not due to chance, and thus the more reason we have to reject the null hypothesis in favor of the alternative hypothesis. We look for a p-value of 0.05 or smaller. This represents a 5-in-100 probability, a very small chance indeed!
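To make this concrete, here is a sketch of one common way to get a p-value when comparing two proportions, a two-proportion z-test. The article’s example does not state the group sizes or the exact test used, so the 100 patients per group and the choice of test are assumptions for illustration:

```python
import math


def two_proportion_p_value(p1, n1, p2, n2):
    """Two-sided p-value from a pooled two-proportion z-test (normal approximation)."""
    # Under the null hypothesis both groups share one response rate, so pool them.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Probability of a result at least this extreme, in either direction, by chance.
    return math.erfc(abs(z) / math.sqrt(2))


# 75% vs. 35% response; 100 patients per group is an assumed figure.
p_value = two_proportion_p_value(0.75, 100, 0.35, 100)
print(f"p-value = {p_value:.2g}")
print("less than 0.0001?", p_value < 0.0001)
```

With group sizes this large, a gap of 75% versus 35% gives a p-value far below 0.0001. Much smaller groups with the same percentages would give a larger p-value.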

Step 4

The last step is to compare the p-value with α and interpret the finding. Alpha is called the significance level. As described above in the section on CI, it is the area outside of the confidence area. It is most commonly defined as α=0.05. If the p-value is less than or equal to α, then the null hypothesis is rejected and we declare a statistically significant finding has been observed. If the p-value is greater than α, then the null hypothesis is not rejected.

Remember, our hypothetical example produced a p-value <0.0001. This is well below α=0.05, so we reject the null hypothesis and conclude that OBT + New Drug and OBT alone are different. Because the response rate was higher in the OBT + New Drug group, we can even take it one step further and conclude that OBT + New Drug is better than OBT alone.

The results of a study are often described by both the p-value and the 95% CI. The p-value is a single number that guides whether or not to reject the null hypothesis. The 95% CI provides a range of plausible values for describing the underlying population.
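As an illustration of the second half of that pairing, here is a sketch of a 95% CI for the difference between the two response rates in our example. The 100 patients per group is an assumed figure (the example does not give group sizes), and the interval uses a simple normal approximation:

```python
import math


def difference_ci(p1, n1, p2, n2, z=1.96):
    """Approximate 95% CI for the difference between two proportions."""
    diff = p1 - p2
    # Standard error of the difference, using each group's own response rate.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff - z * se, diff + z * se


# 75% vs. 35% response; 100 patients per group is an assumption.
low, high = difference_ci(0.75, 100, 0.35, 100)
excludes_zero = not (low <= 0 <= high)
print(f"difference = 0.40, 95% CI = ({low:.2f}, {high:.2f})")
print("CI excludes zero (the null value)?", excludes_zero)
```

Because the whole interval sits above zero, it tells the same story as the small p-value, and it also shows how large the benefit might plausibly be.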

Concepts in Designing a Study: Type I error, Type II error, and power

Three more terms that we often hear or read about are Type I error, Type II error, and power. They are inter-related and are important in both the design stage and the interpretation stage.

One way to think about them is to consider the relationship between a smoke detector and a house fire (Larry Gonick & Woollcott Smith, The Cartoon Guide to Statistics, 1993, pp. 151-152). The purpose of the smoke detector, of course, is to warn us in case of a fire. However, it is possible to have a fire without an alarm, as well as an alarm without a fire. Those are errors that we do not want, but they are possible events nevertheless. So the “true state” can be either no fire (H0) or house fire (Ha).

Ideally, we want the alarm to alert us if there is a fire and we want the alarm to remain silent when there is no fire.

If we have an alarm without a fire, then a Type I error has been committed. This corresponds to α, which is the probability of claiming a difference when none truly exists (rejecting H0 when it is true). Alpha is normally pre-set to 0.05. In other words, we accept a 5% chance of a “false alarm.”

If we have a fire but it does not cause an alarm, then a Type II error has been committed. This corresponds to beta, β, which is the probability of missing a difference when one truly exists (not rejecting H0 when it is false). Beta is normally pre-set to 10% or 20%. In other words, we accept a 10% or 20% chance of a “failed alarm.”

Power is defined as 1-β, which is the probability of an alarm when there is a real fire. It is normally pre-set to 80% or 90%. It is the probability of detecting a true difference, or a “true alarm.” In other words, with power=80%, we accept that eight trials out of 10 will correctly declare a true difference and that two trials out of 10 will incorrectly miss a true difference. β is a risk that we want to minimize as much as possible, but doing so comes with a price: a larger study, plus more time to recruit subjects, measure, and report.
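One way to get a feel for false alarms and true alarms is to run many pretend trials on a computer and count how often the alarm goes off. The sketch below is a simulation under assumed values: 100 patients per group, a 35% response rate in the control group, a 55% response rate when the new drug truly works, and a two-proportion z-test as the alarm. None of these figures come from a real study.

```python
import math
import random


def p_value(successes_a, successes_b, n):
    """Two-sided p-value from a pooled two-proportion z-test with n patients per group."""
    pooled = (successes_a + successes_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:  # every patient (or no patient) responded: no evidence of a difference
        return 1.0
    z = (successes_a - successes_b) / n / se
    return math.erfc(abs(z) / math.sqrt(2))


def alarm_rate(rate_a, rate_b, n=100, trials=2000, alpha=0.05, seed=0):
    """Share of simulated trials in which the null hypothesis is rejected."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    alarms = 0
    for _ in range(trials):
        a = sum(rng.random() < rate_a for _ in range(n))
        b = sum(rng.random() < rate_b for _ in range(n))
        if p_value(a, b, n) <= alpha:
            alarms += 1
    return alarms / trials


false_alarm_rate = alarm_rate(0.35, 0.35)  # no fire: every alarm is a Type I error
power = alarm_rate(0.55, 0.35)             # real fire: every alarm is a true alarm
print(f"false alarm rate = {false_alarm_rate:.1%}, power = {power:.1%}")
```

The false alarm rate should come out near the 5% we set for α. The power depends on the assumed response rates and group size, and 1 minus the power is the share of “failed alarms,” β.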

A p-value greater than α=0.05 could be non-significant because there is truly no difference between the groups. Or it could be non-significant because the study is not large enough to detect a true underlying difference. Determining the optimal sample size for a study requires a great deal of thought at the planning stage. A sample size that is unnecessarily large is a waste of resources. But a sample size that is too small has a higher likelihood of not representing the underlying population and consequently missing a “true alarm.” The small study has a wider confidence interval because its standard error is larger, so its estimate is less precise. As we said above in Building Block #2, when the sample size gets larger, the variability gets smaller and we get a more precise measurement of the truth. The optimal sample size depends on all of the various assumptions that go into its calculation.

For instance, to plan a superiority study as in Step 2 above, we need to make decisions/assumptions on the following parameters: α (generally 0.05), whether the hypothesis is one-sided or two-sided (generally two-sided), power (generally 80-90%), the response rate in the test arm, and the response rate in the control arm. These assumptions are directly tied to the study being designed—so different types of studies require different sets of information for the sample size calculation. Changing any one of the decisions/assumptions will change the sample size calculation.
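To show how these assumptions feed into the calculation, here is a sketch using one standard textbook approximation for comparing two proportions. It is not necessarily the method a given study team would use, and the response rates of 75% and 35% are borrowed from our hypothetical example purely for illustration:

```python
import math
from statistics import NormalDist


def patients_per_arm(p_test, p_control, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-sided comparison of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 when power = 80%
    p_bar = (p_test + p_control) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_test * (1 - p_test) + p_control * (1 - p_control))
    ) ** 2
    return math.ceil(numerator / (p_test - p_control) ** 2)


n_80 = patients_per_arm(0.75, 0.35, power=0.80)
n_90 = patients_per_arm(0.75, 0.35, power=0.90)
n_small_gap = patients_per_arm(0.45, 0.35, power=0.80)
print(f"80% power: {n_80} per arm; 90% power: {n_90} per arm")
print(f"smaller expected difference (45% vs. 35%): {n_small_gap} per arm")
```

Asking for more power, or expecting a smaller difference between the arms, pushes the required sample size up, sometimes dramatically.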

Basic Requirements of a Clinical Research Study

Understanding the basics of statistics is helpful in evaluating the messages that arise out of research. Good research follows clearly articulated steps and serious planning. The goal of research is to answer a question. In order to do so, it comes down to establishing:

  • Clearly stated primary objective
  • Clearly defined primary endpoint
  • Well-defined study population
  • Study design characteristics that support the objectives
  • Adequate sample size
  • Analysis plan that addresses objective and all important analysis issues

In conclusion, study designs are chosen depending on the questions that are being studied. Study endpoints are selected according to the hypothesis under investigation and the study population being enrolled. And study interpretations depend on the hypotheses being tested. Statistics can help weigh the evidence and draw conclusions from the data.

Amy Cutrell resides in Chapel Hill, NC, and has worked at GlaxoSmithKline for twenty years. She received an MS in biostatistics from the UNC School of Public Health.

©Copyright 2008 positivelyaware.com