Statistical power

The power of a statistical test is the probability that the test will reject a false null hypothesis, that is, that it will not make a Type II error. If β denotes the probability of a Type II error, power equals 1 − β. The higher the power, the greater the chance of obtaining a statistically significant result when the null hypothesis is false.

Statistical tests use data from samples to determine whether differences or similarities exist in a population. For example, to test the null hypothesis that the mean scores of men and women on a test do not differ, samples of men and women are drawn, the test is administered to them, and the mean scores of the two groups are compared using a statistical test. If the populations of men and women have different mean scores but the test of the sample data concludes that there is no such difference, a Type II error has been made.
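
The example above can be made concrete with a small simulation. The sketch below is illustrative only: it assumes normally distributed scores with a known, common standard deviation, uses a two-sided two-sample z-test, and the function name simulate_power and all numeric settings (a true difference of 5 points, a standard deviation of 10, 30 people per group) are hypothetical choices, not values from any study. The fraction of simulated studies that reject the null hypothesis estimates power, and the remainder estimates the Type II error rate.

```python
import random
from statistics import NormalDist, mean


def simulate_power(mean_diff, sd, n_per_group, alpha=0.05, trials=2000, seed=0):
    """Estimate power by simulation: the fraction of simulated studies in which
    a two-sided two-sample z-test (known, common sd) rejects the null."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the difference between two group means.
    se = sd * (2 / n_per_group) ** 0.5
    rejections = 0
    for _ in range(trials):
        group_a = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        group_b = [rng.gauss(mean_diff, sd) for _ in range(n_per_group)]
        z = (mean(group_b) - mean(group_a)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


# Hypothetical scenario: the population means really do differ by 5 points.
power = simulate_power(mean_diff=5.0, sd=10.0, n_per_group=30)
type_ii_rate = 1 - power
print(f"estimated power: {power:.2f}")
print(f"estimated Type II error rate: {type_ii_rate:.2f}")
```

Setting mean_diff to 0 makes the null hypothesis true, in which case the rejection rate should be close to the significance criterion rather than a measure of power.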

Statistical power depends on the significance criterion, the size of the difference or the strength of the similarity (that is, the effect size) in the population, and the sensitivity of the data.

A significance criterion is a statement of how unlikely a difference must be, if the null hypothesis is true, to be considered significant. The most commonly used criteria are probabilities of 0.05, 0.01, and 0.001. If the criterion is 0.05, the probability of obtaining a difference at least as large as the one observed, assuming the null hypothesis is true, must be less than 0.05, and so on. Other things being equal, a stricter (smaller) criterion lowers power, because a larger observed difference is needed to reach significance.

The greater the effect size, the greater the power. Calculation of power requires that researchers determine the effect size they want to detect.
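
How the significance criterion and the effect size each move power can be shown with a short calculation. This is a minimal sketch under stated assumptions: a two-sided two-sample z-test with equal group sizes and a known standard deviation, with the effect size expressed as a standardized mean difference. The function name z_test_power and the example values (50 per group, effect sizes of 0.2, 0.5, and 0.8) are illustrative, not prescribed by the text.

```python
from statistics import NormalDist


def z_test_power(effect_size, n_per_group, alpha=0.05):
    """Power of a two-sided two-sample z-test for a standardized mean
    difference, assuming equal group sizes and a known standard deviation."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic when the true effect is effect_size.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the z statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Stricter criteria lower power (effect size and sample size held fixed).
for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: power = {z_test_power(0.5, 50, alpha):.3f}")

# Larger effect sizes raise power (criterion and sample size held fixed).
for d in (0.2, 0.5, 0.8):
    print(f"effect size = {d}: power = {z_test_power(d, 50):.3f}")
```

With an effect size of zero the function returns the significance criterion itself, which is the probability of rejecting a true null hypothesis.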

Sensitivity can be increased by using statistical controls, by increasing the reliability of measures (as in psychometric reliability), and by increasing the size of the sample. Increasing sample size is the most commonly used method for increasing statistical power.
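
The effect of sample size on power can be sketched numerically. The code below again assumes a two-sided two-sample z-test with a known standard deviation and equal group sizes; the function name power_for_sample_size, the effect size of 0.5, and the list of sample sizes are hypothetical values chosen only to show the trend.

```python
from statistics import NormalDist


def power_for_sample_size(n_per_group, effect_size=0.5, alpha=0.05):
    """Power of a two-sided two-sample z-test at a given per-group sample size."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # A larger sample shrinks the standard error, which pushes the expected
    # z statistic further from zero for the same underlying effect.
    shift = effect_size * (n_per_group / 2) ** 0.5
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


sample_sizes = [10, 20, 40, 80, 160]
powers = [power_for_sample_size(n) for n in sample_sizes]
for n, p in zip(sample_sizes, powers):
    print(f"n per group = {n:>3}: power = {p:.2f}")
```

Under these assumptions power rises steadily with sample size and approaches 1 for the largest samples in the list.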

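Although there are no formal standards for power, most researchers who assess the power of their tests use 0.80 as a standard for adequacy. A power of 0.80 means accepting a 0.20 probability of a Type II error when the true effect is as large as the one assumed in the calculation.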
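
A common use of the 0.80 convention is to work backward to a sample size. The sketch below uses the usual normal-approximation formula for a two-sided two-sample z-test with equal group sizes, which ignores the negligible chance of rejecting in the wrong direction. The function name required_n_per_group and the example effect sizes are illustrative assumptions. A t-test, which estimates the standard deviation from the data, typically needs slightly larger samples than this approximation suggests.

```python
import math
from statistics import NormalDist


def required_n_per_group(effect_size, alpha=0.05, target_power=0.80):
    """Approximate per-group sample size for a two-sided two-sample z-test
    to reach target_power at a given standardized effect size."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(target_power)
    # Solve effect_size * sqrt(n / 2) = z_alpha + z_power for n, then round up.
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"effect size = {d}: n per group = {required_n_per_group(d)}")
# prints 393, 63, and 25 participants per group, respectively
```

Smaller effects require much larger samples to reach the same power, because the required sample size grows with the inverse square of the effect size.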