

MetabolomicsAustralia-Bioinformatics edited this page Jan 9, 2020 · 4 revisions

Statistical Power & Sample Size

Sample Size

The reason statisticians insist on a sufficiently large sample size (n = 20 is a commonly cited minimum) is so that certain values computed from the sample, most importantly the sample mean and variance, are close to the true (but unknown) population mean and variance. The difference between a sample value (e.g. the sample mean) and the corresponding population value is the error. This error tends to zero as the sample size increases, becoming exactly zero when the sample comprises the entire population.
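This shrinking of the error can be seen directly by simulation. The sketch below (an illustration, not part of the original page; the population parameters are made up) draws repeated samples from a known normal population and measures how far the sample mean typically lands from the population mean at each sample size:

```python
# Illustrative simulation: average error of the sample mean vs. sample size.
# The "population" here is an assumed normal distribution with known mu, sigma.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 10.0, 2.0  # hypothetical population mean and standard deviation

def mean_abs_error(n, reps=2000):
    # Draw `reps` independent samples of size n and return the average
    # absolute difference between each sample mean and the true mean.
    samples = rng.normal(mu, sigma, size=(reps, n))
    return np.abs(samples.mean(axis=1) - mu).mean()

for n in (5, 20, 100, 1000):
    print(f"n = {n:4d}  mean |sample mean - mu| = {mean_abs_error(n):.3f}")
```

The printed error shrinks roughly like sigma / sqrt(n), which is the usual standard-error behaviour the paragraph above alludes to.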

There is some notion of a "representative sample" in biology, but it is falling out of favour because (1) journals won't hear of it, and (2) a representative sample is only representative in certain respects (e.g. biological variability, or isogenic samples) but not in others (e.g. technical variability, such as measurement error).

Restricting our attention to just the t-test: the t-test remains quite useful for small sample sizes; the examples in Student's original (1908) paper used sample sizes ranging from 4 to 10. Notably, however, the t-test becomes more conservative as the sample shrinks, so a smaller sample size requires a higher bar to clear (i.e. greater separation between the distributions of the two groups) in order to confidently reject the null hypothesis.
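That "higher bar" is visible in the t-distribution's critical values, which grow as the degrees of freedom fall. A short sketch (not from the original page) using `scipy.stats`:

```python
# The two-sided 5% critical t-value grows as the sample shrinks, so a
# small sample needs a larger standardized difference to reject H0.
from scipy.stats import t

for n in (4, 10, 20, 100):
    df = n - 1                    # one-sample t-test degrees of freedom
    crit = t.ppf(0.975, df)       # two-sided alpha = 0.05 critical value
    print(f"n = {n:3d}  critical |t| = {crit:.3f}")
```

At n = 4 the test statistic must exceed about 3.18 in absolute value, versus roughly 1.98 at n = 100.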

Statistical power

The power of a test is the probability that the test correctly rejects the null hypothesis when it is in fact false (i.e. the probability of a true positive). In practice, power is most often used prospectively: given assumptions about the magnitude of the effect and the statistical significance threshold, a power calculation yields an appropriate sample size before the experiment is done.
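As a concrete sketch of such a calculation (the effect size d = 0.8, alpha = 0.05, and 80% power target are illustrative assumptions, not values from the original page), the power of a two-sample t-test can be computed from the noncentral t-distribution and scanned to find the smallest adequate per-group n:

```python
# Sketch: sample-size calculation for a two-sample t-test via the
# noncentral t-distribution (scipy). Assumed inputs: Cohen's d = 0.8,
# two-sided alpha = 0.05, target power = 0.80.
import numpy as np
from scipy.stats import t, nct

def power_two_sample(n, d=0.8, alpha=0.05):
    # Power of a two-sided, equal-n, two-sample t-test for effect size d.
    df = 2 * n - 2
    nc = d * np.sqrt(n / 2)            # noncentrality parameter
    crit = t.ppf(1 - alpha / 2, df)    # two-sided critical value
    return (1 - nct.cdf(crit, df, nc)) + nct.cdf(-crit, df, nc)

n = 2
while power_two_sample(n) < 0.80:      # smallest n per group reaching 80% power
    n += 1
print(f"required n per group: {n}  (power = {power_two_sample(n):.3f})")
```

For these assumed inputs the scan lands around 26 per group, matching the standard tabulated answer for a "large" effect; smaller assumed effects drive the required n up quickly.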
