Biologists, medical doctors and students in these branches of science typically grapple with statistics, but eventually many of them reach a basic level of understanding of “significance” and “p-values”. The bad news is that neither of these concepts sufficiently characterizes the reliability of our statistical decisions. This tutorial is intended for practicing biologists and medical doctors who are interested in the limitations of simply reporting significance or p-values. It is not a comprehensive introduction to hypothesis testing, and a basic understanding of its principles is assumed. Practical, how-to sections are written on a gray background. At the end of the tutorial, a brief summary without any theory is provided.
Download the PDF file here:
Statistics with Excel
It is often overlooked that Excel can perform most of the statistical calculations a biologist typically needs. I have created a macro-enabled Excel workbook that performs the statistical calculations you are likely to need as a biologist.
The capabilities of the program include:
- descriptive statistics (mean, SD, SEM, median, mode, skewness, kurtosis)
- normality tests
- calculations with the normal, binomial and Poisson distributions
- z-test, Student's t-tests, Welch test, F test
- estimation of the power of a statistical test
- estimation of the required sample size to reach a certain power
- estimating and controlling the false discovery rate (e.g. Benjamini-Hochberg and Storey methods)
- ANOVA (1-, 2- and 3-way, repeated-measures ANOVA with one factor)
- Levene's test
- non-parametric tests: Wilcoxon test, sign test, median test, Mann-Whitney test
- chi-squared test of independence
- Kolmogorov-Smirnov test
- tests for population proportions
- Kaplan-Meier logrank test
- calculation of sensitivity, specificity, negative and positive predictive values
- linear and polynomial regression with p value estimations, linear regression on ranks (Spearman)
- Deming linear regression (when observations of both the X and Y variables are associated with error)
- general purpose fitting
Download the Excel workbook here:
The workbook requires Excel 2010 or above and the Solver Add-in installed.
n-way ANOVA from summary statistics in Matlab
The Matlab program anovanFromSumStat can perform one-way, two-way, ... n-way ANOVA for the main and interaction effects when only summary statistics (mean, SD and size of each group) are available.
The program runs in four different modes depending on the first argument:
- anovaArray=anovanFromSumStat('gen'): generates the array containing the mean, SD and size of each group.
- anovaArray=anovanFromSumStat('regen',anovaArray): modifies an anovaArray created with the 'gen' option.
- varargout=anovanFromSumStat('calc',anovaArray): performs the ANOVA on the array created with the 'gen' or 'regen' option.
- anovanFromSumStat('ver'): displays the version of the program.
Help is available by typing 'help anovanFromSumStat' at the Matlab command prompt.
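To illustrate why the mean, SD and size of each group are enough, here is a minimal Python sketch of the one-way case only. It is not the Matlab program and does not reproduce its interface; the function name `one_way_anova_from_summary` and the example numbers are made up for illustration. The between-group sum of squares comes from the group means, and the within-group sum of squares is recovered from the group SDs.

```python
from math import fsum


def one_way_anova_from_summary(means, sds, sizes):
    """Return (F, df_between, df_within) from per-group summary statistics.

    Hypothetical helper for illustration; sds are sample SDs (n - 1 divisor).
    """
    k = len(means)
    n_total = sum(sizes)
    # Grand mean is the size-weighted average of the group means.
    grand_mean = fsum(n * m for n, m in zip(sizes, means)) / n_total
    # Between-group sum of squares: group means around the grand mean.
    ss_between = fsum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    # Within-group sum of squares: recovered from each group's sample variance.
    ss_within = fsum((n - 1) * sd ** 2 for n, sd in zip(sizes, sds))
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up example: three groups of five observations each.
f_stat, df1, df2 = one_way_anova_from_summary(
    means=[10.0, 12.0, 14.0], sds=[2.0, 2.0, 2.0], sizes=[5, 5, 5]
)
print(f_stat, df1, df2)
```

The sketch stops at the F statistic and its degrees of freedom; turning these into a p-value requires the survival function of the F distribution, which the Python standard library does not provide.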
Download the Matlab P-file here:
Estimation of the false discovery rate
Correction for false discoveries in multiple comparisons
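As a rough illustration of what such a correction does, the following Python sketch implements the Benjamini-Hochberg step-up procedure in its adjusted p-value form. It is not the Matlab M-file offered here, and it is not known whether that file uses this method; the function name `benjamini_hochberg` and the p-values are invented for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values from five comparisons.
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(p_values)
# A comparison counts as a discovery if its adjusted value is at most
# the chosen false discovery rate (here 0.05).
discoveries = [q <= 0.05 for q in adjusted]
print(adjusted, discoveries)
```

In this example the third and fourth comparisons have raw p-values below 0.05 but are not declared discoveries once the false discovery rate is controlled at 0.05.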
Download the Matlab M-file here:
Determine the power of a two-sample t-test
The power of a statistical test is the probability that the test will reject the null hypothesis given that it is indeed false. The power can be calculated if the effect size is known. A more detailed description of the program and the principles it is based on is available in this tutorial:
Download the Matlab M-file here:
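The relationship between effect size, sample size and power can be sketched in a few lines of Python. This is a normal approximation, not the calculation performed by the Matlab program, which may well use the noncentral t distribution. It assumes equal group sizes, a two-sided test, and an effect size expressed as the difference of the means divided by the common SD; the function name `approx_power_two_sample` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test with equal groups.

    Normal approximation (assumption): the t distribution is replaced by
    the standard normal, so the result is slightly optimistic for small n.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Made-up example: effect size 0.5 with 64 observations per group.
power = approx_power_two_sample(0.5, 64)
print(round(power, 3))
```

With an effect size of zero the function returns alpha, as it should: when the null hypothesis is true, the probability of rejecting it equals the significance level.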
Determine the required sample size to reach a certain power in a two-sample t-test
The size of the effect expected in an investigation can often be estimated in advance. For this effect to be detectable in a statistical test, a certain minimum sample size is required, which this Matlab program determines. A more detailed description of the program and the principles it is based on is available in this tutorial:
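The idea can be sketched in Python with the usual normal-approximation formula for the sample size per group, n ≈ 2(z₁₋α/₂ + z_power)² / d², where d is the expected difference of the means divided by the common SD. This is an illustration under those assumptions, not the algorithm of the Matlab program, and the function name `approx_n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def approx_n_per_group(effect_size, power=0.8, alpha=0.05):
    """Approximate per-group sample size for a two-sided, two-sample test.

    Normal approximation (assumption): an exact calculation based on the
    t distribution typically gives a slightly larger n.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    # Round up, since a fraction of an observation cannot be collected.
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Made-up examples: a medium and a large standardized effect.
n_medium = approx_n_per_group(0.5)
n_large = approx_n_per_group(1.0)
print(n_medium, n_large)
```

Because the required sample size scales with 1/d², halving the expected effect size roughly quadruples the number of observations needed in each group.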