Proc Natl Acad Sci U S A. 2003 Aug 5;100(16):9440-5.
doi: 10.1073/pnas.1530509100. Epub 2003 Jul 25.

Statistical significance for genomewide studies

John D Storey et al. Proc Natl Acad Sci U S A. 2003.

Abstract

With the increase in genomewide experiments and the sequencing of multiple genomes, the analysis of large data sets has become commonplace in biology. It is often the case that thousands of features in a genomewide data set are tested against some null hypothesis, where a number of features are expected to be significant. Here we propose an approach to measuring statistical significance in these genomewide studies based on the concept of the false discovery rate. This approach offers a sensible balance between the number of true and false positives that is automatically calibrated and easily interpreted. In doing so, a measure of statistical significance called the q value is associated with each tested feature. The q value is similar to the well known p value, except it is a measure of significance in terms of the false discovery rate rather than the false positive rate. Our approach avoids a flood of false positive results, while offering a more liberal criterion than what has been used in genome scans for linkage.
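The abstract does not spell out how a q value is computed. As a rough illustration only, the sketch below assumes a step-up estimate: sort the m p values, scale each by π0·m/rank, and take a running minimum from the largest p value down so that q values never decrease as p values increase. The function name `q_values` and the `pi0` argument (an estimate of the proportion of null features, supplied by the caller) are hypothetical names chosen for this example. With `pi0=1.0` the result coincides with Benjamini-Hochberg adjusted p values.

```python
def q_values(p_values, pi0=1.0):
    """Return q-value estimates in the same order as p_values.

    Assumption: q(p_(i)) = min over j >= i of pi0 * m * p_(j) / j,
    where p_(1) <= ... <= p_(m) are the sorted p values.
    """
    m = len(p_values)
    # Indices of the p values, smallest p first.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0  # a q value is a rate, so it is capped at 1
    # Walk from the largest p value to the smallest, keeping the
    # smallest estimated false discovery rate seen so far.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q[i] = running_min
    return q


if __name__ == "__main__":
    # Toy p values, not data from the study.
    print(q_values([0.01, 0.04, 0.03, 0.5]))
```

Calling a feature significant when its q value is at most 0.05 then targets a false discovery rate of about 5% among the features called, rather than a 5% false positive rate among all null features.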

Figures

Fig. 1.
A density histogram of the 3,170 p values from the Hedenfalk et al. (14) data. The dashed line is the density histogram we would expect if all genes were null (not differentially expressed). The dotted line is at the height of our estimate of the proportion of null p values.
Fig. 2.
Results from the Hedenfalk et al. (14) data. (a) The q values of the genes versus their respective t statistics. (b) The q values versus their respective p values. (c) The number of genes occurring on the list up through each q value versus the respective q value. (d) The expected number of false positive genes versus the total number of significant genes given by the q values.
Fig. 3.
The estimates $\hat{\pi}_0(\lambda)$ versus λ for the data of Hedenfalk et al. (14). The solid line is a natural cubic spline fit to these points to estimate $\hat{\pi}_0(\lambda = 1)$.
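To make the quantity plotted against λ in Fig. 3 concrete, here is a minimal sketch. It assumes that quantity is the estimated proportion of null p values, computed as the fraction of p values above λ divided by (1 − λ), which is the height a flat null histogram (as in Fig. 1) would need to account for them. The names `pi0_at` and `pi0_curve` are hypothetical, and the natural cubic spline smoothing mentioned in the caption is deliberately left out.

```python
def pi0_at(p_values, lam):
    """Estimate the null proportion from the p values larger than lam.

    Assumption: p values above lam are mostly null and uniformly spread,
    so their count divided by m * (1 - lam) approximates the null fraction.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    m = len(p_values)
    above = sum(1 for p in p_values if p > lam)
    return above / (m * (1.0 - lam))


def pi0_curve(p_values, lambdas):
    """Return (lam, estimate) pairs, i.e. the raw points of such a plot."""
    return [(lam, pi0_at(p_values, lam)) for lam in lambdas]


if __name__ == "__main__":
    # Toy p values, not data from the study.
    toy = [0.001, 0.01, 0.02, 0.04, 0.3, 0.55, 0.7, 0.9]
    for lam, est in pi0_curve(toy, [0.0, 0.25, 0.5]):
        print(lam, est)
```

Larger λ uses fewer p values, so the raw points get noisier toward λ = 1; that trade-off is one reason to smooth the points before reading off an estimate.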

References

    1. Morton, N. E. (1955) Am. J. Hum. Gen. 7, 277–318.
    2. Lander, E. S. & Kruglyak, L. (1995) Nat. Genet. 11, 241–247.
    3. Storey, J. D. (2003) Ann. Stat., in press.
    4. Storey, J. D. (2002) J. R. Stat. Soc. B 64, 479–498.
    5. Benjamini, Y. & Hochberg, Y. (1995) J. R. Stat. Soc. B 57, 289–300.