Comparative Study
Genome Biol. 2004;5(12):R101.
doi: 10.1186/gb-2004-5-12-r101. Epub 2004 Nov 26.

GOToolBox: functional analysis of gene datasets based on Gene Ontology

David Martin et al. Genome Biol. 2004.

Abstract

We have developed methods and tools based on the Gene Ontology (GO) resource that allow the identification of statistically over- or under-represented terms in a gene dataset; the clustering of functionally related genes within a set; and the retrieval of genes sharing annotations with a query gene. GO annotations can also be constrained to a slim hierarchy or a given level of the ontology. The source code is available upon request and is distributed under the GPL license.
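To make the idea of over- or under-represented terms concrete, the following is a minimal sketch of how the per-term quantities described in the Figure 2 caption (counts, frequencies, E/D flag, P-value) could be computed. It assumes a one-sided hypergeometric test and that the user set is drawn from the reference set; the text here does not state which test GO-Stats uses, so this is an illustration and not the authors' implementation. The function names `hypergeom_pmf` and `term_statistics` are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, ref_total, ref_count, set_total):
    """P(X = k): k annotated genes among set_total genes drawn without
    replacement from ref_total genes, ref_count of which carry the term."""
    return (comb(ref_count, k) * comb(ref_total - ref_count, set_total - k)
            / comb(ref_total, set_total))


def term_statistics(ref_count, ref_total, set_count, set_total):
    """Frequencies, E/D flag and one-sided P-value for a single GO term.

    Assumption: the P-value is a hypergeometric tail probability; the
    source does not specify the test actually used by GO-Stats.
    """
    ref_freq = ref_count / ref_total   # frequency in the reference set
    set_freq = set_count / set_total   # frequency in the user set
    enriched = set_freq >= ref_freq    # frequency ratio >= 1 means enriched

    # Valid range of annotated-gene counts in a draw of size set_total.
    lowest = max(0, set_total - (ref_total - ref_count))
    highest = min(ref_count, set_total)

    # Upper tail for enrichment, lower tail for depletion.
    if enriched:
        tail = range(set_count, highest + 1)
    else:
        tail = range(lowest, set_count + 1)
    p_value = sum(hypergeom_pmf(k, ref_total, ref_count, set_total)
                  for k in tail)

    return {
        "ref_freq": ref_freq,
        "set_freq": set_freq,
        "status": "E" if enriched else "D",
        "p_value": p_value,
    }


if __name__ == "__main__":
    # Toy numbers, not taken from the paper: 3 of 10 reference genes carry
    # the term, and 2 of the 4 genes in the user set carry it.
    print(term_statistics(ref_count=3, ref_total=10, set_count=2, set_total=4))
```

Ranking terms then amounts to computing these statistics for every term annotated to the input genes and sorting by `p_value`. When many terms are tested, a multiple-testing correction would normally be applied on top of this.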

Figures

Figure 1
Flowchart of the GOToolBox programs.
Figure 2
Typical output from the GO-Stats program. From the input of a group of Drosophila genes, GO-Stats returns a series of GO terms associated with them (columns 1 and 3). The terms are ranked according to a P-value representing their statistical relevance (column 8). The output also lists additional useful information: column 2 gives the depth at which a given GO term is found in the GO hierarchy (note that some terms can be found at several levels simultaneously; for example, GO:0009586). Columns 4 and 6 list the numbers of genes annotated for a given term in the reference and the user sets, respectively. Columns 5 and 7 list the corresponding occurrence frequencies. Finally, the last column indicates whether a given GO term is enriched (E) or depleted (D), based on the term frequency ratio (column 7/column 5). Hyperlinks to the GO consortium's definitions of GO terms are provided (underlined in column 3). All GO terms associated with the input genes are listed in the table. To visualize the hierarchy between these terms, GO-Stats provides an interactive feature: clicking on a term (radio button to the left of the GO term list) highlights all its parent terms in the list. Finally, when working in the program, moving the mouse pointer over the GO ID column displays all the genes associated with a given GO term in a box.
Figure 3
Typical output from the GO-Family program. In this example, we asked for all the genes from human, mouse and nematode that share more than 45% functional similarity with an input gene, the Drosophila gene engrailed. The output is composed of four columns: rank, name of the similar gene, percentage of similarity, and the species from which the similar gene originates.
Figure 4
Use of the GOToolBox programs in the PRODISTIN framework. (a) Flowchart of the programs used in the PRODISTIN pipeline. The 'Dataset creation' program and GO-Diet are used to generate a slimmed protein annotation file in a suitable format (tlf). This tlf file can be used as input both for PRODISTIN and for the tree-visualization program TreeDyn (not shown in the figure). In a second step, once functional classes have been generated by PRODISTIN, the GO-Stats tool is used to evaluate the relevance of each class annotation term. (b) Histograms showing the distribution of the relevance values for the 79 classes produced by PRODISTIN (probability is described in the Features section).
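The kind of query described in the Figure 3 caption (genes sharing more than a given percentage of functional similarity with a query gene, reported as rank, gene, percentage and species) can be sketched as follows. This is an illustration only: the similarity measure used here, the percentage of shared GO terms over the union of both annotation sets, is an assumption, since the measure used by GO-Family is not defined in this text. The function names, gene names and term labels are hypothetical placeholders.

```python
def percent_similarity(terms_a, terms_b):
    """Percentage of shared terms over the union of both term sets.

    Assumption: a Jaccard-style measure stands in for the similarity
    actually computed by GO-Family, which is not specified here.
    """
    union = terms_a | terms_b
    if not union:
        return 0.0
    return 100.0 * len(terms_a & terms_b) / len(union)


def similar_genes(query_terms, annotations, threshold=45.0):
    """Return (rank, gene, percent similarity, species) rows for genes
    whose similarity to the query exceeds the threshold."""
    hits = []
    for gene, (species, terms) in annotations.items():
        score = percent_similarity(query_terms, terms)
        if score > threshold:
            hits.append((gene, score, species))
    # Highest similarity first; ties broken by gene name for stable output.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return [(rank, gene, score, species)
            for rank, (gene, score, species) in enumerate(hits, start=1)]


if __name__ == "__main__":
    # Made-up annotations with placeholder term labels (not real GO IDs).
    query = {"T1", "T2", "T3"}
    annotations = {
        "geneA": ("human", {"T1", "T2", "T3"}),
        "geneB": ("mouse", {"T1", "T2", "T4"}),
        "geneC": ("nematode", {"T1", "T5", "T6"}),
    }
    for row in similar_genes(query, annotations):
        print(row)
```

With a 45% threshold, `geneC` is dropped because it shares only one of five distinct terms with the query. A measure that weights terms by depth or specificity in the ontology would rank genes differently.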

