Proc Natl Acad Sci U S A. 2008 Mar 18;105(11):4323-8.
doi: 10.1073/pnas.0701722105. Epub 2008 Mar 7.

Network properties of genes harboring inherited disease mutations

Igor Feldman et al. Proc Natl Acad Sci U S A.

Abstract

By analyzing, in parallel, large literature-derived and high-throughput experimental datasets, we investigate genes harboring human inherited disease mutations in the context of molecular interaction networks. Our results demonstrate that network properties influence both the likelihood and the phenotypic consequences of disease mutations. Genes with intermediate connectivity have the highest probability of harboring germ-line disease mutations, suggesting that disease genes tend to occupy an intermediate niche in terms of their physiological and cellular importance. Our analysis of tissue expression profiles supports this view. We show that disease mutations are less likely to occur in essential genes than in human genes overall. Disease genes display significant functional clustering in the analyzed molecular network. For about one-third of known disorders with two or more associated genes, we find physical clusters of genes with the same phenotype. These clusters are likely to represent disorder-specific functional modules and suggest a framework for identifying yet-undiscovered disease genes.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
The probabilities (fractions) of three gene categories, monogenic disease (green), all disease (blue), and essential (red), as a function of network connectivity. (A) Fit of the probabilistic models to the GW network data. The curves represent maximum-likelihood model fits to all (nonbinned) data (see Methods). For monogenic and all disease genes separately, two models were tested: a general model using a bell-shaped β-like function and a uniform null-hypothesis model. For essential genes, a rising β-like function was tested against a uniform null hypothesis. The log-likelihood differences between the models are shown next to the corresponding arrows. The individual data points representing the fractions of genes in each category at each connectivity are shown as dots in the corresponding category colors. The data points were grouped into four bins for display purposes only. For each bin, 99% confidence intervals for the posterior probabilities are represented by colored densities; the color intensity and width of the densities represent the probability values. (Inset) The data for the first bin. Error bars represent SEM. (B) A simpler bar plot of the GW network data presented in A, showing the fractions of all disease genes, monogenic disease genes, and essential genes for different connectivity bins. Error bars represent SEM. (C) The same as A for the Y2H network: fit of the probabilistic models to the Y2H network data. (D) A simpler bar plot of the Y2H network data presented in C, showing the fractions of all disease genes, monogenic disease genes, and essential genes for different connectivity bins. Error bars represent SEM.
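The model comparison in panel A can be illustrated with a per-gene Bernoulli likelihood: each gene either is or is not a disease gene, and each model assigns it a probability that depends on its connectivity. This is a minimal sketch, not the authors' code. The exact β-like parameterization (connectivity rescaled to x = k/(k_max + 1) and a beta-density shape scaled so that its maximum equals a free `peak` parameter), the coarse grid search, and all function names are assumptions. Under the uniform null, every gene gets the same probability, whose maximum-likelihood value is the observed fraction.

```python
import math


def bell_prob(k, k_max, a, b, peak):
    """Hypothetical bell-shaped, beta-like probability at connectivity k.

    Connectivity is rescaled to x in (0, 1); the beta-density shape
    x^(a-1) (1-x)^(b-1) is divided by its value at the mode so that the
    curve's maximum equals `peak`. Requires a > 1 and b > 1.
    """
    x = k / (k_max + 1)
    mode = (a - 1) / (a + b - 2)
    shape = x ** (a - 1) * (1 - x) ** (b - 1)
    shape_at_mode = mode ** (a - 1) * (1 - mode) ** (b - 1)
    return peak * shape / shape_at_mode


def bernoulli_loglik(labels, probs, eps=1e-12):
    """Log-likelihood of 0/1 gene labels under per-gene probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # guard against log(0)
        total += math.log(p) if y else math.log(1 - p)
    return total


def compare_models(degrees, labels):
    """Return (log-likelihood difference, best (a, b, peak)) for the bell
    model versus a uniform null, using a coarse grid search."""
    k_max = max(degrees)
    # Uniform null: one probability for every gene; its MLE is the mean label.
    p0 = sum(labels) / len(labels)
    ll_null = bernoulli_loglik(labels, [p0] * len(labels))

    grid = [1.5, 2, 3, 4, 6, 8]
    peaks = [i / 20 for i in range(1, 20)]  # 0.05 .. 0.95
    best_ll, best_params = -math.inf, None
    for a in grid:
        for b in grid:
            for peak in peaks:
                probs = [bell_prob(k, k_max, a, b, peak) for k in degrees]
                ll = bernoulli_loglik(labels, probs)
                if ll > best_ll:
                    best_ll, best_params = ll, (a, b, peak)
    return best_ll - ll_null, best_params


# Synthetic example: genes of intermediate connectivity are enriched.
degrees = list(range(1, 101))
labels = [1 if 30 <= k <= 60 and k % 2 == 0 else 0 for k in degrees]
delta, params = compare_models(degrees, labels)
print(f"log-likelihood difference: {delta:.1f}, best (a, b, peak): {params}")
```

A positive difference favors the bell-shaped model over the uniform null; a real analysis would use a continuous optimizer rather than a fixed grid.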
Fig. 2.
Tissue expression distribution of disease and essential genes. The tissue expression index (TEI) was calculated for every gene as the fraction of the 79 tissues analyzed by Su et al. (18) in which the gene was detected as expressed. Genes with large indexes are expressed in almost all tissues, whereas genes with small indexes have a restricted expression pattern. Shown are the fractions of disease and essential genes as a function of the tissue expression index. Error bars represent SEM.
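Because the TEI is a simple fraction, it can be computed directly from per-tissue expression calls. The sketch below assumes each gene is represented by a hypothetical list of boolean present/absent calls (the detection criterion used with the Su et al. data is not described here), and it uses the binomial standard error sqrt(p(1 − p)/n) as one common way to obtain the SEM of a fraction; the binning scheme and function names are illustrative assumptions.

```python
import math

N_TISSUES = 79  # number of tissues in the Su et al. panel, per the caption


def tissue_expression_index(calls):
    """Fraction of tissues in which a gene is called expressed (TEI).

    `calls` is a hypothetical list of booleans, one per tissue.
    """
    if not calls:
        raise ValueError("at least one tissue call is required")
    return sum(1 for c in calls if c) / len(calls)


def fraction_by_tei_bin(tei_values, flags, n_bins=4):
    """Fraction of flagged genes (e.g. disease genes) per TEI bin, with a
    binomial standard error sqrt(p(1-p)/n). Empty bins give (None, None)."""
    bins = [[] for _ in range(n_bins)]
    for tei, flag in zip(tei_values, flags):
        idx = min(int(tei * n_bins), n_bins - 1)  # TEI == 1.0 goes to the last bin
        bins[idx].append(1 if flag else 0)
    result = []
    for members in bins:
        if not members:
            result.append((None, None))
            continue
        n = len(members)
        p = sum(members) / n
        result.append((p, math.sqrt(p * (1 - p) / n)))
    return result


# Example: a gene detected in 20 of the 79 tissues.
print(tissue_expression_index([True] * 20 + [False] * (N_TISSUES - 20)))
```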
Fig. 3.
Average network connectivity of disease genes with different phenotypes. Data are shown separately for the GW (Left) and Y2H (Right) networks. Light-gray columns show the average connectivity of disease genes with versus without a decrease in life expectancy. Dark-gray columns show the average connectivity of disease genes with recessive versus dominant phenotypes. Error bars represent one standard error. Because the connectivity distributions in each category do not follow a simple parametric form, we used the nonparametric Mann–Whitney test to assess the significance of differences between categories.
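The Mann–Whitney test compares two samples through the ranks of their pooled values instead of assuming a particular distribution. A minimal standard-library sketch follows; it assigns average ranks to ties and uses the large-sample normal approximation without tie or continuity correction, which is a simplification. The input values are hypothetical, and in practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used.

```python
import math


def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Return (U for sample x, z score, two-sided p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, z, p


# Hypothetical connectivities for two phenotype categories.
group_a = [3, 5, 8, 12, 15, 20, 25]
group_b = [1, 2, 2, 4, 6, 7, 9]
print(mann_whitney(group_a, group_b))
```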
Fig. 4.
Physical clustering of disease genes in the GW network. (A) The red line shows the cumulative fraction of all gene pairs located at a given network distance or closer. The network distance was calculated as the shortest path between each gene pair. The blue line shows the same cumulative fraction for pairs of disease genes. (B) Disorder-specific clustering of disease genes. In the GW network, there are 38 clusters of physically interacting genes associated with the same disorder. To assess the significance of the observed clustering, we simulated the distribution of such same-disorder physical clusters in randomized networks. The random distribution was obtained by reshuffling network edges while preserving the connectivity of each gene; the reshuffling was repeated 1,000 times. The network randomization shows that the observed clustering is highly statistically significant (z score > 7.5). The solid red line shows a Gaussian fit to the simulated distribution.
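The randomization in panel B can be sketched with degree-preserving double-edge swaps: two edges (a, b) and (c, d) are rewired to (a, d) and (c, b), which leaves every gene's connectivity unchanged. In this sketch a cluster is assumed to be a connected group of at least two genes annotated with the same disorder, found within the subgraph induced by that disorder's genes. The number of swaps per randomization (10 × the number of edges), the rejection of self-loops and duplicate edges, and all function names are hypothetical choices, not details taken from the paper.

```python
import math
import random
from collections import defaultdict


def count_disorder_clusters(edges, disorder_genes):
    """Count connected groups of >= 2 same-disorder genes (hypothetical definition).

    edges: iterable of (gene, gene) pairs; disorder_genes: dict disorder -> genes.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0
    for genes in disorder_genes.values():
        members, seen = set(genes), set()
        for start in members:
            if start in seen:
                continue
            # depth-first search restricted to this disorder's genes
            stack, size = [start], 0
            seen.add(start)
            while stack:
                g = stack.pop()
                size += 1
                for h in adj[g]:
                    if h in members and h not in seen:
                        seen.add(h)
                        stack.append(h)
            if size >= 2:
                total += 1
    return total


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Rewire (a, b), (c, d) -> (a, d), (c, b); every node keeps its degree."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:  # pick one of the two possible rewirings
            c, d = d, c
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        # reject swaps that would create self-loops or duplicate edges
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present -= {frozenset(edges[i]), frozenset(edges[j])}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def clustering_z_score(edges, disorder_genes, n_random=1000, seed=0):
    """Observed cluster count, null mean and SD, and z score from randomized networks."""
    rng = random.Random(seed)
    edges = list(edges)
    observed = count_disorder_clusters(edges, disorder_genes)
    null = [
        count_disorder_clusters(
            degree_preserving_shuffle(edges, 10 * len(edges), rng), disorder_genes)
        for _ in range(n_random)
    ]
    mean = sum(null) / len(null)
    sd = math.sqrt(sum((x - mean) ** 2 for x in null) / (len(null) - 1))
    z = (observed - mean) / sd if sd > 0 else float("nan")
    return observed, mean, sd, z
```

Libraries such as NetworkX offer a similar double-edge-swap routine; the hand-written version here only keeps the sketch dependency-free.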
Fig. 5.
Genes and proteins harboring variation causing the same disease phenotype tend to form directly (physically) connected clusters. Physical-interaction gene clusters associated with 38 disease phenotypes are shown. Gene and phenotype names are indicated for each cluster. The phenotype numbers and cluster colors serve as the key for Fig. 6.
Fig. 6.
Genes and proteins harboring variation causing the same disease phenotype tend to form directly (physically) connected clusters (continued from Fig. 5). (A) A visualization of the same 38 phenotypic gene clusters shown in Fig. 5 within the GW molecular interaction network. Genes associated with the same phenotype are indicated by semitransparent spheres of the same color. Several genes within the network are known to affect multiple phenotypes (network nodes with multicolor stripes). Blue cubes represent essential genes and provide additional network context. (B) A detailed view of the gene connectivity distribution in the GW network (compare with Fig. 1 A and B). Disease genes (red and yellow spheres), essential genes (blue cubes), and other genes (white dots) are placed along concentric circles that represent connectivity layers of the molecular network. In our GW network, connectivity ranges from 1 (the outermost circle) to 340 (the center). The intermediate connectivity range contains a higher proportion of disease genes that participate in physical clustering (red spheres) than of disease genes that do not (yellow spheres). Panels A and C focus exclusively on the subset of disease genes that show within-phenotype physical-interaction clustering. (C) The overlaps and physical interactions between the gene clusters linked to the 38 disease phenotypes. Each node represents a whole disease cluster, with the same color and number as in Fig. 5. Two nodes are connected by a red edge when at least one direct physical interaction links two genes from the two distinct phenotypic clusters, and by a green edge when the corresponding clusters share at least one gene.
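The cluster-level graph in panel C can be derived from the gene-level network by testing every pair of clusters for shared genes (green edges) and for cross-cluster interactions (red edges). This is a minimal sketch, assuming clusters are given as a hypothetical mapping from cluster number to a set of gene names and the network as a list of gene pairs. The caption does not say how an interaction touching a gene shared by both clusters is treated; here any interaction between two distinct genes, one from each cluster, counts as a red edge.

```python
from itertools import combinations


def cluster_overlap_graph(clusters, edges):
    """Build red (interaction) and green (shared-gene) edges between clusters.

    clusters: hypothetical dict cluster_id -> set of genes.
    edges: iterable of (gene, gene) physical interactions.
    """
    interactions = {frozenset(e) for e in edges}
    red, green = set(), set()
    for id1, id2 in combinations(sorted(clusters), 2):
        genes1, genes2 = clusters[id1], clusters[id2]
        if genes1 & genes2:
            green.add((id1, id2))
        # any direct interaction between distinct genes of the two clusters
        if any(frozenset((x, y)) in interactions
               for x in genes1 for y in genes2 if x != y):
            red.add((id1, id2))
    return red, green


# Hypothetical clusters and interactions (gene names are placeholders).
clusters = {1: {"A", "B"}, 2: {"B", "C"}, 3: {"D", "E"}}
network = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
red, green = cluster_overlap_graph(clusters, network)
print(sorted(red), sorted(green))  # [(1, 2), (2, 3)] [(1, 2)]
```

Keeping the two edge types as separate sets mirrors the caption, where a pair of clusters can be linked by a red edge, a green edge, or both.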

References

1. Friedman C, Kra P, Yu H, Krauthammer M, Rzhetsky A. Bioinformatics. 2001:S74–S82.
2. Rzhetsky A, Iossifov I, Koike T, Krauthammer M, Kra P, Morris M, Yu H, Duboue PA, Weng W, Wilbur WJ, et al. J Biomed Inform. 2004;37:43–53.
3. Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S, et al. Cell. 2005;122:957–968.
4. Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, et al. Nature. 2005;437:1173–1178.
5. Gandhi TK, Zhong J, Mathivanan S, Karthick L, Chandrika KN, Mohan SS, Sharma S, Pinkert S, Nagaraju S, Periaswamy B, et al. Nat Genet. 2006;38:285–293.