BMC Bioinformatics. 2003 Jan 13;4:2.
doi: 10.1186/1471-2105-4-2. Epub 2003 Jan 13.

An automated method for finding molecular complexes in large protein interaction networks

Gary D Bader et al. BMC Bioinformatics. 2003.

Abstract

Background: Recent advances in proteomics technologies such as two-hybrid, phage display and mass spectrometry have enabled us to create a detailed map of biomolecular interaction networks. Initial mapping efforts have already produced a wealth of data. As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information in order to effectively aid in knowledge discovery.

Results: This paper describes a novel graph theoretic clustering algorithm, "Molecular Complex Detection" (MCODE), that detects densely connected regions in large protein-protein interaction networks that may represent molecular complexes. The method is based on vertex weighting by local neighborhood density and outward traversal from a locally dense seed protein to isolate the dense regions according to given parameters. The algorithm has the advantage over other graph clustering methods of having a directed mode that allows fine-tuning of clusters of interest without considering the rest of the network and allows examination of cluster interconnectivity, which is relevant for protein networks. Protein interaction and complex information from the yeast Saccharomyces cerevisiae was used for evaluation.
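The two stages named above (vertex weighting by local neighborhood density, then outward traversal from a dense seed) can be sketched as follows. This is a simplified illustration, not the published implementation: it assumes a vertex's weight is the core number of the highest k-core of its immediate neighborhood multiplied by that core's density, and that traversal admits neighbors whose weight is within a fraction `vwp` of the seed's weight. The function names, the `vwp` default, and the tie-breaking rules are hypothetical choices made for the example.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)} from edge pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def highest_k_core(adj, nodes):
    """Return (k, vertices) of the highest k-core of the subgraph induced on `nodes`."""
    sub = {v: adj[v] & nodes for v in nodes}
    k, core = 0, set(nodes)
    while sub:
        low = [v for v in sub if len(sub[v]) <= k]
        if not low:
            # Every remaining vertex has degree > k, so this is a k'-core with k' = min degree.
            k = min(len(nb) for nb in sub.values())
            core = set(sub)
            continue
        for v in low:
            for u in sub.pop(v):
                sub[u].discard(v)
    return k, core


def density(adj, nodes):
    """Edges present divided by edges possible (no self-loops; an assumption of this sketch)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(len(adj[v] & nodes) for v in nodes) / 2
    return edges / (n * (n - 1) / 2)


def vertex_weights(adj):
    """Weight each vertex by (core number x density) of its neighborhood's highest k-core."""
    weights = {}
    for v in adj:
        k, core = highest_k_core(adj, adj[v] | {v})
        weights[v] = k * density(adj, core)
    return weights


def find_complexes(adj, vwp=0.2):
    """Grow clusters outward from the highest-weighted unseen seed vertices."""
    weights = vertex_weights(adj)
    seen, complexes = set(), []
    for seed in sorted(adj, key=lambda v: (-weights[v], str(v))):
        if seed in seen:
            continue
        threshold = weights[seed] * (1 - vwp)  # assumed inclusion rule
        members, queue = {seed}, deque([seed])
        seen.add(seed)
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in seen and weights[u] >= threshold:
                    seen.add(u)
                    members.add(u)
                    queue.append(u)
        if len(members) > 1:  # drop singleton "clusters"
            complexes.append(members)
    return complexes


# Toy network: a 4-clique (a, b, c, d) with a sparse tail d-e-f.
toy = build_adjacency([("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
                       ("b", "d"), ("c", "d"), ("d", "e"), ("e", "f")])
print(find_complexes(toy))
```

On the toy network the clique is recovered as a single cluster and the sparse tail is left out, which is the behavior the abstract describes for dense regions.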

Conclusion: Dense regions of protein interaction networks can be found, based solely on connectivity data, many of which correspond to known protein complexes. The algorithm is not affected by a known high rate of false positives in data from high-throughput interaction techniques. The program is available from ftp://ftp.mshri.on.ca/pub/BIND/Tools/MCODE.

Figures

Figure 1
Effect of Overlap Score Threshold on Number of Predicted and Matched Known Complexes for the Gavin Evaluation. Figure legend: Average and maximum number of predicted and matched known complexes seen during MCODE parameter optimization (840 parameter combinations) plotted as a function of overlap score threshold. As the stringency with which a predicted complex must match a known complex is increased (higher overlap score threshold), fewer predicted complexes match known complexes. Note that these curves do not correspond to the best parameter set, but rather are an average of results from all tried parameter combinations.
Figure 2
Number of Predicted and Matched Known Complexes at Overlap Score Threshold of 0.2. Figure legend: Number of known complexes matched to MCODE predicted complexes plotted against number of MCODE predicted complexes, both with an overlap score above 0.2.
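The legends do not define the overlap score, so the following is a minimal sketch under an assumption: that the score between a predicted and a known complex is i² / (a × b), where i is the number of shared proteins and a and b are the sizes of the two complexes. The function names and the toy protein labels are hypothetical.

```python
def overlap_score(predicted, known):
    """Assumed overlap score: (shared proteins)^2 / (|predicted| * |known|)."""
    predicted, known = set(predicted), set(known)
    if not predicted or not known:
        return 0.0
    shared = len(predicted & known)
    return shared * shared / (len(predicted) * len(known))


def count_matches(predicted_complexes, known_complexes, threshold=0.2):
    """Count predicted complexes that match a known one, and known ones that are matched."""
    predicted_hit = sum(
        any(overlap_score(p, k) > threshold for k in known_complexes)
        for p in predicted_complexes
    )
    known_hit = sum(
        any(overlap_score(p, k) > threshold for p in predicted_complexes)
        for k in known_complexes
    )
    return predicted_hit, known_hit


predicted = [{"A", "B", "C", "D"}, {"X", "Y"}]
known = [{"B", "C", "D", "E", "F"}, {"A", "B"}]
print(overlap_score(predicted[0], known[0]))  # 3 shared of sizes 4 and 5
print(count_matches(predicted, known))
```

Under this definition the score is 1 only for identical complexes and 0 when no protein is shared, so a threshold of 0.2 tolerates partial matches.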
Figure 3
Examples of Gavin Benchmark Complexes Missed and Hit by MCODE. Figure legend: Protein complexes are represented as graphs using the spoke model. Vertices represent proteins and edges represent experimentally determined interactions. Blue vertices are baits in the Gavin et al. study. A) A Cdc3 complex hand-annotated by Gavin et al. that was missed by MCODE because of a lack of connectivity information among sub-components. This complex annotation was the result of a single co-immunoprecipitation experiment. B) The Arp2/3 complex as annotated by Gavin et al. and as found by MCODE with parameters optimized to the data set. Note the five extra proteins that have minimal connectivity to the main cluster. C) The protein connection map seen from the crystal structure of the Arp2/3 complex. The crystal structure is from Bos taurus (cow), but is assumed to be very similar to that of yeast based on very high similarity between cow and yeast Arp2/3 subunits.
Figure 4
Effect of Vertex Weight Percentage Parameter on Predicted Complex Size. Figure legend: As the vertex weight percentage (VWP) parameter of MCODE is increased, the number of predicted complexes steadily decreases and the average and largest sizes of predicted complexes increase exponentially. The y-axis follows a logarithmic scale. For reference, the average and maximum sizes of the MIPS benchmark complexes are 6 and 81, respectively, and those of the Gavin benchmark complexes are 11.8 and 88, respectively.
Figure 5
Overlap Score Distributions of Pre HTMS and AllYeast Interaction Sets with MIPS Complex Benchmark Optimized MCODE Parameter Sets. Figure legend: The number of MCODE predicted complexes in the pre-large scale mass spectrometry (Pre HTMS) and AllYeast protein-protein interaction sets with a given overlap score threshold compared to the MIPS benchmark complex set is shown. The majority of predicted complexes have an overlap score of zero, meaning that they had no overlap with the catalogue of known MIPS protein complexes.
Figure 6
Sensitivity vs. Specificity Plots of MCODE Results Among Various Data Sets. Figure legend: Specificity is plotted versus sensitivity of the best MCODE results at an overlap score above 0.2 against both the MIPS (Panel A) and Gavin (Panel B) complex benchmarks. Panel A shows that there are no large inherent differences among interaction data sets resulting from significantly different experimental methods (data set: sensitivity, specificity; Y2H: 0.10, 0.27; Benchmark: 0.29, 0.36; HTP Only: 0.14, 0.24; Pre HTMS: 0.27, 0.31; AllYeast: 0.27, 0.26; Gavin Spoke: 0.10, 0.38). Panel B shows that the Gavin benchmark is expectedly biased towards the Gavin interaction data set and thus should not be used as a general benchmark (data set: sensitivity, specificity; Y2H: 0.03, 0.10; Benchmark: 0.11, 0.16; HTP Only: 0.24, 0.33; Pre HTMS: 0.10, 0.13; AllYeast: 0.27, 0.26; Gavin Spoke: 0.31, 0.79).
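The legend reports sensitivity and specificity without stating how they were computed. The sketch below assumes the following definitions: true positives (TP) are predicted complexes matching some known complex above the overlap threshold, false positives (FP) are the remaining predicted complexes, and false negatives (FN) are known complexes matched by no prediction, giving sensitivity = TP / (TP + FN) and specificity = TP / (TP + FP). The overlap score formula and all names are assumptions made for illustration.

```python
def overlap_score(predicted, known):
    """Assumed overlap score: (shared proteins)^2 / (|predicted| * |known|)."""
    predicted, known = set(predicted), set(known)
    if not predicted or not known:
        return 0.0
    shared = len(predicted & known)
    return shared * shared / (len(predicted) * len(known))


def sensitivity_specificity(predicted_complexes, known_complexes, threshold=0.2):
    """Return (sensitivity, specificity) under the assumed TP/FP/FN definitions."""
    tp = sum(
        any(overlap_score(p, k) > threshold for k in known_complexes)
        for p in predicted_complexes
    )
    fp = len(predicted_complexes) - tp
    fn = sum(
        not any(overlap_score(p, k) > threshold for p in predicted_complexes)
        for k in known_complexes
    )
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity


predicted = [{"A", "B", "C"}, {"X", "Y"}]
known = [{"A", "B", "C", "D"}, {"P", "Q"}]
print(sensitivity_specificity(predicted, known))
```

Note that "specificity" defined this way is what is more commonly called precision or positive predictive value, since true negatives are not well defined for complex prediction.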
Figure 7
The Second Highest Ranked MCODE Predicted Complex is Involved in RNA Processing and Modification. Figure legend: This complex incorporates the known polyadenylation factor I complex (Cft1, Cft2, Fip1, Pap1, Pfs2, Pta1, Ysh1, Yth1 and Ykl059c) and contains other proteins highly connected to this complex, some of unknown function. The fact that the unknown proteins (Yor179c and Pti1) connect more to known RNA processing/modification proteins than to other proteins in the larger data set likely indicates that these proteins function in RNA processing/modification. This complex was ranked second by MCODE from the predicted complexes in the AllYeast interaction set.
Figure 8
An MCODE Predicted Complex Involved in Cytokinesis. Figure legend: This predicted complex incorporates the known Septin complex (Cdc3, Cdc10, Cdc11 and Cdc12) involved in cytokinesis and other cytokinesis related proteins. The Yal027w protein is of unknown function, but likely functions in cell cycle control according to this figure, possibly in cytokinesis. This complex was ranked 23rd by MCODE from the predicted complexes in the AllYeast interaction set.
Figure 9
Effect of Complex Score Threshold on MCODE Prediction Accuracy. Figure legend: MCODE complexes with a score equal to or greater than a specific threshold were compared to a benchmark comprising the combined MIPS and Gavin benchmarks. Accuracy was calculated as the number of known complexes better or equal to the threshold score divided by the total number of predicted complexes (matching and non-matching) at that threshold. A complex was deemed to match a known complex if it had an overlap score above 0.2. The number of predicted complexes that matched known complexes at each score threshold is shown as labels on the plot.
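The accuracy calculation in this legend can be read as a ratio taken over the predictions that survive a score cut-off. The sketch below is one hedged interpretation: each predicted complex carries an MCODE score and a flag saying whether it matched a known complex (overlap score above 0.2), and accuracy is the matched count divided by the total count among predictions at or above the threshold. The input representation and function name are hypothetical.

```python
def accuracy_at_threshold(scored_predictions, score_threshold):
    """Fraction of predictions scoring >= threshold that matched a known complex.

    `scored_predictions` is a list of (score, matched) pairs; returns None when
    no prediction reaches the threshold.
    """
    kept = [matched for score, matched in scored_predictions if score >= score_threshold]
    if not kept:
        return None
    return sum(kept) / len(kept)


# Toy data: higher-scoring predictions match known complexes more often.
toy_predictions = [(5.0, True), (4.0, True), (3.0, False), (2.0, False)]
for t in (2.0, 4.0, 6.0):
    print(t, accuracy_at_threshold(toy_predictions, t))
```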
Figure 10
An MCODE Predicted Complex That is Too Large (Relaxed Parameters). Figure legend: An example of a predicted complex that incorporates two complexes, proteasome (left) and an RNA processing complex (right). These should probably be predicted as separate complexes as can be seen by the clear distinction of biological role annotation on one side of this layout compared to the other (purple versus blue). This figure, however, shows the large amount of overall connectivity between these two complexes. This complex was ranked fourth by MCODE from the predicted complexes in the AllYeast interaction set with slightly relaxed parameters compared to the optimized prediction.
Figure 11
MCODE in Directed Mode. Figure legend: MCODE was used in directed mode to further study the complex in Figure 10 by using seed vertices from high density regions of the two parts of this complex. A) The result of examining the Lsm complex using MCODE parameters that are too relaxed (haircut = TRUE, fluff = FALSE, VWP = 0.05). B) The final Lsm complex using MCODE parameters of haircut = TRUE, fluff = FALSE and VWP = 0 seeded with Lsm4. C) The final 26S proteasome complex seeded with Rpt1 using MCODE parameters haircut = TRUE, fluff = TRUE and VWP = 0.2. Visible here are two regions of density in this complex corresponding to the 20S proteolytic subunit (left side – mainly Pre proteins) and the 19S regulatory subunit (right side – mainly Rpt and Rpn proteins).
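The legend uses the haircut parameter without explaining it. As a rough illustration, the sketch below assumes that haircut means reducing a predicted complex to its 2-core, that is, repeatedly removing members connected to the rest of the complex by only one edge. The fluff step is not shown. The function name and the toy network are hypothetical.

```python
def haircut(adj, members):
    """Return the 2-core of the subgraph induced on `members` (assumed haircut behavior)."""
    members = set(members)
    sub = {v: adj[v] & members for v in members}
    pruned = True
    while pruned:
        # Vertices with fewer than two neighbors inside the complex are trimmed.
        low = [v for v in sub if len(sub[v]) < 2]
        pruned = bool(low)
        for v in low:
            for u in sub.pop(v):
                sub[u].discard(v)
    return set(sub)


# Toy complex: a triangle (a, b, c) with a dangling chain c-d-e.
toy_adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
print(haircut(toy_adj, toy_adj.keys()))
```

Trimming is iterative: removing `e` leaves `d` singly connected, so `d` is removed on the next pass and only the triangle remains.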
Figure 12
Examining Complex Connectivity with MCODE. Figure legend: The complexes shown here are known to be nuclear localized and are involved in protein degradation (19S proteasome subunit), mRNA processing (Lsm complex and mRNA Cleavage/Polyadenylation complex), cell cycle (anaphase promoting complex) and transcription (SAGA transcriptional activation complex).

