Information assessment on predicting protein-protein interactions

Nan Lin¹, Baolin Wu, Ronald Jansen, Mark Gerstein, Hongyu Zhao

PMID: 15491499
PMCID: PMC529436
DOI: 10.1186/1471-2105-5-154

Comparative Study

Nan Lin et al. BMC Bioinformatics. 2004 Oct 18;5:154. doi: 10.1186/1471-2105-5-154.

Affiliation

¹ Department of Mathematics, Washington University in St. Louis, St. Louis, MO 63130, USA.

Abstract

Background: Identifying protein-protein interactions is fundamental for understanding the molecular machinery of the cell. Proteome-wide studies of protein-protein interactions are of significant value, but high-throughput experimental technologies suffer from high rates of both false positives and false negatives. In addition to high-throughput experimental data, many diverse types of genomic data can help predict protein-protein interactions, such as mRNA expression, localization, essentiality, and functional annotation. Evaluating the information contributed by each type of evidence helps to establish more parsimonious models with comparable or better prediction accuracy, and to gain biological insight into the relationships between protein-protein interactions and other genomic information.

Results: Our assessment is based on the genomic features used in a Bayesian network approach to predict protein-protein interactions genome-wide in yeast. In the special case where no feature has missing values, our analysis shows that functional classification contributes more information than expression correlation or essentiality. We also show that in this case alternative models, such as logistic regression and random forests, may be more effective than Bayesian networks for predicting interactions.
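One generic way to put a number on how much a discrete feature tells us about interaction status is mutual information. The sketch below is only an illustration of that idea: the abstract does not state which measure the authors used, and the function name `mutual_information` and the toy feature vectors are made up for this example.

```python
import math
from collections import Counter

def mutual_information(feature, label):
    """Mutual information I(feature; label) in bits for two
    equal-length sequences of discrete values."""
    n = len(feature)
    joint = Counter(zip(feature, label))
    count_f = Counter(feature)
    count_l = Counter(label)
    mi = 0.0
    for (f, l), c in joint.items():
        # p(f, l) * log2(p(f, l) / (p(f) * p(l))), expressed with raw counts
        mi += (c / n) * math.log2(c * n / (count_f[f] * count_l[l]))
    return mi

# Hypothetical toy data: 1 = interacting pair, 0 = non-interacting pair.
label         = [1, 1, 1, 1, 0, 0, 0, 0]
informative   = [1, 1, 1, 1, 0, 0, 0, 0]  # tracks the label exactly
uninformative = [1, 0, 1, 0, 1, 0, 1, 0]  # independent of the label

mi_informative = mutual_information(informative, label)
mi_uninformative = mutual_information(uninformative, label)
print(mi_informative, mi_uninformative)
```

On this toy input the feature that tracks the label carries one full bit and the independent feature carries none, which is the kind of contrast an information assessment across genomic features is meant to expose.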

Conclusions: In the restricted problem posed by the complete-information subset, we identified the MIPS and Gene Ontology (GO) functional similarity datasets as the dominant information contributors for predicting protein-protein interactions under the framework proposed by Jansen et al. Random forests based on the MIPS and GO information alone can give highly accurate classifications. In this particular subset of complete information, adding other genomic data does little to improve predictions. We also found that the data discretizations used in the Bayesian methods decreased classification performance.
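The point about discretization can be illustrated with a small simulation. This is a minimal sketch on synthetic Gaussian scores, not the yeast data, and the two-bin cutoff at 0.5 is a hypothetical choice rather than the binning used in the Bayesian network; it only shows that coarsening a continuous score can lower the area under the ROC curve (AUC).

```python
import random

def auc(scores, labels):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly; tied pairs count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 1000                # synthetic pairs per class (assumed size)
labels = [1] * n + [0] * n
# Assumed score distributions: positives ~ N(1, 1), negatives ~ N(0, 1).
scores = ([rng.gauss(1.0, 1.0) for _ in range(n)]
          + [rng.gauss(0.0, 1.0) for _ in range(n)])

# Hypothetical discretization: collapse the continuous score into two bins.
binned = [1 if s > 0.5 else 0 for s in scores]

auc_raw = auc(scores, labels)
auc_binned = auc(binned, labels)
print(f"AUC continuous: {auc_raw:.3f}  AUC two-bin: {auc_binned:.3f}")
```

The binned score keeps only one operating point of the original ROC curve, so ranking information within each bin is lost. Finer binning would narrow the gap.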

Figures

**Figure 1**
**Importance measure of genomic features from the random forest algorithm** The horizontal axis presents the importance measure whereas the vertical axis denotes the genomic features.

**Figure 2**
ROC curves of random forest, logistic regression and Bayesian networks using 7-fold cross-validation

**Figure 3**
Histograms of MIPS and Gene Ontology function data for gold standard positives and negatives

**Figure 4**
Zoom-in histograms of MIPS and Gene Ontology function data for gold standard positives and negatives on the lower end

**Figure 5**
**ROC curves of random forest using different genomic feature sets** 'All' – all genomic information; 'MIPS+GO' – only MIPS and Gene Ontology function data; 'ELSE' – genomic features other than MIPS and Gene Ontology function data

References

1. Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, Snyder M, Greenblatt JF, Gerstein M. A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science. 2003;302:449–453. doi: 10.1126/science.1087361.
2. Mewes HW, Frishman D, Güldener U, Mannhaupt G, Mayer K, Mokrejs M, Morgenstern B, Münsterkötter M, Rudd S, Weil B. MIPS: a database for genomes and protein sequences. Nucleic Acids Res. 2002;30:31–34. doi: 10.1093/nar/30.1.31.
3. Kumar A, Agarwal S, Heyman JA, Matson S, Heidtman M, Piccirillo S, Umansky L, Drawid A, Jansen R, Liu Y, Cheung KH, Miller P, Gerstein M, Roeder GS, Snyder M. Subcellular localization of the yeast proteome. Genes Dev. 2002;16:707–719. doi: 10.1101/gad.970902.
4. Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, Davis RW. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell. 1998;2:65–73. doi: 10.1016/S1097-2765(00)80114-8.
5. Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai H, He YD, Kidd MJ, King AM, Meyer MR, Slade D, Lum PY, Stepaniants SB, Shoemaker DD, Gachotte D, Chakraburtty K, Simon J, Bard M, Friend SH. Functional discovery via a compendium of expression profiles. Cell. 2000;102:109–126. doi: 10.1016/S0092-8674(00)00015-5.
