Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2006 Mar 20;7 Suppl 1(Suppl 1):S2.
doi: 10.1186/1471-2105-7-S1-S2.

Choosing negative examples for the prediction of protein-protein interactions

Affiliations

Choosing negative examples for the prediction of protein-protein interactions

Asa Ben-Hur et al. BMC Bioinformatics. .

Abstract

The protein-protein interaction networks of even well-studied model organisms are sketchy at best, highlighting the continued need for computational methods to help direct experimentalists in the search for novel interactions. This need has prompted the development of a number of methods for predicting protein-protein interactions based on various sources of data and methodologies. The common method for choosing negative examples for training a predictor of protein-protein interactions is based on annotations of cellular localization, and the observation that pairs of proteins that have different localization patterns are unlikely to interact. While this method leads to high quality sets of non-interacting proteins, we find that this choice can lead to biased estimates of prediction accuracy, because the constraints placed on the distribution of the negative examples makes the task easier. The effects of this bias are demonstrated in the context of both sequence-based and non-sequence based features used for predicting protein-protein interactions.

PubMed Disclaimer

Figures

Figure 1
Figure 1
The dependence of prediction accuracy, quantified by the area under the ROC/ROC50 curves, on the co-localization threshold used to choose negative examples. Enforcing the condition that no two proteins in the set of negative examples have a GO component similarity that is greater than a given threshold (the co-localization threshold) imposes a constraint on the distribution of negative examples. This constraint makes it easier for the classifier to distinguish between positive and negative examples, and the effect gets stronger as the co-localization threshold becomes smaller. All methods are SVM-based classifiers trained using different kernels on two interaction datasets. Results are computed using five-fold cross-validation, averaged over five drawings of negative examples. The spectrum kernel method uses pairs of k-mers as features; the motif method uses the composition of discrete sequence motifs, and the non-sequence method uses features such as co-expression as measured in microarray experiments, similarity in GO process and function annotations etc. We performed our experiment on two yeast physical interaction datasets: the BIND data is derived from the BIND database; the experiments using the non-sequence data were performed on a subset of reliable interactions that are found by multiple assays in BIND; DIP/MIPS is a dataset of reliable interactions derived from the DIP and MIPS databases.

References

    1. von Mering C, Krause R, Snel B, Cornell M, Olivier SG, Fields S, Bork P. Comparative assessment of large-scale data sets of protein-protein interactions. Nature. 2002;417:399–403. - PubMed
    1. Sprinzak E, Margalit H. Correlated sequence-signatures as markers of protein-protein interaction. Journal of Molecular Biology. 2001;311:681–692. - PubMed
    1. Deng M, Mehta S, Sun F, Chen T. Inferring domain-domain interactions from protein-protein interactions. Genome Research. 2002;12:1540–1548. - PMC - PubMed
    1. Gomez SM, Noble WS, Rzhetsky A. Learning to predict protein-protein interactions. Bioinformatics. 2003;19:1875–1881. - PubMed
    1. Wang H, Segal E, Ben-Hur A, Koller D, Brutlag DL. Identifying Protein-Protein Interaction Sites on a Genome-Wide Scale. In: Saul LK, Weiss Y, Bottou L, editor. Advances in Neural Information Processing Systems 17. Cambridge, MA: MIT Press; 2005. pp. 1465–1472.

Publication types

LinkOut - more resources