PLM-interact: extending protein language models to predict protein-protein interactions

Dan Liu¹, Francesca Young¹, Kieran D Lamb¹, Adalberto Claudio Quiros²,³, Alexandrina Pancheva⁴, Crispin J Miller²,⁴, Craig Macdonald⁵, David L Robertson⁶, Ke Yuan⁷,⁸,⁹

Affiliations

¹ MRC-University of Glasgow Centre for Virus Research, Glasgow, United Kingdom.
² School of Cancer Sciences, University of Glasgow, Glasgow, United Kingdom.
³ School of Computing Science, University of Glasgow, Glasgow, United Kingdom.
⁴ Cancer Research UK Scotland Institute, Glasgow, United Kingdom.
⁵ School of Computing Science, University of Glasgow, Glasgow, United Kingdom.
⁶ MRC-University of Glasgow Centre for Virus Research, Glasgow, United Kingdom.
⁷ School of Cancer Sciences, University of Glasgow, Glasgow, United Kingdom.
⁸ School of Computing Science, University of Glasgow, Glasgow, United Kingdom.
⁹ Cancer Research UK Scotland Institute, Glasgow, United Kingdom.

PMID: 41145424
PMCID: PMC12559430
DOI: 10.1038/s41467-025-64512-w

Dan Liu et al. Nat Commun. 2025 Oct 27;16(1):9012. doi: 10.1038/s41467-025-64512-w.

Abstract

Computational prediction of protein structure from amino acid sequence alone has been achieved with unprecedented accuracy, yet the prediction of protein-protein interactions remains a challenge. Here, we assess the ability of protein language models (PLMs), routinely applied to protein folding, to be retrained for protein-protein interaction prediction. Existing models that exploit PLMs use a pre-trained PLM feature set, ignoring that the proteins are physically interacting. We propose PLM-interact, which goes beyond single proteins by jointly encoding protein pairs to learn their relationships, analogous to the next-sentence prediction task from natural language processing. This approach achieves state-of-the-art performance in a widely adopted cross-species protein-protein interaction prediction benchmark: trained on human data and tested on mouse, fly, worm, E. coli and yeast. In addition, we develop a fine-tuning method for PLM-interact to detect mutation effects on interactions. Finally, we report that the model outperforms existing approaches in predicting virus-host interaction at the protein level. Our work demonstrates that large language models can be extended to learn the intricate relationships among biomolecules from their sequences alone.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

**Fig. 1. A comparison of PLM-interact to an existing protein-protein interaction (PPI) prediction architecture.**
a Typical PPI prediction models use pre-trained ‘frozen’ protein language models to extract single-protein embeddings, which are passed to a trainable interaction classifier. b PLM-interact uses a protein language model with a longer context to handle a pair of protein sequences directly. Both the masked language modelling task and a binary classification task predicting interaction status are used to train the model.
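
The joint encoding and two-task training described in panel (b) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the special tokens `<cls>` and `<eos>`, the function names and the `mlm_weight` parameter are all assumptions made for the example, and the masked language modelling loss is passed in as a plain number rather than computed by a model.

```python
import math

# Hypothetical special tokens; the tokenizer used by the actual model may differ.
CLS, SEP = "<cls>", "<eos>"


def build_pair_input(seq_a: str, seq_b: str) -> list[str]:
    """Join two amino acid sequences into one token list for joint encoding."""
    return [CLS, *seq_a, SEP, *seq_b, SEP]


def binary_cross_entropy(p_interact: float, label: int) -> float:
    """Classification loss for one pair, given a predicted interaction probability."""
    eps = 1e-12  # keep the logarithm finite at p = 0 or p = 1
    p = min(max(p_interact, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def combined_loss(mlm_loss: float, cls_loss: float, mlm_weight: float = 1.0) -> float:
    """Weighted sum of the two training objectives (the weighting is an assumption)."""
    return cls_loss + mlm_weight * mlm_loss


tokens = build_pair_input("MKT", "GAV")
loss = combined_loss(mlm_loss=2.0, cls_loss=binary_cross_entropy(0.9, 1))
```

Because both sequences sit in one input, a transformer's attention can in principle relate residues across the two proteins, which is not possible when each protein is embedded separately.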

**Fig. 2. The benchmarking results of PLM-interact compared with state-of-the-art protein-protein interaction (PPI) prediction models: PLM-interact achieves the highest PPI prediction performance.**
a The data size of training, validation and test protein pairs. b The taxonomic tree of the training and test species is aligned with the precision-recall curve of each model on each test species. A bar plot of AUPR values illustrates the PPI prediction benchmark. The distribution of predicted interaction probabilities of positive and negative protein pairs for each PPI model is shown in Supplementary Fig. 3. All species icons in panel (b) are created in BioRender (Liu, D., 2025; https://BioRender.com/1ezsj7q). Source data are provided as a Source Data file.

**Fig. 3. Protein-protein interaction (PPI) example for each species that is predicted correctly by PLM-interact but not by TUnA and TT3D.**
Protein-protein structures are predicted by Chai-1 and visualised with ChimeraX. The predicted interaction probabilities of PPI models range between 0 and 1. A predicted interaction probability > 0.5 is considered a positive PPI, while a probability < 0.5 is considered a negative pair. Interacting proteins are shown from left (yellow) to right (green), respectively. **Mouse**: P97287 (Induced myeloid leukaemia cell differentiation protein Mcl-1 homologue) and P63028 (Translationally-controlled tumour protein); **Fly**: Q9W0F0 (Dynein light chain roadblock) and Q7K035 (AT23443p); **Worm**: Q21955 (Mediator of RNA polymerase II transcription subunit 15) and Q9N4F2 (Mediator of RNA polymerase II transcription subunit 19); **Yeast**: P23644 (Mitochondrial import receptor subunit TOM40) and P53507 (Mitochondrial import receptor subunit TOM7); and ***E. coli***: A0A454A7G5 (ABC transporter permease protein) and A0A454A7H5 (Possible ABC-transport protein, ATP-binding component). See Supplementary Fig. 4 for the corresponding AlphaFold3 predicted structures. The ipTMs of both Chai-1 and AlphaFold3 for each structure are shown in Supplementary Table 2. Source data are provided as a Source Data file.

**Fig. 4. Performance comparison of protein-protein interaction (PPI) models on the Bernett benchmarking dataset.**
a The x-axis shows the evaluation metrics (AUPR, Precision, Recall, AUROC and F1-score) for six PPI models: PLM-interact, TUnA, Topsy-Turvy, D-SCRIPT, PIPR and DeepPPI; the y-axis represents the corresponding metric values. b The distribution of predicted interaction probabilities of PLM-interact and TUnA for the positive and negative protein pairs, respectively. Source data are provided as a Source Data file.
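
As a reminder of how the threshold-dependent metrics in panel (a) are defined, the sketch below computes precision, recall and F1-score from predicted interaction probabilities. It assumes the 0.5 cut-off stated in the Fig. 3 caption also applies here, which the caption above does not confirm; the function names and the toy data are hypothetical. AUPR and AUROC are threshold-free and are not covered.

```python
def confusion_counts(probs, labels, threshold=0.5):
    """Count true/false positives and negatives at a probability cut-off."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted_positive = p > threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def precision_recall_f1(probs, labels, threshold=0.5):
    """Return (precision, recall, F1); each is 0.0 when its denominator is zero."""
    tp, fp, _, fn = confusion_counts(probs, labels, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: three true interactions, two non-interacting pairs.
toy_probs = [0.9, 0.8, 0.3, 0.6, 0.1]
toy_labels = [1, 1, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(toy_probs, toy_labels)
```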

**Fig. 5. Predicting mutation effects on protein-protein interactions (PPIs).**
a Diagrammatic overview of the binary mutation effect classification task. b Inference and fine-tuning of PLM-interact to predict mutation effects that increase or decrease interaction rate/strength. The log formula in this panel is the log-predicted interaction probability ratio between the mutant and canonical pairs. c Precision-Recall curves of two fine-tuned PLM-interact models and four zero-shot models (PLM-interact, TUnA, Topsy-Turvy and D-SCRIPT) on the test dataset. d The ROC curves of the fine-tuned PLM-interact models and four zero-shot models on the test dataset. Source data are provided as a Source Data file.

**Fig. 6. Demonstration of PLM-interact detecting changes in human protein-protein interactions (PPIs) associated with mutations.**
Panel (a) shows an example of a mutation causing an increase in binding affinity, while panel (b) shows a mutation causing a decrease in binding affinity. These PPI structures are predicted using Chai-1 and visualised with ChimeraX; here, the mutated amino acids are highlighted in purple. In each panel, the log-predicted interaction probability ratio between the mutant and canonical protein pairs is shown for fine-tuned PLM-interact and zero-shot TUnA and Topsy-Turvy, respectively. A positive log ratio indicates a mutation-increasing PPI, while a negative log ratio indicates a mutation-decreasing PPI. The ipTMs of both Chai-1 and AlphaFold3 for each structure are shown in Supplementary Table 3. Interacting protein structures are shown from left (yellow) to right (green). a Residue 600 tyrosine (Y) of P33993 (DNA replication licensing factor MCM7) is mutated to glutamic acid (E), increasing its interaction with P33992 (DNA replication licensing factor MCM5). b Residue 151 asparagine (N) of Q16595 (Frataxin, mitochondrial) is mutated to alanine (A), decreasing interaction with Q9H1K1 (Iron-sulfur cluster assembly enzyme ISCU). See Supplementary Fig. 8 for the corresponding AlphaFold3 predicted structures. Source data are provided as a Source Data file.
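
The log ratio used in Figs. 5 and 6 can be illustrated with a short sketch. The captions do not state the logarithm base, so the natural logarithm is assumed here; the sign, and therefore the predicted direction of the effect, does not depend on that choice. The function names and the example probabilities are hypothetical and are not values from the paper.

```python
import math


def log_probability_ratio(p_mutant: float, p_canonical: float) -> float:
    """Log of the predicted interaction probability of the mutant pair over the canonical pair."""
    if not (0.0 < p_mutant <= 1.0 and 0.0 < p_canonical <= 1.0):
        raise ValueError("probabilities must lie in (0, 1]")
    return math.log(p_mutant / p_canonical)


def mutation_effect(p_mutant: float, p_canonical: float) -> str:
    """Map the sign of the log ratio to the predicted direction of the mutation effect."""
    ratio = log_probability_ratio(p_mutant, p_canonical)
    if ratio > 0:
        return "increasing"
    if ratio < 0:
        return "decreasing"
    return "no change"


# Illustrative probabilities only.
stronger = mutation_effect(p_mutant=0.8, p_canonical=0.4)
weaker = mutation_effect(p_mutant=0.2, p_canonical=0.4)
```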

**Fig. 7. The benchmarking results of virus-human protein-protein interaction (PPI) models.**
a Comparison of AUPR, F1 and MCC metrics of PLM-interact against recent virus-human PPI models. b The distribution of the length of virus proteins, human proteins and virus-human protein pairs. c The virus-human PPIs shown are correctly predicted by our model, and their 3D complex structures are experimentally verified and obtained from the human-virus PPI database (**HVIDB**). From left (green) to right (yellow), these interacting protein structures are: Tumour necrosis factor receptor superfamily member 14 (human protein: Q92956) with Envelope glycoprotein D (human herpes simplex virus 1: P57083), Ephrin-B2 (human protein: P52799) with Glycoprotein G (Nipah virus protein: Q9IH62) and Retinoblastoma-associated protein (human protein: P06400) with Large T antigen (Simian virus 40: P03070). Note: The metric results of the other three models in panel (a) are taken from the STEP paper. Source data are provided as a Source Data file.
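
For readers less familiar with MCC (Matthews correlation coefficient), one of the metrics in panel (a), the sketch below shows its standard definition from confusion-matrix counts. The function name and the example counts are hypothetical; returning 0.0 when the denominator is zero is a common convention rather than something stated in the source.

```python
import math


def mcc_from_counts(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when any row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Illustrative counts only.
perfect = mcc_from_counts(tp=5, fp=0, tn=5, fn=0)
mixed = mcc_from_counts(tp=2, fp=1, tn=1, fn=1)
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which makes it less sensitive to class imbalance than F1 alone.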
