Abstract
As protein databases continue to expand at an exponential rate, fed by daily uploads from multiple large-scale genomic and metagenomic projects, the problem of assigning a function to each new protein has become the focus of significant research interest. Herein, we review the most recent advances in the field of automated function prediction (AFP). We begin by defining what is meant by biological “function” and the means of describing such functions using standardised, machine-readable ontologies. We then focus on the various function-prediction programs available, both sequence- and structure-based, and outline their associated strengths and weaknesses. Finally, we conclude with a brief overview of the future challenges and outstanding questions in the field.

Acknowledgments
The authors wish to acknowledge the financial assistance of the Faculty of Engineering and Science, the Department of Biological Sciences and the Department of Computing at Cork Institute of Technology.
Additional information
Communicated by Erko Stackebrandt.
Cite this article
Sleator, R.D., Walsh, P. An overview of in silico protein function prediction. Arch Microbiol 192, 151–155 (2010). https://doi.org/10.1007/s00203-010-0549-9
- DOI: https://doi.org/10.1007/s00203-010-0549-9 
