Bioinformatics. 2005 Oct 15;21(20):3824-31.
doi: 10.1093/bioinformatics/bti627. Epub 2005 Aug 16.

Statistical evaluation of pairwise protein sequence comparison with the Bayesian bootstrap


Gavin A Price et al. Bioinformatics.

Erratum in

  • Bioinformatics. 2005 Dec 1;21(23):4318

Abstract

Motivation: Protein sequence comparison methods are routinely used to infer the intricate network of evolutionary relationships found within the rapidly growing library of protein sequences, and thereby to predict the structure and function of uncharacterized proteins. In the present study, we detail an improved statistical benchmark of pairwise protein sequence comparison algorithms. We use bootstrap resampling techniques to determine standard statistical errors and to estimate the confidence of our conclusions. We show that the underlying structure within benchmark databases causes Efron's standard, non-parametric bootstrap to be biased. Consequently, the standard bootstrap underpredicts average performance when used in the context of evaluating sequence comparison methods. We have developed, as an alternative, an unbiased statistical evaluation based on the Bayesian bootstrap, a resampling method operationally similar to the standard bootstrap.
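The abstract contrasts Efron's standard bootstrap with the Bayesian bootstrap but does not spell out how they differ. Below is a minimal sketch of that operational difference. It assumes a hypothetical list of per-query performance scores, because the abstract does not say which performance statistic the authors resample. The standard bootstrap resamples queries with replacement, which is equivalent to giving each query an integer (multinomial) weight. Rubin's Bayesian bootstrap instead gives each query a continuous weight drawn from a flat Dirichlet(1, ..., 1) distribution, generated here by normalizing Exponential(1) draws. The helper names `standard_bootstrap`, `bayesian_bootstrap` and `summarize` are hypothetical and do not come from the authors' code. This toy does not reproduce the bias the authors attribute to structure in benchmark databases; it only shows how the two resampling schemes work.

```python
import random
import statistics


def standard_bootstrap(scores, n_rep=2000, seed=0):
    """Efron's non-parametric bootstrap (hypothetical helper).

    Resamples the per-query scores with replacement and returns the mean of
    each replicate, i.e. each query gets an integer multinomial weight.
    """
    rng = random.Random(seed)
    n = len(scores)
    reps = []
    for _ in range(n_rep):
        sample = [scores[rng.randrange(n)] for _ in range(n)]
        reps.append(sum(sample) / n)
    return reps


def bayesian_bootstrap(scores, n_rep=2000, seed=0):
    """Rubin's Bayesian bootstrap (hypothetical helper).

    Each replicate draws Dirichlet(1, ..., 1) weights by normalizing
    independent Exponential(1) variates, then returns the weighted mean.
    """
    rng = random.Random(seed)
    reps = []
    for _ in range(n_rep):
        g = [rng.expovariate(1.0) for _ in scores]
        total = sum(g)
        reps.append(sum(w * s for w, s in zip(g, scores)) / total)
    return reps


def summarize(reps):
    """Return (mean, standard error) estimated from the replicate statistics."""
    return statistics.mean(reps), statistics.stdev(reps)


if __name__ == "__main__":
    # Hypothetical per-query scores; not data from the paper.
    scores = [0.12, 0.40, 0.35, 0.80, 0.05, 0.66, 0.27, 0.51]
    print("standard bootstrap:", summarize(standard_bootstrap(scores)))
    print("Bayesian bootstrap:", summarize(bayesian_bootstrap(scores)))
```

Both schemes share the same loop structure (weight the queries, recompute the statistic, repeat). That is why the Bayesian bootstrap can be described as operationally similar to the standard one: only the weight distribution changes.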

Results: We apply our analysis to the comparative study of amino acid substitution matrix families and find that using modern matrices results in a small but statistically significant improvement in remote homology detection compared with the classic PAM and BLOSUM matrices.
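One way to attach a confidence to a claim such as "matrix B detects remote homologs better than matrix A" is a paired Bayesian bootstrap over queries. This sketch assumes hypothetical per-query scores for two matrices evaluated on the same queries. It reports the fraction of replicates in which B's weighted mean exceeds A's. Because both matrices share one weight vector per replicate, this reduces to the sign of the weighted mean of the per-query differences. The function `prob_improvement` is a hypothetical illustration of the general technique, not the authors' published procedure or performance measure.

```python
import random


def prob_improvement(scores_a, scores_b, n_rep=4000, seed=1):
    """Paired Bayesian bootstrap (hypothetical helper).

    Returns the fraction of replicates in which method B's weighted mean
    score exceeds method A's. Both methods are scored on the same queries,
    so each replicate applies one Dirichlet(1, ..., 1) weight vector to both,
    which is the same as weighting the per-query differences.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired per query")
    rng = random.Random(seed)
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    wins = 0
    for _ in range(n_rep):
        # Normalized Exponential(1) draws give flat Dirichlet weights.
        g = [rng.expovariate(1.0) for _ in diffs]
        total = sum(g)
        if sum(w * d for w, d in zip(g, diffs)) / total > 0:
            wins += 1
    return wins / n_rep


if __name__ == "__main__":
    # Hypothetical per-query scores for two substitution matrices.
    matrix_a = [0.30, 0.45, 0.10, 0.62, 0.28, 0.51]
    matrix_b = [0.33, 0.44, 0.15, 0.66, 0.30, 0.55]
    print("P(B better than A):", prob_improvement(matrix_a, matrix_b))
```

A value close to 1 says that B's advantage holds under almost every reweighting of the queries. Pairing matters because it cancels per-query difficulty that affects both matrices equally.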

Availability: The sequence sets and code for performing these analyses are available from http://compbio.berkeley.edu/.

Contact: [email protected].
