Flexible Sequence Similarity Searching with the FASTA3 Program Package

  • Protocol
Bioinformatics Methods and Protocols

Part of the book series: Methods in Molecular Biology™ ((MIMB,volume 132))

Abstract

Since the publication of the first rapid method for comparing biological sequences 15 years ago (1), DNA and protein sequence comparisons have become routine steps in biochemical characterization, from newly cloned proteins to entire genomes. As the DNA and protein sequence databases become more complete, a sequence similarity search is more likely to reveal a database sequence with statistically significant similarity, and thus inferred homology, to a query sequence. Indeed, even in the archaebacterium Methanococcus jannaschii, more than 40% of the open reading frames could be assigned a function based on significant sequence similarity to a protein of known function (2).

References

  1. Wilbur, W. J. and Lipman, D. J. (1983) Rapid similarity searches of nucleic acid and protein data banks. Proc. Natl. Acad. Sci. USA 80, 726–730.

  2. Bult, C. J., White, O., Olsen, G. J., Zhou, L., Fleischmann, R. D., Sutton, G. G., Blake, J. A., Fitzgerald, L. M., Clayton, R. A., Gocayne, J. D., Kerlavage, A. R., Dougherty, B. A., Tomb, J.-F., Adams, M. D., Reich, C. I., Overbeek, R., Kirkness, E. F., Weinstock, K. G., Merrick, J. M., Glodek, A., Scott, J. L., Geoghagen, N. S. M., Weidman, J. F., Fuhrmann, J. L., Nguyen, D., Utterback, T. R., Kelley, J. M., Peterson, J. D., Sadow, P. W., Hanna, M. C., Cotton, M. D., Roberts, K. M., Hurst, M. A., Kaine, B. P., Borodovsky, M., Klenk, H.-P., Fraser, C. M., Smith, H. O., Woese, C. R., and Venter, J. C. (1996) Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science 273, 1058–1073.

  3. Altschul, S. F., Boguski, M. S., Gish, W., and Wootton, J. C. (1994) Issues in searching molecular sequence databases. Nat. Genet. 6, 119–129.

  4. Pearson, W. R. (1996) Effective protein sequence comparison. Meth. Enzymol. 266, 227–258.

  5. Pearson, W. R. (1997) Identifying distantly related protein sequences. Comput. Appl. Biosci. (now Bioinformatics) 13, 325–332.

  6. Pearson, W. R. (1998) Empirical statistical estimates for sequence similarity searches. J. Mol. Biol. 276, 71–84.

  7. Pearson, W. R. and Lipman, D. J. (1988) Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85, 2444–2448.

  8. Lipman, D. J. and Pearson, W. R. (1985) Rapid and sensitive protein similarity searches. Science 227, 1435–1441.

  9. Bleasby, A. J., Akrigg, D., and Attwood, T. K. (1994) OWL: a non-redundant composite protein sequence database. Nucleic Acids Res. 22, 3574–3577.

  10. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) A basic local alignment search tool. J. Mol. Biol. 215, 403–410.

  11. Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.

  12. Arratia, R., Gordon, L., and Waterman, M. S. (1986) An extreme value theory for sequence matching. Ann. Stat. 14, 971–993.

  13. Karlin, S. and Altschul, S. F. (1990) Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. USA 87, 2264–2268.

  14. Wootton, J. C. and Federhen, S. (1993) Statistics of local complexity in amino acid sequences and sequence databases. Comput. Chem. 17, 149–163.

  15. Pearson, W. R., Wood, T., Zhang, Z., and Miller, W. (1997) Comparison of DNA sequences with protein sequences. Genomics 46, 24–36.

  16. Henikoff, S. and Henikoff, J. G. (1992) Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89, 10,915–10,919.

  17. Schwartz, R. M. and Dayhoff, M. (1978) Matrices for detecting distant relationships, in Atlas of Protein Sequence and Structure, vol. 5, suppl. 3 (Dayhoff, M., ed.) National Biomedical Research Foundation, Silver Spring, MD, pp. 353–358.

  18. Altschul, S. F. (1991) Amino acid substitution matrices from an information theoretic perspective. J. Mol. Biol. 219, 555–565.

  19. Jones, D. T., Taylor, W. R., and Thornton, J. M. (1992) The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. (now Bioinformatics) 8, 275–282.

  20. Pearson, W. R. (1995) Comparison of methods for searching protein sequence databases. Protein Sci. 4, 1145–1160.

  21. Bairoch, A. (1991) PROSITE: a dictionary of sites and patterns in proteins. Nucleic Acids Res. 19 (suppl) 2241–2245.

  22. Smith, T. F. and Waterman, M. S. (1981) Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197.

  23. Huang, X. and Miller, W. (1991) A time-efficient, linear-space local similarity algorithm. Adv. Appl. Math. 12, 337–357.

  24. Waterman, M. S. and Eggert, M. (1987) A new algorithm for best subsequences alignment with application to tRNA-rRNA comparisons. J. Mol. Biol. 197, 723–728.

  25. Myers, E. W. and Miller, W. (1988) Optimal alignments in linear space. Comput. Appl. Biosci. (now Bioinformatics) 4, 11–17.

  26. Kyte, J. and Doolittle, R. F. (1982) A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105–132.

  27. Barker, W. C., Garavelli, J. S., Haft, D. H., Hunt, L. T., Marzec, C. R., Orcutt, B. C., Srinivasarao, G. Y., Yeh, L. S. L., Ledley, R. S., Mewes, H. W., Pfeiffer, F., and Tsugita, A. (1998) The PIR-International protein sequence database. Nucleic Acids Res. 26, 27–32.

  28. Chao, K.-M., Pearson, W. R., and Miller, W. (1992) Aligning two sequences within a specified diagonal band. Comput. Appl. Biosci. (now Bioinformatics) 8, 481–487.

  29. Altschul, S. F. and Gish, W. (1996) Local alignment statistics. Meth. Enzymol. 266, 460–480.

© 2000 Humana Press Inc., Totowa, NJ

About this protocol

Cite this protocol

Pearson, W.R. (2000). Flexible Sequence Similarity Searching with the FASTA3 Program Package. In: Misener, S., Krawetz, S.A. (eds) Bioinformatics Methods and Protocols. Methods in Molecular Biology™, vol 132. Humana Press, Totowa, NJ. https://doi.org/10.1385/1-59259-192-2:185

  • DOI: https://doi.org/10.1385/1-59259-192-2:185

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-0-89603-732-8

  • Online ISBN: 978-1-59259-192-3

  • eBook Packages: Springer Protocols
