Abstract
Microbial communities and their associated bioactive compounds1,2,3 are often disrupted in conditions such as the inflammatory bowel diseases (IBD)4. However, even in well-characterized environments (for example, the human gastrointestinal tract), more than one-third of microbial proteins are uncharacterized and often expected to be bioactive5,6,7. Here we systematically identified more than 340,000 protein families as potentially bioactive with respect to gut inflammation during IBD, about half of which have not to our knowledge been functionally characterized previously on the basis of homology or experiment. To validate prioritized microbial proteins, we used a combination of metagenomics, metatranscriptomics and metaproteomics to provide evidence of bioactivity for a subset of proteins that are involved in host and microbial cell–cell communication in the microbiome; for example, proteins associated with adherence or invasion processes, and extracellular von Willebrand-like factors. Predictions from high-throughput data were validated using targeted experiments that revealed the differential immunogenicity of prioritized Enterobacteriaceae pilins and the contribution of homologues of von Willebrand factors to the formation of Bacteroides biofilms in a manner dependent on mucin levels. This methodology, which we term MetaWIBELE (workflow to identify novel bioactive elements in the microbiome), is generalizable to other environmental communities and human phenotypes. The prioritized results provide thousands of candidate microbial proteins that are likely to interact with the host immune system in IBD, thus expanding our understanding of potentially bioactive gene products in chronic disease states and offering a rational compendium of possible therapeutic compounds and targets.
Data availability
Associated data generated during this study are included in the published Article and its Supplementary Tables. All assembled metagenomic contigs, ORFs, gene families, protein families, functional profiles, taxonomic profiles and prioritized profiles of protein families related to this study are available at http://huttenhower.sph.harvard.edu/metawibele. The raw data for the HMP2 metagenomes, metatranscriptomes and metaproteomes were obtained from the IBDMDB website (https://ibdmdb.org, NCBI BioProject PRJNA398089). Sequence data for the Red Sea metagenomes were obtained from SRA BioProject PRJNA289734. The following public databases were used: UniProt (https://www.uniprot.org), UniRef90 (https://www.uniprot.org/uniref), Pfam (https://pfam.xfam.org), DOMINE (https://manticore.niehs.nih.gov/cgi-bin/Domine), the Expression Atlas (https://www.ebi.ac.uk/gxa), SIFTS (https://www.ebi.ac.uk/pdbe/docs/sifts), the Database of Essential Genes (http://essentialgene.org) and the PDB (https://www.rcsb.org).
Code availability
The open-source MetaWIBELE software is available through http://huttenhower.sph.harvard.edu/metawibele. Manuals and online tutorials describing MetaWIBELE are available at https://github.com/biobakery/metawibele. User support is provided through the bioBakery help forum (https://forum.biobakery.org). Additional software details are provided in the Methods.
References
Cohen, L. J. et al. Commensal bacteria make GPCR ligands that mimic human signalling molecules. Nature 549, 48–53 (2017).
Guo, C. J. et al. Discovery of reactive microbiota-derived metabolites that inhibit host proteases. Cell 168, 517–526 (2017).
Bhattarai, Y. et al. Gut microbiota-produced tryptamine activates an epithelial G-protein-coupled receptor to increase colonic secretion. Cell Host Microbe 23, 775–785 (2018).
Lloyd-Price, J. et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature 569, 655–662 (2019).
Galperin, M. Y. & Koonin, E. V. ‘Conserved hypothetical’ proteins: prioritization of targets for experimental study. Nucleic Acids Res. 32, 5452–5463 (2004).
Galperin, M. Y. & Koonin, E. V. From complete genome sequence to ‘complete’ understanding? Trends Biotechnol. 28, 398–406 (2010).
Joice, R., Yasuda, K., Shafquat, A., Morgan, X. C. & Huttenhower, C. Determining microbial products and identifying molecular targets in the human microbiome. Cell Metab. 20, 731–741 (2014).
Buffie, C. G. et al. Precision microbiome reconstitution restores bile acid mediated resistance to Clostridium difficile. Nature 517, 205–208 (2015).
Zipperer, A. et al. Human commensals producing a novel antibiotic impair pathogen colonization. Nature 535, 511–516 (2016).
Morgan, X. C. et al. Dysfunction of the intestinal microbiome in inflammatory bowel disease and treatment. Genome Biol. 13, R79 (2012).
Suzek, B. E., Huang, H., McGarvey, P., Mazumder, R. & Wu, C. H. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23, 1282–1288 (2007).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 43, D204–D212 (2015).
Konstantinidis, K. T. & Tiedje, J. M. Towards a genome-based taxonomy for prokaryotes. J. Bacteriol. 187, 6258–6264 (2005).
Parks, D. H. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat. Biotechnol. 36, 996–1004 (2018).
Plaza Oñate, F. et al. MSPminer: abundance-based reconstitution of microbial pan-genomes from shotgun metagenomic data. Bioinformatics 35, 1544–1552 (2019).
Li, J. et al. An integrated catalog of reference genes in the human gut microbiome. Nat. Biotechnol. 32, 834–841 (2014).
Jandhyala, S. M. et al. Role of the normal gut microbiota. World J. Gastroenterol. 21, 8787–8803 (2015).
Zhang, R., Ou, H. Y. & Zhang, C. T. DEG: a database of essential genes. Nucleic Acids Res. 32, D271–D272 (2004).
Sokol, H. et al. Faecalibacterium prausnitzii is an anti-inflammatory commensal bacterium identified by gut microbiota analysis of Crohn disease patients. Proc. Natl Acad. Sci. USA 105, 16731–16736 (2008).
Lopez-Siles, M., Duncan, S. H., Garcia-Gil, L. J. & Martinez-Medina, M. Faecalibacterium prausnitzii: from microbiology to diagnostics and prognostics. ISME J. 11, 841–852 (2017).
Schirmer, M., Garner, A., Vlamakis, H. & Xavier, R. J. Microbial genes and pathways in inflammatory bowel disease. Nat. Rev. Microbiol. 17, 497–511 (2019).
Lewis, J. D. et al. Inflammation, antibiotics, and diet as environmental stressors of the gut microbiome in pediatric Crohn’s disease. Cell Host Microbe 18, 489–500 (2015).
Franzosa, E. A. et al. Gut microbiome structure and metabolic activity in inflammatory bowel disease. Nat. Microbiol. 4, 293–305 (2019).
Hall, A. B. et al. A novel Ruminococcus gnavus clade enriched in inflammatory bowel disease patients. Genome Med. 9, 103 (2017).
Hughes, E. R. et al. Microbial respiration and formate oxidation as metabolic signatures of inflammation-associated dysbiosis. Cell Host Microbe 21, 208–219 (2017).
Högbom, M. & Ihalin, R. Functional and structural characteristics of bacterial proteins that bind host cytokines. Virulence 8, 1592–1601 (2017).
Wells, T. J., Tree, J. J., Ulett, G. C. & Schembri, M. A. Autotransporter proteins: novel targets at the bacterial cell surface. FEMS Microbiol. Lett. 274, 163–172 (2007).
Pizarro-Cerdá, J. & Cossart, P. Bacterial adhesion and entry into host cells. Cell 124, 715–727 (2006).
Palmela, C. et al. Adherent-invasive Escherichia coli in inflammatory bowel disease. Gut 67, 574–587 (2018).
Xu, Q. et al. A distinct type of pilus from the human microbiome. Cell 165, 690–703 (2016).
Zhang, Y., Thompson, K. N., Huttenhower, C. & Franzosa, E. A. Statistical approaches for differential expression analysis in metatranscriptomics. Bioinformatics 37, i34–i41 (2021).
Starks, A. M., Froehlich, B. J., Jones, T. N. & Scott, J. R. Assembly of CS1 pili: the role of specific residues of the major pilin, CooA. J. Bacteriol. 188, 231–239 (2006).
Galkin, V. E. et al. The structure of the CS1 pilus of enterotoxigenic Escherichia coli reveals structural polymorphism. J. Bacteriol. 195, 1360–1370 (2013).
Vatanen, T. et al. Variation in microbiome LPS immunogenicity contributes to autoimmunity in humans. Cell 165, 842–853 (2016).
Dalbey, R. E. & Kuhn, A. Protein traffic in Gram-negative bacteria – how exported and secreted proteins find their way. FEMS Microbiol. Rev. 36, 1023–1045 (2012).
Costa, T. R. et al. Secretion systems in Gram-negative bacteria: structural and mechanistic insights. Nat. Rev. Microbiol. 13, 343–359 (2015).
Shipman, J. A., Berleman, J. E. & Salyers, A. A. Characterization of four outer membrane proteins involved in binding starch to the cell surface of Bacteroides thetaiotaomicron. J. Bacteriol. 182, 5365–5372 (2000).
Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
Kelley, L. A., Mezulis, S., Yates, C. M., Wass, M. N. & Sternberg, M. J. The Phyre2 web portal for protein modeling, prediction and analysis. Nat. Protoc. 10, 845–858 (2015).
Dong, R., Pan, S., Peng, Z., Zhang, Y. & Yang, J. mTM-align: a server for fast protein structure database search and multiple protein structure alignment. Nucleic Acids Res. 46, W380–W386 (2018).
Treuner-Lange, A. et al. PilY1 and minor pilins form a complex priming the type IVa pilus in Myxococcus xanthus. Nat. Commun. 11, 5054 (2020).
Co, J. Y. et al. Mucins trigger dispersal of Pseudomonas aeruginosa biofilms. NPJ Biofilms Microbiomes 4, 23 (2018).
Medema, M. H. et al. antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Res. 39, W339–W346 (2011).
Haroon, M. F., Thompson, L. R., Parks, D. H., Hugenholtz, P. & Stingl, U. A catalogue of 136 microbial draft genomes from Red Sea metagenomes. Sci. Data 3, 160050 (2016).
Acknowledgements
This work has been supported in part by a research agreement with Takeda Pharmaceuticals (C.H.) and by NIH NIDDK grants R24DK110499 (C.H., W.S.G. and R.J.X.), P30DK043351 (R.J.X.), the Center for Microbiome Informatics and Therapeutics (R.J.X.), NIH AT009708 (R.J.X.), and DK 127171 (R.J.X.). We especially appreciate the participants in the HMP2 Inflammatory Bowel Disease Multi-omics Database who made this study possible. The computations in this paper were run in part on the FASRC Cannon cluster supported by the FAS Division of Science Research Computing Group at Harvard University.
Author information
Contributions
Y.Z., A.K., C.H. and E.A.F. designed the research. Y.Z., A.B., A. Subramanian, A.R., A. Shafquat and E.A.F. performed computational analysis. S.B. performed the experimental validation of pilins. G.P. performed the experiments for validating VWF-containing proteins. Y.Z. and L.J.M. implemented the software. Y.Z. and E.K.A. wrote the tutorial document and tested the software. C.A. and D.R.P. participated in generating the assembly data. Y.Z. and C.H. wrote the manuscript with feedback from the other authors. K.N.T., Y.W., S.M.K., A.P., E.A.F. and all other authors participated in editing the manuscript. R.J.X., H.V., W.S.G. and A.K. participated in interpretation of the primary findings. C.H. and E.A.F. supervised the research. All authors approved the final manuscript.
Ethics declarations
Competing interests
C.H. is on the scientific advisory board of Seres Therapeutics and Empress Therapeutics. W.S.G. is on the scientific advisory board of Freya Biosciences, Senda Biosciences, Artizan Biosciences and Tenza. The laboratory of W.S.G. receives funding from Merck. R.J.X. is a member of the scientific advisory board of Nestle and Senda Biosciences. A.K. presents employment by Takeda that may gain or lose financially through this publication.
Peer review
Peer review information
Nature thanks Robert Quinn, Paul Wilmes and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Overview of MetaWIBELE workflow and analysis summary in the HMP2 dataset.
a, MetaWIBELE identifies novel potentially bioactive gene products from microbial communities. MetaWIBELE prioritizes and partially annotates putatively bioactive gene products from shotgun metagenomes, using a combination of primary and secondary sequence properties, ecological distributions, and host or environmental phenotypes. The process begins with single-sample metagenomic assemblies, from which open reading frames are called and clustered into gene families. These are quantified, annotated (MetaWIBELE-characterize), and ranked by likely bioactivity (MetaWIBELE-prioritize). This results in proteins from across a set of communities with potential bioactivity in their environments of origin, annotated with the quantitative sources of this bioactivity evidence and per-family information such as abundance, taxonomic origin, and (when known) putative molecular roles. b, Quantitative characteristics of MetaWIBELE applied to the 1,595 metagenomes in the HMP2. Overall strategy used by MetaWIBELE for protein family construction, annotation, and prioritization, and the associated input data and results when applied to datasets used for identifying microbial gene products with potential bioactivity in HMP2. SC: Strong homology to known characterized proteins, SU: Strong homology to known uncharacterized proteins, RH: Remote homology to known proteins, NH: No homology to known proteins. TM: transmembrane. DDI: domain-domain interaction.
Extended Data Fig. 2 Uncharacterized protein families have comparable abundance distribution and sequence composition to known proteins.
a, Nominally characterized and uncharacterized protein families were distinguished with homology-based search against UniRef90 (release 2019_01). We defined strong homology following the UniRef90 criterion of ≥90% identity and ≥80% coverage, remote homology as identity from 25% to 90% and coverage from 25% to 80%, and non-homologous proteins as those with <25% identity or <25% coverage or no hit to UniRef90 proteins. Here, we use ‘uncharacterized known proteins’ to refer to UniRef90 proteins that do not have any Gene Ontology annotations in UniProt (release 2019_01). Distribution of prevalences and abundances of protein families across the four categories of protein families. b, The fractions of novel proteins (proteins with remote homology or without homology to known proteins) are comparable to known proteins across samples. c, Bray-Curtis dissimilarities over protein family profiles between samples from different participants, samples from the same participant over time, and technical replicates. Variability among novel proteins was more extreme than among known proteins, but less extreme than among known proteins with rare abundance (bottom 50%). Box plot boxes indicate quartiles and whiskers show inner fences. d, Uncharacterized proteins with comparable abundance to known proteins fit a neutral model of microbiome assembly (Methods). ‘Unclassified taxon’ indicates a group of genes which lack taxonomic information but can be binned into the same MSP based on co-abundance information. e–g, Uncharacterized proteins showed similar sequence composition to known proteins. Characterized and uncharacterized proteins had similar distributions of lengths of assembled contigs (e), protein lengths (f) and GC content (g).
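The four homology categories defined in panel a (SC, SU, RH and NH) can be illustrated with a small classifier. This is a minimal sketch and not the MetaWIBELE implementation: the function name and arguments are hypothetical, and it assumes that any hit which is neither strong nor below the 25% thresholds counts as remote homology, since the caption does not spell out how mixed cases (for example, high identity with intermediate coverage) are handled.

```python
from typing import Optional


def homology_category(identity: Optional[float],
                      coverage: Optional[float],
                      hit_has_go_annotation: bool = False) -> str:
    """Return SC, SU, RH or NH for one protein family representative.

    identity and coverage are percentages (0-100) for the best UniRef90 hit;
    pass None for both when there is no hit at all.
    """
    # No hit, or below 25% on either measure: no homology (NH).
    if identity is None or coverage is None or identity < 25 or coverage < 25:
        return "NH"
    # UniRef90 criterion for strong homology.
    if identity >= 90 and coverage >= 80:
        # "Characterized" here means the matched protein carries GO annotations.
        return "SC" if hit_has_go_annotation else "SU"
    # Assumption: every remaining hit is treated as remote homology (RH).
    return "RH"
```

For example, a family whose best hit has 40% identity over 60% coverage would be labelled RH under these assumptions.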
Extended Data Fig. 3 An integrated annotation approach characterizes millions of gut microbial protein families.
a, We enumerated the degree to which annotations based on local homology or secondary structure could be assigned by MetaWIBELE. ‘InterPro signatures’ represents protein signatures in InterPro other than Pfam domains. ‘Interaction’ means domain-domain interactions as predicted by DOMINE. ‘Others’ includes other types of protein subcellular localization (e.g., cytoplasmic, membrane, periplasmic). ‘Unknown function’ represents proteins without any putative biochemical annotations. b, Most of the functional annotations assigned by MetaWIBELE were consistent with those in UniProt when evaluated on characterized proteins. ‘UniProt_unique’ annotations are quite rare, indicating the good sensitivity of MetaWIBELE. Meanwhile, ‘MetaWIBELE_unique’ annotations are also in the minority, which could be a ceiling on false positives, but there are likely to be many false negatives from UniProt as well. c, Each row represents one type of annotation. Each column indicates the number of protein families in the intersection of the corresponding annotation types (indicated with black points). The ‘Unclassified taxonomy’ category represents protein families without taxonomy information. ‘MSPs’ (metagenomic species pangenomes) are built by binning co-abundant genes across metagenomic samples. ‘Domains’ are domain-based annotations including Pfam domains and domain-domain interactions. ‘Host facing’ indicates annotations which are likely to be involved in host-microbial interactions (e.g., signal peptides, transmembrane). ‘InterPro signatures’, ‘Others’ and ‘Unknown function’ are defined in a.
Extended Data Fig. 4 Novel protein families can be taxonomically annotated and greatly expand pangenomes of common gut taxa.
a, Schematic of MetaWIBELE’s guilt-by-association approach for per-protein-family taxonomic annotation leveraging co-abundance profiles (MSPs). If reference sequence annotations are consistent within a group of co-varying proteins, their most-specific shared taxonomy can be transferred to other sequences within the family. b, We validated this novel taxonomic annotation method on a 20% holdout set of known proteins. c, To optimize the parameters, we tested different cut-offs for the fraction of protein families between the most and second-most dominant taxon within MSP using the holdout set in b. Stringent cut-offs (i.e., requiring more consistently classified taxa) reduced the power of taxonomic assignment for more specific levels (e.g., species or genus) but controlled false positives. Lenient cut-offs (i.e., requiring less consistently classified taxa) introduced more spurious assignments with good sensitivity to the assignment of species or genus. This sensitivity-specificity trade-off is best-balanced at our default cut-off value of 0.5. d, Comparison of taxonomic annotations by homology-based and guilt-by-association approaches. e, The top 25 genera with the highest number of newly annotated proteins (Supplementary Table 3). The first row indicates the number of genomes in RefSeq per genus. The second row indicates the mean relative abundance of known (i.e., SC and SU) and novel proteins (RH and NH), in which red dots represent the mean of known proteins and blue dots represent the mean of novel proteins. f, Uncharacterized proteins expanded common gut taxa. Each clade represents one genus. Circle bars show relative abundance of different categories of protein families. g, Similar representative genera with dominant abundance were identified in HMP2 and MetaHIT. The top 50 genera (with highest mean abundance) were selected for plotting. Box plot boxes indicate quartiles and whiskers show inner fences.
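The guilt-by-association idea in panels a and c can be sketched as a consensus vote over the annotated members of one MSP. This is an illustrative simplification under stated assumptions: the rank list, the function name and the exact consistency statistic (here, the fraction of annotated members that agree with the dominant taxon at a rank) are hypothetical choices, and the statistic used by MetaWIBELE itself is described in the Methods rather than in this caption.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Assumed rank order, from least to most specific.
RANKS = ["phylum", "class", "order", "family", "genus", "species"]


def consensus_taxon(member_lineages: List[Dict[str, str]],
                    cutoff: float = 0.5) -> Optional[Tuple[str, str]]:
    """Most specific (rank, taxon) supported by the annotated members of an MSP.

    Each lineage maps rank names to taxon names; missing ranks are ignored.
    Returns None when no rank reaches the consistency cut-off.
    """
    # Walk from the most specific rank towards the least specific one.
    for rank in reversed(RANKS):
        names = [lineage[rank] for lineage in member_lineages if lineage.get(rank)]
        if not names:
            continue
        taxon, count = Counter(names).most_common(1)[0]
        # Assumed consistency measure: share of annotated members that agree.
        if count / len(names) >= cutoff:
            return rank, taxon
    return None
```

The returned label would then be transferred to the unannotated families in the same MSP. A stricter cut-off pushes assignments towards less specific ranks, which mirrors the sensitivity-specificity trade-off described in panel c.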
Extended Data Fig. 5 Essential genes are assigned higher priority scores using MetaWIBELE’s unsupervised approach.
a, Prevalence and abundance of 1.6M protein families from the HMP2 metagenomes. Essential proteins (based on DEG homology, see full list from Supplementary Table 4) were enriched in proteins prioritized by the harmonic mean of these values. b, When assumed to be true positives (i.e., ‘important’ proteins), essential proteins were notably well-predicted by ecological properties. This was true across a range of beta parameter settings: i.e., the relative weight for prevalence versus abundance in the calculation of a unified priority score (higher beta implies more weight assigned to prevalence). c–e, Distributions of prevalence (c), abundance (d) and priority scores (e) are plotted for all proteins and essential genes, respectively. Box plot boxes indicate quartiles, and whiskers show inner fences.
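The beta-weighted harmonic mean mentioned in panels a and b can be written in the same form as the F-beta score. The sketch below is one plausible reading and not the MetaWIBELE source: the function name is hypothetical, and it assumes prevalence and abundance have already been rescaled to comparable values in (0, 1], for example as percentile ranks.

```python
def priority_score(prevalence: float, abundance: float, beta: float = 1.0) -> float:
    """Beta-weighted harmonic mean of prevalence and abundance.

    beta = 1 gives the plain harmonic mean; larger beta moves the score
    towards prevalence, smaller beta moves it towards abundance.
    """
    # A harmonic mean is undefined at zero; treat such families as unprioritized.
    if prevalence <= 0 or abundance <= 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * abundance * prevalence / (b2 * abundance + prevalence)
```

With this form the score tends to the prevalence value as beta grows and to the abundance value as beta approaches zero, which matches the caption's statement that higher beta assigns more weight to prevalence.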
Extended Data Fig. 6 Protein families associated with severe IBD phenotypes.
a, A total of 348,973 protein families were prioritized as potentially bioactive in IBD, with all four categories of homology-based characterization dominated by proteins with decreased differential abundance (DA) during dysbiosis. The integrated priority score is a meta-rank combining both phenotypic significance and effect size of DA with ecological properties (abundance and prevalence). DA p-values are from modified linear models (Methods), and effect sizes are differences between mean log-scaled abundances among phenotypes. Positive values indicate more abundance in ‘cases’ (i.e., the dysbiotic state of Crohn’s disease (CD) or ulcerative colitis (UC)). b, Functional annotations assigned to DA prioritized protein families by global homology (top left) or local structural properties (Methods). c, More protein families were depleted in dysbiosis samples than enriched in dysbiosis samples. The largest source of DA families (n = 1,595 from 130 participants; linear mixed-effects model: adjusted p-value with Benjamini–Hochberg FDR correction < 0.05) corresponded with the differences between dysbiotic and non-dysbiotic samples from individuals with CD, whereas those with UC were less well separated. The effect size was computed as the difference of mean values in the dysbiotic condition compared to the non-dysbiotic condition within each IBD phenotype at the log-transformed scale. d, Highly prioritized protein families of Ruminococcus gnavus were grouped into multiple MSPs. Most such R. gnavus proteins were enriched in the dysbiotic states of IBD and fell into msp_127 and msp_306, whereas a few proteins were depleted in dysbiosis and failed to cluster as MSP members (full list from Supplementary Table 12). e, Highly prioritized protein families of Faecalibacterium prausnitzii were grouped into multiple MSPs and tended to be depleted in dysbiosis (full list from Supplementary Table 13).
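Two quantities used throughout this caption, the log-scale effect size and the Benjamini–Hochberg adjustment, can be illustrated as follows. This is a minimal sketch: the function names, the base-10 logarithm and the pseudocount are assumptions made for the example, and the p-values themselves come from linear mixed-effects models (Methods) that are not reproduced here.

```python
import math
from typing import List, Sequence


def log_effect_size(case: Sequence[float], control: Sequence[float],
                    pseudocount: float = 1e-6) -> float:
    """Difference of mean log10 abundances (case minus control).

    Positive values mean the family is more abundant in cases.
    The pseudocount (an assumption) keeps zero abundances finite.
    """
    def mean_log(values: Sequence[float]) -> float:
        return sum(math.log10(v + pseudocount) for v in values) / len(values)

    return mean_log(case) - mean_log(control)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Go from the largest p-value down so that monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A family would be called differentially abundant in this sketch when its adjusted p-value falls below 0.05, with the sign of the effect size indicating enrichment or depletion in dysbiosis.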
Extended Data Fig. 7 Potentially bioactive protein families are validated by metaproteomics.
a, Both known and novel proteins showed metaproteomics (MPX) evidence, though only a small fraction of protein families were detected owing to the relatively low coverage of all metaproteomics in the HMP2. ‘MPX-prevalent’ refers to a set with relatively higher prevalence in MPX samples for more consistent detection, in which we thresholded the mean value of the prevalence of proteins in MPX samples (full list from Supplementary Table 7). b, Among the MPX validated proteins, the fraction of prioritized novel proteins (RH and NH) was comparable to that of known proteins (SC and SU). c, The prioritized proteins were significantly enriched in the set of proteins with MPX evidence (two-tailed Fisher’s exact test; adjusted p-value with Benjamini–Hochberg FDR correction < 2.2e-16 for SC and SU, 3.3e-258 for RH, 1.4e-21 for NH). d, e, Protein families profiled by MPX had significantly higher priority scores for both known and novel proteins (GSEA method; FDR-adjusted P = 0.0012 for ‘MPX-prevalent’ regardless of characterization categories in d and stratified in SC, SU, RH in e, and FDR-adjusted P = 0.0051 for RH category in e; Supplementary Table 8). Prioritization distributions of ‘MPX-prevalent’ proteins with different characterization levels are shown in e.
Extended Data Fig. 8 Supporting evidence for the bioactivity of Enterobacteriaceae pilus components and VWF homologues.
a, Effect of bacterial co-culture on pilin gene expression in bacterial strains. Expression of a subset of highly prioritized bacterial pilin genes by RT–qPCR is normalized to rpoA. b, Expression of other cytokines in HCT-15 cells after co-culture with pilin-encoding strains (Groups 1 and 3) versus non-pilin strains (Group 0) (n = 3 independent experiments for each strain; unpaired two-tailed Student’s t test: *p < 0.05, **p < 0.01, ***p < 0.001, ns, not significant; error bars: SEM). mRNA levels are normalized to a GAPDH reference and mean ± SEM are shown (full list from Supplementary Table 24). ‘Untreated’ group represents baseline expression in HCT-15 without bacterial co-culture. c, Predicted structure of VWF-containing families from Oscillibacter. 3TXA_A from the PDB was identified as the closest homologue to Cluster_148958 (the representative of this group), based on structural rather than sequence similarity (Methods). The comparison of protein structures between Cluster_148958 (regions modelled at >90% accuracy by Phyre2) and chain A of 3TXA is shown.
Extended Data Fig. 9 Quantitative evaluation of MetaWIBELE.
a, b, BGC genes are enriched among proteins prioritized by MetaWIBELE’s unsupervised and supervised approaches (full list from Supplementary Table 30). We quantified BGC genes using MetaWIBELE priority scores generated by the unsupervised approach (a) and the supervised approach (b). c–f, In addition, assembly-based gene quantification from MetaWIBELE agrees well with reference-based quantification from HUMAnN among known proteins. c, MetaWIBELE identified most of the HUMAnN-detected protein families in the HMP2 dataset along with many unique proteins. Abundances assigned to proteins detected by both MetaWIBELE and HUMAnN were highly correlated over samples (Spearman’s correlation, two-tailed p < 2.2e-16) (d), had similar Bray-Curtis dissimilarity profiles between samples (e) and were highly correlated within samples (f). Box plot boxes indicate quartiles and whiskers show inner fences.
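The Bray-Curtis dissimilarity used to compare abundance profiles between samples (panel e) has a simple closed form: the sum of absolute abundance differences divided by the sum of all abundances. The sketch below is for illustration only; the function name and the dictionary representation of a sample profile are assumptions, not part of MetaWIBELE or HUMAnN.

```python
from typing import Dict


def bray_curtis(sample_a: Dict[str, float], sample_b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two protein-family abundance profiles.

    Profiles map family identifiers to non-negative abundances; families
    missing from a profile are treated as having zero abundance.
    """
    families = set(sample_a) | set(sample_b)
    total = sum(sample_a.get(f, 0.0) + sample_b.get(f, 0.0) for f in families)
    if total == 0:
        # Two empty profiles are treated as identical by convention here.
        return 0.0
    diff = sum(abs(sample_a.get(f, 0.0) - sample_b.get(f, 0.0)) for f in families)
    return diff / total
```

The value ranges from 0 for identical profiles to 1 for profiles that share no families.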
Extended Data Fig. 10 Potentially bioactive microbial protein families from marine ecosystems are prioritized by MetaWIBELE.
a, More than 80% (out of 469,542 in total) of the protein families from Red Sea metagenomes were uncharacterized, and more than 70% were novel proteins (proteins with remote homology or without homology to known proteins), which was on average 25% greater than (generally better-studied) human associated communities. b, These uncharacterized proteins were abundant across samples, indicating that they are likely to contribute to unknown but important biochemical functions within the ocean ecosystems. c, Further, MetaWIBELE prioritized 334,386 protein families which showed differential abundance (DA) between the epipelagic (EPI) and mesopelagic (MES) layers, still including ~80% uncharacterized protein families. Effect sizes are differences between mean log-scaled abundances among depth layers. Positive values indicate more abundance in the mesopelagic layer. d, Functional annotations of prioritized protein families for each category were assigned by MetaWIBELE. e, f, Enumeration of the prioritization score and fold enrichment (the ratio of the overlap to the expected overlap) of species and Pfam domains among highly prioritized protein families. The top 30 species and Pfam domains with the largest mean fold enrichment are listed in decreasing order. Effect size is as defined in c (full list from Supplementary Tables 32, 33).
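The fold enrichment reported in panels e and f is described as the ratio of the observed overlap to the expected overlap. A minimal sketch follows, assuming that the expected overlap is the one obtained if prioritization and the feature (a species or Pfam domain) were independent; the function and argument names are hypothetical.

```python
def fold_enrichment(n_total: int, n_prioritized: int,
                    n_with_feature: int, n_overlap: int) -> float:
    """Observed overlap divided by the overlap expected under independence.

    n_total: all protein families considered
    n_prioritized: families in the highly prioritized set
    n_with_feature: families carrying the species or domain of interest
    n_overlap: families that are both prioritized and carry the feature
    """
    # Expected overlap if the two sets were drawn independently (assumption).
    expected = n_prioritized * n_with_feature / n_total
    if expected == 0:
        return float("nan")
    return n_overlap / expected
```

For instance, with 1,000 families, 100 prioritized, 50 carrying a given domain and 10 in both sets, the expected overlap is 5 and the fold enrichment is 2.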
Supplementary information
Supplementary Information
This file contains Supplementary Methods; Supplementary Discussion; Supplementary Notes 1-4; legends for Supplementary Tables 1–33 and Supplementary References.
About this article
Cite this article
Zhang, Y., Bhosle, A., Bae, S. et al. Discovery of bioactive microbial gene products in inflammatory bowel disease. Nature 606, 754–760 (2022). https://doi.org/10.1038/s41586-022-04648-7
DOI: https://doi.org/10.1038/s41586-022-04648-7