Extended Data Fig. 7: Functional repertoire of the gut microbial protein families is expanded by FUGAsseM in the HMP2. | Nature Biotechnology

From: Predicting functions of uncharacterized gene products from microbial communities

(a–c) FUGAsseM assigned high-confidence annotations to previously uncharacterized protein families in the HMP2 across all three Gene Ontology (GO) aspects: biological process (BP; a), molecular function (MF; b), and cellular component (CC; c). (d–g) Among the 25 most uncharacterized HMP2 species (those with the greatest number of novel proteins), many novel proteins were annotated with high-confidence MF and CC terms, improving functional coverage for both well-studied and poorly characterized taxa. Predictions were defined at two thresholds: ‘default’ (probability ≥ 0.75) and ‘stringent’ (probability ≥ 0.85). Categories are: ‘no_ann’ (no high-confidence predictions), ‘preserved_ann’ (proteins already annotated in UniProt), ‘amp_ann (default/stringent)’ (known proteins with new predictions), and ‘new_ann (default/stringent)’ (uncharacterized proteins with new annotations). Full results are in Supplementary Table 13. (h–i) FUGAsseM annotations also captured biologically relevant signals associated with inflammatory bowel disease (IBD). Gene Set Enrichment Analysis (GSEA) showed that proteins prioritized by MetaWIBELE in IBD were more strongly enriched in FUGAsseM-predicted annotations. Normalized enrichment scores (NES) revealed significant enrichment (h) and depletion (i) among protein families associated with IBD. Here, NES is the enrichment score adjusted for differences in gene-set size; it reflects the degree to which a gene set is overrepresented at the top of the list ranked by MetaWIBELE’s prioritization. The 25 most significantly enriched BP terms (GSEA test; FDR-adjusted P < 0.25) are listed in decreasing order of NES.
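
The category labels and thresholds described in the caption can be illustrated with a minimal Python sketch. This is not FUGAsseM code: the function name `categorize` and its two inputs are hypothetical, and two points are assumptions rather than statements from the caption, namely that a known protein without a new high-confidence prediction counts as ‘preserved_ann’, and that ‘default’ covers only probabilities from 0.75 up to (but not including) 0.85.

```python
from typing import Optional

DEFAULT_THRESHOLD = 0.75    # 'default' cutoff stated in the caption
STRINGENT_THRESHOLD = 0.85  # 'stringent' cutoff stated in the caption


def categorize(known_in_uniprot: bool, best_new_probability: Optional[float]) -> str:
    """Assign a caption-style category label to one protein (hypothetical helper).

    known_in_uniprot: whether the protein already has a UniProt annotation.
    best_new_probability: highest probability among its new predictions,
        or None if there are no new predictions.
    """
    # No new prediction reaches the 'default' cutoff.
    if best_new_probability is None or best_new_probability < DEFAULT_THRESHOLD:
        # Assumption: known proteins without new predictions are 'preserved_ann'.
        return "preserved_ann" if known_in_uniprot else "no_ann"

    # Assumption: the two levels are mutually exclusive bins.
    level = "stringent" if best_new_probability >= STRINGENT_THRESHOLD else "default"
    prefix = "amp_ann" if known_in_uniprot else "new_ann"
    return f"{prefix} ({level})"
```

For example, under these assumptions an uncharacterized protein whose best new prediction has probability 0.90 would be labeled ‘new_ann (stringent)’, while a UniProt-annotated protein with a best new prediction of 0.80 would be labeled ‘amp_ann (default)’.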
