Fig. 2: FUGAsseM predicts BP terms of microbial communities with high accuracy. | Nature Biotechnology

From: Predicting functions of uncharacterized gene products from microbial communities

a, FUGAsseM’s term-level performance strongly correlates with STRING’s isolate-based predictions and expands coverage to species lacking isolate data (n = 21 total terms). The Pearson correlation coefficients (95% confidence interval (CI)) and unadjusted P values between each pair of measurements are shown (n = 84 total terms). AUROCs were averaged per term per species (details in Supplementary Tables 5 and 6). Box plots display the median (line at the 50th percentile), IQR (box spanning the 25th to 75th percentiles), whiskers (extending to 1.5 × the IQR) and mean values (dark points).

b, Across species, FUGAsseM shows comparable performance to STRING but supports more species (n = 147 total species). The Pearson correlation coefficients (95% CI) and unadjusted P values between each pair of measurements are shown (n = 140 total species). The full list is provided in Supplementary Tables 7 and 8. Box plots are displayed as in a.

c, FUGAsseM matches the accuracy of state-of-the-art methods designed for single organisms while scaling to more species. AUROCs were averaged across the most abundant HMP2 species. Because of method limitations, only the top ten and five species were tested with DeepGOPlus and NetGO2.0, respectively. Only 14 of the top 25 species had isolate-based data in STRING (Supplementary Table 9).

d, FUGAsseM-MTX and FUGAsseM-full models retained strong accuracy in predicting BP terms supported by newly accumulated experimental evidence (n = 34 total terms for temporal hold-out evaluation; Supplementary Table 10). Box plots are displayed as in a.

e, FUGAsseM predicted significantly higher scores (GSEA method; FDR-adjusted P < 0.002 for multiple comparisons) for the annotations that lacked experimental evidence at T0 but gained accumulated experimental validation from T0 to T1 (that is, accumulated evidence) or totally unseen annotations at T0 with accumulated experimental validation at T1 (that is, new evidence).
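
The box plot convention described for panel a (median, IQR box, whiskers at 1.5 × the IQR, and a mean marker) can be illustrated with a minimal Python sketch. This is not the authors' plotting code. The function names `percentile` and `box_plot_summary`, the linear-interpolation percentile rule, and the Tukey-style reading of the whiskers (the most extreme values within 1.5 × the IQR of the box) are all assumptions made for illustration, and the AUROC values in the usage lines are made up.

```python
from statistics import mean


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of a pre-sorted list.

    Assumption: linear interpolation between neighbouring ranks; the caption
    does not say which percentile rule was used.
    """
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def box_plot_summary(values):
    """Return the statistics a box plot of `values` would display."""
    if not values:
        raise ValueError("values must be non-empty")
    vals = sorted(values)
    q1, med, q3 = (percentile(vals, q) for q in (25, 50, 75))
    iqr = q3 - q1
    # Assumption: whiskers stop at the most extreme data points that lie
    # within 1.5 x IQR of the box edges (Tukey-style whiskers).
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in vals if lower_fence <= v <= upper_fence]
    return {
        "median": med,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "mean": mean(vals),
    }


# Hypothetical per-term AUROC values, for illustration only.
example_aurocs = [0.1, 0.6, 0.7, 0.8, 0.9]
summary = box_plot_summary(example_aurocs)
```

In this example the value 0.1 falls below the lower fence, so the lower whisker stops at the next value inside the fence while the mean still reflects every point.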
