Extended Data Fig. 6: Threshold selection for high-confidence predictions of FUGAsseM. | Nature Biotechnology

From: Predicting functions of uncharacterized gene products from microbial communities

(a) The prediction-confidence thresholds that maximized the F1 score were heterogeneous across the models (here, Random Forest models) used to predict each term per species (n = 21,785 term-species pairs in total). Box plots show the median (line at the 50th percentile), interquartile range (box spanning the 25th to 75th percentiles), whiskers (extending to 1.5× the IQR) and mean values (dark points). (b) Known annotations in UniProt tended to be predicted with higher confidence than unknown ones in all GO aspects. A threshold of 0.75 prediction probability appears strict enough for high confidence while still covering most known annotations. (c) Although the 0.75 threshold achieved high recall while retaining potentially true new predictions, its precision remained low, so we regard it as a 'default' threshold. We defined a 'stringent' threshold (0.85 prediction probability) that doubled the precision while maintaining recall. However, a more stringent threshold also increases the risk of missing true new predictions.
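To make the threshold logic in panels (a) and (c) concrete, here is a minimal Python sketch. It assumes that each term-species model outputs one prediction probability per gene, and that known UniProt annotations count as positives while all other genes count as negatives. This is a simplification, since some "negatives" may be true but not-yet-annotated functions. The function names and toy scores are hypothetical and are not FUGAsseM's implementation. For each model, the sketch picks the cutoff that maximizes F1 among the observed scores. It also reports precision and recall at fixed cutoffs such as 0.75 and 0.85.

```python
from typing import Dict, List, Sequence, Tuple


def precision_recall_f1(scores: Sequence[float], labels: Sequence[int],
                        threshold: float) -> Tuple[float, float, float]:
    """Precision, recall and F1 when every score >= threshold is called positive.

    Assumption (hypothetical setup): label 1 = known annotation, 0 = not annotated.
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def best_f1_threshold(scores: Sequence[float],
                      labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, F1) maximizing F1, trying each observed score as a cutoff.

    On ties, the lowest cutoff is kept because only a strictly better F1 replaces it.
    """
    best_t, best_f1 = 0.0, -1.0
    for t in sorted(set(scores)):
        _, _, f1 = precision_recall_f1(scores, labels, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


def thresholds_per_model(
        models: Dict[Tuple[str, str], Tuple[List[float], List[int]]]
) -> Dict[Tuple[str, str], float]:
    """Hypothetical helper: F1-optimal cutoff for each (term, species) model."""
    return {key: best_f1_threshold(s, y)[0] for key, (s, y) in models.items()}
```

On the toy scores [0.95, 0.9, 0.8, 0.7, 0.6, 0.3] with labels [1, 1, 0, 1, 0, 0], the F1-optimal cutoff is 0.7. Raising a fixed cutoff from 0.75 to 0.85 lifts precision from 2/3 to 1.0 while recall stays at 2/3. This is the same qualitative trade-off described in (c), and the numbers are illustrative only. Running `thresholds_per_model` over many term-species pairs yields one cutoff per model, whose spread is what panel (a) summarizes.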
