Fig. 3 | GigaScience

Fig. 3

From: A close look at protein function prediction evaluation protocols

Label distribution comparison between CV, NA and NP. First, we computed the probability (number of annotated proteins/number of all proteins) of GO category i in the training and test sets for all three setups, denoted by \(p_{i}^{\text {tr}}\) and \(p_{i}^{\text {tst}}\), respectively; in the CV setup the calculation was performed separately for each of the five folds and averaged across them. The discrepancy for category i is then defined as \(|p_{i}^{\text {tr}} - p_{i}^{\text {tst}}| / (p_{i}^{\text {tr}} + p_{i}^{\text {tst}})\). The average discrepancy is shown in the top left panel. p-values based on paired t-tests for CV vs NA and CV vs NP in all three subontologies for both species are less than \(10^{-4}\). The individual signed discrepancy values (computed without the absolute value in the numerator) are shown in the other three panels, sorted by magnitude for each setup.
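
The discrepancy defined in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `label_discrepancy`, the binary proteins-by-categories input layout, and the choice to return 0 for a category that is unannotated in both sets (where the formula is undefined) are all assumptions made here.

```python
import numpy as np

def label_discrepancy(train_labels, test_labels, signed=False):
    """Per-category discrepancy between training and test label frequencies.

    Inputs are assumed to be binary matrices of shape
    (n_proteins, n_go_categories), with 1 marking an annotation.
    """
    train_labels = np.asarray(train_labels, dtype=float)
    test_labels = np.asarray(test_labels, dtype=float)

    # p_i = number of annotated proteins / number of all proteins
    p_tr = train_labels.mean(axis=0)
    p_tst = test_labels.mean(axis=0)

    diff = p_tr - p_tst
    if not signed:
        diff = np.abs(diff)

    denom = p_tr + p_tst
    # Assumption: categories absent from both sets get a discrepancy of 0,
    # since the formula in the caption is undefined when p_tr + p_tst = 0.
    result = np.zeros_like(denom)
    np.divide(diff, denom, out=result, where=denom > 0)
    return result

# Toy example: 4 training proteins, 2 test proteins, 3 GO categories.
train = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
test = [[1, 1, 0], [1, 1, 0]]
unsigned = label_discrepancy(train, test)
signed = label_discrepancy(train, test, signed=True)
```

For the CV setup described in the caption, one would call this function once per fold and average the resulting vectors across folds, for example with `np.mean(fold_results, axis=0)`.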
