Fig. 3 | GigaScience

Fig. 3

From: A close look at protein function prediction evaluation protocols

Label distribution comparison between CV, NA and NP. First, we computed the probability (number of annotated proteins/number of all proteins) of GO category i in the training and test sets for all three setups, denoted by \(p_{i}^{\text {tr}}\) and \(p_{i}^{\text {tst}}\), respectively; in the CV setup, the calculation was performed once for each of the five folds and the results were averaged across folds. The discrepancy for category i is then defined as \(|p_{i}^{\text {tr}} - p_{i}^{\text {tst}}| / (p_{i}^{\text {tr}} + p_{i}^{\text {tst}})\). The average discrepancy is shown in the top-left panel. The p-values from paired t-tests for CV vs NA and CV vs NP are less than \(10^{-4}\) in all three subontologies for both species. The individual signed discrepancy values (computed without the absolute value) are shown in the other three panels, sorted by magnitude for each setup
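The discrepancy defined in the caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names, the 0/1 annotation-matrix layout, and the choice to return 0 when a category is absent from both sets are all assumptions made here. The signed variant keeps the sign of \(p_{i}^{\text {tr}} - p_{i}^{\text {tst}}\), as implied by dropping the absolute value.

```python
from typing import List, Sequence


def label_probabilities(annotations: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of proteins annotated with each GO category.

    annotations[k][i] is 1 if protein k carries category i, else 0
    (a hypothetical layout chosen for this sketch).
    """
    n_proteins = len(annotations)
    n_categories = len(annotations[0])
    return [
        sum(row[i] for row in annotations) / n_proteins
        for i in range(n_categories)
    ]


def discrepancy(p_tr: float, p_tst: float, signed: bool = False) -> float:
    """(p_tr - p_tst) / (p_tr + p_tst), with or without the absolute value."""
    total = p_tr + p_tst
    if total == 0:
        # Assumption: a category absent from both sets has no discrepancy.
        return 0.0
    value = (p_tr - p_tst) / total
    return value if signed else abs(value)


# Toy data: 4 training proteins, 2 test proteins, 2 GO categories.
train = [[1, 0], [1, 1], [0, 0], [1, 0]]
test = [[1, 1], [0, 1]]

p_tr = label_probabilities(train)   # per-category training probabilities
p_tst = label_probabilities(test)   # per-category test probabilities

unsigned = [discrepancy(a, b) for a, b in zip(p_tr, p_tst)]
signed = [discrepancy(a, b, signed=True) for a, b in zip(p_tr, p_tst)]
average = sum(unsigned) / len(unsigned)
```

For the CV setup, the same computation would be repeated on each fold's training/test split and the per-category values averaged across folds.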
