Accuracy calculation as defined in Engelhardt et al. (2011)

Uses SIFTER's 2011 definition of accuracy, under which a protein counts as accurately predicted if its highest-ranked prediction matches one of its observed annotations.

accuracy_sifter(pred, lab, tol = 1e-10, highlight = "", ...)

# S3 method for aphylo_estimates
accuracy_sifter(pred, lab, tol = 1e-10, highlight = "", ...)

# S3 method for default
accuracy_sifter(pred, lab, tol = 1e-10, highlight = "", nine_na = TRUE, ...)

Arguments

pred: A matrix of predictions, or an aphylo_estimates object.
lab: A matrix of labels (0, 1, NA, or 9 if nine_na = TRUE).
tol: Numeric scalar. Predictions within tol of the max score will be tagged as the prediction made by the model (see details).
highlight: Pattern passed to sprintf, used to highlight predicted functions that match the observed ones.
...: Further arguments passed to the method. In the case of aphylo_estimates, the arguments are passed to predict.aphylo_estimates().
nine_na: Logical scalar. When TRUE, 9s in lab are treated as NA.

Value

A data frame with Ntip() rows and four variables. The variables are:

Gene: Label of the gene
Predicted: The assigned gene function.
Observed: The true set of gene functions.
Accuracy: The measurement of accuracy according to Engelhardt et al. (2011).

Details

The analysis is done at the protein level. For each protein, the function compares the YES annotations of that protein with those predicted by the model. The predicted annotations are those whose scores are within tol of the maximum score.

This algorithm doesn't take into account NOT annotations (0s), which are excluded from the analysis.
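
The selection rule described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: sifter_sketch is a hypothetical helper, pred and lab are assumed to be numeric matrices with one row per protein and one column per function, "within tol" is read as a strict inequality, and proteins without any YES annotation are assumed to yield NA.

```r
# Hypothetical helper illustrating the rule; it is not part of aphylo.
sifter_sketch <- function(pred, lab, tol = 1e-10, nine_na = TRUE) {
  # Optionally recode 9 as missing (which() skips existing NAs)
  if (nine_na) lab[which(lab == 9)] <- NA

  vapply(seq_len(nrow(pred)), function(i) {
    # Functions whose score is within tol of the row maximum
    top <- which(abs(pred[i, ] - max(pred[i, ])) < tol)
    # YES annotations (1s); NOT annotations (0s) and NAs are ignored
    yes <- which(lab[i, ] == 1)
    # Assumption: a protein with no YES annotation cannot be scored
    if (length(yes) == 0) return(NA_real_)
    as.numeric(length(intersect(top, yes)) > 0)
  }, numeric(1))
}

# Toy data: 2 proteins, 3 functions
pred <- matrix(c(0.2, 0.9, 0.9,
                 0.7, 0.1, 0.3), nrow = 2, byrow = TRUE)
lab  <- matrix(c(0, 1, 9,
                 0, 1, 1), nrow = 2, byrow = TRUE)

sifter_sketch(pred, lab)
#> [1] 1 0
```

In the first row, two functions tie for the maximum score and one of them is a YES annotation, so the protein counts as accurate. In the second row, the top-scoring function is annotated as NOT, so the protein counts as inaccurate.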

When highlight = "", no highlighting is applied.
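
As a rough illustration of how a sprintf pattern can mark matching functions, the base R sketch below wraps each predicted function that also appears among the observed ones. The pattern "[%s]" and the vectors predicted and observed are made up for the example, and the way accuracy_sifter applies the pattern internally may differ.

```r
# Hypothetical pattern and data, for illustration only
highlight <- "[%s]"
predicted <- c("fun0001", "fun0002")
observed  <- c("fun0000", "fun0001")

# Wrap only the predicted functions that match an observed one
hit <- predicted %in% observed
predicted[hit] <- sprintf(highlight, predicted[hit])

out <- paste(predicted, collapse = ",")
out
#> [1] "[fun0001],fun0002"
```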

Examples

set.seed(81231)
atree <- raphylo(50, psi = c(0,0), P = 3)
ans <- aphylo_mcmc(atree ~ mu_d + mu_s + Pi)
#> Warning: While using multiple chains, a single initial point has been passed via `initial`: c(0.9, 0.5, 0.1, 0.05, 0.5). The values will be recycled. Ideally you would want to start each chain from different locations.
#> Convergence has been reached with 10000 steps. Gelman-Rubin's R: 1.0134. (500 final count of samples).

accuracy_sifter(ans)
#>    Gene Predicted                Observed Accuracy
#> 1     1   fun0001 fun0000,fun0001,fun0002        1
#> 2     2   fun0001         fun0000,fun0001        1
#> 3     3   fun0002                 fun0002        1
#> 4     4   fun0002         fun0001,fun0002        1
#> 5     5   fun0002                 fun0002        1
#> 6     6   fun0002                 fun0002        1
#> 7     7   fun0002         fun0001,fun0002        1
#> 8     8   fun0001         fun0000,fun0001        1
#> 9     9   fun0002         fun0000,fun0002        1
#> 10   10   fun0001         fun0001,fun0002        1
#> 11   11   fun0002         fun0001,fun0002        1
#> 12   12   fun0002                 fun0002        1
#> 13   13   fun0002                 fun0002        1
#> 14   14   fun0002                 fun0002        1
#> 15   15   fun0002                 fun0002        1
#> 16   16   fun0002                 fun0002        1
#> 17   17   fun0001         fun0000,fun0001        1
#> 18   18   fun0002                 fun0002        1
#> 19   19   fun0002         fun0000,fun0002        1
#> 20   20   fun0001 fun0000,fun0001,fun0002        1
#> 21   21   fun0002                 fun0002        1
#> 22   22   fun0001         fun0000,fun0001        1
#> 23   23   fun0002                 fun0002        1
#> 24   24   fun0002         fun0001,fun0002        1
#> 25   25   fun0002         fun0001,fun0002        1
#> 26   26   fun0000         fun0000,fun0001        1
#> 27   27   fun0002 fun0000,fun0001,fun0002        1
#> 28   28   fun0002         fun0001,fun0002        1
#> 29   29   fun0002 fun0000,fun0001,fun0002        1
#> 30   30   fun0000 fun0000,fun0001,fun0002        1
#> 31   31   fun0002 fun0000,fun0001,fun0002        1
#> 32   32   fun0002                 fun0002        1
#> 33   33   fun0002         fun0000,fun0002        1
#> 34   34   fun0001                 fun0001        1
#> 35   35   fun0002                 fun0002        1
#> 36   36   fun0002                 fun0002        1
#> 37   37   fun0002                 fun0002        1
#> 38   38   fun0002         fun0000,fun0002        1
#> 39   39   fun0002                 fun0002        1
#> 40   40   fun0002                 fun0002        1
#> 41   41   fun0001         fun0000,fun0002        0
#> 42   42   fun0001         fun0000,fun0001        1
#> 43   43   fun0002                 fun0002        1
#> 44   44   fun0001                 fun0001        1
#> 45   45   fun0000         fun0000,fun0001        1
#> 46   46   fun0001         fun0000,fun0001        1
#> 47   47   fun0002         fun0001,fun0002        1
#> 48   48   fun0002 fun0000,fun0001,fun0002        1
#> 49   49   fun0002         fun0001,fun0002        1
#> 50   50   fun0002                 fun0002        1