IdTaxa {DECIPHER}    R Documentation
Classifies sequences according to a training set by assigning a confidence to taxonomic labels for each taxonomic rank.
IdTaxa(test, trainingSet, type = "extended", strand = "both", threshold = 60, bootstraps = 100, samples = L^0.47, minDescend = 0.98, processors = 1, verbose = TRUE)
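test | A set of unaligned sequences to be classified (e.g., a DNAStringSet, as in the example). |
trainingSet | An object of class Taxa and subclass Train, as created by LearnTaxa. |
type | Character string indicating the type of output desired. This should be (an abbreviation of) one of "extended" or "collapsed". |
strand | Character string indicating the orientation of the test sequences relative to the trainingSet. This should be (an abbreviation of) one of "both", "top", or "bottom". |
threshold | Numeric specifying the confidence at which to truncate the output taxonomic classifications. Lower values of threshold classify deeper into the taxonomic tree at the expense of accuracy, and higher values do the opposite. |
bootstraps | Integer giving the number of bootstrap replicates to perform for each sequence. |
samples | A function or call written as a function of ‘L’, which will evaluate to a numeric vector the same length as ‘L’. Typically a power of ‘L’, as in the default L^0.47. |
minDescend | Numeric giving the minimum fraction of bootstraps required to descend the tree during the initial tree descend phase of the algorithm. |
processors | The number of processors to use, or NULL to automatically detect and use all available processors. |
verbose | Logical indicating whether to display progress. |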
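Because samples is written in terms of ‘L’, it is evaluated as an expression rather than passed as a fixed number. The base R sketch below illustrates that idea only; the ‘L’ values are arbitrary, what ‘L’ represents inside IdTaxa is not spelled out here, and evaluating the call with quote() and eval() is an illustration rather than a description of DECIPHER's internals. It shows that the default call and an equivalent function both return one value per element of ‘L’:

```r
# The default `samples` value, captured unevaluated as a call in L
samples_call <- quote(L^0.47)
# An equivalent function form, which the argument also permits
samples_fun <- function(L) L^0.47

# Arbitrary example values of L (illustrative only)
L <- c(100, 500, 1500)

# Evaluate the call with L bound from a local list
n_call <- eval(samples_call, list(L = L))
# Apply the function form to the same values
n_fun <- samples_fun(L)

# Both forms yield a numeric vector the same length as L
print(n_call)
identical(n_call, n_fun)
```

A call such as L^0.47 grows sublinearly, so larger values of ‘L’ map to proportionally smaller results.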
Sequences in test are each assigned a taxonomic classification based on the trainingSet created with LearnTaxa. Each taxonomic level is given a confidence between 0% and 100%, and the taxonomy is truncated where confidence drops below threshold. If the taxonomic classification was truncated, the final group is labeled “unclassified_” followed by the name of the last taxon retained. Note that the reported confidence is not a p-value but does directly relate to a given classification's probability of being wrong. The default threshold of 60% is intended to minimize the rate of incorrect classifications. Lower values of threshold (e.g., 50%) may be preferred to increase the taxonomic depth of classifications.
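The truncation rule can be sketched in base R as follows. The helper truncate_taxonomy() is hypothetical and is not part of DECIPHER, and the confidences are made-up inputs rather than real IdTaxa output. Two details are assumptions of this sketch: the root rank is assumed to always pass the threshold, and the “unclassified_” placeholder is given the confidence of the first rank that was dropped.

```r
# Hypothetical helper, not part of DECIPHER: truncate a classification at the
# first rank whose confidence falls below `threshold` (in percent).
truncate_taxonomy <- function(taxon, confidence, threshold = 60) {
  stopifnot(length(taxon) == length(confidence))
  first_low <- which(confidence < threshold)[1]  # NA if every rank passes
  if (is.na(first_low)) {
    # Nothing falls below the threshold, so the classification is kept whole
    return(list(taxon = taxon, confidence = confidence))
  }
  # Assumption: the first (root) rank always passes the threshold
  stopifnot(first_low > 1)
  keep <- seq_len(first_low - 1)
  list(
    # Label the final group "unclassified_" plus the last retained taxon
    taxon = c(taxon[keep], paste0("unclassified_", taxon[first_low - 1])),
    # Assumption: the placeholder carries the confidence of the first dropped rank
    confidence = c(confidence[keep], confidence[first_low])
  )
}

# Hypothetical confidences for one sequence (not real IdTaxa output)
tx <- c("Root", "Bacteria", "Firmicutes", "Bacilli", "Lactobacillales")
cf <- c(100, 98, 85, 55, 40)

res <- truncate_taxonomy(tx, cf, threshold = 60)
res$taxon
# A lower threshold keeps one more rank before truncating
truncate_taxonomy(tx, cf, threshold = 50)$taxon
```

With these inputs, a threshold of 60 stops after Firmicutes and appends “unclassified_Firmicutes”, while a threshold of 50 also retains Bacilli. This mirrors the trade-off described above, where a lower threshold yields deeper classifications.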
If type is "extended" (the default), then an object of class Taxa and subclass Test is returned. This is stored as a list with elements corresponding to their respective sequences in test. Each list element contains the components:
taxon | A character vector containing the taxa to which the sequence was assigned. |
confidence | A numeric vector giving the corresponding percent confidence for each taxon. |
rank | If the classifier was trained with a set of ranks, a character vector containing the rank name of each taxon. |
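To illustrate the shape of the "extended" result, the sketch below builds a plain R list with the same per-sequence components. It is a stand-in only: the Taxa and Test class attributes are omitted, and the sequence names, taxa, ranks, and confidences are hypothetical. Since the result is described as a list, similar code may work on a real IdTaxa result, but that has not been verified here.

```r
# Plain-list stand-in for the "extended" output: one element per test sequence,
# each holding taxon, confidence, and rank. All values are hypothetical.
ids <- list(
  seq1 = list(taxon = c("Root", "Bacteria", "Firmicutes", "Bacilli"),
              confidence = c(100, 98, 85, 80),
              rank = c("rootrank", "domain", "phylum", "class")),
  seq2 = list(taxon = c("Root", "Bacteria", "Proteobacteria"),
              confidence = c(100, 99, 91),
              rank = c("rootrank", "domain", "phylum"))
)

# Deepest assigned taxon for each sequence (last element of `taxon`)
deepest <- vapply(ids, function(x) x$taxon[length(x$taxon)], character(1))
# Confidence at that same, deepest position
deepest_conf <- vapply(ids, function(x) x$confidence[length(x$confidence)], numeric(1))

print(deepest)
print(deepest_conf)
```

vapply() is used instead of sapply() so that each call is guaranteed to return exactly one value of the declared type.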
If type is "collapsed", then a character vector is returned with the taxonomic assignment for each sequence. This takes the repeating form “Taxon name [rank, confidence%]; ...” if ranks were supplied during training, or “Taxon name [confidence%]; ...” otherwise.
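The collapsed form can be approximated from the extended components with a short helper. collapse_taxonomy() is hypothetical and not part of DECIPHER. Rounding confidences to one decimal place is an assumption of this sketch and may not match IdTaxa's exact number formatting. The rank names are placeholders.

```r
# Hypothetical helper, not part of DECIPHER: render one classification in the
# "Taxon name [rank, confidence%]; ..." style, or "Taxon name [confidence%]; ..."
# when no ranks are given. Rounding to one decimal place is an assumption.
collapse_taxonomy <- function(taxon, confidence, rank = NULL) {
  conf <- paste0(round(confidence, 1), "%")
  inner <- if (is.null(rank)) conf else paste(rank, conf, sep = ", ")
  # Build one "Taxon [..]" entry per rank, then join entries with "; "
  paste0(taxon, " [", inner, "]", collapse = "; ")
}

tx <- c("Root", "Bacteria", "Firmicutes")
cf <- c(100, 98, 85)
rk <- c("rootrank", "domain", "phylum")  # hypothetical rank names

with_ranks <- collapse_taxonomy(tx, cf, rk)
without_ranks <- collapse_taxonomy(tx, cf)
print(with_ranks)
print(without_ranks)
```

Applying such a helper to every element of an extended result would give one string per sequence, which matches the character-vector shape described for "collapsed".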
Erik Wright eswright@pitt.edu
Murali, A., et al. (2018). IDTAXA: a novel approach for accurate taxonomic classification of microbiome sequences. Microbiome, 6, 140. https://doi.org/10.1186/s40168-018-0521-5
library(DECIPHER)
data("TrainingSet_16S")
# import test sequences
fas <- system.file("extdata", "Bacteria_175seqs.fas", package="DECIPHER")
dna <- readDNAStringSet(fas)
# remove any gaps in the sequences
dna <- RemoveGaps(dna)
# classify the test sequences
ids <- IdTaxa(dna, TrainingSet_16S, strand="top")
ids # view the results
plot(ids, TrainingSet_16S)