fine_cluster_seqs {CellaRepertorium} | R Documentation |
The distance between AA sequences is defined to be (1 - score/max(score)) times the median length of the input sequences. The distance between nucleotide sequences is defined to be edit_distance/max(edit_distance) times the median length of the input sequences.
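The nucleotide scaling described above can be sketched in base R. This is only an illustration of the arithmetic, not the package's implementation: it assumes `utils::adist()` (generalized Levenshtein distance) as a stand-in for the edit distance, and the three sequences and the variable names `toy_seqs`, `edit_dist`, and `scaled_dist` are made up for the example.

```r
# Hypothetical toy input: three short nucleotide sequences of equal length.
toy_seqs <- c("ATGCGTACGT", "ATGCGTTCGT", "ATGAGTACGA")

# Pairwise edit distances (assumption: base R adist() stands in for the
# edit distance used by the package).
edit_dist <- utils::adist(toy_seqs)

# Rescale: edit_distance / max(edit_distance) * median sequence length.
median_len <- median(nchar(toy_seqs))
scaled_dist <- edit_dist / max(edit_dist) * median_len

print(scaled_dist)
```

With this scaling the most dissimilar pair is at a distance equal to the median sequence length, and identical sequences are at distance 0.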
fine_cluster_seqs(
  seqs,
  type = "AA",
  big_memory_brute = FALSE,
  method = "levenshtein",
  substitution_matrix = "BLOSUM100",
  cluster_fun = "none",
  cluster_method = "complete"
)
seqs |
character vector, DNAStringSet or AAStringSet |
type |
character either |
big_memory_brute |
attempt to cluster more than 4000 sequences? Clustering is quadratic, so this will take a long time and might exhaust memory |
method |
one of 'substitutionMatrix' or 'levenshtein' |
substitution_matrix |
a character vector naming a substitution matrix available in Biostrings, or a substitution matrix itself |
cluster_fun |
|
cluster_method |
character passed to |
list
hclust(), Biostrings::stringDist()
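# Load the demo FASTA shipped with the package and keep the first 100 sequences
fasta_path = system.file('extdata', 'demo.fasta', package='CellaRepertorium')
aaseq = Biostrings::readAAStringSet(fasta_path)[1:100]
# Cluster with hclust and plot the result stored in cls$cluster
cls = fine_cluster_seqs(aaseq, cluster_fun = 'hclust')
plot(cls$cluster)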