cbDistMatrix {seqTools} | R Documentation |
Calculates a pairwise distance matrix from DNA k-mer counts, based on a modified Canberra distance. Before the Canberra distances are calculated, read counts are normalized to correct for systematic effects on the distance: the counts in each DNA k-mer count vector are scaled up so that the normalized read counts of all samples are nearly equal.
cbDistMatrix(object,nReadNorm=max(nReads(object)))
object |
|
nReadNorm |
|
The distance between two normalized DNA k-mer count vectors X and Y is calculated by
df(X,Y) = ∑_{i=1}^{4^k} cbd(x_i, y_i) / 4^k
where cbd is given by
cbd(x,y) = |x-y| / (x+y).
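The formula can be illustrated with a small, self-contained sketch in plain R. It is not the seqTools implementation: the helper names (cbd, cb_dist, normalize_counts) are hypothetical, the handling of terms where x + y = 0 (set to 0 here) is an assumption, and the toy example uses the sum of the k-mer counts as a stand-in for the read count of a sample.

```r
# Per-position term |x - y| / (x + y).
# Assumption: positions where both counts are 0 contribute 0.
cbd <- function(x, y) {
  s <- x + y
  ifelse(s > 0, abs(x - y) / s, 0)
}

# Distance between two normalized k-mer count vectors:
# sum of the per-position terms divided by the vector length (4^k).
cb_dist <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(cbd(x, y)) / length(x)
}

# Hypothetical normalization: scale a count vector up to n_read_norm reads.
normalize_counts <- function(counts, n_reads, n_read_norm) {
  counts * (n_read_norm / n_reads)
}

# Toy example with k = 1, i.e. 4^1 = 4 possible k-mers (A, C, G, T).
x <- c(10, 0, 5, 5)
y <- c(20, 0, 20, 0)

# Mirror the default nReadNorm = max(nReads(object)) on the toy data.
n_norm <- max(sum(x), sum(y))
xn <- normalize_counts(x, sum(x), n_norm)
yn <- normalize_counts(y, sum(y), n_norm)

d <- cb_dist(xn, yn)
print(d)
```

Because every term lies between 0 and 1 and the sum is divided by 4^k, the resulting distance also lies between 0 and 1 under these assumptions.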
Square matrix. The number of rows equals the number of files (= nFiles(object)).
The static size of the returned k-mer array is 4^k.
Wolfgang Kaisers
Cock PJA, Fields CJ, Goto N, Heuer ML, Rice PM. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Research 2010, Vol. 38, No. 6, 1767-1771
hclust
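# Locate the example FASTQ files shipped with seqTools
basedir <- system.file("extdata", package = "seqTools")
basenames <- c("g4_l101_n100.fq.gz", "g5_l101_n100.fq.gz")
filenames <- file.path(basedir, basenames)
fq <- fastqq(filenames, 6, c("g4", "g5"))
# Pairwise distance matrix from the k-mer counts
dm <- cbDistMatrix(fq)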
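Since the page refers to hclust, a square distance matrix of this kind can be passed on to hierarchical clustering. The sketch below uses a hypothetical, hand-written 3 x 3 matrix in place of real cbDistMatrix output, so that it runs without seqTools; the sample names and distance values are made up for illustration.

```r
# Hypothetical symmetric distance matrix standing in for cbDistMatrix output.
samples <- c("s1", "s2", "s3")
dm_toy <- matrix(c(0.0, 0.2, 0.7,
                   0.2, 0.0, 0.6,
                   0.7, 0.6, 0.0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(samples, samples))

# hclust() expects a "dist" object, so convert the square matrix first.
hc <- hclust(as.dist(dm_toy), method = "average")

# The merge table shows which samples are joined first.
print(hc$merge)
print(hc$height)
```

With real data, the same two calls would presumably be applied to the matrix returned by cbDistMatrix, followed by plot(hc) to draw the dendrogram.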