odseq_unaligned {odseq}    R Documentation

Outlier detection given a distance or similarity matrix of sequences.

Description

Given a similarity or distance matrix (such as those produced by the string kernels in the kebabs package), this function computes a score for each sequence and bootstraps those scores to estimate their distribution. That distribution is then used to identify outlier sequences.
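The score-then-bootstrap idea described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code that odseq runs: the helper name sketch_outliers is hypothetical, the score is assumed to be the row mean of the matrix, and the cutoff is assumed to come from a normal approximation built on bootstrap estimates of the mean and standard deviation of the scores.

```r
# Hypothetical helper for illustration only; this is NOT the odseq implementation.
sketch_outliers <- function(mat, B = 100, threshold = 0.025,
                            type = c("similarity", "distance")) {
  type <- match.arg(type)
  stopifnot(is.matrix(mat), nrow(mat) == ncol(mat))

  # Assumed score: the row mean. Similarities are negated so that, for both
  # matrix types, a larger score means a more atypical sequence.
  score <- rowMeans(mat)
  if (type == "similarity") score <- -score

  # Bootstrap: resample the scores with replacement B times and record the
  # mean and standard deviation of each replicate.
  boot <- vapply(seq_len(B), function(b) {
    resampled <- sample(score, replace = TRUE)
    c(mean = mean(resampled), sd = sd(resampled))
  }, c(mean = 0, sd = 0))

  # Assumed decision rule: a normal approximation whose right tail holds
  # probability `threshold`.
  cutoff <- mean(boot["mean", ]) + qnorm(1 - threshold) * mean(boot["sd", ])
  unname(score > cutoff)
}

# Toy data: seven nearby points and one distant point (the eighth).
set.seed(1)
x <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 10)
dist_mat <- as.matrix(dist(x))
sim_mat <- exp(-dist_mat)  # a simple similarity derived from the distances

flags_dist <- sketch_outliers(dist_mat, B = 1000, type = "distance")
flags_sim <- sketch_outliers(sim_mat, B = 1000, type = "similarity")
which(flags_dist)
```

On this toy input only the distant eighth point is flagged, for both matrix types. Lowering threshold pushes the cutoff further into the right tail, so fewer sequences are flagged.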

Usage

odseq_unaligned(distance_matrix, B = 100, threshold = 0.025, type = "similarity")

Arguments

distance_matrix

A numeric matrix of pairwise similarities or distances among unaligned sequences. The kebabs package can be used to compute such a matrix.
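Before passing a matrix in, a few sanity checks may be worthwhile. The sketch below is only a suggestion: check_pairwise_matrix is a hypothetical helper, and the symmetry requirement is an assumption about typical pairwise matrices rather than something this page states the function enforces.

```r
# Hypothetical helper, not part of odseq or kebabs.
check_pairwise_matrix <- function(mat) {
  stopifnot(
    is.matrix(mat), is.numeric(mat),  # a numeric matrix, as documented
    nrow(mat) == ncol(mat),           # one row and one column per sequence
    !anyNA(mat),                      # assumption: no missing entries
    isSymmetric(unname(mat))          # assumption: pairwise values are symmetric
  )
  invisible(TRUE)
}

# A small distance matrix passes the checks.
toy <- as.matrix(dist(c(1, 2, 4)))
ok <- check_pairwise_matrix(toy)

# A non-square matrix fails them; the error is caught and turned into FALSE.
bad <- tryCatch(check_pairwise_matrix(matrix(1:6, nrow = 2)),
                error = function(e) FALSE)
```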

B

Integer giving the number of bootstrap replicates to run. Higher values should make the detection more robust.

threshold

Numeric value giving the probability left in the right tail of the bootstrap score distribution when flagging outliers. This parameter may need tuning for each specific problem.

type

A string indicating whether the matrix contains similarities or distances. Either 'similarity' or 'distance'.

Value

Returns a logical vector, where TRUE indicates an outlier.
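A logical vector of this kind is typically used for indexing. The sketch below uses made-up flags and hypothetical sequence names in place of a real result, and it assumes the vector follows the same order as the rows of the input matrix.

```r
# Made-up flags standing in for the return value of odseq_unaligned().
is_outlier <- c(FALSE, FALSE, TRUE, FALSE, TRUE)
# Hypothetical sequence names, assumed to be in the same order as the matrix rows.
seq_names <- c("seqA", "seqB", "seqC", "seqD", "seqE")

outlier_idx <- which(is_outlier)         # positions of flagged sequences
outlier_names <- seq_names[is_outlier]   # names of flagged sequences
kept_names <- seq_names[!is_outlier]     # names of sequences that were not flagged
n_outliers <- sum(is_outlier)            # TRUE counts as 1
```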

Author(s)

José Jiménez <jose@jimenezluna.com>

References

[1] OD-seq: outlier detection in multiple sequence alignments. Peter Jehl, Fabian Sievers and Desmond G. Higgins. BMC Bioinformatics. 2015.

See Also

odseq

Examples

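library(odseq)   # provides odseq_unaligned() and the example data set 'seqs'
library(kebabs)  # provides string kernels
data(seqs)
# Spectrum kernel over subsequences of length 3
sp <- spectrumKernel(k = 3)
# Pairwise similarity (kernel) matrix of the sequences
mat <- getKernelMatrix(sp, seqs)
# Flag outliers using 1000 bootstrap replicates
odseq_unaligned(mat, B = 1000, threshold = 0.025, type = "similarity")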

[Package odseq version 1.18.0 Index]