simulateClumpSizeDist {motifcounter}    R Documentation

Empirical clump size distribution

Description

This function repeatedly simulates random DNA sequences according to the background model and counts the number of k-clump occurrences in each, where k denotes the clump size. It is intended only for benchmarking analyses.
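The idea of an empirical clump size distribution can be illustrated with a small base-R sketch. It is not the package's implementation. It assumes an independent (order-0) background and exact matching of a word instead of a PFM score threshold. The helper name simulate_clump_sizes and its arguments are hypothetical.

```r
# Toy illustration only: exact-word matching on an i.i.d. background,
# not the PFM-based scan that motifcounter performs.
set.seed(1)

simulate_clump_sizes <- function(word, seqlen, nsim = 10,
                                 probs = c(A = 0.25, C = 0.25,
                                           G = 0.25, T = 0.25)) {
    w <- nchar(word)
    sizes <- integer(0)
    for (i in seq_len(nsim)) {
        # Draw one random sequence from the order-0 background
        s <- paste(sample(names(probs), seqlen, replace = TRUE, prob = probs),
                   collapse = "")
        # Start positions of all (possibly overlapping) exact hits
        starts <- which(substring(s, 1:(seqlen - w + 1), w:seqlen) == word)
        if (length(starts) == 0) next
        # A new clump begins whenever a hit does not overlap the previous one
        clump_id <- cumsum(c(TRUE, diff(starts) >= w))
        # Number of hits per clump = clump size
        sizes <- c(sizes, tabulate(clump_id))
    }
    # Relative frequency of clump sizes k = 1, 2, ...
    tabulate(sizes) / length(sizes)
}

dist <- simulate_clump_sizes("AAA", seqlen = 10000, nsim = 5)
```

Here dist[k] is the fraction of observed clumps that consist of exactly k overlapping hits.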

Usage

simulateClumpSizeDist(pfm, bg, seqlen, nsim = 10, singlestranded = FALSE)

Arguments

pfm

An R matrix that represents a position frequency matrix

bg

A Background object

seqlen

Integer-valued vector that defines the lengths of the individual sequences. For a given DNAStringSet, this information can be retrieved using numMotifHits.

nsim

Integer number of random samples.

singlestranded

Boolean that indicates whether a single strand or both strands shall be scanned for motif hits. Default: singlestranded = FALSE.
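What scanning both strands means can be sketched in base R: a hit on the reverse strand corresponds to a hit of the reverse-complemented pattern on the forward strand. The sketch below uses exact word matching as a simplifying assumption, and the helpers revcomp and count_hits are hypothetical names, not part of motifcounter.

```r
# Reverse complement of a DNA string using base R only
revcomp <- function(x) {
    comp <- chartr("ACGT", "TGCA", x)
    paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Count (possibly overlapping) exact occurrences of a word in a sequence
count_hits <- function(word, s) {
    w <- nchar(word)
    n <- nchar(s)
    if (n < w) return(0L)
    sum(substring(s, 1:(n - w + 1), w:n) == word)
}

s <- "ACGTTTGCAAACGT"
word <- "AAAC"

# Forward strand only
single <- count_hits(word, s)

# Forward strand plus reverse strand
both <- single + count_hits(revcomp(word), s)
```

In this toy version a palindromic word would be counted once per strand at the same site, which is one reason overlapping hits on both strands need to be treated as clumps.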

Value

A list that contains

dist

Empirical distribution of the clump sizes

See Also

compoundPoissonDist, combinatorialDist

Examples



# Load sequences
seqfile = system.file("extdata", "seq.fasta", package = "motifcounter")
seqs = Biostrings::readDNAStringSet(seqfile)

# Load background
bg = readBackground(seqs, 1)

# Load motif
motiffile = system.file("extdata", "x31.tab", package = "motifcounter")
motif = t(as.matrix(read.table(motiffile)))

# Study the clump size frequencies in one sequence of length 1 Mb
seqlen = 1000000

# scan both strands
simc = motifcounter:::simulateClumpSizeDist(motif, bg, seqlen)

# scan a single strand
simc = motifcounter:::simulateClumpSizeDist(motif, bg,
    seqlen, singlestranded = TRUE)


[Package motifcounter version 1.14.0 Index]