shuffle_sequences {universalmotif} | R Documentation |
Given a set of input sequences, shuffle the letters within those sequences with any k-let size.
shuffle_sequences(sequences, k = 1, method = "euler", nthreads = 1, rng.seed = sample.int(10000, 1))
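sequences | XStringSet. The set of sequences to shuffle.
k | The k-let size.
method | One of 'euler', 'markov', or 'linear'. Only relevant for k > 1.
nthreads | Number of threads to use (default 1).
rng.seed | Seed for the random number generator (default: sample.int(10000, 1)).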
If method = 'markov', then a Markov model is used to generate sequences which maintain (on average) the k-let frequencies. Please note that this method is not a 'true' shuffling, and for short sequences (e.g. < 100 bp) it can produce sequences that are slightly more dissimilar to the input than true shuffling would. See Fitch (1983) for a discussion on the topic.
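As a rough illustration of the idea for k = 2, the base R sketch below estimates first-order transition counts from one sequence and samples a new sequence of the same length from them. It is not the package's implementation: the function name markov_generate_k2_sketch, the plain character input, and the fallback for letters without successors are all assumptions made for this example.

```r
# Hypothetical sketch: generate a sequence from first-order (k = 2)
# transition counts. Works on a plain character string, not an XStringSet.
markov_generate_k2_sketch <- function(x) {
  s <- strsplit(x, "")[[1]]
  n <- length(s)
  if (n < 2) return(x)
  alphabet <- sort(unique(s))
  # Single-letter counts, used for the first letter and as a fallback.
  letter_counts <- as.numeric(table(factor(s, levels = alphabet)))
  # trans[a, b] = number of times letter b directly follows letter a.
  trans <- table(factor(s[-n], levels = alphabet),
                 factor(s[-1], levels = alphabet))
  res <- character(n)
  res[1] <- alphabet[sample.int(length(alphabet), 1, prob = letter_counts)]
  for (i in 2:n) {
    p <- as.numeric(trans[res[i - 1], ])
    # Assumption: a letter seen only at the very end has no successors,
    # so fall back to the overall letter frequencies.
    if (sum(p) == 0) p <- letter_counts
    res[i] <- alphabet[sample.int(length(alphabet), 1, prob = p)]
  }
  paste(res, collapse = "")
}

set.seed(1)
markov_generate_k2_sketch("ACAGATAGACCCGTTAGCAGGATTACA")
```

Because each letter is drawn independently given its predecessor, the dinucleotide counts of the result match the input only on average, which is why short inputs can drift noticeably.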
If method = 'euler', then the sequence shuffling method proposed by Altschul and Erickson (1985) is used. As opposed to the 'markov' method, this one preserves exact k-let frequencies. This is done by creating a k-let edge graph, then following a random Eulerian walk through the graph. Not all walks will use up all available letters, however, so the cycle-popping algorithm proposed by Propp and Wilson (1998) is used to find a random Eulerian path. A side effect of using this method is that the first and last letters of each sequence remain in place.
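The Eulerian-walk idea can be sketched in base R for k = 2. Note that this sketch does not use cycle-popping: it picks a candidate "last edge" for every letter and simply resamples until those edges all lead to the final letter, which is a simpler rejection-sampling variant. The function name euler_shuffle_k2_sketch and the plain character input are assumptions for illustration, not the package's code.

```r
# Hypothetical sketch: dinucleotide-preserving shuffle via an Eulerian walk.
euler_shuffle_k2_sketch <- function(x) {
  s <- strsplit(x, "")[[1]]
  n <- length(s)
  if (n < 3) return(x)
  last <- s[n]
  # Outgoing edges: for every letter, the letters that directly follow it.
  out <- split(s[-1], s[-n])
  verts <- names(out)
  repeat {
    # Pick one candidate last edge for every letter except the final one.
    last_edge <- vapply(setdiff(verts, last), function(v) {
      out[[v]][sample.int(length(out[[v]]), 1)]
    }, character(1))
    # Accept only if following last edges from every letter reaches `last`.
    ok <- all(vapply(names(last_edge), function(v) {
      cur <- v
      for (step in seq_along(last_edge)) {
        if (cur == last) break
        cur <- last_edge[[cur]]
      }
      cur == last
    }, logical(1)))
    if (ok) break
  }
  # Shuffle each letter's edges, keeping its chosen last edge at the end.
  for (v in verts) {
    e <- out[[v]]
    if (v != last) {
      e <- e[-match(last_edge[[v]], e)]
      e <- c(e[sample.int(length(e))], last_edge[[v]])
    } else {
      e <- e[sample.int(length(e))]
    }
    out[[v]] <- e
  }
  # Walk from the first letter, consuming each letter's edges in order.
  pos <- setNames(rep(1L, length(verts)), verts)
  res <- character(n)
  res[1] <- s[1]
  for (i in 2:n) {
    v <- res[i - 1]
    res[i] <- out[[v]][pos[[v]]]
    pos[[v]] <- pos[[v]] + 1L
  }
  paste(res, collapse = "")
}

# Helper to list all overlapping 2-lets of a string, sorted.
klets2 <- function(x) sort(substring(x, 1:(nchar(x) - 1), 2:nchar(x)))

set.seed(1)
x <- "ACAGATAGACCCGTTAGCAGGATTACA"
y <- euler_shuffle_k2_sketch(x)
identical(klets2(x), klets2(y))  # exact dinucleotide counts are preserved
```

The walk always starts at the original first letter and must end at the original last letter, which matches the side effect described above.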
If method = 'linear', then the input sequences are split linearly every k letters. For example, for k = 3, 'ACAGATAGACCC' becomes 'ACA GAT AGA CCC', after which these 3-lets are shuffled randomly.
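The split-then-shuffle step can be sketched in base R as follows. The function name linear_shuffle_sketch is hypothetical, and the handling of a trailing chunk shorter than k (here it is shuffled along with the others) is an assumption, since the text above does not say how leftovers are treated.

```r
# Hypothetical sketch: cut a string into consecutive k-letter chunks
# and return the chunks in random order.
linear_shuffle_sketch <- function(x, k) {
  n <- nchar(x)
  starts <- seq(1, n, by = k)
  # Assumption: a final chunk shorter than k is kept as its own chunk.
  chunks <- substring(x, starts, pmin(starts + k - 1, n))
  paste(chunks[sample.int(length(chunks))], collapse = "")
}

set.seed(1)
linear_shuffle_sketch("ACAGATAGACCC", k = 3)
```

Only the k-lets that start at chunk boundaries are preserved here; k-lets that span two chunks generally change.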
Note, however, that the method parameter is only relevant for k > 1. For k = 1, a simple shuffling is performed using the shuffle function from the C++ standard library.
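For comparison, the k = 1 case amounts to a plain permutation of the letters. A base R equivalent might look like the sketch below; the function name shuffle_letters_sketch is hypothetical, and it uses R's sample.int rather than the C++ routine mentioned above.

```r
# Hypothetical sketch: permute the letters of a single string (k = 1).
shuffle_letters_sketch <- function(x) {
  s <- strsplit(x, "")[[1]]
  paste(s[sample.int(length(s))], collapse = "")
}

set.seed(1)
shuffle_letters_sketch("ACAGATAGACCC")
```

This keeps the letter composition and the length of the input, but no higher-order k-let structure.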
XStringSet. The input sequences will be returned with identical names and lengths.
Benjamin Jean-Marie Tremblay, b2tremblay@uwaterloo.ca
Altschul SF, Erickson BW (1985). “Significance of Nucleotide Sequence Alignments: A Method for Random Sequence Permutation That Preserves Dinucleotide and Codon Usage.” Molecular Biology and Evolution, 2(6), 526–538.
Fitch WM (1983). “Random sequences.” Journal of Molecular Biology, 163(2), 171–176.
Propp JG, Wilson DB (1998). “How to get a perfectly random sample from a generic Markov chain and generate a random spanning tree of a directed graph.” Journal of Algorithms, 27(2), 170–217.
create_sequences(), scan_sequences(), enrich_motifs(), shuffle_motifs()
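library(universalmotif)

# Skip on 32-bit (i386) builds
if (R.Version()$arch != "i386") {
  # Generate random sequences with the default settings
  sequences <- create_sequences()
  # Shuffle while preserving dinucleotide (k = 2) frequencies
  sequences.shuffled <- shuffle_sequences(sequences, k = 2)
}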