simulate_experiment_countmat {polyester} | R Documentation |
Create FASTA files containing RNA-seq reads simulated from provided transcripts, with optional differential expression between two groups (specified via a read count matrix).
simulate_experiment_countmat(fasta = NULL, gtf = NULL, seqpath = NULL, readmat, outdir = ".", paired = TRUE, seed = NULL, ...)
fasta |
path to FASTA file containing transcripts from which to simulate reads. See details. |
gtf |
path to GTF file or data frame containing transcript structures
from which reads should be simulated. See details and
|
seqpath |
path to folder containing one FASTA file ( |
readmat |
matrix with rows representing transcripts and columns representing samples. Entry i,j specifies how many reads to simulate from transcript i for sample j. |
outdir |
character, path to folder where simulated reads should be written, without a slash at the end of the folder name. By default, reads are written to the working directory. |
paired |
If |
seed |
Optional seed to set before simulating reads, for reproducibility. |
... |
Additional arguments to add nuance to the simulation, as described
extensively in the details of |
Reads can either be simulated from a FASTA file of transcripts (provided with
the fasta argument) or from a GTF file plus DNA sequences (provided with the
gtf and seqpath arguments).
Simulating from a GTF file and DNA sequences may be a bit slower: it took
about 6 minutes to parse the GTF/sequence files for chromosomes 1-22,
X, and Y in hg19.
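Because differential expression is specified entirely through readmat, it can help to build that matrix with a small helper. The following is a minimal sketch in base R, not part of polyester: the function name make_readmat and its arguments are hypothetical, and it assumes two groups of equal size, with the first group occupying the leading columns.

```r
# Hypothetical helper (not part of polyester): build a read count matrix
# with a baseline count everywhere and a fold change applied to selected
# transcripts in the first group of samples.
make_readmat <- function(n_tx, n_per_group, baseline = 20,
                         de_rows = integer(0), fold_change = 2) {
  # rows = transcripts, columns = samples (group 1 first, then group 2)
  readmat <- matrix(baseline, nrow = n_tx, ncol = 2 * n_per_group)
  group1 <- seq_len(n_per_group)
  # raise the counts of the chosen transcripts in group 1 only
  readmat[de_rows, group1] <- baseline * fold_change
  readmat
}

# 100 transcripts, 5 samples per group, first 30 transcripts doubled in group 1
readmat <- make_readmat(n_tx = 100, n_per_group = 5,
                        de_rows = 1:30, fold_change = 2)
dim(readmat)
```

In practice n_tx would need to match the number of transcripts in the FASTA or GTF input, for example the value returned by count_transcripts for a FASTA file.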
No return value; simulated reads are written to outdir.
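Since nothing is returned, one way to confirm that a run produced output is to inspect the output folder afterwards. The sketch below uses only base R. The helper name list_simulated_fasta is hypothetical, and the file-extension pattern is an assumption rather than a documented naming scheme.

```r
# Hypothetical helper: list FASTA files found in an output directory.
# The extension pattern (.fa / .fasta) is an assumption, not taken from
# the polyester documentation.
list_simulated_fasta <- function(outdir) {
  list.files(outdir, pattern = "\\.(fa|fasta)$", full.names = TRUE)
}

# Only inspect the folder if a previous simulation has created it
outdir <- "simulated_reads_2"
if (dir.exists(outdir)) {
  print(list_simulated_fasta(outdir))
}
```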
Li W and Jiang T (2012): Transcriptome assembly and isoform expression level estimation from biased RNA-Seq reads. Bioinformatics 28(22): 2914-2921.
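# Example transcript FASTA file shipped with polyester
fastapath = system.file("extdata", "chr22.fa", package="polyester")
numtx = count_transcripts(fastapath)

# Baseline of 20 reads per transcript across 10 samples
readmat = matrix(20, ncol=10, nrow=numtx)
# Double the counts for the first 30 transcripts in samples 1-5
readmat[1:30, 1:5] = 40

simulate_experiment_countmat(fasta=fastapath, readmat=readmat,
    outdir='simulated_reads_2', seed=5)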