derfinder
advanced details and usageIf you wish, you can view this vignette online here.
This vignette explains in more detail the relationship between the different functions in derfinder
(Collado-Torres, Frazee, Love, Irizarry, et al., 2015)
as well as add-on packages derfinderPlot
(vignette), derfinderHelper
(vignette) and regionReport
(vignette). The vignette also includes some bash scripts for running derfinder
in a cluster, although there are probably other ways to do so using only R scripts. For example, by using BatchJobsParam()
from BiocParallel.
This vignette assumes that you have read through the introductory vignette.
Lets start by loading the derfinder
package.
## Load libraries library('derfinder')
If you explore the package, you will notice that all exported functions use camel case name and have aliases with underscore names. For example, analyzeChr()
and analyze_chr()
. You should use whichever naming style you prefer. Regrettably, implementing the same flexibility at the argument naming level is not straightforward.
Just like it is shown in the introductory vignette, you can find more information about the advanced arguments using advancedArg()
.
## URLs to advanced arguemtns sapply(c('analyzeChr', 'loadCoverage'), advancedArg, browse = FALSE)
## analyzeChr ## "https://github.com/search?utf8=%E2%9C%93&q=advanced_argument+filename%3Aanalyze+repo%3Alcolladotor%2Fderfinder+path%3A%2FR&type=Code" ## loadCoverage ## "https://github.com/search?utf8=%E2%9C%93&q=advanced_argument+filename%3Aload+repo%3Alcolladotor%2Fderfinder+path%3A%2FR&type=Code"
## Set browse = TRUE if you want to open them in your browser
These arguments are options we expect not all users will want to change and which would otherwise make the help pages confusing to them. However, all arguments are documented in roxygen2 style in the source code.
Furthermore, note that using the ...
argument allows you to specify some of the documented arguments. For example, you might want to control the maxClusterGap
from findRegions()
in the analyzeChr()
call.
If you are working with data from an organism that is not Homo sapiens, then set the global options defining the species
and the chrsStyle
used. For example, if you are working with Arabidopsis Thaliana and the NCBI naming style, then set the options using the following code:
## Set global species and chrsStyle options options(species = 'arabidopsis_thaliana') options(chrsStyle = 'NCBI') ## Then proceed to load and analyze the data
Internally derfinder
uses extendedMapSeqlevels()
to use the appropriate chromosome naming style given a species in all functions involving chromosome names.
Further note that the argument subject
from analyzeChr()
is passed to bumphunter::annotateNearest(subject)
. So if you are using a genome different from hg19 remember to provide the appropriate annotation data or simply use analyzeChr(runAnnotation = FALSE)
. You might find the discussion Using bumphunter with non-human genomes useful.
If you are loading data from BAM files, you might want to specify some criteria to decide which reads to include or not. For example, your data might have been generated by a strand-specific protocol. You can do so by specifying the arguments of scanBamFlag()
from Rsamtools.
You can also control whether to include or exclude bases with CIGAR
string D
(deletion from the reference) by setting the advanced argument drop.D = TRUE
in your fullCoverage()
or loadCoverage()
call.
Note that in most scenarios, the fullCov
object illustrated in the introductory vignette can be large in memory. When making plots or calculating the region level coverage, we don't need the full information. In such situations, it might pay off to create a smaller version by loading only the required data. This can be achieved using the advanced argument which
to fullCoverage()
or loadCoverage()
.
However, it is important to consider that when reading the data from BAM files, a read might align partially inside the region of interest. By default such a read would be discarded and thus the base-level coverage would be lower than what it is in reality. The advanced argument protectWhich
extends regions by 30 kbp (15 kbp each side) to help mitigate this issue.
We can illustrate this issue with the example data from derfinder
. First, we load in the data and generate some regions of interest.
## Find some regions to work with example('loadCoverage', 'derfinder')
## ## ldCvrg> datadir <- system.file('extdata', 'genomeData', package='derfinder') ## ## ldCvrg> files <- rawFiles(datadir = datadir, samplepatt = '*accepted_hits.bam$', ## ldCvrg+ fileterm = NULL) ## ## ldCvrg> ## Shorten the column names ## ldCvrg> names(files) <- gsub('_accepted_hits.bam', '', names(files)) ## ## ldCvrg> ## Read and filter the data, only for 2 files ## ldCvrg> dataSmall <- loadCoverage(files = files[1:2], chr = '21', cutoff = 0)
## ## ldCvrg> ## Not run: ## ldCvrg> ##D ## Export to BigWig files ## ldCvrg> ##D createBw(list('chr21' = dataSmall)) ## ldCvrg> ##D ## ldCvrg> ##D ## Load data from BigWig files ## ldCvrg> ##D dataSmall.bw <- loadCoverage(c(ERR009101 = 'ERR009101.bw', ERR009102 = ## ldCvrg> ##D 'ERR009102.bw'), chr = 'chr21') ## ldCvrg> ##D ## ldCvrg> ##D ## Compare them ## ldCvrg> ##D mapply(function(x, y) { x - y }, dataSmall$coverage, dataSmall.bw$coverage) ## ldCvrg> ##D ## ldCvrg> ##D ## Note that the only difference is the type of Rle (integer vs numeric) used ## ldCvrg> ##D ## to store the data. ## ldCvrg> ##D ## ldCvrg> ## End(Not run) ## ldCvrg> ## ldCvrg> ## ldCvrg>
example('getRegionCoverage', 'derfinder')
## ## gtRgnC> ## Obtain fullCov object ## gtRgnC> fullCov <- list('21'=genomeDataRaw$coverage) ## ## gtRgnC> ## Assign chr lengths using hg19 information, use only first two regions ## gtRgnC> library('GenomicRanges')
## ## gtRgnC> data(hg19Ideogram, package = 'biovizBase', envir = environment()) ## ## gtRgnC> regions <- genomeRegions$regions[1:2] ## ## gtRgnC> seqlengths(regions) <- seqlengths(hg19Ideogram)[names(seqlengths(regions))] ## ## gtRgnC> ## Finally, get the region coverage ## gtRgnC> regionCov <- getRegionCoverage(fullCov=fullCov, regions=regions)
Next, we load the coverage again using which
but without any padding. We can see how the coverage is not the same by looking at the maximum coverage for each sample.
## Illustrate reading data from a set of regions test <- loadCoverage(files = files, chr = '21', cutoff = NULL, which = regions, protectWhich = 0, fileStyle = 'NCBI')
## Some reads were ignored and thus the coverage is lower as can be seen below: sapply(test$coverage, max) - sapply(genomeDataRaw$coverage, max)
## ERR009101 ERR009102 ERR009105 ERR009107 ERR009108 ERR009112 ERR009115 ## 0 0 0 0 -1 0 -1 ## ERR009116 ERR009131 ERR009138 ERR009144 ERR009145 ERR009148 ERR009151 ## -2 -2 -2 -1 -1 -3 -3 ## ERR009152 ERR009153 ERR009159 ERR009161 ERR009163 ERR009164 ERR009167 ## 0 -3 -3 -3 -1 -3 -3 ## SRR031812 SRR031835 SRR031867 SRR031868 SRR031900 SRR031904 SRR031914 ## 0 -1 0 0 0 -1 0 ## SRR031936 SRR031958 SRR031960 ## 0 0 0
When we re-load the data using some padding to the regions, we find that the coverage matches at all the bases.
## Illustrate reading data from a set of regions test2 <- loadCoverage(files = files, chr = '21', cutoff = NULL, which = regions, protectWhich = 3e4, fileStyle = 'NCBI')
## Adding some padding to the regions helps get the same coverage identical(sapply(test2$coverage, max), sapply(genomeDataRaw$coverage, max))
## [1] TRUE
## A more detailed test reveals that the coverage matches at every base all(mapply(function(x, y) { identical(x, y) }, test2$coverage, genomeDataRaw$coverage))
## [1] TRUE
How much padding you need to use will depend on your specific data set, and you might be comfortable getting approximately the same coverage values for the sake of greatly reducing the memory resources needed.
If you are under the case where you like to use a specific chromosome naming style but the raw data files use another one, you might need to use the fileStyle
argument.
For example, you could be working with Homo sapiens data and your preferred naming style is UCSC (chr1, chr2, ..., chrX, chrY) but the raw data uses NCBI style names (1, 2, ..., X, Y). In that case, use fullCoverage(fileStyle = 'NCBI')
or loadCoverage(fileStyle = 'NCBI')
depending if you are loading one chromosome or multiple at a time.
If you prefer to do so, fullCoverage()
and loadCoverage()
can load the data of a chromosome in chunks using GenomicFiles. This is controlled by whether you specify the tilewidth
argument.
Notice that you might run into slight coverage errors near the borders of the tiles for the same reason that was illustrated previously when loading specific regions.
This approach is not necessarily more efficient and can be significantly time consuming if you use a small tilewidth
.
The following figure illustrates how most of derfinder
's functions interact when performing a base-level differential expression analysis by calculating base-level F-statistics.
Flow chart of the different processing steps (black boxes) that can be carried out using derfinder
and the functions that perform these actions (in red). Input and output is shown in green boxes. Functions in blue are those applied to the results from multiple chromosomes (mergeResults()
and derfinderReport
). regionReport
functions are shown in orange while derfinderPlot
functions are shown in dark purple. Purple dotted arrow marks functions that require unfiltered base-level coverage.
analyzeChr()
flow chartThis figure shows in more detail the processing flow in analyzeChr()
, which is the main function for identifying candidate differentially expressed regions (DERs) from the base-level F-statistics.
Many fine-tunning arguments can be passed to analyzeChr()
to feed into the other functions. For example, you might want to use a smaller chunksize
when pre-processing the coverage data (the default is 5 million): specially if you have hundreds of samples.
Another useful argument is scalefac
(by default it's 32) which controls the scaling factor to use before the log2 transformation.
Furthermore, you might want to specify maxClusterGap
to control the maximum gap between two regions before they are considered to be part of the same cluster.
regionMatrix()
flow chartThe above figure shows the functions used internally by regionMatrix()
and processing steps. Overall, it is much simpler than analyzeChr()
.
Currently, the following functions can use multiple cores, several of which are called inside analyzeChr()
.
calculatePvalues()
: 1 core per chunk of data to process.calculateStats()
: 1 core per chunk of data to process.coerceGR()
: 1 core per chromosome. This function is used by createBw()
.coverageToExon()
: 1 core per strand, then 1 core per chromosome.loadCoverage()
: up to 1 core per tile when loading the data with GenomicFiles. Otherwise, no parallelization is used.fullCoverage()
: 1 core per chromosome. In general, try to avoid using more than 10 cores as you might reach your maximum network speed and/or hard disk input/output seed. For the case described in loadCoverage()
, you can specify how many cores to use per chromosome for the tiles using the mc.cores.load
argument effectively resulting in mc.cores
times mc.cores.load
used (otherwise it's mc.cores
squared).getRegionCoverage()
: 1 core per chromosome.regionMatrix()
: 1 core per chromosome.All parallel operations use SnowParam()
from BiocParallel when more than 1 core is being used. Otherwise, SerialParam()
is used. Note that if you prefer to specify other types of parallelization you can do so by specifying the BPPARAM.custom
advanced argument.
Because SnowParam()
requires R
to load the necessary packages on each worker, the key function fstats.apply()
was isolated in the derfinderHelper
package. This package has much faster loading speeds than derfinder
which greatly impacts performance on cases where the actual step of calculating the F-statistics is fast.
You may prefer to use MulticoreParam()
described in the BiocParallel vignette. In that case, when using these functions use BPPARAM.custom = MulticoreParam(workers = x)
where x
is the number of cores you want to use. Note that in some systems, as is the case of the cluster used by derfinder
's developers, the system tools for assessing memory usage can be misleading, thus resulting in much higher memory loads when using MulticoreParam()
instead of the default SnowParam()
.
For each project, we recommend the following organization.
ProjectDir
|-derCoverageInfo
|-derAnalysis
|---analysis01
|---analysis02
That is, a main project directory with two initial directories. One for storing the coverage data, and one for storing each analysis: you might explore different models, cutoffs or other parameters.
You can then use fullCoverage()
, save the result and also save the filtered coverage information for each chromosome separately. Doing so will result in the following structure.
ProjectDir
|-derCoverageInfo
|---Chr1Cov.Rdata
|---Chr2Cov.Rdata
...
|---ChrYCov.Rdata
|---fullCov.Rdata
|-derAnalysis
|---analysis01
|---analysis02
Next, you can use analyzeChr()
for each of the chromosomes of a specific analysis (say analysis01). Doing so will create several Rdata files per chromosome as shown below. bash
scripts can be useful if you wish to submit one cluster job per chromosome. In general, you will use the same model and group information for each chromosome, so saving the information can be useful.
ProjectDir
|-derCoverageInfo
|---Chr1Cov.Rdata
|---Chr2Cov.Rdata
...
|---ChrYCov.Rdata
|---fullCov.Rdata
|-derAnalysis
|---analysis01
|-----models.Rdata
|-----groupInfo.Rdata
|-----chr1/
|-------chunksDir/
|-------logs/
|-------annotation.Rdata
|-------coveragePrep.Rdata
|-------fstats.Rdata
|-------optionsStats.Rdata
|-------regions.Rdata
|-------timeinfo.Rdata
|-----chr2/
...
|-----chrY/
|---analysis02
Then use mergeResults()
to pool together the results from all the chromosomes for a given analysis (here analysis01).
ProjectDir
|-derCoverageInfo
|---Chr1Cov.Rdata
|---Chr2Cov.Rdata
...
|---ChrYCov.Rdata
|---fullCov.Rdata
|-derAnalysis
|---analysis01
|-----logs/
|-----fullAnnotatedRegions.Rdata
|-----fullFstats.Rdata
|-----fullNullSummary.Rdata
|-----fullRegions.Rdata
|-----fullTime.Rdata
|-----optionsMerge.Rdata
|-----chr1/
|-------chunksDir/
|-------logs/
|-------annotation.Rdata
|-------coveragePrep.Rdata
|-------fstats.Rdata
|-------optionsStats.Rdata
|-------regions.Rdata
|-------timeinfo.Rdata
|-----chr2/
...
|-----chrY/
|---analysis02
Finally, you might want to use derfinderReport()
from regionReport
to create a HTML report of the results.
For interacting between bash
and R
we have found quite useful the getopt package. Here we include an example R
script that is controlled by a bash
script which submits a job for each chromosome to analyze for a given analysis.
The two files, derfinderAnalysis.R and derAnalysis.sh should live under the derAnalysis directory.
ProjectDir
|-derCoverageInfo
|---Chr1Cov.Rdata
|---Chr2Cov.Rdata
...
|---ChrYCov.Rdata
|---fullCov.Rdata
|-derAnalysis
|---derfinder-Analysis.R
|---derAnalysis.sh
|---analysis01
|---analysis02
Then, you can simply use:
cd /ProjectDir/derAnalysis
sh derAnalysis.sh analysis01
to run analyzeChr()
on all your chromosomes.
## Run derfinder's analysis steps with timing info ## Load libraries library("getopt") ## Available at https://github.com/lcolladotor/derfinder library("derfinder") ## Specify parameters spec <- matrix(c( 'DFfile', 'd', 1, "character", "path to the .Rdata file with the results from loadCoverage()", 'chr', 'c', 1, "character", "Chromosome under analysis. Use X instead of chrX.", 'mcores', 'm', 1, "integer", "Number of cores", 'verbose' , 'v', 2, "logical", "Print status updates", 'help' , 'h', 0, "logical", "Display help" ), byrow=TRUE, ncol=5) opt <- getopt(spec) ## Testing the script test <- FALSE if(test) { ## Speficy it using an interactive R session and testing test <- TRUE } ## Test values if(test){ opt <- NULL opt$DFfile <- "/ProjectDir/derCoverageInfo/chr21Cov.Rdata" opt$chr <- "21" opt$mcores <- 1 opt$verbose <- NULL } ## if help was asked for print a friendly message ## and exit with a non-zero error code if (!is.null(opt$help)) { cat(getopt(spec, usage=TRUE)) q(status=1) } ## Default value for verbose = TRUE if (is.null(opt$verbose)) opt$verbose <- TRUE if(opt$verbose) message("Loading Rdata file with the output from loadCoverage()") load(opt$DFfile) ## Make it easy to use the name later. Here I'm assuming the names were generated using output='auto' in loadCoverage() eval(parse(text=paste0("data <- ", "chr", opt$chr, "CovInfo"))) eval(parse(text=paste0("rm(chr", opt$chr, "CovInfo)"))) ## Just for testing purposes if(test) { tmp <- data tmp$coverage <- tmp$coverage[1:1e6, ] library("IRanges") tmp$position[which(tmp$pos)[1e6 + 1]:length(tmp$pos)] <- FALSE data <- tmp } ## Load the models load("models.Rdata") ## Load group information load("groupInfo.Rdata") ## Run the analysis with lowMemDir analyzeChr(chr=opt$chr, coverageInfo=data, models=models, cutoffFstat=1e-06, cutoffType="theoretical", nPermute=1000, seeds=seq_len(1000), maxClusterGap=3000, groupInfo=groupInfo, subject="hg19", mc.cores=opt$mcores, lowMemDir=file.path(tempdir(), paste0("chr", opt$chr) , "chunksDir")), verbose=opt$verbose, chunksize=1e5) ## Done if(opt$verbose) { print(proc.time()) print(sessionInfo(), locale=FALSE) }
Remember to modify the the script to fit your project.
#!/bin/sh
## Usage
# sh derAnalysis.sh analysis01
# Directories
MAINDIR=/ProjectDir
WDIR=${MAINDIR}/derAnalysis
DATADIR=${MAINDIR}/derCoverageInfo
# Define variables
SHORT='derA-01'
PREFIX=$1
# Construct shell files
for chrnum in 22 21 Y 20 19 18 17 16 15 14 13 12 11 10 9 8 X 7 6 5 4 3 2 1
do
echo "Creating script for chromosome ${chrnum}"
if [[ ${chrnum} == "Y" ]]
then
CORES=2
else
CORES=6
fi
chr="chr${chrnum}"
outdir="${PREFIX}/${chr}"
sname="${SHORT}.${PREFIX}.${chr}"
cat > ${WDIR}/.${sname}.sh <<EOF
#!/bin/bash
#$ -cwd
#$ -m e
#$ -l mem_free=120G,h_vmem=60G,h_fsize=10G
#$ -N ${sname}
#$ -pe local ${CORES}
echo "**** Job starts ****"
date
# Create output directory
mkdir -p ${WDIR}/${outdir}
# Make logs directory
mkdir -p ${WDIR}/${outdir}/logs
# run derfinder-analysis.R
cd ${WDIR}/${PREFIX}/
# specific to our cluster
# see http://www.jhpce.jhu.edu/knowledge-base/environment-modules/
module load R/3.1.x
Rscript ${WDIR}/derfinder2-analysis.R -d "${DATADIR}/${chr}Cov.Rdata" -c "${chrnum}" -m ${CORES} -v TRUE
# Move log files into the logs directory
mv ${WDIR}/${sname}.* ${WDIR}/${outdir}/logs/
echo "**** Job ends ****"
date
EOF
call="qsub .${sname}.sh"
$call
done
Your cluster might specify memory requirements differently and you might need to use fewer or more cores depending on your data set.
A fully documented example is available at derfinderExample.
TODO: update the example to use derfinder
version 0.99.0.
This vignette covered the most commonly used advanced arguments, details on how to load data, flow charts explaining the relationships between the functions, the recommended output organization, and example shell scripts for running the analysis.
This implementation of derfinder
has its origins in Alyssa C. Frazee's derfinder (Frazee, Sabunciyan, Hansen, Irizarry, et al., 2014)
. The statistical methods and implementation by now are very different.
derfinder
Please use:
## Citation info citation('derfinder')
## ## Collado-Torres L, Frazee AC, Love MI, Irizarry RA, Jaffe AE and ## Leek JT (2015). "derfinder: Software for annotation-agnostic ## RNA-seq differential expression analysis." _bioRxiv_. <URL: ## http://doi.org/10.1101/015370>, <URL: ## http://www.biorxiv.org/content/early/2015/02/19/015370.abstract>. ## ## Frazee AC, Sabunciyan S, Hansen KD, Irizarry RA and Leek JT ## (2014). "Differential expression analysis of RNA-seq data at ## single-base resolution." _Biostatistics_, *15 (3)*, pp. 413-426. ## <URL: http://doi.org/10.1093/biostatistics/kxt053>, <URL: ## http://biostatistics.oxfordjournals.org/content/15/3/413.long>.
This package was made possible thanks to:
(R Core Team, 2015)
(Pages, Carlson, Falcon, and Li, 2015)
(Morgan, Lang, and Thompson, 2015)
and (Jaffe, E, Murakami, Peter, et al., 2012)
(Collado-Torres, Jaffe, and Leek, 2015)
(Arora, Morgan, Carlson, and Pages, 2015)
(Lawrence, Huber, Pagès, Aboyoun, et al., 2013)
(Lawrence, Huber, Pagès, Aboyoun, et al., 2013)
(Obenchain, Love, and Morgan, 2015)
(Lawrence, Huber, Pagès, Aboyoun, et al., 2013)
(Jr, with contributions from
Charles Dupont, and many
others., 2015)
(Lawrence, Huber, Pagès, Aboyoun, et al., 2013)
(Storey, 2015)
(Morgan, Pagès, Obenchain, and Hayden, 2015)
(Lawrence, Gentleman, and Carey, 2009)
(Pages, Lawrence, and Aboyoun, 2015)
(Yin, Lawrence, and Cook, 2015)
(Wickham and Chang, 2015)
(Wickham, 2009)
(Boettiger, 2014)
(Xie, 2014)
(Hester, 2013)
(Allaire, Cheng, Xie, McPherson, et al., 2015)
(Wickham, 2011)
(Carlson, 2015)
Code for creating the vignette
## Create the vignette library('knitrBootstrap') knitrBootstrapFlag <- packageVersion('knitrBootstrap') < '1.0.0' if(knitrBootstrapFlag) { ## CRAN version library('knitrBootstrap') system.time(knit_bootstrap('derfinderAdvanced.Rmd', chooser=c('boot', 'code'), show_code = TRUE)) unlink('derfinder.md') } else { ## GitHub version library('rmarkdown') system.time(render('derfinderAdvanced.Rmd', 'knitrBootstrap::bootstrap_document')) } ## Note: if you prefer the knitr version use: # library('rmarkdown') # system.time(render('derfinderAdvanced.Rmd', 'html_document')) ## Clean up file.remove('derfinderAdvRef.bib') ## Extract the R code library('knitr') knit('derfinderAdvanced.Rmd', tangle = TRUE)
Date the vignette was generated.
## [1] "2015-04-01 21:46:22 PDT"
Wallclock time spent generating the vignette.
## Time difference of 14.174 secs
R
session information.
## setting value ## version R version 3.2.0 alpha (2015-03-20 r68043) ## system x86_64, linux-gnu ## ui X11 ## language en_US: ## collate C ## tz <NA>
## package * version date source ## AnnotationDbi * 1.29.20 2015-04-02 Bioconductor ## Biobase * 2.27.3 2015-04-02 Bioconductor ## BiocGenerics 0.13.11 2015-04-02 Bioconductor ## BiocParallel * 1.1.21 2015-04-02 Bioconductor ## Biostrings * 2.35.12 2015-04-02 Bioconductor ## DBI * 0.3.1 2014-09-24 CRAN (R 3.2.0) ## Formula * 1.2-0 2015-01-20 CRAN (R 3.2.0) ## GenomeInfoDb 1.3.16 2015-04-02 Bioconductor ## GenomicAlignments * 1.3.32 2015-04-02 Bioconductor ## GenomicFeatures * 1.19.36 2015-04-02 Bioconductor ## GenomicFiles * 1.3.15 2015-04-02 Bioconductor ## GenomicRanges 1.19.50 2015-04-02 Bioconductor ## Hmisc * 3.15-0 2015-02-16 CRAN (R 3.2.0) ## IRanges 2.1.43 2015-04-02 Bioconductor ## MASS * 7.3-40 2015-03-21 CRAN (R 3.2.0) ## Matrix * 1.1-5 2015-01-18 CRAN (R 3.2.0) ## RColorBrewer * 1.1-2 2014-12-07 CRAN (R 3.2.0) ## RCurl * 1.95-4.5 2014-12-28 CRAN (R 3.2.0) ## RJSONIO * 1.3-0 2014-07-28 CRAN (R 3.2.0) ## RSQLite * 1.0.0 2014-10-25 CRAN (R 3.2.0) ## Rcpp * 0.11.5 2015-03-06 CRAN (R 3.2.0) ## RefManageR * 0.8.45 2015-01-09 CRAN (R 3.2.0) ## Rsamtools * 1.19.49 2015-04-02 Bioconductor ## S4Vectors 0.5.22 2015-04-02 Bioconductor ## XML * 3.98-1.1 2013-06-20 CRAN (R 3.2.0) ## XVector * 0.7.4 2015-04-02 Bioconductor ## acepack * 1.3-3.3 2014-11-24 CRAN (R 3.2.0) ## bibtex * 0.4.0 2014-12-31 CRAN (R 3.2.0) ## biomaRt * 2.23.5 2015-04-02 Bioconductor ## bitops * 1.0-6 2013-08-17 CRAN (R 3.2.0) ## bumphunter * 1.7.6 2015-04-02 Bioconductor ## cluster * 2.0.1 2015-01-31 CRAN (R 3.2.0) ## codetools * 0.2-11 2015-03-10 CRAN (R 3.2.0) ## colorspace * 1.2-6 2015-03-11 CRAN (R 3.2.0) ## derfinder 1.1.18 2015-04-02 Bioconductor ## derfinderHelper * 1.1.6 2015-04-02 Bioconductor ## devtools 1.7.0 2015-01-17 CRAN (R 3.2.0) ## digest * 0.6.8 2014-12-31 CRAN (R 3.2.0) ## doRNG * 1.6 2014-03-07 CRAN (R 3.2.0) ## evaluate * 0.5.5 2014-04-29 CRAN (R 3.2.0) ## foreach * 1.4.2 2014-04-11 CRAN (R 3.2.0) ## foreign * 0.8-63 2015-02-20 CRAN (R 3.2.0) ## formatR * 1.1 2015-03-29 CRAN (R 3.2.0) ## futile.logger * 1.4 2015-03-21 CRAN (R 3.2.0) ## futile.options * 1.0.0 2010-04-06 CRAN (R 3.2.0) ## ggplot2 * 1.0.1 2015-03-17 CRAN (R 3.2.0) ## gtable * 0.1.2 2012-12-05 CRAN (R 3.2.0) ## highr * 0.4.1 2015-03-29 CRAN (R 3.2.0) ## httr * 0.6.1 2015-01-01 CRAN (R 3.2.0) ## iterators * 1.0.7 2014-04-11 CRAN (R 3.2.0) ## knitcitations 1.0.5 2014-11-26 CRAN (R 3.2.0) ## knitr * 1.9 2015-01-20 CRAN (R 3.2.0) ## knitrBootstrap 0.9.0 2013-10-17 CRAN (R 3.2.0) ## lambda.r * 1.1.7 2015-03-20 CRAN (R 3.2.0) ## lattice * 0.20-31 2015-03-30 CRAN (R 3.2.0) ## latticeExtra * 0.6-26 2013-08-15 CRAN (R 3.2.0) ## locfit * 1.5-9.1 2013-04-20 CRAN (R 3.2.0) ## lubridate * 1.3.3 2013-12-31 CRAN (R 3.2.0) ## markdown * 0.7.4 2014-08-24 CRAN (R 3.2.0) ## matrixStats * 0.14.0 2015-02-14 CRAN (R 3.2.0) ## memoise * 0.2.1 2014-04-22 CRAN (R 3.2.0) ## munsell * 0.4.2 2013-07-11 CRAN (R 3.2.0) ## nnet * 7.3-9 2015-02-11 CRAN (R 3.2.0) ## pkgmaker * 0.22 2014-05-14 CRAN (R 3.2.0) ## plyr * 1.8.1 2014-02-26 CRAN (R 3.2.0) ## proto * 0.3-10 2012-12-22 CRAN (R 3.2.0) ## qvalue * 1.99.0 2015-04-02 Bioconductor ## registry * 0.2 2012-01-24 CRAN (R 3.2.0) ## reshape2 * 1.4.1 2014-12-06 CRAN (R 3.2.0) ## rngtools * 1.2.4 2014-03-06 CRAN (R 3.2.0) ## rpart * 4.1-9 2015-02-24 CRAN (R 3.2.0) ## rstudioapi * 0.2 2014-12-31 CRAN (R 3.2.0) ## rtracklayer * 1.27.11 2015-04-02 Bioconductor ## scales * 0.2.4 2014-04-22 CRAN (R 3.2.0) ## stringr * 0.6.2 2012-12-06 CRAN (R 3.2.0) ## survival * 2.38-1 2015-02-24 CRAN (R 3.2.0) ## xtable * 1.7-4 2014-09-12 CRAN (R 3.2.0) ## zlibbioc * 1.13.3 2015-04-02 Bioconductor
This vignette was generated using knitrBootstrap
(Hester, 2013)
with knitr
(Xie, 2014)
and rmarkdown
(Allaire, Cheng, Xie, McPherson, et al., 2015)
running behind the scenes.
Citations made with knitcitations
(Boettiger, 2014)
.
[1] J. Allaire, J. Cheng, Y. Xie, J. McPherson, et al. rmarkdown: Dynamic Documents for R. R package version 0.5.1. 2015. URL: http://CRAN.R-project.org/package=rmarkdown.
[2] S. Arora, M. Morgan, M. Carlson and H. Pages. GenomeInfoDb: Utilities for manipulating chromosome and other 'seqname' identifiers. R package version 1.3.16. 2015.
[3] C. Boettiger. knitcitations: Citations for knitr markdown files. R package version 1.0.5. 2014. URL: http://CRAN.R-project.org/package=knitcitations.
[4] M. Carlson. TxDb.Hsapiens.UCSC.hg19.knownGene: Annotation package for TxDb object(s). R package version 3.1.2. 2015.
[5] L. Collado-Torres, A. C. Frazee, M. I. Love, R. A. Irizarry, et al. “derfinder: Software for annotation-agnostic RNA-seq differential expression analysis”. In: bioRxiv (2015). DOI: 10.1101/015370. URL: http://www.biorxiv.org/content/early/2015/02/19/015370.abstract.
[6] L. Collado-Torres, A. E. Jaffe and J. T. Leek. derfinderHelper: derfinder helper package. https://github.com/lcolladotor/derfinderHelper - R package version 1.1.6. 2015. URL: http://www.bioconductor.org/packages/release/bioc/html/derfinderHelper.html.
[7] A. C. Frazee, S. Sabunciyan, K. D. Hansen, R. A. Irizarry, et al. “Differential expression analysis of RNA-seq data at single-base resolution”. In: Biostatistics 15 (3) (2014), pp. 413-426. DOI: 10.1093/biostatistics/kxt053. URL: http://biostatistics.oxfordjournals.org/content/15/3/413.long.
[8] J. Hester. knitrBootstrap: Knitr Bootstrap framework. R package version 0.9.0. 2013. URL: http://CRAN.R-project.org/package=knitrBootstrap.
[9] Jaffe, A. E, Murakami, Peter, et al. “Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies”. In: International Journal of Epidemiology (2012).
[10] F. E. H. Jr, with contributions from Charles Dupont and many others. Hmisc: Harrell Miscellaneous. R package version 3.15-0. 2015. URL: http://CRAN.R-project.org/package=Hmisc.
[11] M. Lawrence, R. Gentleman and V. Carey. “rtracklayer: an R package for interfacing with genome browsers”. In: Bioinformatics 25 (2009), pp. 1841-1842. DOI: 10.1093/bioinformatics/btp328. URL: http://bioinformatics.oxfordjournals.org/content/25/14/1841.abstract.
[12] M. Lawrence, W. Huber, H. Pagès, P. Aboyoun, et al. “Software for Computing and Annotating Genomic Ranges”. In: PLoS Computational Biology 9 (8 2013). DOI: 10.1371/journal.pcbi.1003118. URL: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118}.
[13] M. Morgan, M. Lang and R. Thompson. BiocParallel: Bioconductor facilities for parallel evaluation. R package version 1.1.21. 2015.
[14] M. Morgan, H. Pagès, V. Obenchain and N. Hayden. Rsamtools: Binary alignment (BAM), FASTA, variant call (BCF), and tabix file import. R package version 1.19.49. 2015. URL: http://bioconductor.org/packages/release/bioc/html/Rsamtools.html.
[15] V. Obenchain, M. Love and M. Morgan. GenomicFiles: Distributed computing by file or by range. R package version 1.3.15. 2015.
[16] H. Pages, M. Carlson, S. Falcon and N. Li. AnnotationDbi: Annotation Database Interface. R package version 1.29.20. 2015.
[17] H. Pages, M. Lawrence and P. Aboyoun. S4Vectors: S4 implementation of vectors and lists. R package version 0.5.22. 2015.
[18] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2015. URL: http://www.R-project.org/.
[19] J. Storey. qvalue: Q-value estimation for false discovery rate control. R package version 1.99.0. 2015. URL: http://github.com/jdstorey/qvalue, http://qvalue.princeton.edu/.
[20] H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York, 2009. ISBN: 978-0-387-98140-6. URL: http://had.co.nz/ggplot2/book.
[21] H. Wickham. “testthat: Get Started with Testing”. In: The R Journal 3 (2011), pp. 5–10. URL: http://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.
[22] H. Wickham and W. Chang. devtools: Tools to Make Developing R Packages Easier. R package version 1.7.0. 2015. URL: http://CRAN.R-project.org/package=devtools.
[23] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014. URL: http://www.crcpress.com/product/isbn/9781466561595.
[24] T. Yin, M. Lawrence and D. Cook. biovizBase: Basic graphic utilities for visualization of genomic data. R package version 1.15.3. 2015.