systemPipeR 1.20.0
Note: the most recent version of this tutorial can be found here.
Note: if you use systemPipeR in published research, please cite:
Backman, T.W.H. and Girke, T. (2016). systemPipeR: NGS Workflow and Report Generation Environment. BMC Bioinformatics, 17: 388. 10.1186/s12859-016-1241-0.
The intended way of running systemPipeR workflows is via *.Rmd files, which
can be executed either line-wise in interactive mode or with a single command from
R or the command-line. This way, comprehensive and reproducible analysis reports
can be generated in PDF or HTML format in a fully automated manner by making use
of the reporting utilities available for R.
The following shows how to execute a workflow (e.g., systemPipeRNAseq.Rmd)
from the command-line.
Rscript -e "rmarkdown::render('systemPipeRNAseq.Rmd')"
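The same rendering step can also be started from within an R session. The following is a minimal sketch, not part of systemPipeR: render_workflow is a hypothetical helper name, and it assumes the rmarkdown package is installed and that the .Rmd file sits in the current working directory. The output_format argument of rmarkdown::render() overrides whatever format the file's own header specifies.

```r
# Hypothetical helper (an assumption for illustration, not a systemPipeR function):
# render a workflow report only if the file and the rmarkdown package are available.
render_workflow <- function(rmd = "systemPipeRNAseq.Rmd",
                            format = c("html_document", "pdf_document")) {
  format <- match.arg(format)
  if (!file.exists(rmd)) {
    message("Workflow file not found: ", rmd)
    return(invisible(NULL))
  }
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    message("Package 'rmarkdown' is not installed.")
    return(invisible(NULL))
  }
  # render() runs all code chunks and returns the path of the generated report
  rmarkdown::render(rmd, output_format = format)
}

# Usage: produces an HTML report if systemPipeRNAseq.Rmd exists here
render_workflow("systemPipeRNAseq.Rmd", format = "html_document")
```

Checking for the file first avoids a less readable error from the renderer when the command is run from the wrong directory.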
Templates for setting up custom project reports are provided as *.Rmd files by the helper package systemPipeRdata and in the vignettes subdirectory of systemPipeR. HTML versions of these report templates are available here: systemPipeRNAseq, systemPipeRIBOseq, systemPipeChIPseq and systemPipeVARseq. To work with *.Rmd files efficiently, basic knowledge of knitr and LaTeX or R Markdown v2 is required.
The working environment of the sample data loaded in the previous step contains the following pre-configured directory structure. Directory names are indicated in green. Users can change this structure as needed, but need to adjust the code in their workflows accordingly.
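Because this section requires each CWL param file and its input .yml file to sit in the same subdirectory, a quick check can help after the directory structure has been changed. The sketch below uses base R only. The function name check_cwl_yml_pairs is hypothetical, and it assumes that a .cwl file and its .yml file share the same base name, as in the hisat2-se example given in this section. The demonstration runs on a throwaway directory, not on a real workflow.

```r
# Hypothetical helper (assumption for illustration): for every *.cwl file under
# `param_dir`, report whether a *.yml file with the same base name exists
# in the same subdirectory.
check_cwl_yml_pairs <- function(param_dir) {
  cwl <- list.files(param_dir, pattern = "\\.cwl$", recursive = TRUE,
                    full.names = TRUE)
  yml <- sub("\\.cwl$", ".yml", cwl)
  data.frame(cwl = cwl, yml_present = file.exists(yml),
             stringsAsFactors = FALSE)
}

# Demonstration on a temporary directory that mimics one param subdirectory
demo_root <- file.path(tempdir(), "param_demo")
demo_dir <- file.path(demo_root, "hisat2-se")
dir.create(demo_dir, recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, "hisat2-mapping-se.cwl")))
invisible(file.create(file.path(demo_dir, "hisat2-mapping-se.yml")))

print(check_cwl_yml_pairs(demo_root))
```

Rows with yml_present equal to FALSE point to CWL files whose input file was moved or renamed.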
CWL param and input.yml files need to be in the same subdirectory.
The following parameter files are included in each workflow template:
targets.txt: initial one provided by user; downstream targets_*.txt files are generated automatically
*.param/cwl: defines parameters for input/output file operations, e.g.:
hisat2-se/hisat2-mapping-se.cwl
hisat2-se/hisat2-mapping-se.yml
*_run.sh: optional bash scripts
.batchtools.conf.R: defines the type of scheduler for batchtools, pointing to the template file of the cluster, and located in the user’s home directory
*.tmpl: specifies parameters of the scheduler used by a system, e.g. Torque, SGE, Slurm, etc.
This workflow demonstrates how to use various utilities for building and running automated end-to-end analysis workflows for RNA-Seq data.
The full workflow can be found here: HTML, .Rmd, and .R.
Load the RNA-Seq sample workflow into your current working directory.
library(systemPipeRdata)
genWorkenvir(workflow = "rnaseq")
setwd("rnaseq")
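The setup call above follows the same pattern for each workflow template in this document, with only the workflow name changing. As a hedged sketch, the wrapper below validates the name before calling genWorkenvir(). The function setup_workflow is hypothetical and not part of systemPipeRdata. It assumes that genWorkenvir() creates a directory named after the workflow in the current working directory, which is what the setwd() call above relies on.

```r
# Hypothetical wrapper (assumption for illustration) around the genWorkenvir()
# call shown above; only the four workflow names used in this document are accepted.
setup_workflow <- function(workflow = c("rnaseq", "chipseq", "varseq", "riboseq")) {
  workflow <- match.arg(workflow)  # stops with an error on any other name
  if (!requireNamespace("systemPipeRdata", quietly = TRUE)) {
    message("Package 'systemPipeRdata' is not installed; nothing was created.")
    return(invisible(NULL))
  }
  if (dir.exists(workflow)) {
    message("Directory '", workflow, "' already exists; leaving it untouched.")
    return(invisible(workflow))
  }
  systemPipeRdata::genWorkenvir(workflow = workflow)
  invisible(workflow)
}

# Usage: create the environment, then change into it as in the text
wf <- setup_workflow("rnaseq")
# if (!is.null(wf)) setwd(wf)
```

Returning the directory name instead of calling setwd() inside the function keeps the change of working directory an explicit step for the user.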
Next, run the chosen sample workflow systemPipeRNAseq (.Rmd) by executing make -B from the command-line within the rnaseq directory. Alternatively, one can run the code from the provided *.Rmd template file interactively from within R.
The workflow includes the following steps:
HISAT2 (or any other RNA-Seq aligner)
This workflow demonstrates how to use various utilities for building and running automated end-to-end analysis workflows for ChIP-Seq data.
The full workflow can be found here: HTML, .Rmd, and .R.
Load the ChIP-Seq sample workflow into your current working directory.
library(systemPipeRdata)
genWorkenvir(workflow = "chipseq")
setwd("chipseq")
Next, run the chosen sample workflow systemPipeChIPseq (.Rmd) by executing make -B from the command-line within the chipseq directory. Alternatively, one can run the code from the provided *.Rmd template file interactively from within R.
The workflow includes the following steps:
Bowtie2 or rsubread
MACS2, BayesPeak
This workflow demonstrates how to use various utilities for building and running automated end-to-end analysis workflows for VAR-Seq data.
The full workflow can be found here: HTML, .Rmd, and .R.
Load the VAR-Seq sample workflow into your current working directory.
library(systemPipeRdata)
genWorkenvir(workflow = "varseq")
setwd("varseq")
Next, run the chosen sample workflow systemPipeVARseq (.Rmd) by executing make -B from the command-line within the varseq directory. Alternatively, one can run the code from the provided *.Rmd template file interactively from within R.
The workflow includes the following steps:
gsnap, bwa
VariantTools, GATK, BCFtools
VariantTools and VariantAnnotation
VariantAnnotation
This workflow demonstrates how to use various utilities for building and running automated end-to-end analysis workflows for RIBO-Seq data.
The full workflow can be found here: HTML, .Rmd, and .R.
Load the RIBO-Seq sample workflow into your current working directory.
library(systemPipeRdata)
genWorkenvir(workflow = "riboseq")
setwd("riboseq")
Next, run the chosen sample workflow systemPipeRIBOseq (.Rmd) by executing make -B from the command-line within the riboseq directory. Alternatively, one can run the code from the provided *.Rmd template file interactively from within R.
The workflow includes the following steps:
HISAT2 (or any other RNA-Seq aligner)
sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.10-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.10-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] DESeq2_1.26.0 batchtools_0.9.11
## [3] data.table_1.12.6 ape_5.3
## [5] ggplot2_3.2.1 systemPipeR_1.20.0
## [7] ShortRead_1.44.0 GenomicAlignments_1.22.0
## [9] SummarizedExperiment_1.16.0 DelayedArray_0.12.0
## [11] matrixStats_0.55.0 Biobase_2.46.0
## [13] BiocParallel_1.20.0 Rsamtools_2.2.0
## [15] Biostrings_2.54.0 XVector_0.26.0
## [17] GenomicRanges_1.38.0 GenomeInfoDb_1.22.0
## [19] IRanges_2.20.0 S4Vectors_0.24.0
## [21] BiocGenerics_0.32.0 BiocStyle_2.14.0
##
## loaded via a namespace (and not attached):
## [1] colorspace_1.4-1 rjson_0.2.20 hwriter_1.3.2
## [4] htmlTable_1.13.2 base64enc_0.1-3 rstudioapi_0.10
## [7] bit64_0.9-7 AnnotationDbi_1.48.0 codetools_0.2-16
## [10] splines_3.6.1 geneplotter_1.64.0 knitr_1.25
## [13] zeallot_0.1.0 Formula_1.2-3 annotate_1.64.0
## [16] cluster_2.1.0 GO.db_3.10.0 dbplyr_1.4.2
## [19] pheatmap_1.0.12 graph_1.64.0 BiocManager_1.30.9
## [22] compiler_3.6.1 httr_1.4.1 GOstats_2.52.0
## [25] backports_1.1.5 assertthat_0.2.1 Matrix_1.2-17
## [28] lazyeval_0.2.2 limma_3.42.0 formatR_1.7
## [31] acepack_1.4.1 htmltools_0.4.0 prettyunits_1.0.2
## [34] tools_3.6.1 gtable_0.3.0 glue_1.3.1
## [37] GenomeInfoDbData_1.2.2 Category_2.52.0 dplyr_0.8.3
## [40] rappdirs_0.3.1 Rcpp_1.0.2 vctrs_0.2.0
## [43] debugme_1.1.0 nlme_3.1-141 rtracklayer_1.46.0
## [46] xfun_0.10 stringr_1.4.0 XML_3.98-1.20
## [49] edgeR_3.28.0 zlibbioc_1.32.0 scales_1.0.0
## [52] BSgenome_1.54.0 VariantAnnotation_1.32.0 hms_0.5.1
## [55] RBGL_1.62.0 RColorBrewer_1.1-2 yaml_2.2.0
## [58] curl_4.2 gridExtra_2.3 memoise_1.1.0
## [61] rpart_4.1-15 biomaRt_2.42.0 latticeExtra_0.6-28
## [64] stringi_1.4.3 RSQLite_2.1.2 genefilter_1.68.0
## [67] checkmate_1.9.4 GenomicFeatures_1.38.0 rlang_0.4.1
## [70] pkgconfig_2.0.3 bitops_1.0-6 evaluate_0.14
## [73] lattice_0.20-38 purrr_0.3.3 labeling_0.3
## [76] htmlwidgets_1.5.1 bit_1.1-14 tidyselect_0.2.5
## [79] GSEABase_1.48.0 AnnotationForge_1.28.0 magrittr_1.5
## [82] bookdown_0.14 R6_2.4.0 Hmisc_4.2-0
## [85] base64url_1.4 DBI_1.0.0 foreign_0.8-72
## [88] pillar_1.4.2 withr_2.1.2 nnet_7.3-12
## [91] survival_2.44-1.1 RCurl_1.95-4.12 tibble_2.1.3
## [94] crayon_1.3.4 BiocFileCache_1.10.0 rmarkdown_1.16
## [97] progress_1.2.2 locfit_1.5-9.1 grid_3.6.1
## [100] blob_1.2.0 Rgraphviz_2.30.0 digest_0.6.22
## [103] xtable_1.8-4 brew_1.0-6 openssl_1.4.1
## [106] munsell_0.5.0 askpass_1.1
This project is funded by NSF award ABI-1661152.