sc_windows_pipeline_setup {FLAMES}    R Documentation

Windows Single Cell FLAMES Pipeline

Description

An implementation of the FLAMES pipeline designed to run on Windows, or on any OS without access to minimap2 for read alignment and realignment. This pipeline requires read alignment to be performed externally, in between pipeline calls.

Usage

sc_windows_pipeline_setup(
  annot,
  fastq,
  in_bam = NULL,
  outdir,
  genome_fa,
  downsample_ratio = 1,
  config_file,
  match_barcode = TRUE,
  reference_csv = NULL,
  MAX_DIST = 0,
  UMI_LEN = 0
)

Arguments

annot

gene annotations file in gff3 format

fastq

file path to input fastq file

in_bam

optional bam file to replace fastq input files

outdir

directory to store all output files.

genome_fa

genome fasta file.

downsample_ratio

downsampling ratio if performing downsampling analysis.

config_file

JSON configuration file. If specified, config_file overrides all configuration parameters

match_barcode

boolean; whether barcode matching should take place before the pipeline begins

reference_csv

reference csv for barcode matching

MAX_DIST

maximum edit distance allowed during barcode matching

UMI_LEN

length of the UMI to find
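
The arguments above can be sanity-checked before the setup call. The following is a minimal sketch only: check_setup_args is a hypothetical helper, not part of FLAMES, and two of its rules are assumptions rather than documented behaviour (that downsample_ratio is a fraction in (0, 1], and that match_barcode = TRUE needs reference_csv to be supplied).

```r
# Hypothetical pre-flight check (NOT a FLAMES function): collects problems
# with the arguments intended for sc_windows_pipeline_setup.
check_setup_args <- function(annot, fastq, genome_fa, config_file,
                             downsample_ratio = 1, match_barcode = TRUE,
                             reference_csv = NULL) {
  problems <- character(0)

  # Every input path should exist (fastq may be a file or a directory).
  paths <- c(annot = annot, fastq = fastq, genome_fa = genome_fa,
             config_file = config_file)
  for (name in names(paths)) {
    if (!file.exists(paths[[name]])) {
      problems <- c(problems, sprintf("%s not found: %s", name, paths[[name]]))
    }
  }

  # Assumption: a downsampling ratio is a fraction in (0, 1].
  if (!is.numeric(downsample_ratio) ||
      downsample_ratio <= 0 || downsample_ratio > 1) {
    problems <- c(problems, "downsample_ratio should be in (0, 1]")
  }

  # Assumption: barcode matching cannot run without the reference csv.
  if (isTRUE(match_barcode) && is.null(reference_csv)) {
    problems <- c(problems, "match_barcode = TRUE but reference_csv is NULL")
  }

  problems
}

# Usage with a placeholder file standing in for every input.
placeholder <- tempfile()
file.create(placeholder)
check_setup_args(annot = placeholder, fastq = placeholder,
                 genome_fa = placeholder, config_file = placeholder,
                 match_barcode = FALSE)
```

An empty character vector means no problems were found; otherwise each element describes one issue to fix before calling sc_windows_pipeline_setup.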

Details

sc_windows_pipeline_setup is the first of the three steps in the Windows FLAMES single cell pipeline. The steps should be run in this order: sc_windows_pipeline_setup, then external read alignment, then windows_pipeline_isoforms, then external read realignment, and finally windows_pipeline_quantification. sc_windows_pipeline_setup returns a list, pipeline_variables, which contains the information required to continue the pipeline. Each later function takes pipeline_variables as input; where a function returns an updated list, that list should replace the previous one. See the vignette 'Vignette for FLAMES bulk on Windows' for more details.

Value

a list pipeline_variables with the required variables for execution of later Windows pipeline steps. File paths required to perform minimap2 alignment are given in pipeline_variables$return_files. This list should be given as input for windows_pipeline_isoforms after minimap2 alignment has taken place; windows_pipeline_isoforms is the continuation of this pipeline.
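
The external alignment step could be prepared along the following lines. This is a sketch under stated assumptions: build_alignment_cmds is a hypothetical helper, not part of FLAMES, and the file names are placeholders. Because the layout of pipeline_variables$return_files is not described here, the helper takes explicit paths. The minimap2 options shown (the splice preset, no secondary alignments) are a common choice for spliced long reads and are not necessarily the exact options FLAMES expects.

```r
# Hypothetical helper (NOT a FLAMES function): builds the shell commands for
# running minimap2 and samtools on a machine where both tools are available.
build_alignment_cmds <- function(fastq, genome_fa, out_bam, threads = 4) {
  stopifnot(is.character(fastq), is.character(genome_fa),
            is.character(out_bam))

  # Quote paths for a Bourne-style shell, since the commands are meant to be
  # run outside Windows.
  q <- function(path) shQuote(path, type = "sh")

  # Spliced alignment piped into a coordinate-sorted BAM file.
  align <- sprintf(
    "minimap2 -ax splice -t %d --secondary=no %s %s | samtools sort -o %s -",
    as.integer(threads), q(genome_fa), q(fastq), q(out_bam)
  )
  # Index the sorted BAM file.
  index <- sprintf("samtools index %s", q(out_bam))

  c(align = align, index = index)
}

# Placeholder paths; substitute the files reported by the setup step.
cmds <- build_alignment_cmds("merged.fastq.gz", "genome.fa",
                             "align2genome.bam")
cat(cmds, sep = "\n")
```

Once the resulting BAM file and its index have been copied back, the BAM path is stored in pipeline_variables$genome_bam before windows_pipeline_isoforms is called, as in the Examples section.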

Examples

## example windows pipeline for BULK data. See Vignette for single cell data.

# download the two fastq files, move them to a folder to be merged together
temp_path <- tempfile()
bfc <- BiocFileCache::BiocFileCache(temp_path, ask=FALSE)
file_url <- 
    "https://raw.githubusercontent.com/OliverVoogd/FLAMESData/master/data"
# download the required fastq files, and move them to new folder
fastq1 <- bfc[[names(BiocFileCache::bfcadd(bfc, "Fastq1", paste(file_url, "fastq/sample1.fastq.gz", sep="/")))]]
fastq2 <- bfc[[names(BiocFileCache::bfcadd(bfc, "Fastq2", paste(file_url, "fastq/sample2.fastq.gz", sep="/")))]]
fastq_dir <- paste(temp_path, "fastq_dir", sep="/") # the downloaded fastq files need to be in a directory to be merged together
dir.create(fastq_dir)
file.copy(c(fastq1, fastq2), fastq_dir)
unlink(c(fastq1, fastq2)) # the original files can be deleted

# run the FLAMES bulk pipeline setup
#pipeline_variables <- bulk_windows_pipeline_setup(annot=system.file("extdata/SIRV_anno.gtf", package="FLAMES"), 
#                   fastq=fastq_dir,
#                   outdir=tempdir(), genome_fa=system.file("extdata/SIRV_genomefa.fasta", package="FLAMES"),
#                   config_file=system.file("extdata/SIRV_config_default.json", package="FLAMES"))
# read alignment is handled externally (below downloads aligned bam for example)
# genome_bam <- paste0(temp_path, "/align2genome.bam")
# file.rename(bfc[[names(BiocFileCache::bfcadd(bfc, "Genome BAM", paste(file_url, "align2genome.bam", sep="/")))]], genome_bam)
# 
# genome_index <- paste0(temp_path, "/align2genome.bam.bai")
# file.rename(bfc[[names(BiocFileCache::bfcadd(bfc, "Genome BAM Index", paste(file_url, "align2genome.bam.bai", sep="/")))]], genome_index)
# pipeline_variables$genome_bam <- genome_bam
# 
# # run the FLAMES bulk pipeline find isoforms step
# pipeline_variables <- windows_pipeline_isoforms(pipeline_variables)
# 
# # read realignment is handled externally
# realign_bam <- paste0(temp_path, "/realign2genome.bam")
# file.rename(bfc[[names(BiocFileCache::bfcadd(bfc, "Realign BAM", paste(file_url, "realign2transcript.bam", sep="/")))]], realign_bam)
# 
# realign_index <- paste0(temp_path, "/realign2genome.bam.bai")
# file.rename(bfc[[names(BiocFileCache::bfcadd(bfc, "Realign BAM Index", paste(file_url, "realign2transcript.bam.bai", sep="/")))]], realign_index)
# pipeline_variables$realign_bam <- realign_bam
# 
# # finally, quantification, which returns a Summarized Experiment object
# se <- windows_pipeline_quantification(pipeline_variables)

[Package FLAMES version 1.0.2 Index]