generic_long_pipeline {FLAMES} | R Documentation |
Generic implementation of the flames pipeline. Used for both bulk reads and single cell reads.
generic_long_pipeline( annot, fastq, in_bam, outdir, genome_fa, minimap2_dir, downsample_ratio, config_file, do_genome_align, do_isoform_id = TRUE, do_read_realign, do_transcript_quanti, gen_raw_isoform, has_UMI, MAX_DIST, MAX_TS_DIST, MAX_SPLICE_MATCH_DIST, min_fl_exon_len, Max_site_per_splice, Min_sup_cnt, Min_cnt_pct, Min_sup_pct, strand_specific, remove_incomp_reads, use_junctions, no_flank, use_annotation, min_tr_coverage, min_read_coverage )
annot |
The file path to gene annotations file in gff3 format |
fastq |
The file path to input fastq file |
in_bam |
optional BAM file which replaces fastq directory argument. This skips the genome alignment and realignment steps |
outdir |
The path to directory to store all output files. |
genome_fa |
The file path to genome fasta file. |
minimap2_dir |
Path to the directory containing minimap2, if it is not in PATH. Only required if either or both of
|
downsample_ratio |
Integer; downsampling ratio if performing downsampling analysis. |
config_file |
File path to the JSON configuration file. If specified, |
do_genome_align |
Boolean; specifies whether to run the genome alignment step. |
do_isoform_id |
Boolean; specifies whether to run the isoform identification step. |
do_read_realign |
Boolean; specifies whether to run the read realignment step. |
do_transcript_quanti |
Boolean; specifies whether to run the transcript quantification step. |
gen_raw_isoform |
Boolean; specifies whether a gff3 should be generated containing the raw isoform information in the isoform identification step |
has_UMI |
Boolean; specifies if the data contains UMI. |
MAX_DIST |
Real; maximum distance allowed when merging splicing sites in isoform consensus clustering. |
MAX_TS_DIST |
Real; maximum distance allowed when merging transcript start/end position in isoform consensus clustering. |
MAX_SPLICE_MATCH_DIST |
Real; maximum distance allowed when merging splice site called from the data and the reference annotation. |
min_fl_exon_len |
Real; minimum length for the first exon outside the gene body in reference annotation. This is to correct the alignment artifact |
Max_site_per_splice |
Real; maximum transcript start/end site combinations allowed per splice chain |
Min_sup_cnt |
Real; minimum number of read support an isoform. Decreasing this number will significantly increase the number of isoform detected. |
Min_cnt_pct |
Real; minimum percentage of count for an isoform relative to total count for the same gene. |
Min_sup_pct |
Real; minimum percentage of count for an splice chain that support a given transcript start/end site combination. |
strand_specific |
1, -1 or 0. 1 indicates if reads are in the same strand as mRNA, -1 indicates reads are reverse complemented, 0 indicates reads are not strand specific. |
remove_incomp_reads |
Real; determines the strength of truncated isoform filtering. Larger number means more stringent filtering. |
use_junctions |
Boolean; determiens whether to use known splice junctions to help correct the alignment results |
no_flank |
Boolean; passed to minimap2 for synthetic spike-in data. Refer to Minimap2 document for more details |
use_annotation |
Boolean; specifies whether to use reference to help annotate known isoforms |
min_tr_coverage |
Real; minimum percentage of isoform coverage for a read to be aligned to that isoform |
min_read_coverage |
Real; minimum percentage of read coverage for a read to be uniquely aligned to that isoform |
This generic function returns a named list containing the output file names of the provided output files in the given 'outdir' directory. These files are loaded into R in either a SummarizedExperiment or SingleCellExperiment object by the callers to this function, 'sc_long_pipeline()' and 'bulk_long_pipeline()' respectively.