STAR.align.folder {ORFik} | R Documentation |
Aligns all files in a folder as either paired end or single end,
so if you have a mix, split them into two different folders.
If STAR halts at .... loading genome, it means the STAR
index was aborted early. In that case you need to run
STAR.remove.crashed.genome() with the genome that crashed, and then rerun.
STAR.align.folder(
  input.dir,
  output.dir,
  index.dir,
  star.path = STAR.install(),
  fastp = install.fastp(),
  paired.end = "no",
  steps = "tr-ge",
  adapter.sequence = "auto",
  min.length = 15,
  trim.front = 0,
  alignment.type = "Local",
  max.cpus = min(90, detectCores() - 1),
  wait = TRUE,
  include.subfolders = "n",
  script.folder = system.file("STAR_Aligner", "RNA_Align_pipeline_folder.sh",
                              package = "ORFik"),
  script.single = system.file("STAR_Aligner", "RNA_Align_pipeline.sh",
                              package = "ORFik")
)
input.dir |
path to the fasta / fastq files to align. Files can be uncompressed (.fastq, .fq, .fa etc) or compressed with .gz, and can be either paired end or single end reads. |
output.dir |
directory to save the output of the run. It has no default in the usage above and must be given. |
index.dir |
path to STAR index folder. Path returned from ORFik function STAR.index, when you created the index folders. |
star.path |
path to STAR, default: STAR.install(). If STAR is not installed at the default location, it will be installed there. If you already have STAR, set this to the path of a runnable STAR. |
fastp |
path to the fastp trimmer, default: install.fastp(). If you already have it installed somewhere else, give that path. Only works on unix (Linux or Mac OS). If you are not on unix, use your favorite trimmer and give the folder with the output files from that trimmer as input.dir here. |
paired.end |
default "no", alternative "yes". Pairs are auto detected by file name. If "yes" when running on a folder: the folder must contain an even number of files, and the two files of each pair must share the same prefix and end with a suffix of either _1 and _2, 1 and 2, etc. |
steps |
a character, default: "tr-ge", trimming then genome alignment |
adapter.sequence |
character, default: "auto" (auto detect the adapter). Auto detection is not very reliable for Ribo-seq, so for Ribo-seq you must give the adapter yourself, else the alignment will most likely fail. A manually assigned adapter looks like: "ATCTCGTATGCCGTCTTCTGCTTG" or "AAAAAAAAAAAAA". |
min.length |
15, minimum length of reads to pass filter. |
trim.front |
0, default: trim 0 bases from the 5' end. For Ribo-seq use 0. Ignored if tr (trim) is not one of the steps given in "steps". |
alignment.type |
default: "Local": standard local alignment with soft-clipping allowed, "EndToEnd" (global): force end-to-end read alignment, does not soft-clip. |
max.cpus |
integer, default: min(90, detectCores() - 1), number of threads to use. The default is the smaller of 90 and the number of detected cores minus 1. |
wait |
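a logical (not NA), default TRUE, indicating whether the R interpreter should wait for the command to finish, or run it asynchronously. |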
include.subfolders |
default "n" (no). If "y", search recursively downwards through subfolders for fasta / fastq files. |
script.folder |
location of the STAR folder alignment script (RNA_Align_pipeline_folder.sh), default internal ORFik file. You can change it and give your own if you need special alignments. |
script.single |
location of STAR single file alignment script, default internal ORFik file. You can change it and give your own if you need special alignments. |
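The naming rule described for paired.end can be checked before starting a run. The following is a minimal base R sketch, not ORFik's own detection logic: check_read_pairs is a hypothetical helper, the file names are made up, and it only recognises the _1 / _2 suffix convention (the "1 and 2, etc." variants are left out for brevity).

```r
# Hypothetical helper, not part of ORFik: split file names into complete
# _1/_2 mate pairs and everything else (single end files or orphan mates).
check_read_pairs <- function(files) {
  # Drop the directory and extensions such as .fastq, .fq, .fa (optionally .gz)
  base <- sub("\\.(fastq|fq|fasta|fa)(\\.gz)?$", "", basename(files))
  is_mate <- grepl("_[12]$", base)
  prefix <- sub("_[12]$", "", base)
  mate <- ifelse(is_mate, sub("^.*_([12])$", "\\1", base), NA_character_)
  # A prefix is a complete pair when it has exactly one _1 and one _2 file
  pair_prefixes <- Filter(function(p) {
    identical(sort(mate[is_mate & prefix == p]), c("1", "2"))
  }, unique(prefix[is_mate]))
  paired <- is_mate & prefix %in% pair_prefixes
  list(paired = files[paired], unpaired = files[!paired])
}

# Made-up file names for illustration
files <- c("sampleA_1.fastq.gz", "sampleA_2.fastq.gz",
           "sampleB.fq", "sampleC_1.fastq")
res <- check_read_pairs(files)
res$paired    # complete pairs, candidates for a paired.end = "yes" folder
res$unpaired  # files to move to a separate folder for a single end run
```

If res$unpaired is not empty, the folder holds a mix, and the files would need to be split into two folders before running with paired.end = "yes".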
Can only run on unix systems (Linux and Mac), and requires a minimum of 30GB memory for genomes like human, rat, zebrafish etc. The trimmer used is fastp (the fastest I could find), which works on Mac and Linux. If you want to use your own trimmer, give the location of the trimmed files from your program as input.dir.
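The platform requirement and the max.cpus default can be inspected before calling the function. This is a small base R sketch under stated assumptions, not something ORFik runs for you: the fallback of 2 cores when detectCores() returns NA and the lower bound of 1 thread are choices made for this illustration. The memory requirement is not checked here.

```r
library(parallel)

# Hypothetical pre-flight check, not part of ORFik.
# The alignment pipeline is described as unix only (Linux and Mac).
is_unix <- .Platform$OS.type == "unix"
if (!is_unix) {
  message("Not a unix system: STAR.align.folder is documented as unix only.")
}

# Same formula as the documented default, min(90, detectCores() - 1),
# with two assumed guards: detectCores() can return NA on some platforms,
# and a single core machine would otherwise give 0 threads.
cores <- detectCores()
if (is.na(cores)) cores <- 2L
threads <- max(1L, min(90L, cores - 1L))
threads
```

The resulting value could then be passed as max.cpus if you want to set the thread count explicitly.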
output.dir, which can be used as input in ORFik::create.experiment
Other STAR: STAR.align.single(), STAR.index(), STAR.install(), STAR.multiQC(), STAR.remove.crashed.genome(), getGenomeAndAnnotation(), install.fastp()
# Use your own paths for annotation or the ORFik way
## use ORFik way:
output.dir <- "/Bio_data/references/Human"
# arguments <- getGenomeAndAnnotation("Homo sapiens", output.dir)
# index <- STAR.index(arguments, output.dir)
# STAR.align.folder("data/raw_data/human_rna_seq", "data/processed/human_rna_seq",
#                   index, paired.end = "no")