STAR.align.single {ORFik}    R Documentation

Align a single-end library or a paired-end pair with STAR

Description

If you want to align more than two files, use: STAR.align.folder
If the genome aligner halts at ".... loading genome", it means the STAR index was aborted early. You then need to run STAR.remove.crashed.genome() with the genome that crashed, and rerun.

Usage

STAR.align.single(
  file1,
  file2 = NULL,
  output.dir,
  index.dir,
  star.path = STAR.install(),
  fastp = install.fastp(),
  steps = "tr-ge",
  adapter.sequence = "auto",
  min.length = 15,
  trim.front = 0,
  alignment.type = "Local",
  max.cpus = min(90, detectCores() - 1),
  wait = TRUE,
  resume = NULL,
  script.single = system.file("STAR_Aligner", "RNA_Align_pipeline.sh", package =
    "ORFik")
)

Arguments

file1

library file; if paired-end, this must be the R1 file

file2

default: NULL; for paired-end data, set to the R2 file

output.dir

directory to save the output files. This argument has no default in this function (see Usage) and must be given. The same path is returned by the function.

index.dir

path to STAR index folder. Path returned from ORFik function STAR.index, when you created the index folders.

star.path

path to STAR, default: STAR.install(). If STAR is not installed at the default location, it will be installed there. If you already have STAR, set this to the path of a runnable STAR.

fastp

path to fastp trimmer, default: install.fastp(). If you already have it installed somewhere else, give the path. Only works on unix (Linux or Mac OS); if not on unix, use your favorite trimmer and give the output files from that trimmer as file1/file2 here.

steps

a character, default: "tr-ge" (trimming, then genome alignment).
Steps of depletion and alignment wanted. The possible candidates are: tr: trim reads, ph: phix depletion, rR: rRNA depletion, nc: ncRNA depletion, tR: tRNA depletion, ge: genome alignment, all: run all steps.
If not "all", use a subset of these: "tr-ph-rR-nc-tR-ge".
In the bash script it is reformatted to this style: trimming and genome alignment is "tr-ge"; "all" becomes "tr-ph-rR-nc-tR-ge". The genome alignment step is almost always included, unless you are doing pure contaminant analysis. For Ribo-seq and TCP(RCP-seq) you should do rR (ribosomal RNA depletion), so the rRNA step must have been included when you made the STAR index (usually just download a Silva rRNA database for SSU&LSU at: https://www.arb-silva.de/).
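The mapping from "all" to the full step string described above can be sketched in base R. The helper name expand_steps is hypothetical and not part of ORFik; sorting the chosen steps into pipeline order is an assumption of this sketch, not documented ORFik behaviour.

```r
# Hypothetical helper (not part of ORFik): validate and normalise `steps`.
expand_steps <- function(steps = "tr-ge") {
  valid <- c("tr", "ph", "rR", "nc", "tR", "ge")
  # "all" is shorthand for every step in pipeline order
  if (identical(steps, "all")) return(paste(valid, collapse = "-"))
  chosen <- strsplit(steps, "-", fixed = TRUE)[[1]]
  unknown <- setdiff(chosen, valid)
  if (length(unknown) > 0) {
    stop("Unknown step(s): ", paste(unknown, collapse = ", "))
  }
  # Assumption of this sketch: return the steps in pipeline order
  paste(valid[valid %in% chosen], collapse = "-")
}

expand_steps("all")       # "tr-ph-rR-nc-tR-ge"
expand_steps("tr-rR-ge")  # "tr-rR-ge"
```

Checking the string before a long alignment run catches a misspelled step (for example "rr" instead of "rR") early.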

adapter.sequence

character, default: "auto" (auto-detect the adapter). Auto-detection is not very reliable for Ribo-seq, so for Ribo-seq you must give the adapter, else alignment will most likely fail! Otherwise use a manually assigned adapter like: "ATCTCGTATGCCGTCTTCTGCTTG" or "AAAAAAAAAAAAA".

min.length

default: 15, minimum length of reads to pass filter.

trim.front

default: 0, number of bases to trim from the 5' end. For Ribo-seq use 0. Ignored if tr (trim) is not one of the steps in "steps".

alignment.type

default: "Local": standard local alignment with soft-clipping allowed. Alternative: "EndToEnd" (global): force end-to-end read alignment, does not soft-clip.

max.cpus

integer, default: min(90, detectCores() - 1), number of threads to use. That is, all detected cores except one, capped at 90.
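Note that the default expression evaluates to 0 on a single-core machine, and to NA if parallel::detectCores() cannot determine the core count. A minimal guarded sketch for computing a value to pass as max.cpus follows; the fallback to one thread is an assumption of this sketch, not ORFik behaviour.

```r
library(parallel)

# detectCores() may return NA on some platforms, so fall back to 1 thread.
cores <- detectCores()
max.cpus <- if (is.na(cores)) 1L else max(1L, min(90L, cores - 1L))
max.cpus
```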

wait

a logical (not NA) indicating whether the R interpreter should wait for the command to finish, or run it asynchronously. This will be ignored (and the interpreter will always wait) if intern = TRUE. When running the command asynchronously, no output will be displayed on the Rgui console in Windows (it will be dropped, instead).

resume

default: NULL. Step to continue from. Say steps are "tr-ph-ge" (trim, phix depletion, genome alignment) and resume is "ph": you will use the trimmed data and continue from there, starting at phix. Useful if something crashed.
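Which steps are rerun for a given resume value can be sketched in base R. The helper remaining_steps is hypothetical and not part of ORFik; it only illustrates the rule described above.

```r
# Hypothetical helper (not part of ORFik): steps still to run when resuming.
remaining_steps <- function(steps, resume = NULL) {
  all_steps <- strsplit(steps, "-", fixed = TRUE)[[1]]
  if (is.null(resume)) return(all_steps)  # no resume: run everything
  start <- match(resume, all_steps)
  if (is.na(start)) {
    stop("resume step '", resume, "' is not in steps: ", steps)
  }
  # Keep the resume step and everything after it
  all_steps[start:length(all_steps)]
}

remaining_steps("tr-ph-ge", resume = "ph")  # "ph" "ge"
```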

script.single

location of STAR single file alignment script, default internal ORFik file. You can change it and give your own if you need special alignments.

Details

Can only run on unix systems (Linux and Mac), and requires a minimum of 30GB memory on genomes like human, rat, zebrafish etc.
The trimmer used is fastp (the fastest I could find), which works on Mac and Linux. If you want to use your own trimmer, set file1/file2 to the location of the trimmed files from your program.
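The unix requirement can be checked before starting a run. The sketch below uses only base R; the helper can_run_star is hypothetical and not part of ORFik, and it does not check available memory, since base R has no portable way to do that.

```r
# Hypothetical pre-flight check (not part of ORFik).
can_run_star <- function(files) {
  on_unix <- .Platform$OS.type == "unix"  # TRUE on Linux and macOS
  missing <- files[!file.exists(files)]
  if (!on_unix) {
    warning("STAR alignment through ORFik needs a unix system")
  }
  if (length(missing) > 0) {
    warning("Missing input file(s): ", paste(missing, collapse = ", "))
  }
  on_unix && length(missing) == 0
}

# Usage with a temporary placeholder file standing in for a read file
tmp <- tempfile(fileext = ".fastq")
file.create(tmp)
can_run_star(tmp)
```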

Value

output.dir, can be used as input in ORFik::create.experiment

See Also

Other STAR: STAR.align.folder(), STAR.index(), STAR.install(), STAR.multiQC(), STAR.remove.crashed.genome(), getGenomeAndAnnotation(), install.fastp()

Examples

# Use your own paths for annotation or the ORFik way

## use ORFik way:
output.dir <- "/Bio_data/references/Human"
# arguments <- getGenomeAndAnnotation("Homo sapiens", output.dir)
# index <- STAR.index(arguments, output.dir)
# STAR.align.single("data/raw_data/human_rna_seq/file1.fastq",
#                    output.dir = "data/processed/human_rna_seq",
#                    index.dir = index)
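
For paired-end data, a call could look like the sketch below. The file names are placeholders, and index is assumed to come from STAR.index as in the example above. The call itself is commented out because it needs STAR, fastp and a built index.

```r
## paired-end sketch (placeholder paths):
r1 <- "data/raw_data/human_rna_seq/sample_R1.fastq"
r2 <- "data/raw_data/human_rna_seq/sample_R2.fastq"
# STAR.align.single(file1 = r1, file2 = r2,
#                   output.dir = "data/processed/human_rna_seq",
#                   index.dir = index)
```

Naming output.dir and index.dir avoids a positional mix-up, since file2 is the second argument of the function.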

[Package ORFik version 1.8.6 Index]