importIsoformExpression {IsoformSwitchAnalyzeR}R Documentation

Import expression data from Kallisto, Salmon or RSEM into R.

Description

A general-purpose import function which imports isoform expression data from Kallisto, Salmon or RSEM into R. This is a wrapper for the tximport package with some extra functionalities and is meant to be used to import the data and afterwards a switchAnalyzeRlist can be created with importRdata. It is highly reccomended that both the imported TxPM and counts values are used both in the creation of the switchAnalyzeRlist with importRdata (through the "isoformCountMatrix" and "isoformRepExpression" arguments). Importantly this import function also enables inter-library normalization (via edgeR) of the abundance estimates. Note that the pattern argument allows import of only a subset of files.

Usage

importIsoformExpression(
    parentDir,
    calculateCountsFromAbundance=TRUE,
    interLibNormTxPM=TRUE,
    normalizationMethod='TMM',
    pattern='',
    invertPattern=FALSE,
    ignore.case=FALSE,
    showProgress = TRUE,
    quiet = FALSE
)

Arguments

parentDir

Parrent directory where each quantified sample is in a sub-directory.

calculateCountsFromAbundance

A logic indicating whether to generate estimated counts using the estimated abundances. Recomended as it will incooperate the bias correction algorithms into the analysis. Default is TRUE.

interLibNormTxPM

A logic indicating whether to apply an inter-library normalization (via edgeR) to the imported abundances. Recomended as it allow comparison of abundances between samples. Default is TRUE.

normalizationMethod

A string indicating the method used for the inter-library normalization. Must be one of "TMM", "RLE", "upperquartile". See ?edgeR::calcNormFactors for more details. Default is "TMM".

pattern

Character string containing a regular expression for which files to import (applied to full path). Defailt is "" corresponding to all. See base::grepl for more details.

invertPattern

Logical. If TRUE return indices or values for elements that do not match..

ignore.case

if FALSE, the pattern matching is case sensitive and if TRUE, case is ignored during matching.

showProgress

A logic indicating whether to make a progress bar (if TRUE) or not (if FALSE). Default is FALSE.

quiet

A logic indicating whether to avoid printing progress messages (incl. progress bar). Default is FALSE

Details

This function requires all data that should be imported is in a directory (as indicated by parentDir) where each quantified sample is in a seperate sub-directory.

For Kallisto the bias estimation is enabled by adding '–bias' to the function call. For Salmon the bias estimation is enabled by adding '–seqBias' and '–gcBias' to the function call. For RSEM the bias estimation is enabled by adding '–estimate-rspd' to the function call.

Inter library normalization is (almost always) nessesary due to small changes in the RNA composition between cells and is highly recommended for all analysis of RNAseq data. For more information please refere to the edgeR user guide.

The inter-library normalization of TxPM values is performed as a three step process: First the effective counts are calculated from the abundances using the library specific effective isoform lengths. The effective counts are then suppled to edgeR which calculates the normalization factors nesseary. Lastly the calculated normalization factors are applied to the imported TxPM values.

This function expexts the files produced by Kallisto/Salmon/RSEM to be called their default names (with possible costum prefix): Kallisto files are called 'abundance.tsv', Salmon files are called 'quant.sf' and RSEM files are called 'isoforms.results'.

Value

A list containing an abundance matrix, a count matrix and a matrix with the effective lengths for each isoform quantified (rows) in each sample (col) where the first column contains the isoform_ids. The options used for import are stored under the "importOptions" entry). The abundance estimates are in the unit of Transcripts Per Million (TPM) and measrung the relative abundance of a specic transcript.

Transcripts Per Million values are abbreviated to TPM by RSEM, Kallisto and Salmon but will here refered to as TxPM to avoid confusion with the commenly used Tags Per Million (which have been around for way longer). TxPM is an equivilent to RPKM/FPKM except it has been adjusted for as all the biases being modeled by the tools used for the quantifictation including the fragment length distribution and sequence-specific bias as well as GC-fragment bias (this is specific to each tool and how it was run so you need to look up the specific tool). The TxPM is optimal for expression comparison of abundances since most biases will be taking into account.

Author(s)

Kristoffer Vitting-Seerup

References

Vitting-Seerup et al. The Landscape of Isoform Switches in Human Cancers. Mol. Cancer Res. (2017). Soneson et al. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Research 4, 1521 (2015). Robinson et al. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology (2010)

See Also

importRdata
createSwitchAnalyzeRlist
preFilter


[Package IsoformSwitchAnalyzeR version 1.2.0 Index]