extract_UTR3Anno {InPAS}    R Documentation

Extract 3' UTR information from a GenomicFeatures::TxDb object

Description

Extract 3' UTR information from a GenomicFeatures::TxDb object. The 3' UTR is defined as the last 3' UTR fragment of each transcript, and it is cut if it overlaps any other exon.

Usage

extract_UTR3Anno(
  TxDb = NULL,
  edb = NULL,
  removeScaffolds = FALSE,
  MAX_EXONS_GAP = 10000
)

Arguments

TxDb

An object of GenomicFeatures::TxDb

edb

An object of ensembldb::EnsDb

removeScaffolds

A logical(1) vector, whether the scaffolds should be removed from the genome. If you use a TxDb containing alternative scaffolds, it is better to remove them.

MAX_EXONS_GAP

An integer(1) vector, the maximal gap size between the last known CP site and the downstream exon
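
To illustrate what removing scaffolds means in practice, the sketch below filters a toy GRanges object down to the standard chromosomes with GenomeInfoDb::keepStandardChromosomes. The ranges are invented for illustration, and this is only one way to drop scaffolds; it is not necessarily how extract_UTR3Anno implements removeScaffolds internally.

```r
library(GenomicRanges)
library(GenomeInfoDb)

# Toy ranges (illustrative only): one primary chromosome, one alternative
# haplotype, and one unplaced scaffold, using UCSC hg19-style names.
gr <- GRanges(
  seqnames = c("chr1", "chr6_apd_hap1", "chrUn_gl000211"),
  ranges = IRanges(start = c(100, 200, 300), width = 50),
  strand = c("+", "-", "+")
)

# Keep only the standard chromosomes; "coarse" pruning also drops the
# ranges that sit on the removed sequences.
gr_std <- keepStandardChromosomes(gr,
                                  species = "Homo_sapiens",
                                  pruning.mode = "coarse")

# Inspect which sequence names remain
seqlevels(gr_std)
```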

Details

A good practice is to perform read alignment using a reference genome from Ensembl/GenCode that includes only the primary assembly, and to build a TxDb using the GTF/GFF files downloaded from the same source as the reference genome, such as BioMart/Ensembl/GenCode. For instructions, see the vignette of the GenomicFeatures package. The UCSC reference genomes and their annotations can be very cumbersome.
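
As a rough sketch of the practice described above, a TxDb can be built from a GTF file with makeTxDbFromGFF. The file names below are hypothetical placeholders, not files shipped with InPAS; substitute the GTF that matches the genome used for alignment. In recent Bioconductor releases this function is provided by the txdbmaker package rather than GenomicFeatures, so the library call may need adjusting.

```r
library(GenomicFeatures)

# Hypothetical placeholder: a GTF downloaded from the same source and
# release as the reference genome used for read alignment.
gtf_file <- "Homo_sapiens.GRCh38.86.gtf.gz"

# Hypothetical output path for the cached TxDb
txdb_file <- "ensembl_GRCh38_86_txdb.sqlite"

if (file.exists(gtf_file)) {
  # Build the TxDb from the GTF annotation
  txdb <- makeTxDbFromGFF(gtf_file,
                          format = "gtf",
                          dataSource = "Ensembl",
                          organism = "Homo sapiens")

  # Save it so it can be reloaded later with loadDb()
  saveDb(txdb, file = txdb_file)
}
```

The saved file can then be reloaded with loadDb() and passed as the TxDb argument.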

Value

An object of GenomicRanges::GRangesList, containing GRanges for extracted 3' UTRs, and the corresponding last CDSs and next.exon.gap for each chromosome/scaffold.

Author(s)

Jianhong Ou, Haibo Liu

Examples

library("InPAS")
library("EnsDb.Hsapiens.v86")
library("GenomicFeatures")

## Load the sample TxDb shipped with GenomicFeatures
samplefile <- system.file("extdata",
                          "hg19_knownGene_sample.sqlite",
                          package = "GenomicFeatures")
TxDb <- loadDb(samplefile)
edb <- EnsDb.Hsapiens.v86

## Extract the 3' UTR annotation, removing scaffolds
utr3 <- extract_UTR3Anno(TxDb, edb,
                         removeScaffolds = TRUE,
                         MAX_EXONS_GAP = 10000)

[Package InPAS version 2.0.0 Index]