run_coverageQC {InPAS}R Documentation

Quality control on read coverage over gene bodies and 3UTRs

Description

Calculate coverage over gene bodies and 3UTRs. This function is used for quality control of the coverage.The coverage rate can show the complexity of RNA-seq library.

Usage

run_coverageQC(
  sqlite_db,
  TxDb,
  edb,
  genome,
  cutoff_readsNum = 1,
  cutoff_expdGene_cvgRate = 0.1,
  cutoff_expdGene_sampleRate = 0.5,
  removeScaffolds = FALSE,
  BPPARAM = NULL,
  which = NULL,
  ...
)

Arguments

sqlite_db

A path to the SQLite database for InPAS, i.e. the output of setup_sqlitedb().

TxDb

An object of GenomicFeatures::TxDb

edb

An object of ensembldb::EnsDb

genome

An object of BSgenome::BSgenome

cutoff_readsNum

cutoff reads number. If the coverage in the location is greater than cutoff_readsNum,the location will be treated as covered by signal

cutoff_expdGene_cvgRate

cutoff_expdGene_cvgRate and cutoff_expdGene_sampleRate are the parameters used to calculate which gene is expressed in all input dataset. cutoff_expdGene_cvgRateset the cutoff value for the coverage rate of each gene; cutoff_expdGene_sampleRateset the cutoff value for ratio of numbers of expressed and all samples for each gene. for example, by default, cutoff_expdGene_cvgRate=0.1 and cutoff_expdGene_sampleRate=0.5,suppose there are 4 samples, for one gene, if the coverage rates by base are:0.05, 0.12, 0.2, 0.17, this gene will be count as expressed gene because mean(c(0.05,0.12, 0.2, 0.17) > cutoff_expdGene_cvgRate) > cutoff_expdGene_sampleRate if the coverage rates by base are: 0.05, 0.12, 0.07, 0.17, this gene will be count as un-expressed gene because mean(c(0.05, 0.12, 0.07, 0.17) > cutoff_expdGene_cvgRate)<= cutoff_expdGene_sampleRate

cutoff_expdGene_sampleRate

See cutoff_expdGene_cvgRate

removeScaffolds

A logical(1) vector, whether the scaffolds should be removed from the genome If you use a TxDb containing alternative scaffolds, you'd better to remove the scaffolds.

BPPARAM

an optional BiocParallel::BiocParallelParam instance determining the parallel back-end to be used during evaluation, or a list of BiocParallelParam instances, to be applied in sequence for nested calls to bplapply. It can be set to NULL or bpparam()

which

an object of GenomicRanges::GRanges or NULL. If it is not NULL, only the exons overlapping the given ranges are used. For fast data quality control, set which to Granges for one or a few large chromosomes.

...

Not used yet

Value

A data frame with colnames: gene.coverage.rate: coverage per base for all genes, expressed.gene.coverage.rate: coverage per base for expressed genes, UTR3.coverage.rate: coverage per base for all 3' UTRs,UTR3.expressed.gene.subset.coverage. rate: coverage per base for 3' UTRs of expressed genes. and rownames: the names of coverage

Author(s)

Jianhong Ou, Haibo Liu

Examples

if (interactive()) {
   library("BSgenome.Mmusculus.UCSC.mm10")
   library("TxDb.Mmusculus.UCSC.mm10.knownGene")
   library("EnsDb.Mmusculus.v79")
   
   genome <- BSgenome.Mmusculus.UCSC.mm10
   TxDb <- TxDb.Mmusculus.UCSC.mm10.knownGene
   edb <- EnsDb.Mmusculus.v79
   
   bedgraphs <- system.file("extdata",c("Baf3.extract.bedgraph",
                                        "UM15.extract.bedgraph"), 
                           package = "InPAS")
   tags <- c("Baf3", "UM15")
   metadata <- data.frame(tag = tags, 
                          condition = c("Baf3", "UM15"),
                          bedgraph_file = bedgraphs)
   outdir = tempdir()
   write.table(metadata, file =file.path(outdir, "metadata.txt"), 
               sep = "\t", quote = FALSE, row.names = FALSE)
   
   sqlite_db <- setup_sqlitedb(metadata = file.path(outdir, 
                                                    "metadata.txt"),
                               outdir)
   coverage <- list()
   for (i in seq_along(bedgraphs)){
   coverage[[tags[i]]] <- get_ssRleCov(bedgraph = bedgraphs[i],
                            tag = tags[i],
                            genome = genome,
                            sqlite_db = sqlite_db,
                            outdir = outdir,
                            removeScaffolds = TRUE,
                            BPPARAM = NULL)
   }
   coverage_files <- assemble_allCov(sqlite_db, 
                                    outdir, 
                                    genome, 
                                    removeScaffolds = FALSE)
   run_coverageQC(sqlite_db, TxDb, edb, genome,
                  removeScaffolds = TRUE,
                  which = GRanges("chr6",
                     ranges = IRanges(98013000, 140678000)))
}

[Package InPAS version 2.2.0 Index]