BadRegionFinder-package {BadRegionFinder}    R Documentation

BadRegionFinder: an R/Bioconductor package for identifying regions with bad coverage

Description

BadRegionFinder is a package for identifying regions with bad, acceptable and good coverage in sequence alignment data available as BAM files. Either the whole genome or a set of target regions may be considered. Various visual and textual types of output are available.

Details

Package: BadRegionFinder
Type: Package
Title: BadRegionFinder: an R/Bioconductor package for identifying regions with bad coverage
Version: 1.18.0
Date: 2016-03-07
Author: Sarah Sandmann
Maintainer: Sarah Sandmann <sarah.sandmann@uni-muenster.de>
Description: BadRegionFinder is a package for identifying regions with a bad, acceptable and good coverage in sequence alignment data available as bam files. The whole genome may be considered as well as a set of target regions. Various visual and textual types of output are available.
License: LGPL-3
Imports: VariantAnnotation, Rsamtools, biomaRt, GenomicRanges, S4Vectors, utils, stats, grDevices, graphics
Suggests: BSgenome.Hsapiens.UCSC.hg19
biocViews: Coverage, Sequencing, Alignment, WholeGenome, Classification
NeedsCompilation: no
git_url: https://git.bioconductor.org/packages/BadRegionFinder
git_branch: RELEASE_3_12
git_last_commit: 54deba4
git_last_commit_date: 2020-10-27
Date/Publication: 2020-10-27

In targeted sequencing, it is essential to design the primer set so that the targeted regions are sequenced with sufficient coverage. Yet the intended coverage may not always be achieved, e.g. due to high GC content. A tool that performs a detailed coverage analysis across many samples at once, rather than considering each sample individually, is therefore useful. Furthermore, with regard to reads mapping off target, it is helpful to have a tool for investigating regions that show relatively high coverage but were not originally targeted.

BadRegionFinder classifies a selection of regions, or the whole genome, into the user-definable categories of bad, acceptable and good coverage in any sequence alignment data available as BAM files. Various visual and textual types of output are available, including detailed output files covering every base that is or should be covered, and an overview file summarizing the coverage of the targeted genes.
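
The idea of a threshold-based classification can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper classify_coverage, the toy matrix and the exact rule are assumptions made for illustration (a position is called "good" if at least percentage2 of the samples reach threshold2, "acceptable" if at least percentage1 reach threshold1, and "bad" otherwise).

```r
# Hypothetical helper, not part of BadRegionFinder: classify each position
# of a positions x samples coverage matrix under the assumed rule.
classify_coverage <- function(cov, threshold1, threshold2,
                              percentage1, percentage2) {
  # Fraction of samples reaching each threshold at every position
  frac1 <- rowMeans(cov >= threshold1)
  frac2 <- rowMeans(cov >= threshold2)
  quality <- rep("bad", nrow(cov))
  quality[frac1 >= percentage1] <- "acceptable"
  quality[frac2 >= percentage2] <- "good"
  factor(quality, levels = c("bad", "acceptable", "good"))
}

# Toy data: 4 positions (rows) in 5 samples (columns)
cov <- rbind(c(150, 120, 110, 130, 105),
             c( 50,  60,  25,  30,  40),
             c(  5,   0,  30,  10,   2),
             c(100, 100, 100, 100,  25))
quality <- classify_coverage(cov, threshold1 = 20, threshold2 = 100,
                             percentage1 = 0.80, percentage2 = 0.90)
print(quality)
```

Comparing all samples at each position, instead of judging every sample on its own, is what lets a single call flag positions that are systematically poorly covered.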

Index of help topics:

BadRegionFinder-package
                        BadRegionFinder: an R/Bioconductor package for
                        identifying regions with bad coverage
determineCoverage       Determines the coverage (recommended for
                        whole-genome analyses)
determineCoverageQuality
                        Classifies the determined coverage
determineQuantiles      Determines basewise user-defined quantiles
determineRegionsOfInterest
                        Determines the regions of interest
plotDetailed            Plots a more detailed overview of the coverage
                        quality
plotSummary             Plots a summary of the coverage quality
plotSummaryGenes        Plots a summary of the coverage quality
                        concerning the genes only
reportBadRegionsDetailed
                        Gives a detailed report on the coverage quality
reportBadRegionsGenes   Sums up the coverage quality on a gene basis
reportBadRegionsSummary
                        Sums up the coverage quality

Coverage determination is performed by the function determineCoverage, which can be switched between whole-genome and target-region analyses. The actual classification of the coverage is performed by determineCoverageQuality. If only a subset of regions is of interest, it may be selected with determineRegionsOfInterest.
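
Restricting an analysis to regions of interest amounts to keeping only those bases that fall into one of the given intervals. The following base R sketch shows the idea on plain data frames. The tables bases and roi are hypothetical, closed 1-based intervals are assumed, and the objects actually handled by determineRegionsOfInterest may be structured differently.

```r
# Hypothetical per-base table: chromosome, position, coverage category
bases <- data.frame(chr = c("2", "2", "2", "17", "17"),
                    pos = c(100, 101, 102, 500, 501),
                    quality = c("good", "good", "bad", "acceptable", "bad"),
                    stringsAsFactors = FALSE)

# Hypothetical regions of interest (closed intervals, assumption)
roi <- data.frame(chr = c("2", "17"),
                  start = c(101, 501),
                  end = c(102, 600),
                  stringsAsFactors = FALSE)

# A base is kept if it lies within at least one region on its chromosome
in_roi <- vapply(seq_len(nrow(bases)), function(i) {
  any(roi$chr == bases$chr[i] &
      roi$start <= bases$pos[i] &
      bases$pos[i] <= roi$end)
}, logical(1))

selected <- bases[in_roi, ]
print(selected)
```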

There are three different forms of textual reports available: a summary variant (reportBadRegionsSummary), a detailed variant (reportBadRegionsDetailed) and a summary variant focussing on the coverage of the genes (reportBadRegionsGenes).
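
One plausible way to condense per-base categories into a summary report is to merge consecutive bases with the same category into regions. The base R sketch below uses rle() for this. It illustrates the idea only, assumes gap-free consecutive positions on a single chromosome, and does not reproduce the output format of reportBadRegionsSummary.

```r
# Hypothetical per-base categories along one chromosome
pos <- 1001:1010
quality <- c("good", "good", "good", "bad", "bad",
             "acceptable", "acceptable", "acceptable", "acceptable", "good")

# Run-length encoding groups neighbouring bases with identical category
runs <- rle(quality)
end_idx <- cumsum(runs$lengths)
start_idx <- end_idx - runs$lengths + 1

regions <- data.frame(start = pos[start_idx],
                      end = pos[end_idx],
                      quality = runs$values,
                      stringsAsFactors = FALSE)
print(regions)
```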

Furthermore, there exist three different forms of visual reports: a summary variant (plotSummary), a detailed variant (plotDetailed) and a summary variant visualizing the coverage of the genes as a barplot (plotSummaryGenes).
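
A gene-level barplot of this kind can be approximated with base graphics. In the sketch below, the table bases, the gene assignment and the colours are made up for illustration, and the layout of the plot produced by plotSummaryGenes may differ.

```r
# Hypothetical per-base table: gene annotation and coverage category
bases <- data.frame(
  gene = c("TP53", "TP53", "TP53", "TP53", "RUNX1", "RUNX1"),
  quality = factor(c("good", "good", "bad", "acceptable", "bad", "bad"),
                   levels = c("bad", "acceptable", "good")))

# Percentage of bases per category within each gene
counts <- table(bases$quality, bases$gene)
percent <- 100 * prop.table(counts, margin = 2)

pdf(NULL)  # null device: draws without creating a file
barplot(percent, col = c("red", "yellow", "green"),
        ylab = "Bases (%)", legend.text = rownames(percent))
invisible(dev.off())
```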

Additionally, BadRegionFinder may be used to determine user-definable, basewise quantiles over all samples at any position (determineQuantiles).
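
Basewise quantiles over all samples can be sketched in base R by applying quantile() to every row of a positions x samples matrix. The toy matrix is an assumption, and quantile() is used with its default interpolation (type 7), which may differ from the method used by determineQuantiles.

```r
# Toy coverage matrix: 3 positions (rows) in 5 samples (columns)
cov <- rbind(c(10, 20,  30,  40,  50),
             c( 0,  0,   5,  10, 100),
             c(80, 90, 100, 110, 120))
quantiles <- c(0.25, 0.5)

# One row per position, one column per requested quantile
q <- t(apply(cov, 1, quantile, probs = quantiles))
print(q)
```

The median (0.5 quantile) is less sensitive to single outlier samples than the mean, as the second position with its one sample at 100 reads shows.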

Author(s)

Sarah Sandmann

Maintainer: Sarah Sandmann <sarah.sandmann@uni-muenster.de>

References

More information on the BAM format can be found at: http://samtools.github.io/hts-specs/SAMv1.pdf

See Also

determineCoverage, determineCoverageQuality, determineRegionsOfInterest, reportBadRegionsSummary, reportBadRegionsDetailed, reportBadRegionsGenes, plotSummary, plotDetailed, plotSummaryGenes, determineQuantiles

Examples

library("BSgenome.Hsapiens.UCSC.hg19")

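# Two coverage thresholds and two percentages, passed to the
# classification, reporting and plotting functions below
threshold1 <- 20
threshold2 <- 100
percentage1 <- 0.80
percentage2 <- 0.90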
sample_file <- system.file("extdata", "SampleNames.txt", 
                           package = "BadRegionFinder")
samples <- read.table(sample_file)
bam_input <- system.file("extdata", package = "BadRegionFinder")
output <- system.file("extdata", package = "BadRegionFinder")
target_regions <- system.file("extdata", "targetRegions.bed",
                              package = "BadRegionFinder")
targetRegions <- read.table(target_regions, header = FALSE,
                            stringsAsFactors = FALSE)

coverage_summary <- determineCoverage(samples, bam_input, targetRegions, 
                                      output, TRonly = FALSE)
coverage_indicators <- determineCoverageQuality(threshold1, threshold2,
                                                percentage1, percentage2,
                                                coverage_summary)
badCoverageSummary <- reportBadRegionsSummary(threshold1, threshold2, 
                                              percentage1, percentage2,
                                              coverage_indicators, "", output)
coverage_indicators_temp <- reportBadRegionsDetailed(threshold1, threshold2,
                                                     percentage1, percentage2,
                                                     coverage_indicators, "",
                                                     samples, output)
badCoverageOverview <- reportBadRegionsGenes(threshold1, threshold2, percentage1,
                                             percentage2, badCoverageSummary,
                                             output)

plotSummary(threshold1, threshold2, percentage1, percentage2,
            badCoverageSummary, output)
plotDetailed(threshold1, threshold2, percentage1, percentage2,
             coverage_indicators_temp, output)
plotSummaryGenes(threshold1, threshold2, percentage1, percentage2,
                 badCoverageOverview, output)

# Basewise 0.5 quantile (median) over all samples
quantiles <- c(0.5)
coverage_summary2 <- determineQuantiles(coverage_summary, quantiles, output)

