generateBedEcdf {epialleleR} | R Documentation |
This function computes empirical cumulative distribution functions (eCDF) for per-read beta values of the sequencing reads.
generateBedEcdf( bam, bed, bed.type = c("amplicon", "capture"), bed.rows = c(1), zero.based.bed = FALSE, match.tolerance = 1, match.min.overlap = 1, ecdf.context = c("CG", "CHG", "CHH", "CxG", "CX"), min.mapq = 0, skip.duplicates = FALSE, verbose = TRUE )
bam |
BAM file location string OR preprocessed output of
|
bed |
Browser Extensible Data (BED) file location string OR object of
class |
bed.type |
character string for the type of assay that was used to produce sequencing reads:
|
bed.rows |
integer vector specifying what 'bed' regions should be included in the output. If 'c(1)' (the default), then function returns eCDFs for the first region of 'bed', if NULL - eCDF functions for all 'bed' genomic regions as well as for the reads that didn't match any of the regions (last element of the return value; only if there are such reads). |
zero.based.bed |
boolean defining if BED coordinates are zero based (default: FALSE). |
match.tolerance |
integer for the largest difference between read's and
BED |
match.min.overlap |
integer for the smallest overlap between read's and
BED |
ecdf.context |
string defining cytosine methylation context used for computing within-the-context and out-of-context eCDFs:
|
min.mapq |
non-negative integer threshold for minimum read mapping quality (default: 0). Option has no effect if preprocessed BAM data was supplied as an input. |
skip.duplicates |
boolean defining if duplicate aligned reads should be skipped (default: FALSE). Option has no effect if preprocessed BAM data was supplied as an input OR duplicate reads were not marked by alignment software. |
verbose |
boolean to report progress and timings (default: TRUE). |
The function matches reads (or read pairs as a single entity) to the genomic
regions provided in a BED file/GRanges
object, computes
average per-read beta values according to the cytosine context parameter
'ecdf.context', and returns a list of eCDFs for within- and out-of-context
average per-read beta values, which can be used for plotting.
The resulting eCDFs and their plots can be used to characterise the methylation pattern of a particular genomic region, e.g. if reads that match to that region are methylated in an "all-or-none" manner or if some intermediate methylation levels are more frequent.
list with a number of elements equal to the length of 'bed.rows' (if not NULL), or to the number of genomic regions within 'bed' (if 'bed.rows==NULL') plus one item for all reads not matching 'bed' genomic regions (if any). Every list item is a list on it's own, consisting of two eCDF functions for within- and out-of-context per-read beta values.
preprocessBam
for preloading BAM data,
generateCytosineReport
for methylation statistics at the level
of individual cytosines, generateBedReport
for genomic
region-based statistics, generateVcfReport
for evaluating
epiallele-SNV associations, and 'epialleleR' vignettes for the description of
usage and sample data.
# amplicon data amplicon.bam <- system.file("extdata", "amplicon010meth.bam", package="epialleleR") amplicon.bed <- system.file("extdata", "amplicon.bed", package="epialleleR") # let's compute eCDF amplicon.ecdfs <- generateBedEcdf(bam=amplicon.bam, bed=amplicon.bed, bed.rows=NULL) # there are 5 items in amplicon.ecdfs, let's plot all of them par(mfrow=c(1,length(amplicon.ecdfs))) # cycle through items for (x in 1:length(amplicon.ecdfs)) { # four of them have names corresponding to amplicon.bed genomic regions, # fifth - NA for all the reads that don't match to any of those regions main <- if (is.na(names(amplicon.ecdfs[x]))) "unmatched" else names(amplicon.ecdfs[x]) # plotting eCDF for within-the-context per-read beta values (in red) plot(amplicon.ecdfs[[x]]$context, col="red", verticals=TRUE, do.points=FALSE, xlim=c(0,1), xlab="per-read beta value", ylab="cumulative density", main=main) # adding eCDF for out-of-context per-read beta values (in blue) plot(amplicon.ecdfs[[x]]$out.of.context, add=TRUE, col="blue", verticals=TRUE, do.points=FALSE) }