dba.normalize {DiffBind} | R Documentation |
Specify parameters for normalizing a dataset;
calculate library sizes and normalization factors.
Description
Enables normalization of datasets using a variety of methods,
including background, spike-in, and parallel factor normalization.
Alternatively, allows a user to specify library sizes
and normalization factors directly, or retrieve computed ones.
Usage
dba.normalize(DBA, method = DBA$config$AnalysisMethod,
normalize = DBA_NORM_DEFAULT, library = DBA_LIBSIZE_DEFAULT,
background = FALSE, spikein = FALSE, offsets = FALSE,
libFun=mean, bRetrieve=FALSE, ...)
Arguments
DBA |
DBA object that includes count data for a consensus peakset.
|
method |
Underlying method, or vector of methods, for which to normalize.
Supported methods: DBA_DESEQ2, DBA_EDGER, and DBA_ALL_METHODS.
|
normalize |
Either user-supplied normalization factors in a numeric vector,
or a specification of a method to use to calculate normalization factors.
Methods can be specified using one of the following:
DBA_NORM_RLE ("RLE")
RLE normalization (native to DBA_DESEQ2,
and available for DBA_EDGER).
DBA_NORM_TMM ("TMM")
TMM normalization (native to DBA_EDGER,
and available for DBA_DESEQ2).
DBA_NORM_NATIVE ("native")
Use the native method based on method:
DBA_NORM_RLE for DBA_DESEQ2 or
DBA_NORM_TMM for DBA_EDGER.
DBA_NORM_LIB ("lib")
Normalize by library size only.
Library sizes can be specified using the library parameter.
Normalization factors will be calculated to give each sample equal weight
in a manner appropriate for the analysis method.
See also the libFun parameter, which can be used to
scale the normalization factors for DESeq2.
DBA_NORM_DEFAULT ("default")
Default method: The "preferred" normalization approach
depending on method and whether an explicit
design is present. See Details below.
DBA_NORM_OFFSETS ("offsets")
Indicates that offsets have been specified using the offsets
parameter, and they should be used without alteration.
DBA_NORM_OFFSETS_ADJUST ("adjust offsets")
Indicates that offsets have been specified using the offsets
parameter, and they should be adjusted for library size and mean centering
before being used in a DBA_DESEQ2 analysis.
|
library |
Either user-supplied library sizes in a numeric vector,
or a specification of a method to use to calculate library sizes.
Library sizes can be based on one of the following:
DBA_LIBSIZE_FULL ("full")
Use the full library size (total number of reads in BAM/SAM/BED file)
DBA_LIBSIZE_PEAKREADS ("RiP")
Use the number of reads that overlap consensus peaks.
DBA_LIBSIZE_BACKGROUND ("background")
Use the total number of reads aligned to the chromosomes for which there is
at least one peak. This requires a background bin calculation (see parameter
background). These values are usually the same as or similar to
DBA_LIBSIZE_FULL.
DBA_LIBSIZE_DEFAULT ("default")
Default method: The "preferred" library size
depending on method, background,
and whether an explicit design is present.
See Details below.
|
background |
This parameter controls the option to use "background" bins, which
should not have differential enrichment between samples,
as the basis for normalizing (instead of using read counts
overlapping consensus peaks).
When enabled, the chromosomes for which there are peaks
in the consensus peakset are tiled into large bins and reads
overlapping these bins are counted.
If present, background can either be a logical value, a numeric value,
or a previously computed $background object.
If background is a logical value and set to TRUE, background bins will be
computed using the default bin size of 15000bp.
Setting this value to FALSE will prevent background mode
from being used in any default settings.
If background is a numeric value, it will be used as the bin size.
If background is a previously computed
$background object, these counts will be used as the background.
A $background object can be obtained by calling
dba.normalize with bRetrieve=TRUE
and method=DBA_ALL_METHODS.
After counting (or setting) background bins,
both the normalize and library parameters
will be used to determine how the
final normalization factors are calculated.
If background is missing, it will be set to TRUE if
library=DBA_LIBSIZE_BACKGROUND, or if
library=DBA_LIBSIZE_DEFAULT and certain conditions are met
(see Details below).
If background is not FALSE, then
the library size will be set to library=DBA_LIBSIZE_BACKGROUND.
|
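The tiling step described for background can be pictured with a small base R sketch. It only illustrates cutting chromosomes into fixed-width bins using the default bin size of 15000bp; the chromosome names and lengths are hypothetical, the helper tile_chromosome is invented for this illustration, and the actual binning and read counting are done by csaw and may differ in detail.

```r
# Hypothetical chromosome lengths in bp (illustrative values only).
chrom.len <- c(chrA = 100000, chrB = 45000, chrC = 7000)
bin.size <- 15000  # default background bin size stated above

# Start/end coordinates of consecutive bins covering one chromosome.
# The last bin is truncated at the chromosome end.
tile_chromosome <- function(len, bin.size) {
  starts <- seq(1, len, by = bin.size)
  ends <- pmin(starts + bin.size - 1, len)
  data.frame(start = starts, end = ends)
}

bins <- lapply(chrom.len, tile_chromosome, bin.size = bin.size)

# Number of background bins per chromosome.
n.bins <- vapply(bins, nrow, integer(1))
print(n.bins)
```

Passing a numeric value as background changes only bin.size in this picture; larger bins give fewer, wider background regions.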
spikein |
Either a logical value, a character vector of chromosome names,
a GRanges object containing peaks for a parallel factor,
or a $background object containing previously computed
spike-in read counts.
If spikein is a logical value set to FALSE,
no spike-in normalization is performed.
If spikein is a logical value set to TRUE,
background normalization is performed using spike-in tracks.
There must be a spike-in track for each sample.
See dba and/or dba.peakset for
details on how to include a spike-in track with a sample
(e.g., by including a Spikein column in the sample sheet).
All chromosomes in the spike-in BAM files will be used.
If spikein is a character vector of one or more chromosome names,
only reads on the named chromosome(s) will be used for background normalization.
If spike-in tracks are available, reads on chromosomes with these names in the
spike-in track will be counted.
If no spike-in tracks are available, reads on chromosomes with these names
in the main bamReads BAM files will be counted.
If spikein is a GRanges object containing peaks
for a parallel factor, then background normalization is performed counting reads
in the spike-in tracks overlapping peaks in this object.
If spikein is a previously computed
$background object, these counts will be used as the spike-in background.
A $background object can be obtained by calling
dba.normalize with bRetrieve=TRUE
and method=DBA_ALL_METHODS.
Note that if spikein is not FALSE, then
the library size will be set to library=DBA_LIBSIZE_BACKGROUND.
|
offsets |
This parameter controls the use of offsets (a matrix of normalization factors)
instead of a single normalization factor for each sample. It can either
be a logical value, a matrix, or a SummarizedExperiment.
If it is a logical value and set to FALSE,
no offsets will be computed or used. A value of TRUE
indicates that an offset matrix should be computed using a loess fit.
Alternatively, user-calculated normalization offsets can be supplied
as a matrix or as a SummarizedExperiment
(containing an assay named "offsets").
In this case, the user may also set the
normalize parameter to indicate whether the
offsets should be applied as-is to a DESeq2 analysis
(DBA_NORM_OFFSETS, default),
or if they should be adjusted for library size and mean centering
(DBA_NORM_OFFSETS_ADJUST).
|
libFun |
When normalize=DBA_NORM_LIB, normalization factors are
calculated by dividing the library size of each sample by
a common denominator, obtained by applying libFun
to the vector of library sizes.
For method=DBA_EDGER, the normalization factors are further
adjusted so that all effective library sizes (library sizes multiplied by
normalization factors) are the same and the factors multiply to 1.
|
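The arithmetic described for libFun can be sketched in base R. The library sizes below are hypothetical, and lib_factors and edger_style_factors are names invented for this illustration; the second function shows one way to satisfy the two stated constraints for DBA_EDGER (equal effective library sizes, factors multiplying to 1) and is not taken from the DiffBind source.

```r
# Hypothetical library sizes (total reads per sample); not from a real dataset.
lib.sizes <- c(s1 = 2.0e7, s2 = 2.5e7, s3 = 4.0e7)

# Library-size factors as described above: each library size divided by a
# common denominator obtained by applying libFun to all library sizes.
lib_factors <- function(lib.sizes, libFun = mean) {
  lib.sizes / libFun(lib.sizes)
}

# Assumption: factors of the form c / lib.size make every effective library
# size equal to c; choosing c as the geometric mean of the library sizes
# makes the product of the factors equal to 1.
edger_style_factors <- function(lib.sizes) {
  geo.mean <- exp(mean(log(lib.sizes)))
  geo.mean / lib.sizes
}

f.mean <- lib_factors(lib.sizes)                     # libFun = mean (default)
f.median <- lib_factors(lib.sizes, libFun = median)  # alternative libFun
f.edger <- edger_style_factors(lib.sizes)

print(f.mean)
print(f.median)
print(lib.sizes * f.edger)  # effective library sizes, identical across samples
```

With libFun=mean the factors average to 1, while libFun=median pins the median-sized library at exactly 1; the choice only rescales the factors by a constant.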
bRetrieve |
If set to TRUE, information about the current normalization will be
returned.
The only other relevant parameter in this case is method.
If method=DBA_DESEQ2 or method=DBA_EDGER,
a record will be returned including normalization values
for the appropriate analysis method. This record is a list
consisting of the following elements:
$norm.method
A character string corresponding to the normalization method,
generally one of the values that can be supplied as a value to
normalize.
$norm.factors
A vector containing the computed normalization factors.
$lib.method
A character string corresponding to the method used to
calculate the library size,
generally one of the values that can be supplied as a value to
library.
$lib.sizes
A vector containing the computed library sizes.
$background
If the normalization is based on binned background reads,
this field will be TRUE.
$control.subtract
If control reads were subtracted from the read counts,
this field will be TRUE.
If method=DBA_ALL_METHODS, the record will be a list
with one of the above records for each method for
which normalization factors have been computed
($DESeq2 and $edgeR).
If background bins have been calculated,
this will include an element called $background.
This element can be passed in as the value to background or
spikein to re-use a previously computed set of reads.
It contains three subfields:
It contains three subfields:
$background$binned
a SummarizedExperiment object containing the binned counts.
$background$bin.size
a numeric value with the bin size used.
$background$back.calc
character string indicating how the background was calculated
(bins, spike-ins, or parallel factor).
If offsets are available,
this will include an element called $offsets with two
subfields:
$offsets$offsets
a matrix or a SummarizedExperiment object
containing the offsets.
$offsets$offset.method
a character string indicating the source of the offsets, either
"loess" or "user".
|
... |
Extra parameters to be passed to limma::loessFit
when computing offsets.
|
Details
The default normalization parameters are as follows:
normalize=DBA_NORM_LIB
library=DBA_LIBSIZE_FULL
background=FALSE
If background=TRUE, then the default becomes library=DBA_LIBSIZE_BACKGROUND.
If dba.contrast has been used to set up contrasts with design=FALSE (pre-3.0 mode),
then the defaults are:
In this case, normalize=DBA_NORM_LIB will be set for method=DBA_DESEQ2
for backwards compatibility.
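As a rough illustration of how the library-size default interacts with background and spikein in the explicit-design case, the rules stated on this page can be written as a small base R function. resolve_library is a hypothetical helper invented for this sketch; it uses the string forms of the constants ("default", "full", "RiP", "background"), ignores the pre-3.0 mode, and is not DiffBind's internal logic.

```r
# Simplified, hypothetical resolver for the effective library-size setting.
# It encodes only the rules stated on this page for the explicit-design case.
resolve_library <- function(library = "default", background = FALSE,
                            spikein = FALSE) {
  uses.background <- !identical(background, FALSE) ||
    !identical(spikein, FALSE)
  if (uses.background) {
    return("background")  # corresponds to DBA_LIBSIZE_BACKGROUND
  }
  if (identical(library, "default")) {
    return("full")        # corresponds to DBA_LIBSIZE_FULL
  }
  library                 # explicit choice, e.g. "RiP"
}

print(resolve_library())
print(resolve_library(background = TRUE))
print(resolve_library(library = "RiP"))
print(resolve_library(library = "RiP", background = 10000))
```

The last call reflects the statement that any background value other than FALSE, including a numeric bin size, sets the library size to the background-based one.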
Value
Either a DBA object with normalization terms added,
or (if bRetrieve=TRUE), a record of normalization details.
Note
The csaw package is used to compute background bins and offsets
based on limma::loessFit.
See the DiffBind vignette for technical details of how this
is done, and the csaw vignette for details on how
background bins and loess offsets can be used to address
different biases in ChIP-seq data.
Author(s)
Rory Stark
See Also
dba.count, dba.analyze, dba.save
Examples
# load DBA object with counts
data(tamoxifen_counts)
tamoxifen <- dba.contrast(tamoxifen,design="~Tissue + Condition")
# default normalization: Full library sizes
tamoxifen <- dba.normalize(tamoxifen)
dba.normalize(tamoxifen, bRetrieve=TRUE)
dba.analyze(tamoxifen)
# RLE/TMM using Reads in Peaks
tamoxifen <- dba.normalize(tamoxifen, method=DBA_ALL_METHODS,
normalize=DBA_NORM_NATIVE,
library=DBA_LIBSIZE_PEAKREADS)
dba.normalize(tamoxifen, method=DBA_DESEQ2, bRetrieve=TRUE)
dba.normalize(tamoxifen, method=DBA_EDGER, bRetrieve=TRUE)
tamoxifen <- dba.analyze(tamoxifen, method=DBA_ALL_METHODS)
dba.show(tamoxifen,bContrasts=TRUE)
dba.plotVenn(tamoxifen,contrast=1,method=DBA_ALL_METHODS,bDB=TRUE)
# TMM in Background using precomputed background
norm <- dba.normalize(tamoxifen,method=DBA_ALL_METHODS,bRetrieve=TRUE)
tamoxifen <- dba.normalize(tamoxifen, background=norm$background,
normalize="TMM", method=DBA_ALL_METHODS)
tamoxifen <- dba.analyze(tamoxifen)
dba.show(tamoxifen,bContrasts=TRUE)
dba.plotMA(tamoxifen)
# LOESS offsets
tamoxifen <- dba.normalize(tamoxifen, method=DBA_ALL_METHODS, offsets=TRUE)
tamoxifen <- dba.analyze(tamoxifen, method=DBA_ALL_METHODS)
dba.show(tamoxifen,bContrasts=TRUE)
par(mfrow=c(3,1))
dba.plotMA(tamoxifen,th=0,bNormalized=FALSE)
dba.plotMA(tamoxifen,method=DBA_DESEQ2)
dba.plotMA(tamoxifen,method=DBA_EDGER)
[Package DiffBind version 3.0.15 Index]