estimateSCV {XBSeq}    R Documentation
Estimate squared coefficient of variation for each gene
Description
Estimates the squared coefficient of variation (SCV) for each gene, using a method similar to the dispersion estimation in DESeq.
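As a rough illustration of the quantity involved: the SCV of a gene is its variance divided by its squared mean. The sketch below computes a raw per-gene estimate modeled loosely on the DESeq approach, where the expected Poisson (shot-noise) variance is subtracted before scaling by the squared mean. raw_scv is a hypothetical helper, not an XBSeq function, and it ignores the background correction XBSeq applies, so its values will not match dispEst.

```r
# Hypothetical helper (not part of XBSeq): DESeq-style raw SCV per gene.
# counts: genes x samples matrix of raw counts; size_factors: one per sample.
raw_scv <- function(counts, size_factors) {
  # Divide each sample (column) by its size factor
  norm <- sweep(counts, 2, size_factors, "/")
  means <- rowMeans(norm)
  variances <- apply(norm, 1, var)
  # Average shot-noise variance per unit mean on the normalized scale
  xim <- mean(1 / size_factors)
  # Variance left after removing shot noise, scaled by the squared mean
  (variances - xim * means) / means^2
}

counts <- matrix(c(10, 5, 10, 10, 10, 15), nrow = 2)
raw_scv(counts, size_factors = c(1, 1, 1))  # returns c(-0.10, 0.15)
```

The first gene has less spread than the Poisson expectation, so its raw estimate is negative; raw per-gene values like this are noisy, which is what the fitting and sharing steps controlled by fitType and sharingMode address.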
Usage
## S4 method for signature 'XBSeqDataSet'
estimateSCV( object, method = c( "pooled", "per-condition", "blind" ), sharingMode = c( "maximum", "fit-only", "gene-est-only" ),
fitType = c("local","parametric"),
locfit_extra_args=list(), lp_extra_args=list(), ... )
Arguments
object
An XBSeqDataSet for which size factors have been estimated.
method
There are three ways to compute the empirical dispersion:
- pooled: Use the samples from all conditions with replicates to estimate a single pooled empirical dispersion value, called "pooled", and assign it to all samples.
- per-condition: For each condition with replicates, compute a gene's empirical dispersion value from the samples of that condition. For samples of unreplicated conditions, the maximum of the empirical dispersion values from the other conditions is used.
- blind: Ignore the sample labels and compute a gene's empirical dispersion value as if all samples were replicates of a single condition. This works even when there are no biological replicates, but it can lead to a loss of power.
sharingMode
After the empirical dispersion values have been computed for each gene, a dispersion-mean relationship is fitted to share information across genes and so reduce the variability of the dispersion estimates. Each gene then has two values: the empirical value (derived only from this gene's data) and the fitted value (the dispersion typical of genes with a similar average expression). The sharingMode argument specifies which of these values is written to the dispEst slot and hence used by XBSeqTest:
- fit-only: Use only the fitted value; the empirical value serves only as input to the fit and is then ignored. Use this only with very few replicates, and only when you are not too concerned about false positives from dispersion outliers, i.e., genes with unusually high variability.
- maximum: Take the maximum of the two values. This is the conservative choice, recommended once you have at least three or four replicates, and possibly even with only two.
- gene-est-only: No fitting or sharing; use only the empirical value. This is preferable when the number of replicates is large and the empirical dispersion values are sufficiently reliable. With few replicates, this option may often underestimate a gene's dispersion by chance, producing false positives in the subsequent test.
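fitType
The type of dispersion-mean fit: "local" fits the relationship with the locfit package (tuned through locfit_extra_args and lp_extra_args), and "parametric" fits a parametric curve.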
locfit_extra_args, lp_extra_args
(Only for fitType = "local".) Options passed to the locfit and lp functions of the locfit package to adjust the local fit. For example, if the fit seems too smooth or too rough, you may pass a value for nn other than the default (0.7) by setting lp_extra_args = list(nn = 0.9). As another example, set locfit_extra_args = list(maxk = 200) if you get an error that locfit ran out of nodes. See the documentation of the locfit package for details. In most cases you will not need these parameters, as the defaults seem to work quite well.
...
Extra arguments are ignored.
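The sketch below illustrates, for a single gene, which samples feed the empirical estimate under each method, and how sharingMode combines the empirical and fitted values. It uses the plain sample variance as a stand-in for the background-corrected dispersion that estimateSCV computes; empirical_var and share_dispersion are hypothetical helpers, not XBSeq functions.

```r
# Hypothetical helpers (not part of XBSeq); plain variance stands in for
# the background-corrected dispersion that estimateSCV actually computes.

# Empirical spread of one gene's normalized values x, one value per condition
empirical_var <- function(x, conditions,
                          method = c("pooled", "per-condition", "blind")) {
  method <- match.arg(method)
  conditions <- factor(conditions)
  lev <- levels(conditions)
  n_per <- tabulate(conditions, nbins = length(lev))
  replicated <- lev[n_per > 1]
  if (method == "blind") {
    # Ignore labels: all samples are treated as replicates of one condition
    return(setNames(rep(var(x), length(lev)), lev))
  }
  if (length(replicated) == 0)
    stop("no replicated condition; use method = \"blind\"")
  if (method == "pooled") {
    keep <- conditions %in% replicated
    # Deviations from each replicated condition's own mean, pooled together
    resid <- x[keep] - ave(x[keep], droplevels(conditions[keep]))
    pooled <- sum(resid^2) / (sum(keep) - length(replicated))
    return(setNames(rep(pooled, length(lev)), lev))
  }
  # per-condition: one value per replicated condition ...
  v <- setNames(rep(NA_real_, length(lev)), lev)
  for (l in replicated) v[l] <- var(x[conditions == l])
  # ... and unreplicated conditions get the maximum of the others
  v[is.na(v)] <- max(v, na.rm = TRUE)
  v
}

# Combine empirical and fitted values as described for sharingMode
share_dispersion <- function(empirical, fitted,
                             sharingMode = c("maximum", "fit-only", "gene-est-only")) {
  sharingMode <- match.arg(sharingMode)
  switch(sharingMode,
         "maximum"       = pmax(empirical, fitted),
         "fit-only"      = fitted,
         "gene-est-only" = empirical)
}

x <- c(1, 2, 3, 10, 12, 14)
conds <- c("A", "A", "A", "B", "B", "B")
empirical_var(x, conds, "pooled")           # A = 2.5, B = 2.5
empirical_var(x, conds, "per-condition")    # A = 1,   B = 4
empirical_var(x, conds, "blind")            # A = 32,  B = 32
share_dispersion(c(0.1, 0.5), c(0.3, 0.2))  # 0.3 0.5
```

The blind value is much larger here because the difference between conditions A and B is counted as within-gene variability, which illustrates the loss of power mentioned for method = "blind".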
Details
Guidance on choosing among these options can be found in the DESeq help page. Generally speaking, if you have few replicates (three or fewer), set method = "pooled"; otherwise, try method = "per-condition". We revised the code to estimate the variance of the true signal using the variance sum law rather than calculating the variance directly.
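A sketch of the variance sum law step, assuming the observed exonic count X of a gene is the sum of a true signal S and a background component B (estimated from the non-exonic reads), with S and B independent. This shows the algebra only, not the exact estimator in the package:

```latex
X = S + B, \qquad S \perp B
\operatorname{Var}(X) = \operatorname{Var}(S) + \operatorname{Var}(B)
\quad\Longrightarrow\quad
\widehat{\operatorname{Var}}(S) = \widehat{\operatorname{Var}}(X) - \widehat{\operatorname{Var}}(B),
\qquad
\hat{\mu}_S = \hat{\mu}_X - \hat{\mu}_B
```

The variance of the unobserved signal is thus obtained from quantities that can be estimated from the observed and background counts.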
Value
The XBSeqDataSet object, with the slots fitInfo and dispEst updated.
Author(s)
Yuanhang Liu
References
H. I. Chen, Y. Liu, Y. Zou, Z. Lai, D. Sarkar, Y. Huang, et al.,
"Differential expression analysis of RNA sequencing data by
incorporating non-exonic mapped reads," BMC Genomics, vol. 16
Suppl 7, p. S14, Jun 11 2015.
See Also
XBSeqDataSet
Examples
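library(XBSeq)
# Two conditions with three replicates each
conditions <- factor(c(rep('C1', 3), rep('C2', 3)))
# Loads the example Observed and Background count data
data(ExampleData)
XB <- XBSeqDataSet(Observed, Background, conditions)
# Estimate real counts, then size factors, then the SCV
XB <- estimateRealCount(XB)
XB <- estimateSizeFactors(XB)
XB <- estimateSCV(XB, fitType = 'local')
# Inspect the stored dispersion-mean fit
str(fitInfo(XB))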
Package XBSeq version 1.22.0