regressBatches {batchelor} | R Documentation |
Fit a linear model to regress out uninteresting factors of variation.
regressBatches( ..., batch = NULL, restrict = NULL, subset.row = NULL, assay.type = "logcounts" )
... |
Two or more log-expression matrices where genes correspond to rows and cells correspond to columns. Each matrix should contain the same number of rows, corresponding to the same genes (in the same order). Alternatively, one or more SingleCellExperiment objects can be supplied containing a count matrix in the If multiple objects are supplied, each object is assumed to contain all and only cells from a single batch.
Objects of different types can be mixed together.
If a single object is supplied, |
batch |
A factor specifying the batch of origin for all cells when only a single object is supplied in |
restrict |
A list of length equal to the number of objects in |
subset.row |
A vector specifying which features to use for correction. |
assay.type |
A string or integer scalar specifying the assay containing the log-expression values, if SingleCellExperiment objects are present in |
This function fits a linear model to the log-expression values for each gene and returns the residuals. The model is parameterized as a one-way layout with the batch of origin, so the residuals represent the expression values after correcting for the batch effect.
The novelty of this function is that it returns a ResidualMatrix in as the "corrected"
assay.
This avoids explicitly computing the residuals, which would result in a loss of sparsity or similar problems.
Rather, the residuals are either computed as needed or are never explicitly computed as all (e.g., during matrix multiplication).
All genes are used with the default setting of subset.row=NULL
.
Users can set subset.row
to subset the inputs, though this is purely for convenience as each gene is processed independently of other genes.
See ?"batchelor-restrict"
for a description of the restrict
argument.
Specifically, this function will compute the model coefficients using only the specified subset of cells.
The regression will then be applied to all cells in each batch.
A SingleCellExperiment object containing the corrected
assay.
This contains corrected log-expression values for each gene (row) in each cell (column) in each batch.
A batch
field is present in the column data, specifying the batch of origin for each cell.
Cells in the output object are always ordered in the same manner as supplied in ...
.
For a single input object, cells will be reported in the same order as they are arranged in that object.
In cases with multiple input objects, the cell identities are simply concatenated from successive objects,
i.e., all cells from the first object (in their provided order), then all cells from the second object, and so on.
Aaron Lun
rescaleBatches
, for another approach to regressing out the batch effect.
means <- 2^rgamma(1000, 2, 1) A1 <- matrix(rpois(10000, lambda=means), ncol=50) # Batch 1 A2 <- matrix(rpois(10000, lambda=means*runif(1000, 0, 2)), ncol=50) # Batch 2 B1 <- log2(A1 + 1) B2 <- log2(A2 + 1) out <- regressBatches(B1, B2)