regressBatches {batchelor}R Documentation

Regress out batch effects

Description

Fit a linear model to regress out uninteresting factors of variation.

Usage

regressBatches(
  ...,
  batch = NULL,
  design = NULL,
  restrict = NULL,
  subset.row = NULL,
  assay.type = "logcounts"
)

Arguments

...

One or more log-expression matrices where genes correspond to rows and cells correspond to columns. Alternatively, one or more SingleCellExperiment objects can be supplied containing a log-expression matrix in the assay.type assay. Each object should contain the same number of rows, corresponding to the same genes in the same order. Objects of different types can be mixed together.

If multiple objects are supplied, each object is assumed to contain all and only cells from a single batch. If a single object is supplied, it is assumed to contain cells from all batches, so batch should also be specified.

Alternatively, one or more lists of matrices or SingleCellExperiments can be provided; this is flattened as if the objects inside each list were passed directly to ....

batch

A factor specifying the batch of origin for all cells when only a single object is supplied in .... This is ignored if multiple objects are present.

design

A numeric design matrix with number of rows equal to the total number of cells, specifying the experimental factors to remove. Each row corresponds to a cell in the order supplied in ....

restrict

A list of length equal to the number of objects in .... Each entry of the list corresponds to one batch and specifies the cells to use when computing the correction.

subset.row

A vector specifying which features to use for correction.

assay.type

A string or integer scalar specifying the assay containing the log-expression values. Only used for SingleCellExperiment inputs.

Details

This function fits a linear model to the log-expression values for each gene and returns the residuals. By default, the model is parameterized as a one-way layout with the batch of origin, so the residuals represent the expression values after correcting for the batch effect. More complex designs should be explicitly specified with the design argument, e.g., to regress out a covariate,

The novelty of this function is that it returns a ResidualMatrix in as the "corrected" assay. This avoids explicitly computing the residuals, which would result in a loss of sparsity or similar problems. Rather, the residuals are either computed as needed or are never explicitly computed as all (e.g., during matrix multiplication). This means that regressBatches is faster and lighter than naive regression or even rescaleBatches.

Like rescaleBatches, this function assumes that the uninteresting factors described in design are orthogonal to the interesting factors of variation. For example, each batch is assumed to have the same composition of cell types. In the continuous case (e.g., regression of cell cycle), the assumption is that all cell types are similarly distributed across cell cycle phases. If this is not true, the correction will not only be incomplete but can introduce spurious differences.

All genes are used with the default setting of subset.row=NULL. Users can set subset.row to subset the inputs, though this is purely for convenience as each gene is processed independently of other genes.

See ?"batchelor-restrict" for a description of the restrict argument. Specifically, this function will compute the model coefficients using only the specified subset of cells. The regression will then be applied to all cells in each batch.

Value

A SingleCellExperiment object containing the corrected assay. This contains corrected log-expression values for each gene (row) in each cell (column) in each batch. A batch field is present in the column data, specifying the batch of origin for each cell.

Cells in the output object are always ordered in the same manner as supplied in .... For a single input object, cells will be reported in the same order as they are arranged in that object. In cases with multiple input objects, the cell identities are simply concatenated from successive objects, i.e., all cells from the first object (in their provided order), then all cells from the second object, and so on.

Author(s)

Aaron Lun

See Also

rescaleBatches, for another approach to regressing out the batch effect.

Examples

means <- 2^rgamma(1000, 2, 1)
A1 <- matrix(rpois(10000, lambda=means), ncol=50) # Batch 1 
A2 <- matrix(rpois(10000, lambda=means*runif(1000, 0, 2)), ncol=50) # Batch 2

B1 <- log2(A1 + 1)
B2 <- log2(A2 + 1)
out <- regressBatches(B1, B2) 


[Package batchelor version 1.4.0 Index]