regressBatches {batchelor}    R Documentation

Regress out batch effects

Description

Fit a linear model to regress out uninteresting factors of variation.

Usage

regressBatches(
  ...,
  batch = NULL,
  restrict = NULL,
  subset.row = NULL,
  assay.type = "logcounts"
)

Arguments

...

Two or more log-expression matrices where genes correspond to rows and cells correspond to columns. Each matrix should contain the same number of rows, corresponding to the same genes (in the same order).

Alternatively, one or more SingleCellExperiment objects can be supplied containing a log-expression matrix in the assay.type assay. The same restrictions described above for matrix inputs apply here.

If multiple objects are supplied, each object is assumed to contain all and only cells from a single batch. Objects of different types can be mixed together. If a single object is supplied, batch should also be specified.

batch

A factor specifying the batch of origin for all cells when only a single object is supplied in .... This is ignored if multiple objects are present.

restrict

A list of length equal to the number of objects in .... Each entry of the list corresponds to one batch and specifies the cells to use when computing the correction.

subset.row

A vector specifying which features to use for correction.

assay.type

A string or integer scalar specifying the assay containing the log-expression values, if SingleCellExperiment objects are present in ....
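The two input styles described above can be sketched as follows. The simulated values and the variable names (B1, B2, combined, b) are assumptions made purely for illustration; they are not part of the package.

```r
library(batchelor)

set.seed(100)
# Simulated log-expression values: 200 genes (rows), 30 cells (columns) per batch.
B1 <- matrix(rnorm(6000, mean=5), nrow=200)
B2 <- matrix(rnorm(6000, mean=6), nrow=200)

# Style 1: one object per batch, so 'batch' is not needed.
out.multi <- regressBatches(B1, B2)

# Style 2: a single object, with 'batch' giving the batch of origin of each column.
combined <- cbind(B1, B2)
b <- factor(rep(c("first", "second"), c(ncol(B1), ncol(B2))))
out.single <- regressBatches(combined, batch=b)

# Both outputs should cover the same genes and cells.
dim(out.multi)
dim(out.single)
```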

Details

This function fits a linear model to the log-expression values for each gene and returns the residuals. The model is parameterized as a one-way layout with the batch of origin, so the residuals represent the expression values after correcting for the batch effect.
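To make the one-way layout concrete, here is a minimal base R sketch. It is not the package's internal code, and the data and variable names are hypothetical. It shows that, for this model, the residuals for each gene are simply its values minus its mean within each batch.

```r
set.seed(1)
# Hypothetical data: 5 genes (rows) by 8 cells (columns), two batches of 4 cells.
y <- matrix(rnorm(40, mean=5), nrow=5)
batch <- factor(rep(c("a", "b"), each=4))

# One-way layout: one coefficient (the batch mean) per batch.
design <- model.matrix(~0 + batch)

# Fit the model to every gene at once. lm.fit() expects observations in rows,
# so the gene-by-cell matrix is transposed before fitting and back afterwards.
fit <- lm.fit(design, t(y))
resid.lm <- t(fit$residuals)

# For this layout, the residuals equal each gene's values minus its batch mean.
resid.manual <- y
for (b in levels(batch)) {
    keep <- batch == b
    resid.manual[, keep] <- y[, keep] - rowMeans(y[, keep])
}

all.equal(resid.lm, resid.manual, check.attributes=FALSE)
```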

The novelty of this function is that it returns a ResidualMatrix as the "corrected" assay. This avoids explicitly computing the residuals, which would result in a loss of sparsity or similar problems. Rather, the residuals are either computed as needed or are never explicitly computed at all (e.g., during matrix multiplication).
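The idea of never forming the residuals during matrix multiplication can be illustrated in base R. This is a sketch of the underlying algebra only, under the assumption of a one-way batch design, and does not reproduce how ResidualMatrix is implemented; all names are hypothetical. If Q is an orthonormal basis for the design matrix, the residual matrix is x - (x Q) Q', so multiplying it by a vector w is the same as multiplying the original x by the projected vector w - Q (Q' w).

```r
set.seed(2)
# Hypothetical mostly-zero matrix: 20 genes (rows) by 100 cells (columns).
x <- matrix(rpois(2000, lambda=0.1), nrow=20)
batch <- factor(rep(c("a", "b"), each=50))

# Orthonormal basis for the column space of the one-way design matrix.
Q <- qr.Q(qr(model.matrix(~0 + batch)))

w <- rnorm(ncol(x))  # arbitrary vector with one entry per cell

# Explicit route: form the dense residual matrix, then multiply.
resid.dense <- x - (x %*% Q) %*% t(Q)
explicit <- resid.dense %*% w

# Delayed route: project the vector instead, then multiply the original matrix.
w.proj <- w - Q %*% crossprod(Q, w)
delayed <- x %*% w.proj

all.equal(explicit, delayed)
```

In the delayed route the original matrix is only ever multiplied as-is, so a sparse representation of x could be kept sparse throughout.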

All genes are used with the default setting of subset.row=NULL. Users can set subset.row to subset the inputs, though this is purely for convenience as each gene is processed independently of other genes.

See ?"batchelor-restrict" for a description of the restrict argument. Specifically, this function will compute the model coefficients using only the specified subset of cells. The regression will then be applied to all cells in each batch.
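The effect of restriction can be sketched in base R for the one-way layout, where each batch coefficient is a per-gene mean. This is an illustration under that assumption rather than the package's own code, and the data, the index vector use and the other names are hypothetical.

```r
set.seed(3)
# Hypothetical data: 5 genes by 10 cells, two batches of 5 cells each.
y <- matrix(rnorm(50, mean=5), nrow=5)
batch <- factor(rep(c("a", "b"), each=5))

# Hypothetical restriction: only the first 3 cells of each batch inform the fit.
use <- c(1:3, 6:8)

corrected <- y
for (b in levels(batch)) {
    in.batch <- which(batch == b)
    # The batch coefficient (mean) is estimated from the restricted cells only...
    coef.b <- rowMeans(y[, intersect(in.batch, use), drop=FALSE])
    # ...but is subtracted from every cell in the batch.
    corrected[, in.batch] <- y[, in.batch] - coef.b
}

# The restricted cells are centred at zero within each batch; the others need not be.
rowMeans(corrected[, 1:3])
rowMeans(corrected[, 4:5])
```

With the function itself, the equivalent request for two input objects would pass a list with one subsetting vector per batch, for example restrict=list(1:3, 1:3).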

Value

A SingleCellExperiment object containing the corrected assay. This contains corrected log-expression values for each gene (row) in each cell (column) in each batch. A batch field is present in the column data, specifying the batch of origin for each cell.

Cells in the output object are always ordered in the same manner as supplied in .... For a single input object, cells will be reported in the same order as they are arranged in that object. In cases with multiple input objects, the cell identities are simply concatenated from successive objects, i.e., all cells from the first object (in their provided order), then all cells from the second object, and so on.
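As a brief illustration of the output layout, the sketch below uses two simulated batches of deliberately different sizes so that the concatenation order is easy to see. The data are made up for this example, and loading SummarizedExperiment for the assay() accessor is an assumption about the session setup.

```r
library(batchelor)
library(SummarizedExperiment)  # for assay()

set.seed(4)
# Simulated log-expression values for 100 genes.
B1 <- matrix(rnorm(300, mean=5), nrow=100)  # first batch: 3 cells
B2 <- matrix(rnorm(500, mean=6), nrow=100)  # second batch: 5 cells

out <- regressBatches(B1, B2)

# Cells from the first object come first, followed by those from the second,
# so the batch labels form one run of 3 followed by one run of 5.
batch.runs <- rle(as.character(out$batch))
batch.runs$lengths

# The corrected values are held in the "corrected" assay.
corrected <- assay(out, "corrected")
dim(corrected)
```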

Author(s)

Aaron Lun

See Also

rescaleBatches, for another approach to regressing out the batch effect.

Examples

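library(batchelor)

# Simulate counts for 1000 genes (rows) and 50 cells (columns) per batch,
# so that each gene keeps the same mean across all cells of a batch.
means <- 2^rgamma(1000, 2, 1)
A1 <- matrix(rpois(50000, lambda=means), ncol=50) # Batch 1
A2 <- matrix(rpois(50000, lambda=means*runif(1000, 0, 2)), ncol=50) # Batch 2

# Log-transform the counts, then regress out the batch effect.
B1 <- log2(A1 + 1)
B2 <- log2(A2 + 1)
out <- regressBatches(B1, B2)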


[Package batchelor version 1.2.4 Index]