downsampleBatches {DropletUtils}        R Documentation

Downsample batches to equal coverage

Description

A convenience function to downsample all batches so that the average per-cell total count is the same across batches. This mimics the downsampling functionality of cellranger aggr.

Usage

downsampleBatches(..., batch = NULL, method = c("median", "mean",
  "geomean"), bycol = TRUE)

Arguments

...

Two or more count matrices, where each matrix represents data from a separate batch. Each matrix should already have been filtered to only contain cells, e.g., using emptyDrops.

Alternatively, a single filtered count matrix containing cells from all batches, in which case batch should be specified.

batch

A factor of length equal to the number of columns in the sole entry of ..., specifying the batch of origin for each column of the matrix. Ignored if there are multiple entries in ....

method

String indicating how the average per-cell total count of each batch should be computed: "median" (the default), "mean" or "geomean". The geometric mean is computed with a pseudo-count of 1.

bycol

A logical scalar indicating whether downsampling should be performed on a column-by-column basis; see ?downsampleMatrix for more details.

Details

Downsampling batches with strong differences in sequencing coverage can make it easier to compare them to each other, reducing the burden on the normalization and batch correction steps. This is especially true when the number of cells cannot be easily controlled across batches, resulting in large differences in per-cell coverage even when the total sequencing depth is the same.
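
As a rough illustration of the idea, the sketch below computes a per-batch average of the per-cell totals and derives the proportion of counts each batch would need to retain for the averages to match. It uses base R only and is not the package's implementation: the matrices batch1 and batch2, the helper average_total and the choice to scale every batch down to the smallest average are all assumptions made for the example.

```r
# Illustrative sketch only: base R, no DropletUtils required.
# Assumption: every batch is scaled down towards the smallest per-batch average.

set.seed(100)

# Two hypothetical batches (genes x cells); the second has roughly half the coverage.
batch1 <- matrix(rpois(2000, lambda = 10), nrow = 100)
batch2 <- matrix(rpois(3000, lambda = 5), nrow = 100)

# Hypothetical helper covering the three 'method' options.
# "geomean" adds a pseudo-count of 1 before taking logs, as documented;
# the pseudo-count is not subtracted afterwards in this sketch.
average_total <- function(totals, method = c("median", "mean", "geomean")) {
    method <- match.arg(method)
    switch(method,
        median = median(totals),
        mean = mean(totals),
        geomean = exp(mean(log(totals + 1)))
    )
}

# Per-cell total counts, then one average per batch.
totals <- list(batch1 = colSums(batch1), batch2 = colSums(batch2))
averages <- vapply(totals, average_total, numeric(1), method = "median")

# Proportion of counts to keep in each batch so that the averages match.
props <- min(averages) / averages
props
```

Here the lower-coverage batch gets a proportion of 1 (it is left alone) and the deeper batch gets a proportion of about one half. Proportions of this kind are what one would pass as the prop argument of downsampleMatrix to carry out the actual downsampling.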

Value

If ... contains two or more matrices, a List of downsampled matrices is returned.

Otherwise, if ... contains only one matrix, the downsampled matrix is returned directly.

Author(s)

Aaron Lun

See Also

downsampleMatrix, which is called by this function under the hood.

Examples

library(DropletUtils)

# Mocking up some 10X Genomics output.
example(write10xCounts, echo=FALSE)
sce10x <- read10xCounts(tmpdir)

# Making another copy with fewer counts:
sce10x2 <- sce10x
counts(sce10x2) <- round(counts(sce10x2)/2)

# Downsampling for multiple batches in a single matrix:
combined <- cbind(sce10x, sce10x2)
batches <- rep(1:2, c(ncol(sce10x), ncol(sce10x2)))
downsampled <- downsampleBatches(counts(combined), batch=batches)
downsampled[1:10,1:10]

# Downsampling for multiple matrices:
downsampled2 <- downsampleBatches(counts(sce10x), counts(sce10x2))
downsampled2


[Package DropletUtils version 1.6.1 Index]