prepSim {muscat}R Documentation

SCE preparation for simData

Description

prepSim prepares an input SCE for simulation with muscat's simData function by

  1. basic filtering of genes and cells

  2. (optional) filtering of subpopulation-sample instances

  3. estimation of cell (library sizes) and gene parameters (dispersions and sample-specific means), respectively.

Usage

prepSim(
  x,
  min_count = 1,
  min_cells = 10,
  min_genes = 100,
  min_size = 100,
  group_keep = NULL,
  verbose = TRUE
)

Arguments

x

a SingleCellExperiment.

min_count, min_cells

used for filtering of genes; only genes with a count > min_count in >= min_cells will be retained.

min_genes

used for filtering cells; only cells with a count > 0 in >= min_genes will be retained.

min_size

used for filtering subpopulation-sample combinations; only instances with >= min_size cells will be retained. Specifying min_size = NULL skips this step.

group_keep

character string; if nlevels(x$group_id) > 1, specifies which group of samples to keep (see details). The default NULL retains samples from levels(x$group_id)[1]; otherwise, if 'colData(x)$group_id' is not specified, all samples will be kept.

verbose

logical; should information on progress be reported?

Details

For each gene g, prepSim fits a model to estimate sample-specific means β_g^s, for each sample s, and dispersion parameters φ_g using edgeR's estimateDisp function with default parameters. Thus, the reference count data is modeled as NB distributed:

Y_{gc} \sim NB(μ_{gc}, φ_g)

for gene g and cell c, where the mean μ_{gc} = \exp(β_{g}^{s(c)}) \cdot λ_c. Here, β_{g}^{s(c)} is the relative abundance of gene g in sample s(c), λ_c is the library size (total number of counts), and φ_g is the dispersion.

Value

a SingleCellExperiment containing, for each cell, library size (colData(x)$offset) and, for each gene, dispersion and sample-specific mean estimates (rowData(x)$dispersion and $beta.sample_id, respectively).

Author(s)

Helena L Crowell

References

Crowell, HL, Soneson, C, Germain, P-L, Calini, D, Collin, L, Raposo, C, Malhotra, D & Robinson, MD: On the discovery of population-specific state transitions from multi-sample multi-condition single-cell RNA sequencing data. bioRxiv 713412 (2018). doi: https://doi.org/10.1101/713412

Examples

# estimate simulation parameters
data(example_sce)
ref <- prepSim(example_sce)

# tabulate number of genes/cells before vs. after
ns <- cbind(
  before = dim(example_sce), 
  after = dim(ref)) 
rownames(ns) <- c("#genes", "#cells")
ns

library(SingleCellExperiment)
head(rowData(ref)) # gene parameters
head(colData(ref)) # cell parameters


[Package muscat version 1.8.2 Index]