aggregateData {muscat} | R Documentation |
...
aggregateData( x, assay = NULL, by = c("cluster_id", "sample_id"), fun = c("sum", "mean", "median"), scale = FALSE )
x |
|
assay |
character string specifying the assay slot to use as
input data. Defaults to the 1st available ( |
by |
character vector specifying which
|
fun |
a character string. Specifies the function to use as summary statistic. |
scale |
logical. Should pseudo-bulks be scaled with the effective library size & multiplied by 1M? |
If length(by) == 2
, each sheet (assay
) contains
pseudobulks for each of by[1]
, e.g., for each cluster when
by = "cluster_id"
. Rows correspond to genes, columns to
by[2]
, e.g., samples when by = "sample_id"
.
If length(by) == 1
, the returned SCE will contain only
a single assay
with rows = genes and colums = by
.
Aggregation parameters (assay, by, fun, scaled
) are stored in
metadata()$agg_pars
, and the number of cells that were aggregated
are accessible in metadata()$n_cells
.
Helena L Crowell & Mark D Robinson
Crowell, HL, Soneson, C, Germain, P-L, Calini, D, Collin, L, Raposo, C, Malhotra, D & Robinson, MD: On the discovery of population-specific state transitions from multi-sample multi-condition single-cell RNA sequencing data. bioRxiv 713412 (2018). doi: https://doi.org/10.1101/713412
data(sce) library(SingleCellExperiment) # pseudobulk counts by cluster-sample pb <- aggregateData(sce) assayNames(sce) # one sheet per cluster head(assay(sce)) # n_genes x n_samples # scaled CPM assays(sce)$cpm <- edgeR::cpm(assay(sce)) pb <- aggregateData(sce, assay = "cpm", scale = TRUE) head(assay(pb)) # aggregate by cluster only pb <- aggregateData(sce, by = "cluster_id") length(assays(pb)) # single assay head(assay(pb)) # n_genes x n_clusters