Sccomp is a generalised method for differential composition and variability analyses.
0.1 Characteristics
- Modelling counts
- Modelling proportionality
- Modelling cell-type specific variability
- Cell-type information sharing for variability shrinkage
- Testing differential variability
- Probabilistic outlier identification
- Cross-dataset learning (hyperpriors)
1 Installation
Bioconductor
if (!requireNamespace("BiocManager")) install.packages("BiocManager")
BiocManager::install("sccomp")
GitHub
2 Analysis
sccomp can model changes in composition and variability. The formula for variability is usually either ~ 1, which assumes that the cell-group variability is independent of any covariate, or ~ factor_of_interest, which assumes that the variability depends on the factor of interest only. The variability model must be a subset of the model for composition.
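The subset requirement can be checked before fitting. The sketch below uses only base R; the helper name is_variability_subset and the batch covariate are hypothetical and are not part of sccomp.

```r
# Hypothetical helper (not part of sccomp): returns TRUE when every term of
# the variability formula also appears in the composition formula.
is_variability_subset <- function(formula_composition, formula_variability) {
  terms_composition <- attr(terms(formula_composition), "term.labels")
  terms_variability <- attr(terms(formula_variability), "term.labels")
  all(terms_variability %in% terms_composition)
}

# `batch` is an illustrative covariate name, not taken from the example data.
subset_ok <- is_variability_subset(~ type + batch, ~ type)    # TRUE
intercept_ok <- is_variability_subset(~ type, ~ 1)            # TRUE: ~ 1 has no terms
subset_bad <- is_variability_subset(~ type, ~ batch)          # FALSE
```

An intercept-only variability formula has no terms, so it passes the check for any composition formula.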
2.1 Binary factor
In the output table, estimate columns with the prefix c_ refer to composition, and those with the prefix v_ refer to variability (when formula_variability is set).
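As an illustration of the prefix convention, the sketch below separates the two groups of columns in a mock table with base R. The table is invented for the example: c_pH0 and v_pH0 are mentioned later in this document, while c_effect and v_effect are assumed names and all values are made up.

```r
# Mock result table (not real sccomp output); values are illustrative only.
result_toy <- data.frame(
  cell_group = c("B", "T"),
  c_effect = c(0.8, -0.1),   # assumed column name
  c_pH0 = c(0.01, 0.90),
  v_effect = c(0.2, 0.0),    # assumed column name
  v_pH0 = c(0.40, 0.95)
)

# Select columns by prefix: c_ for composition, v_ for variability.
composition_columns <- grep("^c_", names(result_toy), value = TRUE)
variability_columns <- grep("^v_", names(result_toy), value = TRUE)
```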
2.1.1 From Seurat, SingleCellExperiment, metadata objects
2.1.2 From counts
library(sccomp)

sccomp_result =
  counts_obj |>
  sccomp_estimate(
    formula_composition = ~ type,
    .sample = sample,
    .cell_group = cell_group,
    .count = count,
    bimodal_mean_variability_association = TRUE,
    cores = 1
  ) |>
  sccomp_remove_outliers(cores = 1) |> # Optional
  sccomp_test()
## Chain 1: ------------------------------------------------------------
## Chain 1: EXPERIMENTAL ALGORITHM:
## Chain 1: This procedure has not been thoroughly tested and may be unstable
## Chain 1: or buggy. The interface is subject to change.
## Chain 1: ------------------------------------------------------------
## Chain 1:
## Chain 1:
## Chain 1:
## Chain 1: Gradient evaluation took 0.000596 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 5.96 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Begin eta adaptation.
## Chain 1: Iteration: 1 / 250 [ 0%] (Adaptation)
## Chain 1: Iteration: 50 / 250 [ 20%] (Adaptation)
## Chain 1: Iteration: 100 / 250 [ 40%] (Adaptation)
## Chain 1: Iteration: 150 / 250 [ 60%] (Adaptation)
## Chain 1: Iteration: 200 / 250 [ 80%] (Adaptation)
## Chain 1: Success! Found best value [eta = 1] earlier than expected.
## Chain 1:
## Chain 1: Begin stochastic gradient ascent.
## Chain 1: iter ELBO delta_ELBO_mean delta_ELBO_med notes
## Chain 1: 100 -4272.552 1.000 1.000
## Chain 1: 200 -3742.719 0.571 1.000
## Chain 1: 300 -3709.873 0.383 0.142
## Chain 1: 400 -3705.402 0.288 0.142
## Chain 1: 500 -3704.010 0.230 0.009 MEDIAN ELBO CONVERGED
## Chain 1:
## Chain 1: Drawing a sample of size 1000 from the approximate posterior...
## Chain 1: COMPLETED.
## Chain 1: ------------------------------------------------------------
## Chain 1: EXPERIMENTAL ALGORITHM:
## Chain 1: This procedure has not been thoroughly tested and may be unstable
## Chain 1: or buggy. The interface is subject to change.
## Chain 1: ------------------------------------------------------------
## Chain 1:
## Chain 1:
## Chain 1:
## Chain 1: Gradient evaluation took 0.000607 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 6.07 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Begin eta adaptation.
## Chain 1: Iteration: 1 / 250 [ 0%] (Adaptation)
## Chain 1: Iteration: 50 / 250 [ 20%] (Adaptation)
## Chain 1: Iteration: 100 / 250 [ 40%] (Adaptation)
## Chain 1: Iteration: 150 / 250 [ 60%] (Adaptation)
## Chain 1: Iteration: 200 / 250 [ 80%] (Adaptation)
## Chain 1: Iteration: 250 / 250 [100%] (Adaptation)
## Chain 1: Success! Found best value [eta = 0.1].
## Chain 1:
## Chain 1: Begin stochastic gradient ascent.
## Chain 1: iter ELBO delta_ELBO_mean delta_ELBO_med notes
## Chain 1: 100 -4515.959 1.000 1.000
## Chain 1: 200 -4049.480 0.558 1.000
## Chain 1: 300 -3850.922 0.389 0.115
## Chain 1: 400 -3731.342 0.300 0.115
## Chain 1: 500 -3657.844 0.244 0.052
## Chain 1: 600 -3589.933 0.206 0.052
## Chain 1: 700 -3551.145 0.178 0.032
## Chain 1: 800 -3509.516 0.158 0.032
## Chain 1: 900 -3478.433 0.141 0.020
## Chain 1: 1000 -3457.034 0.128 0.020
## Chain 1: 1100 -3436.835 0.028 0.019
## Chain 1: 1200 -3421.675 0.017 0.012
## Chain 1: 1300 -3409.318 0.012 0.011
## Chain 1: 1400 -3397.661 0.009 0.009 MEAN ELBO CONVERGED MEDIAN ELBO CONVERGED
## Chain 1:
## Chain 1: Drawing a sample of size 1000 from the approximate posterior...
## Chain 1: COMPLETED.
## Chain 1: ------------------------------------------------------------
## Chain 1: EXPERIMENTAL ALGORITHM:
## Chain 1: This procedure has not been thoroughly tested and may be unstable
## Chain 1: or buggy. The interface is subject to change.
## Chain 1: ------------------------------------------------------------
## Chain 1:
## Chain 1:
## Chain 1:
## Chain 1: Gradient evaluation took 0.000558 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 5.58 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Begin eta adaptation.
## Chain 1: Iteration: 1 / 250 [ 0%] (Adaptation)
## Chain 1: Iteration: 50 / 250 [ 20%] (Adaptation)
## Chain 1: Iteration: 100 / 250 [ 40%] (Adaptation)
## Chain 1: Iteration: 150 / 250 [ 60%] (Adaptation)
## Chain 1: Iteration: 200 / 250 [ 80%] (Adaptation)
## Chain 1: Success! Found best value [eta = 1] earlier than expected.
## Chain 1:
## Chain 1: Begin stochastic gradient ascent.
## Chain 1: iter ELBO delta_ELBO_mean delta_ELBO_med notes
## Chain 1: 100 -4058.648 1.000 1.000
## Chain 1: 200 -3722.507 0.545 1.000
## Chain 1: 300 -3713.569 0.364 0.090
## Chain 1: 400 -3705.909 0.274 0.090
## Chain 1: 500 -3703.828 0.219 0.002 MEDIAN ELBO CONVERGED
## Chain 1:
## Chain 1: Drawing a sample of size 1000 from the approximate posterior...
## Chain 1: COMPLETED.
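The counts input used above is a long table with one row per sample and cell group. A minimal sketch of that shape, built with base R, is shown below. The column names follow the call above (sample, type, cell_group, count); the values and the table name counts_toy are made up for illustration and are not the packaged counts_obj.

```r
# Toy long-format counts table: one row per (sample, cell_group) pair.
counts_toy <- data.frame(
  sample = rep(c("s1", "s2", "s3", "s4"), each = 3),
  type = rep(c("healthy", "healthy", "cancer", "cancer"), each = 3),
  cell_group = rep(c("B", "T", "NK"), times = 4),
  count = c(120, 300, 80, 100, 350, 50, 200, 150, 150, 260, 140, 100)
)

# Observed proportion of each cell group within its sample.
sample_totals <- ave(counts_toy$count, counts_toy$sample, FUN = sum)
counts_toy$proportion <- counts_toy$count / sample_totals

# Proportions sum to one within every sample.
proportion_sums <- tapply(counts_toy$proportion, counts_toy$sample, sum)
```

These observed proportions are only a descriptive summary; the model itself works on the counts.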
2.2 Summary plots
A plot of group proportions, faceted by group. The blue box plots represent the posterior predictive check. If the model describes the data adequately, the blue box plots should roughly overlay the black box plots, which represent the observed data. Outliers are coloured in red. A box plot is returned for every (discrete) covariate present in formula_composition. The colour coding represents the significant associations for composition and/or variability.
A plot of estimates of differential composition (c_) on the x-axis and differential variability (v_) on the y-axis. The error bars represent 95% credible intervals. The dashed lines represent the minimal effect that the hypothesis test is based on. An effect is labelled as significant if it exceeds the minimal effect according to the 95% credible interval. Facets represent the covariates in the model.
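One way to read the labelling rule is that the whole 95% credible interval must lie beyond the dashed minimal-effect line on either side. The sketch below encodes that reading in base R. The helper name, the interval values and the threshold of 0.2 are all illustrative assumptions, and this is not necessarily how sccomp computes significance internally.

```r
# Hypothetical helper: TRUE when the whole interval lies beyond the minimal
# effect, either above +minimal_effect or below -minimal_effect.
exceeds_minimal_effect <- function(lower, upper, minimal_effect) {
  lower > minimal_effect | upper < -minimal_effect
}

# Made-up credible intervals for three cell groups.
intervals_toy <- data.frame(
  cell_group = c("B", "T", "NK"),
  lower = c(0.3, -0.2, -0.9),
  upper = c(0.9, 0.4, -0.25)
)

intervals_toy$significant <- exceeds_minimal_effect(
  intervals_toy$lower,
  intervals_toy$upper,
  minimal_effect = 0.2  # illustrative threshold
)
```

Here B and NK are flagged, while the interval for T overlaps the region between the two dashed lines.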
We can plot the relationship between abundance and variability. As we can see below, they are positively correlated. You can also see that this relationship is bimodal for single-cell RNA sequencing data. sccomp models this relationship to obtain a shrinkage effect on the estimates of both the abundance and the variability. This shrinkage is adaptive as it is modelled jointly, thanks to Bayesian inference.
2.3 Contrasts
seurat_obj |>
  sccomp_estimate(
    formula_composition = ~ 0 + type,
    .sample = sample,
    .cell_group = cell_group,
    bimodal_mean_variability_association = TRUE,
    cores = 1
  ) |>
  sccomp_test(contrasts = c("typecancer - typehealthy", "typehealthy - typecancer"))
## Chain 1: ------------------------------------------------------------
## Chain 1: EXPERIMENTAL ALGORITHM:
## Chain 1: This procedure has not been thoroughly tested and may be unstable
## Chain 1: or buggy. The interface is subject to change.
## Chain 1: ------------------------------------------------------------
## Chain 1:
## Chain 1:
## Chain 1:
## Chain 1: Gradient evaluation took 0.000484 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 4.84 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Begin eta adaptation.
## Chain 1: Iteration: 1 / 250 [ 0%] (Adaptation)
## Chain 1: Iteration: 50 / 250 [ 20%] (Adaptation)
## Chain 1: Iteration: 100 / 250 [ 40%] (Adaptation)
## Chain 1: Iteration: 150 / 250 [ 60%] (Adaptation)
## Chain 1: Iteration: 200 / 250 [ 80%] (Adaptation)
## Chain 1: Success! Found best value [eta = 1] earlier than expected.
## Chain 1:
## Chain 1: Begin stochastic gradient ascent.
## Chain 1: iter ELBO delta_ELBO_mean delta_ELBO_med notes
## Chain 1: 100 -3439.210 1.000 1.000
## Chain 1: 200 -3165.372 0.543 1.000
## Chain 1: 300 -3164.012 0.362 0.087
## Chain 1: 400 -3157.586 0.272 0.087
## Chain 1: 500 -3157.611 0.218 0.002 MEDIAN ELBO CONVERGED
## Chain 1:
## Chain 1: Drawing a sample of size 1000 from the approximate posterior...
## Chain 1: COMPLETED.
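The contrast names used above combine the factor name and its levels, which is how R names design-matrix columns for a formula without intercept. The base R sketch below shows where those names come from and that the two contrasts are mirror images. The design table and coefficient values are made up for illustration.

```r
# Toy design with the two levels used in the contrasts above.
design_toy <- data.frame(
  type = factor(c("healthy", "healthy", "cancer", "cancer"))
)

# Without an intercept (~ 0 + type), each level gets its own column.
design_matrix <- model.matrix(~ 0 + type, data = design_toy)
design_columns <- colnames(design_matrix)

# Illustrative coefficients for one cell group (made-up values).
beta <- c(typecancer = 1.2, typehealthy = 0.5)

# A contrast is a difference of coefficients; reversing it flips the sign.
cancer_vs_healthy <- beta[["typecancer"]] - beta[["typehealthy"]]
healthy_vs_cancer <- beta[["typehealthy"]] - beta[["typecancer"]]
```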
2.4 Categorical factor (e.g. Bayesian ANOVA)
This is achieved through model comparison with loo. In the following example, the model with factor association fits the data better than the baseline model with no factor association. For comparisons, check_outliers must be set to FALSE, because leave-one-out must work with the same amount of data for both models, and outlier elimination does not guarantee this.
If elpd_diff is more than 5 se_diff away from zero, we can be confident that one model is better than the other. In this case, -79.9 / 11.5 = -6.9, therefore we can conclude that model one, the one with factor association, is better than model two.
library(loo)

# Fit first model
model_with_factor_association =
  seurat_obj |>
  sccomp_estimate(
    formula_composition = ~ type,
    .sample = sample,
    .cell_group = cell_group,
    bimodal_mean_variability_association = TRUE,
    cores = 1,
    enable_loo = TRUE
  )

# Fit second model
model_without_association =
  seurat_obj |>
  sccomp_estimate(
    formula_composition = ~ 1,
    .sample = sample,
    .cell_group = cell_group,
    bimodal_mean_variability_association = TRUE,
    cores = 1,
    enable_loo = TRUE
  )

# Compare models
loo_compare(
  model_with_factor_association |> attr("fit") |> loo(),
  model_without_association |> attr("fit") |> loo()
)
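The decision rule quoted in this section can be written out as a short calculation. The sketch below uses the two numbers reported in the text (-79.9 and 11.5); the variable names and the cutoff variable are only for illustration.

```r
# Values quoted in the text for the model comparison.
elpd_diff <- -79.9
se_diff <- 11.5

# Distance of elpd_diff from zero, in units of its standard error.
standardised_difference <- elpd_diff / se_diff

# Rule of thumb from the text: more than 5 se_diff away from zero.
cutoff <- 5
clearly_different <- abs(standardised_difference) > cutoff
```

The ratio is about -6.9, which is beyond the cutoff of 5 in absolute value.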
2.5 Differential variability, binary factor
We can also model the cell-group variability as dependent on the type, and so test for differences in variability.
res =
  seurat_obj |>
  sccomp_estimate(
    formula_composition = ~ type,
    formula_variability = ~ type,
    .sample = sample,
    .cell_group = cell_group,
    bimodal_mean_variability_association = TRUE,
    cores = 1
  )
## Chain 1: ------------------------------------------------------------
## Chain 1: EXPERIMENTAL ALGORITHM:
## Chain 1: This procedure has not been thoroughly tested and may be unstable
## Chain 1: or buggy. The interface is subject to change.
## Chain 1: ------------------------------------------------------------
## Chain 1:
## Chain 1:
## Chain 1:
## Chain 1: Gradient evaluation took 0.000574 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 5.74 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Begin eta adaptation.
## Chain 1: Iteration: 1 / 250 [ 0%] (Adaptation)
## Chain 1: Iteration: 50 / 250 [ 20%] (Adaptation)
## Chain 1: Iteration: 100 / 250 [ 40%] (Adaptation)
## Chain 1: Iteration: 150 / 250 [ 60%] (Adaptation)
## Chain 1: Iteration: 200 / 250 [ 80%] (Adaptation)
## Chain 1: Success! Found best value [eta = 1] earlier than expected.
## Chain 1:
## Chain 1: Begin stochastic gradient ascent.
## Chain 1: iter ELBO delta_ELBO_mean delta_ELBO_med notes
## Chain 1: 100 785.784 1.000 1.000
## Chain 1: 200 -164.750 3.385 5.770
## Chain 1: 300 -1710.301 2.558 1.000
## Chain 1: 400 -1451.632 1.963 1.000
## Chain 1: 500 -1808.996 1.610 0.904
## Chain 1: 600 -1251.184 1.416 0.904
## Chain 1: 700 621.878 1.644 0.904
## Chain 1: 800 383.331 1.516 0.904
## Chain 1: 900 -3072.361 1.473 0.904
## Chain 1: 1000 -3454.178 1.336 0.904
## Chain 1: 1100 -3185.728 1.245 0.622 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 1200 -3173.274 0.668 0.446 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 1300 -3165.925 0.578 0.198 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 1400 -3164.576 0.560 0.198 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 1500 -3166.030 0.541 0.111 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 1600 -3164.826 0.496 0.084
## Chain 1: 1700 -3165.258 0.195 0.004 MEDIAN ELBO CONVERGED
## Chain 1: Informational Message: The ELBO at a previous iteration is larger than the ELBO upon convergence!
## Chain 1: This variational approximation may not have converged to a good optimum.
## Chain 1:
## Chain 1: Drawing a sample of size 1000 from the approximate posterior...
## Chain 1: COMPLETED.
3 Suggested settings
3.1 For single-cell RNA sequencing
We recommend setting bimodal_mean_variability_association = TRUE. The bimodality of the mean-variability association can be confirmed from the plots$credible_intervals_2D (see below).
3.2 For CyTOF and microbiome data
We recommend setting bimodal_mean_variability_association = FALSE (default).
3.3 Visualisation of the MCMC chains from the posterior distribution
It is possible to directly evaluate the posterior distribution. In this example, we plot the Monte Carlo chain for the slope parameter of the first cell type. We can see that it has converged and is negative with probability 1.
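A statement such as "negative with probability 1" corresponds to the fraction of posterior draws that fall below zero. The sketch below shows the calculation on simulated draws. The draws are generated with rnorm purely for illustration and do not come from a fitted sccomp model.

```r
# Simulated stand-in for posterior draws of a slope parameter
# (illustrative only; real draws would come from the fitted model).
set.seed(1)
slope_draws <- rnorm(1000, mean = -2, sd = 0.2)

# Posterior probability that the slope is negative:
# the share of draws below zero.
prob_negative <- mean(slope_draws < 0)
```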
Plot 1D significance plot
Plot 2D significance plot. Data points are cell groups. Error bars are the 95% credible intervals. The dashed lines represent the default threshold fold change for which the probabilities (c_pH0, v_pH0) are calculated. A pH0 of 0 represents the rejection of the null hypothesis that no effect is observed.
This plot is provided only if differential variability has been tested. The differential variability estimates are reliable only if the linear association between mean and variability for (intercept) (left-hand side facet) is satisfied. A scatterplot (besides the Intercept) is provided for each category of interest. For each category of interest, the composition and variability effects should be generally uncorrelated.
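To build intuition for the pH0 probabilities and the threshold, one simplified reading is that pH0 is the share of posterior draws whose effect does not exceed the threshold in absolute value. The sketch below follows that reading on simulated draws. The helper name, the threshold of 0.1 and the draws are assumptions for illustration, and sccomp may compute these probabilities differently.

```r
# Hypothetical helper: share of draws inside [-threshold, threshold].
prob_within_threshold <- function(draws, threshold) {
  mean(abs(draws) <= threshold)
}

set.seed(42)
threshold <- 0.1  # illustrative value

# Simulated draws for a clear effect and for a negligible effect.
strong_effect_draws <- rnorm(1000, mean = 1, sd = 0.1)
null_effect_draws <- rnorm(1000, mean = 0, sd = 0.02)

p_h0_strong <- prob_within_threshold(strong_effect_draws, threshold)
p_h0_null <- prob_within_threshold(null_effect_draws, threshold)
```

Under this reading, a clear effect gives a value near 0 and a negligible effect gives a value near 1.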
## R version 4.4.0 beta (2024-04-15 r86425)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rstan_2.32.6 StanHeaders_2.32.7 tidyr_1.3.1 forcats_1.0.0
## [5] ggplot2_3.5.1 sccomp_1.8.0 dplyr_1.1.4
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 farver_2.1.1
## [3] loo_2.7.0 fastmap_1.1.1
## [5] SingleCellExperiment_1.26.0 digest_0.6.35
## [7] dotCall64_1.1-1 lifecycle_1.0.4
## [9] SeuratObject_5.0.1 magrittr_2.0.3
## [11] compiler_4.4.0 rlang_1.1.3
## [13] sass_0.4.9 tools_4.4.0
## [15] utf8_1.2.4 yaml_2.3.8
## [17] knitr_1.46 labeling_0.4.3
## [19] S4Arrays_1.4.0 curl_5.2.1
## [21] sp_2.1-4 pkgbuild_1.4.4
## [23] DelayedArray_0.30.0 RColorBrewer_1.1-3
## [25] abind_1.4-5 withr_3.0.0
## [27] purrr_1.0.2 BiocGenerics_0.50.0
## [29] grid_4.4.0 stats4_4.4.0
## [31] fansi_1.0.6 colorspace_2.1-0
## [33] future_1.33.2 inline_0.3.19
## [35] progressr_0.14.0 globals_0.16.3
## [37] scales_1.3.0 SummarizedExperiment_1.34.0
## [39] cli_3.6.2 rmarkdown_2.26
## [41] crayon_1.5.2 generics_0.1.3
## [43] RcppParallel_5.1.7 future.apply_1.11.2
## [45] httr_1.4.7 tzdb_0.4.0
## [47] cachem_1.0.8 stringr_1.5.1
## [49] zlibbioc_1.50.0 parallel_4.4.0
## [51] XVector_0.44.0 matrixStats_1.3.0
## [53] vctrs_0.6.5 V8_4.4.2
## [55] boot_1.3-30 Matrix_1.7-0
## [57] jsonlite_1.8.8 IRanges_2.38.0
## [59] hms_1.1.3 patchwork_1.2.0
## [61] S4Vectors_0.42.0 ggrepel_0.9.5
## [63] listenv_0.9.1 jquerylib_0.1.4
## [65] glue_1.7.0 parallelly_1.37.1
## [67] spam_2.10-0 codetools_0.2-20
## [69] stringi_1.8.3 gtable_0.3.5
## [71] QuickJSR_1.1.3 GenomeInfoDb_1.40.0
## [73] GenomicRanges_1.56.0 UCSC.utils_1.0.0
## [75] munsell_0.5.1 tibble_3.2.1
## [77] pillar_1.9.0 htmltools_0.5.8.1
## [79] GenomeInfoDbData_1.2.12 R6_2.5.1
## [81] evaluate_0.23 lattice_0.22-6
## [83] Biobase_2.64.0 highr_0.10
## [85] readr_2.1.5 bslib_0.7.0
## [87] rstantools_2.4.0 Rcpp_1.0.12
## [89] gridExtra_2.3 SparseArray_1.4.0
## [91] xfun_0.43 MatrixGenerics_1.16.0
## [93] prettydoc_0.4.1 pkgconfig_2.0.3