This package implements alternative inference methods for BASiCS. The original package uses adaptive Metropolis within Gibbs sampling, while BASiCStan uses Stan to enable the use of maximum a posteriori estimation, variational inference, and Hamiltonian Monte Carlo. These each have advantages for different use cases.
To use BASiCStan
, we can first use BASICS_MockSCE()
to generate an toy example dataset. We will also set a seed for reproducibility.
suppressPackageStartupMessages(library("BASiCStan"))
set.seed(42)
BASiCS_MockSCE() sce <-
The interface for running MCMC using BASiCS and using alternative inference methods using Stan is very similar. It is worth noting that the joint prior between mean and over-dispersion parameters, corresponding to Regression = TRUE
, is the only supported mode in BASiCStan()
.
BASiCS_MCMC(
amwg_fit <-
sce,N = 200,
Thin = 10,
Burn = 100,
Regression = TRUE
)#> altExp 'spike-ins' is assumed to contain spike-in genes.
#> see help(altExp) for details.
#> Running with spikes BASiCS sampler (regression case) ...
#> -------------------------------------------------------------
#> MCMC running time
#> -------------------------------------------------------------
#> user: 0.341
#> system: 0.003
#> elapsed: 0.344
#> -------------------------------------------------------------
#> Output
#> -------------------------------------------------------------
#> -------------------------------------------------------------
#> BASiCS version 2.20.0 :
#> vertical integration (spikes case)
#> -------------------------------------------------------------
BASiCStan(sce, Method = "sampling", iter = 10)
stan_fit <-#> Warning: There were 5 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
The output of BASiCStan()
is an object of class BASiCS_Chain
, similar to the output of BASiCS_MCMC()
. Therefore, you could use these as you would an object generated using Metropolis within Gibbs sampling. For example, we can plot the relationship between mean and over-dispersion estimated using the joint regression prior:
BASiCS_ShowFit(amwg_fit)
BASiCS_ShowFit(stan_fit)
Using Stan has advantages outside of the variety of inference engines available. By returning a Stan object that we can later convert to a BASiCS_Chain
object, we can leverage an even broader set of functionality. For example, Stan has the ability to easily generate MCMC diagnostics using simple functions. For example, summary()
outputs a number of useful per-chain and across-chain diagnostics:
BASiCStan(sce, ReturnBASiCS = FALSE, Method = "sampling", iter = 10)
stan_obj <-#>
#> SAMPLING FOR MODEL 'basics_regression' NOW (CHAIN 1).
#> Chain 1:
#> Chain 1: Gradient evaluation took 0.002996 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 29.96 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1:
#> Chain 1:
#> Chain 1: WARNING: No variance estimation is
#> Chain 1: performed for num_warmup < 20
#> Chain 1:
#> Chain 1: Iteration: 1 / 10 [ 10%] (Warmup)
#> Chain 1: Iteration: 2 / 10 [ 20%] (Warmup)
#> Chain 1: Iteration: 3 / 10 [ 30%] (Warmup)
#> Chain 1: Iteration: 4 / 10 [ 40%] (Warmup)
#> Chain 1: Iteration: 5 / 10 [ 50%] (Warmup)
#> Chain 1: Iteration: 6 / 10 [ 60%] (Sampling)
#> Chain 1: Iteration: 7 / 10 [ 70%] (Sampling)
#> Chain 1: Iteration: 8 / 10 [ 80%] (Sampling)
#> Chain 1: Iteration: 9 / 10 [ 90%] (Sampling)
#> Chain 1: Iteration: 10 / 10 [100%] (Sampling)
#> Chain 1:
#> Chain 1: Elapsed Time: 0.046 seconds (Warm-up)
#> Chain 1: 0.029 seconds (Sampling)
#> Chain 1: 0.075 seconds (Total)
#> Chain 1:
#> Warning: There were 5 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
head(summary(stan_obj)$summary)
#> mean se_mean sd 2.5% 25% 50% 75% 97.5%
#> log_mu[1] 3.098420 NaN 0 3.098420 3.098420 3.098420 3.098420 3.098420
#> log_mu[2] 1.058488 NaN 0 1.058488 1.058488 1.058488 1.058488 1.058488
#> log_mu[3] 2.107672 NaN 0 2.107672 2.107672 2.107672 2.107672 2.107672
#> log_mu[4] 2.245089 NaN 0 2.245089 2.245089 2.245089 2.245089 2.245089
#> log_mu[5] 2.002693 NaN 0 2.002693 2.002693 2.002693 2.002693 2.002693
#> log_mu[6] 1.457520 NaN 0 1.457520 1.457520 1.457520 1.457520 1.457520
#> n_eff Rhat
#> log_mu[1] NaN NaN
#> log_mu[2] NaN NaN
#> log_mu[3] NaN NaN
#> log_mu[4] NaN NaN
#> log_mu[5] NaN NaN
#> log_mu[6] NaN NaN
We can then convert this object to a BASiCS_Chain
and carry on a workflow as before:
Stan2BASiCS(stan_obj)
#> An object of class BASiCS_Chain
#> 5 MCMC samples.
#> Dataset contains 100 biological genes and 100 cells (2 batches).
#> Object stored using BASiCS version: 2.20.0
#> Parameters: mu delta s nu theta epsilon phi beta
sessionInfo()
#> R version 4.5.0 RC (2025-04-04 r88126)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] BASiCStan_1.10.0 rstan_2.32.7
#> [3] StanHeaders_2.32.10 BASiCS_2.20.0
#> [5] SingleCellExperiment_1.30.0 SummarizedExperiment_1.38.0
#> [7] Biobase_2.68.0 GenomicRanges_1.60.0
#> [9] GenomeInfoDb_1.44.0 IRanges_2.42.0
#> [11] S4Vectors_0.46.0 BiocGenerics_0.54.0
#> [13] generics_0.1.3 MatrixGenerics_1.20.0
#> [15] matrixStats_1.5.0
#>
#> loaded via a namespace (and not attached):
#> [1] gridExtra_2.3 inline_0.3.21
#> [3] rlang_1.1.6 magrittr_2.0.3
#> [5] compiler_4.5.0 DelayedMatrixStats_1.30.0
#> [7] loo_2.8.0 vctrs_0.6.5
#> [9] pkgconfig_2.0.3 crayon_1.5.3
#> [11] fastmap_1.2.0 backports_1.5.0
#> [13] XVector_0.48.0 labeling_0.4.3
#> [15] scuttle_1.18.0 promises_1.3.2
#> [17] rmarkdown_2.29 UCSC.utils_1.4.0
#> [19] xfun_0.52 bluster_1.18.0
#> [21] cachem_1.1.0 beachmat_2.24.0
#> [23] jsonlite_2.0.0 later_1.4.2
#> [25] DelayedArray_0.34.0 BiocParallel_1.42.0
#> [27] irlba_2.3.5.1 parallel_4.5.0
#> [29] cluster_2.1.8.1 R6_2.6.1
#> [31] bslib_0.9.0 limma_3.64.0
#> [33] jquerylib_0.1.4 Rcpp_1.0.14
#> [35] assertthat_0.2.1 knitr_1.50
#> [37] splines_4.5.0 httpuv_1.6.15
#> [39] Matrix_1.7-3 igraph_2.1.4
#> [41] tidyselect_1.2.1 abind_1.4-8
#> [43] yaml_2.3.10 viridis_0.6.5
#> [45] codetools_0.2-20 miniUI_0.1.1.1
#> [47] curl_6.2.2 pkgbuild_1.4.7
#> [49] lattice_0.22-7 tibble_3.2.1
#> [51] withr_3.0.2 shiny_1.10.0
#> [53] posterior_1.6.1 coda_0.19-4.1
#> [55] evaluate_1.0.3 RcppParallel_5.1.10
#> [57] pillar_1.10.2 tensorA_0.36.2.1
#> [59] checkmate_2.3.2 distributional_0.5.0
#> [61] ggplot2_3.5.2 sparseMatrixStats_1.20.0
#> [63] rstantools_2.4.0 munsell_0.5.1
#> [65] scales_1.3.0 xtable_1.8-4
#> [67] glue_1.8.0 metapod_1.16.0
#> [69] tools_4.5.0 hexbin_1.28.5
#> [71] BiocNeighbors_2.2.0 ScaledMatrix_1.16.0
#> [73] locfit_1.5-9.12 scran_1.36.0
#> [75] cowplot_1.1.3 grid_4.5.0
#> [77] QuickJSR_1.7.0 edgeR_4.6.0
#> [79] colorspace_2.1-1 GenomeInfoDbData_1.2.14
#> [81] BiocSingular_1.24.0 cli_3.6.4
#> [83] rsvd_1.0.5 S4Arrays_1.8.0
#> [85] viridisLite_0.4.2 dplyr_1.1.4
#> [87] V8_6.0.3 glmGamPoi_1.20.0
#> [89] gtable_0.3.6 sass_0.4.10
#> [91] digest_0.6.37 SparseArray_1.8.0
#> [93] dqrng_0.4.1 farver_2.1.2
#> [95] htmltools_0.5.8.1 lifecycle_1.0.4
#> [97] httr_1.4.7 statmod_1.5.0
#> [99] mime_0.13 ggExtra_0.10.1
#> [101] MASS_7.3-65