Spatial transcriptomics is a rapidly evolving field that has recently led to a better understanding of spatial and intra-sample heterogeneity, which could be crucial for understanding the molecular basis of human diseases. However, the lack of computational tools for exploiting cross-regional information and the limited spatial resolution of current technologies present major obstacles to elucidating tissue heterogeneity. Here we present RegionalST, an R software package that enables quantifying cell type mixture and interactions, identifying sub-regions of interest, and performing cross-region cell type-specific differential analysis. RegionalST provides a one-stop destination for researchers seeking to better understand the complexities of spatial transcriptomics data.
RegionalST 1.2.0
To install this package, start R (version “4.3”) and enter:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("RegionalST")
The data input step of the RegionalST package relies on the BayesSpace package. BayesSpace supports three ways of loading data for analysis.
First, reading Visium data through readVisium(): this function takes only the path to the Space Ranger output directory (containing the spatial/ and filtered_feature_bc_matrix/ subdirectories) and returns a SingleCellExperiment.
sce <- readVisium("path/to/spaceranger/outs/")
Second, you can create a SingleCellExperiment object directly from the count matrix:
library(Matrix)
library(SingleCellExperiment)
rowData <- read.csv("path/to/rowData.csv", stringsAsFactors=FALSE)
colData <- read.csv("path/to/colData.csv", stringsAsFactors=FALSE, row.names=1)
counts <- read.csv("path/to/counts.csv.gz",
                   row.names=1, check.names=FALSE, stringsAsFactors=FALSE)
## read.csv returns a data.frame, so convert to a matrix before making it sparse
sce <- SingleCellExperiment(assays=list(counts=as(as.matrix(counts), "dgCMatrix")),
                            rowData=rowData,
                            colData=colData)
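Before building the object from separate files, it can help to confirm that the three tables line up. The sketch below uses small made-up tables in place of the CSV files; check_inputs() is a hypothetical helper written for this illustration and is not part of BayesSpace or RegionalST.

```r
# Toy stand-ins for the three CSV files (hypothetical values).
counts <- data.frame(spot1 = c(5, 0, 2), spot2 = c(0, 3, 1),
                     row.names = c("geneA", "geneB", "geneC"))
rowData <- data.frame(gene_name = c("geneA", "geneB", "geneC"))
colData <- data.frame(row = c(0, 1), col = c(0, 1),
                      row.names = c("spot1", "spot2"))

# check_inputs() is a hypothetical helper, not a BayesSpace/RegionalST function.
check_inputs <- function(counts, rowData, colData) {
  stopifnot(
    nrow(counts) == nrow(rowData),                  # one annotation row per gene
    ncol(counts) == nrow(colData),                  # one annotation row per spot
    identical(colnames(counts), rownames(colData))  # spots in the same order
  )
  invisible(TRUE)
}

inputs_ok <- check_inputs(counts, rowData, colData)
```

If the spot order differs between the count matrix and the spot annotation, the helper stops with an error instead of silently pairing the wrong rows.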
Lastly, you can use the getRDS() function. Please check the BayesSpace manual if you run into any problems with this step.
On the Visium platform, a single spot usually consists of multiple cells, so analyzing it as a whole could reduce accuracy. We therefore suggest performing cell type deconvolution using CARD, RCTD (spacexr), or cell2location. Below we show some example code for obtaining cell type proportions using CARD:
### read in spatial transcriptomics data for analysis
library(BayesSpace)
outdir = "/Dir/To/Data/BreastCancer_10x"
sce <- readVisium(outdir)
sce <- spatialPreprocess(sce, platform="Visium", log.normalize=TRUE)
spatial_count <- assays(sce)$counts
spatial_location <- data.frame(x = sce$imagecol,
y = max(sce$imagerow) - sce$imagerow)
rownames(spatial_location) <- colnames(spatial_count)
### assuming the single cell reference data for BRCA has been loaded
### BRCA_countmat: the count matrix of the BRCA single cell reference
### cellType: the cell types of the BRCA reference data
sc_count <- BRCA_countmat
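sc_meta <- data.frame(cellID = colnames(BRCA_countmat),
                      cellType = cellType,
                      ## assumption: the reference is treated as a single sample;
                      ## use the real sample IDs if the reference has several
                      sampleInfo = "sample1")
rownames(sc_meta) <- colnames(BRCA_countmat)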
library(CARD)
CARD_obj <- createCARDObject(
sc_count = sc_count,
sc_meta = sc_meta,
spatial_count = spatial_count,
spatial_location = spatial_location,
ct.varname = "cellType",
ct.select = unique(sc_meta$cellType),
sample.varname = "sampleInfo",
minCountGene = 100,
minCountSpot = 5)
CARD_obj <- CARD_deconvolution(CARD_object = CARD_obj)
## add proportion to the sce object
S4Vectors::metadata(sce)$Proportions <- RegionalST::getProportions(CARD_obj)
In our package, we created a small example dataset by subsetting the breast cancer Visium data from 10X, and we have already added the cell type proportions from deconvolution. In case deconvolution cannot be performed or the data are of single-cell resolution, we also provide a cell type label for each spot. Note that Visium data are not actually of single-cell resolution, so the cell type label indicates the major cell type of each spot.
set.seed(1234)
library(RegionalST)
library("gridExtra")
data(example_sce)
## the proportion information is saved under the metadata
S4Vectors::metadata(example_sce)$Proportions[seq_len(5),seq_len(5)]
## Cancer Epithelial CAFs T-cells Endothelial
## GTAGACAACCGATGAA-1 0.04610997 0.2684636 0.11682355 0.07809853
## ACAGATTAGGTTAGTG-1 0.09078458 0.3380722 0.06542484 0.05633140
## TGGTATCGGTCTGTAT-1 0.05897943 0.4562020 0.02192799 0.09059294
## ATTATCTCGACAGATC-1 0.07374128 0.5029240 0.03206525 0.02884060
## TGAGATCAAATACTCA-1 0.09849148 0.5737657 0.01429222 0.02761076
## PVL
## GTAGACAACCGATGAA-1 0.040103174
## ACAGATTAGGTTAGTG-1 0.098474025
## TGGTATCGGTCTGTAT-1 0.056373840
## ATTATCTCGACAGATC-1 0.006084852
## TGAGATCAAATACTCA-1 0.014253759
## the cell type information is saved under a cell type variable
head(example_sce$celltype)
## [1] "CAFs" "CAFs" "CAFs" "CAFs" "CAFs" "CAFs"
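The relationship between a proportion matrix and a per-spot "major cell type" label can be sketched in base R. The matrix below is made up, and taking the largest proportion per spot is only one plausible way to derive such a label; how the labels in example_sce were actually produced is not stated here.

```r
# Toy proportion matrix: 3 spots x 3 cell types (made-up values).
prop <- matrix(c(0.6, 0.3, 0.1,
                 0.2, 0.5, 0.3,
                 0.1, 0.1, 0.8),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("spot1", "spot2", "spot3"),
                               c("CAFs", "T-cells", "PVL")))

# Each row should sum to (approximately) one.
row_ok <- abs(rowSums(prop) - 1) < 1e-8

# Label each spot with the cell type that has the largest proportion.
major_type <- colnames(prop)[max.col(prop, ties.method = "first")]
names(major_type) <- rownames(prop)
major_type
```

The same two checks could be applied to the matrix stored under S4Vectors::metadata(example_sce)$Proportions.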
First, we want to preprocess the data using the functions from BayesSpace:
library(BayesSpace)
example_sce <- example_sce[, colSums(counts(example_sce)) > 0]
example_sce <- mySpatialPreprocess(example_sce, platform="Visium")
Second, we assign weights to each cell type and check the entropy at different radii.
weight <- data.frame(celltype = c("Cancer Epithelial", "CAFs", "T-cells", "Endothelial",
"PVL", "Myeloid", "B-cells", "Normal Epithelial", "Plasmablasts"),
weight = c(0.25,0.05,
0.25,0.05,
0.025,0.05,
0.25,0.05,0.025))
OneRad <- GetOneRadiusEntropy_withProp(example_sce,
selectN = length(example_sce$spot),
weight = weight,
radius = 5,
doPlot = TRUE,
mytitle = "Radius 5 weighted entropy")
Note: Here GetOneRadiusEntropy_withProp() calculates the entropy for all the spots (as length(example_sce$spot) is the total number of spots). If this is too slow with a large dataset, you can compute the entropy for only a subset of the spots through the selectN argument, e.g., selectN = round(length(example_sce$spot)/10). One tenth is only an example; depending on the size of your data, you can try 1/3, 1/5, or 1/20 to generate entropy figures with different sparsities. The smaller selectN is, the faster this function will be.
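To make the idea of a weighted entropy within a radius concrete, here is a minimal base R sketch. It averages the cell type proportions of all spots within a given distance of a center and computes a weighted Shannon entropy of that average. The function weighted_entropy() and the toy coordinates are hypothetical, and the formula is an assumption for illustration; the definition used inside RegionalST may differ.

```r
# Hypothetical helper: weighted Shannon entropy of the average cell type
# composition around one centre. Illustration only; the definition used
# inside RegionalST may differ.
weighted_entropy <- function(prop, coords, centre, radius, weight) {
  d <- sqrt((coords[, 1] - centre[1])^2 + (coords[, 2] - centre[2])^2)
  p <- colMeans(prop[d <= radius, , drop = FALSE])  # mean composition in the disc
  p <- p[p > 0]                                      # 0 * log(0) is taken as 0
  -sum(weight[names(p)] * p * log(p))
}

# Four toy spots on a line; the last one is far away from the others.
coords <- cbind(x = c(0, 1, 2, 10), y = c(0, 0, 0, 0))
prop <- rbind(c(1, 0), c(0, 1), c(0.5, 0.5), c(1, 0))
colnames(prop) <- c("Cancer Epithelial", "CAFs")
w <- c("Cancer Epithelial" = 0.25, "CAFs" = 0.05)

mixed <- weighted_entropy(prop, coords, centre = c(1, 0), radius = 2, weight = w)
pure  <- weighted_entropy(prop, coords, centre = c(10, 0), radius = 2, weight = w)
```

In this toy case the neighborhood containing two cell types scores higher than the one containing a single cell type, which is the behavior one would want when ranking candidate ROI centers by heterogeneity.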
Then, we can use automatic functions to select ROIs:
example_sce <- RankCenterByEntropy_withProp(example_sce,
weight,
selectN = round(length(example_sce$spot)/5),
topN = 3,
min_radius = 10,
radius_vec = c(5,10),
doPlot = TRUE)
### visualize one selected ROI:
PlotOneSelectedCenter(example_sce,ploti = 1)
Note: The min_radius argument controls the minimum distance between the two closest identified ROI centers. If you specify a large min_radius, the ROIs will tend to have little or no overlap.
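The effect of a minimum distance between centers can be illustrated with a simple greedy selection in base R. This is a sketch under stated assumptions, not RegionalST's implementation: pick_centres() is a hypothetical helper, and the coordinates and scores are made up.

```r
# Hypothetical helper: pick up to topN centres in decreasing order of score,
# skipping any candidate closer than min_radius to a centre already chosen.
# This illustrates the role of min_radius; it is not RegionalST's code.
pick_centres <- function(coords, score, topN, min_radius) {
  chosen <- integer(0)
  for (i in order(score, decreasing = TRUE)) {
    if (length(chosen) >= topN) break
    d <- sqrt((coords[chosen, 1] - coords[i, 1])^2 +
              (coords[chosen, 2] - coords[i, 2])^2)
    if (all(d >= min_radius)) chosen <- c(chosen, i)
  }
  chosen
}

coords <- cbind(x = c(0, 1, 20, 21, 40), y = c(0, 0, 0, 0, 0))
score  <- c(0.9, 0.8, 0.7, 0.6, 0.5)

tight <- pick_centres(coords, score, topN = 3, min_radius = 0)
apart <- pick_centres(coords, score, topN = 3, min_radius = 10)
```

With min_radius = 0 the three highest-scoring spots are taken even though two of them are neighbors; with min_radius = 10 the selection spreads out across the toy tissue.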
Let’s visualize all the selected ROIs:
## let's visualize the selected regions:
palette <- colorspace::qualitative_hcl(9,
palette = "Set2")
selplot <- list()
topN <- 3
for(i in seq_len(topN)) {
selplot[[i]] <- print(PlotOneSelectedCenter(example_sce,ploti = i))
}
selplot[[topN+1]] <- print(clusterPlot(example_sce, palette=palette, label = "celltype", size=0.1) )
do.call("grid.arrange", c(selplot, ncol = 2))
You can save your selected ROIs to an .RData file for future analysis and reproducibility:
thisSelection <- S4Vectors::metadata(example_sce)$selectCenters
save(thisSelection, file = "/Your/Directory/SelectionResults_withProportions.RData")
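A saved selection can be read back with load(). The sketch below shows the round trip with a made-up data frame standing in for the selected centers; its column names are hypothetical and do not describe the real selectCenters object.

```r
# Toy stand-in for the selected-centres object (made-up columns and values).
thisSelection <- data.frame(centerx = c(10, 40), centery = c(12, 8),
                            radius = c(5, 10))

# tempfile() is used so the sketch runs anywhere; use your own path in practice.
outfile <- tempfile(fileext = ".RData")
save(thisSelection, file = outfile)

# load() restores objects under their original names; loading into a fresh
# environment avoids overwriting anything in the workspace.
restored <- new.env()
load(outfile, envir = restored)
same <- identical(restored$thisSelection, thisSelection)
```

In a later session, the restored object could presumably be put back with S4Vectors::metadata(example_sce)$selectCenters <- restored$thisSelection before rerunning the downstream steps.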
In addition to automatic selections, we can also manually select ROIs through a shiny app:
example_sce <- ManualSelectCenter(example_sce)
S4Vectors::metadata(example_sce)$selectCenters
### Draw cell type proportions for the selected ROIs
DrawRegionProportion_withProp(example_sce,
label = "CellType",
selCenter = c(1,2,3))
One important downstream analysis after identifying the ROIs is to find the genes that are differentially expressed between one ROI and another. Let’s compare the first and the second ROIs.
CR12_DE <- GetCrossRegionalDE_withProp(example_sce,
twoCenter = c(1,2),
label = "celltype",
angle = 30,
hjust = 0,
size = 3,
padj_filter = 0.05,
doHeatmap = TRUE)
dim(CR12_DE$allDE)
## [1] 4 9
table(CR12_DE$allDE$Comparison)
##
## Cancer_Epithelial: Region1 vs Region2 Myeloid: Region1 vs Region2
## 1 1
## Normal_Epithelial: Region1 vs Region2 PVL: Region1 vs Region2
## 1 1
## we find very few DE genes in the current dataset as the example data is very small and truncated.
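The padj_filter = 0.05 argument above presumably keeps only genes whose multiplicity-adjusted p-value is below the cutoff. The base R sketch below shows that kind of filter on made-up p-values; the Benjamini-Hochberg method is an assumption here, and the package may adjust p-values differently.

```r
# Made-up raw p-values for five genes.
pvals <- c(geneA = 0.001, geneB = 0.012, geneC = 0.04, geneD = 0.2, geneE = 0.8)

# Benjamini-Hochberg adjustment (an assumption; the package may adjust differently).
padj <- p.adjust(pvals, method = "BH")

# Keep genes whose adjusted p-value is below the cutoff.
padj_filter <- 0.05
kept <- names(padj)[padj < padj_filter]
kept
```

Note that geneC has a raw p-value below 0.05 but does not survive the adjustment, which is one reason a small, truncated dataset can yield very few DE genes.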
We can similarly compare the first and third ROIs, and the second and third ROIs.
CR13_DE <- GetCrossRegionalDE_withProp(example_sce,
twoCenter = c(1,3),
label = "celltype",
padj_filter = 0.05,
doHeatmap = TRUE)
CR23_DE <- GetCrossRegionalDE_withProp(example_sce,
twoCenter = c(2,3),
label = "celltype",
padj_filter = 0.05,
doHeatmap = TRUE)
exampleRes <- list(CR12_DE,
CR13_DE,
CR23_DE)
As our current dataset is very small, we couldn’t find much signal in it. We prepared another example DE output. This result has been truncated, so it is not a full list of genes. Let’s take a look:
data("exampleRes")
## check the number of DEs for each cell type-specific comparison
table(exampleRes[[1]]$allDE$Comparison)
##
## B_cells: Region1 vs Region2 CAFs: Region1 vs Region2
## 11 35
## Cancer_Epithelial: Region1 vs Region2 Endothelial: Region1 vs Region2
## 21 12
## Myeloid: Region1 vs Region2 Normal_Epithelial: Region1 vs Region2
## 57 14
## Plasmablasts: Region1 vs Region2 PVL: Region1 vs Region2
## 15 27
## T_cells: Region1 vs Region2
## 8
table(exampleRes[[2]]$allDE$Comparison)
##
## B_cells: Region1 vs Region3 CAFs: Region1 vs Region3
## 29 32
## Cancer_Epithelial: Region1 vs Region3 Endothelial: Region1 vs Region3
## 9 20
## Myeloid: Region1 vs Region3 Normal_Epithelial: Region1 vs Region3
## 30 6
## Plasmablasts: Region1 vs Region3 PVL: Region1 vs Region3
## 14 19
## T_cells: Region1 vs Region3
## 41
table(exampleRes[[3]]$allDE$Comparison)
##
## B_cells: Region2 vs Region3 CAFs: Region2 vs Region3
## 15 34
## Cancer_Epithelial: Region2 vs Region3 Endothelial: Region2 vs Region3
## 2 25
## Myeloid: Region2 vs Region3 Normal_Epithelial: Region2 vs Region3
## 85 8
## Plasmablasts: Region2 vs Region3 PVL: Region2 vs Region3
## 6 20
## T_cells: Region2 vs Region3
## 5
Here is some example code for annotating cell types in 10X Visium data when cell type deconvolution is not feasible.
outdir = "/Users/zli16/Dropbox/TrySTData/Ovarian_10x"
sce <- readVisium(outdir)
sce <- sce[, colSums(counts(sce)) > 0]
sce <- spatialPreprocess(sce, platform="Visium", log.normalize=TRUE)
sce <- qTune(sce, qs=seq(2, 10), platform="Visium", d=15)
sce <- spatialCluster(sce, q=10, platform="Visium", d=15,
init.method="mclust", model="t", gamma=2,
nrep=50000, burn.in=1000,
save.chain=FALSE)
clusterPlot(sce)
markers <- list()
markers[["Epithelial"]] <- c("EPCAM")
markers[["Tumor"]] <- c("EPCAM","MUC6", "MMP7")
markers[["Macrophages"]] <- c("CD14", "CSF1R")
markers[["Dendritic cells"]] <- c("CCR7")
markers[["Immune cells"]] <- c("CD19", "CD79A", "CD79B", "PTPRC")
sum_counts <- function(sce, features) {
if (length(features) > 1) {
colSums(logcounts(sce)[features, ])
} else {
logcounts(sce)[features, ]
}
}
spot_expr <- purrr::map(markers, function(xs) sum_counts(sce, xs))
library(ggplot2)
## plot log-normalized marker expression per spot; `name` becomes the panel title
plot_expression <- function(sce, expr, name) {
featurePlot(sce, expr, color=NA) +
viridis::scale_fill_viridis(option="A") +
labs(title=name, fill="Log-normalized\nexpression")
}
spot_plots <- purrr::imap(spot_expr, function(x, y) plot_expression(sce, x, y))
patchwork::wrap_plots(spot_plots, ncol=3)
#### assign celltype based on marker distribution
sce$celltype <- sce$spatial.cluster
sce$celltype[sce$spatial.cluster %in% c(1,2,6,8)] <- "Epithelial"
sce$celltype[sce$spatial.cluster %in% c(3)] <- "Macrophages"
sce$celltype[sce$spatial.cluster %in% c(4,5)] <- "Immune"
sce$celltype[sce$spatial.cluster %in% c(9,7,10)] <- "Tumor"
colData(sce)$celltype <- sce$celltype
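An alternative to the repeated %in% assignments is a named lookup vector, which makes unmapped clusters visible. This is a base R sketch on made-up cluster assignments; the mapping mirrors the one above, and cluster 11 is added deliberately to show what happens when a cluster has no label.

```r
# Made-up cluster assignments for eight spots (cluster 11 is deliberately unmapped).
spatial_cluster <- c(1, 3, 4, 9, 2, 7, 5, 11)

# Lookup table from cluster id to label, matching the assignments above.
cluster_to_type <- c("1" = "Epithelial", "2" = "Epithelial", "6" = "Epithelial",
                     "8" = "Epithelial", "3" = "Macrophages",
                     "4" = "Immune", "5" = "Immune",
                     "7" = "Tumor", "9" = "Tumor", "10" = "Tumor")

celltype <- unname(cluster_to_type[as.character(spatial_cluster)])

# Clusters missing from the lookup come back as NA instead of keeping their number.
unmapped <- unique(spatial_cluster[is.na(celltype)])
unmapped
```

With the %in% approach, a forgotten cluster silently keeps its numeric id as its "cell type"; with the lookup it appears as NA and can be caught before the downstream analysis.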
We still use the same example dataset as the section above for illustration.
library(RegionalST)
data(example_sce)
## the cell type information is saved under a cell type variable
table(example_sce$celltype)
weight <- data.frame(celltype = c("Cancer Epithelial", "CAFs", "T-cells", "Endothelial",
"PVL", "Myeloid", "B-cells", "Normal Epithelial", "Plasmablasts"),
weight = c(0.25,0.05,
0.25,0.05,
0.025,0.05,
0.25,0.05,0.025))
OneRad <- GetOneRadiusEntropy(example_sce,
selectN = length(example_sce$spot),
weight = weight,
label = "celltype",
radius = 5,
doPlot = TRUE,
mytitle = "Radius 5 weighted entropy")
example_sce <- RankCenterByEntropy(example_sce,
weight,
selectN = round(length(example_sce$spot)/2),
label = "celltype",
topN = 3,
min_radius = 10,
radius_vec = c(5,10),
doPlot = TRUE)
Note: The min_radius argument controls the minimum distance between the two closest identified ROI centers. If you specify a large min_radius, the ROIs will tend to have little or no overlap.
Let’s visualize all the selected ROIs:
## let's visualize the selected regions:
palette <- colorspace::qualitative_hcl(9,
palette = "Set2")
selplot <- list()
topN = 3
for(i in seq_len(topN)) {
selplot[[i]] <- print(PlotOneSelectedCenter(example_sce,ploti = i))
}
selplot[[topN+1]] <- print(clusterPlot(example_sce, palette=palette, label = "celltype", size=0.1) )
do.call("grid.arrange", c(selplot, ncol = 2))
You can save your selected ROIs to an .RData file for future analysis and reproducibility:
thisSelection <- S4Vectors::metadata(example_sce)$selectCenters
save(thisSelection, file = "/Your/Directory/SelectionResults_withProportions.RData")
This section is exactly the same with or without proportion information. See Section 3.3.2.
## I didn't run this in the vignette as the current dataset has been truncated and couldn't find any DE genes
CR12_DE <- GetCrossRegionalDE_raw(example_sce,
twoCenter = c(1,2),
label = "celltype",
logfc.threshold = 0.1,
min.pct = 0.1,
angle = 30,
hjust = 0,
size = 3,
padj_filter = 0.05,
doHeatmap = FALSE)
Similarly, we can perform cross-regional analysis for other pairs:
CR13_DE <- GetCrossRegionalDE_raw(example_sce,
twoCenter = c(1,3),
label = "celltype",
padj_filter = 0.05,
doHeatmap = FALSE)
CR23_DE <- GetCrossRegionalDE_raw(example_sce,
twoCenter = c(2,3),
label = "celltype",
padj_filter = 0.05,
doHeatmap = FALSE)
allfigure <- list()
allCTres <- DoGSEA(exampleRes, whichDB = "hallmark", withProp = TRUE)
for(i in seq_len(3)) {
allfigure[[i]] <- DrawDotplot(allCTres, CT = i, angle = 15, vjust = 1, chooseP = "padj")
}
do.call("grid.arrange", c(allfigure[c(1,2,3)], ncol = 3))
### draw each cell type individually, here I am drawing cell type = 3
DrawDotplot(allCTres, CT = 3, angle = 15, vjust = 1, chooseP = "padj")
Note: In addition to “hallmark”, the pathway database can also be “kegg” or “reactome”. If you prefer another database, you can point the gmtdir= argument of the DoGSEA() function to the gmt file of that database.
allCTres <- DoGSEA(exampleRes, whichDB = "kegg", withProp = TRUE)
DrawDotplot(allCTres, CT = 3, angle = 15, vjust = 1, chooseP = "padj")
allCTres <- DoGSEA(exampleRes, whichDB = "reactome", withProp = TRUE)
DrawDotplot(allCTres, CT = 3, angle = 15, vjust = 1, chooseP = "padj")
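If you plan to supply your own gmt file, it can be useful to inspect it first. A GMT file is tab-delimited, with one gene set per line: the set name, a description, then the member genes. The sketch below writes a tiny made-up file and parses it in base R; read_gmt() is a hypothetical helper for inspection only, and DoGSEA() does its own reading.

```r
# Write a two-line toy GMT file (set names and memberships are made up).
gmt_file <- tempfile(fileext = ".gmt")
writeLines(c("SET_A\tdescription A\tEPCAM\tMUC6\tMMP7",
             "SET_B\tdescription B\tCD14\tCSF1R"), gmt_file)

# read_gmt() is a hypothetical helper, not part of RegionalST.
read_gmt <- function(path) {
  fields <- strsplit(readLines(path), "\t", fixed = TRUE)
  sets <- lapply(fields, function(f) f[-(1:2)])   # drop name and description
  names(sets) <- vapply(fields, `[`, character(1), 1)
  sets
}

gene_sets <- read_gmt(gmt_file)
lengths(gene_sets)
```

Checking the set names and sizes this way helps confirm that the file uses the same gene identifiers as your DE results before running the enrichment.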
## R version 4.4.0 beta (2024-04-15 r86425)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] BayesSpace_1.14.0 SingleCellExperiment_1.26.0
## [3] SummarizedExperiment_1.34.0 Biobase_2.64.0
## [5] GenomicRanges_1.56.0 GenomeInfoDb_1.40.0
## [7] IRanges_2.38.0 S4Vectors_0.42.0
## [9] BiocGenerics_0.50.0 MatrixGenerics_1.16.0
## [11] matrixStats_1.3.0 gridExtra_2.3
## [13] RegionalST_1.2.0 BiocStyle_2.32.0
##
## loaded via a namespace (and not attached):
## [1] spatstat.sparse_3.0-3 bitops_1.0-7
## [3] httr_1.4.7 RColorBrewer_1.1-3
## [5] doParallel_1.0.17 sctransform_0.4.1
## [7] tools_4.4.0 utf8_1.2.4
## [9] R6_2.5.1 DirichletReg_0.7-1
## [11] uwot_0.2.2 lazyeval_0.2.2
## [13] rhdf5filters_1.16.0 withr_3.0.0
## [15] sp_2.1-4 GGally_2.2.1
## [17] progressr_0.14.0 cli_3.6.2
## [19] spatstat.explore_3.2-7 fastDummies_1.7.3
## [21] sandwich_3.1-0 labeling_0.4.3
## [23] sass_0.4.9 Seurat_5.0.3
## [25] spatstat.data_3.0-4 nnls_1.5
## [27] proxy_0.4-27 ggridges_0.5.6
## [29] pbapply_1.7-2 scater_1.32.0
## [31] parallelly_1.37.1 limma_3.60.0
## [33] RSQLite_2.3.6 generics_0.1.3
## [35] spatstat.random_3.2-3 ica_1.0-3
## [37] dplyr_1.1.4 Matrix_1.7-0
## [39] ggbeeswarm_0.7.2 fansi_1.0.6
## [41] abind_1.4-5 lifecycle_1.0.4
## [43] yaml_2.3.8 edgeR_4.2.0
## [45] rhdf5_2.48.0 SparseArray_1.4.0
## [47] BiocFileCache_2.12.0 Rtsne_0.17
## [49] grid_4.4.0 blob_1.2.4
## [51] promises_1.3.0 dqrng_0.3.2
## [53] crayon_1.5.2 miniUI_0.1.1.1
## [55] lattice_0.22-6 beachmat_2.20.0
## [57] cowplot_1.1.3 magick_2.8.3
## [59] pillar_1.9.0 knitr_1.46
## [61] metapod_1.12.0 fgsea_1.30.0
## [63] xgboost_1.7.7.1 corpcor_1.6.10
## [65] future.apply_1.11.2 codetools_0.2-20
## [67] fastmatch_1.1-4 leiden_0.4.3.1
## [69] glue_1.7.0 EpiDISH_2.20.0
## [71] data.table_1.15.4 vctrs_0.6.5
## [73] png_0.1-8 spam_2.10-0
## [75] locfdr_1.1-8 gtable_0.3.5
## [77] assertthat_0.2.1 cachem_1.0.8
## [79] TOAST_1.18.0 xfun_0.43
## [81] S4Arrays_1.4.0 mime_0.12
## [83] coda_0.19-4.1 survival_3.6-4
## [85] iterators_1.0.14 tinytex_0.50
## [87] maxLik_1.5-2.1 statmod_1.5.0
## [89] bluster_1.14.0 fitdistrplus_1.1-11
## [91] ROCR_1.0-11 nlme_3.1-164
## [93] bit64_4.0.5 filelock_1.0.3
## [95] RcppAnnoy_0.0.22 bslib_0.7.0
## [97] irlba_2.3.5.1 vipor_0.4.7
## [99] KernSmooth_2.23-22 colorspace_2.1-0
## [101] DBI_1.2.2 tidyselect_1.2.1
## [103] bit_4.0.5 compiler_4.4.0
## [105] curl_5.2.1 BiocNeighbors_1.22.0
## [107] DelayedArray_0.30.0 plotly_4.10.4
## [109] bookdown_0.39 scales_1.3.0
## [111] lmtest_0.9-40 quadprog_1.5-8
## [113] goftest_1.2-3 stringr_1.5.1
## [115] digest_0.6.35 spatstat.utils_3.0-4
## [117] rmarkdown_2.26 XVector_0.44.0
## [119] htmltools_0.5.8.1 pkgconfig_2.0.3
## [121] sparseMatrixStats_1.16.0 highr_0.10
## [123] dbplyr_2.5.0 fastmap_1.1.1
## [125] rlang_1.1.3 htmlwidgets_1.6.4
## [127] UCSC.utils_1.0.0 shiny_1.8.1.1
## [129] DelayedMatrixStats_1.26.0 farver_2.1.1
## [131] jquerylib_0.1.4 zoo_1.8-12
## [133] jsonlite_1.8.8 BiocParallel_1.38.0
## [135] mclust_6.1.1 BiocSingular_1.20.0
## [137] RCurl_1.98-1.14 magrittr_2.0.3
## [139] Formula_1.2-5 scuttle_1.14.0
## [141] GenomeInfoDbData_1.2.12 dotCall64_1.1-1
## [143] patchwork_1.2.0 Rhdf5lib_1.26.0
## [145] munsell_0.5.1 Rcpp_1.0.12
## [147] viridis_0.6.5 reticulate_1.36.1
## [149] stringi_1.8.3 zlibbioc_1.50.0
## [151] MASS_7.3-60.2 plyr_1.8.9
## [153] ggstats_0.6.0 parallel_4.4.0
## [155] listenv_0.9.1 ggrepel_0.9.5
## [157] deldir_2.0-4 splines_4.4.0
## [159] tensor_1.5 locfit_1.5-9.9
## [161] igraph_2.0.3 spatstat.geom_3.2-9
## [163] RcppHNSW_0.6.0 reshape2_1.4.4
## [165] ScaledMatrix_1.12.0 evaluate_0.23
## [167] SeuratObject_5.0.1 scran_1.32.0
## [169] BiocManager_1.30.22 foreach_1.5.2
## [171] httpuv_1.6.15 miscTools_0.6-28
## [173] polyclip_1.10-6 RANN_2.6.1
## [175] tidyr_1.3.1 purrr_1.0.2
## [177] future_1.33.2 scattermore_1.2
## [179] ggplot2_3.5.1 rsvd_1.0.5
## [181] xtable_1.8-4 e1071_1.7-14
## [183] RSpectra_0.16-1 later_1.3.2
## [185] viridisLite_0.4.2 class_7.3-22
## [187] tibble_3.2.1 memoise_2.0.1
## [189] beeswarm_0.4.0 cluster_2.1.6
## [191] globals_0.16.3