The SpatialExperiment
package is available via Bioconductor.
if(!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("SpatialExperiment")
Load the package as follows:
library(SpatialExperiment)
The SpatialExperiment
class is designed to represent spatially
resolved transcriptomics (ST) data. It inherits from the
SingleCellExperiment
class and is used in the same manner. In
addition, the class supports storage of spatial information via
spatialData
and spatialCoords
, and storage of
images via imgData
.
The SpatialExperiment
class constructor is defined with several arguments to
provide maximum flexibility to the user.
In particular, we distinguish between spatialData
and spatialCoords
as
follows:
spatialData
expects a DataFrame
with all the data associated with the
spatial information (optionally including spatial coordinates from spatialCoords
).spatialCoords
is a numeric matrix
containing only the defined
(via spatialCoordsNames
) spatial coordinates.When building a SpatialExperiment
object, the columns of spatial coordinates from the
spatialData
DFrame
are identified via the spatialCoordsNames
argument and
stored separately as a numeric matrix
within spatialCoords
.
Following is an example of spatialCoords
built via spatialData
and
spatialCoordsNames
.
cd <- DataFrame(x=1:26, y=1:26, z=letters)
mat <- matrix(nrow=26, ncol=26)
spe <- SpatialExperiment(assay=mat,
spatialData=cd,
spatialCoordsNames=c("x", "y"))
head(spatialCoords(spe))
## x y
## [1,] 1 1
## [2,] 2 2
## [3,] 3 3
## [4,] 4 4
## [5,] 5 5
## [6,] 6 6
spatialCoordsNames(spe)
## [1] "x" "y"
head(spatialData(spe))
## DataFrame with 6 rows and 1 column
## z
## <character>
## 1 a
## 2 b
## 3 c
## 4 d
## 5 e
## 6 f
spatialDataNames(spe)
## [1] "z"
head(colData(spe))
## DataFrame with 6 rows and 2 columns
## sample_id z
## <character> <character>
## 1 sample01 a
## 2 sample01 b
## 3 sample01 c
## 4 sample01 d
## 5 sample01 e
## 6 sample01 f
It is also possible to display the combined spatial information with
spatialData()
using the spatialCoords=TRUE
argument:
spatialData(spe, spatialCoords=TRUE)
## DataFrame with 26 rows and 3 columns
## z x y
## <character> <integer> <integer>
## 1 a 1 1
## 2 b 2 2
## 3 c 3 3
## 4 d 4 4
## 5 e 5 5
## ... ... ... ...
## 22 v 22 22
## 23 w 23 23
## 24 x 24 24
## 25 y 25 25
## 26 z 26 26
Alternatively, it is possible to define the spatialDataNames
to define
the spatialData
DFrame
from the columns of colData
.
cd <- DataFrame(x=1:26, y=1:26, z=letters)
mat <- matrix(nrow=26, ncol=26)
spe <- SpatialExperiment(
assay=mat, colData=cd, spatialCoordsNames=c("x", "y"), spatialDataNames="z")
head(spatialData(spe))
## DataFrame with 6 rows and 1 column
## z
## <character>
## 1 a
## 2 b
## 3 c
## 4 d
## 5 e
## 6 f
head(spatialCoords(spe))
## x y
## [1,] 1 1
## [2,] 2 2
## [3,] 3 3
## [4,] 4 4
## [5,] 5 5
## [6,] 6 6
head(colData(spe))
## DataFrame with 6 rows and 2 columns
## z sample_id
## <character> <character>
## 1 a sample01
## 2 b sample01
## 3 c sample01
## 4 d sample01
## 5 e sample01
## 6 f sample01
Also, it is possible to load a numeric matrix
of coordinates with the
spatialCoords
argument.
y <- diag(n <- 10)
mat <- matrix(0, n, m <- 2)
spe <- SpatialExperiment(assays = y, spatialCoords = mat)
Finally, it is possible to set spatialData
, spatialCoords
, and colData
separately.
mat <- as.matrix(cd[,1:2])
colnames(mat) <- c("ecs","uai")
spad <- DataFrame(a=1:26, b=1:26, z=letters)
asy <- matrix(nrow=26, ncol=26)
spe <- SpatialExperiment(assays = asy, spatialCoords = mat,
spatialData=spad, colData=cd)
head(spatialData(spe))
## DataFrame with 6 rows and 3 columns
## a b z
## <integer> <integer> <character>
## 1 1 1 a
## 2 2 2 b
## 3 3 3 c
## 4 4 4 d
## 5 5 5 e
## 6 6 6 f
head(spatialCoords(spe))
## ecs uai
## [1,] 1 1
## [2,] 2 2
## [3,] 3 3
## [4,] 4 4
## [5,] 5 5
## [6,] 6 6
head(colData(spe))
## DataFrame with 6 rows and 7 columns
## x y z sample_id a b z
## <integer> <integer> <character> <character> <integer> <integer> <character>
## 1 1 1 a sample01 1 1 a
## 2 2 2 b sample01 2 2 b
## 3 3 3 c sample01 3 3 c
## 4 4 4 d sample01 4 4 d
## 5 5 5 e sample01 5 5 e
## 6 6 6 f sample01 6 6 f
To work with multiple samples, the SpatialExperiment
class provides the cbind
method, which assumes unique sample_id
(s) are provided for each sample.
In case the sample_id
(s) are duplicated across multiple samples, the cbind
method takes care of this by appending indices to create unique sample identifiers.
spe1 <- spe2 <- spe
spe3 <- cbind(spe1, spe2)
## 'sample_id's are duplicated across 'SpatialExperiment' objects to cbind; appending sample indices.
unique(spe3$sample_id)
## [1] "sample01.1" "sample01.2"
Otherwise, it is possible to create unique sample_id
(s) as follows.
# make sample identifiers unique
spe1 <- spe2 <- spe
spe1$sample_id <- paste(spe1$sample_id, "sample1", sep = ".")
spe2$sample_id <- paste(spe2$sample_id, "sample2", sep = ".")
# combine into single object
spe3 <- cbind(spe1, spe2)
spe3
## class: SpatialExperiment
## dim: 26 52
## metadata(0):
## assays(1): ''
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(4): x y sample_id z.1
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialData names(3) : a b z
## spatialCoords names(2) : ecs uai
## imgData names(1): sample_id
Subsetting objects is automatically defined to synchronize across all attributes of the objects, as for any other Bioconductor Experiment class.
For example, it is possible to subset
by sample_id
as follows:
spe3[, colData(spe)$sample_id=="sample1.sample1"]
## class: SpatialExperiment
## dim: 26 0
## metadata(0):
## assays(1): ''
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(4): x y sample_id z.1
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialData names(3) : a b z
## spatialCoords names(2) : ecs uai
## imgData names(1): sample_id
In particular, when trying to replace the sample_id
(s) of a SpatialExperiment
object, these must map uniquely with the already existing ones, otherwise an
error is returned.
new <- spe3$sample_id;
new[1] <- "sample1.sample2"
spe3$sample_id <- new
## Error in .local(x, ..., value): Number of unique 'sample_id's is 2, but 3 were provided.
new[1] <- "third.one.of.two"
spe3$sample_id <- new
## Error in .local(x, ..., value): Number of unique 'sample_id's is 2, but 3 were provided.
When working with spot-based ST data, such as 10x Genomics Visium or other
platforms providing images, it is possible to store the image information in the
dedicated imgData
structure.
Also, the SpatialExperiment
class stores a sample_id
value in the
spatialData
structure, which is possible to set with the sample_id
argument
(default is “sample_01”).
Here we show how to load the default Space Ranger data files from a 10x Genomics
Visium experiment, and build a SpatialExperiment
object.
In particular, the readImgData()
function is used to build an imgData
DataFrame
to be passed to the SpatialExperiment
constructor.
The sample_id
used to build the imgData
object must be the
same one used to build the SpatialExperiment
object, otherwise an error is
returned.
dir <- system.file(
file.path("extdata", "10xVisium", "section1"),
package = "SpatialExperiment")
# read in counts
fnm <- file.path(dir, "raw_feature_bc_matrix")
sce <- DropletUtils::read10xCounts(fnm)
# read in image data
img <- readImgData(
path = file.path(dir, "spatial"),
sample_id="foo")
# read in spatial coordinates
fnm <- file.path(dir, "spatial", "tissue_positions_list.csv")
xyz <- read.csv(fnm, header = FALSE,
col.names = c(
"barcode", "in_tissue", "array_row", "array_col",
"pxl_row_in_fullres", "pxl_col_in_fullres"))
# construct observation & feature metadata
rd <- S4Vectors::DataFrame(
symbol = rowData(sce)$Symbol)
# construct 'SpatialExperiment'
(spe <- SpatialExperiment(
assays = list(counts = assay(sce)),
colData = colData(sce), rowData = rd, imgData = img,
spatialData=DataFrame(xyz),
spatialCoordsNames=c("pxl_col_in_fullres", "pxl_row_in_fullres"),
sample_id="foo"))
## class: SpatialExperiment
## dim: 50 50
## metadata(0):
## assays(1): counts
## rownames(50): ENSMUSG00000051951 ENSMUSG00000089699 ...
## ENSMUSG00000005886 ENSMUSG00000101476
## rowData names(1): symbol
## colnames: NULL
## colData names(3): Sample Barcode sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialData names(4) : barcode in_tissue array_row array_col
## spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
## imgData names(4): sample_id image_id data scaleFactor
Alternatively, the read10xVisium()
function facilitates the import of
10x Genomics Visium data to handle one or more samples organized in
folders reflecting the default Space Ranger folder tree organization:
sample
|—outs
· · |—raw/filtered_feature_bc_matrix.h5
· · |—raw/filtered_feature_bc_matrix
· · · · |—barcodes.tsv
· · · · |—features.tsv
· · · · |—matrix.mtx
· · |—spatial
· · · · |—scalefactors_json.json
· · · · |—tissue_lowres_image.png
· · · · |—tissue_positions_list.csv
dir <- system.file(
file.path("extdata", "10xVisium"),
package = "SpatialExperiment")
sample_ids <- c("section1", "section2")
samples <- file.path(dir, sample_ids)
(spe <- read10xVisium(samples, sample_ids,
type = "sparse", data = "raw",
images = "lowres", load = FALSE))
## class: SpatialExperiment
## dim: 50 99
## metadata(0):
## assays(1): counts
## rownames(50): ENSMUSG00000051951 ENSMUSG00000089699 ...
## ENSMUSG00000005886 ENSMUSG00000101476
## rowData names(1): symbol
## colnames(99): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
## AAAGTCGACCCTCAGT-1 AAAGTGCCATCAATTA-1
## colData names(1): sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialData names(3) : in_tissue array_row array_col
## spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
## imgData names(4): sample_id image_id data scaleFactor
To demonstrate how to accommodate molecule-based ST data (e.g. seqFISH platform)
inside a SpatialExperiment
object, we generate some mock data of
molecule coordinates across and cells.
These should be formatted into a data.frame
where each row corresponds to a
molecule, and columns specify the xy-position as well as which gene/cell the
molecule has been assigned to:
# sample xy-coordinates in [0,1]
x <- runif(n)
y <- runif(n)
# assign each molecule to some gene-cell pair
gs <- paste0("gene", seq(ng))
cs <- paste0("cell", seq(nc))
gene <- sample(gs, n, TRUE)
cell <- sample(cs, n, TRUE)
# construct data.frame of molecule coodinates
df <- data.frame(gene, cell, x, y)
head(df)
## gene cell x y
## 1 gene2 cell1 0.8808741 0.320652463
## 2 gene13 cell3 0.6192341 0.543930311
## 3 gene35 cell16 0.8563139 0.249875277
## 4 gene12 cell16 0.1921838 0.740945357
## 5 gene49 cell1 0.8762683 0.001373172
## 6 gene40 cell4 0.2871124 0.811706067
Next, it is possible to re-shape the above table into a
BumpyMatrix using splitAsBumpyMatrix()
, which takes
as input the xy-coordinates, as well as arguments specifying the row and column
index of each observation:
# (assure gene & cell are factor so that
# missing observations aren't dropped)
df$gene <- factor(df$gene, gs)
df$cell <- factor(df$cell, cs)
# construct BumpyMatrix
library(BumpyMatrix)
mol <- splitAsBumpyMatrix(
df[, c("x", "y")],
row = gs, col = cs)
Finally, it is possible to construct a SpatialExperiment
object with two data
slots:
- The counts
assay stores the number of molecules per gene and cell
(equivalent to transcript counts in spot-based data).
- The molecules
assay holds the spatial molecule positions (xy-coordinates).
Here, each entry is a DFrame
that contains the positions of all molecules
from a given gene that have been assigned to a given cell.
# get count matrix
y <- with(df, table(gene, cell))
y <- as.matrix(unclass(y))
y[1:5, 1:5]
## cell
## gene cell1 cell2 cell3 cell4 cell5
## gene1 2 0 0 1 0
## gene2 3 2 4 1 0
## gene3 1 1 0 1 3
## gene4 1 0 1 1 1
## gene5 2 1 1 0 0
# construct SpatialExperiment
spe <- SpatialExperiment(
assays = list(
counts = y,
molecules = mol))
spe
## class: SpatialExperiment
## dim: 50 20
## metadata(0):
## assays(2): counts molecules
## rownames(50): gene1 gene2 ... gene49 gene50
## rowData names(0):
## colnames(20): cell1 cell2 ... cell19 cell20
## colData names(1): sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialData names(0) :
## spatialCoords names(0) :
## imgData names(0):
The BumpyMatrix
of molecule locations can be accessed using the dedicated
molecules()
accessor:
molecules(spe)
## 50 x 20 BumpyDataFrameMatrix
## rownames: gene1 gene2 ... gene49 gene50
## colnames: cell1 cell2 ... cell19 cell20
## preview [1,1]:
## DataFrame with 20 rows and 2 columns
## x y
## <numeric> <numeric>
## 1 0.8808741 0.320652
## 2 0.0260193 0.469372
## 3 0.8526963 0.294299
## 4 0.0341913 0.918395
## 5 0.3358630 0.748199
## ... ... ...
## 16 0.38597319 0.799529
## 17 0.50525161 0.116668
## 18 0.62342685 0.925884
## 19 0.00431502 0.625905
## 20 0.34600773 0.944289
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] BumpyMatrix_1.0.0 SpatialExperiment_1.2.1
## [3] SingleCellExperiment_1.14.1 SummarizedExperiment_1.22.0
## [5] Biobase_2.52.0 GenomicRanges_1.44.0
## [7] GenomeInfoDb_1.28.0 IRanges_2.26.0
## [9] S4Vectors_0.30.0 BiocGenerics_0.38.0
## [11] MatrixGenerics_1.4.0 matrixStats_0.59.0
## [13] BiocStyle_2.20.0
##
## loaded via a namespace (and not attached):
## [1] locfit_1.5-9.4 xfun_0.23
## [3] bslib_0.2.5.1 beachmat_2.8.0
## [5] HDF5Array_1.20.0 lattice_0.20-44
## [7] rhdf5_2.36.0 htmltools_0.5.1.1
## [9] yaml_2.2.1 rlang_0.4.11
## [11] R.oo_1.24.0 jquerylib_0.1.4
## [13] scuttle_1.2.0 R.utils_2.10.1
## [15] BiocParallel_1.26.0 dqrng_0.3.0
## [17] GenomeInfoDbData_1.2.6 stringr_1.4.0
## [19] zlibbioc_1.38.0 R.methodsS3_1.8.1
## [21] codetools_0.2-18 evaluate_0.14
## [23] knitr_1.33 Rcpp_1.0.6
## [25] edgeR_3.34.0 formatR_1.11
## [27] BiocManager_1.30.15 limma_3.48.0
## [29] DelayedArray_0.18.0 magick_2.7.2
## [31] jsonlite_1.7.2 XVector_0.32.0
## [33] rjson_0.2.20 digest_0.6.27
## [35] stringi_1.6.2 bookdown_0.22
## [37] grid_4.1.0 tools_4.1.0
## [39] bitops_1.0-7 rhdf5filters_1.4.0
## [41] magrittr_2.0.1 sass_0.4.0
## [43] RCurl_1.98-1.3 Matrix_1.3-4
## [45] DelayedMatrixStats_1.14.0 sparseMatrixStats_1.4.0
## [47] rmarkdown_2.8 Rhdf5lib_1.14.1
## [49] R6_2.5.0 DropletUtils_1.12.1
## [51] compiler_4.1.0