This vignette will guide you through how accessing and manipulating
the SCoPE2 data sets available from the SingleCellMultimodal
package.
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("SingleCellMultiModal")
library(SingleCellMultiModal)
library(MultiAssayExperiment)
SCoPE2 is a mass spectrometry (MS)-based single-cell proteomics protocol to quantify the proteome of single-cells in an untargeted fashion. It was initially developed by Specht et al. (2021).
The user can see the available data set by using the default options.
SCoPE2("macrophage_differentiation",
mode = "*",
version = "1.0.0",
dry.run = TRUE)
## snapshotDate(): 2021-10-18
## ah_id mode file_size rdataclass rdatadateadded
## 1 EH4694 protein 33.1 Mb SingleCellExperiment 2020-09-24
## 2 EH4695 rna_assays 68.7 Mb HDF5Matrix 2020-09-24
## 3 EH4696 rna_se 0.2 Mb SingleCellExperiment 2020-09-24
## rdatadateremoved
## 1 <NA>
## 2 <NA>
## 3 <NA>
Or by simply running:
SCoPE2("macrophage_differentiation")
## snapshotDate(): 2021-10-18
## ah_id mode file_size rdataclass rdatadateadded
## 1 EH4694 protein 33.1 Mb SingleCellExperiment 2020-09-24
## 2 EH4695 rna_assays 68.7 Mb HDF5Matrix 2020-09-24
## 3 EH4696 rna_se 0.2 Mb SingleCellExperiment 2020-09-24
## rdatadateremoved
## 1 <NA>
## 2 <NA>
## 3 <NA>
Currently, only the macrophage_differentiation
is available.
You can retrieve the actual data from ExperimentHub
by setting
dry.run = FALSE
. This example retrieves the complete data set
(transcriptome and proteome) for the macrophage_differentiation
project:
scope2 <- SCoPE2("macrophage_differentiation",
modes = "rna|protein",
dry.run = FALSE)
## Warning in split.default(filepaths, fact): data length is not a multiple of
## split variable
scope2
## A MultiAssayExperiment object of 2 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 2:
## [1] protein: SingleCellExperiment with 3042 rows and 1490 columns
## [2] rna: SingleCellExperiment with 32738 rows and 20274 columns
## Functionality:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample coordination DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
## exportClass() - save data to flat files
This data set has been acquired by the Slavov Lab ((???)). It contains single-cell proteomics and single-cell RNA sequencing data for macrophages and monocytes. The objective of the research that led to generate the data is to understand whether homogeneous monocytes differentiate in the absence of cytokines to macrophages with homogeneous or heterogeneous profiles. The transcriptomic and proteomic acquisitions are conducted on two separate subset of similar cells (same experimental design). The cell type of the samples are known only for the proteomics data. The proteomics data was retrieved from the authors’ website and the transcriptomic data was retrieved from the GEO database (accession id: GSE142392).
For more information on the protocol, see (???).
Only version 1.0.0
is currently available.
The macrophage_differentiation
data set in this package contains two
assays: rna
and protein
.
The single-cell proteomics data contains cell type annotation
(celltype
), sample preparation batch (batch_digest
and
batch_sort
), chromatography batch (batch_chromatography
), and the
MS acquisition run (batch_MS
). The single-cell transcriptomics data
was acquired in two batches (batch_Chromium
). Note that because the
cells that compose the two assays are distinct, there is no common
cell annotation available for both proteomics and transcriptomics. The
annotation were therefore filled with NA
s accordingly.
colData(scope2)
## DataFrame with 21764 rows and 6 columns
## celltype batch_digest batch_sort batch_chromatography
## <character> <character> <character> <character>
## AAACCTGAGAAACCAT-1.1 NA NA NA NA
## AAACCTGAGACTAGGC-1.2 NA NA NA NA
## AAACCTGAGAGGTAGA-1.2 NA NA NA NA
## AAACCTGAGATGCGAC-1.1 NA NA NA NA
## AAACCTGAGGCTAGGT-1.1 NA NA NA NA
## ... ... ... ... ...
## i985 Macrophage Q s8 LCA10
## i986 Monocyte Q s8 LCA10
## i987 Monocyte Q s8 LCA10
## i998 Monocyte R s9 LCB3
## i999 Monocyte R s9 LCB3
## batch_MS batch_Chromium
## <character> <factor>
## AAACCTGAGAAACCAT-1.1 NA 1
## AAACCTGAGACTAGGC-1.2 NA 2
## AAACCTGAGAGGTAGA-1.2 NA 2
## AAACCTGAGATGCGAC-1.1 NA 1
## AAACCTGAGGCTAGGT-1.1 NA 1
## ... ... ...
## i985 X190321S_LCA10_X_FP9.. NA
## i986 X190321S_LCA10_X_FP9.. NA
## i987 X190321S_LCA10_X_FP9.. NA
## i998 X190914S_LCB3_X_16pl.. NA
## i999 X190914S_LCB3_X_16pl.. NA
You can extract and check the transcriptomic data through subsetting:
scope2[["rna"]]
## class: SingleCellExperiment
## dim: 32738 20274
## metadata(0):
## assays(1): counts
## rownames(32738): MIR1302-10 FAM138A ... AC002321.2 AC002321.1
## rowData names(0):
## colnames(20274): AAACCTGAGAAACCAT-1.1 AAACCTGAGATGCGAC-1.1 ...
## TTTGTCATCGCTTAGA-1.2 TTTGTCATCGTAGATC-1.2
## colData names(1): Batch
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
The data is rather large and is therefore stored on-disk using the HDF5 backend. You can verify this by looking at the assay data matrix. Note that the counts are UMI counts.
assay(scope2[["rna"]])[1:5, 1:5]
## <5 x 5> sparse matrix of class DelayedMatrix and type "integer":
## AAACCTGAGAAACCAT-1.1 ... AAACCTGCAATAACGA-1.1
## MIR1302-10 0 . 0
## FAM138A 0 . 0
## OR4F5 0 . 0
## RP11-34P13.7 0 . 0
## RP11-34P13.8 0 . 0
The protein
assay contains MS-based proteomic data.
The data have been passed sample and feature quality control,
normalized, log transformed, imputed and batch corrected. Detailed
information about the data processing is available in
another vignette. You can extract the proteomic data similarly to the
transcriptomic data:
scope2[["protein"]]
## class: SingleCellExperiment
## dim: 3042 1490
## metadata(0):
## assays(1): logexprs
## rownames(3042): A0A075B6H9 A0A0B4J1V0 ... Q9Y6X6 Q9Y6Z7
## rowData names(0):
## colnames(1490): i4 i5 ... i2766 i2767
## colData names(5): celltype batch_digest batch_sort batch_chromatography
## batch_MS
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
In this case, the protein data have reasonable size and are loaded
directly into memory. The data matrix is stored in logexprs
. We
decided to not use the traditional logcounts
because MS proteomics
measures intensities rather than counts as opposed to scRNA-Seq.
assay(scope2[["protein"]])[1:5, 1:5]
## i4 i5 i7 i10 i11
## A0A075B6H9 -0.01366062 -0.1824640 0.12977307 0.08940234 0.05711272
## A0A0B4J1V0 0.13875137 0.5383824 -0.35823777 -0.10122993 -0.10688821
## A0A0B4J237 0.54897085 -0.2247036 0.50132075 -0.14652437 -0.40384721
## A0A1B0GTH6 0.05392801 -0.3811629 0.07532757 0.29708811 -0.03003732
## A0A1B0GUA6 -0.16910887 0.1542946 -0.14959632 0.14745758 0.03578250
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] rhdf5_2.38.0 RaggedExperiment_1.18.0
## [3] SingleCellMultiModal_1.6.0 MultiAssayExperiment_1.20.0
## [5] SummarizedExperiment_1.24.0 Biobase_2.54.0
## [7] GenomicRanges_1.46.0 GenomeInfoDb_1.30.0
## [9] IRanges_2.28.0 S4Vectors_0.32.0
## [11] BiocGenerics_0.40.0 MatrixGenerics_1.6.0
## [13] matrixStats_0.61.0 BiocStyle_2.22.0
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-7 bit64_4.0.5
## [3] filelock_1.0.2 httr_1.4.2
## [5] tools_4.1.1 bslib_0.3.1
## [7] utf8_1.2.2 R6_2.5.1
## [9] HDF5Array_1.22.0 DBI_1.1.1
## [11] rhdf5filters_1.6.0 withr_2.4.2
## [13] tidyselect_1.1.1 bit_4.0.4
## [15] curl_4.3.2 compiler_4.1.1
## [17] formatR_1.11 DelayedArray_0.20.0
## [19] bookdown_0.24 sass_0.4.0
## [21] rappdirs_0.3.3 stringr_1.4.0
## [23] digest_0.6.28 SpatialExperiment_1.4.0
## [25] rmarkdown_2.11 R.utils_2.11.0
## [27] XVector_0.34.0 pkgconfig_2.0.3
## [29] htmltools_0.5.2 sparseMatrixStats_1.6.0
## [31] limma_3.50.0 dbplyr_2.1.1
## [33] fastmap_1.1.0 rlang_0.4.12
## [35] RSQLite_2.2.8 shiny_1.7.1
## [37] DelayedMatrixStats_1.16.0 jquerylib_0.1.4
## [39] generics_0.1.1 jsonlite_1.7.2
## [41] BiocParallel_1.28.0 R.oo_1.24.0
## [43] dplyr_1.0.7 RCurl_1.98-1.5
## [45] magrittr_2.0.1 scuttle_1.4.0
## [47] GenomeInfoDbData_1.2.7 Matrix_1.3-4
## [49] Rcpp_1.0.7 Rhdf5lib_1.16.0
## [51] fansi_0.5.0 lifecycle_1.0.1
## [53] R.methodsS3_1.8.1 edgeR_3.36.0
## [55] stringi_1.7.5 yaml_2.2.1
## [57] zlibbioc_1.40.0 BiocFileCache_2.2.0
## [59] AnnotationHub_3.2.0 grid_4.1.1
## [61] blob_1.2.2 dqrng_0.3.0
## [63] parallel_4.1.1 promises_1.2.0.1
## [65] ExperimentHub_2.2.0 crayon_1.4.1
## [67] lattice_0.20-45 beachmat_2.10.0
## [69] Biostrings_2.62.0 KEGGREST_1.34.0
## [71] magick_2.7.3 locfit_1.5-9.4
## [73] knitr_1.36 pillar_1.6.4
## [75] rjson_0.2.20 glue_1.4.2
## [77] BiocVersion_3.14.0 evaluate_0.14
## [79] BiocManager_1.30.16 vctrs_0.3.8
## [81] png_0.1-7 httpuv_1.6.3
## [83] purrr_0.3.4 assertthat_0.2.1
## [85] cachem_1.0.6 xfun_0.27
## [87] DropletUtils_1.14.0 mime_0.12
## [89] xtable_1.8-4 later_1.3.0
## [91] SingleCellExperiment_1.16.0 tibble_3.1.5
## [93] AnnotationDbi_1.56.1 memoise_2.0.0
## [95] ellipsis_0.3.2 interactiveDisplayBase_1.32.0
Specht, Harrison, Edward Emmott, Aleksandra A Petelski, R Gray Huffman, David H Perlman, Marco Serra, Peter Kharchenko, Antonius Koller, and Nikolai Slavov. 2021. “Single-Cell Proteomic and Transcriptomic Analysis of Macrophage Heterogeneity Using SCoPE2.” Genome Biol. 22 (1): 50.