LRBaseDbi, LRBase.XXX.eg.db, and scTensor packages
scTensor 2.0.0
Due to the rapid development of single-cell RNA-Seq (scRNA-Seq) technologies, a wide variety of cell types have been found in systems such as multiple organs of a healthy person, stem cell niches, and cancer stem cells. Such complex systems are shaped by communication between cells (cell-cell interaction, or CCI).
Many CCI studies use the ligand-receptor (L-R) pair list of the FANTOM5 project (Jordan A. Ramilowski, A draft network of ligand-receptor-mediated multicellular signaling in human, Nature Communications, 2015) as the evidence of CCI (http://fantom.gsc.riken.jp/5/suppl/Ramilowski_et_al_2015/data/PairsLigRec.txt). The project proposed the L-R candidate genes for the following two reasons.
The project also merged the data with previous L-R databases such as IUPHAR/DLRP/HPMR and filtered out the entries without PMIDs.
In addition, recent L-R databases such as CellPhoneDB and SingleCellSignalR contain manually curated L-R pairs that are not listed in IUPHAR/DLRP/HPMR.
The Bader Laboratory also provides many putative L-R databases predicted by their own standards.
In our framework, we expanded such L-R databases to 134 organisms based on ortholog relationships. For sustainable maintenance, we implemented the framework as multiple R/Bioconductor annotation packages: LRBaseDbi and LRBase.XXX.eg.db-type packages (Figure 1). XXX is the abbreviation of the scientific name of the organism; for example, LRBase.Hsa.eg.db is the L-R database of Homo sapiens. We also developed scTensor, a method to detect CCIs and the CCI-related L-R pairs simultaneously. This document describes how to use LRBaseDbi, the LRBase.XXX.eg.db-type packages, and the scTensor package.
To create the L-R lists of the 134 organisms, we introduced 36 approaches, including known and putative L-R pairing. Please see the evidence codes of lrbase-workflow, the Snakemake workflow that creates LRBase.XXX.eg.db: https://github.com/rikenbit/lrbase-workflow
Some data access functions are available for LRBase.XXX.eg.db-type packages. Any data table is retrieved by four functions defined by AnnotationDbi (columns, keytypes, keys, and select), which are commonly implemented by the LRBaseDbi package. columns returns the columns that can be retrieved from LRBase.XXX.eg.db-type packages. keytypes returns the columns that can be used as the optional keytype parameter of the keys and select functions. keys returns the values of the specified keytype. select returns the rows, restricted to particular columns, that match the user-specified keys, and returns the result as a data frame. See the vignette of AnnotationDbi for more details.
## Loading required package: LRBase.Hsa.eg.db
## Loading required package: LRBaseDbi
columns(LRBase.Hsa.eg.db)
## [1] "GENEID_L" "GENEID_R" "SOURCEDB" "SOURCEID"
keytypes(LRBase.Hsa.eg.db)
## [1] "GENEID_L" "GENEID_R" "SOURCEDB" "SOURCEID"
key_HSA <- keys(LRBase.Hsa.eg.db, keytype="GENEID_L")
head(select(LRBase.Hsa.eg.db, keys=key_HSA[1:2],
columns=c("GENEID_L", "GENEID_R"), keytype="GENEID_L"))
## GENEID_L GENEID_R
## 1 4016 14
## 2 344752 14
## 3 4016 2678
## 4 4016 5251
## 5 344752 56670
Other additional functions such as species, lrNomenclature, and lrListDatabases are available. In each LRBase.XXX.eg.db-type package, species returns the common name and lrNomenclature returns the scientific name. lrListDatabases returns the sources of the data. dbInfo returns the information of the package. dbfile returns the path where the SQLite file is stored. dbschema returns the schema of the database. dbconn returns the connection to the SQLite database.
lrPackageName(LRBase.Hsa.eg.db)
## [1] "LRBase.Hsa.eg.db"
lrNomenclature(LRBase.Hsa.eg.db)
## [1] "Homo sapiens"
species(LRBase.Hsa.eg.db)
## [1] "Human"
lrListDatabases(LRBase.Hsa.eg.db)
## SOURCEDB
## 1 SWISSPROT_STRING
## 2 TREMBL_STRING
## 3 SOURCEDB
## 4 IUPHAR
## 5 DLRP
lrVersion(LRBase.Hsa.eg.db)
## NAME VALUE
## 1 LRVERSION 2019
dbInfo(LRBase.Hsa.eg.db)
## NAME VALUE
## 1 SOURCEDATE 7-Oct-2019
## 2 SOURCENAME1 SWISSPROT
## 3 SOURCENAME2 TREMBL
## 4 SOURCENAME3 STRING
## 5 SOURCEURL1 http://www.uniprot.org/uniprot/?query=reviewed:yes
## 6 SOURCEURL2 http://www.uniprot.org/uniprot/?query=reviewed:no
## 7 SOURCEURL3 https://string-db.org/cgi/download.pl
## 8 DBSCHEMA LRBase.Hsa.eg.db
## 9 DBSCHEMAVERSION 1.2.0
## 10 ORGANISM Homo sapiens
## 11 SPECIES Human
## 12 package AnnotationDbi
## 13 Db type LRBaseDb
## 14 LRVERSION 2019
dbfile(LRBase.Hsa.eg.db)
## [1] "/home/biocbuild/bbs-3.12-bioc/R/library/LRBase.Hsa.eg.db/extdata/LRBase.Hsa.eg.db.sqlite"
dbschema(LRBase.Hsa.eg.db)
## [1] "CREATE TABLE `METADATA` (\n `NAME` TEXT,\n `VALUE` TEXT\n)"
## [2] "CREATE TABLE `DATA` (\n `GENEID_L` TEXT,\n `GENEID_R` TEXT,\n `SOURCEID` TEXT,\n `SOURCEDB` TEXT\n)"
dbconn(LRBase.Hsa.eg.db)
## <SQLiteConnection>
## Path: /home/biocbuild/bbs-3.12-bioc/R/library/LRBase.Hsa.eg.db/extdata/LRBase.Hsa.eg.db.sqlite
## Extensions: TRUE
Combined with the dbGetQuery function of the RSQLite package, more complicated queries can also be submitted.
suppressPackageStartupMessages(library("RSQLite"))
dbGetQuery(dbconn(LRBase.Hsa.eg.db),
"SELECT * FROM DATA WHERE GENEID_L = '9068' AND GENEID_R = '14' LIMIT 10")
## [1] GENEID_L GENEID_R SOURCEID SOURCEDB
## <0 rows> (or 0-length row.names)
LRBaseDbi regulates the class definition of the LRBaseDb object instantiated from the LRBaseDb class. In addition, the LRBaseDbi package generates a user’s original LRBase.XXX.eg.db-type package with the makeLRBasePackage function. This function is inspired by our previous package MeSHDbi, which constructs a user’s original MeSH.XXX.eg.db-type packages. Here we call this function “meta”-packaging. The 12 LRBase.XXX.eg.db-type packages described above are also generated by this “meta”-packaging. In this case, the user only has to specify (1) an L-R list containing the columns “GENEID_L” (ligand NCBI Gene IDs) and “GENEID_R” (receptor NCBI Gene IDs) and (2) a meta information table describing the L-R list. The makeLRBasePackage function generates LRBase.XXX.eg.db as shown below. The gene identifier is limited to NCBI Gene ID for now.
example("makeLRBasePackage")
##
## mkLRBP> if(interactive()){
## mkLRBP+ ## makeLRBasePackage enable users to construct
## mkLRBP+ ## user's own custom LRBase package
## mkLRBP+ data(FANTOM5)
## mkLRBP+ head(FANTOM5)
## mkLRBP+
## mkLRBP+ # We are also needed to prepare meta data as follows.
## mkLRBP+ data(metaFANTOM5)
## mkLRBP+ metaFANTOM5
## mkLRBP+
## mkLRBP+ ## sets up a temporary directory for this example
## mkLRBP+ ## (users won't need to do this step)
## mkLRBP+ tmp <- tempfile()
## mkLRBP+ dir.create(tmp)
## mkLRBP+
## mkLRBP+ ## makes an Organism package for human called Homo.sapiens
## mkLRBP+ makeLRBasePackage(pkgname = "FANTOM5.Hsa.eg.db",
## mkLRBP+ data = FANTOM5,
## mkLRBP+ metadata = metaFANTOM5,
## mkLRBP+ organism = "Homo sapiens",
## mkLRBP+ pkgtitle="An annotation package for the LRBaseDb object",
## mkLRBP+ pkgdescription=paste("Contains the LRBaseDb object",
## mkLRBP+ "to access data from several related annotation packages."),
## mkLRBP+ version = "0.99.0",
## mkLRBP+ maintainer = "Koki Tsuyuzaki <k.t.the-answer@hotmail.co.jp>",
## mkLRBP+ author = "Koki Tsuyuzaki",
## mkLRBP+ destDir = tmp,
## mkLRBP+ license="Artistic-2.0")
## mkLRBP+ }
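As a rough illustration of the two inputs described above, the following base R sketch builds a toy L-R list and a toy meta information table whose column names follow the DATA and METADATA schemas printed by dbschema. The SOURCEID, SOURCEDB, and metadata values are placeholders, and check_lr_list is a hypothetical helper that is not part of LRBaseDbi; it only shows the kind of sanity check one might run before calling makeLRBasePackage.

```r
# Toy L-R list: NCBI Gene IDs stored as text, as in the DATA table schema.
# The gene ID pairs reuse IDs printed earlier in this document; the SOURCEID
# and SOURCEDB values are placeholders, not real evidence.
lr_list <- data.frame(
  GENEID_L = c("4016", "344752", "4016"),
  GENEID_R = c("14", "14", "2678"),
  SOURCEID = c("PLACEHOLDER1", "PLACEHOLDER2", "PLACEHOLDER3"),
  SOURCEDB = rep("MY_CURATION", 3),
  stringsAsFactors = FALSE
)

# Toy meta information table with NAME/VALUE columns (METADATA table schema).
# The values are placeholders chosen for illustration.
lr_meta <- data.frame(
  NAME = c("SOURCEDATE", "ORGANISM", "SPECIES"),
  VALUE = c("1-Jan-2020", "Homo sapiens", "Human"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (NOT part of LRBaseDbi): basic sanity checks.
check_lr_list <- function(x) {
  required <- c("GENEID_L", "GENEID_R")
  missing_cols <- setdiff(required, colnames(x))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  ids <- c(as.character(x$GENEID_L), as.character(x$GENEID_R))
  # NCBI Gene IDs consist of digits only (gene symbols would fail here)
  all_numeric <- all(grepl("^[0-9]+$", ids))
  # Each ligand-receptor pair should be listed once
  no_dups <- !any(duplicated(x[, required]))
  all_numeric && no_dups
}

ok <- check_lr_list(lr_list)
ok
```

Checking that the IDs are purely numeric mirrors the restriction to NCBI Gene IDs stated above.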
Although any package name is acceptable, if the organism of the user’s L-R list is one of those described above (Table ??), using the same XXX abbreviation is recommended. This is because the HTML report function described later identifies the XXX abbreviation, and if XXX corresponds to one of the 12 organisms, the gene annotation in the generated HTML report becomes richer.
Given an LRBase.XXX.eg.db-type package and the user’s scRNA-Seq gene expression matrix, scTensor detects CCIs and generates HTML reports for exploratory data inspection. The algorithm of scTensor is as follows.
First, scTensor calculates the cell type-level mean vectors, searches for the corresponding pair of genes in the row names of the matrix, and extracts them as two vectors.
Next, the cell type-level mean vector of ligand expression and that of receptor expression are multiplied as an outer product and converted to a cell type \(\times\) cell type matrix. Here, the multiple matrices can be represented as a third-order “tensor” (Ligand-Cell \(\times\) Receptor-Cell \(\times\) L-R-Pair). scTensor decomposes the tensor into a small tensor (core tensor) and two factor matrices. Tensor decomposition is very similar to matrix decomposition such as PCA (principal component analysis). The core tensor is similar to the eigenvalues of PCA; it indicates how prominent each pattern is. Likewise, the factor matrices are similar to the PC scores/loadings of PCA; they indicate which ligand-cells, receptor-cells, and L-R pairs are informative. When the matrices have negative values, interpreting which direction (+/-) is important and which is not is a difficult and laborious task. For this reason, scTensor performs non-negative Tucker2 decomposition (NTD2), which is a non-negative version of tensor decomposition (cf. nnTensor).
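The first two steps can be sketched in base R as follows. This is a minimal illustration on made-up data, not the scTensor implementation: the gene IDs, cell type labels, and variable names (expr, celltype, lr, ct_mean, cci_tensor) are all assumptions chosen for the example.

```r
set.seed(1)

# Toy expression matrix: 4 genes (rows, made-up IDs) x 6 cells (columns)
expr <- matrix(rpois(24, lambda = 5), nrow = 4,
               dimnames = list(c("101", "102", "201", "202"),
                               paste0("cell", 1:6)))
celltype <- c("A", "A", "B", "B", "C", "C")

# Toy L-R list: each row pairs a ligand gene with a receptor gene
lr <- data.frame(GENEID_L = c("101", "102"),
                 GENEID_R = c("201", "202"),
                 stringsAsFactors = FALSE)

# Step 1: cell type-level mean expression (genes x cell types)
types <- unique(celltype)
ct_mean <- sapply(types, function(ct) {
  rowMeans(expr[, celltype == ct, drop = FALSE])
})

# Step 2: one outer product (ligand cell type x receptor cell type)
# per L-R pair, stacked along the third mode of the tensor
cci_tensor <- array(
  0,
  dim = c(length(types), length(types), nrow(lr)),
  dimnames = list(types, types, paste(lr$GENEID_L, lr$GENEID_R, sep = "_"))
)
for (k in seq_len(nrow(lr))) {
  ligand <- ct_mean[lr$GENEID_L[k], ]    # mean ligand expression per cell type
  receptor <- ct_mean[lr$GENEID_R[k], ]  # mean receptor expression per cell type
  cci_tensor[, , k] <- outer(ligand, receptor)
}

dim(cci_tensor)  # cell types x cell types x L-R pairs
```

Because every entry is a product of two mean expression values, the tensor is non-negative by construction, which is what makes a non-negative decomposition a natural fit.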
Finally, the result of NTD2 is summarized as an HTML report. Because most of the plots are drawn with the plotly package, the details of each plot can be inspected interactively in the user’s web browser. The two factor matrices can be viewed interactively to see which cell types and which L-R pairs are likely to interact with each other. The mode-3 (L-R-pair direction) sum of the core tensor is calculated and visualized as Ligand-Receptor Patterns. Details of the (Ligand-Cell, Receptor-Cell, L-R-pair) Patterns are also visualized.
SingleCellExperiment object
Here, we use the scRNA-Seq dataset of male germline cells and somatic cells (GSE86146) as demo data. To reduce the package size, the number of genes is strictly reduced to highly variable genes with a p-value threshold of 1E-150 (cf. Identifying highly variable genes). For this reason, we do not discuss the scientific interpretation of the data here.
We assume that the user has an scRNA-Seq data matrix containing expression count data summarised at the gene level. First, we create a SingleCellExperiment object containing the data. The rows of the object correspond to features, and the columns correspond to cells. The gene identifier is limited to NCBI Gene ID for now.
To improve the interpretability of the following HTML report, we highly recommend that the user specifies two-dimensional coordinates of the input data (e.g. PCA, t-SNE, or UMAP). Such information is easily specified by the reducedDims function of the SingleCellExperiment package and is saved to the reducedDims slot of the SingleCellExperiment object (Figure 1).
data(GermMale)
data(labelGermMale)
data(tsneGermMale)
sce <- SingleCellExperiment(assays=list(counts = GermMale))
reducedDims(sce) <- SimpleList(TSNE=tsneGermMale$Y)
plot(reducedDims(sce)[[1]], col=labelGermMale, pch=16, cex=2,
xlab="Dim1", ylab="Dim2", main="Germline, Male, GSE86146")
legend("topleft", legend=c(paste0("FGC_", 1:3), paste0("Soma_", 1:4)),
col=c("#9E0142", "#D53E4F", "#F46D43", "#ABDDA4", "#66C2A5", "#3288BD", "#5E4FA2"),
pch=16)
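If no two-dimensional coordinates are at hand, one simple option is PCA with base R. The sketch below uses a made-up count matrix and an assumed log1p transformation; the variable names (counts, pca2d) are hypothetical, and the final assignment is left as a comment because it requires a SingleCellExperiment object built from the same cells.

```r
set.seed(1)

# Toy count matrix: 50 genes (rows) x 20 cells (columns)
counts <- matrix(rpois(50 * 20, lambda = 10), nrow = 50,
                 dimnames = list(paste0("gene", 1:50),
                                 paste0("cell", 1:20)))

# prcomp expects observations in rows, so cells are moved to rows via t();
# log1p is one common choice to stabilize the variance of counts
pca <- prcomp(t(log1p(counts)), center = TRUE, scale. = FALSE)

# Keep the first two principal components: one (x, y) coordinate per cell
pca2d <- pca$x[, 1:2]
dim(pca2d)

# With a SingleCellExperiment object `sce` built from the same cells,
# the coordinates could then be stored as in the example above:
# reducedDims(sce) <- SimpleList(PCA = pca2d)
```

The row names of pca2d are the cell names, so the coordinates stay aligned with the columns of the expression matrix.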
Note that if you want to use the scTensor framework with other species such as mouse or rat, load the corresponding LRBase.XXX.eg.db and MeSH.XXX.eg.db packages.
For example, if your scRNA-Seq dataset is sampled from mouse, load LRBase.Mmu.eg.db and MeSH.Mmu.eg.db instead of LRBase.Hsa.eg.db and MeSH.Hsa.eg.db.
## Loading required package: LRBase.Mmu.eg.db
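To perform the tensor decomposition and generate the HTML report, the user is supposed to register some information in the SingleCellExperiment object: in the call below, the LRBase.XXX.eg.db object and the cell type label of each cell. This information is registered to the metadata slot of the SingleCellExperiment object by the cellCellSetting function.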
cellCellSetting(sce, LRBase.Hsa.eg.db, names(labelGermMale))
After cellCellSetting, we can perform the tensor decomposition with cellCellDecomp. Here the parameter ranks specifies the dimensions of the core tensor. For example, c(2, 3) means that the data tensor is decomposed into 2 ligand-patterns and 3 receptor-patterns.
set.seed(1234)
cellCellDecomp(sce, ranks=c(2,3))
## Input data matrix may contains 7 gene symbols because the name contains some alphabets.
## scTensor uses only NCBI Gene IDs for now.
## Here, the gene symbols are removed and remaining 235 NCBI Gene IDs are used for scTensor next step.
## 7 * 7 * 84 Tensor is created
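To make the role of ranks concrete, the following base R sketch shows the shapes involved in a Tucker2 model of a 7 \(\times\) 7 \(\times\) 84 tensor with ranks c(2, 3). The factor matrices and core tensor here are random non-negative stand-ins, not the output of cellCellDecomp, and the names A1, A2, and G are assumptions made for illustration.

```r
set.seed(1234)

n_ligand_cells <- 7
n_receptor_cells <- 7
n_lr_pairs <- 84
ranks <- c(2, 3)

# Random non-negative stand-ins for a decomposition result
A1 <- matrix(runif(n_ligand_cells * ranks[1]),
             nrow = n_ligand_cells)                 # 7 x 2 (ligand-patterns)
A2 <- matrix(runif(n_receptor_cells * ranks[2]),
             nrow = n_receptor_cells)               # 7 x 3 (receptor-patterns)
G <- array(runif(ranks[1] * ranks[2] * n_lr_pairs),
           dim = c(ranks[1], ranks[2], n_lr_pairs)) # 2 x 3 x 84 (core tensor)

# Tucker2 reconstruction: the L-R-pair mode is not compressed, so each
# L-R-pair slice is rebuilt as A1 %*% G[, , k] %*% t(A2)
X_hat <- array(0, dim = c(n_ligand_cells, n_receptor_cells, n_lr_pairs))
for (k in seq_len(n_lr_pairs)) {
  X_hat[, , k] <- A1 %*% G[, , k] %*% t(A2)
}
dim(X_hat)

# Mode-3 (L-R-pair direction) sum of the core tensor:
# one value per (ligand-pattern, receptor-pattern) combination
core_sum <- apply(G, c(1, 2), sum)
dim(core_sum)
```

Since all factors are non-negative, the reconstruction is a purely additive combination of patterns, which is what keeps the patterns interpretable.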
Although the user has to specify the ranks to perform cellCellDecomp, we implemented a simple rank estimation function, cellCellRanks, based on the eigenvalue distribution of PCA on the matricised tensor in each mode. rks$selected can also be specified as the ranks parameter of cellCellDecomp.
(rks <- cellCellRanks(sce))
## Each rank, multiple NMF runs are performed
## Each rank estimation method
## Each rank, multiple NMF runs are performed
## Each rank estimation method
## $RSS
## $RSS$rss1
## [1] 13.89887249 8.46340572 3.47949123 2.54011316 0.26114430 0.08229159
## [7] 0.04792214
##
## $RSS$rss2
## [1] 12.59431240 5.50515680 1.91806433 0.28930182 0.29463463 0.01390719
## [7] 0.01207860
##
##
## $selected
## [1] 4 3
rks$selected
## [1] 4 3
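The idea of inspecting the spectrum of the matricised tensor in each mode can be sketched in base R as below. This is an illustrative stand-in, not the procedure used by cellCellRanks (whose log above mentions NMF runs and RSS values): the toy tensor, the unfold and estimate_rank helpers, and the relative tolerance criterion are all assumptions made for this example.

```r
set.seed(1)

# Toy non-negative tensor (7 x 7 x 20) built from 2 ligand-cell patterns
# and 3 receptor-cell patterns, plus a very small amount of noise
A1 <- matrix(runif(7 * 2), nrow = 7)
A2 <- matrix(runif(7 * 3), nrow = 7)
G <- array(runif(2 * 3 * 20), dim = c(2, 3, 20))
X <- array(0, dim = c(7, 7, 20))
for (k in 1:20) {
  X[, , k] <- A1 %*% G[, , k] %*% t(A2)
}
X <- X + array(runif(7 * 7 * 20, max = 1e-6), dim = dim(X))

# Matricise (unfold) the tensor along one mode: rows index that mode,
# columns run over all combinations of the remaining modes
unfold <- function(tensor, mode) {
  d <- dim(tensor)
  other <- setdiff(seq_along(d), mode)
  matrix(aperm(tensor, c(mode, other)), nrow = d[mode])
}

# Hypothetical criterion: count the singular values that exceed
# a relative tolerance with respect to the largest one
estimate_rank <- function(tensor, mode, tol = 1e-4) {
  s <- svd(unfold(tensor, mode))$d
  sum(s > tol * s[1])
}

est <- c(estimate_rank(X, 1), estimate_rank(X, 2))
est
```

On real data the spectrum rarely shows such a clean gap, so a fixed tolerance like this one is best treated as a starting point for choosing ranks rather than a definitive answer.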
If cellCellDecomp finishes properly, we can run the cellCellReport function to output the HTML report. Please type example(cellCellReport) and the report will be generated in a temporary directory (it takes 5 to 10 minutes).
After cellCellReport, multiple R Markdown files, compiled HTML files, figures, and an R binary file containing the result of the analysis are saved to out.dir (Figure 2). For more details, open index.html in your web browser. Combined with a cloud storage service such as Amazon Simple Storage Service (S3), the report can serve as a simple web application, and multiple people such as collaborators can view the same report simultaneously.
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] LRBase.Mmu.eg.db_1.2.0 SingleCellExperiment_1.12.0
## [3] SummarizedExperiment_1.20.0 Biobase_2.50.0
## [5] GenomicRanges_1.42.0 GenomeInfoDb_1.26.0
## [7] IRanges_2.24.0 S4Vectors_0.28.0
## [9] BiocGenerics_0.36.0 MatrixGenerics_1.2.0
## [11] matrixStats_0.57.0 scTensor_2.0.0
## [13] RSQLite_2.2.1 LRBase.Hsa.eg.db_1.2.0
## [15] LRBaseDbi_2.0.0 BiocStyle_2.18.0
##
## loaded via a namespace (and not attached):
## [1] rsvd_1.0.3 Hmisc_4.4-1
## [3] ica_1.0-2 Rsamtools_2.6.0
## [5] foreach_1.5.1 lmtest_0.9-38
## [7] crayon_1.3.4 MASS_7.3-53
## [9] nlme_3.1-150 backports_1.1.10
## [11] GOSemSim_2.16.0 MeSHDbi_1.26.0
## [13] rlang_0.4.8 XVector_0.30.0
## [15] ROCR_1.0-11 irlba_2.3.3
## [17] nnTensor_1.0.5 GOstats_2.56.0
## [19] BiocParallel_1.24.0 tagcloud_0.6
## [21] bit64_4.0.5 glue_1.4.2
## [23] sctransform_0.3.1 AnnotationDbi_1.52.0
## [25] dotCall64_1.0-0 tcltk_4.0.3
## [27] DOSE_3.16.0 tidyselect_1.1.0
## [29] fitdistrplus_1.1-1 XML_3.99-0.5
## [31] tidyr_1.1.2 zoo_1.8-8
## [33] GenomicAlignments_1.26.0 xtable_1.8-4
## [35] magrittr_1.5 evaluate_0.14
## [37] ggplot2_3.3.2 zlibbioc_1.36.0
## [39] rstudioapi_0.11 miniUI_0.1.1.1
## [41] rpart_4.1-15 fastmatch_1.1-0
## [43] ensembldb_2.14.0 maps_3.3.0
## [45] fields_11.6 shiny_1.5.0
## [47] xfun_0.18 askpass_1.1
## [49] cluster_2.1.0 tidygraph_1.2.0
## [51] TSP_1.1-10 tibble_3.0.4
## [53] interactiveDisplayBase_1.28.0 ggrepel_0.8.2
## [55] biovizBase_1.38.0 listenv_0.8.0
## [57] dendextend_1.14.0 Biostrings_2.58.0
## [59] png_0.1-7 future_1.19.1
## [61] bitops_1.0-6 ggforce_0.3.2
## [63] RBGL_1.66.0 plyr_1.8.6
## [65] GSEABase_1.52.0 AnnotationFilter_1.14.0
## [67] pillar_1.4.6 GenomicFeatures_1.42.0
## [69] graphite_1.36.0 vctrs_0.3.4
## [71] ellipsis_0.3.1 generics_0.0.2
## [73] plot3D_1.3 MeSH.Aca.eg.db_1.13.0
## [75] outliers_0.14 tools_4.0.3
## [77] foreign_0.8-80 entropy_1.2.1
## [79] munsell_0.5.0 tweenr_1.0.1
## [81] fgsea_1.16.0 DelayedArray_0.16.0
## [83] fastmap_1.0.1 compiler_4.0.3
## [85] abind_1.4-5 httpuv_1.5.4
## [87] rtracklayer_1.50.0 Gviz_1.34.0
## [89] plotly_4.9.2.1 GenomeInfoDbData_1.2.4
## [91] gridExtra_2.3 lattice_0.20-41
## [93] deldir_0.1-29 visNetwork_2.0.9
## [95] AnnotationForge_1.32.0 later_1.1.0.1
## [97] dplyr_1.0.2 BiocFileCache_1.14.0
## [99] jsonlite_1.7.1 concaveman_1.1.0
## [101] scales_1.1.1 graph_1.68.0
## [103] pbapply_1.4-3 genefilter_1.72.0
## [105] lazyeval_0.2.2 promises_1.1.1
## [107] spatstat_1.64-1 MeSH.db_1.13.0
## [109] latticeExtra_0.6-29 goftest_1.2-2
## [111] spatstat.utils_1.17-0 reticulate_1.18
## [113] checkmate_2.0.0 rmarkdown_2.5
## [115] cowplot_1.1.0 schex_1.4.0
## [117] MeSH.Syn.eg.db_1.13.0 webshot_0.5.2
## [119] Rtsne_0.15 dichromat_2.0-0
## [121] BSgenome_1.58.0 uwot_0.1.8
## [123] igraph_1.2.6 survival_3.2-7
## [125] yaml_2.2.1 plotrix_3.7-8
## [127] htmltools_0.5.0 memoise_1.1.0
## [129] VariantAnnotation_1.36.0 rTensor_1.4.1
## [131] Seurat_3.2.2 seriation_1.2-9
## [133] graphlayouts_0.7.1 viridisLite_0.3.0
## [135] digest_0.6.27 assertthat_0.2.1
## [137] ReactomePA_1.34.0 mime_0.9
## [139] rappdirs_0.3.1 registry_0.5-1
## [141] spam_2.5-1 future.apply_1.6.0
## [143] misc3d_0.9-0 data.table_1.13.2
## [145] blob_1.2.1 cummeRbund_2.32.0
## [147] splines_4.0.3 Formula_1.2-4
## [149] AnnotationHub_2.22.0 ProtGenerics_1.22.0
## [151] RCurl_1.98-1.2 hms_0.5.3
## [153] colorspace_1.4-1 base64enc_0.1-3
## [155] BiocManager_1.30.10 nnet_7.3-14
## [157] Rcpp_1.0.5 bookdown_0.21
## [159] RANN_2.6.1 MeSH.PCR.db_1.13.0
## [161] enrichplot_1.10.0 R6_2.4.1
## [163] grid_4.0.3 ggridges_0.5.2
## [165] lifecycle_0.2.0 curl_4.3
## [167] MeSH.Bsu.168.eg.db_1.13.0 leiden_0.3.3
## [169] MeSH.AOR.db_1.13.0 meshr_1.26.0
## [171] DO.db_2.9 Matrix_1.2-18
## [173] qvalue_2.22.0 RcppAnnoy_0.0.16
## [175] org.Hs.eg.db_3.12.0 RColorBrewer_1.1-2
## [177] iterators_1.0.13 stringr_1.4.0
## [179] htmlwidgets_1.5.2 polyclip_1.10-0
## [181] biomaRt_2.46.0 purrr_0.3.4
## [183] shadowtext_0.0.7 reactome.db_1.74.0
## [185] mgcv_1.8-33 globals_0.13.1
## [187] openssl_1.4.3 htmlTable_2.1.0
## [189] patchwork_1.0.1 codetools_0.2-16
## [191] GO.db_3.12.0 prettyunits_1.1.1
## [193] dbplyr_1.4.4 gtable_0.3.0
## [195] DBI_1.1.0 tensor_1.5
## [197] httr_1.4.2 highr_0.8
## [199] KernSmooth_2.23-17 stringi_1.5.3
## [201] progress_1.2.2 reshape2_1.4.4
## [203] farver_2.0.3 heatmaply_1.1.1
## [205] annotate_1.68.0 viridis_0.5.1
## [207] hexbin_1.28.1 fdrtool_1.2.15
## [209] Rgraphviz_2.34.0 magick_2.5.0
## [211] xml2_1.3.2 rvcheck_0.1.8
## [213] Category_2.56.0 BiocVersion_3.12.0
## [215] bit_4.0.4 scatterpie_0.1.5
## [217] jpeg_0.1-8.1 spatstat.data_1.4-3
## [219] ggraph_2.0.3 pkgconfig_2.0.3
## [221] MeSH.Hsa.eg.db_1.13.0 knitr_1.30