tradeSeq
is an R
package that allows analysis of gene expression along trajectories. While it has been developed and applied to single-cell RNA-sequencing (scRNA-seq) data, its applicability extends beyond that, and also allows the analysis of, e.g., single-cell ATAC-seq data along trajectories or bulk RNA-seq time series datasets. For every gene in the dataset, tradeSeq
fits a generalized additive model (GAM) by building on the mgcv
R package. It then allows statistical inference on the GAM by assessing contrasts of the parameters of the fitted GAM model, aiding in interpreting complex datasets. All details about the tradeSeq
model and statistical tests are described in our preprint (Van den Berge et al. 2020).
In this vignette, we analyze a subset of the data from (Paul et al. 2015). A SingleCellExperiment
object of the data has been provided with the tradeSeq
package and can be retrieved as shown below. The data and UMAP reduced dimensions were derived from following the Monocle 3 vignette.
The main vignette focuses on using tradeSeq
downstream of slingshot
. Here, we present how to use tradeSeq
downstream of monocle
(Qiu et al. 2017).
library(tradeSeq)
library(RColorBrewer)
library(SingleCellExperiment)
# For reproducibility
palette(brewer.pal(8, "Dark2"))
data(countMatrix, package = "tradeSeq")
counts <- as.matrix(countMatrix)
rm(countMatrix)
data(celltype, package = "tradeSeq")
As of now (06/2020), monocle3(Cao et al. 2019), is still in its beta version. Therefore, we have no plan yet to include a S4 method for monocle3 while it is not on CRAN or Bioconductor and the format is still moving. However, we present below a way to use tradeSeq
downstream of monocle3
as of version ‘0.2’, for a fully connected graph. We follow the tutorial from the monocle3 website.
You will need to install monocle3 from here before running the code below.
set.seed(22)
library(monocle3)
# Create a cell_data_set object
cds <- new_cell_data_set(counts, cell_metadata = pd,
gene_metadata = data.frame(gene_short_name = rownames(counts),
row.names = rownames(counts)))
# Run PCA then UMAP on the data
cds <- preprocess_cds(cds, method = "PCA")
cds <- reduce_dimension(cds, preprocess_method = "PCA",
reduction_method = "UMAP")
# First display, coloring by the cell types from Paul et al
plot_cells(cds, label_groups_by_cluster = FALSE, cell_size = 1,
color_cells_by = "cellType")
# Running the clustering method. This is necessary to the construct the graph
cds <- cluster_cells(cds, reduction_method = "UMAP")
# Visualize the clusters
plot_cells(cds, color_cells_by = "cluster", cell_size = 1)
# Construct the graph
# Note that, for the rest of the code to run, the graph should be fully connected
cds <- learn_graph(cds, use_partition = FALSE)
# We find all the cells that are close to the starting point
cell_ids <- colnames(cds)[pd$cellType == "Multipotent progenitors"]
closest_vertex <- cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
closest_vertex <- closest_vertex[cell_ids, ]
closest_vertex <- as.numeric(names(which.max(table(closest_vertex))))
mst <- principal_graph(cds)$UMAP
root_pr_nodes <- igraph::V(mst)$name[closest_vertex]
# We compute the trajectory
cds <- order_cells(cds, root_pr_nodes = root_pr_nodes)
plot_cells(cds, color_cells_by = "pseudotime")
library(magrittr)
# Get the closest vertice for every cell
y_to_cells <- principal_graph_aux(cds)$UMAP$pr_graph_cell_proj_closest_vertex %>%
as.data.frame()
y_to_cells$cells <- rownames(y_to_cells)
y_to_cells$Y <- y_to_cells$V1
# Get the root vertices
# It is the same node as above
root <- cds@principal_graph_aux$UMAP$root_pr_nodes
# Get the other endpoints
endpoints <- names(which(igraph::degree(mst) == 1))
endpoints <- endpoints[!endpoints %in% root]
# For each endpoint
cellWeights <- lapply(endpoints, function(endpoint) {
# We find the path between the endpoint and the root
path <- igraph::shortest_paths(mst, root, endpoint)$vpath[[1]]
path <- as.character(path)
# We find the cells that map along that path
df <- y_to_cells[y_to_cells$Y %in% path, ]
df <- data.frame(weights = as.numeric(colnames(cds) %in% df$cells))
colnames(df) <- endpoint
return(df)
}) %>% do.call(what = 'cbind', args = .) %>%
as.matrix()
rownames(cellWeights) <- colnames(cds)
pseudotime <- matrix(pseudotime(cds), ncol = ncol(cellWeights),
nrow = ncol(cds), byrow = FALSE)
sce <- fitGAM(counts = counts,
pseudotime = pseudotime,
cellWeights = cellWeights)
Then, the sce
object can be analyzed following the main vignette.
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
## [3] Biobase_2.54.0 GenomicRanges_1.46.0
## [5] GenomeInfoDb_1.30.0 IRanges_2.28.0
## [7] S4Vectors_0.32.0 BiocGenerics_0.40.0
## [9] MatrixGenerics_1.6.0 matrixStats_0.61.0
## [11] RColorBrewer_1.1-2 tradeSeq_1.8.0
## [13] knitr_1.36
##
## loaded via a namespace (and not attached):
## [1] locfit_1.5-9.4 Rcpp_1.0.7 slingshot_2.2.0
## [4] lattice_0.20-45 assertthat_0.2.1 digest_0.6.28
## [7] utf8_1.2.2 R6_2.5.1 princurve_2.1.6
## [10] evaluate_0.14 ggplot2_3.3.5 pillar_1.6.4
## [13] zlibbioc_1.40.0 rlang_0.4.12 jquerylib_0.1.4
## [16] Matrix_1.3-4 rmarkdown_2.11 splines_4.1.1
## [19] BiocParallel_1.28.0 stringr_1.4.0 igraph_1.2.7
## [22] RCurl_1.98-1.5 munsell_0.5.0 DelayedArray_0.20.0
## [25] compiler_4.1.1 xfun_0.27 pkgconfig_2.0.3
## [28] TrajectoryUtils_1.2.0 mgcv_1.8-38 htmltools_0.5.2
## [31] tidyselect_1.1.1 gridExtra_2.3 tibble_3.1.5
## [34] GenomeInfoDbData_1.2.7 edgeR_3.36.0 viridisLite_0.4.0
## [37] fansi_0.5.0 crayon_1.4.1 dplyr_1.0.7
## [40] bitops_1.0-7 grid_4.1.1 nlme_3.1-153
## [43] jsonlite_1.7.2 gtable_0.3.0 lifecycle_1.0.1
## [46] DBI_1.1.1 magrittr_2.0.1 scales_1.1.1
## [49] pbapply_1.5-0 stringi_1.7.5 XVector_0.34.0
## [52] viridis_0.6.2 limma_3.50.0 bslib_0.3.1
## [55] ellipsis_0.3.2 generics_0.1.1 vctrs_0.3.8
## [58] tools_4.1.1 glue_1.4.2 purrr_0.3.4
## [61] parallel_4.1.1 fastmap_1.1.0 yaml_2.2.1
## [64] colorspace_2.0-2 sass_0.4.0
Cao, Junyue, Malte Spielmann, Xiaojie Qiu, Xingfan Huang, Daniel M. Ibrahim, Andrew J. Hill, Fan Zhang, et al. 2019. “The Dynamics and Regulators of Cell Fate Decisions Are Revealed by Pseudo-Temporal Ordering of Single Cells.” Nature.
Paul, Franziska, Ya’ara Arkin, Amir Giladi, Diego Adhemar Jaitin, Ephraim Kenigsberg, Hadas Keren-Shaul, Deborah Winter, et al. 2015. “Transcriptional Heterogeneity and Lineage Commitment in Myeloid Progenitors.” Cell 163 (7): 1663–77. https://doi.org/10.1016/J.CELL.2015.11.013.
Qiu, Xiaojie, Qi Mao, Ying Tang, Li Wang, Raghav Chawla, Hannah A Pliner, and Cole Trapnell. 2017. “Reversed graph embedding resolves complex single-cell trajectories.” Nature Methods 14 (10): 979–82. https://doi.org/10.1038/nmeth.4402.
Van den Berge, Koen, Hector Roux de Bézieux, Kelly Street, Wouter Saelens, Robrecht Cannoodt, Yvan Saeys, Sandrine Dudoit, and Lieven Clement. 2020. “Trajectory-based differential expression analysis for single-cell sequencing data.” Nature Communications 11 (1): 1201. https://doi.org/10.1038/s41467-020-14766-3.