tradeSeq
is an R
package that allows analysis of gene expression along trajectories. While it has been developed and applied to single-cell RNA-sequencing (scRNA-seq) data, its applicability extends beyond that, and also allows the analysis of, e.g., single-cell ATAC-seq data along trajectories or bulk RNA-seq time series datasets. For every gene in the dataset, tradeSeq
fits a generalized additive model (GAM) by building on the mgcv
R package. It then allows statistical inference on the GAM by assessing contrasts of the parameters of the fitted GAM model, ading in interpreting complex datasets. All details about the tradeSeq
model and statistical tests are described in our preprint (Van den Berge et al. 2019).
In this vignette, we analyse a subset of the data from (Paul et al. 2015). A SingleCellExperiment
object of the data has been provided with the tradeSeq
package and can be retrieved as shown below. The data and UMAP reduced dimensions were derived from following the Monocle 3 vignette.
To install the package, simply run
if(!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("tradeSeq")
library(tradeSeq)
library(RColorBrewer)
library(SingleCellExperiment)
library(slingshot)
# For reproducibility
RNGversion("3.5.0")
palette(brewer.pal(8, "Dark2"))
data(countMatrix, package = "tradeSeq")
counts <- countMatrix
rm(countMatrix)
data(crv, package = "tradeSeq")
data(celltype, package = "tradeSeq")
We will fit developmental trajectories using the slingshot
package (Street et al. 2018). slingshot
requires cluster labels as input, and fits trajectories in reduced dimension. We will use the reduced space calculated with the UMAP method, which is pre-calculated in the se
object. We cluster the data using k-means with \(7\) clusters. Since we know which cells are the progenitor cell type, we define the starting point for the trajectories as input for slingshot
. Note that this argument is optional, and not required to run slingshot
.
set.seed(200)
rd <- reducedDims(crv)
cl <- kmeans(rd, centers = 7)$cluster
plot(rd, col = brewer.pal(9, "Set1")[cl], pch = 16, asp = 1,
cex = 2/3)
lin <- getLineages(rd, clusterLabels = cl, start.clus = 1)
## Using full covariance matrix
crv <- getCurves(lin)
We find two lineages for this dataset. The trajectory can be visualized using the plotGeneCount
function, using either the cluster labels or cell type to color the cells.
plotGeneCount(curve = crv, clusters = cl)
plotGeneCount(curve = crv, clusters = celltype,
title = "Colored by cell type")
After estimating the trajectory, we can fit generalized additive models (GAMs) with the tradeSeq
package. Internally, this package builds on the mgcv
package by fitting additive models using the gam
function. The core function from tradeSeq
, fitGAM
, will use cubic splines as basis functions, and it tries to ensure that every lineage will end at a knot point of a smoother. By default, we allow for \(6\) knots for every lineage, but this can be changed with the nknots
argument. More knots will allow more flexibility, but also increase the risk of overfitting.
Ideally, the number of knots should be selected to reach an optimal bias-variance trade-off for the smoother, where one explains as much variability in the expression data as possible with only a few regression coefficients. In order to guide that choice, we developed diagnostic plots using the Akaike Informaction Criterion (AIC). This is implemented in the evaluateK
function in tradeSeq
. The function takes as input the expression count matrix, a SlingshotDataSet
(or alternatively, a matrix of pseudotimes and cell-level weights). The range of knots to evaluate is provided with the knots
argument. The minimum allowed number of knots is \(3\). While there is no boundary on the maximum number of knots, typically the interesting range is around \(3\) to \(10\) knots. The function will fit NB-GAM models for some number of genes, provided by the nGenes
argument, over the range of knots that are provided, and return the AIC for each gene fitted with each number of knots. It is generally a good idea to evaluate this multiple times using different seeds (using the set.seed
function), to check whether the results are reproducible across different gene subsets.
This task can be computationally demanding, since the models must be fit multiple times for each gene. We therefore skip this in the vignette, but show the output one can expect instead.
icMat <- evaluateK(counts = counts, sds = crv, k=3:20, nGenes = 200,
verbose=FALSE)
# alternatively:
icMat <- evaluateK(counts = counts, k=3:20, nGenes = 200,
pseudotime = slingPseudotime(crv, na = FALSE),
cellWeights = slingCurveWeights(crv))
The output graphics are organized into four panels. The left panel plots a boxplot for each number of knots we wanted to evaluate. The plotted values are the deviation from a gene’s AIC at that specific knot value from the average AIC of that gene across all the knots under evaluation. Typically, AIC values are somewhat higher for low number of knots, and we expect them to decrease as the number of knots gets higher. The two middle panels plot the average drop in AIC across all genes. The middle left panel simply plots the average AIC, while the middle right panel plots the change in AIC relative to the average AIC at the lowest knot number (here, this is 3 knots, as can also be seen from the plot since the relative AIC equals \(1\)). Finally, rhe right panel only plots a subset of genes where the AIC value changes significantly across the evaluated number of knots. Here, a significant change is defined as a change in absolute value of at least \(2\), but this can be tuned using the aicDiff
argument to evaluateK
. For the subset of genes, a barplot is displayed that shows the number of genes that have their lowest AIC at a specific knot value.
The middle panels show that the drop in AIC levels off if the number of knots is increased beyond \(6\). In the right panel, \(6\) knots also corresponds the highest number of genes with lowest AIC value. Based on these plots, we thus believe that fitting the NB-GAM models with \(6\) knots is an appropriate choice.
By default, the GAM model estimates one smoother for every lineage using the negative binomial distribution. If you want to allow for other fixed effects (e.g., batch effects), then an additional model matrix, typically created using the model.matrix
function, can be provided with the U
argument. The precise model definition of the statistical model is described in our preprint (Van den Berge et al. 2019).
We use the effective library size, estimated with TMM (Robinson and Oshlack 2010), as offset in the model. We allow for alternatives by allowing a user-defined offset with the offset
argument.
This dataset consists of UMI counts, and we do not expect zero inflation to be a big problem. However, we also allow to fit zero inflated negative binomial (ZINB) GAMs by providing observation-level weights to fitGAM
using the weights
argument. The weights
must correspond to the posterior probability that a count belongs to the count component of the ZINB distribution (Van den Berge et al. 2018).
For the vignette, we fit smoothers for a filtered set of genes in the dataset, 239 genes in total. We also include the Irf8 gene, since it is a known transcription factor involved in hematopoiesis.
The fitGAM
function relies on BiocParallel to implement parallelization, progress bars and so on. Similar to evaluateK
, fitGAM can either take a SlingshotDataSet
object as input (sds
argument), or a matrix of pseudotimes and cell-level weights (pseudotime
and cellWeights
argument). If a SlingshotDataSet
is provided, the function will return a SingleCellExperiment
object that contains the essential output from tradeSeq
. This is much more efficient than providing the pseudotime and cell-level weights as matrices, when a list of GAM models will be returned.
While in this vignette we will proceed with using the sds
argument, hence a SingleCellExperiment
object as output, tradeSeq
allows input from any trajectory inference method with the pseudotime
and cellWeights
arguments. All functions work with both the SingleCellExperiment
(i.e., sds
input to fitGAM
) output as well as the list output (i.e., pseudotime
and cellWeights
input to fitGAM
).
Because cells are assigned to a lineage based on their weigths, the result of fitGAM
is stochastic. While this should have little impact on the overall results in practice, users are therefore encouraged to use the set.seed
function before running fitGAM
to ensure reproducibility of their analyses.
library(BiocParallel)
library(magrittr)
# Register BiocParallel Serial Execution (no parallelization in that case)
BiocParallel::register(BiocParallel::SerialParam())
counts <- as.matrix(counts)
sce <- fitGAM(counts = counts,
sds = crv)
# This takes about 1mn to run
You can also plot the cells in reduced dimension to see where the knots are located.
plotGeneCount(curve = crv, counts = counts, clusters = cl,
models = sce)
A first exploration of the data analysis may consist in checking whether gene expression is associated with a particular lineage. The statistical test performed here, implemented in the associationTest
function, is testing the null hypothesis that all smoother coefficients are equal to each other. This can be interpreted as testing whether the smoothed gene expression is significantly changing along pseudotime.
assoRes <- associationTest(sce)
head(assoRes)
## waldStat df pvalue
## Acin1 129.63714 9 0.000000e+00
## Actb 446.76458 9 0.000000e+00
## Ak2 94.92015 9 2.220446e-16
## Alad 186.01902 9 0.000000e+00
## Alas1 972.18980 9 0.000000e+00
## Aldoa 249.30745 9 0.000000e+00
In order to discover marker genes of the progenitor cell population, researchers may be interested in assessing differential expression between the progenitor cell population (i.e., the starting point of a lineage) with the differentiated cell type population (i.e., the end point of a lineage). In the function startVsEndTest
, we have implemented a Wald test that tests the null hypothesis that the expression at the starting point of the smoother (progenitor population) is identical to the expression at the end point of the smoother (differentiated population). The test basically involves a comparison between two smoother coefficients for every lineage. The function startVsEndTest
performs an omnibus test across all lineages by default, but you can also assess all lineages separately by setting lineages=TRUE
. Below, we adopt an omnibus test across the two lineages.
startRes <- startVsEndTest(sce)
We can visualize the estimated smoothers for the most significant gene.
oStart <- order(startRes$waldStat, decreasing = TRUE)
sigGeneStart <- names(sce)[oStart[1]]
plotSmoothers(sce, counts, gene = sigGeneStart)
Alternatively, we can color the cells in UMAP space with that gene’s expression.
plotGeneCount(crv, counts, gene = sigGeneStart)
The startVsEndTest
compares two points on a lineage, and by default it is comparing the inception point with the end point. However, this is a specific form of a more general capability of the startVsEndTest
to compare any two points on any lineage. If the interest lies in comparing any two custom pseudotime values, one can specify this using the pseudotimeValues
arguments in startVsEndTest
. For example, below we’d like to compare the expression for each gene at pseudotime values of \(0.8\) and \(0.1\).
customRes <- startVsEndTest(sce, pseudotimeValues = c(0.1, 0.8))
tradeSeq
can discover marker genes for the differentiated cell types by comparing the end points of the lineage-specific smoothers. This is implemented in the diffEndTest
function. By default, diffEndTest
performs an omnibus test, testing the null hypothesis that the endpoint expression is equal for all lineages using a multivariate Wald test. If more than two trajectories are present, one can assess all pairwise comparisons using the pairwise=TRUE
argument.
endRes <- diffEndTest(sce)
We can plot the most significant gene using the plotSmoothers
function.
o <- order(endRes$waldStat, decreasing = TRUE)
sigGene <- names(sce)[o[1]]
plotSmoothers(sce, counts, sigGene)
Alternatively, we can color the cells in UMAP space with that gene’s expression.
plotGeneCount(crv, counts, gene = sigGene)
Asides from testing at the level of the differentiated cell type, researchers may be interested in assessing the expression pattern of a gene over pseudotime. The function patternTest
implements a statistical method that checks whether the smoothed gene expression is equal along pseudotime between two or multiple lineages. In practice, we use \(100\) points, equally distributed along pseudotime, that are compared between two (or multiple) lineages, and this number can be changed using the nPoints
argument.
patternRes <- patternTest(sce)
oPat <- order(patternRes$waldStat, decreasing = TRUE)
head(rownames(patternRes)[oPat])
## [1] "Mpo" "Prtn3" "Car2" "Ctsg" "Elane" "H2afy"
plotSmoothers(sce, counts, gene = rownames(patternRes)[oPat][1])
plotGeneCount(crv, counts, gene = rownames(patternRes)[oPat][1])
We find genes at the top that are also ranked as DE for the differentiated cell type. What is especially interesting are genes that have different expression patterns but no different expression at the differentiated cell type level. We therefore sort the genes according to the sum of square of their rank in increasing Wald statistics for the patternTest and their rank in decreasing Wald statistics for the diffEndTest.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:matrixStats':
##
## count
## The following objects are masked from 'package:GenomicRanges':
##
## intersect, setdiff, union
## The following object is masked from 'package:GenomeInfoDb':
##
## intersect
## The following objects are masked from 'package:IRanges':
##
## collapse, desc, intersect, setdiff, slice, union
## The following objects are masked from 'package:S4Vectors':
##
## first, intersect, rename, setdiff, setequal, union
## The following object is masked from 'package:Biobase':
##
## combine
## The following objects are masked from 'package:BiocGenerics':
##
## combine, intersect, setdiff, union
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:magrittr':
##
## extract
## The following object is masked from 'package:S4Vectors':
##
## expand
compare <- inner_join(patternRes %>% mutate(Gene = rownames(patternRes),
pattern = waldStat) %>%
select(Gene, pattern),
endRes %>% mutate(Gene = rownames(endRes),
end = waldStat) %>%
select(Gene, end),
by = c("Gene" = "Gene")) %>%
mutate(transientScore = (min_rank(desc(end)))^2 +
(dense_rank(pattern))^2)
ggplot(compare, aes(x = log(pattern), y = log(end))) +
geom_point(aes(col = transientScore)) +
labs(x = "patternTest Wald Statistic (log scale)",
y = "diffEndTest Wald Statistic (log scale)") +
scale_color_continuous(low = "yellow", high = "red") +
theme_classic()
Or, we can visualize the expression in UMAP space of the top gene.
topTransient <- (compare %>% arrange(desc(transientScore)))[1, "Gene"]
plotSmoothers(sce, counts, gene = topTransient)
plotGeneCount(crv, counts, gene = topTransient)
Interestingly, we recover the Irf8 gene in the top 5 genes according to that ranking.
head(compare %>% arrange(desc(transientScore)) %>% select(Gene), n = 5)
## Gene
## 1 Nedd4
## 2 Mt1
## 3 Irf8
## 4 Hint1
## 5 Hsp90ab1
We can also plot the Irf8 gene.
plotSmoothers(sce, counts, gene = "Irf8")
plotGeneCount(crv, counts, gene = "Irf8")
Another question of interest is to find a list of genes that are differentially expressed around the separation of two or multiple lineages. The function earlyDETest
implements a statistical method to tests the null hypothesis of whether the smoothers are equal between two user-specified knots by building on the patternTest
, but restricting itself to a particular location of the smoothers. Again, the knots can be visualized with the plotGeneCount
function. By selecting the region covering the first two knot points to test for differential patterns between the lineages, we check which genes are behaving differently around the bifurcation point.
plotGeneCount(curve = crv, counts = counts, clusters = cl,
models = sce)
earlyDERes <- earlyDETest(sce, knots = c(1, 2))
oEarly <- order(earlyDERes$waldStat, decreasing = TRUE)
head(rownames(earlyDERes)[oEarly])
## [1] "Car1" "Mpo" "H2afy" "Car2" "Prtn3" "Ctsg"
plotSmoothers(sce, counts, gene = rownames(earlyDERes)[oEarly][2])
plotGeneCount(crv, counts, gene = rownames(earlyDERes)[oEarly][2])
tradeSeq provides the functionality to cluster genes according to their expression pattern along the lineages with the clusterExpressionPatterns
function. A number of equally spaced points for every lineage are selected to perform the clustering, and the number of points can be selected with the nPoints
argument. The genes
argument specifies which genes you want to cluster (e.g., all genes with differential expression patterns). Here, we use 20 points along each lineage to cluster the first 40 genes in the dataset. The clustering itself occurs by the clusterExperiment
package (Risso et al. 2018), hence the user may select any clustering algorithm that’s built into that package, or custom clustering algorithms implemented by the user. For a list of built-in clustering algorithms within clusterExperiment
, run clusterExperiment::listBuiltInFunctions()
on the command line.
library(clusterExperiment)
nPointsClus <- 20
clusPat <- clusterExpressionPatterns(sce, nPoints = nPointsClus,
genes = rownames(counts)[1:200])
## 36 parameter combinations, 36 use sequential method, 36 use subsampling method
## Running Clustering on Parameter Combinations...
## done.
clusterLabels <- primaryCluster(clusPat$rsec)
The first 4 clusters can be visualized using the normalized expression upon which the clustering is based. Please note that the code below would only works for a trajectory with two lineages. Modify the code appropriately if using with a dataset with 3 lineages or more.
library(cowplot)
##
## ********************************************************
## Note: As of version 1.0.0, cowplot does not change the
## default ggplot2 theme anymore. To recover the previous
## behavior, execute:
## theme_set(theme_cowplot())
## ********************************************************
cUniq <- unique(clusterLabels)
cUniq <- cUniq[!cUniq == -1] # remove unclustered genes
plots <- list()
for (xx in cUniq[1:4]) {
cId <- which(clusterLabels == xx)
p <- ggplot(data = data.frame(x = 1:nPointsClus,
y = rep(range(clusPat$yhatScaled[cId, ]),
nPointsClus / 2)),
aes(x = x, y = y)) +
geom_point(alpha = 0) +
labs(title = paste0("Cluster ", xx), x = "Pseudotime", y = "Normalized expression") +
theme_classic()
for (ii in 1:length(cId)) {
geneId <- rownames(clusPat$yhatScaled)[cId[ii]]
p <- p +
geom_line(data = data.frame(x = rep(1:nPointsClus, 2),
y = clusPat$yhatScaled[geneId, ],
lineage = rep(0:1, each = nPointsClus)),
aes(col = as.character(lineage), group = lineage), lwd = 1.5)
}
p <- p + guides(color = FALSE) +
scale_color_manual(values = c("orange", "darkseagreen3"),
breaks = c("0", "1"))
plots[[as.character(xx)]] <- p
}
plots$ncol <- 2
do.call(plot_grid, plots)
If another method than Slingshot is used for trajectory inference, one can input custom pseudotimes and cell-level weights in fitGAM
, as we also discussed above. The output from fitGAM
will be different in that case, and less memory efficient. All functions we have discussed above work exactly the same with the list output. However, the list output functionality is a little bit bigger, and here we discuss some capabilities that are only available with the list output.
gamList <- fitGAM(counts,
pseudotime = slingPseudotime(crv, na = FALSE),
cellWeights = slingCurveWeights(crv),
nknots = 6)
First, one may explore the results of a model by requesting its summary.
summary(gamList[["Irf8"]])
##
## Family: Negative Binomial(2.236)
## Link function: log
##
## Formula:
## y ~ -1 + U + s(t1, by = l1, bs = "cr", id = 1, k = nknots) +
## s(t2, by = l2, bs = "cr", id = 1, k = nknots) + offset(offset)
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## U -10.0554 0.5571 -18.05 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(t1):l1 5.696 5.954 579.61 <2e-16 ***
## s(t2):l2 3.432 3.850 12.36 0.0125 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Rank: 12/13
## R-sq.(adj) = 0.534 Deviance explained = 64.2%
## -REML = 2024.4 Scale est. = 1 n = 2660
Related to the associationTest
, one can extract the p-values generated by the mgcv
package using the getSmootherPvalues
function. These p-values are derived from a test that assesses the null hypothesis that all smoother coefficients are equal to zero. Note, however, that their interpretation is thus more complex. A significant lineage for a particular gene might thus be the result of (a) a different mean expression in that lineage as compared to the overall expression of that gene, or (b) significantly varying expression along that lineage, even if the means are equal, or (c) a combination of both. This function extracts the p-values calculated by mgcv
from the GAM, and will return NA
for genes that we were unable to fit properly. Similarly, the test statistics may be extracted with getSmootherTestStats
. Since this dataset was pre-filtered to only contain relevant genes, all p-values (test statistics) will be very low (high). Note, that these functions are only applicable with the list output of tradeSeq
, and not with the SingleCellExperiment
output. We will therefore not evaluate these here.
pvalLineage <- getSmootherPvalues(gamList)
statLineage <- getSmootherTestStats(gamList)
If you’re working with a dataset that has a limited number of cells, or if you are incorporating zero inflation weights, the GAMs may be harder to fit, as noted by the warnings when running fitGAM
. In that case, the situation might improve if you allow for more iterations in the GAM fitting. This can be done with the control
argument of fitGAM
.
library(mgcv)
## Loading required package: nlme
##
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
##
## collapse
## The following object is masked from 'package:IRanges':
##
## collapse
## This is mgcv 1.8-31. For overview type 'help("mgcv-package")'.
control <- gam.control()
control$maxit <- 1000 #set maximum number of iterations to 1K
# pass to control argument of fitGAM as below:
#
# gamList <- fitGAM(counts = counts,
# pseudotime = slingPseudotime(crv, na = FALSE),
# cellWeights = slingCurveWeights(crv),
# control = control)
To recapitulate the workflow, we have created a cheatsheet that users can refer to when deciding which tests to run.
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.10-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.10-bioc/R/lib/libRlapack.so
##
## Random number generation:
## RNG: Mersenne-Twister
## Normal: Inversion
## Sample: Rounding
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] mgcv_1.8-31 nlme_3.1-145
## [3] cowplot_1.0.0 clusterExperiment_2.6.1
## [5] tidyr_1.0.2 ggplot2_3.3.0
## [7] dplyr_0.8.5 magrittr_1.5
## [9] slingshot_1.4.0 princurve_2.1.4
## [11] SingleCellExperiment_1.8.0 SummarizedExperiment_1.16.1
## [13] DelayedArray_0.12.2 BiocParallel_1.20.1
## [15] matrixStats_0.56.0 GenomicRanges_1.38.0
## [17] GenomeInfoDb_1.22.1 IRanges_2.20.2
## [19] S4Vectors_0.24.3 RColorBrewer_1.1-2
## [21] tradeSeq_1.0.1 bigmemory_4.5.36
## [23] Biobase_2.46.0 BiocGenerics_0.32.0
## [25] knitr_1.28
##
## loaded via a namespace (and not attached):
## [1] bigmemory.sri_0.1.3 colorspace_1.4-1 ellipsis_0.3.0
## [4] XVector_0.26.0 farver_2.0.3 bit64_0.9-7
## [7] AnnotationDbi_1.48.0 RSpectra_0.16-0 fansi_0.4.1
## [10] xml2_1.3.0 codetools_0.2-16 splines_3.6.3
## [13] doParallel_1.0.15 ade4_1.7-15 locfdr_1.1-8
## [16] phylobase_0.8.10 gridBase_0.4-7 annotate_1.64.0
## [19] cluster_2.1.0 kernlab_0.9-29 png_0.1-7
## [22] HDF5Array_1.14.3 compiler_3.6.3 httr_1.4.1
## [25] assertthat_0.2.1 Matrix_1.2-18 lazyeval_0.2.2
## [28] limma_3.42.2 cli_2.0.2 htmltools_0.4.0
## [31] prettyunits_1.1.1 tools_3.6.3 igraph_1.2.5
## [34] gtable_0.3.0 glue_1.4.0 GenomeInfoDbData_1.2.2
## [37] reshape2_1.4.3 Rcpp_1.0.4 softImpute_1.4
## [40] zinbwave_1.8.0 NMF_0.22.0 vctrs_0.2.4
## [43] ape_5.3 iterators_1.0.12 xfun_0.12
## [46] stringr_1.4.0 lifecycle_0.2.0 rngtools_1.5
## [49] XML_3.99-0.3 edgeR_3.28.1 zlibbioc_1.32.0
## [52] MASS_7.3-51.5 scales_1.1.0 MAST_1.12.0
## [55] hms_0.5.3 rhdf5_2.30.1 yaml_2.2.1
## [58] pbapply_1.4-2 memoise_1.1.0 pkgmaker_0.31.1
## [61] stringi_1.4.6 RSQLite_2.2.0 genefilter_1.68.0
## [64] foreach_1.5.0 bibtex_0.4.2.2 rlang_0.4.5
## [67] pkgconfig_2.0.3 bitops_1.0-6 rncl_0.8.4
## [70] evaluate_0.14 lattice_0.20-41 Rhdf5lib_1.8.0
## [73] purrr_0.3.3 labeling_0.3 bit_1.1-15.2
## [76] tidyselect_1.0.0 plyr_1.8.6 R6_2.4.1
## [79] DBI_1.1.0 pillar_1.4.3 withr_2.1.2
## [82] abind_1.4-5 survival_3.1-11 RCurl_1.98-1.1
## [85] tibble_3.0.0 crayon_1.3.4 uuid_0.1-4
## [88] howmany_0.3-1 rmarkdown_2.1 jpeg_0.1-8.1
## [91] progress_1.2.2 RNeXML_2.4.3 locfit_1.5-9.4
## [94] grid_3.6.3 data.table_1.12.8 blob_1.2.1
## [97] digest_0.6.25 xtable_1.8-4 munsell_0.5.0
## [100] viridisLite_0.3.0 registry_0.5-1
Paul, Franziska, Ya’ara Arkin, Amir Giladi, Diego Adhemar Jaitin, Ephraim Kenigsberg, Hadas Keren-Shaul, Deborah Winter, et al. 2015. “Transcriptional Heterogeneity and Lineage Commitment in Myeloid Progenitors.” Cell 163 (7). Cell Press:1663–77. https://doi.org/10.1016/J.CELL.2015.11.013.
Risso, Davide, Liam Purvis, Russell B. Fletcher, Diya Das, John Ngai, Sandrine Dudoit, and Elizabeth Purdom. 2018. “clusterExperiment and RSEC: A Bioconductor package and framework for clustering of single-cell and other large gene expression datasets.” Edited by Aaron E. Darling. PLOS Computational Biology 14 (9). Public Library of Science:e1006378. https://doi.org/10.1371/journal.pcbi.1006378.
Robinson, Mark D, and Alicia Oshlack. 2010. “A scaling normalization method for differential expression analysis of RNA-seq data.” Genome Biology 11 (3). BioMed Central:R25. https://doi.org/10.1186/gb-2010-11-3-r25.
Street, Kelly, Davide Risso, Russell B. Fletcher, Diya Das, John Ngai, Nir Yosef, Elizabeth Purdom, and Sandrine Dudoit. 2018. “Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics.” BMC Genomics 19 (1). BioMed Central:477. https://doi.org/10.1186/s12864-018-4772-0.
Van den Berge, Koen, Fanny Perraudeau, Charlotte Soneson, Michael I. Love, Davide Risso, Jean-Philippe Vert, Mark D. Robinson, Sandrine Dudoit, and Lieven Clement. 2018. “Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications.” Genome Biology 19 (1). BioMed Central:24. https://doi.org/10.1186/s13059-018-1406-4.
Van den Berge, Koen, Hector Roux de Bézieux, Kelly Street, Wouter Saelens, Robrecht Cannoodt, Yvan Saeys, Sandrine Dudoit, and Lieven Clement. 2019. “Trajectory-based differential expression analysis for single-cell sequencing data.” bioRxiv, May. Cold Spring Harbor Laboratory, 623397. https://doi.org/10.1101/623397.