require(IgGeneUsage)
require(rstan)
require(knitr)
require(ggplot2)
require(ggforce)
require(gridExtra)
require(ggrepel)
require(reshape2)
Decoding the properties of immune repertoires is key in understanding the response of adaptive immunity to challenges such as viral infection. One important property is biases in immunoglobulin (Ig) gene usage between biological conditions (e.g. healthy vs tumor). Yet, most analyses for differential gene usage (DGU) are performed qualitatively, or with inadequate statistical methods. Here we introduce IgGeneUsage, a computational tool for DGU analysis.
The main input of IgGeneUsage is a data.frame that has the following 4 columns:
The sum of all gene usage counts (column 4) for a given repertoire is equal to the repertoire size (number of cells in the repertoire).
IgGeneUsage transforms the provided input in the following way. Given \(R\) repertoires, each having \(G\) genes, IgGeneUsage generates a gene usage matrix \(Y^{R \times G}\). Row sums in \(Y\) define the total usage in each repertoire (\(N\)).
For the analysis of DGU between biological conditions, we designed the following Bayesian model (\(M\)) for zero-inflated beta-binomial regression. This model can fit over-dispersed gene usage data. The immune repertoire data is also not exhaustive, i.e. some Ig genes that are systematically rearranged at low probability might not be sampled, and certain Ig genes are not encoded (or dysfunctional) in some individuals. The zero-inflated component of our model accounts for this.
\[\begin{align} p(Y_{ijk} \mid M) = \begin{cases} \kappa_{j} + (1 - \kappa) \operatorname{BB}\left(0 \mid N_{ijk}, \theta_{ijk}, \phi \right), & \text{if $Y_{ijk}$ = 0} \\ (1 - \kappa_{j}) \operatorname{BB}\left(Y_{ijk} \mid N_{ijk}, \theta_{ijk}, \phi \right), & \text{if $Y_{ijk}$ > 0} \end{cases}\\ \theta_{ij}=\operatorname{logit^{-1}}\left(\alpha_{j}+\beta_{ijk}\right)\\ \beta_{ijk}\sim\operatorname{Normal}\left(\gamma_{jk},\sigma_{k}\right)\\ \gamma_{jk}\sim\operatorname{Normal}\left(0,\tau_{k}\right)\\ \alpha_{j}\sim\operatorname{Normal}\left(\mu_{\alpha},\sigma_{\alpha}\right)\\ \mu_{\alpha} \sim \operatorname{Normal}\left(0, 5\right)\\ \sigma_{k}, \tau_{k}, \sigma_{\alpha} \sim \operatorname{Cauchy^{+}}\left(0, 1\right) \\ \phi \sim \operatorname{Exponential}\left(\tau\right) \\ \tau \sim \operatorname{Gamma}\left(3, 0.1\right) \\ \kappa_{j} \sim \operatorname{Beta}\left({0.1, 1.0\right) \end{align}\]
Model \(M\) legend:
In the output of IgGeneUsage, we report the mean effect size (es or \(\gamma\)) and its 95% highest density interval (HDI). Genes with \(\gamma \neq 0\) (e.g. if 95% HDI of \(\gamma\) excludes 0) are most likely to experience differential usage. Additionally, we report the probability of differential gene usage (\(\pi\)): \[\begin{align} \pi = 2 \cdot \max\left(\int_{\gamma = -\infty}^{0} p(\gamma)\mathrm{d}\gamma, \int_{\gamma = 0}^{\infty} p(\gamma)\mathrm{d}\gamma\right) - 1 \end{align}\] with \(\pi = 1\) for genes with strong differential usage, and \(\pi = 0\) for genes with negligible differential gene usage. Both metrics are computed based on the posterior distribution of \(\gamma\), and are thus related.
IgGeneUsage has a couple of built-in Ig gene usage datasets. Some were obtained from studies and others were simulated.
Lets look into the simulated dataset d_zibb_2
. This dataset was generated by a
zero-inflated beta-binomial (ZIBB) model, and IgGeneUsage
was designed to fit ZIBB-distributed data.
data("d_zibb_2", package = "IgGeneUsage")
knitr::kable(head(d_zibb_2))
sample_id | condition | gene_name | gene_usage_count |
---|---|---|---|
S1 | C1 | G1 | 244 |
S1 | C1 | G2 | 193 |
S1 | C1 | G3 | 139 |
S1 | C1 | G4 | 39 |
S1 | C1 | G5 | 440 |
S1 | C1 | G6 | 15 |
We can also visualize d_zibb_2
with ggplot:
ggplot(data = d_zibb_2)+
geom_point(aes(x = gene_name, y = gene_usage_count, col = condition),
position = position_dodge(width = .7), shape = 21)+
theme_bw(base_size = 11)+
ylab(label = "Gene usage [count]")+
xlab(label = '')+
theme(legend.position = "top")+
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.4))
As main input IgGeneUsage uses a data.frame formatted as
d_zibb_2
. Other input parameters allow you to configure specific settings
of the rstan sampler.
In this example we analyze d_zibb_2
with 2 MCMC chains, 1500 iterations
each including 500 warm-ups using a single CPU core (Hint: for parallel
chain execution set parameter mcmc_cores
= 2). We report for each model
parameter its mean and 95% highest density interval (HDIs).
Important remark: you should run DGU analyses using default IgGeneUsage parameters. If warnings or errors are reported with regard to the MCMC sampling, please consult the Stan manual1 https://mc-stan.org/misc/warnings.html and adjust the inputs accordingly. If the warnings persist, please submit an issue with a reproducible script at the Bioconductor support site or on Github2 https://github.com/snaketron/IgGeneUsage/issues.
M <- DGU(ud = d_zibb_2, # input data
mcmc_warmup = 500, # how many MCMC warm-ups per chain (default: 500)
mcmc_steps = 1500, # how many MCMC steps per chain (default: 1,500)
mcmc_chains = 3, # how many MCMC chain to run (default: 4)
mcmc_cores = 1, # how many PC cores to use? (e.g. parallel chains)
hdi_lvl = 0.95, # highest density interval level (de fault: 0.95)
adapt_delta = 0.8, # MCMC target acceptance rate (default: 0.95)
max_treedepth = 10) # tree depth evaluated at each step (default: 12)
FALSE
FALSE SAMPLING FOR MODEL 'dgu' NOW (CHAIN 1).
FALSE Chain 1:
FALSE Chain 1: Gradient evaluation took 0.000209 seconds
FALSE Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.09 seconds.
FALSE Chain 1: Adjust your expectations accordingly!
FALSE Chain 1:
FALSE Chain 1:
FALSE Chain 1: Iteration: 1 / 1500 [ 0%] (Warmup)
FALSE Chain 1: Iteration: 150 / 1500 [ 10%] (Warmup)
FALSE Chain 1: Iteration: 300 / 1500 [ 20%] (Warmup)
FALSE Chain 1: Iteration: 450 / 1500 [ 30%] (Warmup)
FALSE Chain 1: Iteration: 501 / 1500 [ 33%] (Sampling)
FALSE Chain 1: Iteration: 650 / 1500 [ 43%] (Sampling)
FALSE Chain 1: Iteration: 800 / 1500 [ 53%] (Sampling)
FALSE Chain 1: Iteration: 950 / 1500 [ 63%] (Sampling)
FALSE Chain 1: Iteration: 1100 / 1500 [ 73%] (Sampling)
FALSE Chain 1: Iteration: 1250 / 1500 [ 83%] (Sampling)
FALSE Chain 1: Iteration: 1400 / 1500 [ 93%] (Sampling)
FALSE Chain 1: Iteration: 1500 / 1500 [100%] (Sampling)
FALSE Chain 1:
FALSE Chain 1: Elapsed Time: 9.801 seconds (Warm-up)
FALSE Chain 1: 9.649 seconds (Sampling)
FALSE Chain 1: 19.45 seconds (Total)
FALSE Chain 1:
FALSE
FALSE SAMPLING FOR MODEL 'dgu' NOW (CHAIN 2).
FALSE Chain 2:
FALSE Chain 2: Gradient evaluation took 0.000167 seconds
FALSE Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 1.67 seconds.
FALSE Chain 2: Adjust your expectations accordingly!
FALSE Chain 2:
FALSE Chain 2:
FALSE Chain 2: Iteration: 1 / 1500 [ 0%] (Warmup)
FALSE Chain 2: Iteration: 150 / 1500 [ 10%] (Warmup)
FALSE Chain 2: Iteration: 300 / 1500 [ 20%] (Warmup)
FALSE Chain 2: Iteration: 450 / 1500 [ 30%] (Warmup)
FALSE Chain 2: Iteration: 501 / 1500 [ 33%] (Sampling)
FALSE Chain 2: Iteration: 650 / 1500 [ 43%] (Sampling)
FALSE Chain 2: Iteration: 800 / 1500 [ 53%] (Sampling)
FALSE Chain 2: Iteration: 950 / 1500 [ 63%] (Sampling)
FALSE Chain 2: Iteration: 1100 / 1500 [ 73%] (Sampling)
FALSE Chain 2: Iteration: 1250 / 1500 [ 83%] (Sampling)
FALSE Chain 2: Iteration: 1400 / 1500 [ 93%] (Sampling)
FALSE Chain 2: Iteration: 1500 / 1500 [100%] (Sampling)
FALSE Chain 2:
FALSE Chain 2: Elapsed Time: 9.208 seconds (Warm-up)
FALSE Chain 2: 13.094 seconds (Sampling)
FALSE Chain 2: 22.302 seconds (Total)
FALSE Chain 2:
FALSE
FALSE SAMPLING FOR MODEL 'dgu' NOW (CHAIN 3).
FALSE Chain 3:
FALSE Chain 3: Gradient evaluation took 0.000179 seconds
FALSE Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 1.79 seconds.
FALSE Chain 3: Adjust your expectations accordingly!
FALSE Chain 3:
FALSE Chain 3:
FALSE Chain 3: Iteration: 1 / 1500 [ 0%] (Warmup)
FALSE Chain 3: Iteration: 150 / 1500 [ 10%] (Warmup)
FALSE Chain 3: Iteration: 300 / 1500 [ 20%] (Warmup)
FALSE Chain 3: Iteration: 450 / 1500 [ 30%] (Warmup)
FALSE Chain 3: Iteration: 501 / 1500 [ 33%] (Sampling)
FALSE Chain 3: Iteration: 650 / 1500 [ 43%] (Sampling)
FALSE Chain 3: Iteration: 800 / 1500 [ 53%] (Sampling)
FALSE Chain 3: Iteration: 950 / 1500 [ 63%] (Sampling)
FALSE Chain 3: Iteration: 1100 / 1500 [ 73%] (Sampling)
FALSE Chain 3: Iteration: 1250 / 1500 [ 83%] (Sampling)
FALSE Chain 3: Iteration: 1400 / 1500 [ 93%] (Sampling)
FALSE Chain 3: Iteration: 1500 / 1500 [100%] (Sampling)
FALSE Chain 3:
FALSE Chain 3: Elapsed Time: 9.508 seconds (Warm-up)
FALSE Chain 3: 10.748 seconds (Sampling)
FALSE Chain 3: 20.256 seconds (Total)
FALSE Chain 3:
The following objects are provided as part of the output of DGU:
dgu_summary
(main results of IgGeneUsage): quantitative
DGU summarygu_summary
: quantitative summary of the gene usage (GU) of each gene in
the different conditionsglm
: rstan (‘stanfit’) object of the fitted model \(\rightarrow\) used for
model checks (see section ‘Model checking’)ppc
: posterior predictive checks data (see section ‘Model checking’)summary(M)
FALSE Length Class Mode
FALSE dgu_summary 9 data.frame list
FALSE gu_summary 8 data.frame list
FALSE glm 1 stanfit S4
FALSE ppc 2 -none- list
FALSE ud 10 -none- list
Check your model fit. For this, you can use the object glm.
rstan::check_hmc_diagnostics(M$glm)
FALSE
FALSE Divergences:
FALSE
FALSE Tree depth:
FALSE
FALSE Energy:
gridExtra::grid.arrange(rstan::stan_rhat(object = M$glm),
rstan::stan_ess(object = M$glm),
nrow = 1)
The model used by IgGeneUsage is generative, i.e. with the model we can generate usage of each Ig gene in a given repertoire (y-axis). Error bars show 95% HDI of mean posterior prediction. The predictions can be compared with the observed data (x-axis). For points near the diagonal \(\rightarrow\) accurate prediction.
ggplot(data = M$ppc$ppc_rep)+
facet_wrap(facets = ~sample_id, ncol = 5)+
geom_abline(intercept = 0, slope = 1, linetype = "dashed", col = "darkgray")+
geom_errorbar(aes(x = observed_count, y = ppc_mean_count,
ymin = ppc_L_count, ymax = ppc_H_count), col = "darkgray")+
geom_point(aes(x = observed_count, y = ppc_mean_count), size = 1)+
theme_bw(base_size = 11)+
theme(legend.position = "top")+
xlab(label = "Observed usage [counts]")+
ylab(label = "PPC usage [counts]")
Prediction of generalized gene usage within a biological condition is also possible. We show the predictions (y-axis) of the model, and compare them against the observed mean usage (x-axis). If the points are near the diagonal \(\rightarrow\) accurate prediction. Errors are 95% HDIs of the mean.
ggplot(data = M$ppc$ppc_condition)+
geom_errorbar(aes(x = gene_name, ymin = ppc_L_prop*100,
ymax = ppc_H_prop*100, col = condition),
position = position_dodge(width = 0.65), width = 0.1)+
geom_point(aes(x = gene_name, y = ppc_mean_prop*100,col = condition),
position = position_dodge(width = 0.65))+
theme_bw(base_size = 11)+
theme(legend.position = "top")+
xlab(label = "Observed usage [%]")+
ylab(label = "PPC usage [%]")
Each row of glm_summary
summarizes the degree of DGU observed for specific
Igs. Two metrics are reported:
es
(also referred to as \(\gamma\)): effect size on DGU, where contrast
gives the direction of the effect (e.g. tumor - healthy or healthy - tumor)pmax
(also referred to as \(\pi\)): probability of DGU (parameter \(\pi\)
from model \(M\))For es
we also have the mean, median standard error (se), standard
deviation (sd), L (low bound of 95% HDI), H (high bound of 95% HDI)
kable(x = head(M$dgu_summary), row.names = FALSE, digits = 3)
es_mean | es_mean_se | es_sd | es_median | es_L | es_H | contrast | gene_name | pmax |
---|---|---|---|---|---|---|---|---|
0.019 | 0.003 | 0.182 | 0.018 | -0.337 | 0.379 | C1-vs-C2 | G1 | 0.069 |
-0.726 | 0.008 | 0.378 | -0.722 | -1.498 | -0.011 | C1-vs-C2 | G10 | 0.955 |
0.507 | 0.004 | 0.246 | 0.514 | 0.020 | 0.984 | C1-vs-C2 | G11 | 0.964 |
-0.451 | 0.002 | 0.140 | -0.450 | -0.732 | -0.176 | C1-vs-C2 | G12 | 0.998 |
0.957 | 0.004 | 0.244 | 0.952 | 0.487 | 1.431 | C1-vs-C2 | G13 | 1.000 |
0.738 | 0.004 | 0.212 | 0.735 | 0.339 | 1.163 | C1-vs-C2 | G14 | 0.999 |
We know that the values of \gamma
and \pi
are related to each other.
Lets visualize them for all genes (shown as a point). Names are shown for
genes associated with \(\pi \geq 0.9\). Dashed horizontal line represents
null-effect (\(\gamma = 0\)).
Notice that the gene with \(\pi \approx 1\) also has an effect size whose 95% HDI (error bar) does not overlap the null-effect. The genes with high degree of differential usage are easy to detect with this figure.
# format data
stats <- M$dgu_summary
stats <- stats[order(abs(stats$es_mean), decreasing = FALSE), ]
stats$gene_fac <- factor(x = stats$gene_name, levels = unique(stats$gene_name))
ggplot(data = stats)+
geom_hline(yintercept = 0, linetype = "dashed", col = "gray")+
geom_errorbar(aes(x = pmax, y = es_mean, ymin = es_L, ymax = es_H),
col = "darkgray")+
geom_point(aes(x = pmax, y = es_mean, col = contrast))+
geom_text_repel(data = stats[stats$pmax >= 0.9, ],
aes(x = pmax, y = es_mean, label = gene_fac),
min.segment.length = 0, size = 2.75)+
theme_bw(base_size = 11)+
theme(legend.position = "top")+
xlab(label = expression(pi))+
xlim(c(0, 1))+
ylab(expression(gamma))
Lets visualize the observed data of the genes with high probability of differential gene usage (\(\pi \geq 0.9\)). Here we show the gene usage in %.
promising_genes <- stats$gene_name[stats$pmax >= 0.9]
ppc_gene <- M$ppc$ppc_condition
ppc_gene <- ppc_gene[ppc_gene$gene_name %in% promising_genes, ]
ppc_rep <- M$ppc$ppc_rep
ppc_rep <- ppc_rep[ppc_rep$gene_name %in% promising_genes, ]
ggplot()+
geom_point(data = ppc_rep,
aes(x = gene_name, y = observed_prop*100, col = condition),
size = 1, fill = "black",
position = position_jitterdodge(jitter.width = 0.1,
jitter.height = 0,
dodge.width = 0.35))+
geom_errorbar(data = ppc_gene,
aes(x = gene_name, ymin = ppc_L_prop*100,
ymax = ppc_H_prop*100, group = condition),
position = position_dodge(width = 0.35), width = 0.15)+
theme_bw(base_size = 11)+
theme(legend.position = "top")+
ylab(label = "PPC usage [%]")+
xlab(label = '')
Lets also visualize the predicted gene usage counts in each repertoire.
ggplot()+
geom_point(data = ppc_rep,
aes(x = gene_name, y = observed_count, col = condition),
size = 1, fill = "black",
position = position_jitterdodge(jitter.width = 0.1,
jitter.height = 0,
dodge.width = 0.5))+
theme_bw(base_size = 11)+
theme(legend.position = "top")+
ylab(label = "PPC usage [count]")+
xlab(label = '')
IgGeneUsage also reports the inferred gene usage (GU)
probability of individual genes in each condition. For a given gene we
report its mean GU (prob_mean
) and the 95% (for instance) HDI (prob_L
and prob_H
).
ggplot(data = M$gu_summary)+
geom_hline(yintercept = 0, linetype = "dashed", col = "gray")+
geom_errorbar(aes(x = gene_name, y = prob_mean, ymin = prob_L,
ymax = prob_H, col = condition),
width = 0.1, position = position_dodge(width = 0.4))+
geom_point(aes(x = gene_name, y = prob_mean, col = condition), size = 1,
position = position_dodge(width = 0.4))+
theme_bw(base_size = 11)+
theme(legend.position = "top")+
ylab(label = "GU [probability]")
To assert the robustness of the probability of DGU (\(\pi\)) and the effect size (\(\gamma\)), IgGeneUsage has a built-in procedure for fully Bayesian leave-one-out (LOO) analysis.
During each step of LOO, we discard the data of one of the R repertoires, and use the remaining data to analyze for DGU. In each step we record \(\pi\) and \(\gamma\) for all genes, including the mean and 95% HDI of \(\gamma\). We assert quantitatively the robustness of \(\pi\) and \(\gamma\) by evaluating their variability for a specific gene.
This analysis can be computationally demanding.
L <- LOO(ud = d_zibb_2, # input data
mcmc_warmup = 500, # how many MCMC warm-ups per chain (default: 500)
mcmc_steps = 1000, # how many MCMC steps per chain (default: 1,500)
mcmc_chains = 1, # how many MCMC chain to run (default: 4)
mcmc_cores = 1, # how many PC cores to use? (e.g. parallel chains)
hdi_lvl = 0.95, # highest density interval level (de fault: 0.95)
adapt_delta = 0.8, # MCMC target acceptance rate (default: 0.95)
max_treedepth = 10) # tree depth evaluated at each step (default: 12)
FALSE
FALSE SAMPLING FOR MODEL 'dgu' NOW (CHAIN 1).
FALSE Chain 1:
FALSE Chain 1: Gradient evaluation took 0.0002 seconds
FALSE Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2 seconds.
FALSE Chain 1: Adjust your expectations accordingly!
FALSE Chain 1:
FALSE Chain 1:
FALSE Chain 1: Iteration: 1 / 1000 [ 0%] (Warmup)
FALSE Chain 1: Iteration: 100 / 1000 [ 10%] (Warmup)
FALSE Chain 1: Iteration: 200 / 1000 [ 20%] (Warmup)
FALSE Chain 1: Iteration: 300 / 1000 [ 30%] (Warmup)
FALSE Chain 1: Iteration: 400 / 1000 [ 40%] (Warmup)
FALSE Chain 1: Iteration: 500 / 1000 [ 50%] (Warmup)
FALSE Chain 1: Iteration: 501 / 1000 [ 50%] (Sampling)
FALSE Chain 1: Iteration: 600 / 1000 [ 60%] (Sampling)
FALSE Chain 1: Iteration: 700 / 1000 [ 70%] (Sampling)
FALSE Chain 1: Iteration: 800 / 1000 [ 80%] (Sampling)
FALSE Chain 1: Iteration: 900 / 1000 [ 90%] (Sampling)
FALSE Chain 1: Iteration: 1000 / 1000 [100%] (Sampling)
FALSE Chain 1:
FALSE Chain 1: Elapsed Time: 7.931 seconds (Warm-up)
FALSE Chain 1: 4.417 seconds (Sampling)
FALSE Chain 1: 12.348 seconds (Total)
FALSE Chain 1:
FALSE
FALSE SAMPLING FOR MODEL 'dgu' NOW (CHAIN 1).
FALSE Chain 1:
FALSE Chain 1: Gradient evaluation took 0.000176 seconds
FALSE Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.76 seconds.
FALSE Chain 1: Adjust your expectations accordingly!
FALSE Chain 1:
FALSE Chain 1:
FALSE Chain 1: Iteration: 1 / 1000 [ 0%] (Warmup)
FALSE Chain 1: Iteration: 100 / 1000 [ 10%] (Warmup)
FALSE Chain 1: Iteration: 200 / 1000 [ 20%] (Warmup)
FALSE Chain 1: Iteration: 300 / 1000 [ 30%] (Warmup)
FALSE Chain 1: Iteration: 400 / 1000 [ 40%] (Warmup)
FALSE Chain 1: Iteration: 500 / 1000 [ 50%] (Warmup)
FALSE Chain 1: Iteration: 501 / 1000 [ 50%] (Sampling)
FALSE Chain 1: Iteration: 600 / 1000 [ 60%] (Sampling)
FALSE Chain 1: Iteration: 700 / 1000 [ 70%] (Sampling)
FALSE Chain 1: Iteration: 800 / 1000 [ 80%] (Sampling)
FALSE Chain 1: Iteration: 900 / 1000 [ 90%] (Sampling)
FALSE Chain 1: Iteration: 1000 / 1000 [100%] (Sampling)
FALSE Chain 1:
FALSE Chain 1: Elapsed Time: 8.244 seconds (Warm-up)
FALSE Chain 1: 4.43 seconds (Sampling)
FALSE Chain 1: 12.674 seconds (Total)
FALSE Chain 1:
FALSE
FALSE SAMPLING FOR MODEL 'dgu' NOW (CHAIN 1).
FALSE Chain 1:
FALSE Chain 1: Gradient evaluation took 0.000173 seconds
FALSE Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.73 seconds.
FALSE Chain 1: Adjust your expectations accordingly!
FALSE Chain 1:
FALSE Chain 1:
FALSE Chain 1: Iteration: 1 / 1000 [ 0%] (Warmup)
FALSE Chain 1: Iteration: 100 / 1000 [ 10%] (Warmup)
FALSE Chain 1: Iteration: 200 / 1000 [ 20%] (Warmup)
FALSE Chain 1: Iteration: 300 / 1000 [ 30%] (Warmup)
FALSE Chain 1: Iteration: 400 / 1000 [ 40%] (Warmup)
FALSE Chain 1: Iteration: 500 / 1000 [ 50%] (Warmup)
FALSE Chain 1: Iteration: 501 / 1000 [ 50%] (Sampling)
FALSE Chain 1: Iteration: 600 / 1000 [ 60%] (Sampling)
FALSE Chain 1: Iteration: 700 / 1000 [ 70%] (Sampling)
FALSE Chain 1: Iteration: 800 / 1000 [ 80%] (Sampling)
FALSE Chain 1: Iteration: 900 / 1000 [ 90%] (Sampling)
FALSE Chain 1: Iteration: 1000 / 1000 [100%] (Sampling)
FALSE Chain 1:
FALSE Chain 1: Elapsed Time: 8.677 seconds (Warm-up)
FALSE Chain 1: 4.425 seconds (Sampling)
FALSE Chain 1: 13.102 seconds (Total)
FALSE Chain 1:
FALSE
FALSE SAMPLING FOR MODEL 'dgu' NOW (CHAIN 1).
FALSE Chain 1:
FALSE Chain 1: Gradient evaluation took 0.000182 seconds
FALSE Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.82 seconds.
FALSE Chain 1: Adjust your expectations accordingly!
FALSE Chain 1:
FALSE Chain 1:
FALSE Chain 1: Iteration: 1 / 1000 [ 0%] (Warmup)
FALSE Chain 1: Iteration: 100 / 1000 [ 10%] (Warmup)
FALSE Chain 1: Iteration: 200 / 1000 [ 20%] (Warmup)
FALSE Chain 1: Iteration: 300 / 1000 [ 30%] (Warmup)
FALSE Chain 1: Iteration: 400 / 1000 [ 40%] (Warmup)
FALSE Chain 1: Iteration: 500 / 1000 [ 50%] (Warmup)
FALSE Chain 1: Iteration: 501 / 1000 [ 50%] (Sampling)
FALSE Chain 1: Iteration: 600 / 1000 [ 60%] (Sampling)
FALSE Chain 1: Iteration: 700 / 1000 [ 70%] (Sampling)
FALSE Chain 1: Iteration: 800 / 1000 [ 80%] (Sampling)
FALSE Chain 1: Iteration: 900 / 1000 [ 90%] (Sampling)
FALSE Chain 1: Iteration: 1000 / 1000 [100%] (Sampling)
FALSE Chain 1:
FALSE Chain 1: Elapsed Time: 9.088 seconds (Warm-up)
FALSE Chain 1: 4.434 seconds (Sampling)
FALSE Chain 1: 13.522 seconds (Total)
FALSE Chain 1:
FALSE
FALSE SAMPLING FOR MODEL 'dgu' NOW (CHAIN 1).
FALSE Chain 1:
FALSE Chain 1: Gradient evaluation took 0.00017 seconds
FALSE Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.7 seconds.
FALSE Chain 1: Adjust your expectations accordingly!
FALSE Chain 1:
FALSE Chain 1:
FALSE Chain 1: Iteration: 1 / 1000 [ 0%] (Warmup)
FALSE Chain 1: Iteration: 100 / 1000 [ 10%] (Warmup)
FALSE Chain 1: Iteration: 200 / 1000 [ 20%] (Warmup)
FALSE Chain 1: Iteration: 300 / 1000 [ 30%] (Warmup)
FALSE Chain 1: Iteration: 400 / 1000 [ 40%] (Warmup)
FALSE Chain 1: Iteration: 500 / 1000 [ 50%] (Warmup)
FALSE Chain 1: Iteration: 501 / 1000 [ 50%] (Sampling)
FALSE Chain 1: Iteration: 600 / 1000 [ 60%] (Sampling)
FALSE Chain 1: Iteration: 700 / 1000 [ 70%] (Sampling)
FALSE Chain 1: Iteration: 800 / 1000 [ 80%] (Sampling)
FALSE Chain 1: Iteration: 900 / 1000 [ 90%] (Sampling)
FALSE Chain 1: Iteration: 1000 / 1000 [100%] (Sampling)
FALSE Chain 1:
FALSE Chain 1: Elapsed Time: 8.653 seconds (Warm-up)
FALSE Chain 1: 4.417 seconds (Sampling)
FALSE Chain 1: 13.07 seconds (Total)
FALSE Chain 1:
FALSE
FALSE SAMPLING FOR MODEL 'dgu' NOW (CHAIN 1).
FALSE Chain 1:
FALSE Chain 1: Gradient evaluation took 0.000166 seconds
FALSE Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.66 seconds.
FALSE Chain 1: Adjust your expectations accordingly!
FALSE Chain 1:
FALSE Chain 1:
FALSE Chain 1: Iteration: 1 / 1000 [ 0%] (Warmup)
FALSE Chain 1: Iteration: 100 / 1000 [ 10%] (Warmup)
FALSE Chain 1: Iteration: 200 / 1000 [ 20%] (Warmup)
FALSE Chain 1: Iteration: 300 / 1000 [ 30%] (Warmup)
FALSE Chain 1: Iteration: 400 / 1000 [ 40%] (Warmup)
FALSE Chain 1: Iteration: 500 / 1000 [ 50%] (Warmup)
FALSE Chain 1: Iteration: 501 / 1000 [ 50%] (Sampling)
FALSE Chain 1: Iteration: 600 / 1000 [ 60%] (Sampling)
FALSE Chain 1: Iteration: 700 / 1000 [ 70%] (Sampling)
FALSE Chain 1: Iteration: 800 / 1000 [ 80%] (Sampling)
FALSE Chain 1: Iteration: 900 / 1000 [ 90%] (Sampling)
FALSE Chain 1: Iteration: 1000 / 1000 [100%] (Sampling)
FALSE Chain 1:
FALSE Chain 1: Elapsed Time: 8.573 seconds (Warm-up)
FALSE Chain 1: 7.485 seconds (Sampling)
FALSE Chain 1: 16.058 seconds (Total)
FALSE Chain 1:
FALSE
FALSE SAMPLING FOR MODEL 'dgu' NOW (CHAIN 1).
FALSE Chain 1:
FALSE Chain 1: Gradient evaluation took 0.00016 seconds
FALSE Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.6 seconds.
FALSE Chain 1: Adjust your expectations accordingly!
FALSE Chain 1:
FALSE Chain 1:
FALSE Chain 1: Iteration: 1 / 1000 [ 0%] (Warmup)
FALSE Chain 1: Iteration: 100 / 1000 [ 10%] (Warmup)
FALSE Chain 1: Iteration: 200 / 1000 [ 20%] (Warmup)
FALSE Chain 1: Iteration: 300 / 1000 [ 30%] (Warmup)
FALSE Chain 1: Iteration: 400 / 1000 [ 40%] (Warmup)
FALSE Chain 1: Iteration: 500 / 1000 [ 50%] (Warmup)
FALSE Chain 1: Iteration: 501 / 1000 [ 50%] (Sampling)
FALSE Chain 1: Iteration: 600 / 1000 [ 60%] (Sampling)
FALSE Chain 1: Iteration: 700 / 1000 [ 70%] (Sampling)
FALSE Chain 1: Iteration: 800 / 1000 [ 80%] (Sampling)
FALSE Chain 1: Iteration: 900 / 1000 [ 90%] (Sampling)
FALSE Chain 1: Iteration: 1000 / 1000 [100%] (Sampling)
FALSE Chain 1:
FALSE Chain 1: Elapsed Time: 8.027 seconds (Warm-up)
FALSE Chain 1: 4.432 seconds (Sampling)
FALSE Chain 1: 12.459 seconds (Total)
FALSE Chain 1:
FALSE
FALSE SAMPLING FOR MODEL 'dgu' NOW (CHAIN 1).
FALSE Chain 1:
FALSE Chain 1: Gradient evaluation took 0.000176 seconds
FALSE Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.76 seconds.
FALSE Chain 1: Adjust your expectations accordingly!
FALSE Chain 1:
FALSE Chain 1:
FALSE Chain 1: Iteration: 1 / 1000 [ 0%] (Warmup)
FALSE Chain 1: Iteration: 100 / 1000 [ 10%] (Warmup)
FALSE Chain 1: Iteration: 200 / 1000 [ 20%] (Warmup)
FALSE Chain 1: Iteration: 300 / 1000 [ 30%] (Warmup)
FALSE Chain 1: Iteration: 400 / 1000 [ 40%] (Warmup)
FALSE Chain 1: Iteration: 500 / 1000 [ 50%] (Warmup)
FALSE Chain 1: Iteration: 501 / 1000 [ 50%] (Sampling)
FALSE Chain 1: Iteration: 600 / 1000 [ 60%] (Sampling)
FALSE Chain 1: Iteration: 700 / 1000 [ 70%] (Sampling)
FALSE Chain 1: Iteration: 800 / 1000 [ 80%] (Sampling)
FALSE Chain 1: Iteration: 900 / 1000 [ 90%] (Sampling)
FALSE Chain 1: Iteration: 1000 / 1000 [100%] (Sampling)
FALSE Chain 1:
FALSE Chain 1: Elapsed Time: 8.221 seconds (Warm-up)
FALSE Chain 1: 4.438 seconds (Sampling)
FALSE Chain 1: 12.659 seconds (Total)
FALSE Chain 1:
FALSE
FALSE SAMPLING FOR MODEL 'dgu' NOW (CHAIN 1).
FALSE Chain 1:
FALSE Chain 1: Gradient evaluation took 0.000165 seconds
FALSE Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.65 seconds.
FALSE Chain 1: Adjust your expectations accordingly!
FALSE Chain 1:
FALSE Chain 1:
FALSE Chain 1: Iteration: 1 / 1000 [ 0%] (Warmup)
FALSE Chain 1: Iteration: 100 / 1000 [ 10%] (Warmup)
FALSE Chain 1: Iteration: 200 / 1000 [ 20%] (Warmup)
FALSE Chain 1: Iteration: 300 / 1000 [ 30%] (Warmup)
FALSE Chain 1: Iteration: 400 / 1000 [ 40%] (Warmup)
FALSE Chain 1: Iteration: 500 / 1000 [ 50%] (Warmup)
FALSE Chain 1: Iteration: 501 / 1000 [ 50%] (Sampling)
FALSE Chain 1: Iteration: 600 / 1000 [ 60%] (Sampling)
FALSE Chain 1: Iteration: 700 / 1000 [ 70%] (Sampling)
FALSE Chain 1: Iteration: 800 / 1000 [ 80%] (Sampling)
FALSE Chain 1: Iteration: 900 / 1000 [ 90%] (Sampling)
FALSE Chain 1: Iteration: 1000 / 1000 [100%] (Sampling)
FALSE Chain 1:
FALSE Chain 1: Elapsed Time: 9.302 seconds (Warm-up)
FALSE Chain 1: 4.425 seconds (Sampling)
FALSE Chain 1: 13.727 seconds (Total)
FALSE Chain 1:
FALSE
FALSE SAMPLING FOR MODEL 'dgu' NOW (CHAIN 1).
FALSE Chain 1:
FALSE Chain 1: Gradient evaluation took 0.000161 seconds
FALSE Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.61 seconds.
FALSE Chain 1: Adjust your expectations accordingly!
FALSE Chain 1:
FALSE Chain 1:
FALSE Chain 1: Iteration: 1 / 1000 [ 0%] (Warmup)
FALSE Chain 1: Iteration: 100 / 1000 [ 10%] (Warmup)
FALSE Chain 1: Iteration: 200 / 1000 [ 20%] (Warmup)
FALSE Chain 1: Iteration: 300 / 1000 [ 30%] (Warmup)
FALSE Chain 1: Iteration: 400 / 1000 [ 40%] (Warmup)
FALSE Chain 1: Iteration: 500 / 1000 [ 50%] (Warmup)
FALSE Chain 1: Iteration: 501 / 1000 [ 50%] (Sampling)
FALSE Chain 1: Iteration: 600 / 1000 [ 60%] (Sampling)
FALSE Chain 1: Iteration: 700 / 1000 [ 70%] (Sampling)
FALSE Chain 1: Iteration: 800 / 1000 [ 80%] (Sampling)
FALSE Chain 1: Iteration: 900 / 1000 [ 90%] (Sampling)
FALSE Chain 1: Iteration: 1000 / 1000 [100%] (Sampling)
FALSE Chain 1:
FALSE Chain 1: Elapsed Time: 8.78 seconds (Warm-up)
FALSE Chain 1: 4.41 seconds (Sampling)
FALSE Chain 1: 13.19 seconds (Total)
FALSE Chain 1:
Next, we collected the results (GU and DGU) from each LOO iteration:
L_gu <- do.call(rbind, lapply(X = L, FUN = function(x){return(x$gu_summary)}))
L_dgu <- do.call(rbind, lapply(X = L, FUN = function(x){return(x$dgu_summary)}))
… and plot them:
ggplot(data = L_dgu)+
geom_hline(yintercept = 0, linetype = "dashed", col = "gray")+
geom_errorbar(aes(x = gene_name, y = es_mean, ymin = es_L,
ymax = es_H, col = contrast, group = loo_id),
width = 0.1, position = position_dodge(width = 0.5))+
geom_point(aes(x = gene_name, y = es_mean, col = contrast,
group = loo_id), size = 1,
position = position_dodge(width = 0.5))+
theme_bw(base_size = 11)+
theme(legend.position = "top")+
ylab(expression(gamma))
ggplot(data = L_dgu)+
geom_point(aes(x = gene_name, y = pmax, col = contrast,
group = loo_id), size = 1,
position = position_dodge(width = 0.5))+
theme_bw(base_size = 11)+
theme(legend.position = "top")+
ylab(expression(pi))
ggplot(data = L_gu)+
geom_hline(yintercept = 0, linetype = "dashed", col = "gray")+
geom_errorbar(aes(x = gene_name, y = prob_mean, ymin = prob_L,
ymax = prob_H, col = condition,
group = interaction(loo_id, condition)),
width = 0.1, position = position_dodge(width = 0.5))+
geom_point(aes(x = gene_name, y = prob_mean, col = condition,
group = interaction(loo_id, condition)), size = 1,
position = position_dodge(width = 0.5))+
theme_bw(base_size = 11)+
theme(legend.position = "top")+
ylab("GU [probability]")
sessionInfo()
FALSE R version 4.3.1 (2023-06-16)
FALSE Platform: x86_64-pc-linux-gnu (64-bit)
FALSE Running under: Ubuntu 22.04.3 LTS
FALSE
FALSE Matrix products: default
FALSE BLAS: /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so
FALSE LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
FALSE
FALSE locale:
FALSE [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
FALSE [3] LC_TIME=en_GB LC_COLLATE=C
FALSE [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
FALSE [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
FALSE [9] LC_ADDRESS=C LC_TELEPHONE=C
FALSE [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
FALSE
FALSE time zone: America/New_York
FALSE tzcode source: system (glibc)
FALSE
FALSE attached base packages:
FALSE [1] stats graphics grDevices utils datasets methods base
FALSE
FALSE other attached packages:
FALSE [1] reshape2_1.4.4 ggrepel_0.9.4 gridExtra_2.3
FALSE [4] ggforce_0.4.1 ggplot2_3.4.4 knitr_1.44
FALSE [7] rstan_2.32.3 StanHeaders_2.26.28 IgGeneUsage_1.16.0
FALSE [10] BiocStyle_2.30.0
FALSE
FALSE loaded via a namespace (and not attached):
FALSE [1] tidyselect_1.2.0 dplyr_1.1.3
FALSE [3] farver_2.1.1 loo_2.6.0
FALSE [5] bitops_1.0-7 fastmap_1.1.1
FALSE [7] RCurl_1.98-1.12 tweenr_2.0.2
FALSE [9] digest_0.6.33 lifecycle_1.0.3
FALSE [11] processx_3.8.2 magrittr_2.0.3
FALSE [13] compiler_4.3.1 rlang_1.1.1
FALSE [15] sass_0.4.7 tools_4.3.1
FALSE [17] utf8_1.2.4 yaml_2.3.7
FALSE [19] labeling_0.4.3 prettyunits_1.2.0
FALSE [21] S4Arrays_1.2.0 pkgbuild_1.4.2
FALSE [23] curl_5.1.0 DelayedArray_0.28.0
FALSE [25] plyr_1.8.9 abind_1.4-5
FALSE [27] withr_2.5.1 purrr_1.0.2
FALSE [29] BiocGenerics_0.48.0 grid_4.3.1
FALSE [31] polyclip_1.10-6 stats4_4.3.1
FALSE [33] fansi_1.0.5 colorspace_2.1-0
FALSE [35] inline_0.3.19 scales_1.2.1
FALSE [37] MASS_7.3-60 SummarizedExperiment_1.32.0
FALSE [39] cli_3.6.1 rmarkdown_2.25
FALSE [41] crayon_1.5.2 generics_0.1.3
FALSE [43] RcppParallel_5.1.7 cachem_1.0.8
FALSE [45] stringr_1.5.0 zlibbioc_1.48.0
FALSE [47] parallel_4.3.1 BiocManager_1.30.22
FALSE [49] XVector_0.42.0 matrixStats_1.0.0
FALSE [51] vctrs_0.6.4 V8_4.4.0
FALSE [53] Matrix_1.6-1.1 jsonlite_1.8.7
FALSE [55] bookdown_0.36 callr_3.7.3
FALSE [57] IRanges_2.36.0 S4Vectors_0.40.0
FALSE [59] magick_2.8.1 jquerylib_0.1.4
FALSE [61] tidyr_1.3.0 glue_1.6.2
FALSE [63] codetools_0.2-19 ps_1.7.5
FALSE [65] stringi_1.7.12 gtable_0.3.4
FALSE [67] GenomeInfoDb_1.38.0 QuickJSR_1.0.7
FALSE [69] GenomicRanges_1.54.0 munsell_0.5.0
FALSE [71] tibble_3.2.1 pillar_1.9.0
FALSE [73] htmltools_0.5.6.1 GenomeInfoDbData_1.2.11
FALSE [75] R6_2.5.1 evaluate_0.22
FALSE [77] lattice_0.22-5 Biobase_2.62.0
FALSE [79] bslib_0.5.1 rstantools_2.3.1.1
FALSE [81] Rcpp_1.0.11 SparseArray_1.2.0
FALSE [83] xfun_0.40 MatrixGenerics_1.14.0
FALSE [85] pkgconfig_2.0.3