dgsa_seq {dearseq} | R Documentation |
Wrapper function for performing gene set analysis of (potentially longitudinal) RNA-seq data
dgsa_seq( exprmat = NULL, object = NULL, covariates = NULL, variables2test, weights_var2test_condi = TRUE, genesets, sample_group = NULL, cov_variables2test_eff = NULL, which_test = c("permutation", "asymptotic"), which_weights = c("loclin", "voom", "none"), n_perm = 1000, progressbar = TRUE, parallel_comp = TRUE, nb_cores = parallel::detectCores() - 1, preprocessed = FALSE, gene_based_weights = TRUE, bw = "nrd", kernel = c("gaussian", "epanechnikov", "rectangular", "triangular", "biweight", "tricube", "cosine", "optcosine"), exact = FALSE, transform = TRUE, padjust_methods = c("BH", "BY", "holm", "hochberg", "hommel", "bonferroni"), lowess_span = 0.5, R = NULL, adaptive = TRUE, max_adaptive = 64000, homogen_traj = FALSE, na.rm_gsaseq = TRUE, verbose = TRUE )
exprmat |
a numeric matrix of size |
object |
an object that can be either an
|
covariates |
If |
variables2test |
|
weights_var2test_condi |
a logical flag indicating whether
heteroscedasticity weights computation should be conditional on both the
variable(s) to be tested |
genesets |
Can be either:
Can be a vector of index or subscripts that defines which
rows of Can also be a Finally, can also be a If |
sample_group |
a vector of length |
cov_variables2test_eff |
a matrix of size |
which_test |
a character string indicating which method to use to
approximate the variance component score test, either |
which_weights |
a character string indicating which method to use to
estimate the mean-variance relationship weights. Possibilities are
|
n_perm |
the number of perturbations. Default is |
progressbar |
logical indicating wether a progressBar should be displayed when computing permutations (only in interactive mode). |
parallel_comp |
a logical flag indicating whether parallel computation
should be enabled. Only Linux and MacOS are supported, this is ignored on
Windows. Default is |
nb_cores |
an integer indicating the number of cores to be used when
|
preprocessed |
a logical flag indicating whether the expression data have
already been preprocessed (e.g. log2 transformed). Default is |
gene_based_weights |
a logical flag used for |
bw |
a character string indicating the smoothing bandwidth selection
method to use. See |
kernel |
a character string indicating which kernel should be used.
Possibilities are |
exact |
a logical flag indicating whether the non-parametric weights
accounting for the mean-variance relationship should be computed exactly or
extrapolated from the interpolation of local regression of the mean against
the variance. Default is |
transform |
a logical flag used for |
padjust_methods |
multiple testing correction method used if
|
lowess_span |
smoother span for the lowess function, between 0 and 1.
This gives the proportion of points in the plot which influence the smooth at
each value. Larger values give more smoothness. Only used if
|
R |
library size (optional, important to provide if
|
adaptive |
a logical flag indicating whether adaptive permutation should
be performed. Default is |
max_adaptive |
The maximum number of permutations considered.
Default is |
homogen_traj |
a logical flag indicating whether trajectories should be
considered homogeneous. Default is |
na.rm_gsaseq |
logical: should missing values in |
verbose |
logical: should informative messages be printed during the
computation? Default is |
A list with the following elements:
which_test
: a character string carrying forward the value of
the 'which_test
' argument indicating which test was perform (either
'asymptotic' or 'permutation').
preprocessed
: a logical flag carrying forward the value of the
'preprocessed
' argument indicating whether the expression data were
already preprocessed, or were provided as raw counts and transformed into
log-counts per million.
n_perm
: an integer carrying forward the value of the
'n_perm
' argument indicating the number of perturbations performed
(NA
if asymptotic test was performed).
genesets
: carrying forward the value of the 'genesets
'
argument defining the gene sets of interest (NULL
for gene-wise t
esting).
pval
: computed p-values. A data.frame
with one raw for
each each gene set, or for each gene if genesets
argument is
NULL
, and with 2 columns: the first one 'rawPval
' contains
the raw p-values, the second one contains the FDR adjusted p-values
(according to the 'padjust_methods
' argument) and is named
'adjPval
'.
Agniel D & Hejblum BP (2017). Variance component score test for time-course gene set analysis of longitudinal RNA-seq data, Biostatistics, 18(4):589-604. 10.1093/biostatistics/kxx005. arXiv:1605.02351.
Law, C. W., Chen, Y., Shi, W., & Smyth, G. K. (2014). voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, 15(2), R29.
sp_weights
vc_test_perm
vc_test_asym
p.adjust
nsims <- 2 #100 res_quant <- list() for(i in 1:2){ n <- 2000#0 nr <- 3 r <- nr*20 #4*nr#100*nr t <- matrix(rep(1:nr), r/nr, ncol=1, nrow=r) sigma <- 0.4 b0 <- 1 #under the null: b1 <- 0 y.tilde <- b0 + b1*t + rnorm(r, sd = sigma) y <- t(matrix(rnorm(n*r, sd = sqrt(sigma*abs(y.tilde))), ncol=n, nrow=r) + matrix(rep(y.tilde, n), ncol=n, nrow=r)) x <- matrix(1, ncol=1, nrow=r) #run test res <- dgsa_seq(exprmat = y, covariates = x, variables2test = t, genesets=lapply(0:9, function(x){x*10+(1:10)}), cov_variables2test_eff = matrix(1), sample_group = rep(1:(r/nr), each=nr), which_test='asymptotic', which_weights='none', preprocessed=TRUE) res_genes <- dgsa_seq(exprmat = y, covariates = x, variables2test = cbind(t),#, rnorm(r)), #t^2 genesets = NULL, cov_variables2test_eff = diag(1), sample_group = rep(1:(r/nr), each=nr), which_test = 'asymptotic', which_weights = 'none', preprocessed = TRUE) length(res_genes$pvals[, 'rawPval']) quantile(res_genes$pvals[, 'rawPval']) res_quant[[i]] <- res_genes$pvals[, 'rawPval'] } #round(rowMeans(vapply(res_quant, FUN = quantile, FUN.VALUE = rep(1.1, 5)), 3) #plot(density(unlist(res_quant))) #mean(unlist(res_quant)<0.05) if(interactive()){ res_genes <- dgsa_seq(exprmat = y, covariates = x, variables2test = t, genesets = NULL, cov_variables2test_eff = matrix(1), sample_group = rep(1:(r/nr), each=nr), which_test = 'permutation', which_weights = 'none', preprocessed = TRUE, n_perm = 1000, parallel_comp = FALSE) mean(res_genes$pvals$rawPval < 0.05) summary(res_genes$pvals$adjPval) }