SDA {SDAMS} | R Documentation |
This function fits a two-part semi-parametric model to metabolomics, proteomics and single-cell RNA sequencing data. A kernel-smoothed method is applied to estimate the regression coefficients, and a likelihood ratio test is constructed for differential abundance/expression analysis.
SDA(sumExp, VOI = NULL, ...)
sumExp |
An object of 'SummarizedExperiment' class. |
VOI |
Variable of interest. The default, NULL, applies when there is only one covariate; otherwise it must be one of the column names in colData. |
... |
Additional arguments passed to |
Differential abundance/expression analysis compares metabolomic or proteomic profiles, or gene expression, between experimental groups. It uses a two-part model: a logistic regression model to characterize the zero proportion and a semi-parametric model to characterize the non-zero values. Let Y_i be a random variable and X_i a vector of covariates. The two-part model has the following form:
log(pi_i/(1-pi_i))=gamma_0 + gamma*X_i
log(Y_i)=beta*X_i+ epsilon_i
where pi_i=Pr(Y_i=0). The parameters gamma quantify the covariate effects on the fraction of zero values, and gamma_0 is the intercept. The parameters beta quantify the covariate effects on the non-zero values, and epsilon_i are independent error terms with a common but completely unspecified density function f.
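To make the two-part structure concrete, the following is a minimal sketch that simulates one feature from a model of this form using base R only. All parameter values, the single binary covariate, and the choice of a t distribution for epsilon_i are assumptions made purely for illustration; the model itself leaves f unspecified, and this is not code from the SDAMS package.

```r
# Sketch: simulate one feature from a two-part model (illustrative values only).
set.seed(1)
n <- 200
x <- rbinom(n, 1, 0.5)   # single binary covariate X_i, e.g. a group label

gamma0 <- -1.0           # intercept of the logistic (zero) part
gamma  <- 0.8            # covariate effect on the zero proportion
beta   <- 0.5            # covariate effect on the log of non-zero values

# Logistic part: pi_i = Pr(Y_i = 0), with log(pi_i / (1 - pi_i)) = gamma0 + gamma * X_i
pi_i <- plogis(gamma0 + gamma * x)
is_zero <- rbinom(n, 1, pi_i) == 1

# Non-zero part: log(Y_i) = beta * X_i + epsilon_i.
# Assumption: epsilon_i drawn from a t distribution with 3 df, only as an
# example of a non-normal error density.
eps <- rt(n, df = 3)
y <- ifelse(is_zero, 0, exp(beta * x + eps))

# Observed zero fraction in each covariate group
zero_frac <- tapply(y == 0, x, mean)
```

With gamma > 0, the group with x = 1 is expected to show a larger fraction of zeros, and with beta > 0 its non-zero values are shifted upward on the log scale.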
For differential abundance analysis on data from mass spectrometry, Y_i represents the abundance of a given feature for subject i, and pi_i is the probability of the point mass at zero. X_i=(X_i1, X_i2,..., X_iQ)^T is a Q-vector of covariates that specifies the treatment conditions applied to subject i. The corresponding Q-vectors of model parameters gamma=(gamma_1, gamma_2,...,gamma_Q)^T and beta=(beta_1, beta_2,..., beta_Q)^T quantify the covariate effects for that feature. Hypothesis testing on the effect of the qth covariate on a given feature is performed by assessing gamma_q and beta_q. Consider the null hypothesis H_0: gamma_q=0 and beta_q=0 against the alternative hypothesis H_1: at least one of the two parameters is non-zero. We also consider the hypotheses for testing gamma_q=0 and beta_q=0 individually.
For differential expression analysis on single-cell RNA sequencing data, Y_i represents the expression (TPM value) of a given gene in the ith cell, and pi_i is the drop-out probability. X_i=(Z_i, W_i)^T is a vector of covariates, with Z_i being a binary indicator of the cell population under comparison and W_i being a vector of other covariates, e.g. cell size; gamma=(gamma_Z, gamma_W) and beta=(beta_Z, beta_W) are the model parameters. Hypothesis testing on the effect of different cell subpopulations on a given gene is performed by assessing gamma_Z and beta_Z. For each gene, the likelihood ratio test is performed on the null hypothesis H_0: gamma_Z=0 and beta_Z=0 against the alternative hypothesis H_1: at least one of the two parameters is non-zero. We also consider the hypotheses for testing gamma_Z=0 and beta_Z=0 individually.
The p-value is calculated based on an asymptotic chi-squared distribution. To adjust for multiple comparisons across features, the false discovery rate (FDR) q-value is calculated using the qvalue function in R/Bioconductor.
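As a rough illustration of this step, the sketch below converts likelihood ratio statistics into p-values and applies a multiple-testing adjustment. The statistics are made-up numbers, and the degrees of freedom (2 for the joint test of gamma and beta, 1 for each one-part test) are an assumption based on the number of parameters constrained under each null hypothesis. To stay within base R, the adjustment uses Benjamini-Hochberg via p.adjust as a stand-in; it is not the q-value procedure of the qvalue package that the text refers to.

```r
# Hypothetical likelihood ratio statistics for four features (made-up values).
lrt_2part <- c(12.4, 0.9, 6.1, 3.2)   # joint test: gamma_q = 0 and beta_q = 0
lrt_gamma <- c(8.0, 0.1, 1.2, 3.0)    # one-part test: gamma_q = 0

# Assumption: 2 df for the joint test, 1 df for a one-part test.
pv_2part <- pchisq(lrt_2part, df = 2, lower.tail = FALSE)
pv_gamma <- pchisq(lrt_gamma, df = 1, lower.tail = FALSE)

# Stand-in for the q-value step: Benjamini-Hochberg adjusted p-values.
# With the qvalue package installed, qvalue::qvalue(pv_2part)$qvalues would be
# the closer analogue, but it generally needs many more p-values than this.
adj_2part <- p.adjust(pv_2part, method = "BH")
adj_gamma <- p.adjust(pv_gamma, method = "BH")
```

Benjamini-Hochberg and q-values both target the FDR but are not numerically identical, so the adjusted values here should not be expected to match the qv_* components returned by SDA.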
A list containing the following components:
gamma |
a matrix of point estimators for gamma_g in the logistic model (binary part) |
beta |
a matrix of point estimators for beta_g in the semi-parametric model (non-zero part) |
pv_gamma |
a matrix of one-part p-values for gamma_g |
pv_beta |
a matrix of one-part p-values for beta_g |
qv_gamma |
a matrix of one-part q-values for gamma_g |
qv_beta |
a matrix of one-part q-values for beta_g |
pv_2part |
a matrix of two-part p-values for overall test |
qv_2part |
a matrix of two-part q-values for overall test |
feat.names |
a vector of feature/gene names |
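The components above can be combined to pull out the features called significant by the overall test. The sketch below uses a small mock list in place of a real SDA result so that it runs without the package; the numbers are made up, the 0.05 cutoff is arbitrary, and the layout (one row per feature, one column per covariate, rows aligned with feat.names) is an assumption about the returned matrices rather than something stated here.

```r
# Mock object using the documented component names; all values are made up.
results <- list(
  feat.names = c("feat1", "feat2", "feat3"),
  gamma      = matrix(c(0.9, 0.1, -0.2), ncol = 1),
  beta       = matrix(c(0.5, 0.0, 1.1), ncol = 1),
  qv_2part   = matrix(c(0.01, 0.40, 0.03), ncol = 1)
)

# Assumption: rows correspond to features, in the same order as feat.names.
sig <- results$qv_2part[, 1] < 0.05   # arbitrary illustrative FDR threshold

summary_tab <- data.frame(
  feature  = results$feat.names[sig],
  gamma    = results$gamma[sig, 1],
  beta     = results$beta[sig, 1],
  qv_2part = results$qv_2part[sig, 1],
  stringsAsFactors = FALSE
)
```

The same indexing pattern would apply to qv_gamma and qv_beta when only one part of the model is of interest.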
Yuntong Li <yuntong.li@uky.edu>, Chi Wang <chi.wang@uky.edu>, Li Chen <lichenuky@uky.edu>
##--------- load data ------------
data(exampleSumExp)
results = SDA(exampleSumExp)

##------ two part q-values -------
results$qv_2part