Introduction
This is the tutorial for ADAPT
(analysis of microbiome differential abundance analysis by pooling Tobit models). Differential abundance analysis (DAA) is one of the fundamental steps in microbiome research.1 The metagenomics count data have excessive zeros and are compositional, causing challenges for detecting microbiome taxa whose abundances are associated with different conditions. ADAPT
overcomes these challenges differently from other DAA methods. First, ADAPT
treats zero counts as left censored observations and apply Tobit models which are censored regressions2 for modeling the log count ratios. Second, ADAPT
has an innovative and theoretically justified way of finding non-differentially abundant taxa as reference taxa. It then applies Tobit models to log count ratios between individual taxa and the summed counts of reference taxa to identify differentially abundant taxa. The overall workflow is presented in the graph below.
There are two ways to install the package, either through GitHub or Bioconductor.
# install through GitHub
if(!require("ADAPT")) BiocManager::install("mkbwang/ADAPT", build_vignettes = TRUE)
# install through Bioconductor
if(!require("ADAPT")) BiocManager::install("ADAPT", version="devel",
build_vignettes = TRUE)
Main function adapt
The main function adapt
takes in a phyloseq3 object that needs to have at least a count table and a sample metadata. The metadata must contain one variable that represents the condition to test on. We currently allow binary and continuous conditions. If there are more than two groups for the condition variable, the user may single out one group to compare with all the others for DAA. ADAPT
allows adjusting for extra covariates. There are eight arguments in the adapt
function:
input_data
is the phyloseq object with the count matrix and metadata. This argument is required.
cond.var
is the variable in the metadata that represents the condition to test on. This argument is required.
base.cond
is the group that the user is interested in comparing against other groups. This argument is only needed if the condition is categorical.
adj.var
contains the covariates to adjust for. It can be empty or be a vector of variable names.
censor
is the value to censor at for all the zero counts. By default the zeros are censored at one.
prev.filter
is the threshold for filtering out rare taxa. By default taxa with prevalence lower than 0.05 are filtered out before analysis.
depth.filter
is the threshold for filtering out samples with low sequencing depths. The default threshold is 1000 reads.
alpha
is the cutoff for the Benjamini-hochberg adjusted p-values to decide differentially abundant taxa. The default is 0.05.
The ADAPT
package contains two metagenomics datasets from an early childhood dental caries study.4 One corresponds to 16S rRNA sequencing of 161 saliva samples collected from 12-month-old infants (ecc_saliva
). The other corresponds to shotgun metagenomic sequencing of 30 plaque samples collected from kids between 36 and 60 months old (ecc_plaque
). In this vignette let’s use the saliva data to find out differentially abundant taxa between kids who developed early childhood caries (ECC) after 36 months old and those who didn’t. Besides the main variable CaseStatus
, there is another variable Site
representing the site where each sample was collected. Both variables are discrete and contain two categories each.
library(ADAPT)
data(ecc_saliva)
We can run adapt
with or without covariate adjustment. The returned object is a customized S4-type object with slots corresponding to analysis name, reference taxa, differentially abundant taxa, detailed analysis results (a dataframe) and input phyloseq object.
saliva_output_noadjust <- adapt(input_data=ecc_saliva, cond.var="CaseStatus", base.cond="Control")
#> Choose 'Control' as the baseline condition
#> 155 taxa and 161 samples being analyzed...
#> Selecting Reference Set... 77 taxa selected as reference
#> 27 differentially abundant taxa detected
saliva_output_adjust <- adapt(input_data=ecc_saliva, cond.var="CaseStatus", base.cond="Control", adj.var="Site")
#> Choose 'Control' as the baseline condition
#> 155 taxa and 161 samples being analyzed...
#> Selecting Reference Set... 77 taxa selected as reference
#> 28 differentially abundant taxa detected
The Site
is not a confounding variable, therefore the DAA results only differ by one taxon.
Explore Analysis Results
We have developed two utility functions for result exploration. summary
returns a dataframe with the estimated log10 abundance fold changes, hypothesis test p-values and taxonomy (if provided in the phyloseq object). The user can choose to return results of all the taxa, the differentially abundant taxa only or the reference taxa only through the select
variable (“all”, “da” or “ref”). We choose to return the results of only the DA taxa below.
DAtaxa_result <- summary(saliva_output_noadjust, select="da")
#> Differential abundance analysis for CaseStatus (Case VS Control)
#> 155 taxa and 161 samples analyzed
#> 77 taxa are selected as reference and 27 taxa are differentially abundant
head(DAtaxa_result)
#> Taxa prevalence log10foldchange teststat pval adjusted_pval
#> ASV3 ASV3 0.9751553 -0.3618780 9.313926 2.274187e-03 1.855258e-02
#> ASV8 ASV8 0.9689441 0.3567131 9.881400 1.669578e-03 1.437693e-02
#> ASV9 ASV9 0.8695652 -0.8925900 19.157084 1.203899e-05 4.665108e-04
#> ASV14 ASV14 0.7204969 0.9203963 13.159234 2.861058e-04 4.031491e-03
#> ASV18 ASV18 0.7080745 -1.3929574 28.068324 1.171070e-07 9.075792e-06
#> ASV20 ASV20 0.4099379 2.2046044 31.758612 1.745735e-08 2.705889e-06
#> Kingdom Phylum Class Order
#> ASV3 Bacteria Proteobacteria Gammaproteobacteria Pasteurellales
#> ASV8 Bacteria Firmicutes Bacilli Lactobacillales
#> ASV9 Bacteria Proteobacteria Betaproteobacteria Neisseriales
#> ASV14 Bacteria Firmicutes Bacilli Lactobacillales
#> ASV18 Bacteria Fusobacteria Fusobacteriia Fusobacteriales
#> ASV20 Bacteria Bacteroidetes Bacteroidia Bacteroidales
#> Family Genus Species
#> ASV3 Pasteurellaceae Haemophilus parainfluenzae
#> ASV8 Streptococcaceae Streptococcus salivarius
#> ASV9 Neisseriaceae Neisseria subflava
#> ASV14 Streptococcaceae Streptococcus <NA>
#> ASV18 Fusobacteriaceae Fusobacterium periodonticum
#> ASV20 Prevotellaceae Prevotella <NA>
We have also developed a plot
function to generate a volcano plot. The DA taxa are highlighted in red. The user can decide how many points with the smallest p-values are labeled (default 5) with n.label
argument.
plot(saliva_output_noadjust, n.label=5)