bugsigdbr 1.0.1
BugSigDB is a manually curated database of microbial signatures from the published literature of differential abundance studies of human and other host microbiomes.
BugSigDB provides:
The bugsigdbr package implements convenient access to BugSigDB from within R/Bioconductor. The goal of the package is to facilitate import of BugSigDB data into R/Bioconductor, provide utilities for extracting microbe signatures, and enable export of the extracted signatures to plain text files in standard file formats such as GMT.
The bugsigdbr package is primarily a data package. For descriptive statistics and comprehensive analysis of BugSigDB contents, please see the BugSigDBStats package and its analysis vignette.
We start by loading the package.
library(bugsigdbr)
The function importBugSigDB can be used to import the complete collection of curated signatures from BugSigDB. The dataset is downloaded once and subsequently cached. Use cache = FALSE to force a fresh download of BugSigDB and overwrite the local copy in your cache.
bsdb <- importBugSigDB()
dim(bsdb)
#> [1] 2064 48
colnames(bsdb)
#> [1] "Study" "Study design"
#> [3] "PMID" "DOI"
#> [5] "URL" "Authors"
#> [7] "Title" "Journal"
#> [9] "Year" "Experiment"
#> [11] "Location of subjects" "Host species"
#> [13] "Body site" "UBERON ID"
#> [15] "Condition" "EFO ID"
#> [17] "Group 0 name" "Group 1 name"
#> [19] "Group 1 definition" "Group 0 sample size"
#> [21] "Group 1 sample size" "Antibiotics exclusion"
#> [23] "Sequencing type" "16S variable region"
#> [25] "Sequencing platform" "Statistical test"
#> [27] "Significance threshold" "MHT correction"
#> [29] "LDA Score above" "Matched on"
#> [31] "Confounders controlled for" "Pielou"
#> [33] "Shannon" "Chao1"
#> [35] "Simpson" "Inverse Simpson"
#> [37] "Richness" "Signature page name"
#> [39] "Source" "Curated date"
#> [41] "Curator" "Revision editor"
#> [43] "Description" "Abundance in Group 1"
#> [45] "MetaPhlAn taxon names" "NCBI Taxonomy IDs"
#> [47] "State" "Reviewer"
Each row of the resulting data.frame corresponds to a microbe signature from a differential abundance analysis, i.e. a set of microbes found with increased or decreased abundance in one sample group when compared to another sample group (e.g. in a case-vs.-control setup). The curated signatures are richly annotated with additional metadata columns providing information on study design, antibiotics exclusion criteria, sample size, and experimental and statistical procedures, among others.
Subsetting the full dataset to certain conditions, body sites, or other metadata columns of interest works the same way as subsetting any other data.frame. For example, the following subset command restricts the dataset to signatures obtained from microbiome studies on obesity, based on fecal samples from participants in the US.
us.obesity.feces <- subset(bsdb,
`Location of subjects` == "United States of America" &
Condition == "obesity" &
`Body site` == "feces")
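Because `==` requires an exact string match, it can help to tabulate a column and check the exact spelling of its values before subsetting. The sketch below runs on a small hypothetical data.frame with three of the bsdb columns (the rows are made up for illustration only); on the real data, the same table() and subset() calls apply to bsdb directly.

```r
# Hypothetical toy rows mimicking three bsdb metadata columns; the values are
# made up for illustration. check.names = FALSE keeps the spaces in the names.
toy <- data.frame(
  `Location of subjects` = c("United States of America", "China",
                             "United States of America"),
  Condition = c("obesity", "obesity", "colorectal cancer"),
  `Body site` = c("feces", "feces", "feces"),
  check.names = FALSE
)

# Inspect the exact spelling of the available values before filtering
table(toy$Condition)

# Same subset() call as above; backticks quote column names containing spaces
us.obesity <- subset(toy,
                     `Location of subjects` == "United States of America" &
                       Condition == "obesity" &
                       `Body site` == "feces")
nrow(us.obesity)  # 1
```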
Given the full BugSigDB collection (or a subset of interest), the function getSignatures can be used to obtain the microbes annotated to each signature. By default, microbes annotated to a signature are returned as NCBI Taxonomy IDs.
sigs <- getSignatures(bsdb)
length(sigs)
#> [1] 2064
sigs[1:3]
#> $`bsdb:1/1/1_adenoma:conventional-adenoma-cases_vs_controls_UP`
#> [1] "91061" "1236" "1654" "1716" "1301" "162289" "189330" "33024"
#> [9] "40544" "2037" "2049" "506" "186826" "1300" "31977" "91347"
#> [17] "1653" "57037" "1386" "186817"
#>
#> $`bsdb:1/1/2_adenoma:conventional-adenoma-cases_vs_controls_DOWN`
#> [1] "100883" "1117"
#>
#> $`bsdb:1/2/1_hyperplastic-polyp:hyperplastic-polyp-cases_vs_controls_UP`
#> [1] "207244" "57037"
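The signature names returned by getSignatures appear to encode the direction of change as a trailing `_UP` or `_DOWN`. Assuming that naming convention holds throughout, the signatures can be split by direction with base R. The toy list below copies three names and a few NCBI Taxonomy IDs from the output above; it stands in for `sigs` so the sketch runs without downloading BugSigDB.

```r
# Toy stand-in for getSignatures(bsdb); names and IDs copied from the output above
sigs.toy <- list(
  "bsdb:1/1/1_adenoma:conventional-adenoma-cases_vs_controls_UP" =
    c("91061", "1236", "1654"),
  "bsdb:1/1/2_adenoma:conventional-adenoma-cases_vs_controls_DOWN" =
    c("100883", "1117"),
  "bsdb:1/2/1_hyperplastic-polyp:hyperplastic-polyp-cases_vs_controls_UP" =
    c("207244", "57037")
)

# Assumption: the direction of change is the trailing _UP / _DOWN suffix
direction <- sub("^.*_(UP|DOWN)$", "\\1", names(sigs.toy))

# split() on a list returns one sub-list of signatures per direction
by.direction <- split(sigs.toy, direction)
lengths(by.direction)  # DOWN = 1, UP = 2
```

On the real data, `split(sigs, sub("^.*_(UP|DOWN)$", "\\1", names(sigs)))` would apply the same idea to the full collection.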
It is also possible to obtain signatures based on the full taxonomic classification in MetaPhlAn format …
mp.sigs <- getSignatures(bsdb, tax.id.type = "metaphlan")
mp.sigs[1:3]
#> $`bsdb:1/1/1_adenoma:conventional-adenoma-cases_vs_controls_UP`
#> [1] "k__Bacteria|p__Firmicutes|c__Bacilli"
#> [2] "k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria"
#> [3] "k__Bacteria|p__Actinobacteria|c__Actinomycetia|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces"
#> [4] "k__Bacteria|p__Actinobacteria|c__Actinomycetia|o__Corynebacteriales|f__Corynebacteriaceae|g__Corynebacterium"
#> [5] "k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Streptococcaceae|g__Streptococcus"
#> [6] "k__Bacteria|p__Firmicutes|c__Tissierellia|o__Tissierellales|f__Peptoniphilaceae|g__Peptoniphilus"
#> [7] "k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Dorea"
#> [8] "k__Bacteria|p__Firmicutes|c__Negativicutes|o__Acidaminococcales|f__Acidaminococcaceae|g__Phascolarctobacterium"
#> [9] "k__Bacteria|p__Proteobacteria|c__Betaproteobacteria|o__Burkholderiales|f__Sutterellaceae|g__Sutterella"
#> [10] "k__Bacteria|p__Actinobacteria|c__Actinomycetia|o__Actinomycetales"
#> [11] "k__Bacteria|p__Actinobacteria|c__Actinomycetia|o__Actinomycetales|f__Actinomycetaceae"
#> [12] "k__Bacteria|p__Proteobacteria|c__Betaproteobacteria|o__Burkholderiales|f__Alcaligenaceae"
#> [13] "k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales"
#> [14] "k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Streptococcaceae"
#> [15] "k__Bacteria|p__Firmicutes|c__Negativicutes|o__Veillonellales|f__Veillonellaceae"
#> [16] "k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales"
#> [17] "k__Bacteria|p__Actinobacteria|c__Actinomycetia|o__Corynebacteriales|f__Corynebacteriaceae"
#> [18] "k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Lactobacillaceae|g__Lacticaseibacillus|s__Lacticaseibacillus zeae"
#> [19] "k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales|f__Bacillaceae|g__Bacillus"
#> [20] "k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales|f__Bacillaceae"
#>
#> $`bsdb:1/1/2_adenoma:conventional-adenoma-cases_vs_controls_DOWN`
#> [1] "k__Bacteria|p__Firmicutes|c__Erysipelotrichia|o__Erysipelotrichales|f__Coprobacillaceae|g__Coprobacillus"
#> [2] "k__Bacteria|p__Cyanobacteria"
#>
#> $`bsdb:1/2/1_hyperplastic-polyp:hyperplastic-polyp-cases_vs_controls_UP`
#> [1] "k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Anaerostipes"
#> [2] "k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Lactobacillaceae|g__Lacticaseibacillus|s__Lacticaseibacillus zeae"
… or using the taxonomic name only:
tn.sigs <- getSignatures(bsdb, tax.id.type = "taxname")
tn.sigs[1:3]
#> $`bsdb:1/1/1_adenoma:conventional-adenoma-cases_vs_controls_UP`
#> [1] "Bacilli" "Gammaproteobacteria"
#> [3] "Actinomyces" "Corynebacterium"
#> [5] "Streptococcus" "Peptoniphilus"
#> [7] "Dorea" "Phascolarctobacterium"
#> [9] "Sutterella" "Actinomycetales"
#> [11] "Actinomycetaceae" "Alcaligenaceae"
#> [13] "Lactobacillales" "Streptococcaceae"
#> [15] "Veillonellaceae" "Enterobacterales"
#> [17] "Corynebacteriaceae" "Lacticaseibacillus zeae"
#> [19] "Bacillus" "Bacillaceae"
#>
#> $`bsdb:1/1/2_adenoma:conventional-adenoma-cases_vs_controls_DOWN`
#> [1] "Coprobacillus" "Cyanobacteria"
#>
#> $`bsdb:1/2/1_hyperplastic-polyp:hyperplastic-polyp-cases_vs_controls_UP`
#> [1] "Anaerostipes" "Lacticaseibacillus zeae"
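Comparing the two outputs, each taxon name is the most specific rank of the corresponding MetaPhlAn lineage with its rank prefix (`k__`, `p__`, …) removed. A minimal base R sketch of that relationship, using a hypothetical helper metaphlan_to_taxname() that is not part of bugsigdbr:

```r
# Hypothetical helper (not part of bugsigdbr): take the most specific rank of a
# MetaPhlAn-style lineage and drop its one-letter rank prefix.
metaphlan_to_taxname <- function(x) {
  last.rank <- vapply(strsplit(x, "|", fixed = TRUE),
                      function(ranks) ranks[length(ranks)],
                      character(1))
  sub("^[a-z]__", "", last.rank)
}

# Lineages copied from the MetaPhlAn output above
mp <- c(
  "k__Bacteria|p__Firmicutes|c__Bacilli",
  "k__Bacteria|p__Cyanobacteria",
  "k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Lactobacillaceae|g__Lacticaseibacillus|s__Lacticaseibacillus zeae"
)
metaphlan_to_taxname(mp)
# "Bacilli" "Cyanobacteria" "Lacticaseibacillus zeae"
```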
As metagenomic profiling with 16S rRNA gene sequencing or whole-metagenome shotgun sequencing is typically conducted at a certain taxonomic level, it is also possible to obtain signatures restricted to, e.g., the genus level …
gn.sigs <- getSignatures(bsdb,
tax.id.type = "taxname",
tax.level = "genus")
gn.sigs[1:3]
#> $`bsdb:1/1/1_adenoma:conventional-adenoma-cases_vs_controls_UP`
#> [1] "Actinomyces" "Corynebacterium" "Streptococcus"
#> [4] "Peptoniphilus" "Dorea" "Phascolarctobacterium"
#> [7] "Sutterella" "Bacillus"
#>
#> $`bsdb:1/1/2_adenoma:conventional-adenoma-cases_vs_controls_DOWN`
#> [1] "Coprobacillus"
#>
#> $`bsdb:1/2/1_hyperplastic-polyp:hyperplastic-polyp-cases_vs_controls_UP`
#> [1] "Anaerostipes"
… or the species level:
sp.sigs <- getSignatures(bsdb,
tax.id.type = "taxname",
tax.level = "species")
sp.sigs[1:3]
#> $`bsdb:1/1/1_adenoma:conventional-adenoma-cases_vs_controls_UP`
#> [1] "Lacticaseibacillus zeae"
#>
#> $`bsdb:1/2/1_hyperplastic-polyp:hyperplastic-polyp-cases_vs_controls_UP`
#> [1] "Lacticaseibacillus zeae"
#>
#> $`bsdb:1/6/1_adenoma:Non-advanced-conventional-adenoma-cases_vs_controls_UP`
#> [1] "Lacticaseibacillus zeae"
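The default rank restriction can be pictured as keeping only taxa whose most specific rank is exactly the requested one, which is why signatures without any species-level entry drop out of the species-level result. The sketch below illustrates this on MetaPhlAn lineages from the output above; keep_exact_rank() and the rank-to-prefix table (including `t__` for strains) are assumptions for illustration, not bugsigdbr internals.

```r
# Assumed mapping from rank name to MetaPhlAn prefix letter
rank.prefix <- c(kingdom = "k", phylum = "p", class = "c", order = "o",
                 family = "f", genus = "g", species = "s", strain = "t")

# Hypothetical helper: keep lineages whose most specific rank matches exactly
keep_exact_rank <- function(x, tax.level) {
  last.rank <- vapply(strsplit(x, "|", fixed = TRUE),
                      function(ranks) ranks[length(ranks)], character(1))
  x[startsWith(last.rank, paste0(rank.prefix[[tax.level]], "__"))]
}

mp <- c(
  "k__Bacteria|p__Firmicutes|c__Bacilli",
  "k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Anaerostipes",
  "k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Lactobacillaceae|g__Lacticaseibacillus|s__Lacticaseibacillus zeae"
)
keep_exact_rank(mp, "genus")    # only the Anaerostipes lineage
keep_exact_rank(mp, "species")  # only the Lacticaseibacillus zeae lineage
```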
Note that restricting signatures to microbes given at the genus level will, by default, exclude microbes given at a more specific taxonomic rank such as species or strain.
For certain applications, it might be desirable not to exclude microbes given at a more specific taxonomic rank, but rather to extract the more general tax.level for them. This can be achieved by setting the argument exact.tax.level to FALSE, which here extracts genus-level taxon names for taxa given at the species or strain level.
gn.sigs <- getSignatures(bsdb,
tax.id.type = "taxname",
tax.level = "genus",
exact.tax.level = FALSE)
gn.sigs[1:3]
#> $`bsdb:1/1/1_adenoma:conventional-adenoma-cases_vs_controls_UP`
#> [1] "Actinomyces" "Corynebacterium" "Streptococcus"
#> [4] "Peptoniphilus" "Dorea" "Phascolarctobacterium"
#> [7] "Sutterella" "Lacticaseibacillus" "Bacillus"
#>
#> $`bsdb:1/1/2_adenoma:conventional-adenoma-cases_vs_controls_DOWN`
#> [1] "Coprobacillus"
#>
#> $`bsdb:1/2/1_hyperplastic-polyp:hyperplastic-polyp-cases_vs_controls_UP`
#> [1] "Anaerostipes" "Lacticaseibacillus"
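With exact.tax.level = FALSE, a species-level entry such as Lacticaseibacillus zeae contributes its genus, Lacticaseibacillus, as in the output above. A sketch of that truncation on MetaPhlAn lineages, using a hypothetical helper truncate_to_rank() rather than the package's own code; lineages that stop above the requested rank (e.g. a class) are dropped, and duplicates are removed with unique(), which is an assumption about the desired output.

```r
# Assumed mapping from rank name to MetaPhlAn prefix letter
rank.prefix <- c(kingdom = "k", phylum = "p", class = "c", order = "o",
                 family = "f", genus = "g", species = "s", strain = "t")

# Hypothetical helper: report the requested rank of each lineage, if present
truncate_to_rank <- function(x, tax.level) {
  prefix <- paste0(rank.prefix[[tax.level]], "__")
  hits <- vapply(strsplit(x, "|", fixed = TRUE), function(ranks) {
    i <- which(startsWith(ranks, prefix))
    # NA when the lineage stops above the requested rank
    if (length(i) == 0L) NA_character_ else substring(ranks[i[1]], 4L)
  }, character(1))
  unique(hits[!is.na(hits)])
}

mp <- c(
  "k__Bacteria|p__Firmicutes|c__Bacilli",
  "k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Anaerostipes",
  "k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Lactobacillaceae|g__Lacticaseibacillus|s__Lacticaseibacillus zeae"
)
truncate_to_rank(mp, "genus")  # "Anaerostipes" "Lacticaseibacillus"
```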
Once signatures have been extracted using a taxonomic identifier type of choice, the function writeGMT writes them to a plain text file in GMT format.
writeGMT(sigs, gmt.file = "bugsigdb_signatures.gmt")
GMT is the standard file format for gene sets used by MSigDB and GeneSigDB, and it is widely supported by enrichment analysis software.
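In a GMT file, each line holds one set: the set name, a description, and then the members, all separated by tabs. The base R sketch below writes that layout for a toy list copied from the output above; write_gmt_sketch() is hypothetical and is not the implementation of writeGMT, and filling the description field with the set name is an assumption made here for simplicity.

```r
# Hypothetical GMT writer (not bugsigdbr::writeGMT), base R only
write_gmt_sketch <- function(sigs, gmt.file) {
  lines <- vapply(names(sigs), function(nm) {
    # Fields: set name, description (here: the set name again), members
    paste(c(nm, nm, sigs[[nm]]), collapse = "\t")
  }, character(1), USE.NAMES = FALSE)
  writeLines(lines, gmt.file)
  invisible(gmt.file)
}

sigs.toy <- list(
  "bsdb:1/1/2_adenoma:conventional-adenoma-cases_vs_controls_DOWN" =
    c("100883", "1117"),
  "bsdb:1/2/1_hyperplastic-polyp:hyperplastic-polyp-cases_vs_controls_UP" =
    c("207244", "57037")
)

gmt.file <- tempfile(fileext = ".gmt")
write_gmt_sketch(sigs.toy, gmt.file)
readLines(gmt.file)  # one tab-separated line per signature
```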
Leveraging BugSigDB’s Semantic MediaWiki web interface, we can also programmatically access annotations for individual microbes and microbe signatures.
The browseSignature function can be used to display BugSigDB signature pages in an interactive session. For programmatic access in a non-interactive setting, the URL of the signature page is returned.
browseSignature(names(sigs)[1])
#> [1] "https://bugsigdb.org/Study_1/Experiment_1/Signature_1"
Analogously, the browseTaxon function displays BugSigDB taxon pages in an interactive session, or returns the URL of the corresponding taxon page otherwise.
browseTaxon(sigs[[1]][1])
#> [1] "https://bugsigdb.org/Special:RunQuery/Taxon?Taxon%5BNCBI%5D=91061&_run=1"
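The two URLs above follow recognizable patterns: the signature ID bsdb:1/1/1 maps to Study_1/Experiment_1/Signature_1, and the taxon page is a query on the NCBI Taxonomy ID. The hypothetical helpers signature_url() and taxon_url() below rebuild those URLs in base R; the patterns are inferred from these two examples only, so browseSignature and browseTaxon remain the supported way to obtain them.

```r
# Hypothetical helper: "bsdb:<study>/<experiment>/<signature>_..." -> page URL
signature_url <- function(sig.name) {
  ids <- strsplit(sub("^bsdb:([0-9]+/[0-9]+/[0-9]+)_.*$", "\\1", sig.name),
                  "/", fixed = TRUE)[[1]]
  sprintf("https://bugsigdb.org/Study_%s/Experiment_%s/Signature_%s",
          ids[1], ids[2], ids[3])
}

# Hypothetical helper: %5B and %5D are the percent-encoded brackets of Taxon[NCBI]
taxon_url <- function(ncbi.id) {
  paste0("https://bugsigdb.org/Special:RunQuery/Taxon?Taxon%5BNCBI%5D=",
         ncbi.id, "&_run=1")
}

signature_url("bsdb:1/1/1_adenoma:conventional-adenoma-cases_vs_controls_UP")
taxon_url("91061")
```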
sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] bugsigdbr_1.0.1 BiocStyle_2.22.0
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.7 bslib_0.3.1 compiler_4.1.1
#> [4] pillar_1.6.4 BiocManager_1.30.16 jquerylib_0.1.4
#> [7] dbplyr_2.1.1 tools_4.1.1 digest_0.6.28
#> [10] bit_4.0.4 tibble_3.1.5 jsonlite_1.7.2
#> [13] BiocFileCache_2.2.0 RSQLite_2.2.8 evaluate_0.14
#> [16] memoise_2.0.0 lifecycle_1.0.1 pkgconfig_2.0.3
#> [19] rlang_0.4.12 DBI_1.1.1 filelock_1.0.2
#> [22] curl_4.3.2 yaml_2.2.1 xfun_0.27
#> [25] fastmap_1.1.0 withr_2.4.2 httr_1.4.2
#> [28] stringr_1.4.0 dplyr_1.0.7 knitr_1.36
#> [31] rappdirs_0.3.3 generics_0.1.1 sass_0.4.0
#> [34] vctrs_0.3.8 tidyselect_1.1.1 bit64_4.0.5
#> [37] glue_1.4.2 R6_2.5.1 fansi_0.5.0
#> [40] rmarkdown_2.11 bookdown_0.24 purrr_0.3.4
#> [43] blob_1.2.2 magrittr_2.0.1 ellipsis_0.3.2
#> [46] htmltools_0.5.2 assertthat_0.2.1 utf8_1.2.2
#> [49] stringi_1.7.5 cachem_1.0.6 crayon_1.4.2