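brendaDb aims to make importing and analyzing data from the BRENDA database easier. The main functions include reading the text file downloaded from BRENDA into an R tibble, retrieving entries for specific EC numbers, and looking up EC numbers from gene symbols or enzyme names.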
For bug reports or feature requests, please go to the GitHub repository.
brendaDb is a Bioconductor package and can be installed through BiocManager::install().
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("brendaDb", dependencies = TRUE)
Alternatively, install the development version from GitHub.
if (!requireNamespace("brendaDb")) {
    devtools::install_github("y1zhou/brendaDb")
}
After the package is installed, it can be loaded into the R workspace with:
library(brendaDb)
Download the BRENDA database as a text file from the BRENDA download page. Alternatively, download the file in R (file updated 2019-04-24):
brenda.filepath <- DownloadBrenda()
#> Please read the license agreement in the link below.
#>
#> https://www.brenda-enzymes.org/download_brenda_without_registration.php
#>
#> Found zip file in cache.
#> Extracting zip file...
The function downloads the file to a local cache directory. Now the text file can be loaded into R as a tibble:
df <- ReadBrenda(brenda.filepath)
#> Reading BRENDA text file...
#> Converting text into a list. This might take a while...
#> Converting list to tibble and removing duplicated entries...
#> If you're going to use this data again, consider saving this table using data.table::fwrite().
As suggested in the function output, you may save the df object to a text file using data.table::fwrite() (this requires the R package data.table to be installed) or to an R object using save(), and load the table again using data.table::fread() or load(), respectively. Both methods should be much faster than reading the raw text file again using ReadBrenda().
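The save-and-reload round trip can be sketched as follows. This is only an illustration on a small stand-in data frame rather than the real BRENDA table: the columns other than ID, all values, and the cache path are made up for the example.

```r
# Stand-in for the tibble returned by ReadBrenda(); the values and the
# columns other than ID are placeholders for illustration only.
df <- data.frame(
  ID = c("1.1.1.1", "1.1.1.1"),
  field = c("PROTEIN", "RECOMMENDED_NAME"),
  description = c("placeholder protein entry", "alcohol dehydrogenase"),
  stringsAsFactors = FALSE
)

# Hypothetical cache location; use a permanent path in practice.
cache.file <- file.path(tempdir(), "brenda_table.RData")

# save() needs an explicit file argument.
save(df, file = cache.file)

# In a later session, load() restores the object under its original name.
rm(df)
load(cache.file)
```

With data.table installed, data.table::fwrite(df, path) and data.table::fread(path) can be used in the same way; note that fread() returns a data.table rather than a tibble.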
Since BRENDA is a database for enzymes, all final queries are based on EC numbers.
If you already have a list of EC numbers in mind, you may call QueryBrenda() directly:
brenda_txt <- system.file("extdata", "brenda_download_test.txt",
                          package = "brendaDb")
df <- ReadBrenda(brenda_txt)
#> Reading BRENDA text file...
#> Converting text into a list. This might take a while...
#> Converting list to tibble and removing duplicated entries...
#> If you're going to use this data again, consider saving this table using data.table::fwrite().
res <- QueryBrenda(df, EC = c("1.1.1.1", "6.3.5.8"), n.core = 2)
res
#> A list of 2 brenda.entry object(s) with:
#> - 1 regular brenda.entry object(s)
#> 1.1.1.1
#> - 1 transferred or deleted object(s)
#> 6.3.5.8
res[["1.1.1.1"]]
#> Entry 1.1.1.1
#> ├── nomenclature
#> | ├── ec: 1.1.1.1
#> | ├── systematic.name: alcohol:NAD+ oxidoreductase
#> | ├── recommended.name: alcohol dehydrogenase
#> | ├── synonyms: A tibble with 128 rows
#> | ├── reaction: A tibble with 2 rows
#> | └── reaction.type: A tibble with 3 rows
#> ├── interactions
#> | ├── substrate.product: A tibble with 772 rows
#> | ├── natural.substrate.product: A tibble with 20 rows
#> | ├── cofactor: A tibble with 7 rows
#> | ├── metals.ions: A tibble with 20 rows
#> | ├── inhibitors: A tibble with 207 rows
#> | └── activating.compound: A tibble with 22 rows
#> ├── parameters
#> | ├── km.value: A tibble with 878 rows
#> | ├── turnover.number: A tibble with 495 rows
#> | ├── ki.value: A tibble with 34 rows
#> | ├── pi.value: A tibble with 11 rows
#> | ├── ph.optimum: A tibble with 55 rows
#> | ├── ph.range: A tibble with 28 rows
#> | ├── temperature.optimum: A tibble with 29 rows
#> | ├── temperature.range: A tibble with 20 rows
#> | ├── specific.activity: A tibble with 88 rows
#> | └── ic50: A tibble with 2 rows
#> ├── organism
#> | ├── organism: A tibble with 159 rows
#> | ├── source.tissue: A tibble with 63 rows
#> | └── localization: A tibble with 9 rows
#> ├── molecular
#> | ├── stability
#> | | ├── general.stability: A tibble with 15 rows
#> | | ├── storage.stability: A tibble with 15 rows
#> | | ├── ph.stability: A tibble with 20 rows
#> | | ├── organic.solvent.stability: A tibble with 25 rows
#> | | ├── oxidation.stability: A tibble with 3 rows
#> | | └── temperature.stability: A tibble with 36 rows
#> | ├── purification: A tibble with 48 rows
#> | ├── cloned: A tibble with 46 rows
#> | ├── engineering: A tibble with 60 rows
#> | ├── renatured: A tibble with 1 rows
#> | └── application: A tibble with 5 rows
#> ├── structure
#> | ├── molecular.weight: A tibble with 119 rows
#> | ├── subunits: A tibble with 11 rows
#> | ├── posttranslational.modification: A tibble with 2 rows
#> | └── crystallization: A tibble with 22 rows
#> └── bibliography
#> | └── reference: A tibble with 285 rows
You can also query for certain fields to reduce the size of the returned object.
ShowFields(df)
#> # A tibble: 40 × 2
#> field acronym
#> <chr> <chr>
#> 1 PROTEIN PR
#> 2 RECOMMENDED_NAME RN
#> 3 SYSTEMATIC_NAME SN
#> 4 SYNONYMS SY
#> 5 REACTION RE
#> 6 REACTION_TYPE RT
#> 7 SOURCE_TISSUE ST
#> 8 LOCALIZATION LO
#> 9 NATURAL_SUBSTRATE_PRODUCT NSP
#> 10 SUBSTRATE_PRODUCT SP
#> # … with 30 more rows
res <- QueryBrenda(df, EC = "1.1.1.1", fields = c("PROTEIN", "SUBSTRATE_PRODUCT"))
res[["1.1.1.1"]][["interactions"]][["substrate.product"]]
#> # A tibble: 772 × 7
#> proteinID substrate product comme…¹ comme…² rever…³ refID
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 10 n-propanol + NAD+ n-prop… <NA> <NA> r 120
#> 2 10 2-propanol + NAD+ aceton… <NA> <NA> <NA> 122
#> 3 10 n-hexanol + NAD+ n-hexa… <NA> <NA> r 120
#> 4 10 (S)-2-butanol + NAD+ 2-buta… <NA> <NA> r 120
#> 5 10 ethylenglycol + NAD+ ? + NA… <NA> <NA> r 120
#> 6 10 n-butanol + NAD+ butyra… <NA> <NA> <NA> 122
#> 7 10 n-decanol + NAD+ n-deca… <NA> <NA> r 120
#> 8 10 Tris + NAD+ ? + NA… <NA> <NA> r 120
#> 9 10 isopropanol + NAD+ aceton… <NA> <NA> <NA> 139,…
#> 10 10 5-hydroxymethylfurfural + NA… (furan… #10# m… <NA> <NA> 193,…
#> # … with 762 more rows, and abbreviated variable names ¹commentarySubstrate,
#> # ²commentaryProduct, ³reversibility
It should be noted that most fields contain a fieldInfo column and a commentary column. The fieldInfo column is what's extracted by BRENDA from the literature, and the commentary column is usually some context from the original paper. # symbols in the commentary correspond to the proteinIDs, and <> enclose the corresponding refIDs. For further information, please see the README file from BRENDA.
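The # and <> conventions can be unpacked with base R regular expressions. The sketch below is not part of brendaDb: the helper ExtractIds() and the commentary string are hypothetical, and only the first delimited group in the string is handled.

```r
# Hypothetical commentary string following the conventions described above.
commentary <- "#10,12# pH 7.5, 25 degrees C <120,122>"

# Hypothetical helper (not a brendaDb function): return the comma-separated
# IDs inside the first delimited group matched by `pattern`.
ExtractIds <- function(x, pattern) {
  hits <- regmatches(x, regexpr(pattern, x))
  if (length(hits) == 0) return(character(0))
  ids <- gsub("[^0-9,]", "", hits)  # drop the delimiters, keep digits and commas
  strsplit(ids, ",", fixed = TRUE)[[1]]
}

protein.ids <- ExtractIds(commentary, "#[0-9,]+#")  # IDs between # ... #
ref.ids <- ExtractIds(commentary, "<[0-9,]+>")      # IDs between < ... >
```

Real commentaries may contain several such groups; gregexpr() would be the natural replacement for regexpr() in that case.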
Queries can also be restricted to specific organisms. Note the difference in row counts between the following example, which is limited to Homo sapiens, and the earlier query across all organisms.
res <- QueryBrenda(df, EC = "1.1.1.1", organisms = "Homo sapiens")
res$`1.1.1.1`
#> Entry 1.1.1.1
#> ├── nomenclature
#> | ├── ec: 1.1.1.1
#> | ├── systematic.name: alcohol:NAD+ oxidoreductase
#> | ├── recommended.name: alcohol dehydrogenase
#> | ├── synonyms: A tibble with 41 rows
#> | ├── reaction: A tibble with 2 rows
#> | └── reaction.type: A tibble with 3 rows
#> ├── interactions
#> | ├── substrate.product: A tibble with 102 rows
#> | ├── natural.substrate.product: A tibble with 9 rows
#> | ├── cofactor: A tibble with 2 rows
#> | ├── metals.ions: A tibble with 2 rows
#> | └── inhibitors: A tibble with 36 rows
#> ├── parameters
#> | ├── km.value: A tibble with 163 rows
#> | ├── turnover.number: A tibble with 64 rows
#> | ├── ki.value: A tibble with 8 rows
#> | ├── ph.optimum: A tibble with 15 rows
#> | ├── ph.range: A tibble with 2 rows
#> | ├── temperature.optimum: A tibble with 2 rows
#> | └── specific.activity: A tibble with 5 rows
#> ├── organism
#> | ├── organism: A tibble with 3 rows
#> | ├── source.tissue: A tibble with 21 rows
#> | └── localization: A tibble with 1 rows
#> ├── molecular
#> | ├── stability
#> | | ├── general.stability: A tibble with 1 rows
#> | | ├── storage.stability: A tibble with 4 rows
#> | | ├── ph.stability: A tibble with 1 rows
#> | | ├── organic.solvent.stability: A tibble with 1 rows
#> | | └── temperature.stability: A tibble with 2 rows
#> | ├── purification: A tibble with 7 rows
#> | ├── cloned: A tibble with 5 rows
#> | ├── engineering: A tibble with 3 rows
#> | └── application: A tibble with 1 rows
#> ├── structure
#> | ├── molecular.weight: A tibble with 12 rows
#> | ├── subunits: A tibble with 3 rows
#> | └── crystallization: A tibble with 2 rows
#> └── bibliography
#> | └── reference: A tibble with 285 rows
To transform the brenda.entries structure into a table, use the helper function ExtractField().
res <- QueryBrenda(df, EC = c("1.1.1.1", "6.3.5.8"), n.core = 2)
ExtractField(res, field = "parameters$ph.optimum")
#> Deprecated entries in the res object will be removed.
#> # A tibble: 158 × 9
#> ec organism prote…¹ uniprot org.c…² descr…³ field…⁴ comme…⁵ refID
#> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <chr> <chr>
#> 1 1.1.1.1 Acetobacter pa… 60 <NA> <NA> 5.5 NA #122# … 113,…
#> 2 1.1.1.1 Acetobacter pa… 60 <NA> <NA> 6 NA #91,11… 113,…
#> 3 1.1.1.1 Acetobacter pa… 60 <NA> <NA> 8.5 NA #3# ox… 4,20…
#> 4 1.1.1.1 Acinetobacter … 28 <NA> <NA> 5.9 NA #8# ac… 15,1…
#> 5 1.1.1.1 Aeropyrum pern… 131 Q9Y9P9 <NA> 10.5 NA #5# as… 2,10…
#> 6 1.1.1.1 Aeropyrum pern… 131 Q9Y9P9 <NA> 8 NA #40# a… 64,9…
#> 7 1.1.1.1 Arabidopsis th… 20 <NA> <NA> 10.5 NA #5# as… 2,10…
#> 8 1.1.1.1 Aspergillus ni… 14 <NA> <NA> 8.1 NA #14# r… 81
#> 9 1.1.1.1 Brevibacterium… 46 <NA> <NA> 10.4 NA #46# o… 14,1…
#> 10 1.1.1.1 Brevibacterium… 46 <NA> <NA> 6 NA #91,11… 113,…
#> # … with 148 more rows, and abbreviated variable names ¹proteinID,
#> # ²org.commentary, ³description, ⁴fieldInfo, ⁵commentary
As shown above, the returned table consists of three parts: the EC number, organism-related information (organism, protein ID, UniProt ID, and commentary on the organism), and the extracted field information (description, fieldInfo, commentary, and reference IDs).
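Once the data is in a flat table, ordinary subsetting applies. The following sketch filters for alkaline pH optima on a stand-in data frame: the column names follow the output above, while the organism names and rows are placeholders rather than real BRENDA records.

```r
# Stand-in for the table returned by ExtractField(); only the column names
# mirror the output above, the rows are placeholders.
ph.tbl <- data.frame(
  ec = "1.1.1.1",
  organism = c("Organism A", "Organism B", "Organism C"),
  proteinID = c("1", "2", "3"),
  description = c("5.5", "8.5", "10.5"),
  stringsAsFactors = FALSE
)

# description is a character column, so convert it before comparing.
# Values that are not plain numbers become NA.
ph.tbl$ph <- suppressWarnings(as.numeric(ph.tbl$description))

# Keep rows with a pH optimum of 8 or higher.
alkaline <- ph.tbl[!is.na(ph.tbl$ph) & ph.tbl$ph >= 8, ]
alkaline$organism
```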
Often we have a list of gene symbols or enzyme names instead of EC numbers. In this case, a helper function can be used to find the corresponding EC numbers:
ID2Enzyme(brenda = df, ids = c("ADH4", "CD38", "pyruvate dehydrogenase"))
#> # A tibble: 4 × 5
#> ID EC RECOMMENDED_NAME SYNON…¹ SYSTE…²
#> <chr> <chr> <chr> <chr> <chr>
#> 1 ADH4 1.1.1.1 <NA> "aldeh… <NA>
#> 2 CD38 2.4.99.20 <NA> "#1,3,… <NA>
#> 3 pyruvate dehydrogenase 1.2.1.51 pyruvate dehydrogenase (NADP… "#1,2#… <NA>
#> 4 pyruvate dehydrogenase 2.7.11.2 [pyruvate dehydrogenase (ace… "kinas… ATP:[p…
#> # … with abbreviated variable names ¹SYNONYMS, ²SYSTEMATIC_NAME
The EC column can then be handpicked and used in QueryBrenda().
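Handpicking can be as simple as subsetting the EC column. The sketch below rebuilds the ID and EC columns from the output above as a plain data frame; the decision to drop the kinase entry 2.7.11.2 is an arbitrary choice made for illustration.

```r
# Stand-in for the ID2Enzyme() result, using the ID and EC values shown above.
hits <- data.frame(
  ID = c("ADH4", "CD38", "pyruvate dehydrogenase", "pyruvate dehydrogenase"),
  EC = c("1.1.1.1", "2.4.99.20", "1.2.1.51", "2.7.11.2"),
  stringsAsFactors = FALSE
)

# Suppose that, after inspecting the names, entry 2.7.11.2 is not wanted.
selected.ec <- unique(hits$EC[hits$EC != "2.7.11.2"])

# The vector can then be passed on, e.g. QueryBrenda(df, EC = selected.ec)
```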
Often we are interested in the enzymes involved in a specific BioCyc pathway. As BioCyc now requires login credentials for using their web service, users are recommended to use the metabolike package for more advanced queries.
By default, QueryBrenda() uses all available cores, but limiting n.core often gives better performance because it reduces the overhead. The following results were produced on a machine with 40 cores (2 Intel Xeon CPU E5-2640 v4 @ 3.4GHz) and 256 GB of RAM:
EC.numbers <- head(unique(df$ID), 100)
system.time(QueryBrenda(df, EC = EC.numbers, n.core = 0)) # default
# user system elapsed
# 4.528 7.856 34.567
system.time(QueryBrenda(df, EC = EC.numbers, n.core = 1))
# user system elapsed
# 22.080 0.360 22.438
system.time(QueryBrenda(df, EC = EC.numbers, n.core = 2))
# user system elapsed
# 0.552 0.400 13.597
system.time(QueryBrenda(df, EC = EC.numbers, n.core = 4))
# user system elapsed
# 0.688 0.832 9.517
system.time(QueryBrenda(df, EC = EC.numbers, n.core = 8))
# user system elapsed
# 1.112 1.476 10.000
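One way to apply this is to cap the worker count instead of relying on the default. The cap of 4 below is an assumption suggested by the timings above, not a package recommendation; the best value depends on the machine and the number of EC numbers queried.

```r
# Number of cores reported by the system; detectCores() may return NA.
available <- parallel::detectCores()
if (is.na(available)) available <- 1L

# Cap the worker count at 4 (an assumed value, see the timings above).
n.core <- min(4L, available)

# The value can then be passed on, e.g.
# QueryBrenda(df, EC = EC.numbers, n.core = n.core)
```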
sessionInfo()
#> R version 4.2.2 (2022-10-31)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] brendaDb_1.12.0 BiocStyle_2.26.0
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.9 bslib_0.4.1 compiler_4.2.2
#> [4] pillar_1.8.1 BiocManager_1.30.19 jquerylib_0.1.4
#> [7] dbplyr_2.2.1 tools_4.2.2 digest_0.6.31
#> [10] bit_4.0.5 tibble_3.1.8 jsonlite_1.8.4
#> [13] BiocFileCache_2.6.0 RSQLite_2.2.19 evaluate_0.19
#> [16] memoise_2.0.1 lifecycle_1.0.3 pkgconfig_2.0.3
#> [19] rlang_1.0.6 cli_3.4.1 DBI_1.1.3
#> [22] filelock_1.0.2 parallel_4.2.2 curl_4.3.3
#> [25] yaml_2.3.6 xfun_0.35 fastmap_1.1.0
#> [28] withr_2.5.0 httr_1.4.4 stringr_1.5.0
#> [31] dplyr_1.0.10 knitr_1.41 rappdirs_0.3.3
#> [34] generics_0.1.3 vctrs_0.5.1 sass_0.4.4
#> [37] tidyselect_1.2.0 bit64_4.0.5 glue_1.6.2
#> [40] R6_2.5.1 fansi_1.0.3 BiocParallel_1.32.4
#> [43] rmarkdown_2.18 bookdown_0.31 tidyr_1.2.1
#> [46] purrr_0.3.5 blob_1.2.3 magrittr_2.0.3
#> [49] ellipsis_0.3.2 codetools_0.2-18 htmltools_0.5.4
#> [52] assertthat_0.2.1 utf8_1.2.2 stringi_1.7.8
#> [55] cachem_1.0.6 crayon_1.5.2