1 Introduction

biodbKegg is a biodb extension package that implements a connector to the KEGG Compound database (Kanehisa and Goto 2000).

2 Installation

Install using Bioconductor:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install('biodbKegg')

3 Initialization

The first step in using biodbKegg is to create an instance of the BiodbMain class from the main biodb package. This is done by calling biodb::newInst():

mybiodb <- biodb::newInst()

During this step the configuration is set up, the cache system is initialized and extension packages are loaded.

At the end of this vignette, we will see that the biodb instance must be terminated with a call to its terminate() method.

4 Creating a connector to KEGG Compound database

In biodb, the connection to a database is handled by a connector instance, which you obtain from the factory. The KEGG Compound connector provided by biodbKegg accesses a remote database. Here is the code to instantiate it:

kegg.comp.conn <- mybiodb$getFactory()$createConn('kegg.compound')
## Loading required package: biodbKegg

5 Accessing entries

To retrieve entries by their accession numbers, use:

entries <- kegg.comp.conn$getEntry(c('C00133', 'C00751'))
entries
## [[1]]
## Biodb KEGG Compound entry instance C00133.
## 
## [[2]]
## Biodb KEGG Compound entry instance C00751.
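getEntry() returns a list of entry objects. If you only need one field from each entry, a minimal sketch like the following can help. It assumes that entry objects provide hasField() and getFieldValue() methods, and that getEntry() returns NULL for an accession it cannot find; entry_field() itself is a hypothetical helper, not part of biodb.

```r
# Hypothetical helper (not part of biodb): pull one field out of each entry in
# the list returned by getEntry(). Entries that are NULL (assumed to mean "not
# found") or that lack the field give NA.
entry_field <- function(entries, field) {
  vapply(entries, function(e) {
    if (is.null(e) || !e$hasField(field)) {
      return(NA_character_)
    }
    # Keep only the first value if the field is multi-valued.
    as.character(e$getFieldValue(field))[1]
  }, character(1))
}

# Usage with the entries retrieved above (requires network access):
# entry_field(entries, 'formula')
```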

To convert a list of entries into a data frame, run:

x <- mybiodb$entriesToDataframe(entries, compute=FALSE)
x
##   accession monoisotopic.mass formula molecular.mass
## 1    C00133           89.0477 C3H7NO2        89.0932
## 2    C00751          410.3913  C30H50       410.7180
##                                      name   cas.id ncbi.pubchem.comp.id
## 1 D-Alanine;D-2-Aminopropionic acid;D-Ala 338-69-2                 3433
## 2             Squalene;Spinacene;Supraene 111-02-4                 4013
##   chebi.id
## 1    15570
## 2    15440
##                                                                                                                                                                 kegg.reaction.id
## 1 R00399;R00401;R01147;R01148;R01149;R01150;R01225;R01344;R02718;R04369;R04611;R05861;R07651;R08850;R09588;R09595;R11965;R12557;R12812;R12863;R12867;R12871;R12873;R12875;R12904
## 2                                                                                     R02872;R02874;R02875;R02876;R06223;R07322;R07323;R08535;R09712;R10167;R10169;R11401;R12355
##                                                                                                                         kegg.enzyme.id
## 1 1.4.3.3;1.4.3.19;2.1.2.7;2.3.1.263;2.3.2.14;2.6.1.21;3.1.1.103;3.4.13.22;3.4.17.8;5.1.1.1;6.1.1.13;6.1.2.1;6.3.2.4;6.3.2.16;6.3.2.35
## 2                                    1.3.1.96;1.14.14.17;1.14.19.-;1.17.8.1;2.1.1.262;2.5.1.21;4.2.1.123;4.2.1.129;5.4.99.17;5.4.99.37
##                                                                             kegg.pathway.id
## 1                            map00470;map00550;map00552;map01100;map01502;map01503;map04742
## 2 map00100;map00909;map00996;map00999;map01060;map01062;map01066;map01070;map01100;map01110
##   kegg.compound.id lipidmaps.structure.id
## 1           C00133                   <NA>
## 2           C00751         LMPR0106010002
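Several columns above, such as kegg.reaction.id, kegg.enzyme.id and kegg.pathway.id, pack multiple values into one cell, separated by ";". If you prefer one row per value, the following base R sketch reshapes such a column into long format. explode_column() is a hypothetical helper, not part of biodb, and the toy data frame uses a shortened excerpt of the values shown in the table above.

```r
# Hypothetical helper (not part of biodb): split a multi-valued column whose
# values are joined by `sep` into long format, one row per (id, value) pair.
# Cells containing NA yield a single row with an NA value.
explode_column <- function(df, col, sep = ";", id.col = "accession") {
  parts <- strsplit(as.character(df[[col]]), sep, fixed = TRUE)
  out <- data.frame(
    rep(df[[id.col]], lengths(parts)),
    as.character(unlist(parts)),
    stringsAsFactors = FALSE
  )
  names(out) <- c(id.col, col)
  out
}

# Toy input: shortened pathway lists for the two entries shown above.
toy <- data.frame(
  accession = c("C00133", "C00751"),
  kegg.pathway.id = c("map00470;map00550", "map00100"),
  stringsAsFactors = FALSE
)
long <- explode_column(toy, "kegg.pathway.id")
long
```

With the real data frame, the same call would be explode_column(x, 'kegg.pathway.id').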

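6 Search for compounds of a certain mass

The following code searches for entries whose monoisotopic mass is 64 with a tolerance (delta) of 2.0, keeps at most 10 matching IDs, and then retrieves the corresponding entries through the factory: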
ids <- kegg.comp.conn$searchForEntries(list(monoisotopic.mass=list(value=64, delta=2.0)), max.results=10)
entries <- mybiodb$getFactory()$getEntry('kegg.compound', ids)
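The search above matches masses within a window around value. Assuming delta is an absolute tolerance in daltons (so that value=64, delta=2.0 covers 62 to 66), the window test can be sketched in base R as follows. This only illustrates the filter; it is not how biodb implements the search. If your tolerance is expressed in ppm, ppm_to_delta() converts it to an absolute delta. Both helpers are hypothetical.

```r
# Hypothetical helper (not part of biodb): TRUE where `mass` lies within
# [value - delta, value + delta], assuming `delta` is an absolute tolerance.
in_mass_window <- function(mass, value, delta) {
  abs(mass - value) <= delta
}

# Hypothetical helper: convert a relative tolerance in ppm into an absolute
# delta around `value`.
ppm_to_delta <- function(value, ppm) {
  value * ppm * 1e-6
}

# Monoisotopic masses of the two entries retrieved in section 5.
masses <- c(C00133 = 89.0477, C00751 = 410.3913)
names(masses)[in_mass_window(masses, value = 89, delta = 0.1)]  # "C00133"

# Absolute delta corresponding to 10 ppm around 64.
ppm_to_delta(64, 10)
```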

7 Add information to a data frame containing KEGG Compound IDs

If you have a data frame containing a column of KEGG Compound IDs, you can add to it, for a specific organism, information such as the associated KEGG enzymes, pathways and modules.

For this example, we construct a data frame from a short list of compound IDs:

kegg.comp.ids <- c('C06144', 'C06178', 'C02659')
mydf <- data.frame(kegg.ids=kegg.comp.ids)

Using the addInfo() method of the KeggCompoundConn class, we add information about enzymes, reactions, pathways and modules for these compounds, here for the mouse (org='mmu'):

kegg.comp.conn$addInfo(mydf, id.col='kegg.ids', org='mmu')
##   kegg.ids               kegg.enzyme.id
## 1   C06144                     4.2.1.27
## 2   C06178            1.4.3.21|2.3.1.74
## 3   C02659 1.14.14.41|2.4.1.63|3.2.1.21
##                                                 kegg.reaction.id
## 1                                                  R01611;R01367
## 2                      R01853;R02382;R02529|R01613;R07987;R07988
## 3 R10030;R10034;R11597|R03625;R04948;R10037|R00026;R00306;R02558
##     kegg.pathway.id
## 1 mmu00650|mmu01100
## 2 mmu00760|mmu01100
## 3          mmu01100
##                                                                                                     kegg.pathway.name
## 1                   Butanoate metabolism - Mus musculus (house mouse)|Metabolic pathways - Mus musculus (house mouse)
## 2 Nicotinate and nicotinamide metabolism - Mus musculus (house mouse)|Metabolic pathways - Mus musculus (house mouse)
## 3                                                                     Metabolic pathways - Mus musculus (house mouse)
##                           kegg.pathway.pathway.class       kegg.module.id
## 1              Metabolism;Carbohydrate metabolism|NA M00027|M00001|M00002
## 2 Metabolism;Metabolism of cofactors and vitamins|NA M00912|M00001|M00002
## 3                                               <NA> M00001|M00002|M00003
##                                                                                                                                                        kegg.module.name
## 1                   GABA (gamma-Aminobutyrate) shunt|Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate|Glycolysis, core module involving three-carbon compounds
## 2 NAD biosynthesis, tryptophan => quinolinate => NAD|Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate|Glycolysis, core module involving three-carbon compounds
## 3       Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate|Glycolysis, core module involving three-carbon compounds|Gluconeogenesis, oxaloacetate => fructose-6P

In this output, multiple values within a cell are separated by "|". Note that, by default, the number of values for each field is limited to 3. See the help page of KeggCompoundConn for more information about addInfo() and a description of all its parameters.
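To see how many values each cell holds, and hence which cells may have reached the default limit of 3, you can split on "|". The following is a minimal base R sketch; count_values() is a hypothetical helper, not part of biodbKegg, and the input reuses the kegg.enzyme.id values from the output above.

```r
# Hypothetical helper (not part of biodbKegg): count the "|"-separated values
# in each cell of a column returned by addInfo(); NA cells count as 0.
count_values <- function(x, sep = "|") {
  # fixed = TRUE treats "|" literally instead of as a regex alternation.
  n <- lengths(strsplit(as.character(x), sep, fixed = TRUE))
  n[is.na(x)] <- 0L
  n
}

# kegg.enzyme.id values copied from the addInfo() output above.
enzymes <- c("4.2.1.27", "1.4.3.21|2.3.1.74", "1.14.14.41|2.4.1.63|3.2.1.21")
count_values(enzymes)

# Rows whose cell holds 3 values may have been cut at the default limit.
which(count_values(enzymes) >= 3)
```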

The list of KEGG organism codes (such as mmu) is available at https://www.genome.jp/kegg/catalog/org_list.html.

9 Terminate the biodb instance

When you are done with your biodb instance, terminate it to ensure that resources (file handles, database connections, etc.) are released:

mybiodb$terminate()
## INFO  [16:27:40.526] Closing BiodbMain instance...
## INFO  [16:27:40.528] Connector "kegg.compound" deleted.
## INFO  [16:27:40.536] Connector "kegg.enzyme" deleted.
## INFO  [16:27:40.537] Connector "kegg.pathway" deleted.
## INFO  [16:27:40.539] Connector "kegg.module" deleted.
## INFO  [16:27:40.540] Connector "kegg.reaction" deleted.
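Because terminate() should run even when an analysis stops with an error, you may want to wrap your work in a function that registers the call with on.exit(). The following is a minimal sketch; with_biodb() is a hypothetical helper, not part of biodb, and its create argument only lets you supply a different constructor (for instance a stub object with a terminate() method).

```r
# Hypothetical helper (not part of biodb): create an instance, run `fun` on it,
# and call terminate() on exit, whether `fun` returns normally or fails.
with_biodb <- function(fun, create = function() biodb::newInst()) {
  inst <- create()
  on.exit(inst$terminate(), add = TRUE)
  fun(inst)
}

# Example usage (requires biodb, biodbKegg and network access):
# nb <- with_biodb(function(bdb) {
#   conn <- bdb$getFactory()$createConn('kegg.compound')
#   length(conn$getEntry(c('C00133', 'C00751')))
# })
```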

10 Session information

sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] biodbKegg_1.8.0  BiocStyle_2.30.0
## 
## loaded via a namespace (and not attached):
##  [1] rappdirs_0.3.3       sass_0.4.7           utf8_1.2.4          
##  [4] generics_0.1.3       stringi_1.7.12       RSQLite_2.3.1       
##  [7] hms_1.1.3            digest_0.6.33        magrittr_2.0.3      
## [10] evaluate_0.22        bookdown_0.36        fastmap_1.1.1       
## [13] blob_1.2.4           plyr_1.8.9           jsonlite_1.8.7      
## [16] progress_1.2.2       DBI_1.1.3            BiocManager_1.30.22 
## [19] httr_1.4.7           fansi_1.0.5          XML_3.99-0.14       
## [22] jquerylib_0.1.4      cli_3.6.1            rlang_1.1.1         
## [25] chk_0.9.1            crayon_1.5.2         dbplyr_2.3.4        
## [28] bit64_4.0.5          withr_2.5.1          cachem_1.0.8        
## [31] yaml_2.3.7           tools_4.3.1          memoise_2.0.1       
## [34] biodb_1.10.0         dplyr_1.1.3          filelock_1.0.2      
## [37] curl_5.1.0           vctrs_0.6.4          R6_2.5.1            
## [40] magick_2.8.1         BiocFileCache_2.10.0 lifecycle_1.0.3     
## [43] stringr_1.5.0        bit_4.0.5            pkgconfig_2.0.3     
## [46] pillar_1.9.0         bslib_0.5.1          glue_1.6.2          
## [49] Rcpp_1.0.11          lgr_0.4.4            xfun_0.40           
## [52] tibble_3.2.1         tidyselect_1.2.0     knitr_1.44          
## [55] igraph_1.5.1         htmltools_0.5.6.1    rmarkdown_2.25      
## [58] compiler_4.3.1       prettyunits_1.2.0    askpass_1.2.0       
## [61] openssl_2.1.1

References

Kanehisa, Minoru, and Susumu Goto. 2000. “KEGG: Kyoto Encyclopedia of Genes and Genomes.” Nucleic Acids Research 28 (1): 27–30. https://doi.org/10.1093/nar/28.1.27.