The Disease Ontology (DO) was developed to describe gene products consistently from a disease perspective, and it is essential for supporting functional genomics in a disease context. Accurate disease descriptions can help reveal new relationships between genes and diseases, as well as new functions for previously uncharacterized genes and alleles. We developed the DOSE package for semantic similarity analysis and disease enrichment analysis. DOSE imports the Bioconductor package ‘DO.db’ to obtain the relationships (such as parent and child) between DO terms. However, DO.db has not been updated for years, and much semantic information is missing. We therefore developed a new package, HDO.db, for Human Disease Ontology annotation.
library(HDO.db)
library(AnnotationDbi)
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#> as.data.frame, basename, cbind, colnames, dirname, do.call,
#> duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#> lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#> pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
#> tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#> Loading required package: IRanges
#> Loading required package: S4Vectors
#>
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:base':
#>
#> I, expand.grid, unname
The annotation data comes from https://github.com/DiseaseOntology/HumanDiseaseOntology/tree/main/src/ontology, and HDO.db provides the following AnnDbBimap objects and helper functions:
ls("package:HDO.db")
#> [1] "HDO" "HDO.db" "HDOALIAS" "HDOANCESTOR" "HDOCHILDREN"
#> [6] "HDOMAPCOUNTS" "HDOOFFSPRING" "HDOPARENTS" "HDOSYNONYM" "HDOTERM"
#> [11] "HDO_dbInfo" "HDO_dbconn" "HDO_dbfile" "HDO_dbschema" "HDOmetadata"
#> [16] "columns" "keys" "keytypes" "select"
packageVersion("HDO.db")
#> [1] '0.99.1'
You can use the help() function to view their documentation, for example help(HDOOFFSPRING).
toTable(HDOmetadata)
#> name
#> 1 DBSCHEMA
#> 2 DBSCHEMAVERSION
#> 3 HDOSOURCENAME
#> 4 HDOSOURCURL
#> 5 HDOSOURCEDATE
#> 6 Db type
#> value
#> 1 HDO_DB
#> 2 1.0
#> 3 Disease Ontology
#> 4 https://github.com/DiseaseOntology/HumanDiseaseOntology/blob/main/src/ontology/HumanDO.obo
#> 5 20220706
#> 6 HDODb
HDOMAPCOUNTS
#> HDOANCESTOR HDOCHILDREN HDOOFFSPRING HDOPARENTS HDOTERM
#> "66768" "11034" "66768" "11034" "11003"
In HDO.db, HDOTERM represents all DO terms and their names. Users can also get their aliases and synonyms from HDOALIAS and HDOSYNONYM, respectively.
convert HDOTERM to table
doterm <- toTable(HDOTERM)
head(doterm)
#> doid term
#> 1 DOID:0001816 angiosarcoma
#> 2 DOID:0002116 pterygium
#> 3 DOID:0014667 disease of metabolism
#> 4 DOID:0040001 shrimp allergy
#> 5 DOID:0040002 aspirin allergy
#> 6 DOID:0040003 benzylpenicillin allergy
convert HDOTERM to list
dotermlist <- as.list(HDOTERM)
head(dotermlist)
#> $`DOID:0001816`
#> [1] "angiosarcoma"
#>
#> $`DOID:0002116`
#> [1] "pterygium"
#>
#> $`DOID:0014667`
#> [1] "disease of metabolism"
#>
#> $`DOID:0040001`
#> [1] "shrimp allergy"
#>
#> $`DOID:0040002`
#> [1] "aspirin allergy"
#>
#> $`DOID:0040003`
#> [1] "benzylpenicillin allergy"
get alias of DOID:0001816
get synonym of DOID:0001816
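The two lookups named in the captions above can be sketched as follows. This is a minimal sketch that assumes HDOALIAS and HDOSYNONYM are keyed by DO ID and can be converted with as.list() like the other Bimap objects in this vignette; the variable names alias_list and synonym_list are illustrative only.

```r
library(HDO.db)
library(AnnotationDbi)

# Assumption: both Bimaps map a DO ID to a character vector,
# in the same way HDOCHILDREN does elsewhere in this vignette.
alias_list <- AnnotationDbi::as.list(HDOALIAS)
synonym_list <- AnnotationDbi::as.list(HDOSYNONYM)

# Aliases and synonyms recorded for angiosarcoma (DOID:0001816)
alias_list[["DOID:0001816"]]
synonym_list[["DOID:0001816"]]
```

Converting the whole Bimap to a list is convenient when many terms will be looked up; the result depends on the version of the annotation data.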
Similar to DO.db, we provide four Bimap objects to represent the relationships between DO terms: HDOANCESTOR, HDOPARENTS, HDOOFFSPRING, and HDOCHILDREN.
HDOANCESTOR describes the association between DO terms and their ancestor terms, based on the directed acyclic graph (DAG) defined by the Disease Ontology. We can use the toTable function in the AnnotationDbi package to get a two-column data.frame: the first column contains the DO term IDs, and the second column contains their ancestor terms.
anc_table <- toTable(HDOANCESTOR)
head(anc_table)
#> doid ancestor
#> 1 DOID:0001816 DOID:175
#> 2 DOID:0001816 DOID:176
#> 3 DOID:0001816 DOID:0050686
#> 4 DOID:0001816 DOID:162
#> 5 DOID:0001816 DOID:14566
#> 6 DOID:0001816 DOID:4
get ancestor of “DOID:0001816”
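A minimal sketch of this lookup, reusing the as.list() pattern that this vignette applies to HDOCHILDREN. The variable name anc_list is illustrative, and the exact result depends on the version of the annotation data.

```r
library(HDO.db)
library(AnnotationDbi)

# Convert the Bimap to a named list: one character vector of ancestors per DO ID
anc_list <- AnnotationDbi::as.list(HDOANCESTOR)

# All ancestors of angiosarcoma (DOID:0001816)
anc_list[["DOID:0001816"]]
```

For the data version shown in the table above, the result should contain the same IDs as the rows for DOID:0001816, such as DOID:175 and DOID:4.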
HDOPARENTS describes the association between DO terms and their direct parent terms, based on the DAG. We can use the toTable function in the AnnotationDbi package to get a two-column data.frame: the first column contains the DO term IDs, and the second column contains their parent terms.
parent_table <- toTable(HDOPARENTS)
head(parent_table)
#> doid parent
#> 1 DOID:0001816 DOID:175
#> 2 DOID:0002116 DOID:10124
#> 3 DOID:0014667 DOID:4
#> 4 DOID:0040001 DOID:0060524
#> 5 DOID:0040002 DOID:0060500
#> 6 DOID:0040003 DOID:0060519
get parent term of “DOID:0001816”
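A minimal sketch of this lookup, again assuming the as.list() pattern used for HDOCHILDREN applies to HDOPARENTS. The variable name parent_list is illustrative.

```r
library(HDO.db)
library(AnnotationDbi)

# Convert the Bimap to a named list: one character vector of direct parents per DO ID
parent_list <- AnnotationDbi::as.list(HDOPARENTS)

# Direct parent term(s) of angiosarcoma (DOID:0001816)
parent_list[["DOID:0001816"]]
```

A term can have more than one direct parent in a DAG, so the result is a character vector rather than a single value.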
HDOOFFSPRING describes the association between DO terms and their offspring terms, based on the DAG. It is the exact inverse of HDOANCESTOR, and it is used in the same way.
get offspring of “DOID:0001816”
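A minimal sketch of this lookup, assuming HDOOFFSPRING converts with as.list() in the same way as HDOCHILDREN. The variable name off_list is illustrative, and the result depends on the version of the annotation data.

```r
library(HDO.db)
library(AnnotationDbi)

# Convert the Bimap to a named list: one character vector of offspring per DO ID
off_list <- AnnotationDbi::as.list(HDOOFFSPRING)

# All terms below angiosarcoma (DOID:0001816) in the DAG
off_list[["DOID:0001816"]]
```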
HDOCHILDREN describes the association between DO terms and their direct child terms, based on the DAG. It is the exact inverse of HDOPARENTS, and it is used in the same way.
get children of “DOID:4”
child_list <- AnnotationDbi::as.list(HDO.db::HDOCHILDREN)
child_list[["DOID:4"]]
#> [1] "DOID:0014667" "DOID:0050117" "DOID:0080015" "DOID:14566" "DOID:150"
#> [6] "DOID:225" "DOID:630" "DOID:7"
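To make the relationship between the four objects concrete, here is a base-R sketch on a hypothetical five-term DAG. The term names "a" to "e", the edges, and the helper reachable() are invented for illustration; this is not DO data and not how HDO.db is built. It shows that children are the parent edges read in the opposite direction, and that ancestors and offspring are what you reach by following those edges repeatedly.

```r
# Hypothetical toy DAG: each row is one child -> parent edge.
# Term "d" has two parents, which a DAG allows.
edges <- data.frame(
  child  = c("b", "c", "d", "d", "e"),
  parent = c("a", "a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Direct parents and direct children are the same edges read in opposite directions.
parents_of  <- split(edges$parent, edges$child)
children_of <- split(edges$child, edges$parent)

# Collect every term reachable by repeatedly following a direct relation.
reachable <- function(term, direct) {
  seen <- character(0)
  queue <- direct[[term]]
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (!(current %in% seen)) {
      seen <- c(seen, current)
      queue <- c(queue, direct[[current]])
    }
  }
  sort(seen)
}

ancestors_of_e <- reachable("e", parents_of)   # all terms above "e"
offspring_of_a <- reachable("a", children_of)  # all terms below "a"
```

In this toy graph every other term is an ancestor of "e" and an offspring of "a", which mirrors how the ancestor and offspring maps contain the same pairs seen from opposite ends.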
HDO.db supports the select(), keys(), keytypes(), and columns() interface.
columns(HDO.db)
#> [1] "alias" "ancestor" "children" "doid" "offspring" "parent"
#> [7] "synonym" "term"
## use doid keys
dokeys <- head(keys(HDO.db))
res <- select(x = HDO.db, keys = dokeys, keytype = "doid",
columns = c("offspring", "term", "parent"))
head(res)
#> doid offspring term parent
#> 1 DOID:0001816 DOID:265 angiosarcoma DOID:175
#> 2 DOID:0001816 DOID:268 angiosarcoma DOID:175
#> 3 DOID:0001816 DOID:4505 angiosarcoma DOID:175
#> 4 DOID:0001816 DOID:4510 angiosarcoma DOID:175
#> 5 DOID:0001816 DOID:4512 angiosarcoma DOID:175
#> 6 DOID:0001816 DOID:4513 angiosarcoma DOID:175
## use term keys
dokeys <- head(keys(HDO.db, keytype = "term"))
res <- select(x = HDO.db, keys = dokeys, keytype = "term",
columns = c("offspring", "doid", "parent"))
head(res)
#> doid offspring parent
#> 1 DOID:0001816 DOID:265 DOID:175
#> 2 DOID:0001816 DOID:268 DOID:175
#> 3 DOID:0001816 DOID:4505 DOID:175
#> 4 DOID:0001816 DOID:4510 DOID:175
#> 5 DOID:0001816 DOID:4512 DOID:175
#> 6 DOID:0001816 DOID:4513 DOID:175
Please go to https://yulab-smu.top/biomedical-knowledge-mining-book/ for the full vignette, and to https://yulab-smu.top/biomedical-knowledge-mining-book/dose-enrichment.html for the chapter on disease enrichment analysis with DOSE.
sessionInfo()
#> R version 4.2.0 Patched (2022-05-05 r82321)
#> Platform: x86_64-apple-darwin19.6.0 (64-bit)
#> Running under: macOS Catalina 10.15.7
#>
#> Matrix products: default
#> BLAS: /Users/ka36530_ca/R-stuff/bin/R-4-2/lib/libRblas.dylib
#> LAPACK: /Users/ka36530_ca/R-stuff/bin/R-4-2/lib/libRlapack.dylib
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] AnnotationDbi_1.59.1 IRanges_2.31.2 S4Vectors_0.35.4
#> [4] Biobase_2.57.1 BiocGenerics_0.43.4 HDO.db_0.99.1
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.9 GenomeInfoDb_1.33.7 XVector_0.37.1
#> [4] bslib_0.4.0 compiler_4.2.0 jquerylib_0.1.4
#> [7] bitops_1.0-7 zlibbioc_1.43.0 tools_4.2.0
#> [10] digest_0.6.29 bit_4.0.4 jsonlite_1.8.0
#> [13] RSQLite_2.2.17 evaluate_0.16 memoise_2.0.1
#> [16] pkgconfig_2.0.3 png_0.1-7 rlang_1.0.6
#> [19] DBI_1.1.3 cli_3.4.1 yaml_2.3.5
#> [22] xfun_0.33 fastmap_1.1.0 GenomeInfoDbData_1.2.9
#> [25] stringr_1.4.1 httr_1.4.4 knitr_1.40
#> [28] Biostrings_2.65.6 sass_0.4.2 vctrs_0.4.2
#> [31] bit64_4.0.5 R6_2.5.1 rmarkdown_2.16
#> [34] blob_1.2.3 magrittr_2.0.3 htmltools_0.5.3
#> [37] KEGGREST_1.37.3 stringi_1.7.8 RCurl_1.98-1.8
#> [40] cachem_1.0.6 crayon_1.5.1