Human Phenotype Ontology (HPO) was developed to create a consistent description of gene products with disease perspectives, and is essential for supporting functional genomics in disease context. Accurate disease descriptions can discover new relationships between genes and disease, and new functions for previous uncharacteried genes and alleles.We have developed the DOSE package for semantic similarity analysis and disease enrichment analysis, and DOSE
import an Bioconductor package ‘DO.db’ to get the relationship(such as parent and child) between DO terms. But DO.db
hasn’t been updated for years, and a lot of semantic information is missing. So we developed the new package HPO.db
for Human Human Phenotype Ontology annotation.
The released version from Bioconductor
library(AnnotationDbi)
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#> as.data.frame, basename, cbind, colnames, dirname, do.call,
#> duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#> lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#> pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
#> tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#> Loading required package: IRanges
#> Loading required package: S4Vectors
#>
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#>
#> findMatches
#> The following objects are masked from 'package:base':
#>
#> I, expand.grid, unname
The annotation data comes from https://github.com/DiseaseOntology/HumanDiseaseOntology/tree/main/src/ontology, and HPO.db provide these AnnDbBimap object:
ls("package:HPO.db")
#> [1] "HPO" "HPO.db" "HPOALIAS" "HPOANCESTOR" "HPOCHILDREN"
#> [6] "HPODO" "HPOGENE" "HPOMAPCOUNTS" "HPOMPO" "HPOOFFSPRING"
#> [11] "HPOPARENTS" "HPOSYNONYM" "HPOTERM" "HPO_dbInfo" "HPO_dbconn"
#> [16] "HPO_dbfile" "HPO_dbschema" "HPOmetadata" "columns" "keys"
#> [21] "keytypes" "select"
packageVersion("HPO.db")
#> [1] '0.99.2'
You can use help
function to get their documents: help(HPOOFFSPRING)
toTable(HPOmetadata)
#> name
#> 1 DBSCHEMA
#> 2 DBSCHEMAVERSION
#> 3 HPOSOURCENAME
#> 4 HPOSOURCURL
#> 5 HPOSOURCEDATE
#> 6 Db type
#> value
#> 1 HPO_DB
#> 2 1.0
#> 3 Human Phenotype Ontology
#> 4 https://github.com/DiseaseOntology/HumanDiseaseOntology/blob/main/src/ontology/HumanDO.obo
#> 5 20230405
#> 6 HPODb
HPOMAPCOUNTS
#> HPOANCESTOR HPOCHILDREN HPODO HPOGENE HPOMPO HPOOFFSPRING
#> "178071" "21139" "46027" "415767" "2282" "178071"
#> HPOPARENTS HPOTERM
#> "21139" "16874"
In HPO.db, HPOTERM
represet the whole HPO terms and their names. The users can also get their aliases and synonyms from HPOALIAS
and HPOSYNONYM
, respectively.
convert HPOTERM to table
doterm <- toTable(HPOTERM)
head(doterm)
#> hpoid term
#> 1 HP:0000001 All
#> 2 HP:0000002 Abnormality of body height
#> 3 HP:0000003 Multicystic kidney dysplasia
#> 4 HP:0000005 Mode of inheritance
#> 5 HP:0000006 Autosomal dominant inheritance
#> 6 HP:0000007 Autosomal recessive inheritance
convert HPOTERM to list
dotermlist <- as.list(HPOTERM)
head(dotermlist)
#> $`HP:0000001`
#> [1] "All"
#>
#> $`HP:0000002`
#> [1] "Abnormality of body height"
#>
#> $`HP:0000003`
#> [1] "Multicystic kidney dysplasia"
#>
#> $`HP:0000005`
#> [1] "Mode of inheritance"
#>
#> $`HP:0000006`
#> [1] "Autosomal dominant inheritance"
#>
#> $`HP:0000007`
#> [1] "Autosomal recessive inheritance"
get alias of HP:0000006
doalias <- as.list(HPOALIAS)
doalias[['HP:0000006']]
#> [1] "HP:0001415" "HP:0001447" "HP:0001448" "HP:0001451" "HP:0001452"
#> [6] "HP:0001455" "HP:0001456" "HP:0001463"
get synonym of HP:0000006
Similar to HPO.db
, we provide four Bimap objects to represent relationship between HPO terms: HPOANCESTOR,HPOPARENTS,HPOOFFSPRING, and HPOCHILDREN.
HPOANCESTOR describes the association between HPO terms and their ancestral terms based on a directed acyclic graph (DAG) defined by the Human Phenotype Ontology. We can use toTable
function in AnnotationDbi
package to get a two-column data.frame: the first column means the HPO term ids, and the second column means their ancestor terms.
anc_table <- toTable(HPOANCESTOR)
head(anc_table)
#> hpoid ancestor
#> 1 HP:0000002 HP:0001507
#> 2 HP:0000002 HP:0000118
#> 3 HP:0000002 HP:0000001
#> 4 HP:0000003 HP:0000107
#> 5 HP:0000003 HP:0012210
#> 6 HP:0000003 HP:0000077
get ancestor of “HP:0000006”
HPOPARENTS describes the association between HPO terms and their direct parent terms based on DAG. We can use toTable
function in AnnotationDbi
package to get a two-column data.frame: the first column means the HPO term ids, and the second column means their parent terms.
parent_table <- toTable(HPOPARENTS)
head(parent_table)
#> hpoid parent
#> 1 HP:0000002 HP:0001507
#> 2 HP:0000003 HP:0000107
#> 3 HP:0000005 HP:0000001
#> 4 HP:0000006 HP:0034345
#> 5 HP:0000007 HP:0034345
#> 6 HP:0000008 HP:0000812
get parent term of “HP:0000006”
HPOPARENTS describes the association between HPO terms and their offspring
terms based on DAG. it’s the exact opposite of HPOANCESTOR
, whose usage is similar to it.
get offspring of “HP:0000002”
off_list <- AnnotationDbi::as.list(HPO.db::HPOOFFSPRING)
off_list[["HP:0000002"]]
#> [1] "HP:0000098" "HP:0000839" "HP:0001519" "HP:0001533" "HP:0001548"
#> [6] "HP:0003498" "HP:0003502" "HP:0003508" "HP:0003510" "HP:0003517"
#> [11] "HP:0003521" "HP:0003561" "HP:0004322" "HP:0004991" "HP:0005026"
#> [16] "HP:0005069" "HP:0008845" "HP:0008848" "HP:0008857" "HP:0008873"
#> [21] "HP:0008890" "HP:0008905" "HP:0008909" "HP:0008921" "HP:0008922"
#> [26] "HP:0008929" "HP:0011404" "HP:0011405" "HP:0011406" "HP:0011407"
#> [31] "HP:0012106" "HP:0012772" "HP:0012773" "HP:0012774"
HPOCHILDREN describes the association between HPO terms and their direct children terms based on DAG. it’s the exact opposite of HPOPARENTS
, whose usage is similar to it.
get children of “HP:0000002”
child_list <- AnnotationDbi::as.list(HPO.db::HPOCHILDREN)
child_list[["HP:0000002"]]
#> [1] "HP:0000098" "HP:0004322" "HP:0012772"
The HPO.db support the select()
, keys()
, keytypes()
, and columns
interface.
columns(HPO.db)
#> [1] "alias" "ancestor" "children" "hpoid" "offspring" "parent"
#> [7] "synonym" "term"
## use hpoid keys
dokeys <- head(keys(HPO.db))
res <- select(x = HPO.db, keys = dokeys, keytype = "hpoid",
columns = c("offspring", "term", "parent"))
head(res)
#> hpoid offspring term parent
#> 1 HP:0000001 HP:0000002 All <NA>
#> 2 HP:0000001 HP:0000003 All <NA>
#> 3 HP:0000001 HP:0000005 All <NA>
#> 4 HP:0000001 HP:0000006 All <NA>
#> 5 HP:0000001 HP:0000007 All <NA>
#> 6 HP:0000001 HP:0000008 All <NA>
## use term keys
dokeys <- head(keys(HPO.db, keytype = "term"))
res <- select(x = HPO.db, keys = dokeys, keytype = "term",
columns = c("offspring", "hpoid", "parent"))
head(res)
#> hpoid offspring parent
#> 1 HP:0000001 HP:0000002 <NA>
#> 2 HP:0000001 HP:0000003 <NA>
#> 3 HP:0000001 HP:0000005 <NA>
#> 4 HP:0000001 HP:0000006 <NA>
#> 5 HP:0000001 HP:0000007 <NA>
#> 6 HP:0000001 HP:0000008 <NA>
Please go to https://yulab-smu.top/biomedical-knowledge-mining-book/ for the vignette.
Please go to https://yulab-smu.top/biomedical-knowledge-mining-book/dose-enrichment.html for the vignette.
sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] AnnotationDbi_1.63.1 IRanges_2.35.2 S4Vectors_0.39.1
#> [4] Biobase_2.61.0 BiocGenerics_0.47.0 HPO.db_0.99.2
#>
#> loaded via a namespace (and not attached):
#> [1] KEGGREST_1.41.0 xfun_0.39
#> [3] bslib_0.5.0 vctrs_0.6.3
#> [5] tools_4.3.1 bitops_1.0-7
#> [7] generics_0.1.3 curl_5.0.1
#> [9] tibble_3.2.1 fansi_1.0.4
#> [11] RSQLite_2.3.1 blob_1.2.4
#> [13] pkgconfig_2.0.3 dbplyr_2.3.2
#> [15] lifecycle_1.0.3 GenomeInfoDbData_1.2.10
#> [17] compiler_4.3.1 Biostrings_2.69.1
#> [19] httpuv_1.6.11 GenomeInfoDb_1.37.2
#> [21] htmltools_0.5.5 sass_0.4.6
#> [23] RCurl_1.98-1.12 yaml_2.3.7
#> [25] interactiveDisplayBase_1.39.0 pillar_1.9.0
#> [27] later_1.3.1 crayon_1.5.2
#> [29] jquerylib_0.1.4 ellipsis_0.3.2
#> [31] cachem_1.0.8 mime_0.12
#> [33] AnnotationHub_3.9.1 tidyselect_1.2.0
#> [35] digest_0.6.32 purrr_1.0.1
#> [37] dplyr_1.1.2 BiocVersion_3.18.0
#> [39] fastmap_1.1.1 cli_3.6.1
#> [41] magrittr_2.0.3 utf8_1.2.3
#> [43] withr_2.5.0 filelock_1.0.2
#> [45] promises_1.2.0.1 rappdirs_0.3.3
#> [47] bit64_4.0.5 rmarkdown_2.22
#> [49] XVector_0.41.1 httr_1.4.6
#> [51] bit_4.0.5 png_0.1-8
#> [53] memoise_2.0.1 shiny_1.7.4
#> [55] evaluate_0.21 knitr_1.43
#> [57] BiocFileCache_2.9.0 rlang_1.1.1
#> [59] Rcpp_1.0.10 xtable_1.8-4
#> [61] glue_1.6.2 DBI_1.1.3
#> [63] BiocManager_1.30.21 jsonlite_1.8.5
#> [65] R6_2.5.1 zlibbioc_1.47.0