EpiTxDb.Hs.hg38 0.99.3
EpiTxDb.Hs.hg38 contains post-transcriptional RNA modifications from RMBase v2.0 (Xuan et al. 2017), tRNAdb (Jühling et al. 2009) and snoRNAdb (Lestrade and Weber 2006) and can be accessed through the functions EpiTxDb.Hs.hg38.tRNAdb(), EpiTxDb.Hs.hg38.snoRNAdb() and EpiTxDb.Hs.hg38.RMBase().
library(EpiTxDb.Hs.hg38)
etdb <- EpiTxDb.Hs.hg38.tRNAdb()
## snapshotDate(): 2020-03-31
## downloading 1 resources
## retrieving 1 resource
## loading from cache
etdb
## EpiTxDb object:
## # Db type: EpiTxDb
## # Supporting package: EpiTxDb
## # Data source: tRNAdb
## # Organism: Homo sapiens
## # Genome: hg38
## # Coordinates: per Transcript
## # Checked against sequence: Yes
## # Nb of modifications: 1843
## # Db created by: EpiTxDb package from Bioconductor
## # Creation time: 2020-02-26 10:35:24 +0100 (Wed, 26 Feb 2020)
## # EpiTxDb version at creation time: 0.99.0
## # RSQLite version at creation time: 2.2.0
## # DBSCHEMAVERSION: 1.0
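The other two resources are loaded the same way. The following is a minimal sketch, assuming that the snoRNAdb and RMBase accessors return EpiTxDb objects like the tRNAdb accessor above and that the resources can be downloaded on first use. The variable names are illustrative only.

```r
library(EpiTxDb.Hs.hg38)

# Load the remaining two resources named in the introduction.
# The variable names are illustrative, not part of the package.
etdb_sno <- EpiTxDb.Hs.hg38.snoRNAdb()
etdb_rmbase <- EpiTxDb.Hs.hg38.RMBase()

# Printing an object shows its metadata (data source, genome, ...)
etdb_sno
etdb_rmbase
```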
Modification information can be accessed through the typical accessor functions for an EpiTxDb object, for example modifications():
modifications(etdb)
## GRanges object with 1843 ranges and 3 metadata columns:
## seqnames ranges strand | mod_id mod_type
## <Rle> <IRanges> <Rle> | <integer> <character>
## [1] tRNA-Ala-AGC-11-1 10 + | 1 m2G
## [2] tRNA-Ala-AGC-15-1 10 + | 2 m2G
## [3] tRNA-Ala-AGC-8-1 10 + | 3 m2G
## [4] tRNA-Ala-AGC-8-2 10 + | 4 m2G
## [5] tRNA-Ala-AGC-11-1 17 + | 5 D
## ... ... ... ... . ... ...
## [1839] ENST00000386347 50 + | 2546 f5Cm
## [1840] ENST00000386347 56 + | 2547 m5U
## [1841] ENST00000386347 57 + | 2548 Y
## [1842] ENST00000386347 60 + | 2549 m1A
## [1843] ENST00000387449 23 + | 2550 t6A
## mod_name
## <character>
## [1] m2G_10_tRNA-Ala-AGC-11-1
## [2] m2G_10_tRNA-Ala-AGC-15-1
## [3] m2G_10_tRNA-Ala-AGC-8-1
## [4] m2G_10_tRNA-Ala-AGC-8-2
## [5] D_17_tRNA-Ala-AGC-11-1
## ... ...
## [1839] f5Cm_50_ENST00000386347
## [1840] m5U_56_ENST00000386347
## [1841] Y_57_ENST00000386347
## [1842] m1A_60_ENST00000386347
## [1843] t6A_23_ENST00000387449
## -------
## seqinfo: 160 sequences from hg38 genome; no seqlengths
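Because modifications() returns a GRanges object, the usual GRanges operations apply to the result. The following sketch uses only the mod_type column and the seqnames shown in the output above; the variable names are illustrative.

```r
library(EpiTxDb.Hs.hg38)
library(GenomicRanges)

etdb <- EpiTxDb.Hs.hg38.tRNAdb()
mods <- modifications(etdb)

# Keep only one modification type, here m2G
m2g <- mods[mods$mod_type == "m2G"]

# Count how often each modification type occurs
sort(table(mods$mod_type), decreasing = TRUE)

# Group the modifications by transcript (one list element per seqname)
mods_by_tx <- split(mods, seqnames(mods))
```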
For a more detailed overview and explanation of the functionality of the EpiTxDb class, have a look at the EpiTxDb package.
The sequence annotation for ribosomal RNA has been revised in recent years. As a result, some modification annotations still refer to an older rRNA annotation.
To help with using and updating older information, a chain file was generated for use with liftOver, which allows conversion of hg19 coordinates to hg38 coordinates and back. The resources can be loaded via chain.rRNA.hg19Tohg38() and chain.rRNA.hg38Tohg19().
cf <- chain.rRNA.hg19Tohg38()
## snapshotDate(): 2020-03-31
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## require("rtracklayer")
cf
## Chain of length 3
## names(3): 5.8S 18S 28S
The following example illustrates a use case in which data from the Modomics database (“MODOMICS: A Database of RNA Modification Pathways. 2017 Update” 2018) is used. The sequence data currently stored is the hg19 version. First, we load the sequences as a ModRNAStringSet.
library(rtracklayer)
library(Modstrings)
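# Locate the Modomics rRNA files shipped with the package (LSU first, then SSU)
files <- c(system.file("extdata","Modomics.LSU.Hs.txt",
                       package = "EpiTxDb.Hs.hg38"),
           system.file("extdata","Modomics.SSU.Hs.txt",
                       package = "EpiTxDb.Hs.hg38"))
seq <- lapply(files,readLines,encoding = "UTF-8")
seq <- unlist(seq)
# Split odd lines (1, 3, 5) from even lines (2, 4, 6);
# the even lines are used as the sequences
names <- seq[seq.int(1L,6L,2L)]
seq <- seq[seq.int(2L,6L,2L)]
# Remove "-" characters and convert the Modomics encoding
seq <- ModRNAStringSet(sanitizeFromModomics(gsub("-","",seq)))
names(seq) <- c("28S","5.8S","18S")
# Extract the modifications as a GRanges object
mod <- separate(seq)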
The positions of the one m7G and the two m6A modifications are off relative to the current rRNA sequences. This is also the case for the other modifications.
mod[mod$mod == "m7G" | mod$mod == "m6A"]
## GRanges object with 3 ranges and 1 metadata column:
## seqnames ranges strand | mod
## <Rle> <IRanges> <Rle> | <character>
## [1] 28S 4180 + | m6A
## [2] 18S 1640 + | m7G
## [3] 18S 1833 + | m6A
## -------
## seqinfo: 3 sequences from an unspecified genome; no seqlengths
Using the chain file, we can liftOver the coordinates, which then match the expected coordinates.
mod_new <- unlist(liftOver(mod,cf))
mod_new[mod_new$mod == "m7G" | mod_new$mod == "m6A"]
## GRanges object with 3 ranges and 1 metadata column:
## seqnames ranges strand | mod
## <Rle> <IRanges> <Rle> | <character>
## [1] 28S 4220 + | m6A
## [2] 18S 1639 + | m7G
## [3] 18S 1832 + | m6A
## -------
## seqinfo: 3 sequences from an unspecified genome; no seqlengths
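The reverse chain can be used in the same way to move hg38 positions back to hg19. The sketch below is self-contained: it rebuilds the three hg38 positions from the output above by hand instead of reusing mod_new, and the variable names are illustrative. If the two chains are exact inverses of each other, which is an assumption here, the result should correspond to the hg19 positions shown earlier.

```r
library(EpiTxDb.Hs.hg38)
library(rtracklayer)
library(GenomicRanges)

# hg38 positions copied from the liftOver output above
mod_hg38 <- GRanges(seqnames = c("28S","18S","18S"),
                    ranges = IRanges(start = c(4220L,1639L,1832L),
                                     width = 1L),
                    strand = "+",
                    mod = c("m6A","m7G","m6A"))

# Load the reverse chain (hg38 -> hg19) and lift the positions back
cf_back <- chain.rRNA.hg38Tohg19()
mod_hg19 <- unlist(liftOver(mod_hg38, cf_back))
mod_hg19
```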
In addition, the ModRNAStringSet object can be updated with the current sequence.
rna <- readDNAStringSet(system.file("extdata","rRNA_hg38.fasta",
                                    package = "EpiTxDb.Hs.hg38"))
seqtype(rna) <- "RNA"
seq_new <- combineIntoModstrings(rna,mod_new)
seq_new
## A ModRNAStringSet instance of length 3
## width seq names
## [1] 5070 CGCGACCUCAGAUCAGACGUGGC...AGCCCUCGACACAAGGGUUUGUC 28S
## [2] 1869 UACCUGGUUGAUCCUGCCAGUAG...GUGζζCCUGCGGAAGGAUCAUUA 18S
## [3] 157 CGACUCUUAGCGGJGGAUCACUC...CUACGCCUGUCUGAGCGUCGCUU 5.8S
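To see what combineIntoModstrings() and separate() do in isolation, the round trip can be tried on a toy sequence. This is a minimal sketch with made-up data: the sequence, its name "toy" and the m1A position are hypothetical and chosen only so that the modified position falls on an A.

```r
library(Modstrings)
library(GenomicRanges)

# Hypothetical 12 nt RNA; position 5 is an A
toy_rna <- RNAStringSet(c(toy = "ACGUACGUACGU"))

# One modification (m1A) at position 5; the seqname must match the sequence name
toy_gr <- GRanges(seqnames = "toy",
                  ranges = IRanges(start = 5L, width = 1L),
                  strand = "+",
                  mod = "m1A")

# Write the modification into the sequence ...
toy_mod <- combineIntoModstrings(toy_rna, toy_gr)
toy_mod

# ... and read it back out as a GRanges object
separate(toy_mod)
```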
sessionInfo()
## R Under development (unstable) (2020-04-01 r78128)
## Platform: x86_64-apple-darwin17.7.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
##
## Matrix products: default
## BLAS: /Users/ka36530_ca/R-stuff/bin/R-devel/lib/libRblas.dylib
## LAPACK: /Users/ka36530_ca/R-stuff/bin/R-devel/lib/libRlapack.dylib
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] rtracklayer_1.47.0 GenomicRanges_1.39.3 GenomeInfoDb_1.23.16
## [4] EpiTxDb.Hs.hg38_0.99.3 EpiTxDb_0.99.5 Modstrings_1.3.12
## [7] Biostrings_2.55.7 XVector_0.27.2 AnnotationDbi_1.49.1
## [10] IRanges_2.21.8 S4Vectors_0.25.14 Biobase_2.47.3
## [13] AnnotationHub_2.19.9 BiocFileCache_1.11.4 dbplyr_1.4.2
## [16] BiocGenerics_0.33.3 BiocStyle_2.15.6
##
## loaded via a namespace (and not attached):
## [1] matrixStats_0.56.0 bitops_1.0-6
## [3] assertive.models_0.0-2 bit64_0.9-7
## [5] progress_1.2.2 httr_1.4.1
## [7] assertive.datetimes_0.0-2 tools_4.1.0
## [9] R6_2.4.1 colorspace_1.4-1
## [11] DBI_1.1.0 assertive.data_0.0-3
## [13] assertive.reflection_0.0-4 tidyselect_1.0.0
## [15] prettyunits_1.1.1 bit_1.1-15.2
## [17] curl_4.3 compiler_4.1.0
## [19] cli_2.0.2 assertive.properties_0.0-4
## [21] xml2_1.3.0 DelayedArray_0.13.8
## [23] assertive.files_0.0-2 bookdown_0.18
## [25] scales_1.1.0 tRNA_1.5.2
## [27] askpass_1.1 rappdirs_0.3.1
## [29] stringr_1.4.0 digest_0.6.25
## [31] Rsamtools_2.3.7 rmarkdown_2.1
## [33] assertive.numbers_0.0-2 pkgconfig_2.0.3
## [35] htmltools_0.4.0 fastmap_1.0.1
## [37] rlang_0.4.5 RSQLite_2.2.0
## [39] assertive_0.3-5 shiny_1.4.0.2
## [41] BiocParallel_1.21.2 dplyr_0.8.5
## [43] RCurl_1.98-1.1 magrittr_1.5
## [45] GenomeInfoDbData_1.2.2 Matrix_1.2-18
## [47] munsell_0.5.0 Rcpp_1.0.4
## [49] fansi_0.4.1 lifecycle_0.2.0
## [51] stringi_1.4.6 assertive.base_0.0-7
## [53] yaml_2.2.1 SummarizedExperiment_1.17.5
## [55] zlibbioc_1.33.1 grid_4.1.0
## [57] blob_1.2.1 promises_1.1.0
## [59] crayon_1.3.4 lattice_0.20-41
## [61] tRNAdbImport_1.5.6 assertive.code_0.0-3
## [63] GenomicFeatures_1.39.7 hms_0.5.3
## [65] knitr_1.28 pillar_1.4.3
## [67] assertive.sets_0.0-3 codetools_0.2-16
## [69] biomaRt_2.43.4 XML_3.99-0.3
## [71] glue_1.3.2 BiocVersion_3.11.1
## [73] Structstrings_1.3.5 evaluate_0.14
## [75] BiocManager_1.30.10 vctrs_0.2.4
## [77] httpuv_1.5.2 gtable_0.3.0
## [79] openssl_1.4.1 purrr_0.3.3
## [81] assertive.strings_0.0-3 assertthat_0.2.1
## [83] ggplot2_3.3.0 xfun_0.12
## [85] mime_0.9 xtable_1.8-4
## [87] assertive.types_0.0-3 later_1.0.0
## [89] assertive.data.uk_0.0-2 tibble_3.0.0
## [91] GenomicAlignments_1.23.2 memoise_1.1.0
## [93] assertive.matrices_0.0-2 ellipsis_0.3.0
## [95] assertive.data.us_0.0-2 interactiveDisplayBase_1.25.0
Jühling, Frank, Mario Mörl, Roland K. Hartmann, Mathias Sprinzl, Peter F. Stadler, and Joern Pütz. 2009. “tRNAdb 2009: Compilation of tRNA Sequences and tRNA Genes.” Nucleic Acids Research 37: D159–D162. https://doi.org/10.1093/nar/gkn772.
Lestrade, Laurent, and Michel J. Weber. 2006. “snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs.” Nucleic Acids Research 34 (January): D158–D162. https://doi.org/10.1093/nar/gkj002.
“MODOMICS: A Database of RNA Modification Pathways. 2017 Update.” 2018. Nucleic Acids Research 46 (D1): D303–D307. https://doi.org/10.1093/nar/gkx1030.
Xuan, Jia-Jia, Wen-Ju Sun, Peng-Hui Lin, Ke-Ren Zhou, Shun Liu, Ling-Ling Zheng, Liang-Hu Qu, and Jian-Hua Yang. 2017. “RMBase v2.0: deciphering the map of RNA modifications from epitranscriptome sequencing data.” Nucleic Acids Research 46 (D1): D327–D334. https://doi.org/10.1093/nar/gkx934.