0.1 Introduction

The raerdata package contains datasets and databases used to illustrate how the raer package characterizes RNA editing. It includes databases of known human and mouse RNA editing sites. The datasets have been preprocessed into smaller examples suitable for quick exploration and for demonstrating the raer package.

0.2 Installation

if (!require("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# The following initializes usage of Bioc devel
BiocManager::install(version = "devel")

BiocManager::install("raerdata")
library(raerdata)

0.3 RNA editing atlases

Atlases of known human and mouse A-to-I RNA editing sites are provided as GRanges objects.

0.3.1 REDIportal

REDIportal is a collection of RNA editing sites identified from multiple studies in multiple species (Picardi et al. 2017). The human (hg38) and mouse (mm10) collections are provided as GRanges objects, either in a coordinate-only format or with additional metadata.

rediportal_coords_hg38()
## GRanges object with 15638648 ranges and 0 metadata columns:
##              seqnames    ranges strand
##                 <Rle> <IRanges>  <Rle>
##          [1]     chr1     87158      -
##          [2]     chr1     87168      -
##          [3]     chr1     87171      -
##          [4]     chr1     87189      -
##          [5]     chr1     87218      -
##          ...      ...       ...    ...
##   [15638644]     chrY  56885715      +
##   [15638645]     chrY  56885716      +
##   [15638646]     chrY  56885728      +
##   [15638647]     chrY  56885841      +
##   [15638648]     chrY  56885850      +
##   -------
##   seqinfo: 44 sequences from hg38 genome; no seqlengths
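
Because the atlas is an ordinary GRanges object, it can be restricted to a region of interest with the usual GenomicRanges tools. The following is a minimal sketch, not part of raerdata: to avoid downloading the full atlas it builds a toy GRanges from the first five sites printed above, and the region boundaries are hypothetical values chosen only for illustration.

```r
library(GenomicRanges)

# Toy stand-in for rediportal_coords_hg38(): the first five sites printed above
sites <- GRanges(
    seqnames = "chr1",
    ranges = IRanges(start = c(87158, 87168, 87171, 87189, 87218), width = 1),
    strand = "-"
)

# Hypothetical region of interest (assumption, for illustration only)
region <- GRanges("chr1", IRanges(start = 87160, end = 87200))

# Keep the sites that fall inside the region, regardless of strand;
# this retains the sites at 87168, 87171 and 87189
sites_in_region <- subsetByOverlaps(sites, region, ignore.strand = TRUE)
start(sites_in_region)
```

The same call should work unchanged when `sites` is replaced by the object returned from `rediportal_coords_hg38()`.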

0.3.2 CDS recoding sites

Human CDS recoding RNA editing sites identified by Gabay et al. (2022) were formatted into GRanges objects. These sites were also lifted over to the mouse genome (mm10).

cds_sites <- gabay_sites_hg38()
cds_sites[1:4, 1:4]
## GRanges object with 4 ranges and 4 metadata columns:
##       seqnames    ranges strand |    GeneName
##          <Rle> <IRanges>  <Rle> | <character>
##   [1]     chr1    999279      - |        HES4
##   [2]     chr1   1014084      + |       ISG15
##   [3]     chr1   1281229      + |      SCNN1D
##   [4]     chr1   1281248      + |      SCNN1D
##       RefseqAccession_1,ExonNum_1,NucleotideSubstitution_1,AminoAcidSubstitution_1;…;RefseqAccession_N,ExonNum_N,NucleotideSubstitution_N,AminoAcidSubstitution_N
##                                                                                                                                                       <character>
##   [1]                                                                                                                                      NM_001142467.1,exon3..
##   [2]                                                                                                                                      NM_005101.3,exon2,c...
##   [3]                                                                                                                                      NM_001130413.3,exon2..
##   [4]                                                                                                                                      NM_001130413.3,exon2..
##          Syn/NonSyn Diversifying/Restorative/Syn
##         <character>                  <character>
##   [1] nonsynonymous                           NA
##   [2] nonsynonymous                           NA
##   [3]    synonymous                           NA
##   [4] nonsynonymous                           NA
##   -------
##   seqinfo: 23 sequences from hg38 genome; no seqlengths
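
Several metadata column names in this object, such as `Syn/NonSyn`, are not syntactic R names, so they are easiest to reach with `[[ ]]` on `mcols()`. The sketch below illustrates one way to keep only nonsynonymous sites. It is an illustration under assumptions: the toy object is rebuilt by hand from the four rows printed above rather than downloaded, and only two of the metadata columns are reproduced.

```r
library(GenomicRanges)

# Toy stand-in for gabay_sites_hg38(), copied from the four rows printed above
cds_sites <- GRanges(
    seqnames = "chr1",
    ranges = IRanges(start = c(999279, 1014084, 1281229, 1281248), width = 1),
    strand = c("-", "+", "+", "+"),
    GeneName = c("HES4", "ISG15", "SCNN1D", "SCNN1D")
)

# "Syn/NonSyn" contains a slash, so it is assigned and read with [[ ]]
mcols(cds_sites)[["Syn/NonSyn"]] <- c(
    "nonsynonymous", "nonsynonymous", "synonymous", "nonsynonymous"
)

# Logical filter: TRUE for sites annotated as nonsynonymous
is_recoding <- mcols(cds_sites)[["Syn/NonSyn"]] == "nonsynonymous"
recoding_sites <- cds_sites[is_recoding]

# Count the retained sites per gene
table(recoding_sites$GeneName)
```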

0.4 Datasets

0.4.1 Whole genome and RNA sequencing data from NA12878 cell line

This dataset contains WGS and RNA-seq BAM files, with associated reference files, restricted to a subset of chromosome 4. NA12878() returns a list containing the file paths and related data objects.

NA12878()
## $bams
## BamFileList of length 2
## names(2): NA12878_RNASEQ NA12878_WGS
## 
## $fasta
## [1] "/home/biocbuild/.cache/R/ExperimentHub/2148b22a9928a2_8469"
## 
## $snps
## GRanges object with 380175 ranges and 2 metadata columns:
##            seqnames    ranges strand |         name     score
##               <Rle> <IRanges>  <Rle> |  <character> <numeric>
##        [1]     chr4     10001      * | rs1581341342         0
##        [2]     chr4     10002      * | rs1581341346         0
##        [3]     chr4     10004      * | rs1581341351         0
##        [4]     chr4     10005      * | rs1581341354         0
##        [5]     chr4     10006      * | rs1209159313         0
##        ...      ...       ...    ... .          ...       ...
##   [380171]     chr4    999987      * | rs1577536513         0
##   [380172]     chr4    999989      * |  rs948695434         0
##   [380173]     chr4    999991      * | rs1044698628         0
##   [380174]     chr4    999996      * | rs1361920394         0
##   [380175]     chr4    999997      * |   rs59206677         0
##   -------
##   seqinfo: 711 sequences (1 circular) from hg38 genome
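
One plausible use of the `$snps` element is to discard candidate editing sites that coincide with known SNPs. The text above does not prescribe this workflow, so treat the following as a sketch under assumptions: the SNP positions are copied from the first five rows printed above, and the candidate sites are hypothetical positions invented for illustration.

```r
library(GenomicRanges)

# Toy stand-in for NA12878()$snps: the first five SNPs printed above
snps <- GRanges(
    seqnames = "chr4",
    ranges = IRanges(start = c(10001, 10002, 10004, 10005, 10006), width = 1),
    strand = "*"
)

# Hypothetical candidate editing sites (assumption, not real data)
candidates <- GRanges(
    seqnames = "chr4",
    ranges = IRanges(start = c(10002, 10003, 10006, 10010), width = 1),
    strand = "+"
)

# invert = TRUE keeps the candidates that do NOT overlap any SNP;
# here that leaves the sites at 10003 and 10010
candidates_no_snp <- subsetByOverlaps(candidates, snps, invert = TRUE)
start(candidates_no_snp)
```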

0.4.2 GSE99249: RNA-Seq of Interferon beta treatment of ADAR1KO cell line

This dataset contains RNA-seq BAM files from ADAR1KO and wild-type HEK293 cells, with associated reference files, restricted to chromosome 18 (Chung et al. 2018).

GSE99249()
## $bams
## BamFileList of length 6
## names(6): SRR5564260 SRR5564261 SRR5564269 SRR5564270 SRR5564271 SRR5564277
## 
## $fasta
## [1] "/home/biocbuild/.cache/R/ExperimentHub/2148b24ed4f154_8310"
## 
## $sites
## GRanges object with 15638648 ranges and 0 metadata columns:
##              seqnames    ranges strand
##                 <Rle> <IRanges>  <Rle>
##          [1]     chr1     87158      -
##          [2]     chr1     87168      -
##          [3]     chr1     87171      -
##          [4]     chr1     87189      -
##          [5]     chr1     87218      -
##          ...      ...       ...    ...
##   [15638644]     chrY  56885715      +
##   [15638645]     chrY  56885716      +
##   [15638646]     chrY  56885728      +
##   [15638647]     chrY  56885841      +
##   [15638648]     chrY  56885850      +
##   -------
##   seqinfo: 44 sequences from hg38 genome; no seqlengths

0.4.3 10x Genomics 10k PBMC scRNA-seq

This dataset contains a 10x Genomics BAM file and RNA editing sites from chromosome 16 of a human PBMC scRNA-seq library. It also includes a SingleCellExperiment object containing gene expression values, cluster annotations, cell-type annotations, and a UMAP projection.

pbmc_10x()
## $bam
## class: BamFile 
## path: /home/biocbuild/.cache/R/ExperimentHub/2148b26f471a5d_8311
## index: /home/biocbuild/.cache/R/ExperimentHub/2148b251cf2365_8312
## isOpen: FALSE 
## yieldSize: NA 
## obeyQname: FALSE 
## asMates: FALSE 
## qnamePrefixEnd: NA 
## qnameSuffixStart: NA 
## 
## $sites
## GRanges object with 15638648 ranges and 0 metadata columns:
##              seqnames    ranges strand
##                 <Rle> <IRanges>  <Rle>
##          [1]     chr1     87158      -
##          [2]     chr1     87168      -
##          [3]     chr1     87171      -
##          [4]     chr1     87189      -
##          [5]     chr1     87218      -
##          ...      ...       ...    ...
##   [15638644]     chrY  56885715      +
##   [15638645]     chrY  56885716      +
##   [15638646]     chrY  56885728      +
##   [15638647]     chrY  56885841      +
##   [15638648]     chrY  56885850      +
##   -------
##   seqinfo: 44 sequences from hg38 genome; no seqlengths
## 
## $sce
## class: SingleCellExperiment 
## dim: 36601 500 
## metadata(2): Samples mkrs
## assays(2): counts logcounts
## rownames(36601): MIR1302-2HG FAM138A ... AC007325.4 AC007325.2
## rowData names(3): ID Symbol Type
## colnames(500): TGTTTGTCAGTTAGGG-1 ATCTCTACAAGCTACT-1 ...
##   GGGCGTTTCAGGACGA-1 CTATAGGAGATTGTGA-1
## colData names(8): Sample Barcode ... r celltype
## reducedDimNames(2): PCA UMAP
## mainExpName: NULL
## altExpNames(0):

0.5 ExperimentHub access

Alternatively, individual files can be accessed directly from ExperimentHub.

library(ExperimentHub)
eh <- ExperimentHub()
raerdata_files <- query(eh, "raerdata")
data.frame(
    id = raerdata_files$ah_id,
    title = raerdata_files$title,
    description = raerdata_files$description
)
id      title
EH8233  Rediportal RNA editing sites, all data, (mm10)
EH8234  Rediportal RNA editing sites, coordinates only, (mm10)
EH8235  Rediportal RNA editing sites, all data, (hg38)
EH8236  Rediportal RNA editing sites, coordinates only, (hg38)
EH8237  Mouse adenosine-to-inosine RNA recoding sites
EH8238  Human adenosine-to-inosine RNA recoding sites
EH8239  SRR5564260_chr18.bam
EH8240  SRR5564260_chr18.bam.bai
EH8241  SRR5564261_chr18.bam
EH8242  SRR5564261_chr18.bam.bai
EH8243  SRR5564269_chr18.bam
EH8244  SRR5564269_chr18.bam.bai
EH8245  SRR5564270_chr18.bam
EH8246  SRR5564270_chr18.bam.bai
EH8247  SRR5564271_chr18.bam
EH8248  SRR5564271_chr18.bam.bai
EH8249  SRR5564277_chr18.bam
EH8250  SRR5564277_chr18.bam.bai
EH8251  Genome sequence of chr18 (hg38)
EH8252  10k PBMC scRNA-seq library BAM
EH8253  10k PBMC scRNA-seq library BAM index
EH8254  10k PBMC scRNA-seq library SingleCellExperiment
EH8255  Rediportal RNA editing sites, chr16, coordinates only, (hg38)
EH8405  WGS of NA12878 cell line, first megabase of chr4, (BAM)
EH8406  WGS of NA12878 cell line, first megabase of chr4, (BAI)
EH8407  RNA-seq of NA12878 cell line, first megabase of chr4, (BAM)
EH8408  RNA-seq of NA12878 cell line, first megabase of chr4, (BAI)
EH8409  SNPs from chr4 in dbSNP155
EH8410  Genome sequence of first megabase of chr4 (hg38)
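
A single record can then be retrieved by its ID with the `[[` operator on the hub object. The sketch below wraps that call in a small helper. `fetch_raerdata_record()` is a hypothetical name introduced here for illustration and is not part of raerdata or ExperimentHub. The example call is left commented out because the first retrieval needs network access to download and cache the resource.

```r
# Hypothetical helper (not part of raerdata): fetch one record by its
# ExperimentHub ID. The first call downloads the resource and caches it.
fetch_raerdata_record <- function(id) {
    # Validate the input before contacting the hub
    stopifnot(is.character(id), length(id) == 1L)
    eh <- ExperimentHub::ExperimentHub()
    eh[[id]]
}

# Example call, using an ID from the listing above
# (requires network access on first use):
# sites_hg38 <- fetch_raerdata_record("EH8236")
```

Wrapping the lookup in a function keeps the input check in one place. Calling `eh[["EH8236"]]` directly on an existing hub object is equivalent.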

Session info

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] ExperimentHub_2.14.0              AnnotationHub_3.14.0             
##  [3] BiocFileCache_2.14.0              dbplyr_2.5.0                     
##  [5] SingleCellExperiment_1.28.0       SummarizedExperiment_1.36.0      
##  [7] Biobase_2.66.0                    MatrixGenerics_1.18.0            
##  [9] matrixStats_1.4.1                 Rsamtools_2.22.0                 
## [11] BSgenome.Hsapiens.UCSC.hg38_1.4.5 BSgenome_1.74.0                  
## [13] BiocIO_1.16.0                     Biostrings_2.74.0                
## [15] XVector_0.46.0                    rtracklayer_1.66.0               
## [17] GenomicRanges_1.58.0              GenomeInfoDb_1.42.0              
## [19] IRanges_2.40.0                    S4Vectors_0.44.0                 
## [21] BiocGenerics_0.52.0               raerdata_1.4.0                   
## [23] BiocStyle_2.34.0                 
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1         dplyr_1.1.4              blob_1.2.4              
##  [4] filelock_1.0.3           bitops_1.0-9             fastmap_1.2.0           
##  [7] RCurl_1.98-1.16          GenomicAlignments_1.42.0 XML_3.99-0.17           
## [10] digest_0.6.37            mime_0.12                lifecycle_1.0.4         
## [13] KEGGREST_1.46.0          RSQLite_2.3.7            magrittr_2.0.3          
## [16] compiler_4.4.1           rlang_1.1.4              sass_0.4.9              
## [19] tools_4.4.1              utf8_1.2.4               yaml_2.3.10             
## [22] knitr_1.48               S4Arrays_1.6.0           bit_4.5.0               
## [25] curl_5.2.3               DelayedArray_0.32.0      abind_1.4-8             
## [28] BiocParallel_1.40.0      withr_3.0.2              purrr_1.0.2             
## [31] grid_4.4.1               fansi_1.0.6              cli_3.6.3               
## [34] rmarkdown_2.28           crayon_1.5.3             generics_0.1.3          
## [37] httr_1.4.7               rjson_0.2.23             DBI_1.2.3               
## [40] cachem_1.1.0             zlibbioc_1.52.0          parallel_4.4.1          
## [43] AnnotationDbi_1.68.0     BiocManager_1.30.25      restfulr_0.0.15         
## [46] vctrs_0.6.5              Matrix_1.7-1             jsonlite_1.8.9          
## [49] bookdown_0.41            bit64_4.5.2              jquerylib_0.1.4         
## [52] glue_1.8.0               codetools_0.2-20         BiocVersion_3.20.0      
## [55] UCSC.utils_1.2.0         tibble_3.2.1             pillar_1.9.0            
## [58] rappdirs_0.3.3           htmltools_0.5.8.1        GenomeInfoDbData_1.2.13 
## [61] R6_2.5.1                 evaluate_1.0.1           lattice_0.22-6          
## [64] png_0.1-8                memoise_2.0.1            bslib_0.8.0             
## [67] SparseArray_1.6.0        xfun_0.48                pkgconfig_2.0.3

Chung, Hachung, Jorg J A Calis, Xianfang Wu, Tony Sun, Yingpu Yu, Stephanie L Sarbanes, Viet Loan Dao Thi, et al. 2018. “Human ADAR1 Prevents Endogenous RNA from Triggering Translational Shutdown.” Cell 172 (4): 811–824.e14. https://doi.org/10.1016/j.cell.2017.12.038.

Gabay, Orshay, Yoav Shoshan, Eli Kopel, Udi Ben-Zvi, Tomer D Mann, Noam Bressler, Roni Cohen-Fultheim, et al. 2022. “Landscape of Adenosine-to-Inosine RNA Recoding Across Human Tissues.” Nat. Commun. 13 (1): 1184. https://doi.org/10.1038/s41467-022-28841-4.

Picardi, Ernesto, Anna Maria D’Erchia, Claudio Lo Giudice, and Graziano Pesole. 2017. “REDIportal: A Comprehensive Database of A-to-I RNA Editing Events in Humans.” Nucleic Acids Res. 45 (D1): D750–D757. https://doi.org/10.1093/nar/gkw767.