Contents

1 Available CAGE data resources

Note: this still uses the CAGEset class.

There are several large collections of CAGE data available that provide single base-pair resolution TSSs for numerous human and mouse primary cells, cell lines and tissues. Together with several minor datasets for other model organisms (Drosophila melanogaster, Danio rerio) they are a valuable resource that provides cell/tissue type and developmental stage specific TSSs essential for any type of promoter centred analysis. By enabling direct and user-friendly import of TSSs for selected samples into R, CAGEr facilitates the integration of these precise TSS data with other genomic data types. Each of the available CAGE data resources accessible from within CAGEr is explained in more detail further below.

1.1 FANTOM5

FANTOM consortium provides single base-pair resolution TSS data for numerous human and mouse primary cells, cell lines and tissues. The main FANTOM5 publication (Consortium 2014) released ~1000 human and ~400 mouse CAGE samples that cover the vast majority of human primary cell types, mouse developmental tissues and a large number of commonly used cell lines. These data is available from FANTOM web resource http://fantom.gsc.riken.jp/5/data/ in the form of TSS files with genomic coordinates and number of tags mapping to each TSS detected by CAGE. The list of all available samples for both human and mouse (as presented in the Supplementary Table 1 of the publication) has been included in CAGEr to facilitate browsing, searching and selecting samples of interest. TSSs for selected samples are then fetched directly form the web resource and imported into a CAGEset object enabling their further manipulation with CAGEr.

1.2 FANTOM3 and 4

Previous FANTOM projects (3 and 4) (Consortium 2005, @Faulkner:2009fw, @Suzuki:2009gy)) produced CAGE datasets for multiple human and mouse tissues as well as several timecourses, including differentiation of a THP-1 human myeloid leukemia cell line. All this TSS data has been grouped into datasets by the organism and tissue of origin and has been collected into an R data package named FANTOM3and4CAGE, which is available from Bioconductor https://bioconductor.org/packages/FANTOM3and4CAGE. The vignette accompanying the package provides information on available datasets and lists of samples. When the data package is installed, CAGEr can import the TSSs for selected samples directly into a CAGEset object for further manipulation.

1.3 ENCODE cell lines

ENCODE consortium produced CAGE data for common human cell lines (Djebali et al. 2012), which were used by ENCODE for various other types of genome-wide analyses. The advantage of this dataset is that it enables the integration of precise TSSs from a specific cell line with many other genome-wide data types provided by ENCODE for the same cell line. However, the format of CAGE data provided by ENCODE at UCSC (http://genome.ucsc.edu/ENCODE/dataMatrix/encodeDataMatrixHuman.html) includes only raw mapped CAGE tags and their coverage along the genome, and coordinates of enriched genomic regions (peaks), which do not take advantage of the single base-pair resolution provided by CAGE. To address this, we have used the raw CAGE tags to derive single base-pair resolution TSSs and collected them into an R data package named ENCODEprojectCAGE. This data package is available for download from CAGEr web site at http://promshift.genereg.net/CAGEr and includes TSSs for 36 different cell lines fractionated by cellular compartment. The vignette accompanying the package provides information on available datasets and lists of individual samples. Once the package has been downloaded and installed, CAGEr can access it to import TSS data for selected subset of samples for further manipulation and integration.

1.4 Zebrafish developmental timecourse

Precise TSSs are also available for zebrafish (Danio Rerio) from CAGE data published by (Nepal et al. 2013). The timecourse covering early embryonic development of zebrafish includes 12 developmental stages. The TSS data has been collected into an R data package named ZebrafishDevelopmentalCAGE, which is available for download from CAGEr web site at http://promshift.genereg.net/CAGEr. As with other data packages mentioned above, once the package is installed CAGEr can use it to import stage-specific single base pair TSSs into a CAGEset object.

2 Importing public TSS data for manipulation in CAGEr

Note: this still uses the CAGEset class.

The data from above mentioned resources can be imported into a CAGEset object using the importPublicData() function. It function has four arguments: source, dataset, group and sample. Argument source accepts one of the following values: "FANTOM5", "FANTOM3and4", "ENCODE", or "ZebrafishDevelopment", which refer to one of the four resources listed above. The following sections explain how to utilize this function for each of the four currently supported resources.

2.1 FANTOM5 human and mouse samples

Lists of all human and mouse CAGE samples produced within FANTOM5 project are available in CAGEr. To load the information on human samples type:

library(CAGEr)
data(FANTOM5humanSamples)
head(FANTOM5humanSamples)
##                                                     sample      type
## 143      acantholytic_squamous_carcinoma_cell_line_HCC1806 cell line
## 46   acute_lymphoblastic_leukemia__B-ALL__cell_line_BALL-1 cell line
## 75   acute_lymphoblastic_leukemia__B-ALL__cell_line_NALM-6 cell line
## 24  acute_lymphoblastic_leukemia__T-ALL__cell_line_HPB-ALL cell line
## 48   acute_lymphoblastic_leukemia__T-ALL__cell_line_Jurkat cell line
## 250     acute_myeloid_leukemia__FAB_M0__cell_line_Kasumi-3 cell line
##                                                description library_id
## 143      acantholytic squamous carcinoma cell line:HCC1806  CNhs11844
## 46   acute lymphoblastic leukemia (B-ALL) cell line:BALL-1  CNhs11251
## 75   acute lymphoblastic leukemia (B-ALL) cell line:NALM-6  CNhs11282
## 24  acute lymphoblastic leukemia (T-ALL) cell line:HPB-ALL  CNhs10746
## 48   acute lymphoblastic leukemia (T-ALL) cell line:Jurkat  CNhs11253
## 250     acute myeloid leukemia (FAB M0) cell line:Kasumi-3  CNhs13241
##                                                                                                                                                                                                      data_url
## 143                    http://fantom.gsc.riken.jp/5/datafiles/latest/basic/human.cell_line.hCAGE/acantholytic%2520squamous%2520carcinoma%2520cell%2520line%253aHCC1806.CNhs11844.10717-109I6.hg19.ctss.bed.gz
## 46     http://fantom.gsc.riken.jp/5/datafiles/latest/basic/human.cell_line.hCAGE/acute%2520lymphoblastic%2520leukemia%2520%2528B-ALL%2529%2520cell%2520line%253aBALL-1.CNhs11251.10455-106G5.hg19.ctss.bed.gz
## 75     http://fantom.gsc.riken.jp/5/datafiles/latest/basic/human.cell_line.hCAGE/acute%2520lymphoblastic%2520leukemia%2520%2528B-ALL%2529%2520cell%2520line%253aNALM-6.CNhs11282.10534-107G3.hg19.ctss.bed.gz
## 24    http://fantom.gsc.riken.jp/5/datafiles/latest/basic/human.cell_line.hCAGE/acute%2520lymphoblastic%2520leukemia%2520%2528T-ALL%2529%2520cell%2520line%253aHPB-ALL.CNhs10746.10429-106D6.hg19.ctss.bed.gz
## 48     http://fantom.gsc.riken.jp/5/datafiles/latest/basic/human.cell_line.hCAGE/acute%2520lymphoblastic%2520leukemia%2520%2528T-ALL%2529%2520cell%2520line%253aJurkat.CNhs11253.10464-106H5.hg19.ctss.bed.gz
## 250 http://fantom.gsc.riken.jp/5/datafiles/latest/basic/human.cell_line.LQhCAGE/acute%2520myeloid%2520leukemia%2520%2528FAB%2520M0%2529%2520cell%2520line%253aKasumi-3.CNhs13241.10789-110H6.hg19.ctss.bed.gz
nrow(FANTOM5humanSamples)
## [1] 988

There are 988 human samples in total and for each the following information is provided:

  • sample: a unique name/label of the sample which should be provided to importPublicData() function to retrieve given sample

  • type: type of sample, which can be “cell line”, “primary cell” or “tissue”

  • description: short description of the sample as provided in FANTOM5 main publication (Consortium 2014)

  • library_id: unique ID of the CAGE library within FANTOM5

  • data_url: URL to corresponding CTSS file at FANTOM5 web resource from which the data is fetched

Provided information facilitates searching for samples of interest, e.g. we can search for astrocyte samples:

astrocyteSamples <- FANTOM5humanSamples[grep("Astrocyte", 
                FANTOM5humanSamples[,"description"]),]
astrocyteSamples
##                                  sample         type
## 343      Astrocyte_-_cerebellum__donor1 primary cell
## 537      Astrocyte_-_cerebellum__donor2 primary cell
## 562      Astrocyte_-_cerebellum__donor3 primary cell
## 286 Astrocyte_-_cerebral_cortex__donor1 primary cell
## 429 Astrocyte_-_cerebral_cortex__donor2 primary cell
## 472 Astrocyte_-_cerebral_cortex__donor3 primary cell
##                             description library_id
## 343      Astrocyte - cerebellum, donor1  CNhs11321
## 537      Astrocyte - cerebellum, donor2  CNhs12081
## 562      Astrocyte - cerebellum, donor3  CNhs12117
## 286 Astrocyte - cerebral cortex, donor1  CNhs10864
## 429 Astrocyte - cerebral cortex, donor2  CNhs11960
## 472 Astrocyte - cerebral cortex, donor3  CNhs12005
##                                                                                                                                                                        data_url
## 343          http://fantom.gsc.riken.jp/5/datafiles/latest/basic/human.primary_cell.hCAGE/Astrocyte%2520-%2520cerebellum%252c%2520donor1.CNhs11321.11500-119F6.hg19.ctss.bed.gz
## 537          http://fantom.gsc.riken.jp/5/datafiles/latest/basic/human.primary_cell.hCAGE/Astrocyte%2520-%2520cerebellum%252c%2520donor2.CNhs12081.11580-120F5.hg19.ctss.bed.gz
## 562          http://fantom.gsc.riken.jp/5/datafiles/latest/basic/human.primary_cell.hCAGE/Astrocyte%2520-%2520cerebellum%252c%2520donor3.CNhs12117.11661-122F5.hg19.ctss.bed.gz
## 286 http://fantom.gsc.riken.jp/5/datafiles/latest/basic/human.primary_cell.hCAGE/Astrocyte%2520-%2520cerebral%2520cortex%252c%2520donor1.CNhs10864.11235-116D2.hg19.ctss.bed.gz
## 429 http://fantom.gsc.riken.jp/5/datafiles/latest/basic/human.primary_cell.hCAGE/Astrocyte%2520-%2520cerebral%2520cortex%252c%2520donor2.CNhs11960.11316-117D2.hg19.ctss.bed.gz
## 472 http://fantom.gsc.riken.jp/5/datafiles/latest/basic/human.primary_cell.hCAGE/Astrocyte%2520-%2520cerebral%2520cortex%252c%2520donor3.CNhs12005.11392-118C6.hg19.ctss.bed.gz
data(FANTOM5mouseSamples)
head(FANTOM5mouseSamples)
##                                                                           sample
## 144 J2E_erythroblastic_leukemia_response_to_erythropoietin__00hr00min__biol_rep1
## 145 J2E_erythroblastic_leukemia_response_to_erythropoietin__00hr00min__biol_rep2
## 146 J2E_erythroblastic_leukemia_response_to_erythropoietin__00hr00min__biol_rep3
## 91                           Atoh1+_Inner_ear_hair_cells_-_organ_of_corti__pool1
## 111       CD326+_enterocyte_isolated_from_mice__treated_with_RANKL__day03__pool1
## 109      CD326++_enterocyte_isolated_from_mice__treated_with_RANKL__day03__pool1
##             type
## 144    cell line
## 145    cell line
## 146    cell line
## 91  primary cell
## 111 primary cell
## 109 primary cell
##                                                                      description
## 144 J2E erythroblastic leukemia response to erythropoietin, 00hr00min, biol_rep1
## 145 J2E erythroblastic leukemia response to erythropoietin, 00hr00min, biol_rep2
## 146 J2E erythroblastic leukemia response to erythropoietin, 00hr00min, biol_rep3
## 91                           Atoh1+ Inner ear hair cells - organ of corti, pool1
## 111       CD326+ enterocyte isolated from mice, treated with RANKL, day03, pool1
## 109      CD326++ enterocyte isolated from mice, treated with RANKL, day03, pool1
##     library_id
## 144  CNhs12449
## 145  CNhs12668
## 146  CNhs12770
## 91   CNhs12533
## 111  CNhs13242
## 109  CNhs13236
##                                                                                                                                                                                                                                                 data_url
## 144                    http://fantom.gsc.riken.jp/5/datafiles/latest/basic/mouse.timecourse.hCAGE/J2E%2520erythroblastic%2520leukemia%2520response%2520to%2520erythropoietin%252c%252000hr00min%252c%2520biol_rep1.CNhs12449.13063-139I3.mm9.ctss.bed.gz
## 145                    http://fantom.gsc.riken.jp/5/datafiles/latest/basic/mouse.timecourse.hCAGE/J2E%2520erythroblastic%2520leukemia%2520response%2520to%2520erythropoietin%252c%252000hr00min%252c%2520biol_rep2.CNhs12668.13129-140G6.mm9.ctss.bed.gz
## 146                    http://fantom.gsc.riken.jp/5/datafiles/latest/basic/mouse.timecourse.hCAGE/J2E%2520erythroblastic%2520leukemia%2520response%2520to%2520erythropoietin%252c%252000hr00min%252c%2520biol_rep3.CNhs12770.13195-141E9.mm9.ctss.bed.gz
## 91                                  http://fantom.gsc.riken.jp/5/datafiles/latest/basic/mouse.primary_cell.LQhCAGE/Atoh1%252b%2520Inner%2520ear%2520hair%2520cells%2520-%2520organ%2520of%2520corti%252c%2520pool1.CNhs12533.12158-128G7.mm9.ctss.bed.gz
## 111      http://fantom.gsc.riken.jp/5/datafiles/latest/basic/mouse.primary_cell.LQhCAGE/CD326%252b%2520enterocyte%2520isolated%2520from%2520mice%252c%2520treated%2520with%2520RANKL%252c%2520day03%252c%2520pool1.CNhs13242.11850-124I5.mm9.ctss.bed.gz
## 109 http://fantom.gsc.riken.jp/5/datafiles/latest/basic/mouse.primary_cell.LQhCAGE/CD326%252b%252b%2520enterocyte%2520isolated%2520from%2520mice%252c%2520treated%2520with%2520RANKL%252c%2520day03%252c%2520pool1.CNhs13236.11852-124I7.mm9.ctss.bed.gz
nrow(FANTOM5mouseSamples)
## [1] 395

To import TSS data for samples of interest from FANTOM5 we use the importPublicData() function and set the argument source = "FANTOM5". The dataset argument can be set to either "human" or "mouse", and the sample argument is provided by a vector of sample lables/names. For example, names of astrocyte samples from above are:

astrocyteSamples[,"sample"]
## [1] "Astrocyte_-_cerebellum__donor1"      "Astrocyte_-_cerebellum__donor2"     
## [3] "Astrocyte_-_cerebellum__donor3"      "Astrocyte_-_cerebral_cortex__donor1"
## [5] "Astrocyte_-_cerebral_cortex__donor2" "Astrocyte_-_cerebral_cortex__donor3"

and to import first three samples type:

astrocyteCAGEset <- importPublicData(source = "FANTOM5", dataset = "human", 
                    sample = astrocyteSamples[1:3,"sample"])

The resulting astrocyteCAGEset is a CAGEset object that can be included in the CAGEr workflow described above to perform normalisation, clustering, visualisation, etc.

2.2 FANTOM3and4CAGE data package

To use TSS data from FANTOM3 and FANTOM4 projects, a data package FANTOM3and4CAGE has to be installed and loaded. This package is available from Bioconductor and can be installed by calling:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("FANTOM3and4CAGE")

For the list of available datasets with group and sample labels for specific human or mouse samples load the data package and get list of samples:

library(FANTOM3and4CAGE)
data(FANTOMhumanSamples)
head(FANTOMhumanSamples)
##                 dataset           group          sample
## 1 FANTOMtissueCAGEhuman        cerebrum        cerebrum
## 2 FANTOMtissueCAGEhuman    renal_artery    renal_artery
## 3 FANTOMtissueCAGEhuman          ureter          ureter
## 4 FANTOMtissueCAGEhuman urinary_bladder urinary_bladder
## 5 FANTOMtissueCAGEhuman          kidney      malignancy
## 6 FANTOMtissueCAGEhuman          kidney          kidney
data(FANTOMmouseSamples)
head(FANTOMmouseSamples)
##                 dataset           group
## 1 FANTOMtissueCAGEmouse           brain
## 2 FANTOMtissueCAGEmouse           brain
## 3 FANTOMtissueCAGEmouse           brain
## 4 FANTOMtissueCAGEmouse           brain
## 5 FANTOMtissueCAGEmouse cerebral_cortex
## 6 FANTOMtissueCAGEmouse     hippocampus
##                                          sample
## 1                                         brain
## 2                      CCL-131_Neuro-2a_control
## 3  CCL-131_Neuro-2a_treatment_for_6hr_with_MPP+
## 4 CCL-131_Neuro-2a_treatment_for_12hr_with_MPP+
## 5                               cerebral_cortex
## 6                                   hippocampus

In the above data frames, the columns dataset, group and sample provide values that should be passed to corresponding arguments in importPublicData() function. For example to import human kidney normal and malignancy samples call:

kidneyCAGEset <- importPublicData(source = "FANTOM3and4",
            dataset = "FANTOMtissueCAGEhuman", 
            group = "kidney", sample = c("kidney", "malignancy"))

When the samples belong to different groups or different datasets, it is necessary to provide the dataset and group assignment for each sample separately:

mixedCAGEset <- importPublicData(source = "FANTOM3and4",
        dataset = c("FANTOMtissueCAGEmouse", "FANTOMtissueCAGEmouse", 
        "FANTOMtimecourseCAGEmouse"), group = c("liver", "liver", 
        "liver_under_constant_darkness"), 
        sample = c("cloned_mouse", "control_mouse", "4_hr"))

For more details about datasets available in the data package please refer to the vignette accompanying the package.

2.3 ENCODEprojectCAGE data package

TSS data derived from ENCODE CAGE datasets has been collected into ENCODEprojectCAGE data package, which is available for download from the CAGEr web site (http://promshift.genereg.net/CAGEr/). Downloaded package can be installed from local using install.packages() function from within R and used with CAGEr as described below. List of datasets available in this data package can be obtained like this:

library(ENCODEprojectCAGE)
data(ENCODEhumanCellLinesSamples)

The information provided in this data frame is analogous to the one in previously discussed data package and provides values to be used with importPublicData() function. The command to import whole cell CAGE samples for three different cell lines would look like this:

ENCODEset <- importPublicData(source = "ENCODE", 
    dataset = c("A549", "H1-hESC", "IMR90"), 
    group = c("cell", "cell", "cell"), sample = c("A549_cell_rep1", 
    "H1-hESC_cell_rep1", "IMR90_cell_rep1"))

For more details about datasets available in the ENCODEprojectCAGE data package please refer to the vignette accompanying the package.

2.4 ZebrafishDevelopmentalCAGE data package

The zebrafish TSS data for 12 developmental stages is collected in ZebrafishDevelopmentalCAGE data package, which is also available for download from the CAGEr web site (http://promshift.genereg.net/CAGEr/). It can be installed from local using install.packages() function. To get a list of samples within the package type:

library(ZebrafishDevelopmentalCAGE)
data(ZebrafishSamples)

In this package there is only one dataset called ZebrafishCAGE and all samples belong to the same group called development. To import selected samples from this dataset type:

zebrafishCAGEset <- importPublicData(source = "ZebrafishDevelopment", 
            dataset = "ZebrafishCAGE", group = "development", 
            sample = c("zf_64cells", "zf_prim6"))

For more details please refer to the vignette accompanying the data package.

Importing TSS data from any of the four above explained resources results in the CAGEset object that can be directly included into the workflow provided by CAGEr to perform normalisation, clustering, promoter width analysis, visualisation, etc. This high-resolution TSS data can then easily be integrated with other genomic data types to perform various promoter-centred analyses, which does not rely on annotation but uses precise and matched cell/tissue type TSSs.

Session info

sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] FANTOM3and4CAGE_1.27.0      CAGEr_1.34.0               
##  [3] MultiAssayExperiment_1.18.0 SummarizedExperiment_1.22.0
##  [5] Biobase_2.52.0              GenomicRanges_1.44.0       
##  [7] GenomeInfoDb_1.28.0         IRanges_2.26.0             
##  [9] S4Vectors_0.30.0            BiocGenerics_0.38.0        
## [11] MatrixGenerics_1.4.0        matrixStats_0.58.0         
## [13] BiocStyle_2.20.0           
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-152             bitops_1.0-7             tools_4.1.0             
##  [4] bslib_0.2.5.1            utf8_1.2.1               R6_2.5.0                
##  [7] vegan_2.5-7              KernSmooth_2.23-20       mgcv_1.8-35             
## [10] DBI_1.1.1                colorspace_2.0-1         permute_0.9-5           
## [13] tidyselect_1.1.1         compiler_4.1.0           DelayedArray_0.18.0     
## [16] rtracklayer_1.52.0       bookdown_0.22            sass_0.4.0              
## [19] scales_1.1.1             stringr_1.4.0            digest_0.6.27           
## [22] Rsamtools_2.8.0          rmarkdown_2.8            stringdist_0.9.6.3      
## [25] XVector_0.32.0           pkgconfig_2.0.3          htmltools_0.5.1.1       
## [28] fastmap_1.1.0            BSgenome_1.60.0          rlang_0.4.11            
## [31] VGAM_1.1-5               jquerylib_0.1.4          BiocIO_1.2.0            
## [34] generics_0.1.0           jsonlite_1.7.2           BiocParallel_1.26.0     
## [37] gtools_3.8.2             dplyr_1.0.6              RCurl_1.98-1.3          
## [40] magrittr_2.0.1           GenomeInfoDbData_1.2.6   Matrix_1.3-3            
## [43] Rcpp_1.0.6               munsell_0.5.0            fansi_0.4.2             
## [46] lifecycle_1.0.0          stringi_1.6.2            yaml_2.2.1              
## [49] MASS_7.3-54              zlibbioc_1.38.0          plyr_1.8.6              
## [52] grid_4.1.0               formula.tools_1.7.1      crayon_1.4.1            
## [55] lattice_0.20-44          Biostrings_2.60.0        splines_4.1.0           
## [58] knitr_1.33               beanplot_1.2             pillar_1.6.1            
## [61] rjson_0.2.20             codetools_0.2-18         XML_3.99-0.6            
## [64] glue_1.4.2               evaluate_0.14            data.table_1.14.0       
## [67] BiocManager_1.30.15      operator.tools_1.6.3     vctrs_0.3.8             
## [70] gtable_0.3.0             purrr_0.3.4              reshape_0.8.8           
## [73] assertthat_0.2.1         cachem_1.0.5             ggplot2_3.3.3           
## [76] xfun_0.23                restfulr_0.0.13          tibble_3.1.2            
## [79] som_0.3-5.1              GenomicAlignments_1.28.0 memoise_2.0.0           
## [82] cluster_2.1.2            ellipsis_0.3.2

References

Consortium, The FANTOM. 2005. “The Transcriptional Landscape of the Mammalian Genome.” Science 309 (5740): 1559–63.

———. 2014. “A promoter-level mammalian expression atlas.” Nature 507 (7493): 462–70.

Djebali, Sarah, Carrie A Davis, Angelika Merkel, Alex Dobin, Timo Lassmann, Ali Mortazavi, Andrea Tanzer, et al. 2012. “Landscape of transcription in human cells.” Nature 488 (7414): 101–8.

Faulkner, Geoffrey J, Yasumasa Kimura, Carsten O Daub, Shivangi Wani, Charles Plessy, Katharine M Irvine, Kate Schroder, et al. 2009. “The regulated retrotransposon transcriptome of mammalian cells.” Nature Genetics 41 (5): 563–71.

Nepal, Chirag, Yavor Hadzhiev, Christopher Previti, Vanja Haberle, Nan Li, Hazuki Takahashi, Ana Maria S. Suzuki, et al. 2013. “Dynamic regulation of coding and non-coding transcription initiation landscape at single nucleotide resolution during vertebrate embryogenesis.” Genome Research 23 (11): 1938–50.

Suzuki, Harukazu, Alistair R R Forrest, Erik van Nimwegen, Carsten O Daub, Piotr J Balwierz, Katharine M Irvine, Timo Lassmann, et al. 2009. “The transcriptional network that controls growth arrest and differentiation in a human myeloid leukemia cell line.” Nature Genetics 41 (5): 553–62.