1 Introduction
2 Metadata on the ChromImpute archive
- 2.1 Sample metadata
- 2.2 Metadata on the inferred states
3 Managing access to imputed chromatin states for a set of cell types
4 Enumerating states in the vicinity of a gene, across cell types

1 Introduction

The epigenomics road map describes locations of epigenetic marks in DNA from a variety of cell types. Of interest are locations of histone modifications, sites of DNA methylation, and regions of accessible chromatin.

This package presents a selection of elements of the road map including metadata and outputs of the ChromImpute procedure applied to ENCODE cell lines by Ernst and Kellis.

2 Metadata on the ChromImpute archive

2.1 Sample metadata

I have retrieved a Google Docs spreadsheet with comprehensive sample information. The mapmeta() function provides access to a local DataFrame image of the file as retrieved in mid-April 2015. We provide a dynamic view of a selection of columns; use the search box to filter the records shown.

library(DT)
library(erma)
meta = mapmeta()

## NOTE: input data had non-ASCII characters replaced by ' '.

kpc = c("Comments", "Epigenome.ID..EID.", "Epigenome.Mnemonic", "Quality.Rating", 
"Standardized.Epigenome.name", "ANATOMY", "TYPE")
datatable(as.data.frame(meta[,kpc]))

	Comments	Epigenome.ID..EID.	Epigenome.Mnemonic	Quality.Rating	Standardized.Epigenome.name	ANATOMY	TYPE
1	Outlier DNA methylation data due (older platform)	E017	LNG.IMR90	1	IMR90 fetal lung fibroblasts Cell Line	LUNG	CellLine
2		E002	ESC.WA7	-1	ES-WA7 Cells	ESC	PrimaryCulture
3		E008	ESC.H9	1	H9 Cells	ESC	PrimaryCulture
4	Bad Methylation dataset. This epigenome is NOT to be used for DNA methylation analysis. All other data types are fine to use.	E001	ESC.I3	1	ES-I3 Cells	ESC	PrimaryCulture
5		E015	ESC.HUES6	1	HUES6 Cells	ESC	PrimaryCulture
6		E014	ESC.HUES48	1	HUES48 Cells	ESC	PrimaryCulture
7		E016	ESC.HUES64	1	HUES64 Cells	ESC	PrimaryCulture
8	Outlier DNA methylation dataset (older platform)	E003	ESC.H1	1	H1 Cells	ESC	PrimaryCulture
9		E024	ESC.4STAR	1	ES-UCSF4 Cells	ESC	PrimaryCulture
10		E020	IPSC.20B	0	iPS-20b Cells	IPSC	PrimaryCulture

Showing 1 to 10 of 127 entries
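
The filtering offered by the search box can also be done programmatically. The following is a minimal sketch in base R; the small data frame is a hand-copied stand-in for a few rows and columns of the table above (an assumption made so the example runs without erma installed), and with the package available as.data.frame(mapmeta()) would presumably take its place.

```r
# Toy stand-in for part of the sample metadata; values copied from the
# table above. The object name meta_df is hypothetical.
meta_df <- data.frame(
  Epigenome.ID..EID. = c("E017", "E002", "E008", "E003", "E020"),
  Epigenome.Mnemonic = c("LNG.IMR90", "ESC.WA7", "ESC.H9", "ESC.H1", "IPSC.20B"),
  Quality.Rating     = c(1, -1, 1, 1, 0),
  ANATOMY            = c("LUNG", "ESC", "ESC", "ESC", "IPSC"),
  stringsAsFactors   = FALSE
)

# Keep the ESC samples whose Quality.Rating equals 1
esc_ok  <- subset(meta_df, ANATOMY == "ESC" & Quality.Rating == 1)
esc_ids <- esc_ok$Epigenome.ID..EID.
print(esc_ids)
```

The same subset() call should work on the full 127-row table, since it refers only to column names that appear in kpc.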

2.2 Metadata on the inferred states

The chromatin states and standard colorings used are enumerated in states_25:

data(states_25)
datatable(states_25)

	STATENO.	MNEMONIC	DESCRIPTION	COLOR.NAME	COLOR.CODE	rgb
1	1	TssA	Active TSS	Red	255,0,0	#FE0000
2	2	PromU	Promoter Upstream TSS	Orange Red	255,69,0	#FE4500
3	3	PromD1	Promoter Downstream TSS 1	Orange Red	255,69,0	#FE4500
4	4	PromD2	Promoter Downstream TSS 2	Orange Red	255,69,0	#FE4500
5	5	Tx5'	Transcribed - 5' preferential	Green	0,128,0	#008000
6	6	Tx	Strong transcription	Green	0,128,0	#008000
7	7	Tx3'	Transcribed - 3' preferential	Green	0,128,0	#008000
8	8	TxWk	Weak transcription	Lighter Green	0,150,0	#009500
9	9	TxReg	Transcribed Regulatory (Prom/Enh)	Electric Lime	194,225,5	#C1E005
10	10	TxEnh5'	Transcribed 5' preferential and Enh	Electric Lime	194,225,5	#C1E005

Showing 1 to 10 of 25 entries
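
To make the relation between the COLOR.CODE and rgb columns concrete, here is a small sketch that converts the comma-separated codes to hex strings with base R's rgb(). The helper name code_to_hex is hypothetical and not part of erma. Note that a direct conversion on the 0-255 scale gives #FF0000 for 255,0,0, whereas the table lists #FE0000, so the package apparently applies a slightly different scaling; the sketch does not try to reproduce that.

```r
# Convert "R,G,B" strings on the 0-255 scale to hex colors.
# code_to_hex is a hypothetical helper, not an erma function.
code_to_hex <- function(code) {
  vapply(strsplit(code, ",", fixed = TRUE), function(parts) {
    v <- as.integer(parts)
    rgb(v[1], v[2], v[3], maxColorValue = 255)
  }, character(1))
}

# Color codes copied from the states_25 table above
codes <- c(TssA = "255,0,0", PromU = "255,69,0",
           Tx = "0,128,0", TxReg = "194,225,5")
hex <- code_to_hex(codes)
print(hex)
```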

The emission parameters of the 25-state model are depicted in Supplementary Figure 33 of Ernst and Kellis:

library(png)
library(grid)  # provides grid.raster
im = readPNG(system.file("pngs/emparms.png", package="erma"))
grid.raster(im)

3 Managing access to imputed chromatin states for a set of cell types

I have retrieved a modest number of roadmap BED files in which ChromImpute labels chromatin segments with state mnemonics. These can be managed with an ErmaSet instance, a trivial extension of the GenomicFiles class. The cellTypes method yields a character vector of cell type names. The colData component has full metadata on the cell types available.

ermaset = makeErmaSet()

## NOTE: input data had non-ASCII characters replaced by ' '.

ermaset

## ErmaSet object with 0 ranges and 31 files: 
## files: E002_25_imputed12marks_mnemonics.bed.gz, E003_25_imputed12marks_mnemonics.bed.gz, ..., E088_25_imputed12marks_mnemonics.bed.gz, E096_25_imputed12marks_mnemonics.bed.gz 
## detail: use files(), rowRanges(), colData(), ... 
## cellTypes() for type names; data(short_celltype) for abbr.

cellTypes(ermaset)[1:5]

## [1] "ES-WA7 Cells"                         
## [2] "H1 Cells"                             
## [3] "iPS DF 6.9 Cells"                     
## [4] "Primary B cells from peripheral blood"
## [5] "Primary T cells from cord blood"

datatable(as.data.frame(colData(ermaset)[,kpc]))

	Comments	Epigenome.ID..EID.	Epigenome.Mnemonic	Quality.Rating	Standardized.Epigenome.name	ANATOMY	TYPE
E002		E002	ESC.WA7	-1	ES-WA7 Cells	ESC	PrimaryCulture
E003	Outlier DNA methylation dataset (older platform)	E003	ESC.H1	1	H1 Cells	ESC	PrimaryCulture
E021		E021	IPSC.DF.6.9	1	iPS DF 6.9 Cells	IPSC	PrimaryCulture
E032		E032	BLD.CD19.PPC	0	Primary B cells from peripheral blood	BLOOD	PrimaryCell
E033		E033	BLD.CD3.CPC	1	Primary T cells from cord blood	BLOOD	PrimaryCell
E034		E034	BLD.CD3.PPC	1	Primary T cells from peripheral blood	BLOOD	PrimaryCell
E035		E035	BLD.CD34.PC	1	Primary hematopoietic stem cells	BLOOD	PrimaryCell
E037		E037	BLD.CD4.MPC	0	Primary T helper memory cells from peripheral blood 2	BLOOD	PrimaryCell
E038		E038	BLD.CD4.NPC	0	Primary T helper naive cells from peripheral blood	BLOOD	PrimaryCell
E040		E040	BLD.CD4.CD25M.CD45RO.MPC	0	Primary T helper memory cells from peripheral blood 1	BLOOD	PrimaryCell

Showing 1 to 10 of 31 entries
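
The file names shown for the ErmaSet begin with the epigenome ID that also serves as the row name of the colData table. As a minimal sketch of how files and metadata line up, the following base R code extracts the ID from a file name and looks up the cell type. The vectors are hand-copied from the output above, and lookup is a hypothetical two-row stand-in for the real metadata.

```r
# File names as printed by the ErmaSet show method
files <- c("E002_25_imputed12marks_mnemonics.bed.gz",
           "E003_25_imputed12marks_mnemonics.bed.gz",
           "E088_25_imputed12marks_mnemonics.bed.gz",
           "E096_25_imputed12marks_mnemonics.bed.gz")

# The epigenome ID is everything before the first underscore
eids <- sub("_.*$", "", basename(files))

# Hypothetical two-row excerpt of the metadata, copied from the table above
lookup <- data.frame(
  EID  = c("E002", "E003"),
  name = c("ES-WA7 Cells", "H1 Cells"),
  stringsAsFactors = FALSE
)

# match() returns NA for IDs absent from the excerpt
celltype <- lookup$name[match(eids, lookup$EID)]
print(data.frame(eids, celltype))
```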

4 Enumerating states in the vicinity of a gene, across cell types

We form a GRanges representing the 50 kb upstream of IL33.

uil33 = flank(resize(range(genemodel("IL33")), 1), width=50000)

## 'select()' returned 1:many mapping between keys and columns

uil33

## GRanges object with 1 range and 0 metadata columns:
##       seqnames             ranges strand
##          <Rle>          <IRanges>  <Rle>
##   [1]     chr9 [6165786, 6215785]      +
##   -------
##   seqinfo: 1 sequence from hg19 genome
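
The coordinates follow from simple arithmetic: resize(..., 1) keeps the single base at the 5' end of the gene range, and flank() then takes the requested width immediately upstream of it, respecting strand. The sketch below reproduces that arithmetic in base R. The helper upstream_interval is hypothetical, and the anchor position 6215786 is inferred from the printed result rather than taken from an annotation.

```r
# Upstream interval of a given width next to a 1-bp anchor (e.g. a TSS).
# upstream_interval is a hypothetical helper mirroring strand-aware flanking.
upstream_interval <- function(tss, strand, width) {
  if (strand == "+") {
    # upstream means lower coordinates on the plus strand
    c(start = tss - width, end = tss - 1)
  } else {
    # upstream means higher coordinates on the minus strand
    c(start = tss + 1, end = tss + width)
  }
}

# Anchor inferred from the output above: the flank ends one base before it
iv <- upstream_interval(6215786, "+", 50000)
print(iv)
```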

Bind this to the ErmaSet instance.

rowRanges(ermaset) = uil33  
ermaset

## ErmaSet object with 1 ranges and 31 files: 
## files: E002_25_imputed12marks_mnemonics.bed.gz, E003_25_imputed12marks_mnemonics.bed.gz, ..., E088_25_imputed12marks_mnemonics.bed.gz, E096_25_imputed12marks_mnemonics.bed.gz 
## detail: use files(), rowRanges(), colData(), ... 
## cellTypes() for type names; data(short_celltype) for abbr.

Now query the files for cell-specific states in this interval.

library(BiocParallel)
# reduce will be done according to registered bpparam; lapply just extracts
register(MulticoreParam(workers=2))
suppressWarnings({
csstates = lapply(reduceByFile(ermaset, MAP=function(range, file) {
  # read only the records overlapping the query range
  imp = import(file, which=range, genome=genome(range)[1])
  seqlevels(imp) = seqlevels(range)
  imp$rgb = erma:::rgbByState(imp$name)  # color for each state label
  imp
}), "[[", 1)
})
tys = cellTypes(ermaset)  # need to label with cell types
csstates = lapply(seq_along(csstates), function(x) {
   csstates[[x]]$celltype = tys[x]
   csstates[[x]]
   })
csstates[1:2]

## [[1]]
## GRanges object with 15 ranges and 3 metadata columns:
##        seqnames             ranges strand |        name         rgb
##           <Rle>          <IRanges>  <Rle> | <character> <character>
##    [1]     chr9 [6161801, 6166600]      * |    25_Quies     #FEFEFE
##    [2]     chr9 [6166601, 6166800]      * |    17_EnhW2     #FEFE00
##    [3]     chr9 [6166801, 6171200]      * |    25_Quies     #FEFEFE
##    [4]     chr9 [6171201, 6171800]      * |    17_EnhW2     #FEFE00
##    [5]     chr9 [6171801, 6172000]      * |    16_EnhW1     #FEFE00
##    ...      ...                ...    ... .         ...         ...
##   [11]     chr9 [6183401, 6197400]      * |    25_Quies     #FEFEFE
##   [12]     chr9 [6197401, 6197600]      * |    19_DNase     #FEFE66
##   [13]     chr9 [6197601, 6208800]      * |    25_Quies     #FEFEFE
##   [14]     chr9 [6208801, 6211000]      * |      21_Het     #8990CF
##   [15]     chr9 [6211001, 6217800]      * |    25_Quies     #FEFEFE
##            celltype
##         <character>
##    [1] ES-WA7 Cells
##    [2] ES-WA7 Cells
##    [3] ES-WA7 Cells
##    [4] ES-WA7 Cells
##    [5] ES-WA7 Cells
##    ...          ...
##   [11] ES-WA7 Cells
##   [12] ES-WA7 Cells
##   [13] ES-WA7 Cells
##   [14] ES-WA7 Cells
##   [15] ES-WA7 Cells
##   -------
##   seqinfo: 1 sequence from hg19 genome
## 
## [[2]]
## GRanges object with 14 ranges and 3 metadata columns:
##        seqnames             ranges strand |        name         rgb
##           <Rle>          <IRanges>  <Rle> | <character> <character>
##    [1]     chr9 [6161801, 6166600]      * |    25_Quies     #FEFEFE
##    [2]     chr9 [6166601, 6166800]      * |    17_EnhW2     #FEFE00
##    [3]     chr9 [6166801, 6171200]      * |    25_Quies     #FEFEFE
##    [4]     chr9 [6171201, 6173000]      * |    17_EnhW2     #FEFE00
##    [5]     chr9 [6173001, 6175400]      * |      21_Het     #8990CF
##    ...      ...                ...    ... .         ...         ...
##   [10]     chr9 [6183401, 6197400]      * |    25_Quies     #FEFEFE
##   [11]     chr9 [6197401, 6197600]      * |    19_DNase     #FEFE66
##   [12]     chr9 [6197601, 6209000]      * |    25_Quies     #FEFEFE
##   [13]     chr9 [6209001, 6211000]      * |      21_Het     #8990CF
##   [14]     chr9 [6211001, 6218200]      * |    25_Quies     #FEFEFE
##           celltype
##        <character>
##    [1]    H1 Cells
##    [2]    H1 Cells
##    [3]    H1 Cells
##    [4]    H1 Cells
##    [5]    H1 Cells
##    ...         ...
##   [10]    H1 Cells
##   [11]    H1 Cells
##   [12]    H1 Cells
##   [13]    H1 Cells
##   [14]    H1 Cells
##   -------
##   seqinfo: 1 sequence from hg19 genome
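
Note that the first imported range starts at 6161801, before the query start of 6165786: the records overlapping the interval are returned whole rather than trimmed. As a minimal sketch of one way to summarize such output, the base R code below clips segments to the query interval and totals the base pairs per state. The data frame holds only the first five ES-WA7 segments, hand-copied from the output above, and the object names are hypothetical.

```r
# Query interval (the 50 kb upstream region printed earlier)
query <- c(start = 6165786, end = 6215785)

# First five ES-WA7 segments, copied from the output above
seg <- data.frame(
  start = c(6161801, 6166601, 6166801, 6171201, 6171801),
  end   = c(6166600, 6166800, 6171200, 6171800, 6172000),
  name  = c("25_Quies", "17_EnhW2", "25_Quies", "17_EnhW2", "16_EnhW1"),
  stringsAsFactors = FALSE
)

# Clip each segment to the query interval
seg$start <- pmax(seg$start, query[["start"]])
seg$end   <- pmin(seg$end, query[["end"]])
seg$width <- seg$end - seg$start + 1

# Total base pairs assigned to each state within the interval
bp_by_state <- tapply(seg$width, seg$name, sum)
print(bp_by_state)
```

With GRanges objects the same clipping could presumably be done with range-based operations instead; the plain data frame is used here only to keep the sketch free of package dependencies.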

This sort of code underlies the csProfile utility, which visualizes variation in state assignments across cell types in the promoter region of a given gene.

csProfile(ermaset[,1:5], symbol="CD28", useShiny=FALSE)

## 'select()' returned 1:many mapping between keys and columns

## Warning: executing %dopar% sequentially: no parallel backend registered

## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.

Set useShiny to TRUE to permit interactive selection of region to visualize.

erma: epigenomics road map adventures

August 2015
