Here, we perform a window-based DB analysis to identify regions of differential H3K27me3 enrichment in mouse lung epithelium. H3K27me3 is associated with transcriptional repression and is usually observed with broad regions of enrichment. The aim of this workflow is to demonstrate how to analyze these broad marks with csaw, especially at variable resolutions with multiple window sizes. We use H3K27me3 ChIP-seq data from a study comparing wild-type (WT) and Ezh2 knock-out (KO) animals (Galvis et al. 2015), contains two biological replicates for each genotype. We download BAM files and indices using chipseqDBData.
library(chipseqDBData)
h3k27me3data <- H3K27me3Data()
h3k27me3data
## DataFrame with 4 rows and 3 columns
## Name Description
## <character> <character>
## 1 SRR1274188 control H3K27me3 (1)
## 2 SRR1274189 control H3K27me3 (2)
## 3 SRR1274190 Ezh2 knock-out H3K27me3 (1)
## 4 SRR1274191 Ezh2 knock-out H3K27me3 (2)
## Path
## <character>
## 1 /tmp/RtmpYT6shT/file5dc4b70252c/SRR1274188.bam
## 2 /tmp/RtmpYT6shT/file5dc4b70252c/SRR1274189.bam
## 3 /tmp/RtmpYT6shT/file5dc4b70252c/SRR1274190.bam
## 4 /tmp/RtmpYT6shT/file5dc4b70252c/SRR1274191.bam
We check some mapping statistics with Rsamtools.
library(Rsamtools)
diagnostics <- list()
for (bam in h3k27me3data$Path) {
total <- countBam(bam)$records
mapped <- countBam(bam, param=ScanBamParam(
flag=scanBamFlag(isUnmapped=FALSE)))$records
marked <- countBam(bam, param=ScanBamParam(
flag=scanBamFlag(isUnmapped=FALSE, isDuplicate=TRUE)))$records
diagnostics[[basename(bam)]] <- c(Total=total, Mapped=mapped, Marked=marked)
}
diag.stats <- data.frame(do.call(rbind, diagnostics))
diag.stats$Prop.mapped <- diag.stats$Mapped/diag.stats$Total*100
diag.stats$Prop.marked <- diag.stats$Marked/diag.stats$Mapped*100
diag.stats
## Total Mapped Marked Prop.mapped Prop.marked
## SRR1274188.bam 24445704 18605240 2769679 76.10842 14.88655
## SRR1274189.bam 21978677 17014171 2069203 77.41217 12.16164
## SRR1274190.bam 26910067 18606352 4361393 69.14272 23.44034
## SRR1274191.bam 21354963 14092438 4392541 65.99140 31.16949
We construct a readParam
object to standardize the parameter settings in this analysis.
For consistency with the original analysis by Galvis et al. (2015),
we will define the blacklist using the predicted repeats from the RepeatMasker software.
library(BiocFileCache)
bfc <- BiocFileCache("local", ask=FALSE)
black.path <- bfcrpath(bfc, file.path("http://hgdownload.cse.ucsc.edu",
"goldenPath/mm10/bigZips/chromOut.tar.gz"))
tmpdir <- tempfile()
dir.create(tmpdir)
untar(black.path, exdir=tmpdir)
# Iterate through all chromosomes.
collected <- list()
for (x in list.files(tmpdir, full=TRUE)) {
f <- list.files(x, full=TRUE, pattern=".fa.out")
to.get <- vector("list", 15)
to.get[[5]] <- "character"
to.get[6:7] <- "integer"
collected[[length(collected)+1]] <- read.table(f, skip=3,
stringsAsFactors=FALSE, colClasses=to.get)
}
collected <- do.call(rbind, collected)
blacklist <- GRanges(collected[,1], IRanges(collected[,2], collected[,3]))
blacklist
## GRanges object with 5147737 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 3000001-3002128 *
## [2] chr1 3003153-3003994 *
## [3] chr1 3003994-3004054 *
## [4] chr1 3004041-3004206 *
## [5] chr1 3004207-3004270 *
## ... ... ... ...
## [5147733] chrY_JH584303_random 152557-155890 *
## [5147734] chrY_JH584303_random 155891-156883 *
## [5147735] chrY_JH584303_random 157070-157145 *
## [5147736] chrY_JH584303_random 157909-157960 *
## [5147737] chrY_JH584303_random 157953-158099 *
## -------
## seqinfo: 66 sequences from an unspecified genome; no seqlengths
We set the minimum mapping quality score to 10 to remove poorly or non-uniquely aligned reads.
library(csaw)
param <- readParam(minq=10, discard=blacklist)
param
## Extracting reads in single-end mode
## Duplicate removal is turned off
## Minimum allowed mapping score is 10
## Reads are extracted from both strands
## No restrictions are placed on read extraction
## Reads in 5147737 regions will be discarded
Reads are then counted into sliding windows using csaw (Lun and Smyth 2015). At this stage, we use a large 2 kbp window to reflect the fact that H3K27me3 exhibits broad enrichment. This allows us to increase the size of the counts and thus detection power, without having to be concerned about loss of genomic resolution to detect sharp binding events.
win.data <- windowCounts(h3k27me3data$Path, param=param, width=2000,
spacing=500, ext=200)
win.data
## class: RangedSummarizedExperiment
## dim: 4614717 4
## metadata(6): spacing width ... param final.ext
## assays(1): counts
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(4): bam.files totals ext rlen
We use spacing=500
to avoid redundant work when sliding a large window across the genome.
The default spacing of 50 bp would result in many windows with over 90% overlap in their positions,
increasing the amount of computational work without a meaningful improvement in resolution.
We also set the fragment length to 200 bp based on experimental knowledge of the size selection procedure.
Unlike the previous analyses, the fragment length cannot easily estimated here due to weak strand bimodality of diffuse marks.
As in the CBP example, we normalize for composition biases resulting from imbalanced DB between conditions (Lun and Smyth 2014).
We expect systematic DB in one direction as Ezh2 function (and thus some H3K27me3 deposition activity) is lost in the KO genotype.
We apply the TMM method (Robinson and Oshlack 2010) to counts for large 10 kbp bins,
and store the resulting normalization factors back in win.data
for use in the DB analysis with the window counts.
bins <- windowCounts(h3k27me3data$Path, bin=TRUE, width=10000, param=param)
win.data <- normFactors(bins, se.out=win.data)
(normfacs <- win.data$norm.factors)
## [1] 0.9946875 0.9984751 1.0080895 0.9987965
Figure 1 shows the effect of normalization on the relative enrichment between pairs of samples. We see that log-ratio of normalization factors passes through the centre of the cloud of background regions in each plot, indicating that the bias has been successfully identified and removed.
bin.ab <- scaledAverage(bins)
adjc <- calculateCPM(bins, use.norm.factors=FALSE)
par(cex.lab=1.5, mfrow=c(1,3))
smoothScatter(bin.ab, adjc[,1]-adjc[,2], ylim=c(-6, 6),
xlab="Average abundance", ylab="Log-ratio (1 vs 2)")
abline(h=log2(normfacs[1]/normfacs[4]), col="red")
smoothScatter(bin.ab, adjc[,1]-adjc[,3], ylim=c(-6, 6),
xlab="Average abundance", ylab="Log-ratio (1 vs 3)")
abline(h=log2(normfacs[2]/normfacs[4]), col="red")
smoothScatter(bin.ab, adjc[,1]-adjc[,4], ylim=c(-6, 6),
xlab="Average abundance", ylab="Log-ratio (1 vs 4)")
abline(h=log2(normfacs[3]/normfacs[4]), col="red")
We estimate the global background and remove low-abundance windows that are not enriched above this background level. To retain a window, we require it to have at least 2-fold more coverage than the average background. This is less stringent than the thresholds used in previous analyses, owing the weaker enrichment observed for diffuse marks.
filter.stat <- filterWindowsGlobal(win.data, bins)
min.fc <- 2
Figure 2 shows that chosen threshold is greater than the abundances of most bins in the genome, presumably those corresponding to background regions. This suggests that the filter will remove most windows lying within background regions.
hist(filter.stat$back.abundances, main="", breaks=50,
xlab="Background abundance (log2-CPM)")
threshold <- filter.stat$abundances[1] - filter.stat$filter[1] + log2(min.fc)
abline(v=threshold, col="red")
We filter out the majority of windows in background regions upon applying a modest fold-change threshold. This leaves a small set of relevant windows for further analysis.
keep <- filter.stat$filter > log2(min.fc)
summary(keep)
## Mode FALSE TRUE
## logical 4555355 59362
filtered.data <- win.data[keep,]
Counts for each window are modelled using edgeR (McCarthy, Chen, and Smyth 2012; Robinson, McCarthy, and Smyth 2010).
We first convert our RangedSummarizedExperiment
object into a DGEList
.
library(edgeR)
y <- asDGEList(filtered.data)
str(y)
## Formal class 'DGEList' [package "edgeR"] with 1 slot
## ..@ .Data:List of 2
## .. ..$ : int [1:59362, 1:4] 17 19 32 33 30 31 31 36 33 29 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:59362] "1" "2" "3" "4" ...
## .. .. .. ..$ : chr [1:4] "Sample1" "Sample2" "Sample3" "Sample4"
## .. ..$ :'data.frame': 4 obs. of 3 variables:
## .. .. ..$ group : Factor w/ 1 level "1": 1 1 1 1
## .. .. ..$ lib.size : int [1:4] 10602308 9971590 8605634 5659274
## .. .. ..$ norm.factors: num [1:4] 0.995 0.998 1.008 0.999
We then construct a design matrix for our experimental design. Here, we use a simple one-way layout with two groups of two replicates.
genotype <- h3k27me3data$Description
genotype[grep("control", genotype)] <- "wt"
genotype[grep("knock-out", genotype)] <- "ko"
genotype <- factor(genotype)
design <- model.matrix(~0+genotype)
colnames(design) <- levels(genotype)
design
## ko wt
## 1 0 1
## 2 0 1
## 3 1 0
## 4 1 0
## attr(,"assign")
## [1] 1 1
## attr(,"contrasts")
## attr(,"contrasts")$genotype
## [1] "contr.treatment"
We estimate the negative binomial (NB) and quasi-likelihood (QL) dispersions for each window (Lund et al. 2012). We again observe an increasing trend in the NB dispersions with respect to abundance (Figure 3).
y <- estimateDisp(y, design)
summary(y$trended.dispersion)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.008979 0.009484 0.010091 0.010970 0.010446 0.025661
plotBCV(y)
The QL dispersions are strongly shrunk towards the trend (Figure 4), indicating that there is little variability in the dispersions across windows.
fit <- glmQLFit(y, design, robust=TRUE)
summary(fit$df.prior)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.4548 54.7389 54.7389 54.7155 54.7389 54.7389
plotQLDisp(fit)
These results are consistent with the presence of systematic differences in enrichment between replicates, as discussed in the CBP analysis. Nonetheless, the samples separate by genotype in the MDS plot (Figure 5), which suggests that the downstream analysis will be able to detect DB regions.
plotMDS(cpm(y, log=TRUE), top=10000, labels=genotype,
col=c("red", "blue")[as.integer(genotype)])
We then test for DB between conditions in each window using the QL F-test.
contrast <- makeContrasts(wt-ko, levels=design)
res <- glmQLFTest(fit, contrast=contrast)
Consolidation allows the analyst to incorporate information from a range of different window sizes, each of which has a different trade-off between resolution and count size. This is particularly useful for broad marks where the width of an enriched region can be variable, as can the width of the differentially bound interval of an enriched region. To demonstrate, we repeat the entire analysis using 500 bp windows. Compared to our previous 2 kbp analysis, this provides greater spatial resolution at the cost of lowering the counts.
# Counting into 500 bp windows.
win.data2 <- windowCounts(h3k27me3data$Path, param=param, width=500,
spacing=100, ext=200)
# Re-using the same normalization factors.
win.data2$norm.factors <- win.data$norm.factors
# Filtering on abundance.
filter.stat2 <- filterWindows(win.data2, bins, type="global")
keep2 <- filter.stat2$filter > log2(min.fc)
filtered.data2 <- win.data2[keep2,]
# Performing the statistical analysis.
y2 <- asDGEList(filtered.data2)
y2 <- estimateDisp(y2, design)
fit2 <- glmQLFit(y2, design, robust=TRUE)
res2 <- glmQLFTest(fit2, contrast=contrast)
We consolidate the 500 bp analysis with our previous 2 kbp analysis using the mergeWindowsList()
function.
This clusters both sets of windows together into a single set of regions.
To limit chaining effects, each region cannot be more than 30 kbp in size.
merged <- mergeResultsList(list(filtered.data, filtered.data2),
tab.list=list(res$table, res2$table),
equiweight=TRUE, tol=100, merge.args=list(max.width=30000))
merged$regions
## GRanges object with 80389 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 3050001-3052500 *
## [2] chr1 3086401-3087200 *
## [3] chr1 3098001-3098800 *
## [4] chr1 3139501-3140100 *
## [5] chr1 3211501-3212600 *
## ... ... ... ...
## [80385] chrY 90766001-90769500 *
## [80386] chrY 90774801-90775900 *
## [80387] chrY 90788501-90789000 *
## [80388] chrY 90793101-90793700 *
## [80389] chrY 90797001-90814000 *
## -------
## seqinfo: 66 sequences from an unspecified genome
We compute combined \(p\)-values using Simes’ method for region-level FDR control (Simes 1986; Lun and Smyth 2014). This is done after weighting the contributions from the two sets of windows to ensure that the combined \(p\)-value for each region is not dominated by the analysis with more (smaller) windows.
tabcom <- merged$combined
is.sig <- tabcom$FDR <= 0.05
summary(is.sig)
## Mode FALSE TRUE
## logical 73760 6629
Ezh2 is one of the proteins responsible for depositing H3K27me3, so we might expect that most DB regions have increased enrichment in the WT condition. However, the opposite seems to be true here, which is an interesting result that may warrant further investigation.
table(tabcom$direction[is.sig])
##
## down up
## 4465 2164
We also obtain statistics for the window with the lowest \(p\)-value in each region.
The signs of the log-fold changes are largely consistent with the direction
numbers above.
tabbest <- merged$best
is.sig.pos <- (tabbest$logFC > 0)[is.sig]
summary(is.sig.pos)
## Mode FALSE TRUE
## logical 4465 2164
Finally, we save these results to file for future reference.
out.ranges <- merged$regions
mcols(out.ranges) <- data.frame(tabcom,
best.logFC=tabbest$logFC)
saveRDS(file="h3k27me3_results.rds", out.ranges)
We add annotation for each region using the detailRanges()
function,
as previously described.
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(org.Mm.eg.db)
anno <- detailRanges(out.ranges, orgdb=org.Mm.eg.db,
txdb=TxDb.Mmusculus.UCSC.mm10.knownGene)
mcols(out.ranges) <- cbind(mcols(out.ranges), anno)
We visualize one of the DB regions overlapping the Cdx2 gene to reproduce the results in Holik et al. (2015).
cdx2 <- genes(TxDb.Mmusculus.UCSC.mm10.knownGene)["12591"] # Cdx2 Entrez ID
cur.region <- subsetByOverlaps(out.ranges, cdx2)[1]
cur.region
## GRanges object with 1 range and 10 metadata columns:
## seqnames ranges strand | nWindows logFC.up logFC.down
## <Rle> <IRanges> <Rle> | <integer> <integer> <integer>
## [1] chr5 147299501-147322500 * | 138 114 2
## PValue FDR direction best.logFC
## <numeric> <numeric> <character> <numeric>
## [1] 2.71572404881912e-06 0.000506548758627048 up 2.61655768778193
## overlap left right
## <character> <character> <character>
## [1] Cdx2:-:PE,Urad:-:PE
## -------
## seqinfo: 66 sequences from an unspecified genome
We use Gviz (F. and R. 2016) to plot the results. As in the H3K9ac analysis, we set up some tracks to display genome coordinates and gene annotation.
library(Gviz)
gax <- GenomeAxisTrack(col="black", fontsize=15, size=2)
greg <- GeneRegionTrack(TxDb.Mmusculus.UCSC.mm10.knownGene, showId=TRUE,
geneSymbol=TRUE, name="", background.title="transparent")
symbols <- unlist(mapIds(org.Mm.eg.db, gene(greg), "SYMBOL",
"ENTREZID", multiVals = "first"))
symbol(greg) <- symbols[gene(greg)]
In Figure 6, we see enrichment of H3K27me3 in the WT condition at the Cdx2 locus. This is consistent with the known regulatory relationship between Ezh2 and Cdx2.
collected <- list()
lib.sizes <- filtered.data$totals/1e6
for (i in seq_along(h3k27me3data$Path)) {
reads <- extractReads(bam.file=h3k27me3data$Path[i], cur.region, param=param)
cov <- as(coverage(reads)/lib.sizes[i], "GRanges")
collected[[i]] <- DataTrack(cov, type="histogram", lwd=0, ylim=c(0,1),
name=h3k27me3data$Description[i], col.axis="black", col.title="black",
fill="darkgray", col.histogram=NA)
}
plotTracks(c(gax, collected, greg), chromosome=as.character(seqnames(cur.region)),
from=start(cur.region), to=end(cur.region))
In contrast, if we look at a constitutively expressed gene, such as Col1a2, we find little evidence of H3K27me3 deposition. Only a single window is retained after filtering on abundance, and this window shows no evidence of changes in H3K27me3 enrichment between WT and KO samples.
col1a2 <- genes(TxDb.Mmusculus.UCSC.mm10.knownGene)["12843"] # Col1a2 Entrez ID
cur.region <- subsetByOverlaps(out.ranges, col1a2)
cur.region
## GRanges object with 1 range and 10 metadata columns:
## seqnames ranges strand | nWindows logFC.up logFC.down
## <Rle> <IRanges> <Rle> | <integer> <integer> <integer>
## [1] chr6 4511701-4512200 * | 1 0 1
## PValue FDR direction best.logFC
## <numeric> <numeric> <character> <numeric>
## [1] 0.222135384805278 0.436596695658089 down -0.712940757651797
## overlap left right
## <character> <character> <character>
## [1] Col1a2:+:I Col1a2:+:907 Col1a2:+:161
## -------
## seqinfo: 66 sequences from an unspecified genome
sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.10-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.10-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] grid stats4 parallel stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] Gviz_1.30.0
## [2] org.Mm.eg.db_3.10.0
## [3] TxDb.Mmusculus.UCSC.mm10.knownGene_3.10.0
## [4] GenomicFeatures_1.38.0
## [5] AnnotationDbi_1.48.0
## [6] edgeR_3.28.0
## [7] limma_3.42.0
## [8] csaw_1.20.0
## [9] SummarizedExperiment_1.16.0
## [10] DelayedArray_0.12.0
## [11] BiocParallel_1.20.0
## [12] matrixStats_0.55.0
## [13] Biobase_2.46.0
## [14] rtracklayer_1.46.0
## [15] BiocFileCache_1.10.0
## [16] dbplyr_1.4.2
## [17] Rsamtools_2.2.0
## [18] Biostrings_2.54.0
## [19] XVector_0.26.0
## [20] GenomicRanges_1.38.0
## [21] GenomeInfoDb_1.22.0
## [22] IRanges_2.20.0
## [23] S4Vectors_0.24.0
## [24] BiocGenerics_0.32.0
## [25] chipseqDBData_1.1.0
## [26] knitr_1.25
## [27] BiocStyle_2.14.0
##
## loaded via a namespace (and not attached):
## [1] colorspace_1.4-1 biovizBase_1.34.0
## [3] htmlTable_1.13.2 base64enc_0.1-3
## [5] dichromat_2.0-0 rstudioapi_0.10
## [7] bit64_0.9-7 interactiveDisplayBase_1.24.0
## [9] codetools_0.2-16 splines_3.6.1
## [11] zeallot_0.1.0 Formula_1.2-3
## [13] cluster_2.1.0 shiny_1.4.0
## [15] BiocManager_1.30.9 compiler_3.6.1
## [17] httr_1.4.1 backports_1.1.5
## [19] assertthat_0.2.1 Matrix_1.2-17
## [21] fastmap_1.0.1 lazyeval_0.2.2
## [23] later_1.0.0 acepack_1.4.1
## [25] htmltools_0.4.0 prettyunits_1.0.2
## [27] tools_3.6.1 gtable_0.3.0
## [29] glue_1.3.1 GenomeInfoDbData_1.2.2
## [31] dplyr_0.8.3 rappdirs_0.3.1
## [33] Rcpp_1.0.2 vctrs_0.2.0
## [35] ExperimentHub_1.12.0 xfun_0.10
## [37] stringr_1.4.0 mime_0.7
## [39] ensembldb_2.10.0 statmod_1.4.32
## [41] XML_3.98-1.20 AnnotationHub_2.18.0
## [43] zlibbioc_1.32.0 scales_1.0.0
## [45] BSgenome_1.54.0 VariantAnnotation_1.32.0
## [47] ProtGenerics_1.18.0 hms_0.5.1
## [49] promises_1.1.0 AnnotationFilter_1.10.0
## [51] RColorBrewer_1.1-2 yaml_2.2.0
## [53] curl_4.2 memoise_1.1.0
## [55] gridExtra_2.3 ggplot2_3.2.1
## [57] biomaRt_2.42.0 rpart_4.1-15
## [59] latticeExtra_0.6-28 stringi_1.4.3
## [61] RSQLite_2.1.2 BiocVersion_3.10.1
## [63] highr_0.8 checkmate_1.9.4
## [65] rlang_0.4.1 pkgconfig_2.0.3
## [67] bitops_1.0-6 evaluate_0.14
## [69] lattice_0.20-38 purrr_0.3.3
## [71] GenomicAlignments_1.22.0 htmlwidgets_1.5.1
## [73] bit_1.1-14 tidyselect_0.2.5
## [75] magrittr_1.5 bookdown_0.14
## [77] R6_2.4.0 Hmisc_4.2-0
## [79] DBI_1.0.0 pillar_1.4.2
## [81] foreign_0.8-72 survival_2.44-1.1
## [83] RCurl_1.95-4.12 nnet_7.3-12
## [85] tibble_2.1.3 crayon_1.3.4
## [87] KernSmooth_2.23-16 rmarkdown_1.16
## [89] progress_1.2.2 locfit_1.5-9.1
## [91] data.table_1.12.6 blob_1.2.0
## [93] digest_0.6.22 xtable_1.8-4
## [95] httpuv_1.5.2 openssl_1.4.1
## [97] munsell_0.5.0 askpass_1.1
F., Hahne, and Ivanek R. 2016. “Visualizing Genomic Data Using Gviz and Bioconductor.” In Statistical Genomics: Methods and Protocols, edited by Ewy Mathé and Sean Davis, 335–51. New York, NY: Springer New York. https://doi.org/10.1007/978-1-4939-3578-9_16.
Galvis, L. A., A. Z. Holik, K. M. Short, J. Pasquet, A. T. Lun, M. E. Blewitt, I. M. Smyth, M. E. Ritchie, and M. L. Asselin-Labat. 2015. “Repression of Igf1 expression by Ezh2 prevents basal cell differentiation in the developing lung.” Development 142 (8):1458–69.
Holik, A. Z., L. A. Galvis, A. T. Lun, M. E. Ritchie, and M. L. Asselin-Labat. 2015. “Transcriptome and H3K27 tri-methylation profiling of Ezh2-deficient lung epithelium.” Genom Data 5 (September):346–51.
Lun, A. T., and G. K. Smyth. 2014. “De novo detection of differentially bound regions for ChIP-seq data using peaks and windows: controlling error rates correctly.” Nucleic Acids Res. 42 (11):e95.
———. 2015. “csaw: a Bioconductor package for differential binding analysis of ChIP-seq data using sliding windows.” Nucleic Acids Res. https://doi.org/10.1093/nar/gkv1191.
Lund, S. P., D. Nettleton, D. J. McCarthy, and G. K. Smyth. 2012. “Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates.” Stat. Appl. Genet. Mol. Biol. 11 (5):Article 8.
McCarthy, D. J., Y. Chen, and G. K. Smyth. 2012. “Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation.” Nucleic Acids Res. 40 (10):4288–97.
Robinson, M. D., D. J. McCarthy, and G. K. Smyth. 2010. “edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.” Bioinformatics 26 (1):139–40.
Robinson, M. D., and A. Oshlack. 2010. “A scaling normalization method for differential expression analysis of RNA-seq data.” Genome Biol. 11 (3):R25.
Simes, R. J. 1986. “An Improved Bonferroni Procedure for Multiple Tests of Significance.” Biometrika 73 (3):751–54.