When comparing samples, a common task is to identify the overlapping loops among two or more sets of genomic interactions. Traditionally, the result is visualized with a vennDiagram or an UpSet plot. However, the total count displayed in these plots often does not match the original count of each individual list, because a single overlap may encompass multiple interactions from one or more samples. This issue has been discussed extensively for overlapping peak calls in ChIP-Seq.
The hicVennDiagram package aims to provide an easy-to-use tool for calculating overlapping interactions, together with proper visualization methods. The plots it generates are designed to avoid the misleading visual representation caused by reporting a single count per overlap.
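The counting problem described above can be reproduced with a few lines of base R. This is only a sketch: it simplifies each loop to a single 1-D interval (real loops have two anchors), and the coordinates and the names `a`, `b`, and `hits` are made up for illustration.

```r
# Toy 1-D intervals standing in for loops (a simplification; made-up coordinates)
a <- data.frame(start = c(100, 300, 500), end = c(200, 400, 600))
b <- data.frame(start = c(150, 700), end = c(350, 800))

# hits[i, j] is TRUE when interval i of `a` overlaps interval j of `b`
hits <- outer(seq_len(nrow(a)), seq_len(nrow(b)),
              function(i, j) a$start[i] <= b$end[j] & b$start[j] <= a$end[i])

a_in_overlap <- rowSums(hits) > 0  # intervals of `a` that touch any interval of `b`
b_in_overlap <- colSums(hits) > 0  # intervals of `b` that touch any interval of `a`

sum(a_in_overlap)  # 2: two intervals of `a` take part in the overlap
sum(b_in_overlap)  # 1: both of them hit the same interval of `b`
```

If the shared region of a Venn diagram shows 1, the counts for `a` add up to 2 (1 shared + 1 unique) even though `a` holds 3 intervals; if it shows 2, the counts for `b` no longer add up. No single number in the intersection is consistent with both lists.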
Here is an example using hicVennDiagram with three files in BEDPE format.
First, install hicVennDiagram and other packages required to run the examples.
library(BiocManager)
BiocManager::install("hicVennDiagram")
library(hicVennDiagram)
library(ggplot2)
# list the BEDPE files
file_folder <- system.file("extdata",
                           package = "hicVennDiagram",
                           mustWork = TRUE)
# match only files whose names end in ".bedpe"
file_list <- dir(file_folder, pattern = "\\.bedpe$", full.names = TRUE)
names(file_list) <- sub("\\.bedpe$", "", basename(file_list))
basename(file_list)
## [1] "group1.bedpe" "group2.bedpe" "group3.bedpe"
venn <- vennCount(file_list)
## upset plot
## temp fix for https://github.com/krassowski/complex-upset/issues/195
upset_themes_fix <- lapply(ComplexUpset::upset_themes, function(.ele){
    lapply(.ele, function(.e){
        do.call(theme, .e[names(.e) %in% names(formals(theme))])
    })
})
upsetPlot(venn,
          themes = upset_themes_fix)
## venn plot
vennPlot(venn)
## use browser to adjust the text position, and shape colors.
browseVenn(vennPlot(venn))
vennCount
The vennCount function uses InteractionSet::findOverlaps to calculate the overlaps and then summarizes the results for each category. Users may want to try different combinations of the maxgap and minoverlap parameters when calculating the overlapping loops.
venn <- vennCount(file_list, maxgap=50000, FUN = max) # by default FUN = min
upsetPlot(venn,
          label_all = list(
              na.rm = TRUE,
              color = 'black',
              alpha = .9,
              label.padding = unit(0.1, "lines")
          ),
          themes = upset_themes_fix)
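To give a feel for what the two arguments in the call above change, here is a small base-R sketch. It is not the package's implementation. The helper `interval_gap` and all coordinates are hypothetical, the gap definition follows the usual convention that the gap is the number of positions separating two ranges, and it looks at one pair of anchors only. It also assumes that FUN summarizes the per-sample counts inside one overlap group, which the text above does not spell out.

```r
# Gap between two closed 1-D intervals: number of positions separating them.
# Negative values mean the intervals share at least one position.
interval_gap <- function(s1, e1, s2, e2) pmax(s1, s2) - pmin(e1, e2) - 1

# Hypothetical anchors (base-pair coordinates) from two samples
gap <- interval_gap(100000, 110000, 140000, 150000)
gap                # 29999 positions apart

gap <= -1          # FALSE: not a hit when only true overlaps count
gap <= 50000       # TRUE: a hit once gaps of up to 50 kb are tolerated

# Hypothetical per-sample loop counts inside one overlap group
group_counts <- c(group1 = 2, group2 = 1)
min(group_counts)  # 1
max(group_counts)  # 2
```

A larger maxgap merges more nearby loops into the same overlap group, so the choice of summary function matters more as maxgap grows.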
ChIPpeakAnno
library(ChIPpeakAnno)
bed <- system.file("extdata", "MACS_output.bed", package="ChIPpeakAnno")
gr1 <- toGRanges(bed, format="BED", header=FALSE)
gff <- system.file("extdata", "GFF_peaks.gff", package="ChIPpeakAnno")
gr2 <- toGRanges(gff, format="GFF", header=FALSE, skip=3)
ol <- findOverlapsOfPeaks(gr1, gr2)
overlappingPeaksToVennTable <- function(.ele){
    .venn <- .ele$venn_cnt
    k <- which(colnames(.venn)=="Counts")
    rownames(.venn) <- apply(.venn[, seq.int(k-1)], 1, paste, collapse="")
    colnames(.venn) <- sub("count.", "", colnames(.venn))
    vennTable(combinations=.venn[, seq.int(k-1)],
              counts=.venn[, k],
              vennCounts=.venn[, seq.int(ncol(.venn))[-seq.int(k)]])
}
venn <- overlappingPeaksToVennTable(ol)
vennPlot(venn)
## or you can simply try vennPlot(vennCount(c(bed, gff)))
upsetPlot(venn, themes = upset_themes_fix)
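The column arithmetic inside overlappingPeaksToVennTable is easier to follow on a small matrix. The sketch below uses made-up numbers and assumes the layout the helper relies on: one membership column per peak set, then a "Counts" column, then one "count.<set>" column per peak set. The object `toy` is hypothetical and does not come from findOverlapsOfPeaks.

```r
# Made-up numbers in the assumed layout: flags, "Counts", per-set counts
toy <- matrix(c(0, 0,   0,   0,   0,
                0, 1,  61,   0,  61,
                1, 0,  62,  62,   0,
                1, 1, 166, 168, 169),
              ncol = 5, byrow = TRUE,
              dimnames = list(NULL,
                              c("gr1", "gr2", "Counts", "count.gr1", "count.gr2")))

k <- which(colnames(toy) == "Counts")            # position of the summary column
flag_cols    <- seq.int(k - 1)                   # membership flags: columns 1..k-1
per_set_cols <- seq.int(ncol(toy))[-seq.int(k)]  # per-set counts: columns after k

# Row names encode the combination, e.g. "11" for peaks present in both sets
rownames(toy) <- apply(toy[, flag_cols], 1, paste, collapse = "")
toy[, per_set_cols]
```

In the last row the per-set counts differ from each other and from "Counts", which is exactly the information a single number per overlap would hide.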
## change the font size of labels and numbers
updated_theme <- ComplexUpset::upset_modify_themes(
    ## get help by vignette('Examples_R', package = 'ComplexUpset')
    list('intersections_matrix'=
             ggplot2::theme(
                 ## font size of label: gr1/gr2
                 axis.text.y=ggplot2::element_text(size=24),
                 ## font size of label `group`
                 axis.title.x=ggplot2::element_text(size=24)),
         'overall_sizes'=
             ggplot2::theme(
                 ## font size of x-axis 0-200
                 axis.text=ggplot2::element_text(size=12),
                 ## font size of x-label `Set size`
                 axis.title=ggplot2::element_text(size=18)),
         'Intersection size'=
             ggplot2::theme(
                 ## font size of y-axis 0-150
                 axis.text=ggplot2::element_text(size=20),
                 ## font size of y-label `Intersection size`
                 axis.title=ggplot2::element_text(size=16)
             ),
         'default'=ggplot2::theme_minimal())
)
updated_theme <- lapply(updated_theme, function(.ele){
    lapply(.ele, function(.e){
        do.call(theme, .e[names(.e) %in% names(formals(theme))])
    })
})
upsetPlot(venn,
          label_all=list(na.rm = TRUE, color = 'gray30', alpha = .7,
                         label.padding = unit(0.1, "lines"),
                         size = 8 # control the font size of the individual numbers
          ),
          base_annotations=list('Intersection size'=
                                    ComplexUpset::intersection_size(
                                        ## font size of counts in the bar-plot
                                        text = list(size=6)
                                    )),
          themes = updated_theme
)
sessionInfo()
R version 4.4.0 beta (2024-04-15 r86425)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] ChIPpeakAnno_3.38.0 ggplot2_3.5.1 GenomicRanges_1.56.0
[4] GenomeInfoDb_1.40.0 IRanges_2.38.0 S4Vectors_0.42.0
[7] BiocGenerics_0.50.0 hicVennDiagram_1.2.0

loaded via a namespace (and not attached):
[1] eulerr_7.0.2 jsonlite_1.8.8
[3] magrittr_2.0.3 GenomicFeatures_1.56.0
[5] farver_2.1.1 rmarkdown_2.26
[7] BiocIO_1.14.0 zlibbioc_1.50.0
[9] ragg_1.3.0 vctrs_0.6.5
[11] multtest_2.60.0 memoise_2.0.1
[13] Rsamtools_2.20.0 RCurl_1.98-1.14
[15] htmltools_0.5.8.1 S4Arrays_1.4.0
[17] progress_1.2.3 lambda.r_1.2.4
[19] curl_5.2.1 ComplexUpset_1.3.3
[21] SparseArray_1.4.0 sass_0.4.9
[23] bslib_0.7.0 htmlwidgets_1.6.4
[25] plyr_1.8.9 httr2_1.0.1
[27] futile.options_1.0.1 cachem_1.0.8
[29] GenomicAlignments_1.40.0 lifecycle_1.0.4
[31] pkgconfig_2.0.3 Matrix_1.7-0
[33] R6_2.5.1 fastmap_1.1.1
[35] GenomeInfoDbData_1.2.12 MatrixGenerics_1.16.0
[37] digest_0.6.35 colorspace_2.1-0
[39] patchwork_1.2.0 AnnotationDbi_1.66.0
[41] regioneR_1.36.0 textshaping_0.3.7
[43] RSQLite_2.3.6 filelock_1.0.3
[45] labeling_0.4.3 fansi_1.0.6
[47] httr_1.4.7 polyclip_1.10-6
[49] abind_1.4-5 compiler_4.4.0
[51] bit64_4.0.5 withr_3.0.0
[53] BiocParallel_1.38.0 DBI_1.2.2
[55] highr_0.10 biomaRt_2.60.0
[57] MASS_7.3-60.2 rappdirs_0.3.3
[59] DelayedArray_0.30.0 rjson_0.2.21
[61] tools_4.4.0 glue_1.7.0
[63] VennDiagram_1.7.3 restfulr_0.0.15
[65] InteractionSet_1.32.0 grid_4.4.0
[67] polylabelr_0.2.0 reshape2_1.4.4
[69] generics_0.1.3 BSgenome_1.72.0
[71] gtable_0.3.5 tidyr_1.3.1
[73] ensembldb_2.28.0 data.table_1.15.4
[75] hms_1.1.3 xml2_1.3.6
[77] utf8_1.2.4 XVector_0.44.0
[79] pillar_1.9.0 stringr_1.5.1
[81] splines_4.4.0 dplyr_1.1.4
[83] BiocFileCache_2.12.0 lattice_0.22-6
[85] survival_3.6-4 rtracklayer_1.64.0
[87] bit_4.0.5 universalmotif_1.22.0
[89] tidyselect_1.2.1 RBGL_1.80.0
[91] Biostrings_2.72.0 knitr_1.46
[93] ProtGenerics_1.36.0 SummarizedExperiment_1.34.0
[95] svglite_2.1.3 futile.logger_1.4.3
[97] xfun_0.43 Biobase_2.64.0
[99] matrixStats_1.3.0 stringi_1.8.3
[101] UCSC.utils_1.0.0 lazyeval_0.2.2
[103] yaml_2.3.8 evaluate_0.23
[105] codetools_0.2-20 tibble_3.2.1
[107] BiocManager_1.30.22 graph_1.82.0
[109] cli_3.6.2 systemfonts_1.0.6
[111] munsell_0.5.1 jquerylib_0.1.4
[113] Rcpp_1.0.12 dbplyr_2.5.0
[115] png_0.1-8 XML_3.99-0.16.1
[117] parallel_4.4.0 blob_1.2.4
[119] prettyunits_1.2.0 AnnotationFilter_1.28.0
[121] bitops_1.0-7 pwalign_1.0.0
[123] scales_1.3.0 purrr_1.0.2
[125] crayon_1.5.2 BiocStyle_2.32.0
[127] rlang_1.1.3 KEGGREST_1.44.0
[129] formatR_1.14