SEtools 1.0.0
The SEtools package is a set of convenience functions for the Bioconductor class SummarizedExperiment. It facilitates merging, melting, and plotting SummarizedExperiment
objects.
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("SEtools")
Or, until the new bioconductor release:
devtools::install_github("plger/SEtools")
To showcase the main functions, we will use an example object which contains (a subset of) whole-hippocampus RNAseq of mice after different stressors:
suppressPackageStartupMessages({
library(SummarizedExperiment)
library(SEtools)
})
data("SE", package="SEtools")
SE
## class: SummarizedExperiment
## dim: 100 20
## metadata(0):
## assays(2): counts logcpm
## rownames(100): Egr1 Nr4a1 ... CH36-200G6.4 Bhlhe22
## rowData names(2): meanCPM meanTPM
## colnames(20): HC.Homecage.1 HC.Homecage.2 ... HC.Swim.4 HC.Swim.5
## colData names(2): Region Condition
This is taken from Floriou-Servou et al., Biol Psychiatry 2018.
The sehm
function simplifies the generation of heatmaps from SummarizedExperiment
. It uses pheatmap, so any argument supported by it can in principle be passed:
g <- c("Egr1", "Nr4a1", "Fos", "Egr2", "Sgk1", "Arc", "Dusp1", "Fosb", "Sik1")
sehm(SE, genes=g)
## Using assay logcpm
sehm(SE, genes=g, scale="row")
## Using assay logcpm
Annotation from the object’s rowData
and colData
can be plotted simply by specifying the column name (some will be shown by default if found):
sehm(SE, genes=g, scale="row", anno_rows="meanTPM")
## Using assay logcpm
These can also be used to create gaps:
sehm(SE, genes=g, scale="row", anno_rows="meanTPM", gaps_at="Condition")
## Using assay logcpm
The specific assay to use for plotting can be specified with the assayName
argument.
By default, rows are sorted not with hierarchical clustering, but from the angle on a MDS plot, which tends to give nicer results than bottom-up hierarchical clustering. This can be disabled using sortRowsOn=NULL
or cluster_rows=TRUE
(to avoid any row reordering and use the order given, use sortRowsOn=NULL, cluster_rows=FALSE
). Column clustering is disabled by default, but this can be changed with cluster_cols=TRUE
.
It is common to cluster features into groups, and such a clustering can be used simultaneously with row sorting using the toporder
argument. For instance:
lfcs <- assays(SE)$logcpm-rowMeans(assays(SE)$logcpm[,which(SE$Condition=="Homecage")])
rowData(SE)$cluster <- as.character(kmeans(lfcs,4)$cluster)
sehm(SE, scale="row", anno_rows="cluster", toporder="cluster", gaps_at="Condition")
## Using assay logcpm
Heatmaps from multiple SE can be created either by merging the objects (see below), or using the crossHm
function, which uses the ComplexHeatmap pacakge:
crossHm( list(se1=SE, se2=SE), g,
anno_colors = list( Condition=c( Homecage="green",
Handling="orange",
Restraint="red",
Swim="blue")
)
)
## Using assay logcpm
## Using assay logcpm
For some arguments (for instance colors), if they are not specified in the function call, SEtools
will try to see whether the corresponding global options have been set, before using default colors. This means that if, in the context of a given project, the same colors are repeatedly being used, they can be specified a single time, and all subsequent plots will be affected:
options("SEtools_def_hmcols"=c("white","grey","black"))
ancols <- list( Condition=c( Homecage="green",
Handling="orange",
Restraint="red",
Swim="blue" ) )
options("SEtools_def_anno_colors"=ancols)
sehm(SE, g, do.scale = TRUE)
## Using assay logcpm
At the moment, the following arguments can be set as global options:
assayName
, hmcols
, anno_columns
, anno_rows
, anno_colors
, gaps_at
, breaks
.
Options must be set with the prefix SEtools_def_
, followed by the name of the argument.
You may use resetAllSEtoolsOptions()
to remove all global options relative to the package.
se1 <- SE[,1:10]
se2 <- SE[,11:20]
se3 <- mergeSEs( list(se1=se1, se2=se2) )
se3
## class: SummarizedExperiment
## dim: 100 20
## metadata(0):
## assays(2): counts logcpm
## rownames(100): AC139063.2 Actr6 ... Zfp667 Zfp930
## rowData names(3): meanCPM meanTPM cluster
## colnames(20): se1.HC.Homecage.1 se1.HC.Homecage.2 ... se2.HC.Swim.4
## se2.HC.Swim.5
## colData names(3): Dataset Region Condition
All assays were merged, along with rowData and colData slots.
By default, row z-scores are calculated for each object when merging. This can be prevented with:
se3 <- mergeSEs( list(se1=se1, se2=se2), do.scale=FALSE)
If more than one assay is present, one can specify a different scaling behavior for each assay:
se3 <- mergeSEs( list(se1=se1, se2=se2), use.assays=c("counts", "logcpm"), do.scale=c(FALSE, TRUE))
To facilitate plotting features with ggplot2, the meltSE
function combines assay values along with row/column data:
d <- meltSE(SE, genes=g[1:4])
head(d)
## feature sample Region Condition counts logcpm
## 1 Egr1 HC.Homecage.1 HC Homecage 1581.0 4.4284969
## 2 Nr4a1 HC.Homecage.1 HC Homecage 750.0 3.6958917
## 3 Fos HC.Homecage.1 HC Homecage 91.4 1.7556317
## 4 Egr2 HC.Homecage.1 HC Homecage 15.1 0.5826999
## 5 Egr1 HC.Homecage.2 HC Homecage 1423.0 4.4415828
## 6 Nr4a1 HC.Homecage.2 HC Homecage 841.0 3.9237691
suppressPackageStartupMessages(library(ggplot2))
ggplot(d, aes(Condition, counts)) + geom_violin() + facet_wrap(~feature, scale="free")
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.10-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.10-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] ggplot2_3.2.1 SEtools_1.0.0
## [3] SummarizedExperiment_1.16.0 DelayedArray_0.12.0
## [5] BiocParallel_1.20.0 matrixStats_0.55.0
## [7] Biobase_2.46.0 GenomicRanges_1.38.0
## [9] GenomeInfoDb_1.22.0 IRanges_2.20.0
## [11] S4Vectors_0.24.0 BiocGenerics_0.32.0
## [13] BiocStyle_2.14.0
##
## loaded via a namespace (and not attached):
## [1] viridis_0.5.1 viridisLite_0.3.0 foreach_1.4.7
## [4] gtools_3.8.1 assertthat_0.2.1 highr_0.8
## [7] BiocManager_1.30.9 GenomeInfoDbData_1.2.2 yaml_2.2.0
## [10] pillar_1.4.2 lattice_0.20-38 glue_1.3.1
## [13] digest_0.6.22 RColorBrewer_1.1-2 XVector_0.26.0
## [16] colorspace_1.4-1 htmltools_0.4.0 Matrix_1.2-17
## [19] pkgconfig_2.0.3 pheatmap_1.0.12 GetoptLong_0.1.7
## [22] bookdown_0.14 zlibbioc_1.32.0 purrr_0.3.3
## [25] scales_1.0.0 gdata_2.18.0 tibble_2.1.3
## [28] withr_2.1.2 lazyeval_0.2.2 magrittr_1.5
## [31] crayon_1.3.4 evaluate_0.14 MASS_7.3-51.4
## [34] gplots_3.0.1.1 tools_3.6.1 registry_0.5-1
## [37] data.table_1.12.6 GlobalOptions_0.1.1 ComplexHeatmap_2.2.0
## [40] stringr_1.4.0 munsell_0.5.0 cluster_2.1.0
## [43] compiler_3.6.1 caTools_1.17.1.2 rlang_0.4.1
## [46] grid_3.6.1 RCurl_1.95-4.12 iterators_1.0.12
## [49] rjson_0.2.20 circlize_0.4.8 labeling_0.3
## [52] bitops_1.0-6 rmarkdown_1.16 gtable_0.3.0
## [55] codetools_0.2-16 TSP_1.1-7 R6_2.4.0
## [58] gridExtra_2.3 seriation_1.2-8 knitr_1.25
## [61] dplyr_0.8.3 clue_0.3-57 KernSmooth_2.23-16
## [64] dendextend_1.12.0 shape_1.4.4 stringi_1.4.3
## [67] Rcpp_1.0.2 png_0.1-7 gclus_1.3.2
## [70] tidyselect_0.2.5 xfun_0.10