TreeSummarizedExperiment 2.2.0
TreeSummarizedExperiment objects
Multiple TreeSummarizedExperiment (TSE) objects can be combined using rbind or cbind. Here, we create a toy TreeSummarizedExperiment object using makeTSE() (see ?makeTSE). Because the tree in the row/column tree slot is generated randomly by ape::rtree(), set.seed() is used to make the results reproducible.
library(TreeSummarizedExperiment)
set.seed(1)
# TSE: without the column tree
(tse_a <- makeTSE(include.colTree = FALSE))
## class: TreeSummarizedExperiment
## dim: 10 4
## metadata(0):
## assays(1): ''
## rownames(10): entity1 entity2 ... entity9 entity10
## rowData names(2): var1 var2
## colnames(4): sample1 sample2 sample3 sample4
## colData names(2): ID group
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## rowLinks: a LinkDataFrame (10 rows)
## rowTree: 1 phylo tree(s) (10 leaves)
## colLinks: NULL
## colTree: NULL
# combine two TSEs by row
(tse_aa <- rbind(tse_a, tse_a))
## class: TreeSummarizedExperiment
## dim: 20 4
## metadata(0):
## assays(1): ''
## rownames(20): entity1 entity2 ... entity9 entity10
## rowData names(2): var1 var2
## colnames(4): sample1 sample2 sample3 sample4
## colData names(2): ID group
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## rowLinks: a LinkDataFrame (20 rows)
## rowTree: 1 phylo tree(s) (10 leaves)
## colLinks: NULL
## colTree: NULL
The resulting tse_aa has 20 rows, twice as many as tse_a. The row tree in tse_aa is the same as that in tse_a.
identical(rowTree(tse_aa), rowTree(tse_a))
## [1] TRUE
If we rbind two TSEs (e.g., tse_a and tse_b) that have different row trees, the resulting TSE (e.g., tse_ab) will have two row trees.
set.seed(2)
tse_b <- makeTSE(include.colTree = FALSE)
# different row trees
identical(rowTree(tse_a), rowTree(tse_b))
## [1] FALSE
# 2 phylo tree(s) in rowTree
(tse_ab <- rbind(tse_a, tse_b))
## class: TreeSummarizedExperiment
## dim: 20 4
## metadata(0):
## assays(1): ''
## rownames(20): entity1 entity2 ... entity9 entity10
## rowData names(2): var1 var2
## colnames(4): sample1 sample2 sample3 sample4
## colData names(2): ID group
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## rowLinks: a LinkDataFrame (20 rows)
## rowTree: 2 phylo tree(s) (20 leaves)
## colLinks: NULL
## colTree: NULL
In the row link data, the whichTree column indicates which tree each row is mapped to. For tse_aa, there is only one tree, named phylo. For tse_ab, however, there are two trees (phylo and phylo.1).
rowLinks(tse_aa)
## LinkDataFrame with 20 rows and 5 columns
## nodeLab nodeLab_alias nodeNum isLeaf whichTree
## <character> <character> <integer> <logical> <character>
## entity1 entity1 alias_1 1 TRUE phylo
## entity2 entity2 alias_2 2 TRUE phylo
## entity3 entity3 alias_3 3 TRUE phylo
## entity4 entity4 alias_4 4 TRUE phylo
## entity5 entity5 alias_5 5 TRUE phylo
## ... ... ... ... ... ...
## entity6 entity6 alias_6 6 TRUE phylo
## entity7 entity7 alias_7 7 TRUE phylo
## entity8 entity8 alias_8 8 TRUE phylo
## entity9 entity9 alias_9 9 TRUE phylo
## entity10 entity10 alias_10 10 TRUE phylo
rowLinks(tse_ab)
## LinkDataFrame with 20 rows and 5 columns
## nodeLab nodeLab_alias nodeNum isLeaf whichTree
## <character> <character> <integer> <logical> <character>
## entity1 entity1 alias_1 1 TRUE phylo
## entity2 entity2 alias_2 2 TRUE phylo
## entity3 entity3 alias_3 3 TRUE phylo
## entity4 entity4 alias_4 4 TRUE phylo
## entity5 entity5 alias_5 5 TRUE phylo
## ... ... ... ... ... ...
## entity6 entity6 alias_6 6 TRUE phylo.1
## entity7 entity7 alias_7 7 TRUE phylo.1
## entity8 entity8 alias_8 8 TRUE phylo.1
## entity9 entity9 alias_9 9 TRUE phylo.1
## entity10 entity10 alias_10 10 TRUE phylo.1
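Because the printed link data is truncated, it can help to tabulate the whichTree column to see how many rows are mapped to each tree. The following is a minimal sketch that rebuilds the toy objects so it runs on its own; it assumes the whichTree column can be extracted from the link data with `$`, as for an ordinary DataFrame, and that rowTree() returns a list of all trees when whichTree = NULL.

```r
library(TreeSummarizedExperiment)

# Rebuild the toy objects used above
set.seed(1)
tse_a <- makeTSE(include.colTree = FALSE)
set.seed(2)
tse_b <- makeTSE(include.colTree = FALSE)
tse_ab <- rbind(tse_a, tse_b)

# Number of rows mapped to each row tree
rows_per_tree <- table(rowLinks(tse_ab)$whichTree)
rows_per_tree

# Number of row trees stored in the object
n_trees <- length(rowTree(tse_ab, whichTree = NULL))
n_trees
```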
The names of the trees can be accessed using rowTreeNames. If the input TSEs use the same name for their trees, rbind automatically creates valid and unique tree names using make.names. Both tse_a and tse_b use phylo as the name of their row tree. In tse_ab, the row tree that originates from tse_b is named phylo.1 instead.
rowTreeNames(tse_aa)
## [1] "phylo"
rowTreeNames(tse_ab)
## [1] "phylo" "phylo.1"
# The original tree names in the input TSEs
rowTreeNames(tse_a)
## [1] "phylo"
rowTreeNames(tse_b)
## [1] "phylo"
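The renaming can be illustrated with base R alone. This is only a sketch of the behaviour described above, not the internal code of rbind; it assumes that duplicated names are resolved the way make.names() does with unique = TRUE.

```r
# Two trees that share the name "phylo"
tree_names <- c("phylo", "phylo")

# make.names() keeps the first name and appends a suffix to the duplicate
unique_names <- make.names(tree_names, unique = TRUE)
unique_names
## [1] "phylo"   "phylo.1"
```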
Once the tree names are changed, the whichTree column in rowLinks() is updated accordingly.
rowTreeNames(tse_ab) <- paste0("tree", 1:2)
rowLinks(tse_ab)
## LinkDataFrame with 20 rows and 5 columns
## nodeLab nodeLab_alias nodeNum isLeaf whichTree
## <character> <character> <integer> <logical> <character>
## entity1 entity1 alias_1 1 TRUE tree1
## entity2 entity2 alias_2 2 TRUE tree1
## entity3 entity3 alias_3 3 TRUE tree1
## entity4 entity4 alias_4 4 TRUE tree1
## entity5 entity5 alias_5 5 TRUE tree1
## ... ... ... ... ... ...
## entity6 entity6 alias_6 6 TRUE tree2
## entity7 entity7 alias_7 7 TRUE tree2
## entity8 entity8 alias_8 8 TRUE tree2
## entity9 entity9 alias_9 9 TRUE tree2
## entity10 entity10 alias_10 10 TRUE tree2
To run cbind, the TSEs should agree in the row dimension. If the TSEs differ only in the row tree, the row tree and the row link data are dropped, and a warning is issued.
cbind(tse_a, tse_a)
## class: TreeSummarizedExperiment
## dim: 10 8
## metadata(0):
## assays(1): ''
## rownames(10): entity1 entity2 ... entity9 entity10
## rowData names(2): var1 var2
## colnames(8): sample1 sample2 ... sample3 sample4
## colData names(2): ID group
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## rowLinks: a LinkDataFrame (10 rows)
## rowTree: 1 phylo tree(s) (10 leaves)
## colLinks: NULL
## colTree: NULL
cbind(tse_a, tse_b)
## Warning in cbind(...): rowTree & rowLinks differ in the provided TSEs.
## rowTree & rowLinks are dropped after 'cbind'
## class: TreeSummarizedExperiment
## dim: 10 8
## metadata(0):
## assays(1): ''
## rownames(10): entity1 entity2 ... entity9 entity10
## rowData names(2): var1 var2
## colnames(8): sample1 sample2 ... sample3 sample4
## colData names(2): ID group
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## rowLinks: NULL
## rowTree: NULL
## colLinks: NULL
## colTree: NULL
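To avoid losing the row tree silently in a longer script, one option is to compare the row trees and row links before calling cbind. The sketch below is one possible check, not part of the package; the variable same_row_tree is introduced here only for illustration.

```r
library(TreeSummarizedExperiment)

# Rebuild the toy objects used above
set.seed(1)
tse_a <- makeTSE(include.colTree = FALSE)
set.seed(2)
tse_b <- makeTSE(include.colTree = FALSE)

# TRUE only if both the row tree and the row links agree
same_row_tree <- identical(rowTree(tse_a), rowTree(tse_b)) &&
    identical(rowLinks(tse_a), rowLinks(tse_b))

if (same_row_tree) {
    tse_cb <- cbind(tse_a, tse_b)
} else {
    message("Row trees differ: cbind() would drop rowTree and rowLinks.")
}
```

If the check fails, the objects can still be combined with cbind, but the row tree then has to be added again afterwards.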
We obtain a subset of tse_ab by extracting the data on rows 11:15. These rows are all mapped to the same tree, which is named tree2 (originally phylo.1) after the renaming above. So, the rowTree slot of sse has only one tree.
(sse <- tse_ab[11:15, ])
## class: TreeSummarizedExperiment
## dim: 5 4
## metadata(0):
## assays(1): ''
## rownames(5): entity1 entity2 entity3 entity4 entity5
## rowData names(2): var1 var2
## colnames(4): sample1 sample2 sample3 sample4
## colData names(2): ID group
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## rowLinks: a LinkDataFrame (5 rows)
## rowTree: 1 phylo tree(s) (10 leaves)
## colLinks: NULL
## colTree: NULL
rowLinks(sse)
## LinkDataFrame with 5 rows and 5 columns
## nodeLab nodeLab_alias nodeNum isLeaf whichTree
## <character> <character> <integer> <logical> <character>
## entity1 entity1 alias_1 1 TRUE tree2
## entity2 entity2 alias_2 2 TRUE tree2
## entity3 entity3 alias_3 3 TRUE tree2
## entity4 entity4 alias_4 4 TRUE tree2
## entity5 entity5 alias_5 5 TRUE tree2
The [ operator works not only as a getter but also as a setter, to replace a subset of sse.
set.seed(3)
tse_c <- makeTSE(include.colTree = FALSE)
rowTreeNames(tse_c) <- "new_tree"
# the first two rows are from tse_c, and are mapped to 'new_tree'
sse[1:2, ] <- tse_c[5:6, ]
rowLinks(sse)
## LinkDataFrame with 5 rows and 5 columns
## nodeLab nodeLab_alias nodeNum isLeaf whichTree
## <character> <character> <integer> <logical> <character>
## entity5 entity5 alias_5 5 TRUE new_tree
## entity6 entity6 alias_6 6 TRUE new_tree
## entity3 entity3 alias_3 3 TRUE tree2
## entity4 entity4 alias_4 4 TRUE tree2
## entity5 entity5 alias_5 5 TRUE tree2
The TSE object can also be subset by nodes and/or trees using subsetByNode.
# by tree
sse_a <- subsetByNode(x = sse, whichRowTree = "new_tree")
rowLinks(sse_a)
## LinkDataFrame with 2 rows and 5 columns
## nodeLab nodeLab_alias nodeNum isLeaf whichTree
## <character> <character> <integer> <logical> <character>
## entity5 entity5 alias_5 5 TRUE new_tree
## entity6 entity6 alias_6 6 TRUE new_tree
# by node
sse_b <- subsetByNode(x = sse, rowNode = 5)
rowLinks(sse_b)
## LinkDataFrame with 2 rows and 5 columns
## nodeLab nodeLab_alias nodeNum isLeaf whichTree
## <character> <character> <integer> <logical> <character>
## entity5 entity5 alias_5 5 TRUE new_tree
## entity5 entity5 alias_5 5 TRUE tree2
# by tree and node
sse_c <- subsetByNode(x = sse, rowNode = 5, whichRowTree = "tree2")
rowLinks(sse_c)
## LinkDataFrame with 1 row and 5 columns
## nodeLab nodeLab_alias nodeNum isLeaf whichTree
## <character> <character> <integer> <logical> <character>
## entity5 entity5 alias_5 5 TRUE tree2
Using colTree, we can add a column tree to sse, which did not have one before.
colTree(sse)
## NULL
library(ape)
set.seed(1)
col_tree <- rtree(ncol(sse))
# To use 'colTree' as a setter, the input tree should have node labels that
# match the column names of the TSE.
col_tree$tip.label <- colnames(sse)
colTree(sse) <- col_tree
colTree(sse)
##
## Phylogenetic tree with 4 tips and 3 internal nodes.
##
## Tip labels:
## sample1, sample2, sample3, sample4
##
## Rooted; includes branch lengths.
sse has two row trees. We can replace one of them with a new tree by specifying whichTree in rowTree.
# the original row links
rowLinks(sse)
## LinkDataFrame with 5 rows and 5 columns
## nodeLab nodeLab_alias nodeNum isLeaf whichTree
## <character> <character> <integer> <logical> <character>
## entity5 entity5 alias_5 5 TRUE new_tree
## entity6 entity6 alias_6 6 TRUE new_tree
## entity3 entity3 alias_3 3 TRUE tree2
## entity4 entity4 alias_4 4 TRUE tree2
## entity5 entity5 alias_5 5 TRUE tree2
# the new row tree
set.seed(1)
row_tree <- rtree(4)
row_tree$tip.label <- paste0("entity", 5:8)
# replace the tree named as the 'new_tree'
nse <- sse
rowTree(nse, whichTree = "new_tree") <- row_tree
rowLinks(nse)
## LinkDataFrame with 5 rows and 5 columns
## nodeLab nodeLab_alias nodeNum isLeaf whichTree
## <character> <character> <integer> <logical> <character>
## entity5 entity5 alias_1 1 TRUE new_tree
## entity6 entity6 alias_2 2 TRUE new_tree
## entity3 entity3 alias_3 3 TRUE tree2
## entity4 entity4 alias_4 4 TRUE tree2
## entity5 entity5 alias_5 5 TRUE tree2
In the row links, the first two rows now have new values in nodeNum and nodeLab_alias. The name in whichTree is unchanged, but the tree itself has been updated.
# FALSE is expected
identical(rowTree(sse, whichTree = "new_tree"),
rowTree(nse, whichTree = "new_tree"))
## [1] FALSE
# TRUE is expected
identical(rowTree(nse, whichTree = "new_tree"),
row_tree)
## [1] TRUE
If the nodes of the input tree and the rows of the TSE are named differently, users can match rows with nodes via changeTree by providing rowNodeLab.
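As a rough sketch of that case, the example below replaces the row tree of a toy TSE with a tree whose tip labels (taxon1, taxon2, ...) do not match the row names (entity1, entity2, ...). The tree, the labels, and the one-to-one mapping in node_lab are hypothetical and chosen only for illustration; the call assumes that rowNodeLab takes one node label per row of the TSE.

```r
library(TreeSummarizedExperiment)
library(ape)

set.seed(1)
tse <- makeTSE(include.colTree = FALSE)

# A new tree whose tip labels differ from the row names of the TSE
set.seed(2)
new_tree <- rtree(nrow(tse))
new_tree$tip.label <- paste0("taxon", seq_len(nrow(tse)))

# Hypothetical mapping: row 'entityN' corresponds to tip 'taxonN'
node_lab <- sub("entity", "taxon", rownames(tse))

# Replace the row tree and state which node each row belongs to
tse_new <- changeTree(x = tse, rowTree = new_tree, rowNodeLab = node_lab)
rowLinks(tse_new)
```

The row names of tse_new stay as they were; only the link data records the node on the new tree that each row is mapped to.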
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ggplot2_3.3.5 ggtree_3.2.0
## [3] ape_5.5 TreeSummarizedExperiment_2.2.0
## [5] Biostrings_2.62.0 XVector_0.34.0
## [7] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
## [9] Biobase_2.54.0 GenomicRanges_1.46.0
## [11] GenomeInfoDb_1.30.0 IRanges_2.28.0
## [13] S4Vectors_0.32.0 BiocGenerics_0.40.0
## [15] MatrixGenerics_1.6.0 matrixStats_0.61.0
## [17] BiocStyle_2.22.0
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.0 tidyr_1.1.4 jsonlite_1.7.2
## [4] bslib_0.3.1 assertthat_0.2.1 BiocManager_1.30.16
## [7] highr_0.9 yulab.utils_0.0.4 GenomeInfoDbData_1.2.7
## [10] yaml_2.2.1 pillar_1.6.4 lattice_0.20-45
## [13] glue_1.4.2 digest_0.6.28 colorspace_2.0-2
## [16] ggfun_0.0.4 htmltools_0.5.2 Matrix_1.3-4
## [19] pkgconfig_2.0.3 magick_2.7.3 bookdown_0.24
## [22] zlibbioc_1.40.0 purrr_0.3.4 patchwork_1.1.1
## [25] tidytree_0.3.5 scales_1.1.1 ggplotify_0.1.0
## [28] BiocParallel_1.28.0 tibble_3.1.5 farver_2.1.0
## [31] generics_0.1.1 ellipsis_0.3.2 withr_2.4.2
## [34] lazyeval_0.2.2 magrittr_2.0.1 crayon_1.4.1
## [37] evaluate_0.14 fansi_0.5.0 nlme_3.1-153
## [40] tools_4.1.1 lifecycle_1.0.1 stringr_1.4.0
## [43] aplot_0.1.1 munsell_0.5.0 DelayedArray_0.20.0
## [46] compiler_4.1.1 jquerylib_0.1.4 gridGraphics_0.5-1
## [49] rlang_0.4.12 grid_4.1.1 RCurl_1.98-1.5
## [52] labeling_0.4.2 bitops_1.0-7 rmarkdown_2.11
## [55] gtable_0.3.0 DBI_1.1.1 R6_2.5.1
## [58] knitr_1.36 dplyr_1.0.7 fastmap_1.1.0
## [61] utf8_1.2.2 treeio_1.18.0 stringi_1.7.5
## [64] parallel_4.1.1 Rcpp_1.0.7 vctrs_0.3.8
## [67] tidyselect_1.1.1 xfun_0.27