TreeSummarizedExperiment 1.2.0
The TreeSummarizedExperiment class is an extension of the SingleCellExperiment class (Lun and Risso 2019). It stores rectangular data of experimental results, as a SingleCellExperiment does, and additionally supports the storage of a hierarchical structure and its link information to the rectangular data.
Compared with the SingleCellExperiment class, TreeSummarizedExperiment has four more slots.
rowTree: the hierarchical structure on the rows of the assays tables.
rowLinks: the link between the rows of the assays tables and the rowTree.
colTree: the hierarchical structure on the columns of the assays tables.
colLinks: the link information between the columns of the assays tables and the colTree.
The rowTree and colTree can be empty (NULL) if no trees are available; correspondingly, rowLinks and colLinks are then NULL. All the other slots in TreeSummarizedExperiment are inherited from SingleCellExperiment.
The slots rowTree and colTree only accept tree data of the phylo class. If the tree is available in another format, it needs to be converted to phylo with other R packages. For example, the package treeio provides 12 functions to import different tree formats and output objects whose slot phylo holds the tree as a phylo object.
suppressPackageStartupMessages({
library(TreeSummarizedExperiment)
library(S4Vectors)
library(ggtree)
library(ape)})
We generate assay_data, a table of observations of 5 entities collected from 4 samples.
# assays data
assay_data <- rbind(rep(0, 4), matrix(1:16, nrow = 4))
colnames(assay_data) <- paste(rep(LETTERS[1:2], each = 2),
rep(1:2, 2), sep = "_")
rownames(assay_data) <- paste("entity", seq_len(5), sep = "")
assay_data
## A_1 A_2 B_1 B_2
## entity1 0 0 0 0
## entity2 1 5 9 13
## entity3 2 6 10 14
## entity4 3 7 11 15
## entity5 4 8 12 16
The descriptions of the 5 entities and the 4 samples are given in row_data and col_data, respectively.
# row data
row_data <- DataFrame(var1 = sample(letters[1:2], 5, replace = TRUE),
var2 = sample(c(TRUE, FALSE), 5, replace = TRUE),
row.names = rownames(assay_data))
row_data
## DataFrame with 5 rows and 2 columns
## var1 var2
## <character> <logical>
## entity1 a TRUE
## entity2 a FALSE
## entity3 b TRUE
## entity4 a FALSE
## entity5 a FALSE
# column data
col_data <- DataFrame(gg = c(1, 2, 3, 3),
group = rep(LETTERS[1:2], each = 2),
row.names = colnames(assay_data))
col_data
## DataFrame with 4 rows and 2 columns
## gg group
## <numeric> <character>
## A_1 1 A
## A_2 2 A
## B_1 3 B
## B_2 3 B
The hierarchical structure of the 5 entities is denoted as row_tree, and that of the 4 samples as col_tree. We create both with the function rtree from the package ape.
# Toy tree 1
set.seed(1)
row_tree <- rtree(5)
class(row_tree)
## [1] "phylo"
# Toy tree 2
set.seed(4)
col_tree <- rtree(4)
col_tree$tip.label <- colnames(assay_data)
col_tree$node.label <- c("All", "GroupA", "GroupB")
The created trees are phylo objects. A phylo object is actually a list; the one created here has four elements: edge, tip.label, edge.length, and Nnode.
class(row_tree)
## [1] "phylo"
str(row_tree)
## List of 4
## $ edge : int [1:8, 1:2] 6 6 7 8 8 9 9 7 1 7 ...
## $ tip.label : chr [1:5] "t2" "t1" "t3" "t4" ...
## $ edge.length: num [1:8] 0.0618 0.206 0.1766 0.687 0.3841 ...
## $ Nnode : int 4
## - attr(*, "class")= chr "phylo"
## - attr(*, "order")= chr "cladewise"
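To make this list structure concrete, here is a minimal sketch that builds a three-leaf tree by hand in base R instead of calling rtree. It assumes the numbering convention used by ape: leaves are numbered 1 to n, the root is n + 1, and the remaining internal nodes follow. The tree, its labels, and its branch lengths are hypothetical.

```r
# A hypothetical three-leaf tree built by hand (no ape functions needed).
# Leaves are nodes 1..3, the root is node 4, and node 5 is an internal node.
toy <- list(
  # each row of 'edge' is one branch: parent node, child node
  edge = matrix(c(4L, 1L,
                  4L, 5L,
                  5L, 2L,
                  5L, 3L), ncol = 2, byrow = TRUE),
  tip.label = c("t1", "t2", "t3"),
  # one branch length per row of 'edge'
  edge.length = c(1, 0.5, 0.5, 0.5),
  # number of internal nodes (the root and node 5)
  Nnode = 2L
)
class(toy) <- "phylo"
str(unclass(toy))
```

The same four elements appear in the str() output of row_tree above; rtree simply generates them at random.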
The package ggtree (Yu et al. 2017) is used to visualize the trees. The node labels are shown in orange and the node numbers in blue. The row_tree has no labels for its internal nodes.
# Visualize the row tree
ggtree(row_tree, size = 2) +
geom_text2(aes(label = node), color = "darkblue",
hjust = -0.5, vjust = 0.7, size = 6) +
geom_text2(aes(label = label), color = "darkorange",
hjust = -0.1, vjust = -0.7, size = 6)
The col_tree has labels for internal nodes.
# Visualize the column tree
ggtree(col_tree, size = 2) +
geom_text2(aes(label = node), color = "darkblue",
hjust = -0.5, vjust = 0.7, size = 6) +
geom_text2(aes(label = label), color = "darkorange",
hjust = -0.1, vjust = -0.7, size = 6)
TreeSummarizedExperiment
The TreeSummarizedExperiment class is used to store the toy data: assay_data, row_data, col_data, col_tree and row_tree. To store the data correctly, the link information between the rows (or columns) of assay_data and the nodes of row_tree (or col_tree) must be provided via a character vector rowNodeLab (or colNodeLab). Rows or columns that don't match any node of the tree are removed with warnings. The link data between the assays tables and the tree data is generated automatically during construction.
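Conceptually, each entry of rowNodeLab is looked up among the node labels of the tree, and entries that are not found cannot be linked. The base-R sketch below illustrates that idea with match(); it is not the package's implementation, and the helper name link_nodes() and the toy labels are hypothetical. It relies on the ape convention that node number i is a leaf for i <= n (n leaves) and is the internal node labelled node.label[i - n] otherwise.

```r
# Hypothetical helper: look up labels among the labels of a tree.
# 'tip_label' and 'node_label' play the roles of tree$tip.label and
# tree$node.label of a phylo object.
link_nodes <- function(lab, tip_label, node_label = NULL) {
  all_lab <- c(tip_label, node_label)
  node_num <- match(lab, all_lab)   # NA where no node carries the label
  if (anyNA(node_num)) {
    warning("dropping labels not found in the tree: ",
            paste(lab[is.na(node_num)], collapse = ", "))
  }
  keep <- !is.na(node_num)
  data.frame(nodeLab = lab[keep],
             nodeNum = node_num[keep],
             isLeaf  = node_num[keep] <= length(tip_label),
             stringsAsFactors = FALSE)
}

# toy example: "t9" is not a node of the tree and is dropped with a warning
links <- suppressWarnings(
  link_nodes(c("t2", "t3", "t9", "All"),
             tip_label = c("t1", "t2", "t3"),
             node_label = c("All", "Sub"))
)
links
```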
The example below constructs a TreeSummarizedExperiment without a column tree.
# provide the node labels in rowNodeLab
node_lab <- row_tree$tip.label
row_tse <- TreeSummarizedExperiment(assays = list(assay_data),
rowData = row_data,
colData = col_data,
rowTree = row_tree,
rowNodeLab = node_lab)
When printing row_tse, we see a summary similar to that of a SingleCellExperiment, with four additional lines about rowLinks, rowTree, colLinks and colTree. Here, row_tse stores a row tree (a phylo object), and rowLinks has 5 rows, exactly the number of rows in the assays tables. More details about the link data can be found in Section 2.4.2.
row_tse
## class: TreeSummarizedExperiment
## dim: 5 4
## metadata(0):
## assays(1): ''
## rownames(5): entity1 entity2 entity3 entity4 entity5
## rowData names(2): var1 var2
## colnames(4): A_1 A_2 B_1 B_2
## colData names(2): gg group
## reducedDimNames(0):
## spikeNames(0):
## altExpNames(0):
## rowLinks: a LinkDataFrame (5 rows)
## rowTree: a phylo (5 leaves)
## colLinks: NULL
## colTree: NULL
If both the row tree and the column tree are available, the TreeSummarizedExperiment can be constructed similarly, as below. Here, the column names of the assays table match the node labels used in the column tree, so we can omit colNodeLab.
all(colnames(assay_data) %in% c(col_tree$tip.label, col_tree$node.label))
## [1] TRUE
both_tse <- TreeSummarizedExperiment(assays = list(assay_data),
rowData = row_data,
colData = col_data,
rowTree = row_tree,
rowNodeLab = node_lab,
colTree = col_tree)
Compared to row_tse, both_tse also includes a column tree. The column link data (colLinks) with 4 rows is generated automatically; the number of rows in the link data is determined by the column dimension of the assays tables.
both_tse
## class: TreeSummarizedExperiment
## dim: 5 4
## metadata(0):
## assays(1): ''
## rownames(5): entity1 entity2 entity3 entity4 entity5
## rowData names(2): var1 var2
## colnames(4): A_1 A_2 B_1 B_2
## colData names(2): gg group
## reducedDimNames(0):
## spikeNames(0):
## altExpNames(0):
## rowLinks: a LinkDataFrame (5 rows)
## rowTree: a phylo (5 leaves)
## colLinks: a LinkDataFrame (4 rows)
## colTree: a phylo (4 leaves)
For slots inherited from the SingleCellExperiment class, the accessors are exactly the same as in SingleCellExperiment.
# to get the first table in the assays
(count <- assays(both_tse)[[1]])
## A_1 A_2 B_1 B_2
## entity1 0 0 0 0
## entity2 1 5 9 13
## entity3 2 6 10 14
## entity4 3 7 11 15
## entity5 4 8 12 16
# to get row data
rowData(both_tse)
## DataFrame with 5 rows and 2 columns
## var1 var2
## <character> <logical>
## entity1 a TRUE
## entity2 a FALSE
## entity3 b TRUE
## entity4 a FALSE
## entity5 a FALSE
# to get column data
colData(both_tse)
## DataFrame with 4 rows and 2 columns
## gg group
## <numeric> <character>
## A_1 1 A
## A_2 2 A
## B_1 3 B
## B_2 3 B
# to get metadata: it's empty here
metadata(both_tse)
## list()
The row link and column link can be accessed via rowLinks and colLinks, respectively. The output is a LinkDataFrame object. The LinkDataFrame class extends the DataFrame class with the restriction that it has at least four columns: nodeLab, nodeLab_alias, nodeNum, and isLeaf. More details about the DataFrame class can be found in the S4Vectors package.
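The four required columns can be mimicked with an ordinary data.frame, which may help when reading the rowLinks output below. This sketch assumes, based on the printed rowLinks outputs in this vignette, that nodeLab_alias has the form alias_<nodeNum> and that isLeaf is TRUE exactly for node numbers up to the number of leaves. The helper toy_links() and the toy labels are hypothetical, and the result is a plain data.frame, not a real LinkDataFrame.

```r
# Hypothetical helper: a plain data.frame with the four LinkDataFrame columns
toy_links <- function(node_num, all_labels, n_leaf) {
  data.frame(nodeLab       = all_labels[node_num],
             nodeLab_alias = paste0("alias_", node_num),
             nodeNum       = node_num,
             isLeaf        = node_num <= n_leaf,
             stringsAsFactors = FALSE)
}

# labels of a toy tree: 3 leaves followed by 2 internal nodes
all_labels <- c("t1", "t2", "t3", "All", "Sub")
lk <- toy_links(node_num = c(1L, 3L, 5L), all_labels = all_labels, n_leaf = 3)
lk
```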
When a phylo tree is available in rowTree, rowLinks holds a LinkDataFrame object whose number of rows matches the number of rows of the assays tables.
(rLink <- rowLinks(both_tse))
## LinkDataFrame with 5 rows and 4 columns
## nodeLab nodeLab_alias nodeNum isLeaf
## <character> <character> <integer> <logical>
## 1 t2 alias_1 1 TRUE
## 2 t1 alias_2 2 TRUE
## 3 t3 alias_3 3 TRUE
## 4 t4 alias_4 4 TRUE
## 5 t5 alias_5 5 TRUE
class(rLink)
## [1] "LinkDataFrame"
## attr(,"package")
## [1] "TreeSummarizedExperiment"
showClass("LinkDataFrame")
## Class "LinkDataFrame" [package "TreeSummarizedExperiment"]
##
## Slots:
##
## Name: rownames nrows listData
## Class: character_OR_NULL integer list
##
## Name: elementType elementMetadata metadata
## Class: character DataTable_OR_NULL list
##
## Extends:
## Class "DFrame", directly
## Class "LinkDataFrame_Or_NULL", directly
## Class "DataFrame", by class "DFrame", distance 2
## Class "DataTable", by class "DFrame", distance 3
## Class "SimpleList", by class "DFrame", distance 3
## Class "DataTable_OR_NULL", by class "DFrame", distance 4
## Class "List", by class "DFrame", distance 4
## Class "Vector", by class "DFrame", distance 5
## Class "list_OR_List", by class "DFrame", distance 5
## Class "Annotated", by class "DFrame", distance 6
## Class "vector_OR_Vector", by class "DFrame", distance 6
nrow(rLink) == nrow(both_tse)
## [1] TRUE
Similarly, the number of rows of the colLinks data matches the number of columns of the assays tables.
(cLink <- colLinks(both_tse))
## LinkDataFrame with 4 rows and 4 columns
## nodeLab nodeLab_alias nodeNum isLeaf
## <character> <character> <integer> <logical>
## 1 A_1 alias_1 1 TRUE
## 2 A_2 alias_2 2 TRUE
## 3 B_1 alias_3 3 TRUE
## 4 B_2 alias_4 4 TRUE
nrow(cLink) == ncol(both_tse)
## [1] TRUE
If the tree is not available, the corresponding link data is NULL.
colTree(row_tse)
## NULL
colLinks(row_tse)
## NULL
The link data is generated automatically when constructing the TreeSummarizedExperiment object. We strongly recommend that users not modify it manually; otherwise, the link might be broken. For R package developers, Section 5.2 shows how to update the link.
We can use [ to subset a TreeSummarizedExperiment. To keep track of the original data, rowTree and colTree stay the same after subsetting.
sub_tse <- both_tse[1:2, 1]
sub_tse
## class: TreeSummarizedExperiment
## dim: 2 1
## metadata(0):
## assays(1): ''
## rownames(2): entity1 entity2
## rowData names(2): var1 var2
## colnames(1): A_1
## colData names(2): gg group
## reducedDimNames(0):
## spikeNames(0):
## altExpNames(0):
## rowLinks: a LinkDataFrame (2 rows)
## rowTree: a phylo (5 leaves)
## colLinks: a LinkDataFrame (1 rows)
## colTree: a phylo (4 leaves)
The annotation data on the row and column dimensions changes accordingly.
# The first four columns are from rowLinks data and the others from rowData
cbind(rowLinks(sub_tse), rowData(sub_tse))
## DataFrame with 2 rows and 6 columns
## nodeLab nodeLab_alias nodeNum isLeaf var1 var2
## <character> <character> <integer> <logical> <character> <logical>
## 1 t2 alias_1 1 TRUE a TRUE
## 2 t1 alias_2 2 TRUE a FALSE
# The first four columns are from colLinks data and the others from colData
cbind(colLinks(sub_tse), colData(sub_tse))
## DataFrame with 1 row and 6 columns
## nodeLab nodeLab_alias nodeNum isLeaf gg group
## <character> <character> <integer> <logical> <numeric> <character>
## 1 A_1 alias_1 1 TRUE 1 A
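The annotation stays consistent because the same index is applied to the assay rows, the row annotation and the row links, while the tree itself is left untouched. The base-R sketch below mimics this with plain stand-in objects; all object names are hypothetical, and it does not call the package's [ method.

```r
# plain stand-ins for the assay table, the row links and the row tree
assay <- matrix(1:6, nrow = 3,
                dimnames = list(c("e1", "e2", "e3"), c("s1", "s2")))
links <- data.frame(nodeLab = c("t2", "t1", "t3"), nodeNum = c(1L, 2L, 3L),
                    stringsAsFactors = FALSE)
tree_tips <- c("t2", "t1", "t3")   # stands in for the unchanged tree

# subset rows 1 and 3 with one shared index
i <- c(1, 3)
sub_assay <- assay[i, , drop = FALSE]
sub_links <- links[i, , drop = FALSE]

# each remaining row still points to the same tree node; the tree is unchanged
print(sub_links)
print(sub_assay)
```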
Aggregation is allowed on both the row and the column dimension. Here, we show aggregation on the column dimension. The TreeSummarizedExperiment object is assigned to the argument x, and the desired aggregation level is given in colLevel. The level can be specified via the node label (the orange text in Figure 3) or the node number (the blue text in Figure 3). The argument FUN decides how the values are aggregated.
# use node labels to specify colLevel
aggCol <- aggValue(x = both_tse,
colLevel = c("GroupA", "GroupB"),
FUN = sum)
# or use node numbers to specify colLevel
aggCol <- aggValue(x = both_tse, colLevel = c(6, 7), FUN = sum)
assays(aggCol)[[1]]
## alias_6 alias_7
## entity1 0 0
## entity2 15 14
## entity3 18 16
## entity4 21 18
## entity5 24 20
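In essence, aggregating to a node applies FUN to the values of all leaves below that node. The base-R sketch below reproduces this idea on assay_data; the mapping desc from two hypothetical nodes to their descendant leaves is an assumption made for illustration and is not the structure of col_tree.

```r
# same toy assay table as above
assay_data <- rbind(rep(0, 4), matrix(1:16, nrow = 4))
colnames(assay_data) <- c("A_1", "A_2", "B_1", "B_2")
rownames(assay_data) <- paste0("entity", 1:5)

# hypothetical mapping from each internal node to its descendant leaves
desc <- list(nodeX = c("A_1", "A_2"), nodeY = c("B_1", "B_2"))

# apply FUN (here sum) across the descendant columns of every node
agg_cols <- function(x, desc, FUN = sum) {
  sapply(desc, function(cols) apply(x[, cols, drop = FALSE], 1, FUN))
}
agg <- agg_cols(assay_data, desc)
agg
```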
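The rowData doesn't change, but the colData adjusts to the new columns: a variable keeps its value for an aggregated node only if all descendant leaves of that node share the same value, and is NA otherwise. For example, the column group would have the value A for a node whose descendant leaves all have the value A. Here, the sums above show that GroupA aggregates A_1, A_2 and B_1, and GroupB aggregates A_2 and B_1. Both nodes therefore mix leaves of groups A and B, with different gg values, so every entry of the aggregated colData is NA.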
# before aggregation
colData(both_tse)
## DataFrame with 4 rows and 2 columns
## gg group
## <numeric> <character>
## A_1 1 A
## A_2 2 A
## B_1 3 B
## B_2 3 B
# after aggregation
colData(aggCol)
## DataFrame with 2 rows and 2 columns
## gg group
## <logical> <logical>
## alias_6 NA NA
## alias_7 NA NA
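The rule described above (keep a value when all descendant leaves agree, otherwise NA) can be written as a small helper. This is a sketch of the rule as stated in the text, not the package code; the helper collapse_or_na() and the node-to-leaf mapping desc are hypothetical.

```r
# Hypothetical helper: the shared value if all inputs agree, otherwise NA
collapse_or_na <- function(x) {
  u <- unique(x)
  if (length(u) == 1) u else NA
}

col_info <- data.frame(gg = c(1, 2, 3, 3),
                       group = c("A", "A", "B", "B"),
                       row.names = c("A_1", "A_2", "B_1", "B_2"),
                       stringsAsFactors = FALSE)

# descendant leaves of two hypothetical nodes
desc <- list(nodeX = c("A_1", "A_2"), nodeY = c("B_1", "B_2"))

# collapse every annotation column over the leaves of each node
agg_info <- lapply(col_info, function(v) {
  names(v) <- rownames(col_info)   # index values by leaf name
  sapply(desc, function(leaves) collapse_or_na(v[leaves]), USE.NAMES = FALSE)
})
agg_info <- data.frame(agg_info, row.names = names(desc),
                       stringsAsFactors = FALSE)
agg_info
```

For nodeX, gg is NA (values 1 and 2 differ) while group keeps A.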
The colLinks data is updated to link the new columns of the assays tables to the column tree.
# the link data is updated
colLinks(aggCol)
## LinkDataFrame with 2 rows and 4 columns
## nodeLab nodeLab_alias nodeNum isLeaf
## <character> <character> <integer> <logical>
## 1 GroupA alias_6 6 FALSE
## 2 GroupB alias_7 7 FALSE
From Figure 3, we can see that nodes 6 and 7 are labelled GroupA and GroupB, respectively, which agrees with the column link data.
Aggregation on the row dimension is similar, except that the level is specified via rowLevel.
agg_row <- aggValue(x = both_tse, rowLevel = 7:9, FUN = sum)
Now, the output assays table has 3 rows.
assays(agg_row)[[1]]
## A_1 A_2 B_1 B_2
## alias_7 10 26 42 58
## alias_8 6 18 30 42
## alias_9 5 13 21 29
We can see which row corresponds to which node via the rowLinks data.
rowLinks(agg_row)
## LinkDataFrame with 3 rows and 4 columns
## nodeLab nodeLab_alias nodeNum isLeaf
## <character> <character> <integer> <logical>
## 1 NA alias_7 7 FALSE
## 2 NA alias_8 8 FALSE
## 3 NA alias_9 9 FALSE
Figure 2 shows that nodes 7, 8 and 9 have no labels, so the nodeLab column of the rowLinks data has missing values. These are all internal nodes, so the column isLeaf has only FALSE values.
Aggregation on both the row and column dimensions can be performed in one step, using the same function specified via FUN. If different functions are required for different dimensions, we suggest doing it in two steps, as described in Section 3.2 and Section 3.1, because the order of aggregation might matter.
agg_both <- aggValue(x = both_tse, colLevel = c(6, 7),
rowLevel = 7:9, FUN = sum)
As expected, we obtain a table with 3 rows (rowLevel = 7:9) and 2 columns (colLevel = c(6, 7)).
assays(agg_both)[[1]]
## alias_6 alias_7
## alias_7 78 68
## alias_8 54 48
## alias_9 39 34
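To see why the order can matter when the two dimensions use different functions, the base-R sketch below aggregates a hypothetical 2 x 2 table with sum over rows and max over columns, in both orders. The two orders give different results, whereas using sum on both dimensions gives the same total either way.

```r
m <- matrix(c(1, 3, 4, 2), nrow = 2)   # rows: (1, 4) and (3, 2)

# order 1: sum over rows first, then max over the resulting columns
sum_then_max <- max(colSums(m))         # colSums: (4, 6) -> 6

# order 2: max over columns first, then sum over the resulting rows
max_then_sum <- sum(apply(m, 1, max))   # row maxima: (4, 3) -> 7

c(sum_then_max = sum_then_max, max_then_sum = max_then_sum)
```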
In some cases, the hierarchical structure is available as a data.frame instead of a phylo object. To use the functions described above, we can convert the data.frame to the phylo class. The function toTree outputs the hierarchical information as a phylo object. If the data set is large, we suggest setting cache = TRUE to speed up the aggregation step.
# The toy taxonomic table
taxa <- data.frame(Kingdom = rep("A", 5),
Phylum = c("B1", rep("B2", 4)),
Class = c("C1", "C2", "C3", "C3", NA),
OTU = c("D1", "D2", "D3", "D4", NA))
# convert to a phylo tree
taxa_tree <- toTree(data = taxa, cache = FALSE)
ggtree(taxa_tree)+
geom_text2(aes(label = node), color = "darkblue",
hjust = -0.5, vjust = 0.7, size = 6) +
geom_text2(aes(label = label), color = "darkorange",
hjust = -0.1, vjust = -0.7, size = 6) +
geom_point2()
# construct a TreeSummarizedExperiment object
taxa_tse <- TreeSummarizedExperiment(assays = list(assay_data),
rowData = row_data,
rowTree = taxa_tree,
rowNodeLab = taxa_tree$tip.label)
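One way to think about what toTree has to do: each pair of adjacent columns in the taxonomic table contributes parent-child edges, with labels prefixed by the column name (as in the "Phylum:B1" labels used below). The base-R sketch derives such an edge list. It is an illustration under these assumptions, not toTree's algorithm, and it simply drops pairs whose child is NA, which toTree may handle differently.

```r
taxa <- data.frame(Kingdom = rep("A", 5),
                   Phylum = c("B1", rep("B2", 4)),
                   Class = c("C1", "C2", "C3", "C3", NA),
                   OTU = c("D1", "D2", "D3", "D4", NA),
                   stringsAsFactors = FALSE)

# prefix every value with its column name, keeping NA as NA
lab <- taxa
for (j in seq_along(taxa)) {
  lab[[j]] <- ifelse(is.na(taxa[[j]]), NA,
                     paste0(names(taxa)[j], ":", taxa[[j]]))
}

# adjacent columns give parent -> child pairs
edges <- do.call(rbind, lapply(seq_len(ncol(lab) - 1), function(j) {
  data.frame(parent = lab[[j]], child = lab[[j + 1]],
             stringsAsFactors = FALSE)
}))

# drop pairs without a child and duplicated pairs
edges <- unique(edges[!is.na(edges$child), ])
edges
```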
The following shows how to aggregate to the phylum level.
# specify the level
taxa_lab <- c(taxa_tree$tip.label, taxa_tree$node.label)
ii <- startsWith(taxa_lab, "Phylum:")
(l1 <- taxa_lab[ii])
## [1] "Phylum:B1" "Phylum:B2"
# aggregate
agg_taxa <- aggValue(x = taxa_tse, rowLevel = l1, FUN = sum)
assays(agg_taxa)[[1]]
## A_1 A_2 B_1 B_2
## alias_7 0 0 0 0
## alias_9 10 26 42 58
rowData(agg_taxa)
## DataFrame with 2 rows and 2 columns
## var1 var2
## <character> <logical>
## alias_7 a TRUE
## alias_9 NA NA
Aggregation can be performed at any combination of levels.
# specify the level
l2 <- c("Class:C3", "Phylum:B1")
# aggregate
agg_any <- aggValue(x = taxa_tse, rowLevel = l2, FUN = sum)
assays(agg_any)[[1]]
## A_1 A_2 B_1 B_2
## alias_11 5 13 21 29
## alias_7 0 0 0 0
rowData(agg_any)
## DataFrame with 2 rows and 2 columns
## var1 var2
## <character> <logical>
## alias_11 NA NA
## alias_7 a TRUE
phylo object
Here, we show some functions as examples of how to manipulate or extract information from a phylo object. More functions can be found in other packages, such as ape (Paradis and Schliep 2018) and tidytree. These functions might be useful when R package developers want to create their own functions to work on the TreeSummarizedExperiment class.
The example tree below shows the node label (black text) and node number (blue text) of each node.
ggtree(tinyTree, branch.length = "none") +
geom_text2(aes(label = label), hjust = -0.3) +
geom_text2(aes(label = node), vjust = -0.8,
hjust = -0.3, color = 'blue')
We can print all nodes (type = "all"), only the leaves (type = "leaf"), or only the internal nodes (type = "internal").
printNode(tree = tinyTree, type = "all")
## nodeLab nodeLab_alias nodeNum isLeaf
## 1 t2 alias1 1 TRUE
## 2 t7 alias2 2 TRUE
## 3 t6 alias3 3 TRUE
## 4 t9 alias4 4 TRUE
## 5 t4 alias5 5 TRUE
## 6 t8 alias6 6 TRUE
## 7 t10 alias7 7 TRUE
## 8 t1 alias8 8 TRUE
## 9 t5 alias9 9 TRUE
## 10 t3 alias10 10 TRUE
## 11 Node_11 alias_11 11 FALSE
## 12 Node_12 alias_12 12 FALSE
## 13 Node_13 alias_13 13 FALSE
## 14 Node_14 alias_14 14 FALSE
## 15 Node_15 alias_15 15 FALSE
## 16 Node_16 alias_16 16 FALSE
## 17 Node_17 alias_17 17 FALSE
## 18 Node_18 alias_18 18 FALSE
## 19 Node_19 alias_19 19 FALSE
# The number of leaves
countLeaf(tree = tinyTree)
## [1] 10
# The number of nodes (leaf nodes and internal nodes)
countNode(tree = tinyTree)
## [1] 19
The translation between the labels and the numbers of nodes can be achieved with the function transNode.
transNode(tree = tinyTree, node = c(12, 1, 4))
## [1] "Node_12" "t2" "t9"
transNode(tree = tinyTree, node = c("t4", "Node_18"))
## t4 Node_18
## 5 18
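Under ape's numbering convention (leaves first, then internal nodes), translating between labels and numbers amounts to indexing into the combined label vector c(tip.label, node.label). A minimal base-R sketch with hypothetical helpers num_to_lab() and lab_to_num() on a hypothetical toy tree; it is not how transNode is implemented.

```r
# labels of a hypothetical tree with 3 leaves and 2 internal nodes
tip_label <- c("t1", "t2", "t3")
node_label <- c("Node_4", "Node_5")
all_lab <- c(tip_label, node_label)

# node number -> label
num_to_lab <- function(num) all_lab[num]
# label -> node number, named by the label
lab_to_num <- function(lab) setNames(match(lab, all_lab), lab)

num_to_lab(c(5, 1))
lab_to_num(c("t3", "Node_4"))
```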
To get only the descendants that are leaves, set the argument only.leaf = TRUE.
# only the leaf nodes
findOS(tree = tinyTree, node = 17, only.leaf = TRUE)
## $Node_17
## [1] 6 4 5
Setting only.leaf = FALSE returns all descendants.
# all descendant nodes
findOS(tree = tinyTree, node = 17, only.leaf = FALSE)
## $Node_17
## [1] 6 4 18 5
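Descendants can be collected by walking down the edge matrix of a phylo object, level by level. The sketch below is a base-R illustration with a hypothetical helper find_desc() on a hand-built toy edge matrix; it is not the findOS implementation, and the order of the returned nodes may differ from findOS.

```r
# edge matrix of a hypothetical tree: leaves 1-3, root 4, internal node 5
edge <- matrix(c(4L, 1L,
                 4L, 5L,
                 5L, 2L,
                 5L, 3L), ncol = 2, byrow = TRUE)
n_tip <- 3L

find_desc <- function(edge, node, n_tip, only.leaf = TRUE) {
  out <- integer(0)
  todo <- node
  while (length(todo) > 0) {
    # children of all nodes in the current frontier
    kids <- edge[edge[, 1] %in% todo, 2]
    out <- c(out, kids)
    todo <- kids[kids > n_tip]   # only internal nodes have children
  }
  if (only.leaf) out[out <= n_tip] else out
}

find_desc(edge, 4, n_tip)                      # leaves below the root
find_desc(edge, 4, n_tip, only.leaf = FALSE)   # all descendants
```

Under the same convention, a node is a leaf exactly when its number is at most the number of leaves.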
The input node can be either the node label or the node number.
# node = 5, node = "t4" are the same node
findSibling(tree = tinyTree, node = 5)
## t9
## 4
findSibling(tree = tinyTree, node = "t4")
## t9
## 4
isLeaf(tree = tinyTree, node = 5)
## [1] TRUE
isLeaf(tree = tinyTree, node = 17)
## [1] FALSE
The distance between any two nodes on the tree can be calculated with distNode.
distNode(tree = tinyTree, node = c(1, 5))
## [1] 2.699212
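Assuming the distance is the total branch length along the path between the two nodes, it can be computed from the edge matrix by walking each node up to their most recent common ancestor. A base-R sketch with hypothetical helpers path_to_root() and dist_nodes() on a hand-built toy tree; distNode itself may be implemented differently.

```r
# hypothetical toy tree: leaves 1-3, root 4, internal node 5
edge <- matrix(c(4L, 1L,
                 4L, 5L,
                 5L, 2L,
                 5L, 3L), ncol = 2, byrow = TRUE)
edge_length <- c(1, 0.5, 0.5, 0.5)   # one length per row of 'edge'

# path from a node up to the root (the root has no parent)
path_to_root <- function(edge, node) {
  path <- node
  repeat {
    i <- which(edge[, 2] == path[length(path)])
    if (length(i) == 0) break
    path <- c(path, edge[i, 1])
  }
  path
}

dist_nodes <- function(edge, edge_length, a, b) {
  pa <- path_to_root(edge, a)
  pb <- path_to_root(edge, b)
  mrca <- pa[pa %in% pb][1]   # first shared node = common ancestor
  # sum the lengths of the branches above each node, below the ancestor
  climb <- function(p) {
    below <- p[seq_len(match(mrca, p) - 1)]
    sum(edge_length[match(below, edge[, 2])])
  }
  climb(pa) + climb(pb)
}

dist_nodes(edge, edge_length, 1, 2)   # 1 + 0.5 + 0.5
dist_nodes(edge, edge_length, 2, 3)   # 0.5 + 0.5
```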
We can specify the leaf nodes to remove in rmLeaf to prune parts of a tree. If mergeSingle = TRUE, the internal node that was connected to the removed leaf nodes is removed too; otherwise, it is kept.
NT1 <- pruneTree(tree = tinyTree, rmLeaf = c(4, 5),
mergeSingle = TRUE)
ggtree(NT1, branch.length = "none") +
geom_text2(aes(label = label), color = "darkorange",
hjust = -0.1, vjust = -0.7) +
geom_point2()
NT2 <- pruneTree(tree = tinyTree, rmLeaf = c(4, 5),
mergeSingle = FALSE)
ggtree(NT2, branch.length = "none") +
geom_text2(aes(label = label), color = "darkorange",
hjust = -0.1, vjust = -0.7) +
geom_point2()
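The internal nodes that mergeSingle is concerned with are, presumably, those left with a single child once the leaves are removed. The base-R sketch below only identifies such nodes for a hypothetical toy tree; it does not rebuild the tree the way pruneTree does.

```r
# hypothetical toy tree: leaves 1-3, root 4, internal node 5
edge <- matrix(c(4L, 1L,
                 4L, 5L,
                 5L, 2L,
                 5L, 3L), ncol = 2, byrow = TRUE)
rm_leaf <- 3L

# drop the branches that lead to the removed leaves
kept <- edge[!(edge[, 2] %in% rm_leaf), , drop = FALSE]

# count the remaining children of each internal node
n_child <- table(kept[, 1])
single <- as.integer(names(n_child)[n_child == 1])
single
```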
phylo object to a matrix
Each row gives a path that connects a leaf to the root.
matTree(tree = tinyTree)
## L1 L2 L3 L4 L5 L6 L7
## [1,] 1 13 12 11 NA NA NA
## [2,] 2 14 13 12 11 NA NA
## [3,] 3 14 13 12 11 NA NA
## [4,] 4 18 17 16 15 12 11
## [5,] 5 18 17 16 15 12 11
## [6,] 6 17 16 15 12 11 NA
## [7,] 7 19 16 15 12 11 NA
## [8,] 8 19 16 15 12 11 NA
## [9,] 9 15 12 11 NA NA NA
## [10,] 10 11 NA NA NA NA NA
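Each row of this matrix can be reproduced by repeatedly looking up the parent of a node. The base-R sketch below builds the same kind of leaf-to-root path matrix for a hypothetical toy tree, padding shorter paths with NA; the helper path_matrix() is hypothetical, and its column names only imitate the L1, L2, ... style shown above.

```r
# hypothetical toy tree: leaves 1-3, root 4, internal node 5
edge <- matrix(c(4L, 1L,
                 4L, 5L,
                 5L, 2L,
                 5L, 3L), ncol = 2, byrow = TRUE)
n_tip <- 3L

path_matrix <- function(edge, n_tip) {
  # parent[k] is the parent of node k (NA for the root)
  parent <- rep(NA_integer_, max(edge))
  parent[edge[, 2]] <- edge[, 1]
  # follow parents from each leaf until the root is reached
  paths <- lapply(seq_len(n_tip), function(leaf) {
    p <- leaf
    while (!is.na(parent[p[length(p)]])) p <- c(p, parent[p[length(p)]])
    p
  })
  # pad shorter paths with NA so that all rows have the same length
  width <- max(lengths(paths))
  out <- t(vapply(paths,
                  function(p) c(p, rep(NA_integer_, width - length(p))),
                  integer(width)))
  colnames(out) <- paste0("L", seq_len(width))
  out
}

pm <- path_matrix(edge, n_tip)
pm
```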
TreeSummarizedExperiment class
We show examples of how to create functions for the TreeSummarizedExperiment class. R package developers can build their functions on the functions for phylo objects shown above, or develop their own.
Here, a function rmRows is created to remove entities (rows) that are zero in all samples (columns) of the first assays table.
# dat: a TreeSummarizedExperiment
rmRows <- function(dat) {
  # the first table in the assays
  count <- assays(dat)[[1]]
  # TRUE for rows that have at least one non-zero value
  keep <- rowSums(count != 0) > 0
  # logical indexing also works when no row is all-zero
  # (dat[-integer(0), ] would instead drop every row)
  out <- dat[keep, ]
  return(out)
}
(rte <- rmRows(dat = both_tse))
## class: TreeSummarizedExperiment
## dim: 4 4
## metadata(0):
## assays(1): ''
## rownames(4): entity2 entity3 entity4 entity5
## rowData names(2): var1 var2
## colnames(4): A_1 A_2 B_1 B_2
## colData names(2): gg group
## reducedDimNames(0):
## spikeNames(0):
## altExpNames(0):
## rowLinks: a LinkDataFrame (4 rows)
## rowTree: a phylo (5 leaves)
## colLinks: a LinkDataFrame (4 rows)
## colTree: a phylo (4 leaves)
rowLinks(rte)
## LinkDataFrame with 4 rows and 4 columns
## nodeLab nodeLab_alias nodeNum isLeaf
## <character> <character> <integer> <logical>
## 1 t1 alias_2 2 TRUE
## 2 t3 alias_3 3 TRUE
## 3 t4 alias_4 4 TRUE
## 4 t5 alias_5 5 TRUE
The function rmRows doesn't update the tree data. To update the tree, we can proceed as below, with the help of ape::drop.tip.
updateRowTree <- function(tse, dropLeaf) {
## -------------- new tree: drop leaves ----------
oldTree <- rowTree(tse)
newTree <- ape::drop.tip(phy = oldTree, tip = dropLeaf)
## -------------- update the row link ----------
# track the tree
track <- trackNode(oldTree)
track <- ape::drop.tip(phy = track, tip = dropLeaf)
# row links
rowL <- rowLinks(tse)
rowL <- DataFrame(rowL)
# update the row links:
# 1. use the alias label to track and updates the nodeNum
# 2. the nodeLab should be updated based on the new tree using the new
# nodeNum
# 3. lastly, update the nodeLab_alias
rowL$nodeNum <- transNode(tree = track, node = rowL$nodeLab_alias,
message = FALSE)
rowL$nodeLab <- transNode(tree = newTree, node = rowL$nodeNum,
use.alias = FALSE, message = FALSE)
rowL$nodeLab_alias <- transNode(tree = newTree, node = rowL$nodeNum,
use.alias = TRUE, message = FALSE)
rowL$isLeaf <- isLeaf(tree = newTree, node = rowL$nodeNum)
rowNL <- as(rowL, "LinkDataFrame")
## update the row tree and links
newDat <- BiocGenerics:::replaceSlots(tse,
rowLinks = rowNL,
rowTree = list(phylo = newTree))
return(newDat)
}
Below, we find the leaves of the row tree that no longer match any row and drop them; afterwards, the row tree has four leaves.
# find the mismatch between the rows of the 'assays' table and the leaves of the
# tree
row_tree <- rowTree(rte)
row_link <- rowLinks(rte)
leaf_tree <- printNode(tree = row_tree,type = "leaf")$nodeNum
leaf_data <- row_link$nodeNum[row_link$isLeaf]
leaf_rm <- setdiff(leaf_tree, leaf_data)
ntse <- updateRowTree(tse = rte, dropLeaf = leaf_rm)
ntse
## class: TreeSummarizedExperiment
## dim: 4 4
## metadata(0):
## assays(1): ''
## rownames(4): entity2 entity3 entity4 entity5
## rowData names(2): var1 var2
## colnames(4): A_1 A_2 B_1 B_2
## colData names(2): gg group
## reducedDimNames(0):
## spikeNames(0):
## altExpNames(0):
## rowLinks: a LinkDataFrame (4 rows)
## rowTree: a phylo (4 leaves)
## colLinks: a LinkDataFrame (4 rows)
## colTree: a phylo (4 leaves)
rowLinks(ntse)
## LinkDataFrame with 4 rows and 4 columns
## nodeLab nodeLab_alias nodeNum isLeaf
## <character> <character> <integer> <logical>
## 1 t1 alias_1 1 TRUE
## 2 t3 alias_2 2 TRUE
## 3 t4 alias_3 3 TRUE
## 4 t5 alias_4 4 TRUE
sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.10-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.10-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] ape_5.3 ggtree_2.0.0
## [3] TreeSummarizedExperiment_1.2.0 SingleCellExperiment_1.8.0
## [5] SummarizedExperiment_1.16.0 DelayedArray_0.12.0
## [7] BiocParallel_1.20.0 matrixStats_0.55.0
## [9] Biobase_2.46.0 GenomicRanges_1.38.0
## [11] GenomeInfoDb_1.22.0 IRanges_2.20.0
## [13] S4Vectors_0.24.0 BiocGenerics_0.32.0
## [15] BiocStyle_2.14.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.2 lattice_0.20-38 tidyr_1.0.0
## [4] assertthat_0.2.1 zeallot_0.1.0 digest_0.6.22
## [7] R6_2.4.0 backports_1.1.5 evaluate_0.14
## [10] ggplot2_3.2.1 highr_0.8 pillar_1.4.2
## [13] zlibbioc_1.32.0 rlang_0.4.1 lazyeval_0.2.2
## [16] Matrix_1.2-17 rmarkdown_1.16 labeling_0.3
## [19] stringr_1.4.0 RCurl_1.95-4.12 munsell_0.5.0
## [22] compiler_3.6.1 xfun_0.10 pkgconfig_2.0.3
## [25] htmltools_0.4.0 tidyselect_0.2.5 tibble_2.1.3
## [28] GenomeInfoDbData_1.2.2 bookdown_0.14 crayon_1.3.4
## [31] dplyr_0.8.3 bitops_1.0-6 grid_3.6.1
## [34] nlme_3.1-141 jsonlite_1.6 gtable_0.3.0
## [37] lifecycle_0.1.0 magrittr_1.5 scales_1.0.0
## [40] tidytree_0.2.9 stringi_1.4.3 XVector_0.26.0
## [43] rvcheck_0.1.5 vctrs_0.2.0 tools_3.6.1
## [46] treeio_1.10.0 glue_1.3.1 purrr_0.3.3
## [49] yaml_2.2.0 colorspace_1.4-1 BiocManager_1.30.9
## [52] knitr_1.25
Lun, Aaron, and Davide Risso. 2019. SingleCellExperiment: S4 Classes for Single Cell Data.
Paradis, E., and K. Schliep. 2018. “Ape 5.0: An Environment for Modern Phylogenetics and Evolutionary Analyses in R.” Bioinformatics 35:526–28.
Yu, Guangchuang, David Smith, Huachen Zhu, Yi Guan, and Tommy Tsan-Yuk Lam. 2017. “Ggtree: An R Package for Visualization and Annotation of Phylogenetic Trees with Their Covariates and Other Associated Data.” Methods in Ecology and Evolution 8 (1):28–36. https://doi.org/10.1111/2041-210X.12628.