clusterSNNGraph {scran} | R Documentation |
Perform graph-based clustering using community detection methods on a nearest-neighbor graph, where nodes represent cells or k-means centroids.
clusterSNNGraph(x, ...) ## S4 method for signature 'ANY' clusterSNNGraph( x, ..., clusterFUN = cluster_walktrap, subset.row = NULL, transposed = FALSE, use.kmeans = FALSE, kmeans.centers = NULL, kmeans.args = list(), full.stats = FALSE ) ## S4 method for signature 'SummarizedExperiment' clusterSNNGraph(x, ..., assay.type = "logcounts") ## S4 method for signature 'SingleCellExperiment' clusterSNNGraph(x, ..., use.dimred = NULL) clusterKNNGraph(x, ...) ## S4 method for signature 'ANY' clusterKNNGraph( x, ..., clusterFUN = cluster_walktrap, subset.row = NULL, transposed = FALSE, use.kmeans = FALSE, kmeans.centers = NULL, kmeans.args = list(), full.stats = FALSE ) ## S4 method for signature 'SummarizedExperiment' clusterKNNGraph(x, ..., assay.type = "logcounts") ## S4 method for signature 'SingleCellExperiment' clusterKNNGraph(x, ..., use.dimred = NULL)
x |
A matrix-like object containing expression values for each gene (row) in each cell (column).
These dimensions can be transposed if Alternatively, a SummarizedExperiment or SingleCellExperiment containing such an expression matrix. If |
... |
For the generics, additional arguments to pass to the specific methods. For the ANY methods, additional arguments to pass to For the SummarizedExperiment method, additional arguments to pass to the corresponding ANY method. For the SingleCellExperiment method, additional arguments to pass to the corresponding SummarizedExperiment method. |
clusterFUN |
Function that can take a graph object and return a communities object, see examples in the igraph package. |
subset.row |
See |
transposed |
A logical scalar indicating whether |
use.kmeans |
Logical scalar indicating whether k-means clustering should be performed. |
kmeans.centers |
Integer scalar specifying the number of clusters to use for k-means clustering.
Defaults to the square root of the number of cells in |
kmeans.args |
List containing additional named arguments to pass to |
full.stats |
Logical scalar indicating whether to return more statistics regarding the k-means clustering. |
assay.type |
A string specifying which assay values to use. |
use.dimred |
A string specifying whether existing values in |
By default, these functions simply call buildSNNGraph
or buildKNNGraph
followed by the specified clusterFUN
to generate a clustering.
We use the Walktrap algorithm by default as it has a cool-sounding name,
but users can feel free to swap it for some other algorithm (e.g., cluster_louvain
).
For large datasets, we can perform vector quantization with k-means to create centroids that are then subjected to graph-based clustering. The label for each cell is then defined as the label of the centroid to which it was assigned. In this approach, k-means and graph-based clustering complement each other; the former improves computational efficiency and mitigates density-dependent dilation, while the latter aggregates the centroids for easier interpretation.
If full.stats=FALSE
, a factor is returned containing the cluster assignment for each cell.
If full.stats=TRUE
and use.kmeans=TRUE
, a DataFrame is returned with one row per cell.
This contains the columns kmeans
, specifying the assignment of each cell to a k-means cluster;
and igraph
, specifying the assignment of each cell to a graph-based cluster operating on the k-means clusters.
In addition, the metadata
contains graph
, a graph object where each node is a k-means cluster;
and membership
, the graph-based cluster to which each node is assigned.
Aaron Lun
buildSNNGraph
and buildKNNGraph
, to build the nearest-neighbor graphs.
cluster_walktrap
and related functions, to detect communities within those graphs.
library(scater) sce <- mockSCE(ncells=500) sce <- logNormCounts(sce) clusters <- clusterSNNGraph(sce) table(clusters) # Can pass usual arguments to buildSNNGraph: clusters2 <- clusterSNNGraph(sce, k=5) table(clusters2) # Works with low-dimensional inputs: sce <- runPCA(sce, ncomponents=10) clusters3 <- clusterSNNGraph(sce, use.dimred="PCA") table(clusters3) # Turn on k-means for larger datasets, e.g., # assuming we already have a PCA result: set.seed(101010) bigpc <- matrix(rnorm(2000000), ncol=20) clusters4 <- clusterSNNGraph(bigpc, d=NA, use.kmeans=TRUE, transposed=TRUE) table(clusters4) # Extract the graph for more details: clusters5 <- clusterSNNGraph(sce, use.dimred="PCA", use.kmeans=TRUE, full.stats=TRUE) head(clusters5) metadata(clusters5)$graph