ExtractTopFeatures {CountClust} | R Documentation |
This function takes the relative gene expression profiles of the Grade of Membership (GoM) clusters and applies a KL-divergence-based method to obtain, for each cluster, a list of the top features that drive it.
ExtractTopFeatures(theta, top_features = 10, method = c("poisson", "bernoulli"), options = c("min", "max"), shared = FALSE)
theta |
The theta matrix: the relative gene expression profile of the GoM clusters (cluster probability distributions) obtained from GoM model fitting. A G x K matrix, where G is the number of features and K is the number of topics. |
top_features |
The number of top features to select for each cluster k. Features are chosen by their ability to distinguish cluster k from every other cluster l (l = 1, …, K, l ≠ k). Default: 10. |
method |
The underlying model assumed when measuring KL divergence. The two choices are "poisson" and "bernoulli". Default: "poisson". |
options |
If "min", then for each cluster k we select the features that maximize the minimum KL divergence of cluster k against all other clusters. If "max", we select the features that maximize the maximum KL divergence of cluster k against all other clusters. |
shared |
If TRUE, we report genes that can be highly expressed in more than one cluster. Otherwise, we report only genes whose expression is highest in a single cluster. Default: FALSE. |
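The selection rule described by the method and options arguments can be sketched as follows. This is a minimal illustration, not the package's implementation: sketch_top_features, kl_poisson and kl_bernoulli are hypothetical helpers, the KL formulas are the standard closed forms for Poisson rates and Bernoulli probabilities, and it is an assumption that CountClust scores features in exactly this way. The shared argument is not modelled, and the toy theta has no zero entries, so log(0) never arises.

```r
# Standard closed-form KL divergences, applied elementwise to two vectors.
kl_poisson <- function(a, b) a * log(a / b) + b - a
kl_bernoulli <- function(a, b) {
  a * log(a / b) + (1 - a) * log((1 - a) / (1 - b))
}

# Hypothetical helper (not part of CountClust): for each cluster k, rank the
# features by the min (or max) KL divergence of cluster k against all others.
sketch_top_features <- function(theta, top_features = 2,
                                method = c("poisson", "bernoulli"),
                                options = c("min", "max")) {
  method <- match.arg(method)
  options <- match.arg(options)
  kl <- if (method == "poisson") kl_poisson else kl_bernoulli
  summarise <- if (options == "min") min else max
  K <- ncol(theta)
  indices <- matrix(NA_integer_, nrow = K, ncol = top_features)
  for (k in seq_len(K)) {
    others <- setdiff(seq_len(K), k)
    # G x (K - 1) matrix: divergence of cluster k from each other cluster
    div <- sapply(others, function(l) kl(theta[, k], theta[, l]))
    div <- matrix(div, nrow = nrow(theta))
    # One score per feature, then keep the highest-scoring features
    score <- apply(div, 1, summarise)
    indices[k, ] <- order(score, decreasing = TRUE)[seq_len(top_features)]
  }
  indices
}

# Toy theta: 4 features x 3 clusters, each column sums to 1.
theta <- cbind(c(0.70, 0.10, 0.10, 0.10),
               c(0.10, 0.70, 0.10, 0.10),
               c(0.05, 0.15, 0.60, 0.20))
res_min <- sketch_top_features(theta, top_features = 2, options = "min")
res_max <- sketch_top_features(theta, top_features = 2, options = "max")
```

On this toy input both settings rank the same leading feature for every cluster, but the second-ranked feature for cluster 3 differs: "min" favours a feature that separates cluster 3 from every other cluster, while "max" also admits a feature that separates it strongly from just one of them.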
A list. Its indices element is a K x top_features matrix whose k-th row holds the indices of the top features driving cluster k. The example below also reads a scores element from the result.
library(CountClust)
data("MouseDeng2014.FitGoM")
theta_mat <- MouseDeng2014.FitGoM$clust_6$theta
top_features <- ExtractTopFeatures(theta_mat, top_features = 100, method = "poisson", options = "min")
top_features$indices
top_features$scores
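Because the result holds feature indices rather than names, a common follow-up step is to translate them back to feature names. The sketch below uses toy stand-ins and does not call the package: theta and indices are made-up objects shaped like the inputs and outputs described above, and it is an assumption that the row names of a real theta matrix carry the feature names.

```r
# Toy stand-ins (assumptions): a theta matrix whose row names are feature
# names, and an indices matrix shaped like a K x top_features result.
theta <- matrix(c(0.7, 0.1, 0.1, 0.1,
                  0.1, 0.7, 0.1, 0.1),
                ncol = 2,
                dimnames = list(c("geneA", "geneB", "geneC", "geneD"), NULL))
indices <- rbind(c(1L, 3L),
                 c(2L, 4L))

# Look up the name behind every index, keeping the K x top_features shape.
top_names <- matrix(rownames(theta)[indices], nrow = nrow(indices))
```

Each row of top_names then lists the names of the top features for the corresponding cluster, in the same order as the indices.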