recursiveSplitModule {celda} | R Documentation |
Uses the 'celda_G' model to cluster features into modules for a range of possible L's. The module labels of the previous "L-1" model are used as the initial values in the current model with L modules. The best split of an existing module is found to create the L-th module. This procedure is much faster than randomly initializing each model with a different L.
recursiveSplitModule(
  counts,
  initialL = 10,
  maxL = 100,
  tempK = 100,
  zInit = NULL,
  sampleLabel = NULL,
  alpha = 1,
  beta = 1,
  delta = 1,
  gamma = 1,
  minFeature = 3,
  reorder = TRUE,
  perplexity = TRUE,
  verbose = TRUE,
  logfile = NULL
)
counts |
Integer matrix. Rows represent features and columns represent cells. |
initialL |
Integer. Minimum number of modules to try. |
maxL |
Integer. Maximum number of modules to try. |
tempK |
Integer. Number of temporary cell populations to identify and use in module splitting. Only used if 'zInit=NULL'. Collapsing cells to a relatively small number of cell populations will increase the speed of module clustering and tends to produce better modules. This number should be larger than the number of true cell populations expected in the dataset. Default 100. |
zInit |
Integer vector. Collapse cells to cell populations based on labels in 'zInit' and then perform module splitting. If NULL, no collapsing will be performed unless 'tempK' is specified. Default NULL. |
sampleLabel |
Vector or factor. Denotes the sample label for each cell (column) in the count matrix. Only used if 'zInit' is set. |
alpha |
Numeric. Concentration parameter for Theta. Adds a pseudocount to each cell population in each sample. Only used if 'zInit' is set. Default 1. |
beta |
Numeric. Concentration parameter for Phi. Adds a pseudocount to each feature module in each cell. Default 1. |
delta |
Numeric. Concentration parameter for Psi. Adds a pseudocount to each feature in each module. Default 1. |
gamma |
Numeric. Concentration parameter for Eta. Adds a pseudocount to the number of features in each module. Default 1. |
minFeature |
Integer. Only attempt to split modules with at least this many features. |
reorder |
Logical. Whether to reorder modules using hierarchical clustering after each model has been created. If FALSE, module numbers will correspond to the split which created the module (e.g., 'L15' was created at split 15, 'L16' was created at split 16, etc.). Default TRUE. |
perplexity |
Logical. Whether to calculate perplexity for each model. If FALSE, then perplexity can be calculated later with 'resamplePerplexity()'. Default TRUE. |
verbose |
Logical. Whether to print log messages. Default TRUE. |
logfile |
Character. Messages will be redirected to a file named 'logfile'. If NULL, messages will be printed to stdout. Default NULL. |
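The 'tempK' and 'zInit' arguments both refer to collapsing cells to cell populations before module splitting. The following base R sketch illustrates what such a collapse could look like, under the assumption that collapsing means summing the counts of all cells that share a population label. The toy matrix and the label vector 'z' are hypothetical, and this is not necessarily how celda performs the step internally.

```r
# Toy count matrix: 4 features (rows) x 6 cells (columns).
counts <- matrix(
  c(1L, 0L, 2L, 3L, 0L, 1L,
    0L, 4L, 1L, 0L, 2L, 2L,
    5L, 1L, 0L, 1L, 1L, 0L,
    2L, 2L, 2L, 0L, 3L, 1L),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6))
)

# Hypothetical population label for each cell, playing the role of 'zInit'.
z <- c(1L, 1L, 2L, 2L, 3L, 3L)

# Assumption: collapsing = summing counts over the cells in each population.
# rowsum() groups rows, so transpose to group columns, then transpose back.
collapsed <- t(rowsum(t(counts), group = z))

# Result: 4 features x 3 populations; total counts are unchanged.
dim(collapsed)
```

Module splitting on a matrix like 'collapsed' works with far fewer columns than the original cell-level matrix, which is consistent with the speed-up described for 'tempK'.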
Object of class 'celda_list', which contains results for all model parameter combinations and summaries of the run parameters. The models in the list will be of class 'celda_G' if 'zInit=NULL' or 'celda_CG' if 'zInit' is set.
'recursiveSplitCell()' for recursive splitting of cell populations.
data(celdaCGSim)

## Create models that range from L=3 to L=20 by recursively splitting modules
## into two
moduleSplit <- recursiveSplitModule(celdaCGSim$counts, initialL = 3, maxL = 20)

## Example results with perplexity
plotGridSearchPerplexity(moduleSplit)

## Select model for downstream analysis
celdaMod <- subsetCeldaList(moduleSplit, list(L = 10))
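The recursive procedure described at the top of this page (reuse the labels of the "L-1" model, then create the L-th module from the best split of an existing module) can be pictured with a small base R toy. This is only a sketch and not celda's implementation: it replaces the 'celda_G' model with two stand-ins, hierarchical clustering to propose a two-way split and the reduction in within-module sum of squares to pick the best one. The function 'recursiveSplitToy' and its helpers are hypothetical names introduced for illustration.

```r
set.seed(1)
# Toy matrix: 12 features x 5 (collapsed) cell populations.
x <- matrix(rpois(60, lambda = 5), nrow = 12)

# Stand-in score: within-module sum of squares (lower is tighter).
wss <- function(m) sum(sweep(m, 2, colMeans(m))^2)

# Stand-in splitter: cut a hierarchical clustering of the features in two.
splitTwo <- function(m) cutree(hclust(dist(m)), k = 2)

recursiveSplitToy <- function(x, initialL = 2, maxL = 5, minFeature = 3) {
  y <- cutree(hclust(dist(x)), k = initialL)   # labels of the first model
  models <- list()
  models[[paste0("L", initialL)]] <- y
  for (L in (seq_len(maxL - initialL) + initialL)) {
    best <- NULL
    for (mod in unique(y)) {
      idx <- which(y == mod)
      if (length(idx) < minFeature) next       # too few features to split
      sub <- x[idx, , drop = FALSE]
      half <- splitTwo(sub)
      # Gain = drop in within-module sum of squares after the split.
      gain <- wss(sub) -
        wss(sub[half == 1, , drop = FALSE]) -
        wss(sub[half == 2, , drop = FALSE])
      if (is.null(best) || gain > best$gain) {
        best <- list(gain = gain, idx = idx[half == 2])
      }
    }
    if (is.null(best)) break                   # no module can be split
    y[best$idx] <- L                           # new module is numbered L
    models[[paste0("L", L)]] <- y              # labels carried to next step
  }
  models
}

models <- recursiveSplitToy(x)
# Number of modules in each model
sapply(models, function(y) length(unique(y)))
```

Because each model starts from the previous labels and changes only one module, every model is a refinement of the one before it. Numbering the new module by the split that created it mirrors the behaviour described for 'reorder = FALSE'.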