mnem {mnem} | R Documentation |
This function simultaneously learns a mixture of causal networks and clusters of a cell population from single cell perturbation data (e.g. log odds of fold change) with a multi-trait readout. Examples include pooled CRISPR scRNA-Seq data (Perturb-Seq, Dixit et al., 2016; Crop-Seq, Datlinger et al., 2017).
mnem(
  D, inference = "em", search = "greedy", phi = NULL, theta = NULL,
  mw = NULL, method = "llr", parallel = NULL, reduce = FALSE,
  runs = 1, starts = 3, type = "networks", complete = FALSE,
  p = NULL, k = NULL, kmax = 10, verbose = FALSE, max_iter = 100,
  parallel2 = NULL, converged = -Inf, redSpace = NULL, affinity = 0,
  evolution = FALSE, lambda = 1, subtopoX = NULL, ratio = TRUE,
  logtype = 2, domean = TRUE, modulesize = 5, compress = FALSE,
  increase = TRUE, fpfn = c(0.1, 0.1), Rho = NULL,
  ksel = c("kmeans", "silhouette", "cor")
)
D |
data with cells indexing the columns and features (E-genes) indexing the rows |
inference |
inference method "em" for expectation maximization |
search |
search method for single network inference "greedy", "exhaustive" or "modules" (also possible: "small", which is greedy with only one edge change per M-step to make for a smooth convergence) |
phi |
a list of n lists of k networks for n starts of the EM and k components |
theta |
a list of n lists of k attachment vectors for the E-genes for n starts of the EM and k components
mw |
mixture weights; if NULL estimated or uniform |
method |
"llr" for log ratios or fold changes as input (see ratio)
parallel |
number of threads used to parallelize the EM runs
reduce |
logical - reduce search space for exhaustive search to unique networks |
runs |
number of runs for greedy search |
starts |
number of starts for the em |
type |
initialize with responsibilities either by "random", "cluster" (each S-gene is clustered and the per-S-gene clusterings are combined in different ways for several starts), "cluster2" (clustNEM is used to infer reasonable phis, which are then used as a start for one EM run), "cluster3" (global clustering as a start), or "networks" (initialize with random phis)
complete |
if TRUE, optimizes the expected complete log likelihood of the model, otherwise the log likelihood of the observed data |
p |
initial probabilities as a k (components) times l (cells) matrix |
k |
number of components |
kmax |
maximum number of components considered when k = NULL and the number of components is inferred
verbose |
verbose output |
max_iter |
maximum number of iterations if the likelihood does not converge
parallel2 |
if parallel=NULL, number of threads for single component optimization |
converged |
absolute distance for convergence between new and old log likelihood; if set to -Inf, the EM stops if neither the phis nor thetas were changed in the most recent iteration |
redSpace |
space for "exhaustive" search |
affinity |
0 is default for soft clustering, 1 is for hard clustering |
evolution |
logical. If TRUE, components are penalized for being different from each other.
lambda |
smoothness value for the prior put on the components, if evolution set to TRUE |
subtopoX |
hard prior on theta as a vector with entry i equal to j, if E-gene i is attached to S-gene j |
ratio |
logical; if TRUE, the data are log ratios; if FALSE, fold changes
logtype |
logarithm type of the data (e.g. 2 for log2 data or exp(1) for natural) |
domean |
average the data when calculating a single NEM (speed improvement)
modulesize |
max number of S-genes per module in module search |
compress |
compress networks after search (warning: penalized likelihood not interpretable)
increase |
if set to FALSE, the algorithm will not stop if the likelihood decreases |
fpfn |
numeric vector of length two with false positive and false negative rates for discrete data |
Rho |
perturbation matrix with dimensions n x l with n S-genes and l samples; either as probabilities, with the sum of probabilities for a sample less than or equal to 1, or discrete with 1s and 0s
ksel |
character vector of methods for the inference of k; can combine "hc" (hierarchical clustering) or "kmeans" with "silhouette", "BIC" or "AIC"; can also include "cor" for correlation distance (preferred) instead of euclidean |
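The layout of Rho described above can be illustrated with a minimal base R sketch. The S-gene names, the per-cell perturbation labels and the probability values are hypothetical, and whether mnem expects row names of this form is an assumption. The sketch builds a discrete and a probabilistic perturbation matrix with n S-genes as rows and l samples as columns, then checks the column-sum constraint for the probabilistic version.

```r
# Hypothetical S-gene names and the S-gene perturbed in each of l = 5 cells.
sgenes <- c("S1", "S2", "S3")
perturbed <- c("S1", "S3", "S2", "S1", "S3")

# Discrete Rho: entry [i, j] is 1 if S-gene i is perturbed in cell j, else 0.
rho_discrete <- sapply(perturbed, function(s) as.numeric(sgenes == s))
rownames(rho_discrete) <- sgenes

# Probabilistic Rho: down-weight the labels and add some uncertainty
# about a second perturbation in the first cell (hypothetical values).
rho_prob <- rho_discrete * 0.8
rho_prob["S2", 1] <- 0.1

# Each sample's probabilities must sum to at most 1.
stopifnot(all(colSums(rho_prob) <= 1))
```

A matrix built this way would be supplied through the Rho argument, for example mnem(D, Rho = rho_discrete), with columns in the same order as the cells in D.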
object of class mnem
comp |
list of the components, with each component being a list of the causal network phi and the E-gene attachment theta
data |
input data matrix |
limits |
list of results for all independent searches
ll |
log likelihood of the best model |
lls |
log likelihood ascent of the best model search |
mw |
vector with mixture weights |
probs |
k x l matrix containing the cell log likelihoods of the model
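One way to turn the probs entry into hard cluster assignments is sketched below in base R. The object is a mock list with hypothetical values that mimics only the mw and probs fields of a returned mnem object; it is not output from an actual fit. The rule used here, assigning each cell to the component with the largest log likelihood, ignores the mixture weights and is a simplification.

```r
# Mock result with k = 2 components and l = 4 cells (hypothetical values).
mock <- list(
  mw = c(0.4, 0.6),
  probs = matrix(c(-10, -12,    # cell 1
                   -15,  -9,    # cell 2
                    -8, -20,    # cell 3
                   -11, -11.5), # cell 4
                 nrow = 2)
)

# Each column is a cell; pick the row (component) with the largest value.
assignment <- apply(mock$probs, 2, which.max)
assignment
# 1 2 1 1

# Number of cells assigned to each component.
sizes <- table(factor(assignment, levels = seq_len(nrow(mock$probs))))
```

With a real fit, mock would be replaced by the object returned from mnem.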
Martin Pirkl
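# Simulate data from a mixture of two networks with mixture weights 0.4 and 0.6.
sim <- simData(Sgenes = 3, Egenes = 2, Nems = 2, mw = c(0.4,0.6))
# Rescale the data and add Gaussian noise.
data <- (sim$data - 0.5)/0.5
data <- data + rnorm(length(data), 0, 1)
# Fit a mixture with two components from a single EM start.
result <- mnem(data, k = 2, starts = 1)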
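The rescaling and noise steps in the example can be checked on a toy matrix without the package. This base R sketch assumes the simulated data consist of 0s and 1s, which the example itself does not state; the toy values are hypothetical.

```r
# Toy 0/1 matrix standing in for sim$data (hypothetical values).
toy <- matrix(c(0, 1, 1, 0, 1, 0), nrow = 2)

# Map 0 to -1 and 1 to 1, the same transformation as in the example.
scaled <- (toy - 0.5) / 0.5

# Add standard normal noise; set.seed makes the draw reproducible.
set.seed(1)
noisy <- scaled + rnorm(length(scaled), mean = 0, sd = 1)
```

After this step the values are continuous and centered on -1 or 1, in the spirit of the log ratio input expected with method = "llr".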