A Quick Start of the cola Package

Zuguang Gu ( z.gu@dkfz.de )

2020-10-27

Assuming your matrix is stored in an object called mat, you only need to run the following code to perform consensus partitioning with cola:

# code only for demonstration
mat = adjust_matrix(mat)  # optional
rl = run_all_consensus_partition_methods(mat, mc.cores = ...)
cola_report(rl, output_dir = ..., mc.cores = ...)

The code above has three steps:

Adjust the matrix. In this step, rows with too many NA values and rows with very low variance are removed, NA values are imputed in rows where fewer than 50% of the values are NA, and outliers are adjusted within each row.
Run consensus partitioning with several methods. The default partitioning methods are hclust (hierarchical clustering with cutree), kmeans (k-means clustering), skmeans::skmeans (spherical k-means clustering), cluster::pam (partitioning around medoids) and mclust::Mclust (model-based clustering). The default methods to extract the top n rows are SD (standard deviation), CV (coefficient of variation), MAD (median absolute deviation) and ATC (ability to correlate to other rows).
Generate a detailed HTML report for the complete analysis.
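To make the idea of a top-value method concrete, the following base R sketch scores rows by SD, CV and MAD and keeps the n highest-scoring rows. It is only an illustration: the random matrix, the helper top_n_rows() and the plain CV formula are assumptions made here, cola's own implementations may differ in detail, and ATC is left out.

```r
# Illustration only: base R versions of three top-value scores.
set.seed(123)
# hypothetical matrix: 200 rows (features), 12 columns (samples)
mat = matrix(rnorm(200 * 12, mean = 10), nrow = 200, ncol = 12)
rownames(mat) = paste0("row", seq_len(nrow(mat)))

row_sd  = apply(mat, 1, sd)              # standard deviation per row
row_cv  = row_sd / abs(rowMeans(mat))    # plain CV; assumes row means are not near zero
row_mad = apply(mat, 1, mad)             # median absolute deviation per row

# hypothetical helper: indices of the n rows with the highest score
top_n_rows = function(score, n) {
    order(score, decreasing = TRUE)[seq_len(n)]
}

# sub-matrix with the 50 most variable rows according to SD
top_sd = mat[top_n_rows(row_sd, 50), ]
dim(top_sd)
```

Replacing row_sd with row_cv or row_mad in the last step selects rows by the other two scores.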

run_all_consensus_partition_methods() runs multiple methods in sequence, which might take a long time for big datasets. Users can also run consensus partitioning with a specific top-value method (e.g. SD) and a specific partitioning method (e.g. skmeans) with the consensus_partition() function:

res = consensus_partition(mat, top_value_method = ..., partition_method = ...)
cola_report(res, output_dir = ..., mc.cores = ...)
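The general idea behind consensus partitioning can be sketched in base R: partition the samples repeatedly on resampled rows, record how often each pair of samples lands in the same cluster, and derive a final partition from that consensus matrix. This is a simplified illustration, not cola's implementation; the simulated data, the 80% row sampling fraction, the number of repetitions and the use of kmeans and hclust here are all assumptions.

```r
# Illustration only: a minimal consensus partitioning loop in base R.
set.seed(1)
# hypothetical data: 100 rows, 20 samples in two groups that differ in mean
mat = cbind(matrix(rnorm(100 * 10, mean = 0), nrow = 100),
            matrix(rnorm(100 * 10, mean = 2), nrow = 100))

k = 2                 # number of subgroups to look for
n_rep = 20            # number of resampling rounds
n_sample = ncol(mat)
consensus = matrix(0, n_sample, n_sample)

for (i in seq_len(n_rep)) {
    # resample 80% of the rows, then partition the samples
    rows = sample(nrow(mat), size = round(0.8 * nrow(mat)))
    cl = kmeans(t(mat[rows, ]), centers = k, nstart = 5)$cluster
    # add 1 for every pair of samples assigned to the same cluster
    consensus = consensus + outer(cl, cl, "==")
}
consensus = consensus / n_rep   # fraction of rounds in which a pair co-clusters

# final classes: hierarchical clustering on the consensus dissimilarity
final = cutree(hclust(as.dist(1 - consensus)), k = k)
table(final)
```

Values in consensus close to 0 or 1 indicate a stable partition, while many intermediate values suggest that the chosen k does not fit the data well.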

For extremely large datasets, users can run consensus_partition_by_down_sampling(), which classifies a randomly sampled subset of samples and then predicts the classes of the remaining samples from the signatures of that cola classification. More details can be found in the vignette “Work with Big Datasets”.

res = consensus_partition_by_down_sampling(mat, subset = ...,
    top_value_method = ..., partition_method = ...)
cola_report(res, output_dir = ..., mc.cores = ...)
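The down-sampling strategy can be illustrated with a small base R sketch: partition a random subset of samples, then assign every sample to the class with the nearest centroid. Note that this is a stand-in. cola predicts classes from signatures, whereas the nearest-centroid rule over all rows, the simulated data and the name subset_idx below are assumptions chosen to keep the example short.

```r
# Illustration only: classify a subset, then predict the remaining samples.
set.seed(42)
# hypothetical data: 50 rows, 200 samples in two groups that differ in mean
mat = cbind(matrix(rnorm(50 * 100, mean = 0), nrow = 50),
            matrix(rnorm(50 * 100, mean = 3), nrow = 50))

# randomly pick 40 samples and partition only those
subset_idx = sample(ncol(mat), 40)
km = kmeans(t(mat[, subset_idx]), centers = 2, nstart = 5)
centroids = km$centers   # one row per class, one column per matrix row

# squared Euclidean distance from every sample to every class centroid
dist_to_centroid = sapply(seq_len(nrow(centroids)), function(j) {
    colSums((mat - centroids[j, ])^2)
})

# predicted class of each sample: the class with the smallest distance
predicted = max.col(-dist_to_centroid)
table(predicted)
```

Only the 40 sampled columns go through the partitioning step, so the expensive part of the analysis no longer scales with the total number of samples.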

Examples of cola analyses on real datasets can be found at https://jokergoo.github.io/cola_collection/.