scPCA {scPCA}    R Documentation

Sparse Contrastive Principal Component Analysis

Description

Given target and background data frames or matrices, scPCA performs sparse contrastive principal component analysis (scPCA) of the target data for a given number of eigenvectors, a vector of real-valued contrast parameters, and a vector of penalty terms. For more information on the contrastive PCA method, consult Abid A, Zhang MJ, Bagaria VK, Zou J (2018). “Exploring patterns enriched in a dataset with contrastive principal component analysis.” Nature Communications, 9(1), 2134. Sparse PCA is performed via the method of Zou H, Hastie T, Tibshirani R (2006). “Sparse principal component analysis.” Journal of Computational and Graphical Statistics, 15(2), 265–286.
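The core idea of contrastive PCA in Abid et al. (2018) is an eigendecomposition of the target covariance matrix minus a multiple of the background covariance matrix. The following is a minimal base-R sketch of that idea for a single contrast value. It is not the package's implementation, it omits the sparsity step, and every object name in it (target_c, contrast_cov, alpha, and so on) is illustrative rather than part of the scPCA API.

```r
# Minimal base-R sketch of (non-sparse) contrastive PCA for one contrast value.
# Simulated data and all object names are illustrative assumptions.
set.seed(1)
target <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
background <- matrix(rnorm(80 * 5), nrow = 80, ncol = 5)

# Center each data set column-wise (cov() centers as well; this keeps
# the projection below on the same centered scale)
target_c <- scale(target, center = TRUE, scale = FALSE)
background_c <- scale(background, center = TRUE, scale = FALSE)

# Contrastive covariance: target covariance minus alpha times
# background covariance
alpha <- 1
contrast_cov <- cov(target_c) - alpha * cov(background_c)

# The leading eigenvectors serve as the contrastive loadings;
# eigen() returns eigenvalues in decreasing order
eig <- eigen(contrast_cov, symmetric = TRUE)
loadings <- eig$vectors[, 1:2]

# Project the centered target data onto the loadings
scores <- target_c %*% loadings
dim(scores)
```

Larger values of alpha put more weight on removing directions of high background variance, which is why the function searches over a grid of contrasts rather than fixing one value.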

Usage

scPCA(target, background, center = TRUE, scale = FALSE, n_eigen = 2,
  cv = NULL,
  contrasts = exp(seq(log(0.1), log(1000), length.out = 40)),
  penalties = seq(0.05, 1, length.out = 20),
  clust_method = c("kmeans", "pam"), n_centers, max_iter = 10,
  n_medoids = 8, parallel = FALSE)
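The default grids in the signature above can be inspected directly in base R. The sketch below only evaluates the two default expressions and checks their spacing. The expand.grid() step is an assumption made for illustration: this page does not state whether every contrast and penalty pair is evaluated.

```r
# Default hyperparameter grids, copied from the usage signature
contrasts <- exp(seq(log(0.1), log(1000), length.out = 40))
penalties <- seq(0.05, 1, length.out = 20)

length(contrasts)
range(contrasts)

# Logarithmic spacing: consecutive contrasts differ by a constant ratio
ratios <- contrasts[-1] / contrasts[-length(contrasts)]

# Equidistant spacing: consecutive penalties differ by a constant step
steps <- diff(penalties)  # each step is 0.05

# If every pair were evaluated (an assumption), the search would cover
# 40 * 20 = 800 combinations
grid <- expand.grid(contrast = contrasts, penalty = penalties)
nrow(grid)
```

Shorter grids, as used in the examples at the end of this page, reduce the number of candidate combinations accordingly.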

Arguments

target

The target (experimental) data set, in a standard format such as a data.frame or matrix.

background

The background data set, in a standard format such as a data.frame or matrix. Note that the number of features must match the number of features in the target data.

center

A logical indicating whether the target and background data sets should be centered to mean zero.

scale

A logical indicating whether the target and background data sets should be scaled to unit variance.

n_eigen

A numeric indicating the number of eigenvectors (or sparse contrastive components) to be computed. The default is to compute two such eigenvectors.

cv

A numeric indicating the number of cross-validation folds to use when choosing the optimal contrastive and penalization parameters from the grids of contrasts and penalties. Cross-validation is expected to improve the robustness and generalization of the chosen parameters, but it increases the running time of the procedure. The default is therefore NULL, corresponding to no cross-validation.

contrasts

A numeric vector of the contrastive parameters. Each element must be a unique non-negative real number. The default is to use 40 logarithmically spaced values between 0.1 and 1000.

penalties

A numeric vector of the L1 penalty terms on the loadings. The default is to use 20 equidistant values between 0.05 and 1.

clust_method

A character specifying the clustering method to use for choosing the optimal contrastive parameter. Currently, this is limited to either k-means or partitioning around medoids (PAM). The default is k-means clustering.

n_centers

A numeric giving the number of centers to use in the clustering algorithm. If set to 1, cPCA, as first proposed by Abid et al., is performed, regardless of what the penalties argument is set to.

max_iter

A numeric giving the maximum number of iterations to be used in k-means clustering, defaulting to 10.

n_medoids

A numeric indicating the number of medoids to consider if n_centers is set to 1. The default is 8 such medoids.

parallel

A logical indicating whether to invoke parallel processing via the BiocParallel infrastructure. The default is FALSE for sequential evaluation.
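The arguments clust_method, n_centers, and max_iter describe a clustering step used to choose the contrastive parameter. The sketch below illustrates one way such a selection could work: for each contrast, project the target data onto the contrastive eigenvectors, cluster the projection with k-means, and keep the contrast with the best clustering quality. This page does not state the selection criterion, so the use of average silhouette width here is an assumption, as are the simulated data and all object names. The sketch covers the non-sparse case only and is not the package's implementation.

```r
library(cluster)  # provides silhouette(); a recommended package shipped with R

set.seed(123)
# Toy data (illustrative): the target has two groups separated along
# feature 1, plus high-variance noise in feature 2 shared with the background
n <- 60
group <- rep(1:2, each = n / 2)
target <- cbind(
  rnorm(n, mean = ifelse(group == 1, -2, 2)),
  rnorm(n, sd = 5),
  rnorm(n)
)
background <- cbind(rnorm(n), rnorm(n, sd = 5), rnorm(n))

target_c <- scale(target, center = TRUE, scale = FALSE)
c_target <- cov(target_c)
c_background <- cov(background)

# A short logarithmic grid, built the same way as the default contrasts
contrasts <- exp(seq(log(0.1), log(1000), length.out = 10))

# For each contrast: project, cluster, and score the clustering.
# centers and iter.max play the roles of n_centers and max_iter.
avg_sil <- vapply(contrasts, function(alpha) {
  v <- eigen(c_target - alpha * c_background, symmetric = TRUE)$vectors[, 1:2]
  proj <- target_c %*% v
  km <- kmeans(proj, centers = 2, iter.max = 10, nstart = 5)
  mean(silhouette(km$cluster, dist(proj))[, "sil_width"])
}, numeric(1))

# Keep the contrast whose projection clusters best under this criterion
best_contrast <- contrasts[which.max(avg_sil)]
best_contrast
```

Swapping kmeans() for cluster::pam() would give the PAM analogue of the same loop.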

Value

A list containing the following components:

Examples

# perform cPCA on the simulated data set
scPCA(
  target = toy_df[, 1:30],
  background = background_df,
  contrasts = exp(seq(log(0.1), log(100), length.out = 5)),
  penalties = 0,
  n_centers = 4
)

# perform scPCA on the simulated data set
scPCA(
  target = toy_df[, 1:30],
  background = background_df,
  contrasts = exp(seq(log(0.1), log(100), length.out = 5)),
  penalties = seq(0.1, 1, length.out = 3),
  n_centers = 4
)

# cPCA as implemented in Abid et al.
scPCA(
  target = toy_df[, 1:30],
  background = background_df,
  contrasts = exp(seq(log(0.1), log(100), length.out = 10)),
  penalties = 0,
  n_centers = 1
)

[Package scPCA version 1.0.0 Index]