evaluateK {tradeSeq} R Documentation

Evaluate the optimal number of knots required for fitGAM.

Description

Evaluate an appropriate number of knots for fitGAM by comparing AIC values over a range of candidate knot numbers on a random subset of genes.

Usage

evaluateK(counts, ...)

## S4 method for signature 'matrix'
evaluateK(
  counts,
  k = 3:10,
  nGenes = 500,
  sds = NULL,
  pseudotime = NULL,
  cellWeights = NULL,
  plot = TRUE,
  U = NULL,
  weights = NULL,
  offset = NULL,
  aicDiff = 2,
  verbose = TRUE,
  control = mgcv::gam.control(),
  sce = FALSE,
  family = "nb",
  gcv = FALSE,
  ...
)

## S4 method for signature 'dgCMatrix'
evaluateK(
  counts,
  k = 3:10,
  nGenes = 500,
  sds = NULL,
  pseudotime = NULL,
  cellWeights = NULL,
  plot = TRUE,
  U = NULL,
  weights = NULL,
  offset = NULL,
  aicDiff = 2,
  verbose = TRUE,
  control = mgcv::gam.control(),
  sce = FALSE,
  family = "nb",
  gcv = FALSE,
  ...
)

## S4 method for signature 'SingleCellExperiment'
evaluateK(
  counts,
  k = 3:10,
  nGenes = 500,
  sds = NULL,
  pseudotime = NULL,
  cellWeights = NULL,
  plot = TRUE,
  U = NULL,
  weights = NULL,
  offset = NULL,
  aicDiff = 2,
  verbose = TRUE,
  control = mgcv::gam.control(),
  sce = FALSE,
  family = "nb",
  gcv = FALSE,
  ...
)

## S4 method for signature 'CellDataSet'
evaluateK(
  counts,
  k = 3:10,
  nGenes = 500,
  sds = NULL,
  pseudotime = NULL,
  cellWeights = NULL,
  plot = TRUE,
  U = NULL,
  weights = NULL,
  offset = NULL,
  aicDiff = 2,
  verbose = TRUE,
  control = mgcv::gam.control(),
  sce = FALSE,
  family = "nb",
  gcv = FALSE,
  ...
)

Arguments

counts

The count matrix, genes in rows and cells in columns.

...

Additional parameters, including:

k

The range of numbers of knots to evaluate. Defaults to 3:10.

nGenes

The number of genes to use in the evaluation. Genes will be randomly selected. 500 by default.

sds

Slingshot object containing the lineages.

pseudotime

A matrix of pseudotime values. Each row represents a cell and each column represents a lineage.

cellWeights

A matrix of cell weights defining the probability that a cell belongs to a particular lineage. Each row represents a cell and each column represents a lineage.

plot

Whether to display diagnostic plots. Defaults to TRUE.

U

The design matrix of fixed effects. The design matrix should not contain an intercept to ensure identifiability.

weights

Optional: a matrix of weights with identical dimensions as the counts matrix. Usually a matrix of zero-inflation weights.

offset

Optional: the offset, on log-scale. If NULL, TMM is used to account for differences in sequencing depth, see fitGAM.

aicDiff

Used for selecting genes with significantly varying AIC values over the range of evaluated knots to make the barplot output. Default is set to 2, meaning that only genes whose AIC range is larger than 2 will be used to check for the optimal number of knots through the barplot visualization that is part of the output of this function.

verbose

Logical, should progress messages be printed?

control

Control object for GAM fitting, see mgcv::gam.control().

sce

Logical, should a SingleCellExperiment object be returned?

family

The assumed distribution. Currently, only "nb" (negative binomial) is supported.

gcv

(In development). Logical, should a GCV score also be returned?
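
As described above, counts has cells in columns, while pseudotime and cellWeights have cells in rows and lineages in columns. The following is a minimal base R sketch of toy inputs with consistent shapes. All values are hypothetical. It also assumes that the offset is one log-scale value per cell and that cell weights are scaled to sum to 1 per cell; neither assumption is stated on this page, and the log library size used here is an illustration, not the TMM default.

```r
## Toy dimensions (hypothetical, for illustration only).
set.seed(1)
nG <- 50   # number of genes
nC <- 20   # number of cells
nL <- 2    # number of lineages

## Count matrix: genes in rows, cells in columns.
counts <- matrix(rpois(nG * nC, lambda = 10), nrow = nG, ncol = nC)

## Pseudotime: one row per cell, one column per lineage.
pseudotime <- matrix(runif(nC * nL, min = 0, max = 10), nrow = nC, ncol = nL)

## Cell weights: same shape as pseudotime. Assumption: each row is
## scaled so that the weights of a cell across lineages sum to 1.
rawWeights <- matrix(runif(nC * nL), nrow = nC, ncol = nL)
cellWeights <- rawWeights / rowSums(rawWeights)

## Offset on the log scale. Assumption: one value per cell, here the
## log of the total count per cell (not the TMM default).
offset <- log(colSums(counts))

## Consistency checks before passing these objects to evaluateK().
stopifnot(
  nrow(pseudotime) == ncol(counts),
  identical(dim(pseudotime), dim(cellWeights)),
  length(offset) == ncol(counts)
)
```

Checking the shapes up front makes a transposed matrix easy to spot before any model fitting starts.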

Value

A plot of average AIC value over the range of selected knots, and a matrix of AIC and GCV values for the selected genes (rows) and the range of knots (columns).
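
To illustrate how a genes-by-knots AIC matrix and the aicDiff threshold relate, here is a small base R sketch on a hand-made matrix. The AIC values and the row and column names are hypothetical, and the per-gene tally at the end is only an assumption about the kind of summary the barplot conveys, not the package's internal code.

```r
## Hypothetical AIC matrix: 4 genes (rows) evaluated at k = 3, 4, 5 (columns).
kRange <- 3:5
aic <- matrix(
  c(210, 205,   204,
    300, 299.5, 299,
    150, 140,   141,
    410, 400,   395),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("k", kRange))
)
aicDiff <- 2

## Range of AIC per gene across the evaluated numbers of knots.
aicRange <- apply(aic, 1, function(x) max(x) - min(x))

## Keep only genes whose AIC range is larger than aicDiff.
varying <- aic[aicRange > aicDiff, , drop = FALSE]

## Average AIC per number of knots, over all genes.
meanAic <- colMeans(aic)

## For each retained gene, the number of knots with the lowest AIC,
## and how many genes prefer each value.
bestK <- kRange[apply(varying, 1, which.min)]
table(bestK)
```

In this toy matrix, gene2 has an AIC range of 1 and is therefore dropped by the aicDiff = 2 filter, while the other three genes are kept.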

Examples

## This is an artificial example, please check the vignette for a realistic one.
set.seed(8)
data(sds, package="tradeSeq")
loadings <- matrix(runif(2000*2,-2,2), nrow=2, ncol=2000)
counts <- round(abs(t(slingshot::reducedDim(sds) %*% loadings)))+100
aicK <- evaluateK(counts = counts, sds=sds,
                  nGenes=100, k=3:5, verbose=FALSE)

[Package tradeSeq version 1.3.13 Index]