glm_gp_impl {glmGamPoi}    R Documentation

Internal Function to Fit a Gamma-Poisson GLM

Description

Internal Function to Fit a Gamma-Poisson GLM

Usage

glm_gp_impl(
  Y,
  model_matrix,
  offset = 0,
  size_factors = TRUE,
  overdispersion = TRUE,
  do_cox_reid_adjustment = TRUE,
  subsample = FALSE,
  verbose = FALSE
)

Arguments

Y

any matrix-like object (e.g. matrix(), DelayedArray(), HDF5Matrix()) with one column per sample and one row per gene.

model_matrix

a numeric matrix that specifies the experimental design, with one row per sample. It can be produced using stats::model.matrix(). For an intercept-only model, use matrix(1, nrow = ncol(Y), ncol = 1).
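As a minimal illustration of how such a matrix could be built, the following sketch calls stats::model.matrix() on a made-up sample annotation. The data frame sample_info and its column condition are hypothetical names used only for this example.

```r
# Hypothetical sample annotation: six samples in two conditions
sample_info <- data.frame(
  condition = factor(c("ctrl", "ctrl", "ctrl", "treated", "treated", "treated"))
)

# Design with an intercept and one condition effect: one row per sample
model_matrix <- stats::model.matrix(~ condition, data = sample_info)

# Intercept-only alternative: a single column of ones, one row per sample
intercept_only <- matrix(1, nrow = nrow(sample_info), ncol = 1)

dim(model_matrix)
dim(intercept_only)
```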

offset

Constant offset in the model in addition to log(size_factors). It can be a single number, a vector of length ncol(Y), or a matrix with the same dimensions as Y. Note that if Y is a DelayedArray or HDF5Matrix, offset must be as well. Default: 0.
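One way to picture the three accepted shapes is that each expands to a genes x samples matrix, to which log(size_factors) is then added column by column. The sketch below only illustrates that idea; expand_offset is a hypothetical helper, and the package's internal handling may differ.

```r
# Hypothetical helper: expand an offset to a genes x samples matrix
expand_offset <- function(offset, n_genes, n_samples) {
  if (is.matrix(offset)) {
    # A matrix must already match the dimensions of the count data
    stopifnot(all(dim(offset) == c(n_genes, n_samples)))
    offset
  } else if (length(offset) == 1) {
    # A single number applies to every gene and sample
    matrix(offset, nrow = n_genes, ncol = n_samples)
  } else {
    # One value per sample, repeated for every gene (row)
    stopifnot(length(offset) == n_samples)
    matrix(offset, nrow = n_genes, ncol = n_samples, byrow = TRUE)
  }
}

# Example values chosen for illustration: 3 genes, 4 samples
size_factors <- c(0.5, 1, 2, 1)
base_offset <- expand_offset(0, n_genes = 3, n_samples = 4)

# Add log(size_factors) to every row, one value per column (sample)
total_offset <- sweep(base_offset, 2, log(size_factors), "+")
total_offset
```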

size_factors

in large-scale experiments, each sample is typically of a different size (for example, different sequencing depths). A size factor is an internal mechanism of GLMs to correct for this effect.
size_factors can either be a single boolean that indicates whether a size factor should be calculated for each sample, or a numeric vector that specifies the size factor for each sample. Note that size_factors = 1 and size_factors = FALSE are equivalent. Default: TRUE.
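To make the idea of a size factor concrete, the sketch below computes one simple variant: the column sums of a toy count matrix, scaled so that their geometric mean is 1. This is an assumption made for illustration and not necessarily the estimator used when size_factors = TRUE.

```r
set.seed(1)

# Toy count matrix: 100 genes (rows) x 4 samples (columns);
# the fourth sample is simulated at roughly three times the depth
depth <- c(1, 1, 1, 3)
Y <- sapply(depth, function(d) rpois(100, lambda = 5 * d))

# Column totals, normalized so that the geometric mean is 1
col_totals <- colSums(Y)
size_factors <- col_totals / exp(mean(log(col_totals)))

size_factors
```

A numeric vector like this one could be passed as size_factors in place of TRUE.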

overdispersion

the simplest count model is the Poisson model. However, the Poisson model assumes that variance = mean. For many applications this is too rigid, and the Gamma-Poisson allows a more flexible mean-variance relation (variance = mean + mean^2 * overdispersion).
overdispersion can either be a single boolean that indicates whether an overdispersion is estimated for each gene, or a numeric vector of length nrow(Y). Note that overdispersion = 0 and overdispersion = FALSE are equivalent and both reduce the Gamma-Poisson to the classical Poisson model. Default: TRUE.
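The mean-variance relation can be checked by simulation. The sketch below uses stats::rnbinom(), whose size argument corresponds to 1 / overdispersion in this parameterization; the values of mu and overdispersion are arbitrary choices for illustration.

```r
set.seed(42)

mu <- 10
overdispersion <- 0.5

# Gamma-Poisson draws: size = 1 / overdispersion in rnbinom()'s parameterization
y <- rnbinom(1e5, mu = mu, size = 1 / overdispersion)

# Variance predicted by the formula: mean + mean^2 * overdispersion
expected_variance <- mu + mu^2 * overdispersion

# Poisson draws with the same mean, where variance = mean
y_pois <- rpois(1e5, lambda = mu)

c(mean = mean(y), variance = var(y), expected = expected_variance)
c(mean = mean(y_pois), variance = var(y_pois))
```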

do_cox_reid_adjustment

the classical maximum likelihood estimator of the overdispersion is biased towards small values. McCarthy et al. (2012) showed that it is preferable to optimize the Cox-Reid adjusted profile likelihood.
do_cox_reid_adjustment can be either TRUE or FALSE to indicate whether the adjustment is added during the optimization of the overdispersion parameter. Default: TRUE.
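For orientation, the adjustment can be sketched as subtracting 0.5 * log(det(t(X) %*% W %*% X)) from the log-likelihood, where W is a diagonal weight matrix with entries 1 / (1 / mu + overdispersion). The code below is a simplified sketch for a single gene with the mean held fixed; the function names are hypothetical and the package's actual optimizer may work differently.

```r
# Hypothetical helper: Gamma-Poisson log-likelihood for one gene
gp_loglik <- function(y, mu, overdispersion) {
  sum(dnbinom(y, mu = mu, size = 1 / overdispersion, log = TRUE))
}

# Hypothetical helper: log-likelihood minus the Cox-Reid term
cox_reid_adjusted_loglik <- function(y, mu, X, overdispersion) {
  w <- 1 / (1 / mu + overdispersion)   # diagonal of the weight matrix W
  XtWX <- crossprod(X, X * w)          # t(X) %*% W %*% X
  cr_term <- 0.5 * as.numeric(determinant(XtWX, logarithm = TRUE)$modulus)
  gp_loglik(y, mu, overdispersion) - cr_term
}

set.seed(7)
y <- rnbinom(50, mu = 10, size = 1 / 0.5)   # simulated counts for one gene
X <- matrix(1, nrow = length(y), ncol = 1)  # intercept-only design
mu <- rep(mean(y), length(y))               # mean held fixed for simplicity

# Maximize over log(overdispersion) so the parameter stays positive
fit_ml <- optimize(function(lt) gp_loglik(y, mu, exp(lt)),
                   interval = c(-8, 3), maximum = TRUE)
fit_cr <- optimize(function(lt) cox_reid_adjusted_loglik(y, mu, X, exp(lt)),
                   interval = c(-8, 3), maximum = TRUE)

estimate_ml <- exp(fit_ml$maximum)
estimate_cr <- exp(fit_cr$maximum)
c(ml = estimate_ml, cox_reid = estimate_cr)
```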

subsample

the estimation of the overdispersion is the slowest step when fitting a Gamma-Poisson GLM. For datasets with many samples, the estimation can be considerably sped up without losing much precision by fitting the overdispersion only on a random subset of the samples. Default: FALSE, which means that the data is not subsampled. If set to TRUE, at most 1,000 samples are considered. Otherwise the parameter specifies the number of samples that are considered for each gene to estimate the overdispersion.
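The three documented settings can be sketched as follows. The helper resolve_subsample is hypothetical, and the sketch draws a single random subset of columns for all genes, which is a simplification of whatever the package does internally.

```r
set.seed(123)

# Toy count matrix: 20 genes (rows) x 5000 samples (columns)
Y <- matrix(rpois(20 * 5000, lambda = 3), nrow = 20, ncol = 5000)

# Hypothetical helper: translate `subsample` into a number of samples
resolve_subsample <- function(subsample, n_samples) {
  if (isTRUE(subsample)) {
    min(1000, n_samples)        # TRUE: at most 1,000 samples
  } else if (isFALSE(subsample)) {
    n_samples                   # FALSE: use all samples
  } else {
    min(subsample, n_samples)   # a number: use that many samples
  }
}

n_used <- resolve_subsample(TRUE, ncol(Y))

# Draw the random subset of columns and keep the matrix shape
cols <- sample(ncol(Y), n_used)
Y_sub <- Y[, cols, drop = FALSE]

dim(Y_sub)
```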

verbose

a boolean that indicates whether information about the individual steps is printed while fitting the GLM. Default: FALSE.

Value

a list with four elements

See Also

glm_gp() and gampoi_overdispersion_mle()


[Package glmGamPoi version 1.0.0 Index]