gampoi_overdispersion_mle {glmGamPoi}    R Documentation

Estimate the Overdispersion for a Vector of Counts

Description

Estimates the overdispersion of a Gamma-Poisson model for a vector of counts by maximum likelihood.

Usage

gampoi_overdispersion_mle(
  y,
  mean = base::mean(y),
  model_matrix = matrix(1, nrow = length(y), ncol = 1),
  do_cox_reid_adjustment = TRUE,
  subsample = FALSE,
  verbose = FALSE
)

Arguments

y

a numeric or integer vector with the counts for which the overdispersion is estimated

mean

a numeric vector of either length 1 or length(y) with the predicted mean for each sample. Default: mean(y).

model_matrix

a numeric matrix that specifies the experimental design. It can be produced using stats::model.matrix(). Default: matrix(1, nrow = length(y), ncol = 1), which is the model matrix for a 'just-intercept-model'.

do_cox_reid_adjustment

the classical maximum likelihood estimator of the overdispersion is biased towards small values. McCarthy et al. (2012) showed that it is preferable to optimize the Cox-Reid adjusted profile likelihood.
do_cox_reid_adjustment can be either TRUE or FALSE to indicate whether the adjustment is added during the optimization of the overdispersion parameter. Default: TRUE.

subsample

the estimation of the overdispersion is the slowest step when fitting a Gamma-Poisson GLM. For datasets with many samples, the estimation can be sped up considerably without losing much precision by fitting the overdispersion on only a random subset of the samples. Default: FALSE, which means that the data is not subsampled. If set to TRUE, at most 1,000 samples are considered. If set to a number, that number of samples is considered for each gene to estimate the overdispersion.

verbose

a boolean that indicates if information about the individual steps is printed while fitting the GLM. Default: FALSE.
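
To make the Cox-Reid adjustment described under do_cox_reid_adjustment more concrete, the following base R sketch maximizes an adjusted profile log-likelihood directly. It is an illustration, not the code glmGamPoi runs: the helper name cr_adjusted_loglik, the log-link working weights, and the search interval are all assumptions made for this example.

```r
# Hypothetical helper (not part of glmGamPoi): Cox-Reid adjusted profile
# log-likelihood of a Gamma-Poisson model with overdispersion theta.
cr_adjusted_loglik <- function(theta, y, mu, X) {
  # Gamma-Poisson (negative binomial) log-likelihood with size = 1 / theta
  ll <- sum(dnbinom(y, mu = mu, size = 1 / theta, log = TRUE))
  # Working weights, assuming a log link: w_i = mu_i / (1 + theta * mu_i)
  w <- mu / (1 + theta * mu)
  # Cox-Reid term: -0.5 * log det(X^T W X); w * X scales row i of X by w_i
  xtwx <- crossprod(X, w * X)
  cr <- -0.5 * as.numeric(determinant(xtwx, logarithm = TRUE)$modulus)
  ll + cr
}

set.seed(1)
y  <- rnbinom(n = 10, mu = 3, size = 1 / 2.4)
mu <- rep(mean(y), length(y))
X  <- matrix(1, nrow = length(y), ncol = 1)

# One-dimensional search over an arbitrary interval chosen for this sketch
fit <- optimize(cr_adjusted_loglik, interval = c(1e-6, 100),
                y = y, mu = mu, X = X, maximum = TRUE)
fit$maximum
```

Dropping the cr term from the helper gives the unadjusted likelihood, which corresponds to do_cox_reid_adjustment = FALSE in spirit.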

Details

The function employs a rough heuristic to decide whether the conventional (iterative) approach or Bandara's approach is used to calculate the overdispersion. If max(y) < length(y), Bandara's approach is used; otherwise the conventional one is used.
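
The rule of thumb above can be written as a one-line check. The helper choose_method below is hypothetical and only mirrors the documented condition; the package's internal decision logic may include further details.

```r
# Hypothetical helper mirroring the documented heuristic; not part of glmGamPoi.
choose_method <- function(y) {
  if (max(y) < length(y)) "bandara" else "conventional"
}

choose_method(c(0, 1, 0, 2, 1))  # max is 2, length is 5: "bandara"
choose_method(c(0, 12, 3))       # max is 12, length is 3: "conventional"
```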

Value

The function returns a list with the following elements:

estimate

the numerical estimate of the overdispersion.

iterations

the number of iterations it took to calculate the result.

method

the method that was used to calculate the overdispersion: either "conventional" or "bandara".

message

additional information about the fitting process.

See Also

glm_gp()

Examples

 set.seed(1)
 # true overdispersion = 2.4
 y <- rnbinom(n = 10, mu = 3, size = 1/2.4)
 # estimate = 1.7
 gampoi_overdispersion_mle(y)


 # true overdispersion = 0
 y <- rpois(n = 10, lambda = 3)
 # estimate = 0
 gampoi_overdispersion_mle(y)
 # with different mu, overdispersion estimate changes
 gampoi_overdispersion_mle(y, mean = 15)
 # Cox-Reid adjustment changes the result
 gampoi_overdispersion_mle(y, mean = 15, do_cox_reid_adjustment = FALSE)


 # Many very small counts, true overdispersion = 50
 y <- rnbinom(n = 1000, mu = 0.01, size = 1/50)
 summary(y)
 # estimate = 31
 gampoi_overdispersion_mle(y)
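
The examples above all use the default intercept-only design. As a rough sketch of a call with a non-default model_matrix and a per-sample mean, the following builds a two-group design with stats::model.matrix(). Using the observed group means as predicted means is an assumption made for illustration; in practice the means would typically come from a fitted GLM. The call is guarded so that the snippet also runs when the function is not available under this name.

```r
set.seed(1)
# Two groups of 10 samples with different means, true overdispersion = 0.5
group <- factor(rep(c("a", "b"), each = 10))
y <- rnbinom(n = 20, mu = rep(c(3, 8), each = 10), size = 1 / 0.5)

# Design matrix with an intercept and one group coefficient
X <- stats::model.matrix(~ group)

# Per-sample predicted means; here simply the observed group means (assumption)
mu <- ave(y, group)

if (requireNamespace("glmGamPoi", quietly = TRUE) &&
    "gampoi_overdispersion_mle" %in% getNamespaceExports("glmGamPoi")) {
  res <- glmGamPoi::gampoi_overdispersion_mle(y, mean = mu, model_matrix = X)
  print(res$estimate)  # numerical estimate of the overdispersion
  print(res$method)    # "conventional" or "bandara"
}
```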


[Package glmGamPoi version 1.0.0 Index]