This vignette gives an introduction to network-regularized generalized linear models (GLMs). Network regularization uses graphs to model interactions between covariables and between responses, and penalizes the coefficients of a GLM using this information.
edgenet uses a \((p \times p)\)-dimensional affinity matrix \(\mathbf{G}_X \in \mathbb{R}_+^{p \times p}\) to model the interactions of \(p\) covariables in \(\mathbf{X} \in \mathbb{R}^{n \times p}\) and, analogously, a matrix \(\mathbf{G}_Y \in \mathbb{R}_+^{q \times q}\) to model the interactions of \(q\) response variables in \(\mathbf{Y} \in \mathbb{R}^{n \times q}\). The affinity matrices are used to regularize the regression coefficients as follows:
\[\begin{equation} \small \begin{split} \hat{\mathbf{B}} = \arg \min_{\mathbf{B}} \, & -\ell(\mathbf{B}) + \lambda ||\mathbf{B} ||_1 \\ & + \frac{\psi_1}{2} \sum_{i=1}^p \sum_{j=1}^p || \boldsymbol \beta_{i,*} - \boldsymbol \beta_{j,*} ||_2^2 (\mathbf{G}_X)_{i,j} \\ & + \frac{\psi_2}{2} \sum_{i=1}^q \sum_{j=1}^q || \boldsymbol \beta_{*,i} - \boldsymbol \beta_{*,j} ||_2^2 (\mathbf{G}_Y)_{i,j} \\ = \arg \min_{\mathbf{B}} \, & -\ell(\mathbf{B}) + \lambda ||\mathbf{B} ||_1 \\ & + \psi_1 \text{tr}\left( \mathbf{B}^T \left( \mathbf{D}_{\mathbf{G}_X} - \mathbf{G}_X \right) \mathbf{B} \right) \\ & + \psi_2 \text{tr}\left( \mathbf{B} \left( \mathbf{D}_{\mathbf{G}_Y} - \mathbf{G}_Y \right) \mathbf{B}^T \right) \end{split} \label{equ:graphreg} \end{equation}\]
Matrix \(\mathbf{B} \in \mathbb{R}^{(p + 1) \times q}\) is the matrix of regression coefficients to be estimated (including an intercept). Vectors \(\boldsymbol \beta_{i, *}\) and \(\boldsymbol \beta_{*, i}\) are the \(i\)-th row and column of \(\mathbf{B}\), respectively. The shrinkage parameters \(\lambda\), \(\psi_1\) and \(\psi_2\) are either fixed or need to be estimated, e.g., using cross-validation. The matrix \(\mathbf{D}_{\mathbf{G}} - \mathbf{G}\) is the combinatorial graph Laplacian of an affinity matrix \(\mathbf{G}\), where \(\mathbf{D}_{\mathbf{G}}\) is the degree matrix; a normalized Laplacian can be used instead (Chung 1997).
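The second form of the objective follows from the standard Laplacian identity \(\frac{1}{2} \sum_{i,j} || \boldsymbol \beta_{i,*} - \boldsymbol \beta_{j,*} ||_2^2 \, \mathbf{G}_{i,j} = \text{tr}(\mathbf{B}^T (\mathbf{D}_{\mathbf{G}} - \mathbf{G}) \mathbf{B})\). A minimal sketch that checks this numerically (the names G, B and L below are throwaway examples, not part of netReg):
# quick numeric check that tr(B^T (D - G) B) equals the pairwise penalty
G <- abs(rWishart(1, 5, diag(5))[,,1])   # random symmetric non-negative affinity matrix
B <- matrix(rnorm(5 * 3), 5)             # 5 covariables, 3 responses
L <- diag(rowSums(G)) - G                # combinatorial graph Laplacian D_G - G
pairwise <- 0.5 * sum(sapply(1:5, function(i)
  sapply(1:5, function(j) sum((B[i, ] - B[j, ])^2) * G[i, j])))
all.equal(pairwise, sum(diag(t(B) %*% L %*% B)))  # TRUE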
Related penalized regression approaches include the group lasso (Yuan and Lin 2006) and its extension to logistic regression (Meier, Van De Geer, and Bühlmann 2008).
So far netReg supports the following exponential family random variables: Gaussian, binomial, Poisson, gamma, beta, and inverse Gaussian.
The log-likelihood function \(\ell(\mathbf{B})\) over \(q\) response variables is defined as:
\[\begin{equation} \small \ell(\mathbf{B}) = \sum_{j=1}^q \sum_{i=1}^n \log \, \mathbb{P}\left({y}_{i, j} \mid h\left(\mathbf{x}_{i,*} \cdot \boldsymbol \beta_{*,j}\right), \phi \right) \end{equation}\]
where \(h\) is the inverse of a link function, for instance the exponential function when the link is the logarithm, and \(\phi\) is a dispersion parameter.
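For the Gaussian family with identity link this log-likelihood can be written out directly; the function below is a minimal sketch for illustration only (loglik.gaussian is not a netReg function):
# Gaussian log-likelihood with identity link; phi is the dispersion (variance)
loglik.gaussian <- function(B, X, Y, phi = 1) {
  eta <- cbind(1, X) %*% B   # linear predictor; first row of B is the intercept
  sum(dnorm(Y, mean = eta, sd = sqrt(phi), log = TRUE))
}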
The following section shows how to use network regularization models. We shall use edgenet, but the calls for the other methods are analogous; examples can be found in the method documentation. We first load some libraries:
library(tensorflow)
library(tfprobability)
library(netReg)
## Warning: Package 'netReg' is deprecated and will be removed from Bioconductor
## version 3.13
##
## Attaching package: 'netReg'
## The following objects are masked from 'package:stats':
##
## binomial, family, gaussian, inverse.gaussian, poisson
## The following objects are masked from 'package:base':
##
## beta, gamma
Set some parameters and create affinity matrices:
# parameters
n <- 100
p <- 10
q <- 10
# affinity matrices
G.X <- abs(rWishart(1, 10, diag(p))[,,1])
G.Y <- abs(rWishart(1, 10, diag(q))[,,1])
We created the affinity matrices completely at random here. In practice, an observed (e.g., biological) affinity matrix should be used, since the affinity matrix ultimately determines how the coefficients are shrunk.
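In such a case one would typically build the affinity matrix from an observed network, for instance from an edge list of known interactions. A small sketch (the edge list below is made up for illustration):
# build a binary affinity matrix from a hypothetical edge list of interactions
edges <- cbind(c(1, 2, 5), c(2, 3, 7))  # pairs of interacting covariables (made up)
G.X.obs <- matrix(0, p, p)
G.X.obs[edges] <- 1
G.X.obs[edges[, 2:1]] <- 1              # symmetrize, since interactions are undirected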
The actual fit is straightforward. We create Gaussian data first and then fit the model:
# data
X <- matrix(rnorm(n * p), n)
B <- matrix(rnorm(p * q), p)
Y <- X %*% B + matrix(rnorm(n * q, 0, 0.1), n)
fit <- edgenet(X=X, Y=Y, G.X=G.X, G.Y=G.Y, family=gaussian, maxit=10)
summary(fit)
## 'gaussian.edgenet' object
##
## call:
## edgenet(X = X, Y = Y, G.X = G.X, G.Y = G.Y, maxit = 10, family = gaussian)
##
## parameters:
## lambda psigx psigy
## 1 1 1
##
## family: gaussian
## link: identity
##
## -> call coef(fit) for coefficients
The fit object contains information about coefficients, intercepts, etc.
coef(fit)[,1:5]
## y[1] y[2] y[3] y[4] y[5]
## (Intercept) 0.9002230 0.9005039 0.9003168 0.9004461 0.9002941
## x[1] 0.9001085 0.9001319 0.8999682 0.9000759 0.9001001
## x[2] 0.8999267 0.9000134 0.8997293 0.8999051 0.8999385
## x[3] 0.9000749 0.9001032 0.8999503 0.9000534 0.9000960
## x[4] 0.9001007 0.9000886 0.8996047 0.9000757 0.9000938
## x[5] 0.9000545 0.9001142 0.8999050 0.9000709 0.9000744
## x[6] 0.9001465 0.9001626 0.9000795 0.9001029 0.9002031
## x[7] 0.9001109 0.9001762 0.9000571 0.9001200 0.9001300
## x[8] 0.9000897 0.9001146 0.9000235 0.9001106 0.9001170
## x[9] 0.9000339 0.9000850 0.8998514 0.9000128 0.9000368
## x[10] 0.9000951 0.9001490 0.9000207 0.9001081 0.9001137
Having estimated the coefficients, we can predict new data sets:
pred <- predict(fit, X)
pred[1:5, 1:5]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.0754258 1.0758017 1.0755811 1.0756590 1.0754980
## [2,] 1.3414276 1.3416002 1.3409907 1.3415776 1.3414993
## [3,] 6.0946679 6.0948892 6.0928457 6.0946259 6.0947814
## [4,] 1.0827088 1.0831625 1.0826976 1.0829180 1.0827223
## [5,] -0.4983141 -0.4978667 -0.4969878 -0.4980346 -0.4982316
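As a quick sanity check one can, for instance, compare the predictions to the training responses:
# in-sample mean squared error of the fit
mean((Y - pred)^2)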
The binomial case works the same, only with a different family argument:
# data
X <- matrix(rnorm(n * p), n)
B <- matrix(rnorm(p * q), p)
prob <- 1 / (1 + exp(-X %*% B))  # success probabilities via the logistic function
Y.binom <- do.call("cbind", lapply(seq(q), function(.) rbinom(n, 1, prob[,.])))
fit <- edgenet(X=X, Y=Y.binom, G.X=G.X, G.Y=G.Y, family=binomial, maxit=10)
summary(fit)
## 'binomial.edgenet' object
##
## call:
## edgenet(X = X, Y = Y.binom, G.X = G.X, G.Y = G.Y, maxit = 10, family = binomial)
##
## parameters:
## lambda psigx psigy
## 1 1 1
##
## family: binomial
## link: logit
##
## -> call coef(fit) for coefficients
The Poisson case of course works analogously:
# data
X <- matrix(rnorm(n * p), n)
B <- matrix(rnorm(p * q), p)
mu <- exp(X %*% B)  # Poisson means via the inverse log link
Y.pois <- do.call("cbind", lapply(seq(q), function(.) rpois(n, mu[,.])))
fit <- edgenet(X=X, Y=Y.pois, G.X=G.X, G.Y=G.Y, family=poisson, maxit=10)
summary(fit)
## 'poisson.edgenet' object
##
## call:
## edgenet(X = X, Y = Y.pois, G.X = G.X, G.Y = G.Y, maxit = 10,
## family = poisson)
##
## parameters:
## lambda psigx psigy
## 1 1 1
##
## family: poisson
## link: log
##
## -> call coef(fit) for coefficients
In most cases we do not know the optimal shrinkage parameters \(\lambda\), \(\psi_{1}\) and \(\psi_{2}\). For these settings you can use netReg's included model selection. We use Powell's BOBYQA algorithm (Powell 2009) for gradient-free optimization in a cross-validation framework.
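Conceptually, the model selection evaluates the cross-validated error of candidate shrinkage parameters. A naive sketch of this idea for \(\lambda\) alone looks as follows; cv.edgenet instead optimizes \(\lambda\), \(\psi_1\) and \(\psi_2\) jointly with BOBYQA:
# naive 5-fold grid search over lambda, for intuition only
folds <- sample(rep(1:5, length.out = n))
cv.error <- sapply(c(0.1, 1, 10), function(lambda) {
  mean(sapply(1:5, function(k) {
    train <- folds != k
    fit <- edgenet(X = X[train, ], Y = Y[train, ], G.X = G.X, G.Y = G.Y,
                   lambda = lambda, family = gaussian, maxit = 10)
    mean((Y[!train, ] - predict(fit, X[!train, ]))^2)
  }))
})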
Doing the model selection only requires calling cv.edgenet:
cv <- cv.edgenet(X=X, Y=Y, G.X=G.X, G.Y=G.Y, family=gaussian, optim.maxit=10, maxit=10)
summary(cv)
## 'gaussian.cv.edgenet' object
##
## call:
## cv.edgenet(X = X, Y = Y, G.X = G.X, G.Y = G.Y, maxit = 10, family = gaussian,
## optim.maxit = 10)
##
## optimal parameters:
## lambda (estimated) psigx (estimated) psigy (estimated)
## 0.1474264 0.1697474 0.1000000
##
## family: gaussian
## link: identity
##
## -> call coef(cv) for coefficients
The cross-validation with BOBYQA is computationally intensive, so we set optim.maxit to \(10\). I recommend running TensorFlow on a GPU, since edgenet needs to compute many matrix multiplications.
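With the tensorflow package already attached, you can check whether TensorFlow sees a GPU; this uses the standard TensorFlow 2.x API:
# list the GPUs visible to TensorFlow; an empty list means computations run on the CPU
tf$config$list_physical_devices("GPU")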
The cv.edgenet object also contains a fit with the optimal parameters.
summary(cv$fit)
## 'gaussian.edgenet' object
##
## call:
## edgenet(X = X, Y = Y, G.X = G.X, G.Y = G.Y, lambda = ret$lambda,
## psigx = ret$psigx, psigy = ret$psigy, thresh = thresh, maxit = maxit,
## learning.rate = learning.rate, family = family)
##
## parameters:
## lambda psigx psigy
## 0.1474264 0.1697474 0.1000000
##
## family: gaussian
## link: identity
##
## -> call coef(cv$fit) for coefficients
You can obtain the coefficients like this:
coef(cv)[,1:5]
## y[1] y[2] y[3] y[4] y[5]
## (Intercept) 0.9001945 0.9006118 0.9003052 0.9005013 0.9002763
## x[1] 0.9001585 0.9001997 0.9001080 0.9001906 0.9001827
## x[2] 0.9002181 0.9002358 0.9001611 0.9002684 0.9001958
## x[3] 0.9001448 0.9000999 0.9003048 0.9001292 0.9001681
## x[4] 0.9002277 0.9001873 0.9002112 0.9002714 0.9001873
## x[5] 0.9001690 0.9001964 0.9002044 0.9001927 0.9002346
## x[6] 0.9001558 0.9001593 0.9001416 0.9001563 0.9001610
## x[7] 0.9002012 0.9001892 0.9001433 0.9001558 0.9001992
## x[8] 0.9001633 0.9002054 0.9001842 0.9002528 0.9001706
## x[9] 0.9002589 0.9002165 0.9001809 0.9001721 0.9002059
## x[10] 0.9001919 0.9002664 0.9002001 0.9002475 0.9001894
Furthermore, you can also predict directly using a cv.edgenet object:
pred <- predict(cv, X)
pred[1:5, 1:5]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 5.9169331 5.917364 5.9172899 5.9173329 5.9171036
## [2,] 4.8573529 4.857692 4.8576471 4.8578522 4.8574471
## [3,] -3.6402032 -3.639846 -3.6401756 -3.6401373 -3.6401391
## [4,] -0.3251633 -0.324793 -0.3248117 -0.3249424 -0.3249141
## [5,] -1.8490140 -1.848673 -1.8488010 -1.8487738 -1.8488228
This section explains how to fit a linear model and do parameter estimation on real data. First, we load the library and some data:
library(netReg)
data("yeast")
ls(yeast)
X <- yeast$X
Y <- yeast$Y
G.Y <- yeast$GY
The yeast data set (\(\mathbf{X}\) and \(\mathbf{Y}\)) is taken and adapted from Brem et al. (2005), Storey, Akey, and Kruglyak (2005), and Cheng et al. (2014). \(\mathbf{G}_Y\) is taken from BioGRID.
\(\mathbf{X}\) is an \((n \times p)\) matrix of genetic markers, where \(n\) is the number of samples (112) and \(p\) is the number of markers. \(\mathbf{Y}\) is an \((n \times q)\) matrix of expression values for \(q\) yeast genes, with \(n\) again the number of samples (112). \(\mathbf{G}_Y\) is a \((q \times q)\) adjacency matrix representing protein-protein interactions for the \(q\) response variables.
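To verify this, one can inspect the dimensions of the loaded objects:
dim(X)    # n x p matrix of genetic markers
dim(Y)    # n x q matrix of expression values
dim(G.Y)  # q x q adjacency matrix of protein-protein interactions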
We use only a small network here so that the results can be printed conveniently.
fit <- edgenet(X=X, Y=Y, G.Y=G.Y, lambda=5, family=gaussian, maxit=10, thresh=1e-3)
summary(fit)
## 'gaussian.edgenet' object
##
## call:
## edgenet(X = X, Y = Y, G.Y = G.Y, lambda = 5, thresh = 0.001,
## maxit = 10, family = gaussian)
##
## parameters:
## lambda psigx psigy
## 5 0 1
##
## family: gaussian
## link: identity
##
## -> call coef(fit) for coefficients
For the response variables we use the affinity matrix \(\mathbf{G}_Y\), which represents biological relationships between the responses. We promote sparsity by setting \(\lambda = 5\) and put a weak (default) penalty on similar coefficients with \(\psi_{gy} = 1\). Other than that we use the default parameters.
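To see the effect of the sparsity penalty one could, for instance, count the coefficients that have been shrunk close to zero (the threshold 1e-2 is arbitrary):
# number of coefficients shrunk (almost) to zero by the lasso penalty
sum(abs(coef(fit)) < 1e-2)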
The fit object contains information about coefficients and intercepts. Having estimated the coefficients, we can predict new data sets:
X.new <- matrix(rnorm(10 * ncol(X)), 10)
pred <- predict(fit, X.new)
pred[1:10, 1:5]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 4.2008971 4.2013672 4.2009115 4.2011951 4.2009904
## [2,] -0.2441918 -0.2437913 -0.2440459 -0.2438810 -0.2441129
## [3,] -0.9478844 -0.9474961 -0.9477214 -0.9475729 -0.9478075
## [4,] 0.6630633 0.6634801 0.6631809 0.6633710 0.6631440
## [5,] 2.7012859 2.7017320 2.7013431 2.7015877 2.7013746
## [6,] -3.1042542 -3.1038992 -3.1040256 -3.1039374 -3.1041863
## [7,] -0.7581015 -0.7577092 -0.7579414 -0.7577913 -0.7580262
## [8,] 8.5471919 8.5477266 8.5470778 8.5474799 8.5473005
## [9,] -5.8865323 -5.8862183 -5.8862246 -5.8862092 -5.8864744
## [10,] 1.3180825 1.3185045 1.3181819 1.3183872 1.3181655
Brem, Rachel B, John D Storey, Jacqueline Whittle, and Leonid Kruglyak. 2005. “Genetic Interactions Between Polymorphisms That Affect Gene Expression in Yeast.” Nature 436 (7051): 701.
Cheng, Wei, Xiang Zhang, Zhishan Guo, Yu Shi, and Wei Wang. 2014. “Graph-Regularized Dual Lasso for Robust eQTL Mapping.” Bioinformatics 30 (12): i139–i148.
Chung, Fan R. K. 1997. Spectral Graph Theory. Vol. 92. American Mathematical Society.
Meier, Lukas, Sara Van De Geer, and Peter Bühlmann. 2008. “The Group Lasso for Logistic Regression.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70 (1): 53–71.
Powell, Michael J. D. 2009. “The BOBYQA Algorithm for Bound Constrained Optimization Without Derivatives.” Cambridge NA Report NA2009/06, University of Cambridge, Cambridge.
Storey, John D, Joshua M Akey, and Leonid Kruglyak. 2005. “Multiple Locus Linkage Analysis of Genomewide Expression in Yeast.” PLoS Biology 3 (8): e267.
Yuan, Ming, and Yi Lin. 2006. “Model Selection and Estimation in Regression with Grouped Variables.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68 (1): 49–67.