multisplit {hierGWAS} | R Documentation |
Performs repeated variable selection via the lasso on random sample splits.
multisplit(x, y, covar = NULL, B = 50)
x | The SNP data matrix, of size |
y | The response vector. It can be continuous or discrete. |
covar | NULL or the matrix of covariates one wishes to control for, of size |
B | The number of random splits. Default value is 50. |
The samples are divided at random into two subsamples of approximately equal size. The first subsample is used for variable selection, which is implemented using glmnet. The first [nobs/6] variables to enter the lasso path are selected. The procedure is repeated B times.
If one or more covariates are specified, these will be added unpenalized to the regression.
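The steps above can be sketched for a single split as follows. This is a minimal illustration and not necessarily how hierGWAS implements it: the helper one_split and its return names out.sample and selected are hypothetical, [nobs/2] and [nobs/6] are assumed to mean the integer part, and a continuous response is assumed (a discrete response would need a suitable family argument in glmnet). Covariates are left unpenalized through glmnet's penalty.factor argument.

```r
library(glmnet)

# Hypothetical helper, not part of hierGWAS: one random split plus lasso screening.
one_split <- function(x, y, covar = NULL) {
  nobs <- nrow(x)
  # Randomly assign about half of the samples to the selection subsample.
  sel.idx <- sample(nobs, floor(nobs / 2))
  out.idx <- setdiff(seq_len(nobs), sel.idx)

  # Covariates, if any, get penalty factor 0 so that they are never shrunk.
  ncov <- if (is.null(covar)) 0 else ncol(covar)
  design <- cbind(covar, x)[sel.idx, , drop = FALSE]
  fit <- glmnet(design, y[sel.idx],
                penalty.factor = c(rep(0, ncov), rep(1, ncol(x))))

  # For each SNP, index of the first lambda at which its coefficient is nonzero
  # (NA if the SNP never enters the path).
  nonzero <- as.matrix(fit$beta)[ncov + seq_len(ncol(x)), , drop = FALSE] != 0
  entry <- apply(nonzero, 1, function(r) which(r)[1])

  # Keep the first nobs/6 SNPs to enter the path (ties broken by column order).
  first <- head(order(entry, na.last = NA), floor(nobs / 6))
  list(out.sample = out.idx, selected = first)
}

# Repeat the split B = 5 times on simulated data.
set.seed(1)
x <- matrix(rnorm(60 * 200), 60, 200)
y <- 3 * x[, 5] + rnorm(60)
splits <- replicate(5, one_split(x, y), simplify = FALSE)
```

Each element of splits holds the indices of the held-out second subsample and the column indices of the selected SNPs for one split.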
A data frame with 2 components: a matrix of size B x [nobs/2] containing the second subsample of each split, and a matrix of size B x [nobs/6] containing the selected variables in each split.
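As a quick check of these dimensions, the arithmetic can be worked through for 60 samples and the default B = 50. The variable names below are illustrative only, and [.] is assumed to denote the integer part.

```r
nobs <- 60
B <- 50
dim.out <- c(B, nobs %/% 2)  # second-subsample matrix: 50 x 30
dim.sel <- c(B, nobs %/% 6)  # selected-variable matrix: 50 x 10
```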
Meinshausen, N., Meier, L. and Bühlmann, P. (2009), P-values for high-dimensional regression, Journal of the American Statistical Association 104, 1671-1681.
library(MASS)
x <- mvrnorm(60, mu = rep(0, 200), Sigma = diag(200))
beta <- rep(1, 200)
beta[c(5, 9, 3)] <- 3
y <- x %*% beta + rnorm(60)
res.multisplit <- multisplit(x, y)