affiXcanTrain {AffiXcan} | R Documentation |
Train the model needed to impute a GReX for each gene
affiXcanTrain( exprMatrix, assay, tbaPaths, regionAssoc, cov = NULL, varExplained = 80, scale = TRUE, BPPARAM = bpparam(), kfold = 1 )
exprMatrix |
A SummarizedExperiment object containing expression data |
assay |
A string with the name of the object in SummarizedExperiment::assays(exprMatrix) that contains expression values |
tbaPaths |
A vector of strings, which are the paths to MultiAssayExperiment RDS files containing the tba values |
regionAssoc |
A data.frame with the association between regulatory regions and expressed genes and with colnames = c("REGULATORY_REGION", "EXPRESSED_REGION") |
cov |
Optional. A data.frame with covariate values for the population structure, where the columns are the PCs and the rows are the individual IIDs. Default is NULL
varExplained |
An integer between 0 and 100; varExplained=80 means that the principal components selected to fit the models must explain at least 80 percent of variation of TBA values; default is 80 |
scale |
A logical; if scale=FALSE the TBA values will be only centered, not scaled before performing PCA; default is TRUE |
BPPARAM |
A BiocParallelParam object. Default is bpparam(). For details on BiocParallelParam virtual base class see browseVignettes("BiocParallel") |
kfold |
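An integer. The k of the k-fold cross-validation performed on the training dataset. Default is 1. This argument controls the behavior of the function: if kfold<2 the models are trained on the whole dataset and returned; if kfold>=2 a k-fold cross-validation is performed and per-fold performance metrics are returned |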
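The interplay between varExplained and scale can be illustrated with base R. The following is a minimal sketch, not AffiXcan's internal code: selectPCs is a hypothetical helper, the data are simulated, and the assumption is that components are kept in order until their cumulative variance reaches the requested percentage. The element names simply mirror those used for the pca output on this page.

```r
# Simulated stand-in for a TBA matrix: 20 individuals x 6 features.
set.seed(1)
tba <- matrix(rnorm(20 * 6), nrow = 20,
              dimnames = list(paste0("IID", 1:20), paste0("feat", 1:6)))

# Hypothetical helper (not part of AffiXcan): keep the smallest number of
# leading principal components whose cumulative variance reaches
# `varExplained` percent of the total.
selectPCs <- function(x, varExplained = 80, scale = TRUE) {
    # Values are always centered; `scale` toggles unit-variance scaling.
    pca <- prcomp(x, center = TRUE, scale. = scale)
    eigenvalues <- pca$sdev^2
    cumPercent <- 100 * cumsum(eigenvalues) / sum(eigenvalues)
    # Fall back to all components if rounding keeps the sum just below target.
    n <- min(which(cumPercent >= varExplained), length(eigenvalues))
    keep <- seq_len(n)
    list(eigenvectors = pca$rotation[, keep, drop = FALSE],
         pcs = pca$x[, keep, drop = FALSE],
         eigenvalues = eigenvalues[keep])
}

sel <- selectPCs(tba, varExplained = 80, scale = TRUE)
ncol(sel$pcs)  # number of components retained
```

Lowering varExplained retains fewer components, and scale=FALSE lets features with larger variance dominate the leading components.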
The output depends on the parameter kfold
If kfold<2: a list containing three objects: pca, bs, regionsCount
pca: A list containing lists named as the MultiAssayExperiment::experiments() found in the MultiAssayExperiment objects listed in the param tbaPaths. Each of these lists contains three objects:
eigenvectors: A matrix containing eigenvectors for those principal components of the TBA selected according to the param varExplained
pcs: A matrix containing the principal components values of the TBA selected according to the param varExplained
eigenvalues: A vector containing eigenvalues for those principal components of the TBA selected according to the param varExplained
bs: A list containing lists named as the EXPRESSED_REGIONS found in the param regionAssoc that have a corresponding row name in the expression values stored in SummarizedExperiment::assays(exprMatrix)$assay. Each of the lists in bs contains four objects:
coefficients: The coefficients of the principal components used in the model, analogous to the "coefficients" in the results of lm()
p.val: The uncorrected ANOVA p-value of the model
r.sq: The coefficient of determination between the real total expression values and the imputed GReX, retrieved from summary(model)$r.squared
corrected.p.val: The p-value of the model, corrected for multiple testing with the Benjamini-Hochberg procedure
regionsCount: An integer, that is the number of genomic regions taken into account during the training phase
If kfold>=2: a list containing kfold objects, named from 1 to kfold, one for each cross-validation [i]; each of these objects is a list containing lists named as the expressed gene IDs [y] (i.e. the rownames() of the object in SummarizedExperiment::assays(exprMatrix) containing the expression values) for which a GReX could be imputed. Each of these inner lists contains six objects:
rho: the Pearson's correlation coefficient (R) between the real expression values and the imputed GReX for the cross-validation i on the expressed gene y, computed with cor()
rho.sq: the coefficient of determination (R^2) between the real expression values and the imputed GReX for the cross-validation i on the expressed gene y, computed as rho^2
cor.test.p.val: the p-value of the cor.test() between the real expression values and the imputed GReX for the cross-validation i on the expressed gene y
model.p.val: the uncorrected ANOVA p-value of the model
model.corrected.p.val: the p-value of the model, corrected for multiple testing with the Benjamini-Hochberg procedure
model.r.sq: the model's coefficient of determination (R^2) on the training data
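To make the cross-validation output more concrete, the sketch below reproduces the same six per-gene quantities on simulated data using only base R. It is an illustration under stated assumptions, not AffiXcan's implementation: the predictors PC1 and PC2, the simulated expression matrix, the random fold assignment, the use of the overall F-test as the model p-value, and the application of the Benjamini-Hochberg correction across genes within each fold are all choices made for this example.

```r
set.seed(42)
nInd <- 60; nGenes <- 3; k <- 3

# Simulated predictors (stand-ins for selected PCs) and expression values.
pcs <- data.frame(PC1 = rnorm(nInd), PC2 = rnorm(nInd))
expr <- sapply(seq_len(nGenes), function(g) 0.8 * pcs$PC1 + rnorm(nInd))
colnames(expr) <- paste0("gene", seq_len(nGenes))

# Assumption: individuals are assigned to the k folds at random.
fold <- sample(rep(seq_len(k), length.out = nInd))

cv <- lapply(seq_len(k), function(i) {
    train <- fold != i
    perGene <- lapply(colnames(expr), function(gene) {
        d <- cbind(pcs, y = expr[, gene])
        model <- lm(y ~ PC1 + PC2, data = d[train, ])
        s <- summary(model)
        f <- s$fstatistic
        # Impute expression on the held-out fold and compare with observed.
        imputed <- predict(model, newdata = d[!train, ])
        observed <- d$y[!train]
        rho <- cor(observed, imputed)
        list(rho = rho,
             rho.sq = rho^2,
             cor.test.p.val = cor.test(observed, imputed)$p.value,
             # Overall F-test p-value, used here as the model p-value.
             model.p.val = unname(pf(f["value"], f["numdf"], f["dendf"],
                                     lower.tail = FALSE)),
             model.r.sq = s$r.squared)
    })
    names(perGene) <- colnames(expr)
    # Benjamini-Hochberg correction across the genes of this fold.
    adj <- p.adjust(sapply(perGene, `[[`, "model.p.val"), method = "BH")
    for (gene in names(perGene)) {
        perGene[[gene]]$model.corrected.p.val <- adj[[gene]]
    }
    perGene
})
names(cv) <- seq_len(k)

cv[["1"]]$gene1$rho.sq  # R^2 on the held-out fold 1 for gene1
```

Note that rho and rho.sq are measured on held-out individuals, while model.r.sq describes the fit on the training individuals, so the two can differ substantially for overfitted models.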
if (interactive()) {
    trainingTbaPaths <- system.file("extdata", "training.tba.toydata.rds",
                                    package="AffiXcan")

    data(exprMatrix)
    data(regionAssoc)
    data(trainingCovariates)

    assay <- "values"

    training <- affiXcanTrain(exprMatrix=exprMatrix, assay=assay,
                              tbaPaths=trainingTbaPaths,
                              regionAssoc=regionAssoc,
                              cov=trainingCovariates, varExplained=80,
                              scale=TRUE, kfold=3)
}