seqGLMM_GxG_spa {SAIGEgds} | R Documentation |
SNP interaction (GxG) testing with the saddlepoint approximation (SPA) method in the mixed-model framework.
seqGLMM_GxG_spa(formula, data, gds_grm, gds_assoc, snp_pair,
    trait.type=c("binary", "quantitative"), sample.col="sample.id", maf=0.005,
    missing.rate=0.01, max.num.snp=1000000L, variant.id=NULL, inv.norm=TRUE,
    X.transform=TRUE, tol=0.02, maxiter=20L, nrun=30L, tolPCG=1e-5,
    maxiterPCG=500L, tau.init=c(0,0), use_approx_tau=FALSE, glm_threshold=FALSE,
    traceCVcutoff=0.0025, ratioCVcutoff=0.001, geno.sparse=TRUE, num.thread=1L,
    model.savefn="", seed=200L, fork.loading=FALSE, verbose=TRUE,
    verbose.detail=TRUE)
formula |
an object of class |
data |
a data frame for the formulas |
gds_grm |
a SeqArray GDS filename, or a GDS object |
gds_assoc |
a SeqArray GDS filename, a GDS object, or a 0/1/2/NA matrix with row names for sample IDs |
snp_pair |
a |
trait.type |
"binary" for binary outcomes, "quantitative" for continuous outcomes |
sample.col |
the column name of sample IDs corresponding to the GDS file |
maf |
minor allele frequency threshold for imported genotypes (checking >= maf),
if |
missing.rate |
threshold of missing rate (checking <= missing.rate),
if |
max.num.snp |
the maximum number of SNPs used, or -1 for no limit |
variant.id |
a list of variant IDs, used to construct the GRM |
inv.norm |
if |
X.transform |
if |
tol |
overall tolerance for model fitting |
maxiter |
the maximum number of iterations for model fitting |
nrun |
the number of random vectors in the trace estimation |
tolPCG |
tolerance of PCG iterations |
maxiterPCG |
the maximum number of PCG iterations |
tau.init |
a 2-length numeric vector, the initial values for variance
components, tau; for binary traits, the first element is always set
to 1. if |
use_approx_tau |
if |
glm_threshold |
FALSE, TRUE or a numeric value for p-value threshold; if TRUE use 0.01 as a threshold |
traceCVcutoff |
the threshold for coefficient of variation (CV) for the trace estimator, and the number of runs for trace estimation will be increased until the CV is below the threshold |
ratioCVcutoff |
the threshold for coefficient of variation (CV) for estimating the variance ratio, and the number of randomly selected markers will be increased until the CV is below the threshold |
geno.sparse |
if |
num.thread |
the number of threads |
model.savefn |
the filename of model output, R data file '.rda', '.RData', '.rds', '.txt' or '.csv' |
seed |
an integer as a seed for random numbers |
fork.loading |
load genotypes via forking or not; forking processes in Unix can reduce loading time of genotypes, but may double the memory usage; not applicable on Windows |
verbose |
if |
verbose.detail |
if |
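The three accepted forms of glm_threshold (FALSE, TRUE, or a number) can be read as a single numeric cutoff. The following is a minimal sketch of that mapping only; resolve_glm_threshold is a hypothetical helper, not a SAIGEgds function, and how the package applies the cutoff internally is not shown here.

```r
# Hypothetical helper (not part of SAIGEgds): map the documented
# glm_threshold values to a numeric p-value cutoff, or NA when disabled.
resolve_glm_threshold <- function(glm_threshold) {
    if (is.logical(glm_threshold)) {
        stopifnot(length(glm_threshold) == 1L, !is.na(glm_threshold))
        # TRUE selects the documented default of 0.01; FALSE disables it
        if (glm_threshold) 0.01 else NA_real_
    } else if (is.numeric(glm_threshold)) {
        stopifnot(length(glm_threshold) == 1L, is.finite(glm_threshold),
            glm_threshold > 0, glm_threshold <= 1)
        as.numeric(glm_threshold)
    } else {
        stop("glm_threshold should be FALSE, TRUE or a numeric value.")
    }
}

resolve_glm_threshold(FALSE)   # NA: no threshold
resolve_glm_threshold(TRUE)    # 0.01
resolve_glm_threshold(1e-4)    # 1e-04
```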
For more details of the SAIGE algorithm, please refer to the SAIGE paper [Zhou et al. 2018] (see the reference section).
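The roles of nrun and traceCVcutoff can be illustrated with a stochastic trace estimator that keeps adding random vectors until its coefficient of variation falls below a cutoff. This is a sketch under stated assumptions, not the SAIGEgds implementation: a small dense matrix A stands in for the operator whose trace is needed, Rademacher vectors are used, the CV is taken as the standard error of the mean divided by the absolute mean, and estimate_trace, step and max_run are hypothetical names.

```r
# Sketch of a stochastic (Hutchinson) trace estimator with a CV stopping rule.
# Assumptions: `A` is a small dense symmetric matrix; the CV is defined as
# sd / (sqrt(number of runs) * |mean|). Neither comes from the SAIGEgds source.
estimate_trace <- function(A, nrun=30L, cv_cutoff=0.0025, step=10L,
    max_run=100000L, seed=200L)
{
    set.seed(seed)
    n <- nrow(A)
    draw <- function(k) {
        # k Rademacher vectors u; each u' A u is an unbiased estimate of tr(A)
        vapply(seq_len(k), function(i) {
            u <- sample(c(-1, 1), n, replace=TRUE)
            sum(u * (A %*% u))
        }, numeric(1))
    }
    est <- draw(nrun)
    repeat {
        cv <- sd(est) / (sqrt(length(est)) * abs(mean(est)))
        if (isTRUE(cv <= cv_cutoff) || length(est) >= max_run) break
        est <- c(est, draw(step))   # add more random vectors and re-check
    }
    list(trace=mean(est), cv=cv, nrun=length(est))
}

# toy example: a random symmetric positive definite matrix
set.seed(1)
B <- matrix(rnorm(50 * 50), 50, 50)
A <- crossprod(B) / 50 + diag(50)
res <- estimate_trace(A)
c(estimate=res$trace, exact=sum(diag(A)), cv=res$cv, nrun=res$nrun)
```

A tighter cutoff requires more random vectors, so the run time of the trace step grows as traceCVcutoff shrinks.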
Returns a data.frame with the following components:
id1 |
variant ID for the first SNP in the GDS file; |
snp1 |
includes chromosome, position, reference & alternative alleles for SNP1; |
maf1 |
minor allele frequency for the first SNP; |
id2 |
variant ID for the second SNP in the GDS file; |
snp2 |
includes chromosome, position, reference & alternative alleles for SNP2; |
maf2 |
minor allele frequency for the second SNP; |
beta |
beta coefficient for the interaction term, on the log odds ratio scale for binary outcomes; |
SE |
standard error for beta coefficient; |
n_nonzero |
the number of non-zero values in the interaction term; |
pval |
adjusted p-value with the Saddlepoint approximation method; |
p.norm |
p-values based on asymptotic normality (could be 0 if it
is too small, e.g., |
converged |
whether the SPA algorithm converges or not for adjusted p-values. |
p.glm |
glm p-value with SPA calculation |
p.glm.norm |
glm p-value without SPA calculation |
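The note that p.norm can be reported as 0 reflects floating-point underflow of a very small normal tail probability. A minimal sketch, assuming a two-sided test on a standard normal statistic z (the value of z is illustrative, not output of seqGLMM_GxG_spa), shows the underflow and how the same p-value can still be expressed on the log10 scale with base R:

```r
# Illustrative value only (not output of seqGLMM_GxG_spa)
z <- 40

# two-sided p-value under asymptotic normality; underflows to 0 in double
p_norm <- 2 * pnorm(-abs(z))

# the same p-value on the log10 scale, computed without underflow
log10_p <- (log(2) + pnorm(-abs(z), log.p=TRUE)) / log(10)

c(p_norm=p_norm, log10_p=log10_p)
```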
Xiuwen Zheng
Zhou W, Nielsen JB, Fritsche LG, Dey R, Gabrielsen ME, Wolford BN, LeFaive J, VandeHaar P, Gagliano SA, Gifford A, Bastarache LA, Wei WQ, Denny JC, Lin M, Hveem K, Kang HM, Abecasis GR, Willer CJ, Lee S. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet. 2018 Sep;50(9):1335-1341.
seqFitNullGLMM_SPA, seqAssocGLMM_SPA
library(SeqArray)
library(SAIGEgds)

# open the GDS file for genetic relationship matrix (GRM)
grm_fn <- system.file("extdata", "grm1k_10k_snp.gds", package="SAIGEgds")
(grm_gds <- seqOpen(grm_fn))

# load phenotype
phenofn <- system.file("extdata", "pheno.txt.gz", package="SAIGEgds")
pheno <- read.table(phenofn, header=TRUE, as.is=TRUE)
head(pheno)

# define the SNP pairs: one row per pair (s1, s2), plus an optional note
snp_pair <- data.frame(s1=2:3, s2=6:7, note=c("F1", "F2"))

# test the interactions; the same GDS file serves as GRM and association input
seqGLMM_GxG_spa(y ~ x1 + x2, pheno, grm_gds, grm_fn, snp_pair,
    trait.type="binary", verbose.detail=FALSE)

# close the GDS file
seqClose(grm_gds)