seqGLMM_GxG_spa {SAIGEgds}    R Documentation

SNP Interaction Testing

Description

SNP interaction testing using the saddlepoint approximation (SPA) method in the mixed-model framework.

Usage

seqGLMM_GxG_spa(formula, data, gds_grm, gds_assoc, snp_pair,
    trait.type=c("binary", "quantitative"), sample.col="sample.id", maf=0.005,
    missing.rate=0.01, max.num.snp=1000000L, variant.id=NULL, inv.norm=TRUE,
    X.transform=TRUE, tol=0.02, maxiter=20L, nrun=30L, tolPCG=1e-5,
    maxiterPCG=500L, tau.init=c(0,0), use_approx_tau=FALSE, glm_threshold=FALSE,
    traceCVcutoff=0.0025, ratioCVcutoff=0.001, geno.sparse=TRUE, num.thread=1L,
    model.savefn="", seed=200L, fork.loading=FALSE, verbose=TRUE,
    verbose.detail=TRUE)

Arguments

formula

an object of class formula (or one that can be coerced to that class), e.g., y ~ x1 + x2, see lm

data

a data frame containing the variables in formula

gds_grm

a SeqArray GDS filename, or a GDS object, used for constructing the genetic relationship matrix (GRM)

gds_assoc

a SeqArray GDS filename, a GDS object, or a 0/1/2/NA genotype matrix with row names for sample IDs, used for the interaction tests

snp_pair

a data.frame whose first two columns give the variant IDs (in gds_assoc) of each SNP pair to be tested

trait.type

"binary" for binary outcomes, "quantitative" for continuous outcomes

sample.col

the name of the column in data that contains the sample IDs corresponding to the GDS file

maf

minor allele frequency threshold for imported genotypes (a variant is kept if its MAF >= maf), if variant.id=NULL; NaN for no filter

missing.rate

missing-rate threshold for imported genotypes (a variant is kept if its missing rate <= missing.rate), if variant.id=NULL; NaN for no filter

max.num.snp

the maximum number of SNPs used, or -1 for no limit

variant.id

a list of variant IDs used to construct the GRM

inv.norm

if TRUE, perform inverse normal transformation on residuals for quantitative outcomes, see the reference [Sofer, 2019]

X.transform

if TRUE, perform QR decomposition on the design matrix

tol

overall tolerance for model fitting

maxiter

the maximum number of iterations for model fitting

nrun

the number of random vectors in the trace estimation

tolPCG

tolerance for the preconditioned conjugate gradient (PCG) iterations

maxiterPCG

the maximum number of PCG iterations

tau.init

a 2-length numeric vector, the initial values for the variance components, tau; for binary traits, the first element is always set to 1. If tau.init is not specified, the second element is set to 0.5 for binary traits

use_approx_tau

if TRUE, fit the model defined in formula without any SNP markers for the interactions to obtain the estimated tau (variance component estimates)

glm_threshold

FALSE, TRUE, or a numeric p-value threshold; if TRUE, 0.01 is used as the threshold

traceCVcutoff

the threshold for the coefficient of variation (CV) of the trace estimator; the number of runs for trace estimation is increased until the CV is below the threshold

ratioCVcutoff

the threshold for the coefficient of variation (CV) of the variance ratio estimate; the number of randomly selected markers is increased until the CV is below the threshold

geno.sparse

if TRUE, store the sparse structure for genotypes; otherwise, save genotypes in a 2-bit dense matrix; see details

num.thread

the number of threads

model.savefn

the filename for saving the model output: an R data file ('.rda', '.RData' or '.rds'), or a text file ('.txt' or '.csv')

seed

an integer as a seed for random numbers

fork.loading

load genotypes via forking or not; forking processes in Unix can reduce loading time of genotypes, but may double the memory usage; not applicable on Windows

verbose

if TRUE, show information

verbose.detail

if TRUE, show the details for model fitting
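The maf and missing.rate filters described above can be illustrated in base R on a small 0/1/2/NA genotype matrix (the same coding that gds_assoc accepts). This is a sketch that assumes the usual definitions: MAF as min(p, 1 - p), with p the allele frequency among non-missing calls, and the missing rate as the fraction of NA calls. The toy matrix and the helper filter_variants() are hypothetical and not part of SAIGEgds.

```r
# Hypothetical 0/1/2/NA genotype matrix: rows are samples (row names are
# sample IDs), columns are variants, values count one allele per sample
geno <- cbind(
    v1 = c(0, 1, 2, 0, 1, 0),
    v2 = c(0, 0, 0, 0, 0, 0),
    v3 = c(1, NA, 0, 0, 1, 2),
    v4 = c(NA, NA, 0, 1, 0, 0))
rownames(geno) <- paste0("s", 1:6)

# Allele frequency among non-missing calls, folded to the minor allele
af <- colMeans(geno, na.rm=TRUE) / 2
maf_vals <- pmin(af, 1 - af)
# Fraction of missing (NA) calls per variant
mr <- colMeans(is.na(geno))

# Hypothetical helper mirroring the documented checks:
# keep MAF >= maf and missing rate <= missing.rate; NaN disables a filter
filter_variants <- function(maf_vals, mr, maf=0.005, missing.rate=0.01) {
    keep <- rep(TRUE, length(maf_vals))
    if (!is.nan(maf)) keep <- keep & (maf_vals >= maf)
    if (!is.nan(missing.rate)) keep <- keep & (mr <= missing.rate)
    names(maf_vals)[keep]
}

filter_variants(maf_vals, mr)                    # "v1"
filter_variants(maf_vals, mr, missing.rate=NaN)  # "v1" "v3" "v4"
```

With the default thresholds only v1 passes: v2 is monomorphic (MAF 0), and v3 and v4 exceed the 1% missing rate. Setting missing.rate=NaN drops only the missing-rate check.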

Details

For more details of the SAIGE algorithm, please refer to the SAIGE paper [Zhou et al. 2018] (see the References section).
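The nrun and traceCVcutoff arguments control a randomized trace estimate used during model fitting. The following base-R sketch shows the general idea under stated assumptions: a Hutchinson-type estimator with Rademacher (+1/-1) random vectors, a stopping rule that adds nrun more vectors until the CV falls below the cutoff, and the CV taken as the standard error of the mean estimate divided by the mean. The function hutchinson_trace(), the max_runs cap and the test matrix A are hypothetical. SAIGE's own implementation works with its matrices implicitly (e.g., through PCG solves) and may differ in these details.

```r
# Hutchinson's randomized trace estimator: E[u' A u] = tr(A) when the
# entries of u are independent +1/-1 (Rademacher) variables
hutchinson_trace <- function(A, nrun=30L, cv_cutoff=0.0025, max_runs=3000L) {
    n <- nrow(A)
    est <- numeric(0)
    repeat {
        # add a batch of nrun random vectors
        for (i in seq_len(nrun)) {
            u <- sample(c(-1, 1), n, replace=TRUE)
            est <- c(est, drop(crossprod(u, A %*% u)))
        }
        # assumed CV definition: standard error of the mean over the mean
        cv <- sd(est) / sqrt(length(est)) / abs(mean(est))
        if (cv < cv_cutoff || length(est) >= max_runs) break
    }
    list(trace=mean(est), nrun=length(est), cv=cv)
}

set.seed(200)
n <- 50
X <- matrix(rnorm(n * n), n)
A <- crossprod(X) / n + diag(n)  # symmetric positive-definite test matrix
res <- hutchinson_trace(A)
c(estimate=res$trace, exact=sum(diag(A)), runs=res$nrun)
```

A smaller cutoff needs more random vectors: since the standard error shrinks with the square root of the number of vectors, halving the CV roughly quadruples the number of runs.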

Value

Returns a data.frame with the following components:

id1

variant ID for the first SNP in the GDS file;

snp1

includes chromosome, position, reference & alternative alleles for SNP1;

maf1

minor allele frequency for the first SNP;

id2

variant ID for the second SNP in the GDS file;

snp2

includes chromosome, position, reference & alternative alleles for SNP2;

maf2

minor allele frequency for the second SNP;

beta

beta coefficient, odds ratio if binary outcomes;

SE

standard error for beta coefficient;

n_nonzero

the number of non-zero values in the interaction term;

pval

p-value adjusted with the saddlepoint approximation (SPA) method;

p.norm

p-value based on asymptotic normality (could be 0 if it is too small, e.g., pnorm(-50) = 0 in R); used for checking only;

converged

whether the SPA algorithm converges or not for adjusted p-values.

p.glm

glm p-value with SPA calculation

p.glm.norm

glm p-value without SPA calculation
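The note on p.norm reflects double-precision underflow in R: a normal tail probability this small cannot be represented and is returned as exactly 0. The short base-R illustration below assumes a two-sided p-value from a z statistic; working on the log scale with log.p=TRUE is a standard way to keep such values finite and is shown here only for illustration, not as something SAIGEgds reports.

```r
# Two-sided p-value from a z statistic under asymptotic normality
z <- 50
p_two_sided <- 2 * pnorm(-abs(z))
p_two_sided == 0               # TRUE: the tail probability underflows to 0

# On the log scale the same quantity stays finite
log_p <- log(2) + pnorm(-abs(z), log.p=TRUE)
log10_p <- log_p / log(10)     # roughly -544.7, i.e. p is about 10^-544.7
```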

Author(s)

Xiuwen Zheng

References

Zhou W, Nielsen JB, Fritsche LG, Dey R, Gabrielsen ME, Wolford BN, LeFaive J, VandeHaar P, Gagliano SA, Gifford A, Bastarache LA, Wei WQ, Denny JC, Lin M, Hveem K, Kang HM, Abecasis GR, Willer CJ, Lee S. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet (2018). Sep;50(9):1335-1341.

See Also

seqFitNullGLMM_SPA, seqAssocGLMM_SPA

Examples

library(SeqArray)
library(SAIGEgds)

# open the GDS file for the genetic relationship matrix (GRM)
grm_fn <- system.file("extdata", "grm1k_10k_snp.gds", package="SAIGEgds")
(grm_gds <- seqOpen(grm_fn))

# load phenotype
phenofn <- system.file("extdata", "pheno.txt.gz", package="SAIGEgds")
pheno <- read.table(phenofn, header=TRUE, as.is=TRUE)
head(pheno)

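# define the SNP pairs (the first two columns are variant IDs in gds_assoc)
snp_pair <- data.frame(s1=2:3, s2=6:7, note=c("F1", "F2"))

# test the SNP-SNP interactions
seqGLMM_GxG_spa(y ~ x1 + x2, pheno, grm_gds, grm_fn, snp_pair,
    trait.type="binary", verbose.detail=FALSE)

# close the GDS file
seqClose(grm_gds)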
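When every pairwise combination among a set of candidate variants is of interest, snp_pair can be built with base R's combn(). This is a sketch: the candidate IDs below are hypothetical, and in practice they would be variant IDs present in gds_assoc (for a GDS file, for example obtained with SeqArray's seqGetData(gds, "variant.id")).

```r
# Hypothetical candidate variant IDs (they must exist in gds_assoc)
cand <- c(2L, 3L, 6L, 7L)

# All unordered pairs: combn() returns a 2 x choose(k, 2) matrix
pr <- combn(cand, 2)
snp_pair <- data.frame(s1=pr[1, ], s2=pr[2, ])
nrow(snp_pair)   # 6 pairs
```

The number of pairs grows as k(k - 1)/2 for k candidates, so restricting cand to a modest candidate set keeps the number of interaction tests manageable.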

[Package SAIGEgds version 1.8.1 Index]