format_sumstats {MungeSumstats} | R Documentation |
Check that summary statistics from GWAS are in a homogeneous format
format_sumstats(
  path,
  ref_genome = "GRCh37",
  convert_small_p = TRUE,
  convert_n_int = TRUE,
  analysis_trait = NULL,
  INFO_filter = 0.9,
  N_std = 5,
  rmv_chr = c("X", "Y", "MT"),
  on_ref_genome = TRUE,
  strand_ambig_filter = FALSE,
  allele_flip_check = TRUE,
  bi_allelic_filter = TRUE
)
path |
Filepath for the summary statistics file to be formatted |
ref_genome |
Name of the reference genome used for the GWAS (GRCh37 or GRCh38). Default is GRCh37. |
convert_small_p |
Binary, should p-values < 5e-324 be converted to 0? Values this small fall below the smallest number R can represent as a double; they can cause errors in LDSC/MAGMA and should be converted. Default is TRUE. |
convert_n_int |
Binary, if N (the number of samples) is not an integer, should this be rounded? Default is TRUE. |
analysis_trait |
If the GWAS studied multiple traits, the name of the trait to analyse. Default is NULL. |
INFO_filter |
Numeric, the minimum permissible value of the imputation information (INFO) score, if present in the sumstats file. Default is 0.9. |
N_std |
Numeric, the number of standard deviations above the mean that a SNP's N must exceed for the SNP to be removed. Default is 5. |
rmv_chr |
Character vector, the chromosomes whose SNPs should be removed. Use NULL if no filtering is necessary. Default is X, Y and mitochondrial (MT). |
on_ref_genome |
Binary, should SNPs be checked, by SNP ID, to confirm that they are all on the reference genome? Default is TRUE. |
strand_ambig_filter |
Binary, should SNPs with strand-ambiguous alleles be removed? Default is FALSE. |
allele_flip_check |
Binary, should the allele columns be checked against the reference genome to infer whether flipping is necessary? Default is TRUE. |
bi_allelic_filter |
Binary, should non-biallelic SNPs be removed? Default is TRUE. |
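The sketch below illustrates, in base R only, what some of the arguments above describe. It is not the package's implementation. Part 1 shows the floating-point limit behind convert_small_p. Part 2 applies the INFO_filter, N_std and rmv_chr rules, as worded above, to a toy data frame. The column names (SNP, CHR, INFO, N) and the toy values are assumptions made for illustration.

```r
# --- Part 1: the double-precision limit behind convert_small_p ---------------
tiny <- 5e-324                        # roughly the smallest positive double
stopifnot(tiny > 0)                   # still representable
stopifnot(tiny / 10 == 0)             # anything much smaller underflows to 0
stopifnot(as.numeric("1e-400") == 0)  # the same happens when parsing text

# --- Part 2: toy versions of the row filters ---------------------------------
# Hypothetical summary statistics; column names are assumptions.
sumstats <- data.frame(
  SNP  = paste0("rs", 1:40),
  CHR  = c(rep("1", 38), "X", "MT"),
  INFO = c(0.5, rep(0.95, 39)),        # row 1 has a low imputation score
  N    = c(1000, 1e6, rep(1000, 38)),  # row 2 has an outlying sample size
  stringsAsFactors = FALSE
)

INFO_filter <- 0.9
N_std       <- 5
rmv_chr     <- c("X", "Y", "MT")

# N cutoff: mean plus N_std standard deviations, computed on the full table
n_cutoff <- mean(sumstats$N) + N_std * sd(sumstats$N)

keep <- sumstats$INFO >= INFO_filter &   # INFO score at or above the minimum
  sumstats$N <= n_cutoff &               # N not an extreme outlier
  !(sumstats$CHR %in% rmv_chr)           # chromosome not in the removal list

filtered <- sumstats[keep, ]
nrow(filtered)
```

In this toy table, rows 1 (low INFO), 2 (outlying N), 39 (X) and 40 (MT) are dropped. The order in which format_sumstats applies its own checks is not specified here, so the sketch combines all three conditions in a single mask.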
The file path of the modified sumstats file
# Get the path to the Educational Attainment (Okbay) sumstats file
# shipped with the package
eduAttainOkbayPth <- system.file("extdata", "eduAttainOkbay.txt",
                                 package = "MungeSumstats")

# Pass the path to format_sumstats
## By default the call uses the reference genome, which needs more than 2GB
## of memory. That is more than 32-bit Windows can handle, so certain checks
## are removed on that platform.
is_32bit_windows <- .Platform$OS.type == "windows" && .Platform$r_arch == "i386"
if (!is_32bit_windows) {
  reformatted <- MungeSumstats::format_sumstats(eduAttainOkbayPth,
                                                ref_genome = "GRCh37")
} else {
  reformatted <- MungeSumstats::format_sumstats(eduAttainOkbayPth,
                                                ref_genome = "GRCh37",
                                                on_ref_genome = FALSE,
                                                strand_ambig_filter = FALSE,
                                                bi_allelic_filter = FALSE,
                                                allele_flip_check = FALSE)
}
# The returned location holds the updated summary statistics file
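Because the function returns a file path rather than a data frame, a typical next step is to read the file back in. The sketch below shows one way to do that in base R. So that it runs without the package or the reference genome, it writes a small stand-in file first. The tab-delimited layout and the column names (SNP, CHR, P) are assumptions; check the actual output file before relying on them.

```r
# Stand-in for the path returned by format_sumstats(); in real use,
# `reformatted` would come from the call in the example above.
reformatted <- tempfile(fileext = ".tsv")
toy <- data.frame(SNP = c("rs1", "rs2"),
                  CHR = c(1, 2),
                  P   = c(0.01, 0.5))
write.table(toy, reformatted, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the formatted summary statistics back from the returned path
# (assumes a tab-delimited file with a header row)
sumstats <- read.delim(reformatted, stringsAsFactors = FALSE)
head(sumstats)
```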