read_vcfs_as_granges {MutationalPatterns} | R Documentation |
This function reads Variant Call Format (VCF) files, each into its own GRanges object, and combines them into a GRangesList. In addition to loading the files, it applies the seqlevel style of the reference genome passed in the 'genome' parameter to the GRanges objects.
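As an illustration of what "applying a seqlevel style" means, the following minimal sketch renames the chromosomes of a small hand-made GRanges object. The toy object `gr` is an assumption made for the example, and `seqlevelsStyle()` from GenomeInfoDb is used here only to show the concept; it is not necessarily the exact code path taken inside read_vcfs_as_granges.

```r
library(GenomicRanges)
library(GenomeInfoDb)

## Toy ranges using chromosome names without a "chr" prefix
## (hypothetical data, not taken from the package).
gr <- GRanges(
  seqnames = c("1", "2", "X"),
  ranges = IRanges(start = c(100, 200, 300), width = 1)
)

## Switch to the UCSC naming style, as used by
## BSgenome.Hsapiens.UCSC.hg19.
seqlevelsStyle(gr) <- "UCSC"

seqlevels(gr) # "chr1" "chr2" "chrX"
```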
read_vcfs_as_granges(
  vcf_files,
  sample_names,
  genome,
  group = c("auto+sex", "auto", "sex", "circular", "all", "none"),
  type = c("snv", "indel", "dbs", "mbs", "all"),
  change_seqnames = TRUE
)
vcf_files |
Character vector of VCF file names |
sample_names |
Character vector of sample names |
genome |
A string matching the name of a BSgenome library corresponding to the reference genome of your VCFs |
group |
Selector for a seqlevel group. All seqlevels outside of this group will be removed. Possible values:
* 'all' for all chromosomes;
* 'auto' for autosomal chromosomes;
* 'sex' for sex chromosomes;
* 'auto+sex' for autosomal + sex chromosomes (default);
* 'circular' for circular chromosomes;
* 'none' for no filtering, which results in keeping all seqlevels from the VCF file. |
type |
The mutation type that will be loaded. All other variants will be filtered out. Possible values:
* 'snv'
* 'indel'
* 'dbs'
* 'mbs'
* 'all'
This function assumes that dbs and mbs variants are present in the VCF as SNVs, which are positioned next to each other. If your dbs/mbs variants are called separately you should use type = 'all' to prevent incorrect filtering. In those cases SNVs could be selected per sample by something like: 'gr[width(gr) == 1]' |
change_seqnames |
Boolean. Whether to change the seqlevel style of the VCF to that of the BSgenome object. (default = TRUE) |
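To make the effect of the 'group' argument concrete, here is a minimal sketch of keeping only autosomal seqlevels on a hand-made GRanges object. The toy object `gr` and the variable names are assumptions for illustration; the sketch uses `extractSeqlevelsByGroup()` and `keepSeqlevels()` from GenomeInfoDb and is not claimed to be the function's actual implementation.

```r
library(GenomicRanges)
library(GenomeInfoDb)

## Toy ranges on an autosome, a sex chromosome and the
## mitochondrial genome (hypothetical data).
gr <- GRanges(
  seqnames = c("chr1", "chrX", "chrM"),
  ranges = IRanges(start = c(100, 200, 300), width = 1)
)

## Look up the autosomal seqlevels for human in UCSC style.
auto_levels <- extractSeqlevelsByGroup(
  species = "Homo_sapiens",
  style = "UCSC",
  group = "auto"
)

## Keep only the seqlevels that are present in both sets and drop
## the ranges located on any other seqlevel.
keep <- intersect(seqlevels(gr), auto_levels)
gr_auto <- keepSeqlevels(gr, keep, pruning.mode = "coarse")

seqlevels(gr_auto)
```

Replacing group = "auto" with "sex" or "circular" in the lookup would select the corresponding seqlevels instead.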
A GRangesList containing the GRanges obtained from 'vcf_files'
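Because the return value is a GRangesList, per-sample filtering can be applied to each element in turn. The sketch below shows the width-based SNV selection mentioned for type = 'all' on a small hand-made GRangesList; the objects `gr` and `grl` are assumptions standing in for real output of read_vcfs_as_granges.

```r
library(GenomicRanges)

## Toy variants: two single-base ranges and one two-base range
## (hypothetical data).
gr <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges = IRanges(start = c(100, 200, 300), width = c(1, 2, 1))
)

## Stand-in for the GRangesList returned by read_vcfs_as_granges.
grl <- GRangesList(sample1 = gr, sample2 = gr[1:2])

## Keep only the ranges of width 1 in every sample.
snv_grl <- endoapply(grl, function(gr) gr[width(gr) == 1])

## Number of remaining ranges per sample.
lengths(snv_grl)
```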
## The example data set consists of three colon samples, three intestine
## samples and three liver samples. So, to map each file to its appropriate
## sample name, we create a vector containing the sample names:
sample_names <- c(
  "colon1", "colon2", "colon3",
  "intestine1", "intestine2", "intestine3",
  "liver1", "liver2", "liver3"
)

## We assemble a list of files we want to load. These files match the
## sample names defined above.
vcf_files <- list.files(
  system.file("extdata", package = "MutationalPatterns"),
  pattern = "sample.vcf", full.names = TRUE
)

## Get a reference genome BSgenome object.
ref_genome <- "BSgenome.Hsapiens.UCSC.hg19"
library("BSgenome")
library(ref_genome, character.only = TRUE)

## This function loads the files as GRanges objects.
## For backwards compatibility reasons it only loads SNVs by default.
vcfs <- read_vcfs_as_granges(vcf_files, sample_names, ref_genome)

## To load all variant types use:
vcfs <- read_vcfs_as_granges(vcf_files, sample_names, ref_genome, type = "all")

## Loading only indels can be done like this.

## Select data containing indels.
vcf_fnames <- list.files(
  system.file("extdata", package = "MutationalPatterns"),
  pattern = "blood.*vcf", full.names = TRUE
)
sample_names <- c("AC", "ACC55", "BCH")

## Read data and select only the indels.
## Other mutation types can be read in the same way.
read_vcfs_as_granges(vcf_fnames, sample_names, ref_genome, type = "indel")