TCGAbiolinks provides functions to download mutation data from GDC. There are two options:
- GDCquery_Maf, which downloads MAF files aligned against hg38
- GDCquery, GDCdownload and GDCprepare, which download MAF files aligned against hg19

This example downloads MAF (Mutation Annotation Format) files for the variant calling pipeline muse. Pipeline options are: muse, varscan2, somaticsniper, mutect. For more information, see the GDC docs.
library(TCGAbiolinks)
library(DT)  # provides datatable()

acc.maf <- GDCquery_Maf("ACC", pipelines = "muse")
# Only first 50 to make render faster
datatable(acc.maf[1:50,],
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
          rownames = FALSE)
Hugo_Symbol | Entrez_Gene_Id | Center | NCBI_Build | Chromosome | Start_Position | End_Position | Strand | Variant_Classification | Variant_Type | Reference_Allele | Tumor_Seq_Allele1 | Tumor_Seq_Allele2 | dbSNP_RS | dbSNP_Val_Status | Tumor_Sample_Barcode | Matched_Norm_Sample_Barcode | Match_Norm_Seq_Allele1 | Match_Norm_Seq_Allele2 | Tumor_Validation_Allele1 | Tumor_Validation_Allele2 | Match_Norm_Validation_Allele1 | Match_Norm_Validation_Allele2 | Verification_Status | Validation_Status | Mutation_Status | Sequencing_Phase | Sequence_Source | Validation_Method | Score | BAM_File | Sequencer | Tumor_Sample_UUID | Matched_Norm_Sample_UUID | HGVSc | HGVSp | HGVSp_Short | Transcript_ID | Exon_Number | t_depth | t_ref_count | t_alt_count | n_depth | n_ref_count | n_alt_count | all_effects | Allele | Gene | Feature | Feature_type | Consequence | cDNA_position | CDS_position | Protein_position | Amino_acids | Codons | Existing_variation | ALLELE_NUM | DISTANCE | TRANSCRIPT_STRAND | SYMBOL | SYMBOL_SOURCE | HGNC_ID | BIOTYPE | CANONICAL | CCDS | ENSP | SWISSPROT | TREMBL | UNIPARC | RefSeq | SIFT | PolyPhen | EXON | INTRON | DOMAINS | GMAF | AFR_MAF | AMR_MAF | ASN_MAF | EAS_MAF | EUR_MAF | SAS_MAF | AA_MAF | EA_MAF | CLIN_SIG | SOMATIC | PUBMED | MOTIF_NAME | MOTIF_POS | HIGH_INF_POS | MOTIF_SCORE_CHANGE | IMPACT | PICK | VARIANT_CLASS | TSL | HGVS_OFFSET | PHENO | MINIMISED | ExAC_AF | ExAC_AF_AFR | ExAC_AF_AMR | ExAC_AF_EAS | ExAC_AF_FIN | ExAC_AF_NFE | ExAC_AF_OTH | ExAC_AF_SAS | GENE_PHENO | FILTER | src_vcf_id | tumor_bam_uuid | normal_bam_uuid | GDC_Validation_Status | GDC_Valid_Somatic |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AJAP1 | 55966 | BCM | GRCh38 | chr1 | 4712275 | 4712275 | + | Silent | SNP | G | G | A | novel | TCGA-OR-A5J9-01A-11D-A29I-10 | TCGA-OR-A5J9-10A-01D-A29L-10 | Somatic | Illumina HiSeq 2000 | 0e14fd74-7eaa-4401-8daa-ec9e467b4299 | e0439659-3c97-4d71-a90e-4be39c757a3a | c.405G>A | p.= | p.S135S | ENST00000378190 | 2/6 | 21 | 10 | 11 | 12 | AJAP1,synonymous_variant,p.=,ENST00000378191,NM_018836.3;AJAP1,synonymous_variant,p.=,ENST00000378190,NM_001042478.1;AJAP1,downstream_gene_variant,,ENST00000466761,; | A | ENSG00000196581 | ENST00000378190 | Transcript | synonymous_variant | 1099/2383 | 405/1236 | 135/411 | S | tcG/tcA | 1 | 1 | AJAP1 | HGNC | HGNC:30801 | protein_coding | CCDS54.1 | ENSP00000367432 | Q9UKB5 | UPI00000728B8 | NM_001042478.1 | 2/6 | Low_complexity_(Seg):Seg,PROSITE_profiles:PS50324 | LOW | SNV | 5 | 1 | PASS | 89a935ed-1590-4344-893f-9a4616b9a672 | 43b324bd-6fb8-4bda-9436-1b33293af4b2 | 0dc84605-b379-4457-8b2c-de63ec780317 | Unknown | False | |||||||||||||||||||||||||||||||||||||||||||||||||||
ADH5P2 | 343296 | BCM | GRCh38 | chr1 | 79521985 | 79521985 | + | RNA | SNP | A | A | T | novel | TCGA-OR-A5J9-01A-11D-A29I-10 | TCGA-OR-A5J9-10A-01D-A29L-10 | Somatic | Illumina HiSeq 2000 | 0e14fd74-7eaa-4401-8daa-ec9e467b4299 | e0439659-3c97-4d71-a90e-4be39c757a3a | n.906A>T | ENST00000425922 | 1/1 | 149 | 85 | 63 | 187 | ADH5P2,non_coding_transcript_exon_variant,,ENST00000425922,; | T | ENSG00000232676 | ENST00000425922 | Transcript | non_coding_transcript_exon_variant | 906/1124 | 1 | 1 | ADH5P2 | HGNC | HGNC:22976 | processed_pseudogene | YES | 1/1 | MODIFIER | 1 | SNV | 1 | PASS | 89a935ed-1590-4344-893f-9a4616b9a672 | 43b324bd-6fb8-4bda-9436-1b33293af4b2 | 0dc84605-b379-4457-8b2c-de63ec780317 | Unknown | False | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
FAM91A3P | 729182 | BCM | GRCh38 | chr1 | 143768678 | 143768678 | + | RNA | SNP | T | T | C | novel | TCGA-OR-A5J9-01A-11D-A29I-10 | TCGA-OR-A5J9-10A-01D-A29L-10 | Somatic | Illumina HiSeq 2000 | 0e14fd74-7eaa-4401-8daa-ec9e467b4299 | e0439659-3c97-4d71-a90e-4be39c757a3a | n.2139T>C | ENST00000456826 | 1/1 | 49 | 32 | 17 | 55 | FAM91A3P,non_coding_transcript_exon_variant,,ENST00000456826,; | C | ENSG00000242352 | ENST00000456826 | Transcript | non_coding_transcript_exon_variant | 2139/2544 | 1 | 1 | FAM91A3P | HGNC | HGNC:32273 | transcribed_processed_pseudogene | YES | 1/1 | MODIFIER | 1 | SNV | 1 | PASS | 89a935ed-1590-4344-893f-9a4616b9a672 | 43b324bd-6fb8-4bda-9436-1b33293af4b2 | 0dc84605-b379-4457-8b2c-de63ec780317 | Unknown | False | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
PAPPA2 | 60676 | BCM | GRCh38 | chr1 | 176556468 | 176556468 | + | Missense_Mutation | SNP | G | G | A | novel | TCGA-OR-A5J9-01A-11D-A29I-10 | TCGA-OR-A5J9-10A-01D-A29L-10 | Somatic | Illumina HiSeq 2000 | 0e14fd74-7eaa-4401-8daa-ec9e467b4299 | e0439659-3c97-4d71-a90e-4be39c757a3a | c.146G>A | p.Arg49His | p.R49H | ENST00000367662 | 2/23 | 195 | 127 | 67 | 246 | PAPPA2,missense_variant,p.Arg49His,ENST00000367662,NM_020318.2;PAPPA2,missense_variant,p.Arg49His,ENST00000367661,NM_021936.2;PAPPA2,downstream_gene_variant,,ENST00000486075,;PAPPA2,downstream_gene_variant,,ENST00000493665,; | A | ENSG00000116183 | ENST00000367662 | Transcript | missense_variant | 1310/9691 | 146/5376 | 49/1791 | R/H | cGt/cAt | 1 | 1 | PAPPA2 | HGNC | HGNC:14615 | protein_coding | YES | CCDS41438.1 | ENSP00000356634 | Q9BXP8 | UPI000004A835 | NM_020318.2 | tolerated_low_confidence(0.75) | benign(0.001) | 2/23 | MODERATE | 1 | SNV | 1 | 1 | PASS | 89a935ed-1590-4344-893f-9a4616b9a672 | 43b324bd-6fb8-4bda-9436-1b33293af4b2 | 0dc84605-b379-4457-8b2c-de63ec780317 | Unknown | False | ||||||||||||||||||||||||||||||||||||||||||||||||
ABCA12 | 26154 | BCM | GRCh38 | chr2 | 215049632 | 215049632 | + | Missense_Mutation | SNP | G | G | C | novel | TCGA-OR-A5J9-01A-11D-A29I-10 | TCGA-OR-A5J9-10A-01D-A29L-10 | Somatic | Illumina HiSeq 2000 | 0e14fd74-7eaa-4401-8daa-ec9e467b4299 | e0439659-3c97-4d71-a90e-4be39c757a3a | c.687C>G | p.Phe229Leu | p.F229L | ENST00000272895 | 6/53 | 81 | 52 | 29 | 127 | ABCA12,missense_variant,p.Phe229Leu,ENST00000272895,NM_173076.2;AC072062.3,intron_variant,,ENST00000628464,;AC072062.3,intron_variant,,ENST00000626134,;AC072062.3,intron_variant,,ENST00000626771,; | C | ENSG00000144452 | ENST00000272895 | Transcript | missense_variant | 907/9100 | 687/7788 | 229/2595 | F/L | ttC/ttG | 1 | -1 | ABCA12 | HGNC | HGNC:14637 | protein_coding | YES | CCDS33372.1 | ENSP00000272895 | Q86UK0 | UPI000019AB7A | NM_173076.2 | tolerated(1) | benign(0.001) | 6/53 | Coiled-coils_(Ncoils):ncoils | MODERATE | 1 | SNV | 1 | 1 | PASS | 89a935ed-1590-4344-893f-9a4616b9a672 | 43b324bd-6fb8-4bda-9436-1b33293af4b2 | 0dc84605-b379-4457-8b2c-de63ec780317 | Unknown | False |
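Once the MAF is loaded as a data frame, ordinary data frame operations apply to it. The following is a minimal sketch, not part of TCGAbiolinks: it rebuilds a few columns of the five rows shown above by hand (the name `maf_preview` is only illustrative), so that it runs without a GDC download, then counts variants per classification and keeps the missense calls.

```r
# Toy MAF built by hand from the five rows displayed above
# (only a subset of the columns; the object name is illustrative).
maf_preview <- data.frame(
  Hugo_Symbol = c("AJAP1", "ADH5P2", "FAM91A3P", "PAPPA2", "ABCA12"),
  Chromosome = c("chr1", "chr1", "chr1", "chr1", "chr2"),
  Start_Position = c(4712275, 79521985, 143768678, 176556468, 215049632),
  Variant_Classification = c("Silent", "RNA", "RNA",
                             "Missense_Mutation", "Missense_Mutation"),
  stringsAsFactors = FALSE
)

# Count variants per classification.
class_counts <- table(maf_preview$Variant_Classification)
print(class_counts)

# Keep only the missense mutations.
missense <- subset(maf_preview, Variant_Classification == "Missense_Mutation")
print(missense$Hugo_Symbol)
```

The same two calls should work on a full MAF data frame such as `acc.maf`, assuming it exposes the `Variant_Classification` and `Hugo_Symbol` columns listed in the table header.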
This example downloads MAF (Mutation Annotation Format) files aligned against hg19 (the old TCGA MAF files).
library(dplyr)  # provides select() and contains()

query.maf.hg19 <- GDCquery(project = "TCGA-CHOL",
                           data.category = "Simple nucleotide variation",
                           data.type = "Simple somatic mutation",
                           access = "open",
                           legacy = TRUE)
# Check which MAF files are available
datatable(select(getResults(query.maf.hg19), -contains("cases")),
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 10),
          rownames = FALSE)
data_type | updated_datetime | file_name | md5sum | data_format | access | platform | state | state_comment | file_id | data_category | file_size | submitter_id | type | tags | experimental_strategy | created_datetime | project | code | center_name | center_short_name | center_center_id | center_namespace | center_center_type | tissue.definition |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Simple somatic mutation | 2016-09-07T14:16:45.297865-05:00 | bcgsc.ca_CHOL.IlluminaHiSeq_DNASeq.1.somatic.maf | 47268aa46006c53013466f740a3e1462 | MAF | open | Illumina HiSeq | live |  | 0d2e60c5-dd32-4a19-b600-7e76496f4f94 | Simple nucleotide variation | 1012118 |  | file | snv,somatic | DNA-Seq |  | TCGA-CHOL | 34 | Canada's Michael Smith Genome Sciences Centre | BCGSC | 380301b3-6f8d-581d-a81f-f4dd462df12b | bcgsc.ca | GSC | Blood Derived Normal |
Simple somatic mutation | 2016-09-07T11:23:40.788470-05:00 | hgsc.bcm.edu_CHOL.IlluminaGA_DNASeq.1.somatic.maf | ee6d4a3810593268b8038dfb13999ddd | MAF | open | Illumina GA | live |  | 448661d5-a89a-480e-adfd-1cce8eb74e70 | Simple nucleotide variation | 2052077 |  | file | snv,somatic | DNA-Seq |  | TCGA-CHOL | 10 | Baylor College of Medicine | BCM | d3b8c887-498b-5490-903e-760403c68307 | hgsc.bcm.edu | GSC | Blood Derived Normal |
Simple somatic mutation | 2016-09-07T13:30:03.078686-05:00 | ucsc.edu_CHOL.IlluminaGA_DNASeq_automated.Level_2.1.0.0.somatic.maf | b44be3f2e6a994be766cc881a3143b2b | MAF | open | Illumina GA | live |  | e45ec3d9-adcc-43db-a71c-9edaf7d11c86 | Simple nucleotide variation | 785408 |  | file | snv,somatic | DNA-Seq |  | TCGA-CHOL | 25 | University of California, Santa Cruz | UCSC | 79cc1498-5d7f-5eae-b631-e74b78c13581 | ucsc.edu | GSC | Solid Tissue Normal |
Simple somatic mutation | 2016-09-07T15:14:14.461260-05:00 | gsc_CHOL_pairs.aggregated.capture.tcga.uuid.automated.somatic.maf | 9288f4c155d47f4cc090eee3312e09c2 | MAF | open | Illumina GA | live |  | 2d9ed46f-36a5-4f87-9304-74ce626ae96d | Simple nucleotide variation | 4274149 |  | file | snv,somatic | DNA-Seq | 2016-06-13T17:02:09.527369-05:00 | TCGA-CHOL | 08 | Broad Institute of MIT and Harvard | BI | 61d634b8-e8dd-58bf-9a65-1233dc7c8c6a | broad.mit.edu | GSC | Solid Tissue Normal |
Simple somatic mutation | 2016-09-07T15:17:51.530211-05:00 | hgsc.bcm.edu_CHOL.IlluminaGA_DNASeq.1.somatic.maf | 8db4269d8aba6d8d397e2761e24e8e6e | MAF | open | Mixed platforms | live |  | a8532d87-1eae-4289-8aea-3255d7b313cf | Simple nucleotide variation | 2482745 |  | file | snv,somatic | DNA-Seq |  | TCGA-CHOL | 10 | Baylor College of Medicine | BCM | d3b8c887-498b-5490-903e-760403c68307 | hgsc.bcm.edu | GSC | Blood Derived Normal |
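The results table lists several MAF files per project, so one of them has to be chosen before downloading. As a rough sketch of that selection step, the code below copies the `file_name` and `center_short_name` values from the table above into a small data frame and looks up the file submitted by one center. The helper `files_for_center` and the object `results_preview` are hypothetical names introduced for illustration; they are not TCGAbiolinks functions.

```r
# Subset of the query results shown above (two columns only, typed in by hand).
results_preview <- data.frame(
  file_name = c(
    "bcgsc.ca_CHOL.IlluminaHiSeq_DNASeq.1.somatic.maf",
    "hgsc.bcm.edu_CHOL.IlluminaGA_DNASeq.1.somatic.maf",
    "ucsc.edu_CHOL.IlluminaGA_DNASeq_automated.Level_2.1.0.0.somatic.maf",
    "gsc_CHOL_pairs.aggregated.capture.tcga.uuid.automated.somatic.maf",
    "hgsc.bcm.edu_CHOL.IlluminaGA_DNASeq.1.somatic.maf"
  ),
  center_short_name = c("BCGSC", "BCM", "UCSC", "BI", "BCM"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: distinct file names submitted by one center.
files_for_center <- function(results, center) {
  unique(results$file_name[results$center_short_name == center])
}

selected_file <- files_for_center(results_preview, "BCGSC")
print(selected_file)
```

A file name chosen this way can be passed to the `file.type` argument of `GDCquery`. Note that the BCM file name appears twice in the table with different file IDs, which is why the helper returns distinct names.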
query.maf.hg19 <- GDCquery(project = "TCGA-CHOL",
                           data.category = "Simple nucleotide variation",
                           data.type = "Simple somatic mutation",
                           access = "open",
                           file.type = "bcgsc.ca_CHOL.IlluminaHiSeq_DNASeq.1.somatic.maf",
                           legacy = TRUE)
GDCdownload(query.maf.hg19)
maf <- GDCprepare(query.maf.hg19)
# Only first 50 to make render faster
datatable(maf[1:50,],
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
          rownames = FALSE)
Hugo_Symbol | Entrez_Gene_Id | Center | NCBI_Build | Chromosome | Start_Position | End_Position | Strand | Variant_Classification | Variant_Type | Reference_Allele | Tumor_Seq_Allele1 | Tumor_Seq_Allele2 | dbSNP_RS | dbSNP_Val_Status | Tumor_Sample_Barcode | Matched_Norm_Sample_Barcode | Match_Norm_Seq_Allele1 | Match_Norm_Seq_Allele2 | Tumor_Validation_Allele1 | Tumor_Validation_Allele2 | Match_Norm_Validation_Allele1 | Match_Norm_Validation_Allele2 | Verification_Status | Validation_Status | Mutation_Status | Sequencing_Phase | Sequence_Source | Validation_Method | Score | BAM_File | Sequencer | Tumor_Sample_UUID | Matched_Norm_Sample_UUID |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TIE1 | 7075 | bcgsc.ca | hg19 | 1 | 43777350 | 43777350 | + | Missense_Mutation | SNP | G | A | A | rs56302794 | byFrequency | TCGA-3X-AAV9-01A-72D-A417-09 | TCGA-3X-AAV9-10A-01D-A41A-09 | G | - | - | - | - | - | Unknown | Untested | Somatic | Phase_1 | WXS | none |  |  | Illumina HiSeq | dd621b60-9752-48be-967c-43ee49990150 | 0c81b1b7-c24e-4b40-a844-bf702966e4a1 |
DOCK7 | 85440 | bcgsc.ca | hg19 | 1 | 63018485 | 63018485 | + | Missense_Mutation | SNP | C | T | T | novel |  | TCGA-3X-AAV9-01A-72D-A417-09 | TCGA-3X-AAV9-10A-01D-A41A-09 | C | - | - | - | - | - | Unknown | Untested | Somatic | Phase_1 | WXS | none |  |  | Illumina HiSeq | dd621b60-9752-48be-967c-43ee49990150 | 0c81b1b7-c24e-4b40-a844-bf702966e4a1 |
LRRC71 | 149499 | bcgsc.ca | hg19 | 1 | 156901798 | 156901798 | + | Missense_Mutation | SNP | C | T | T | novel |  | TCGA-3X-AAV9-01A-72D-A417-09 | TCGA-3X-AAV9-10A-01D-A41A-09 | C | - | - | - | - | - | Unknown | Untested | Somatic | Phase_1 | WXS | none |  |  | Illumina HiSeq | dd621b60-9752-48be-967c-43ee49990150 | 0c81b1b7-c24e-4b40-a844-bf702966e4a1 |
PTPN14 | 5784 | bcgsc.ca | hg19 | 1 | 214557314 | 214557314 | + | Silent | SNP | G | A | A | novel |  | TCGA-3X-AAV9-01A-72D-A417-09 | TCGA-3X-AAV9-10A-01D-A41A-09 | G | - | - | - | - | - | Unknown | Untested | Somatic | Phase_1 | WXS | none |  |  | Illumina HiSeq | dd621b60-9752-48be-967c-43ee49990150 | 0c81b1b7-c24e-4b40-a844-bf702966e4a1 |
OR2T12 | 127064 | bcgsc.ca | hg19 | 1 | 248458309 | 248458309 | + | Missense_Mutation | SNP | G | A | A | rs138674715 |  | TCGA-3X-AAV9-01A-72D-A417-09 | TCGA-3X-AAV9-10A-01D-A41A-09 | G | - | - | - | - | - | Unknown | Untested | Somatic | Phase_1 | WXS | none |  |  | Illumina HiSeq | dd621b60-9752-48be-967c-43ee49990150 | 0c81b1b7-c24e-4b40-a844-bf702966e4a1 |
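The Tumor_Sample_Barcode and Matched_Norm_Sample_Barcode columns encode the patient and the sample type. As a small sketch (the helper `parse_barcode` is a hypothetical name, not a TCGAbiolinks function), the code below splits the two barcodes from the table above using the standard TCGA barcode layout: the first 12 characters identify the patient, and characters 14 and 15 hold the sample type code, where 01 denotes a primary solid tumor and 10 a blood derived normal.

```r
# Barcodes copied from the table above.
tumor_barcode <- "TCGA-3X-AAV9-01A-72D-A417-09"
normal_barcode <- "TCGA-3X-AAV9-10A-01D-A41A-09"

# Hypothetical helper: split TCGA barcodes into patient ID and sample type code.
parse_barcode <- function(barcode) {
  data.frame(
    barcode = barcode,
    patient = substr(barcode, 1, 12),            # e.g. TCGA-3X-AAV9
    sample_type_code = substr(barcode, 14, 15),  # 01 = primary tumor, 10 = blood normal
    stringsAsFactors = FALSE
  )
}

parsed <- parse_barcode(c(tumor_barcode, normal_barcode))
print(parsed)
```

Applied to a whole MAF column, for example `parse_barcode(maf$Tumor_Sample_Barcode)`, this gives one row per mutation call, which can then be grouped by patient.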