InterMine is a powerful open source data warehouse system integrating diverse biological data sets (e.g. genomic, expression and protein data) for various organisms. Integrating data makes it possible to run sophisticated data mining queries that span domains of biological knowledge. A selected list of databases powered by InterMine is shown in Table 1:
Database | Organism | Data |
---|---|---|
FlyMine | Drosophila | Genes, homology, proteins, interactions, gene ontology, expression, regulation, phenotypes, pathways, diseases, resources, publications |
HumanMine | H. sapiens | Genomics, SNPs, GWAS, proteins, gene ontology, pathways, gene expression, interactions, publications, disease, orthologues, alleles |
MouseMine | M. musculus | Genomics, proteins, gene ontology, expression, interactions, pathways, phenotypes, diseases, homology, publications |
RatMine | R. norvegicus | Disease, gene ontology, genomics, interactions, phenotype, pathway, proteins, publication QTL, SNP |
WormMine | C. elegans | Genes, alleles, homology, go annotation, phenotypes, strains |
YeastMine | S. cerevisiae | Genomics, proteins, gene ontology, comparative genomics, phenotypes, interactions, literature, pathways, gene expression |
ZebrafishMine | D. rerio | Genes, constructs, disease, gene ontology, genotypes, homology, morpholinos, phenotypes |
TargetMine | H. sapiens, M. musculus | Genes, protein structures, chemical compounds, protein domains, gene function, pathways, interactions, disease, drug targets |
MitoMiner | H. sapiens, M. musculus, R. norvegicus, D. rerio, S. cerevisiae, S. pombe | Genes, homology, localisation evidence, Mitochondrial reference gene lists, phenotypes, diseases, expression, interactions, pathways, exome |
IndigoMine | Archae | Genomics |
ThaleMine | A. thaliana | Genomics, proteins, domains, homology, gene ontology, interactions, expression, publications, pathways, GeneRIF, stocks, phenotypes, alleles, insertions, TAIR |
MedicMine | Medicago truncatula | Genomics, pathways, gene ontology, publications, proteins, homology |
PhytoMine | over 50 plant genomes | Genes, proteins, expression, transcripts, homology |
Please see the InterMine home page for a full list of available InterMines.
InterMine includes an attractive, user-friendly web interface that works ‘out of the box’ and a powerful, scriptable web-service API to allow programmatic access to your data. This R package provides an interface with the InterMine-powered databases through Web services.
Let’s start with a simple task - find the homologues of gene ABO.
First, we look at what databases are available.
library(InterMineR)
listMines()
## BeanMine
## "https://mines.legumeinfo.org/beanmine"
## BovineMine
## "http://bovinegenome.org/bovinemine"
## CHOmine
## "https://chomine.boku.ac.at/chomine"
## ChickpeaMine
## "https://mines.legumeinfo.org/chickpeamine"
## CowpeaMine
## "https://mines.legumeinfo.org/cowpeamine"
## FlyMine
## "http://www.flymine.org/flymine"
## GrapeMine
## "http://urgi.versailles.inra.fr/GrapeMine"
## HumanMine
## "http://www.humanmine.org/humanmine"
## HymenopteraMine
## "http://hymenopteragenome.org/hymenopteramine"
## IndigoMine
## "http://www.cbrc.kaust.edu.sa/indigo"
## LegumeMine
## "https://intermine.legumefederation.org/legumemine"
## MedicMine
## "http://medicmine.jcvi.org/medicmine"
## MitoMiner
## "http://mitominer.mrc-mbu.cam.ac.uk/release-4.0"
## ModMine
## "http://intermine.modencode.org/release-33"
## MouseMine
## "http://www.mousemine.org/mousemine"
## PeanutMine
## "https://mines.legumeinfo.org/peanutmine"
## PhytoMine
## "https://phytozome.jgi.doe.gov/phytomine/"
## PlanMine
## "http://planmine.mpi-cbg.de/planmine"
## RatMine
## "http://ratmine.mcw.edu/ratmine"
## RepetDB
## "http://urgi.versailles.inra.fr/repetdb"
## Shaare
## "http://www.shaare.org.uk/release-1.0"
## SoyMine
## "https://mines.legumeinfo.org/soymine"
## TargetMine
## "http://targetmine.mizuguchilab.org/targetmine"
## ThaleMine
## "https://apps.araport.org/thalemine"
## Wheat3BMine
## "http://urgi.versailles.inra.fr/Wheat3BMine"
## WormMine
## "http://intermine.wormbase.org"
## XenMine
## "http://www.xenmine.org/xenmine"
## YeastMine
## "http://yeastmine.yeastgenome.org/yeastmine"
## ZebrafishMine
## "http://zebrafishmine.org"
Since we would like to query human genes, we select HumanMine.
# load HumaMine
im <- initInterMine(mine=listMines()["HumanMine"])
im
## $mine
## HumanMine
## "http://www.humanmine.org/humanmine"
##
## $token
## [1] ""
In InterMine you are able to build custom queries, but using R you are only allowed to run pre-built queries – called templates. Templates are queries that have already been created with a fixed set of output columns and one or more constraints.
# Get template (collection of pre-defined queries)
template = getTemplates(im)
head(template)
## name
## 1 Gene_Alleles_Disease
## 2 Gene_Identifiers
## 3 PathwayGenes
## 4 Gene_Location
## 5 GeneExpress
## 6 Gene_particularGoannotation
## title
## 1 Gene --> Alleles and Disease
## 2 Gene --> All identifiers.
## 3 Pathway --> Genes
## 4 Gene --> Chromosomal location.
## 5 Gene --> Gene Expression (Tissue, Disease; Array Express, E-MTAB-62)
## 6 Gene + GO term --> Genes by GO term
We would like to find templates involving genes.
# Get gene-related templates
template[grep("gene", template$name, ignore.case=TRUE),]
## name
## 1 Gene_Alleles_Disease
## 2 Gene_Identifiers
## 3 PathwayGenes
## 4 Gene_Location
## 5 GeneExpress
## 6 Gene_particularGoannotation
## 8 Pathway_ProteinGene
## 10 Gene_proteinAtlasExpression
## 11 Term_inGWASoptionalGene
## 12 Gene_Expression_GTex
## 13 Gene_proteindomain
## 15 Gene_To_Publications
## 17 ChromRegion_GenesTransExon
## 18 Gene_Orth
## 19 Gene_Protein
## 20 ChromRegion_Genes
## 21 Gene_inGWAS
## 22 GeneOrthAllele
## 25 Gene_TissueExpressionIllumina
## 27 GeneInteractorsExpression
## 29 Gene_Pathway
## 31 PhenotypeGene
## 32 GenePathway_interactions2
## 33 humDisGeneOrthol
## 36 domain_protein_gene
## 37 Gene_Interactions_forReportPage
## 38 Dis_Gene
## 39 GOterm_Gene
## 41 Protein_GeneChromosomeLength
## 42 geneInteractiongene
## 43 geneGWAS_reportPg
## 44 gene_complex_details
## 46 disExprGene
## 47 Gene_GO
## 49 Gene_AllelePhen
## 50 Protein_Gene_Ortho
## 52 Gene_Interactions2
## 54 Gene_Dis
## 57 Gene_OverlapppingGenes
## title
## 1 Gene --> Alleles and Disease
## 2 Gene --> All identifiers.
## 3 Pathway --> Genes
## 4 Gene --> Chromosomal location.
## 5 Gene --> Gene Expression (Tissue, Disease; Array Express, E-MTAB-62)
## 6 Gene + GO term --> Genes by GO term
## 8 Pathway --> Protein and Gene
## 10 Gene --> Protein tissue Localisation
## 11 GWAS term --> SNP + Associated gene if available
## 12 Gene --> Tissue Expression (GTex data)
## 13 Gene --> Protein + Domains
## 15 Gene --> Publications.
## 17 Chromosomal Location --> All Genes + Transcripts + Exons
## 18 Gene --> Orthologues
## 19 Gene --> Proteins.
## 20 Region --> Genes
## 21 Gene --> GWAS hit
## 22 Gene (Hum OR Rat) --> Mouse Allele (Phenotype)
## 25 Gene --> Tissue Expression (Illumina body map)
## 27 Gene + Tissue Expression --> Interactors that are expressed in that tissue
## 29 Gene --> Pathway
## 31 Mouse Phenotype --> Mouse Genes + Orthologous genes
## 32 Gene + Pathway --> Interactions
## 33 Human Disease --> [Human +] Orthologue Gene(s)
## 36 Protein Domain --> Protein and Genes
## 37 Gene --> Physical and Genetic Interactions
## 38 Disease --> Gene(s)
## 39 GO term --> Genes
## 41 Protein --> Gene.
## 42 Gene A --> Interaction <-- Gene B
## 43 Gene Report --> GWAS hit
## 44 Gene --> protein complex
## 46 Disease Expression --> Genes
## 47 Gene --> GO terms.
## 49 Mouse Gene --> Allele [Phenotype]
## 50 Protein --> Gene and Orthologues
## 52 Gene --> Interactions
## 54 Gene --> Disease (OMIM)
## 57 Gene --> Overlapping genes.
The template Gene_Orth seems to be what we want. Let’s look at this template in more detail.
# Query for gene orthologs
queryGeneOrth = getTemplateQuery(
im = im,
name = "Gene_Orth"
)
queryGeneOrth
## $model
## name
## "genomic"
##
## $title
## [1] "Gene --> Orthologues"
##
## $description
## [1] "For a given Gene (or List of Genes) in named organism (default: Human) returns the orthologues in a different organisms. [keywords: homologue, homolog, paralogue, paralogue, ortholog]"
##
## $select
## [1] "Gene.primaryIdentifier"
## [2] "Gene.symbol"
## [3] "Gene.homologues.homologue.primaryIdentifier"
## [4] "Gene.homologues.homologue.symbol"
## [5] "Gene.homologues.homologue.organism.shortName"
##
## $name
## [1] "Gene_Orth"
##
## $comment
## [1] ""
##
## $orderBy
## $orderBy[[1]]
## Gene.symbol
## "ASC"
##
##
## $where
## $where[[1]]
## $where[[1]]$path
## [1] "Gene"
##
## $where[[1]]$op
## [1] "LOOKUP"
##
## $where[[1]]$code
## [1] "A"
##
## $where[[1]]$editable
## [1] TRUE
##
## $where[[1]]$switchable
## [1] FALSE
##
## $where[[1]]$switched
## [1] "LOCKED"
##
## $where[[1]]$value
## [1] "PPARG"
##
## $where[[1]]$extraValue
## [1] "H. sapiens"
There are three essential members in a query - SELECT, WHERE and constraintLogic.
What does ‘Gene.symbol’ mean? What is ‘Gene.homologues.homologue.symbol’?
Let’s take a look at the data model.
model <- getModel(im)
head(model)
## type child_name child_type
## 1 Allele alternate
## 2 Allele clinicalSignificance
## 3 Allele id
## 4 Allele name
## 5 Allele primaryIdentifier
## 6 Allele reference
Let’s look at the children of the Gene data type.
model[which(model$type=="Gene"),]
## type child_name child_type
## 599 Gene briefDescription
## 600 Gene cytoLocation
## 601 Gene description
## 602 Gene id
## 603 Gene length
## 604 Gene name
## 605 Gene primaryIdentifier
## 606 Gene score
## 607 Gene scoreType
## 608 Gene secondaryIdentifier
## 609 Gene symbol
## 610 Gene alleles Allele
## 611 Gene atlasExpression AtlasExpression
## 612 Gene CDSs CDS
## 613 Gene chromosome Chromosome
## 614 Gene crossReferences CrossReference
## 615 Gene dataSets DataSet
## 616 Gene diseases Disease
## 617 Gene exons Exon
## 618 Gene goAnnotation GOAnnotation
## 619 Gene flankingRegions GeneFlankingRegion
## 620 Gene homologues Homologue
## 621 Gene interactions Interaction
## 622 Gene downstreamIntergenicRegion IntergenicRegion
## 623 Gene upstreamIntergenicRegion IntergenicRegion
## 624 Gene introns Intron
## 625 Gene chromosomeLocation Location
## 626 Gene locatedFeatures Location
## 627 Gene locations Location
## 628 Gene ontologyAnnotations OntologyAnnotation
## 629 Gene organism Organism
## 630 Gene pathways Pathway
## 631 Gene probeSets ProbeSet
## 632 Gene proteins Protein
## 633 Gene proteinAtlasExpression ProteinAtlasExpression
## 634 Gene publications Publication
## 635 Gene rnaSeqResults RNASeqResult
## 636 Gene regulatoryRegions RegulatoryRegion
## 637 Gene SNPs SNP
## 638 Gene sequenceOntologyTerm SOTerm
## 639 Gene sequence Sequence
## 640 Gene childFeatures SequenceFeature
## 641 Gene overlappingFeatures SequenceFeature
## 642 Gene synonyms Synonym
## 643 Gene transcripts Transcript
## 644 Gene UTRs UTR
Gene has a field called ‘symbol’ (hence the column Gene.symbol). Gene also has a child called homologues, which is of the Homologue data type.
model[which(model$type=="Homologue"),]
## type child_name child_type
## 732 Homologue id
## 733 Homologue type
## 734 Homologue crossReferences CrossReference
## 735 Homologue dataSets DataSet
## 736 Homologue gene Gene
## 737 Homologue homologue Gene
## 738 Homologue evidence OrthologueEvidence
Homologue has a child called ‘gene’ which is of the type ‘Gene’, which we saw above has a field called ‘symbol’ (hence the column Gene.homologues.homologue.symbol).
Let’s now run our template.
resGeneOrth <- runQuery(im, queryGeneOrth)
head(resGeneOrth)
## Gene.primaryIdentifier Gene.symbol
## 1 5468 PPARG
## 2 5468 PPARG
## 3 5468 PPARG
## 4 5468 PPARG
## 5 5468 PPARG
## 6 5468 PPARG
## Gene.homologues.homologue.primaryIdentifier
## 1 10062
## 2 5465
## 3 5467
## 4 5914
## 5 5915
## 6 5916
## Gene.homologues.homologue.symbol
## 1 NR1H3
## 2 PPARA
## 3 PPARD
## 4 RARA
## 5 RARB
## 6 RARG
## Gene.homologues.homologue.organism.shortName
## 1 H. sapiens
## 2 H. sapiens
## 3 H. sapiens
## 4 H. sapiens
## 5 H. sapiens
## 6 H. sapiens
Let’s modify the query to find the orthologues of the gene ABO. We want to change the ‘value’ attribute from PPARG to ABO.
There are two ways to build a query in InterMineR.
We can either build a query as a list object with newQuery
function, and assign all input values (selection of retrieved data type, constraints, etc.) as items of that list,
rr we can build the query as an InterMineR-class
object with the functions setConstraint
, which allows us to generate a new or modify an existing list of constraints, and setQuery
, which generates the query as a InterMineR-class
object.
setConstraints
and setQuery
functions are designed to facilitate the generation of queries for InterMine instances and avoid using multiple iterative loops, especially when it is required to include multiple constraints or constraint values (e.g. genes, organisms) in your query.
# modify directly the value of the first constraint from the list query
queryGeneOrth$where[[1]][["value"]] <- "ABO"
# or modify the value of the first constraint from the list query with setConstraints
queryGeneOrth$where = setConstraints(
modifyQueryConstraints = queryGeneOrth,
m.index = 1,
values = list("ABO")
)
queryGeneOrth$where
## [[1]]
## [[1]]$path
## [1] "Gene"
##
## [[1]]$op
## [1] "LOOKUP"
##
## [[1]]$code
## [1] "A"
##
## [[1]]$editable
## [1] TRUE
##
## [[1]]$switchable
## [1] FALSE
##
## [[1]]$switched
## [1] "LOCKED"
##
## [[1]]$value
## [1] "ABO"
##
## [[1]]$extraValue
## [1] "H. sapiens"
Note the value is now equal to ‘ABO’. Let’s rerun our query with the new constraint.
resGeneOrth <- runQuery(im, queryGeneOrth)
head(resGeneOrth)
## Gene.primaryIdentifier Gene.symbol
## 1 28 ABO
## 2 28 ABO
## 3 28 ABO
## 4 28 ABO
## 5 28 ABO
## 6 28 ABO
## Gene.homologues.homologue.primaryIdentifier
## 1 127550
## 2 26301
## 3 360203
## 4 MGI:2135738
## 5 RGD:2307241
## 6 RGD:628609
## Gene.homologues.homologue.symbol
## 1 A3GALT2
## 2 GBGT1
## 3 GLT6D1
## 4 Abo
## 5 Abo
## 6 Abo3
## Gene.homologues.homologue.organism.shortName
## 1 H. sapiens
## 2 H. sapiens
## 3 H. sapiens
## 4 M. musculus
## 5 R. norvegicus
## 6 R. norvegicus
Now we are seeing orthologues for the ABO gene. Let’s add the organism to the view to make sure we are looking at the desired gene.
You can also add additional filters. Let’s exclude all homologues where organism is H. sapiens.
There are four parts of a constraint to add:
newConstraint <- list(
path=c("Gene.homologues.homologue.organism.shortName"),
op=c("!="),
value=c("H. sapiens"),
code=c("B")
)
queryGeneOrth$where[[2]] <- newConstraint
queryGeneOrth$where
## [[1]]
## [[1]]$path
## [1] "Gene"
##
## [[1]]$op
## [1] "LOOKUP"
##
## [[1]]$code
## [1] "A"
##
## [[1]]$editable
## [1] TRUE
##
## [[1]]$switchable
## [1] FALSE
##
## [[1]]$switched
## [1] "LOCKED"
##
## [[1]]$value
## [1] "ABO"
##
## [[1]]$extraValue
## [1] "H. sapiens"
##
##
## [[2]]
## [[2]]$path
## [1] "Gene.homologues.homologue.organism.shortName"
##
## [[2]]$op
## [1] "!="
##
## [[2]]$value
## [1] "H. sapiens"
##
## [[2]]$code
## [1] "B"
Our new filter has been added successfully. Rerun the query and you see you only have non-Homo sapiens orthologues.
resGeneOrth <- runQuery(im, queryGeneOrth)
resGeneOrth
## Gene.primaryIdentifier Gene.symbol
## 1 28 ABO
## 2 28 ABO
## 3 28 ABO
## 4 28 ABO
## 5 28 ABO
## 6 28 ABO
## 7 28 ABO
## 8 28 ABO
## 9 28 ABO
## 10 28 ABO
## 11 28 ABO
## Gene.homologues.homologue.primaryIdentifier
## 1 MGI:2135738
## 2 RGD:2307241
## 3 RGD:628609
## 4 ZDB-GENE-031204-4
## 5 ZDB-GENE-040426-1117
## 6 ZDB-GENE-040912-46
## 7 ZDB-GENE-060531-15
## 8 ZDB-GENE-060531-59
## 9 ZDB-GENE-060531-70
## 10 ZDB-GENE-060531-71
## 11 ZDB-GENE-081104-23
## Gene.homologues.homologue.symbol
## 1 Abo
## 2 Abo
## 3 Abo3
## 4
## 5
## 6
## 7
## 8
## 9
## 10
## 11
## Gene.homologues.homologue.organism.shortName
## 1 M. musculus
## 2 R. norvegicus
## 3 R. norvegicus
## 4 D. rerio
## 5 D. rerio
## 6 D. rerio
## 7 D. rerio
## 8 D. rerio
## 9 D. rerio
## 10 D. rerio
## 11 D. rerio
You can also add additional columns to the output. For instance, where do these homologues come from? Let’s add this information.
Let’s see what we know about homologues.
model[which(model$type=="Homologue"),]
## type child_name child_type
## 732 Homologue id
## 733 Homologue type
## 734 Homologue crossReferences CrossReference
## 735 Homologue dataSets DataSet
## 736 Homologue gene Gene
## 737 Homologue homologue Gene
## 738 Homologue evidence OrthologueEvidence
The Homologue data type has an ‘dataSets’ reference of type ‘DataSet’.
model[which(model$type=="DataSet"),]
## type child_name child_type
## 403 DataSet description
## 404 DataSet id
## 405 DataSet name
## 406 DataSet url
## 407 DataSet version
## 408 DataSet bioEntities BioEntity
## 409 DataSet dataSource DataSource
## 410 DataSet publication Publication
DataSet has a child called name. Add Gene.homologues.dataSets.name to the view. We’ll add it as the last column, we can see from above there are 5 other columns already so we’ll put it as #6:
# use setQuery function which will create an InterMineR-class query
queryGeneOrth.InterMineR = setQuery(
inheritQuery = queryGeneOrth,
select = c(queryGeneOrth$select,
"Gene.homologues.dataSets.name")
)
getSelect(queryGeneOrth.InterMineR)
## [1] "Gene.primaryIdentifier"
## [2] "Gene.symbol"
## [3] "Gene.homologues.homologue.primaryIdentifier"
## [4] "Gene.homologues.homologue.symbol"
## [5] "Gene.homologues.homologue.organism.shortName"
## [6] "Gene.homologues.dataSets.name"
#queryGeneOrth.InterMineR@select
# or assign new column directly to the existing list query
queryGeneOrth$select[[6]] <- "Gene.homologues.dataSets.name"
queryGeneOrth$select
## [1] "Gene.primaryIdentifier"
## [2] "Gene.symbol"
## [3] "Gene.homologues.homologue.primaryIdentifier"
## [4] "Gene.homologues.homologue.symbol"
## [5] "Gene.homologues.homologue.organism.shortName"
## [6] "Gene.homologues.dataSets.name"
# run queries
resGeneOrth.InterMineR <- runQuery(im, queryGeneOrth.InterMineR)
resGeneOrth <- runQuery(im, queryGeneOrth)
all(resGeneOrth == resGeneOrth.InterMineR)
## [1] TRUE
head(resGeneOrth, 3)
## Gene.primaryIdentifier Gene.symbol
## 1 28 ABO
## 2 28 ABO
## 3 28 ABO
## Gene.homologues.homologue.primaryIdentifier
## 1 MGI:2135738
## 2 RGD:2307241
## 3 RGD:628609
## Gene.homologues.homologue.symbol
## 1 Abo
## 2 Abo
## 3 Abo3
## Gene.homologues.homologue.organism.shortName Gene.homologues.dataSets.name
## 1 M. musculus Panther data set
## 2 R. norvegicus Panther data set
## 3 R. norvegicus Panther data set
NB: adding columns can result in changing the row count.
The constraintLogic, if not given, is ‘A and B’. We would now try to explicitly specify the constraintLogic. A and B corresponds to the ‘code’ for each constraint.
queryGeneOrth$constraintLogic <- "A and B"
queryGeneOrth$constraintLogic
## [1] "A and B"
Run the query again and see no change:
resGeneOrth <- runQuery(im, queryGeneOrth)
resGeneOrth
## Gene.primaryIdentifier Gene.symbol
## 1 28 ABO
## 2 28 ABO
## 3 28 ABO
## 4 28 ABO
## 5 28 ABO
## 6 28 ABO
## 7 28 ABO
## 8 28 ABO
## 9 28 ABO
## 10 28 ABO
## 11 28 ABO
## Gene.homologues.homologue.primaryIdentifier
## 1 MGI:2135738
## 2 RGD:2307241
## 3 RGD:628609
## 4 ZDB-GENE-031204-4
## 5 ZDB-GENE-040426-1117
## 6 ZDB-GENE-040912-46
## 7 ZDB-GENE-060531-15
## 8 ZDB-GENE-060531-59
## 9 ZDB-GENE-060531-70
## 10 ZDB-GENE-060531-71
## 11 ZDB-GENE-081104-23
## Gene.homologues.homologue.symbol
## 1 Abo
## 2 Abo
## 3 Abo3
## 4
## 5
## 6
## 7
## 8
## 9
## 10
## 11
## Gene.homologues.homologue.organism.shortName
## 1 M. musculus
## 2 R. norvegicus
## 3 R. norvegicus
## 4 D. rerio
## 5 D. rerio
## 6 D. rerio
## 7 D. rerio
## 8 D. rerio
## 9 D. rerio
## 10 D. rerio
## 11 D. rerio
## Gene.homologues.dataSets.name
## 1 Panther data set
## 2 Panther data set
## 3 Panther data set
## 4 Panther data set
## 5 Panther data set
## 6 Panther data set
## 7 Panther data set
## 8 Panther data set
## 9 Panther data set
## 10 Panther data set
## 11 Panther data set
Change to be ‘A or B’ and see how the results change.
- Start with the template Gene GO
queryGeneGO <- getTemplateQuery(im, "Gene_GO")
queryGeneGO
## $model
## name
## "genomic"
##
## $title
## [1] "Gene --> GO terms."
##
## $description
## [1] "Search for GO annotations for a particular gene (or List of Genes)."
##
## $select
## [1] "Gene.primaryIdentifier"
## [2] "Gene.symbol"
## [3] "Gene.goAnnotation.ontologyTerm.identifier"
## [4] "Gene.goAnnotation.ontologyTerm.name"
## [5] "Gene.goAnnotation.ontologyTerm.namespace"
## [6] "Gene.goAnnotation.evidence.code.code"
## [7] "Gene.goAnnotation.ontologyTerm.parents.identifier"
## [8] "Gene.goAnnotation.ontologyTerm.parents.name"
## [9] "Gene.goAnnotation.qualifier"
##
## $name
## [1] "Gene_GO"
##
## $comment
## [1] "Added 15NOV2010: ML"
##
## $orderBy
## $orderBy[[1]]
## Gene.primaryIdentifier
## "ASC"
##
##
## $where
## $where[[1]]
## $where[[1]]$path
## [1] "Gene"
##
## $where[[1]]$op
## [1] "LOOKUP"
##
## $where[[1]]$code
## [1] "A"
##
## $where[[1]]$editable
## [1] TRUE
##
## $where[[1]]$switchable
## [1] FALSE
##
## $where[[1]]$switched
## [1] "LOCKED"
##
## $where[[1]]$value
## [1] "PPARG"
##
## $where[[1]]$extraValue
## [1] "H. sapiens"
- Modify the view to display a compact view
queryGeneGO$select <- queryGeneGO$select[2:5]
queryGeneGO$select
## [1] "Gene.symbol"
## [2] "Gene.goAnnotation.ontologyTerm.identifier"
## [3] "Gene.goAnnotation.ontologyTerm.name"
## [4] "Gene.goAnnotation.ontologyTerm.namespace"
- Modify the constraints to look for gene ABO.
queryGeneGO$where[[1]][["value"]] <- "ABO"
queryGeneGO$where
## [[1]]
## [[1]]$path
## [1] "Gene"
##
## [[1]]$op
## [1] "LOOKUP"
##
## [[1]]$code
## [1] "A"
##
## [[1]]$editable
## [1] TRUE
##
## [[1]]$switchable
## [1] FALSE
##
## [[1]]$switched
## [1] "LOCKED"
##
## [[1]]$value
## [1] "ABO"
##
## [[1]]$extraValue
## [1] "H. sapiens"
- Run the query
resGeneGO <- runQuery(im, queryGeneGO )
head(resGeneGO)
## Gene.symbol Gene.goAnnotation.ontologyTerm.identifier
## 1 ABO GO:0000166
## 2 ABO GO:0003823
## 3 ABO GO:0004380
## 4 ABO GO:0004381
## 5 ABO GO:0005576
## 6 ABO GO:0005794
## Gene.goAnnotation.ontologyTerm.name
## 1 nucleotide binding
## 2 antigen binding
## 3 glycoprotein-fucosylgalactoside alpha-N-acetylgalactosaminyltransferase activity
## 4 fucosylgalactoside 3-alpha-galactosyltransferase activity
## 5 extracellular region
## 6 Golgi apparatus
## Gene.goAnnotation.ontologyTerm.namespace
## 1 molecular_function
## 2 molecular_function
## 3 molecular_function
## 4 molecular_function
## 5 cellular_component
## 6 cellular_component
- Start with the template Gene GO
queryGOGene <- getTemplateQuery(im, "GOterm_Gene")
queryGOGene
## $model
## name
## "genomic"
##
## $title
## [1] "GO term --> Genes"
##
## $description
## [1] "Search for Genes in a specified organism that are associated with a particular Gene Ontology (GO) annotation."
##
## $select
## [1] "Gene.primaryIdentifier"
## [2] "Gene.symbol"
## [3] "Gene.name"
## [4] "Gene.goAnnotation.ontologyTerm.identifier"
## [5] "Gene.goAnnotation.ontologyTerm.name"
## [6] "Gene.organism.shortName"
##
## $constraintLogic
## [1] "A and B"
##
## $name
## [1] "GOterm_Gene"
##
## $comment
## [1] "Added 26OCT2010: ML"
##
## $orderBy
## $orderBy[[1]]
## Gene.symbol
## "ASC"
##
##
## $where
## $where[[1]]
## $where[[1]]$path
## [1] "Gene.goAnnotation.ontologyTerm.name"
##
## $where[[1]]$op
## [1] "LIKE"
##
## $where[[1]]$code
## [1] "A"
##
## $where[[1]]$editable
## [1] TRUE
##
## $where[[1]]$switchable
## [1] FALSE
##
## $where[[1]]$switched
## [1] "LOCKED"
##
## $where[[1]]$value
## [1] "DNA binding"
##
##
## $where[[2]]
## $where[[2]]$path
## [1] "Gene.organism.shortName"
##
## $where[[2]]$op
## [1] "="
##
## $where[[2]]$code
## [1] "B"
##
## $where[[2]]$editable
## [1] TRUE
##
## $where[[2]]$switchable
## [1] FALSE
##
## $where[[2]]$switched
## [1] "LOCKED"
##
## $where[[2]]$value
## [1] "H. sapiens"
- Modify the view to display a compact view
queryGOGene$select <- queryGOGene$select[2:5]
queryGOGene$select
## [1] "Gene.symbol"
## [2] "Gene.name"
## [3] "Gene.goAnnotation.ontologyTerm.identifier"
## [4] "Gene.goAnnotation.ontologyTerm.name"
- Modify the constraints to look for GO term ‘metal ion binding’
queryGOGene$where[[1]]$value = "metal ion binding"
queryGOGene$where
## [[1]]
## [[1]]$path
## [1] "Gene.goAnnotation.ontologyTerm.name"
##
## [[1]]$op
## [1] "LIKE"
##
## [[1]]$code
## [1] "A"
##
## [[1]]$editable
## [1] TRUE
##
## [[1]]$switchable
## [1] FALSE
##
## [[1]]$switched
## [1] "LOCKED"
##
## [[1]]$value
## [1] "metal ion binding"
##
##
## [[2]]
## [[2]]$path
## [1] "Gene.organism.shortName"
##
## [[2]]$op
## [1] "="
##
## [[2]]$code
## [1] "B"
##
## [[2]]$editable
## [1] TRUE
##
## [[2]]$switchable
## [1] FALSE
##
## [[2]]$switched
## [1] "LOCKED"
##
## [[2]]$value
## [1] "H. sapiens"
- Run the query
resGOGene <- runQuery(im, queryGOGene )
head(resGOGene)
## Gene.symbol Gene.name
## 1 A3GALT2 alpha 1,3-galactosyltransferase 2
## 2 AARS alanyl-tRNA synthetase
## 3 AARS2 alanyl-tRNA synthetase 2, mitochondrial
## 4 AARSD1 alanyl-tRNA synthetase domain containing 1
## 5 ABAT 4-aminobutyrate aminotransferase
## 6 ABLIM1 actin binding LIM protein 1
## Gene.goAnnotation.ontologyTerm.identifier
## 1 GO:0046872
## 2 GO:0046872
## 3 GO:0046872
## 4 GO:0046872
## 5 GO:0046872
## 6 GO:0046872
## Gene.goAnnotation.ontologyTerm.name
## 1 metal ion binding
## 2 metal ion binding
## 3 metal ion binding
## 4 metal ion binding
## 5 metal ion binding
## 6 metal ion binding
- Start with the Gene_Location template, update to search for ABCA6 gene.
queryGeneLoc = getTemplateQuery(im, "Gene_Location")
queryGeneLoc$where[[2]][["value"]] = "ABCA6"
resGeneLoc= runQuery(im, queryGeneLoc)
resGeneLoc
## Gene.primaryIdentifier Gene.secondaryIdentifier Gene.symbol
## 1 23460 ENSG00000154262 ABCA6
## Gene.name
## 1 ATP binding cassette subfamily A member 6
## Gene.chromosome.primaryIdentifier Gene.locations.start Gene.locations.end
## 1 17 69062044 69142188
## Gene.locations.strand
## 1 -1
We’re going to use the output (gene location) as input for the next query.
- Define a new query
# set constraints
constraints = setConstraints(
paths = c(
"Gene.chromosome.primaryIdentifier",
"Gene.locations.start",
"Gene.locations.end",
"Gene.organism.name"
),
operators = c(
"=",
">=",
"<=",
"="
),
values = list(
resGeneLoc[1, "Gene.chromosome.primaryIdentifier"],
as.character(as.numeric(resGeneLoc[1, "Gene.locations.start"])-50000),
as.character(as.numeric(resGeneLoc[1, "Gene.locations.end"])+50000),
"Homo sapiens"
)
)
# set InterMineR-class query
queryNeighborGene = setQuery(
select = c("Gene.primaryIdentifier",
"Gene.symbol",
"Gene.chromosome.primaryIdentifier",
"Gene.locations.start",
"Gene.locations.end",
"Gene.locations.strand"),
where = constraints
)
summary(queryNeighborGene)
## path op value code
## 1 Gene.chromosome.primaryIdentifier = 17 A
## 2 Gene.locations.start >= 69012044 B
## 3 Gene.locations.end <= 69192188 C
## 4 Gene.organism.name = Homo sapiens D
- Run the query
resNeighborGene <- runQuery(im, queryNeighborGene)
resNeighborGene
## Gene.primaryIdentifier Gene.symbol Gene.chromosome.primaryIdentifier
## 1 100421166 LOC100421166 17
## 2 100616316 MIR4524A 17
## 3 100847008 MIR4524B 17
## 4 23460 ABCA6 17
## Gene.locations.start Gene.locations.end Gene.locations.strand
## 1 69094100 69095506 -1
## 2 69099564 69099632 -1
## 3 69099542 69099656 1
## 4 69062044 69142188 -1
- Plot the genes
resNeighborGene$Gene.locations.strand[which(resNeighborGene$Gene.locations.strand==1)]="+"
resNeighborGene$Gene.locations.strand[which(resNeighborGene$Gene.locations.strand==-1)]="-"
gene.idx = which(nchar(resNeighborGene$Gene.symbol)==0)
resNeighborGene$Gene.symbol[gene.idx]=resNeighborGene$Gene.primaryIdentifier[gene.idx]
require(Gviz)
annTrack = AnnotationTrack(
start=resNeighborGene$Gene.locations.start,
end=resNeighborGene$Gene.locations.end,
strand=resNeighborGene$Gene.locations.strand,
chromosome=resNeighborGene$Gene.chromosome.primaryIdentifier[1],
genome="GRCh38",
name="around ABCA6",
id=resNeighborGene$Gene.symbol)
gtr <- GenomeAxisTrack()
itr <- IdeogramTrack(genome="hg38", chromosome="chr17")
plotTracks(list(gtr, itr, annTrack), shape="box", showFeatureId=TRUE, fontcolor="black")
sessionInfo()
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.6-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.6-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=C LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] grid parallel stats4 stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] Gviz_1.22.0 GenomicRanges_1.30.0 GenomeInfoDb_1.14.0
## [4] org.Hs.eg.db_3.4.2 GO.db_3.4.2 GeneAnswers_2.20.0
## [7] RColorBrewer_1.1-2 Heatplus_2.24.0 MASS_7.3-47
## [10] annotate_1.56.0 XML_3.98-1.9 AnnotationDbi_1.40.0
## [13] IRanges_2.12.0 S4Vectors_0.16.0 Biobase_2.38.0
## [16] BiocGenerics_0.24.0 RCurl_1.95-4.8 bitops_1.0-6
## [19] igraph_1.1.2 RSQLite_2.0 InterMineR_1.0.0
## [22] BiocStyle_2.6.0
##
## loaded via a namespace (and not attached):
## [1] ProtGenerics_1.10.0 matrixStats_0.52.2
## [3] bit64_0.9-7 progress_1.1.2
## [5] httr_1.3.1 rprojroot_1.2
## [7] tools_3.4.2 backports_1.1.1
## [9] R6_2.2.2 rpart_4.1-11
## [11] Hmisc_4.0-3 DBI_0.7
## [13] lazyeval_0.2.1 colorspace_1.3-2
## [15] nnet_7.3-12 RMySQL_0.10.13
## [17] gridExtra_2.3 prettyunits_1.0.2
## [19] bit_1.1-12 curl_3.0
## [21] compiler_3.4.2 chron_2.3-51
## [23] graph_1.56.0 htmlTable_1.9
## [25] xml2_1.1.1 DelayedArray_0.4.0
## [27] rtracklayer_1.38.0 bookdown_0.5
## [29] checkmate_1.8.5 scales_0.5.0
## [31] RBGL_1.54.0 stringr_1.2.0
## [33] digest_0.6.12 Rsamtools_1.30.0
## [35] foreign_0.8-69 rmarkdown_1.6
## [37] XVector_0.18.0 dichromat_2.0-0
## [39] base64enc_0.1-3 pkgconfig_2.0.1
## [41] htmltools_0.3.6 ensembldb_2.2.0
## [43] BSgenome_1.46.0 htmlwidgets_0.9
## [45] rlang_0.1.2 BiocInstaller_1.28.0
## [47] shiny_1.0.5 jsonlite_1.5
## [49] BiocParallel_1.12.0 acepack_1.4.1
## [51] VariantAnnotation_1.24.0 magrittr_1.5
## [53] GenomeInfoDbData_0.99.1 Formula_1.2-2
## [55] Matrix_1.2-11 Rcpp_0.12.13
## [57] munsell_0.4.3 proto_1.0.0
## [59] sqldf_0.4-11 stringi_1.1.5
## [61] yaml_2.1.14 RJSONIO_1.3-0
## [63] SummarizedExperiment_1.8.0 zlibbioc_1.24.0
## [65] AnnotationHub_2.10.0 plyr_1.8.4
## [67] blob_1.1.0 lattice_0.20-35
## [69] Biostrings_2.46.0 splines_3.4.2
## [71] GenomicFeatures_1.30.0 knitr_1.17
## [73] tcltk_3.4.2 biomaRt_2.34.0
## [75] evaluate_0.10.1 downloader_0.4
## [77] biovizBase_1.26.0 latticeExtra_0.6-28
## [79] data.table_1.10.4-3 httpuv_1.3.5
## [81] gtable_0.2.0 assertthat_0.2.0
## [83] gsubfn_0.6-6 ggplot2_2.2.1
## [85] mime_0.5 xtable_1.8-2
## [87] AnnotationFilter_1.2.0 survival_2.41-3
## [89] tibble_1.3.4 GenomicAlignments_1.14.0
## [91] memoise_1.1.0 cluster_2.0.6
## [93] interactiveDisplayBase_1.16.0
The InterMine model could be accessed from the mine homepage by clicking the tab “QueryBuilder” and selecting the appropriate data type under “Select a Data Type to Begin a Query”:
Here we select Gene as the data type:
We could select Symbol and Chromosome->Primary Identifier by clicking Show on the right of them. Then click “Export XML” at the bottom right corner of the webpage:
The column names Gene.symbol and Gene.chromosome.primaryIdentifier are contained in the XML output: