knitr::opts_knit$set(width = 80)
The NoRCE package systematically performs annotation and enrichment analysis for a set of regulatory non-coding RNA genes. NoRCE analyses are based on the mRNAs that lie within a given distance of a set of non-coding RNA genes or regions of interest. In addition, specific analyses such as biotype selection, miRNA-mRNA co-expression and miRNA-mRNA target prediction can be used for filtering. NoRCE also allows the gene set to be curated according to topologically associating domain (TAD) regions.
To install NoRCE:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("NoRCE")
library(NoRCE)
GO enrichment analysis can be performed based on gene neighbourhood, predicted targets, co-expression values and/or topological domain analysis. HUGO symbols, ENSEMBL gene IDs, ENSEMBL transcript IDs, ENTREZ IDs and miRBase names are supported formats for the input gene list. NoRCE also accepts a list of genomic regions, which should be in .bed format. Each analysis is controlled by its corresponding parameters. When the parameters for several analyses are set, the gene set resulting from the intersection of those analyses is used for enrichment (co-expression analysis can also be used to augment the other analyses). GO enrichment analysis is carried out with the `geneGOEnricher` and `geneRegionGOEnricher` functions; for miRNA genes, the `mirnaGOEnricher` and `mirnaRegionGOEnricher` functions are used. The species assembly must be defined with the `org_assembly` parameter. NoRCE also allows the user to supply a background gene set, in which case both the background gene set and its identifier format should be defined.
When the `near` parameter is set to `TRUE`, the closest genes for the input gene list are retrieved. The size of the neighbourhood is controlled by the `upstream` and `downstream` parameters. By default, all genes that fall within 10 kb upstream and downstream of the input genes are retrieved. In addition, with the `searchRegion` parameter, the analysis can be restricted to genes whose exon or intron regions fall into the specified upstream and downstream range of the input genes.
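The neighbourhood idea can be illustrated with a minimal base R sketch. This is not NoRCE's implementation: the `neighbours` helper, the toy `genes` table and all coordinates are hypothetical, and strand is ignored for simplicity. It only shows how a window of `upstream` and `downstream` bases around an input region selects overlapping genes.

```r
# Hypothetical toy annotation; coordinates are invented for illustration.
genes <- data.frame(
  symbol = c("geneA", "geneB", "geneC", "geneD"),
  chr    = c("chr1", "chr1", "chr1", "chr2"),
  start  = c(1000, 14000, 40000, 5000),
  end    = c(3000, 16000, 42000, 7000),
  stringsAsFactors = FALSE
)

# Return the symbols of genes that overlap the window
# [start - upstream, end + downstream] around one input region.
# Strand is ignored in this sketch.
neighbours <- function(chr, start, end, genes,
                       upstream = 10000, downstream = 10000) {
  win_start <- max(0, start - upstream)
  win_end   <- end + downstream
  hit <- genes$chr == chr & genes$start <= win_end & genes$end >= win_start
  genes$symbol[hit]
}

# Input region chr1:20000-21000 with the default 10 kb window on each side.
near_genes <- neighbours("chr1", 20000, 21000, genes)
print(near_genes)
```

Only genes on the same chromosome whose coordinates overlap the extended window are returned; widening `upstream` or `downstream` enlarges the candidate set.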
library(NoRCE)
NoRCE can also convert a .txt file or a data frame to a .bed formatted file with the `readbed` function, making it available for region-based analysis.
For a set of miRNA genes, target prediction is controlled by the `target` parameter. When this parameter is set to `TRUE`, TargetScan predictions are used to curate the gene list that will be enriched.
mirGO <- mirnaGOEnricher(gene = brain_mirna, org_assembly = 'hg19', near = TRUE, target = TRUE)
In the example above, GO enrichment is performed on the coding genes that neighbour the brain miRNAs and are also predicted targets of the same brain miRNA gene set.
Gene annotation based on topologically associating domain regions checks whether the ncRNAs fall into TAD regions; only coding genes that lie in the same TAD region as an ncRNA are included in the neighbouring coding gene set. If cell line(s) are specified for the TAD regions, only regions associated with the given cell line(s) are considered. Both user-defined and pre-defined TAD regions can be used to find the candidate gene set for enrichment. Pre-defined TAD regions are supplied for human, mouse and fruit fly; custom TAD regions must be in .bed format. Cell lines are selected with the `cellline` parameter and can be listed with the `listTAD` function.
a <- listTAD(TADName = tad_hg19)
mirGO <- mirnaGOEnricher(gene = brain_mirna, org_assembly = 'hg19', near = TRUE, isTADSearch = TRUE, TAD = tad_hg19)
User-defined TAD regions can be supplied as input, and gene enrichment can then be performed based on these custom regions. The `TAD` parameter takes the .bed formatted TAD regions.
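The same-TAD rule can be sketched in base R as follows. This is an illustration under stated assumptions rather than NoRCE's code: the `tad_index` helper and the `tads`, `ncrna` and `coding` tables are hypothetical, and each gene is reduced to a single position.

```r
# Hypothetical TAD regions and gene positions; all values are invented.
tads <- data.frame(chr = c("chr1", "chr1"),
                   start = c(0, 50000),
                   end = c(50000, 120000),
                   stringsAsFactors = FALSE)
ncrna <- data.frame(id = "ncRNA1", chr = "chr1", pos = 30000,
                    stringsAsFactors = FALSE)
coding <- data.frame(symbol = c("geneA", "geneB", "geneC"),
                     chr = c("chr1", "chr1", "chr1"),
                     pos = c(10000, 45000, 60000),
                     stringsAsFactors = FALSE)

# Index of the TAD containing a position, or NA if no TAD contains it.
tad_index <- function(chr, pos, tads) {
  idx <- which(tads$chr == chr & tads$start <= pos & pos < tads$end)
  if (length(idx) == 0) NA_integer_ else idx[1]
}

nc_tad <- tad_index(ncrna$chr, ncrna$pos, tads)
coding_tad <- mapply(tad_index, coding$chr, coding$pos,
                     MoreArgs = list(tads = tads))

# Keep only coding genes that share the ncRNA's TAD.
same_tad <- coding$symbol[!is.na(coding_tad) & coding_tad == nc_tad]
print(same_tad)
```

Here `geneC` is dropped even though it is on the same chromosome, because it lies in a different TAD from the ncRNA.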
Enrichment based on correlation analysis is enabled with the `express` parameter. For a given cancer, pre-calculated Pearson correlation coefficients between miRNA-mRNA and miRNA-lncRNA expression can be used to augment or filter the results. The user can define the correlation coefficient cutoff and the cancer of interest with the `minAbsCor` and `cancer` parameters, respectively. The path to the pre-computed correlation database, miRCancer.db, must be given to the `databaseFile` parameter.
Two custom expression data sets can be used to augment or filter the coding genes found by the previous analyses. Each expression data set must be patient-by-gene data whose headers are gene names. If no header is defined, `label1` and `label2` must be used to define the headers. The correlation cutoff is set with the `minAbsCor` parameter.
As in GO enrichment analysis, pathway enrichment analysis can be performed based on gene neighbourhood, predicted targets, correlation coefficient and/or topological domain analysis. Each analysis is controlled by its related parameters, and HUGO symbols, ENSEMBL gene IDs, ENSEMBL transcript IDs, ENTREZ IDs and miRNA names are supported for the input gene list. Non-coding genes can be annotated and enriched with KEGG, Reactome and WikiPathways. Pathway enrichment can also be performed with a custom GMT file. The GMT file may use either ENTREZ IDs or gene symbols; the format is indicated with the `isSymbol` parameter. The `genePathwayEnricher` and `geneRegionPathwayEnricher` functions perform pathway enrichment for genes and regions other than miRNA genes; for miRNAs, `mirnaPathwayEnricher` and `mirnaRegionPathwayEnricher` are used.
ncRNAPathway <- genePathwayEnricher(gene = brain_disorder_ncRNA, org_assembly = 'hg19', isTADSearch = TRUE, TAD = tad_hg19, genetype = 'Ensembl_gene')
As part of pathway analysis, NoRCE carries out a hypergeometric test (gene enrichment) for a set of noncoding genes, based on gene neighbourhood, target prediction, correlation coefficient and/or topological domain analysis, against a given population of coding genes. The genes that form the population set should be provided with `gmtName`, and the population data set should be a data frame.
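For readers unfamiliar with the hypergeometric test, the following base R sketch shows the calculation on invented counts. The variable names and numbers are assumptions made for illustration; the sketch does not reproduce NoRCE's internals and only demonstrates the upper-tail test with `phyper`, cross-checked against a one-sided Fisher exact test.

```r
# Invented counts for illustration only.
N <- 20000  # genes in the population (background) set
K <- 150    # population genes annotated to the pathway
n <- 300    # genes in the curated input set
k <- 12     # input genes annotated to the pathway

# P(X >= k) for a hypergeometric X: upper tail, hence k - 1.
p_value <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# The same test expressed as a one-sided Fisher exact test
# on the 2 x 2 table (in set / not in set) x (in pathway / not in pathway).
tab <- matrix(c(k, K - k, n - k, N - K - n + k), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(hypergeometric = p_value, fisher = p_fisher))
```

With these counts about 2.25 pathway genes would be expected in the input set by chance, so observing 12 gives a small p-value. In practice the p-values of all tested terms would then be adjusted for multiple testing, for example with `p.adjust`.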
Different RNA biotypes have different characteristics. Biotype information for each gene is provided by GENCODE in .GTF format for the human and mouse genomes. `filterBiotype` in NoRCE extracts the genes with the given biotypes and curates the input list accordingly.
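To make the biotype step concrete, here is a small base R sketch that selects genes by biotype from GTF-style records. It is an illustration, not the `filterBiotype` implementation: the two records, the gene IDs and the `get_attr` helper are hypothetical, and the sketch assumes the GENCODE convention of storing the biotype in the `gene_type` attribute.

```r
# Two invented GTF-style records (tab-separated, nine columns).
gtf_lines <- c(
  'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG_A"; gene_type "lncRNA"; gene_name "A";',
  'chr1\tHAVANA\tgene\t65419\t71585\t.\t+\t.\tgene_id "ENSG_B"; gene_type "protein_coding"; gene_name "B";'
)

# The ninth column holds the key "value"; attribute pairs.
fields <- strsplit(gtf_lines, "\t", fixed = TRUE)
attrs  <- vapply(fields, function(x) x[9], character(1))

# Extract the quoted value of one attribute key, or NA if the key is absent.
get_attr <- function(attrs, key) {
  pattern <- paste0(key, ' "([^"]*)"')
  m <- regmatches(attrs, regexec(pattern, attrs))
  vapply(m, function(x) if (length(x) == 2) x[2] else NA_character_,
         character(1))
}

gene_id <- get_attr(attrs, "gene_id")
biotype <- get_attr(attrs, "gene_type")

# Keep only the genes whose biotype is in the requested set.
keep <- gene_id[biotype %in% c("lncRNA")]
print(keep)
```

The resulting identifiers could then be intersected with an input gene list so that only genes of the requested biotypes remain.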
Pre-calculated Pearson correlation coefficients based on TCGA data, or correlation coefficients computed from user data, can be used either to filter the gene list used for annotation and enrichment or to run annotation and enrichment using co-expression analysis alone. The final set is determined by curating the gene list against a given p-value, correlation threshold and adjusted p-value.
`corrbased` provides pre-computed Pearson correlations for lncRNA-mRNA and miRNA-mRNA interactions. The input list can be curated with the miRNA or lncRNA gene sets whose correlation values exceed the given threshold. Gene expression values are gathered from TCGA. To run this part, the miRCancer.db database must be downloaded locally. The input list can also be a set of mRNAs, which allows enrichment analysis of the miRNAs that are correlated with those mRNAs.
Correlation coefficients between two custom expression data sets can be calculated in NoRCE, and possible interactions can be identified by filtering with a predefined threshold. The `calculateCorr` function takes two custom data sets and calculates the correlation between each pair of genes using the method defined by the `corrMethod` parameter. It then curates the genes based on cut-offs for the p-value, the correlation value and the adjusted p-value.
dataCor <- calculateCorr(exp1 = mirna[,1:50], exp2 = mrna[,1:50])
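The kind of filtering described above can be sketched in base R. The following is a simplified illustration on simulated data, not the `calculateCorr` implementation: the matrices, gene names and thresholds are invented, and Benjamini-Hochberg is chosen here only as one possible adjustment method.

```r
set.seed(1)

# Simulated patient-by-gene matrices; names and values are invented.
n_pat <- 30
mirna_x <- matrix(rnorm(n_pat * 2), ncol = 2,
                  dimnames = list(NULL, c("mirA", "mirB")))
mrna_x <- cbind(gene1 = -mirna_x[, "mirA"] + rnorm(n_pat, sd = 0.1),
                gene2 = rnorm(n_pat))

# Test every miRNA-mRNA pair with a Pearson correlation test.
pairs <- expand.grid(mirna = colnames(mirna_x), mrna = colnames(mrna_x),
                     stringsAsFactors = FALSE)
res <- t(mapply(function(a, b) {
  ct <- cor.test(mirna_x[, a], mrna_x[, b], method = "pearson")
  c(cor = unname(ct$estimate), p = ct$p.value)
}, pairs$mirna, pairs$mrna))

pairs$cor  <- unname(res[, "cor"])
pairs$p    <- unname(res[, "p"])
pairs$padj <- p.adjust(pairs$p, method = "BH")

# Keep pairs passing all three cut-offs (thresholds are arbitrary here).
kept <- pairs[abs(pairs$cor) >= 0.3 & pairs$p < 0.05 & pairs$padj < 0.05, ]
print(kept)
```

Because `gene1` is simulated as a noisy negative copy of `mirA`, that pair is strongly anti-correlated and survives the filter; whether any other pair passes depends on the random draw.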
Results can be finalized in a tabular format or with relevant graphs. Some plots are available only for the GO enrichment analysis, and some of them are available only for the pathway enrichment.
Information about the enrichment results can be written in tabular format to a txt file. Results are sorted by p-value or adjusted p-value, and either all of the enrichment results or a user-defined number of top enrichments can be written. This function is suitable for both GO and pathway enrichment.
A dot plot of a given number of top enrichments can be used for further analysis. For each GO term or pathway, the dot plot shows the number of overlapping genes that are annotated with the enriched term and occur in the input list, together with the p-value or the adjusted p-value for the selected correction method.
The relationship between the top enriched GO terms and mRNA genes is shown in an undirected network. The `isNonCode` parameter determines whether the list of enriched noncoding genes is used for the node names. Whether nodes are labelled with the GO term or the GO ID is determined by the `takeID` parameter; to label nodes with the GO term, it must be set to `FALSE`.
The relationship between the top enriched GO terms and noncoding genes is shown in an undirected network. The `isNonCode` parameter determines whether the list of enriched noncoding genes is used for the node names. Whether nodes are labelled with the GO term or the GO ID is determined by the `takeID` parameter; to label nodes with the GO term, it must be set to `FALSE`.
A directed acyclic graph of the top enriched GO terms can be illustrated with the `getGoDag` function. Enriched GO terms are coloured according to the p-value or adjusted p-value. The p-value ranges can be changed with the `p_range` parameter.
A map of an enriched pathway can be displayed in the browser. Due to limitations of the pathway maps, each pathway should be handled separately. Genes in the enrichment gene set that match the corresponding pathway are marked with colour.
If you use NoRCE, please cite.