Noncoding RNA Set Cis Annotation and Enrichment

The NoRCE package systematically performs annotation and enrichment analysis for a set of regulatory non-coding RNA genes. NoRCE analyses are based on the mRNAs that lie within a given distance of a set of non-coding RNA genes or regions of interest. In addition, analyses such as biotype selection, miRNA-mRNA co-expression and miRNA-mRNA target prediction can be used for filtering. NoRCE also allows the gene set to be curated according to topologically associating domain (TAD) regions.

Supported Assemblies and Organisms

Homo sapiens (hg19 and hg38)
Mus musculus (mm10)
Rattus norvegicus (rn6)
Drosophila melanogaster (dm6)
Danio rerio (danRer10)
Caenorhabditis elegans (ce11)
Saccharomyces cerevisiae (sc3)

Installation

To install NoRCE with BiocManager:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("NoRCE")

library(NoRCE)

GO Enrichment Analysis

GO enrichment analysis can be performed based on gene neighbourhood, predicted targets, co-expression values and/or topological domain analysis. HUGO symbols, ENSEMBL gene IDs, ENSEMBL transcript IDs, ENTREZ IDs and miRBase names are supported formats for the input gene list. NoRCE also accepts a list of genomic regions, which should be in .bed format. Each analysis is controlled by its corresponding parameters. When several of these parameters are set, the gene set resulting from the intersection of those analyses is used for enrichment (co-expression analysis can instead be used to augment the other analyses). GO enrichment analyses are carried out with the geneGOEnricher and geneRegionGOEnricher functions, and miRNA gene enrichment is carried out with the mirnaGOEnricher and mirnaRegionGOEnricher functions. The species assembly must be defined using the org_assembly parameter. NoRCE also allows the user to supply a background gene set; in that case, both the background gene set and its gene format should be defined.

Enrichment Analysis Based on Gene Neighbourhood

When the near parameter is set to TRUE, the closest genes to the input gene list are retrieved. The size of the gene neighbourhood is controlled by the upstream and downstream parameters. By default, all genes that fall within 10 kb upstream and downstream of the input genes are retrieved. In addition, with the searchRegion parameter, the analysis can be restricted to genes whose exon or intron regions fall within the specified upstream and downstream range of the input genes.
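
The neighbourhood idea can be illustrated without NoRCE itself. The following base R sketch is not the package's implementation: the coordinates, gene names and the helper find_neighbours are made up for illustration, and strand is ignored, so upstream and downstream simply extend the interval on each side.

```r
# Hypothetical toy coordinates; not taken from any real annotation.
ncrna <- data.frame(id = c("nc1", "nc2"),
                    chr = c("chr1", "chr2"),
                    start = c(50000, 200000),
                    end = c(51000, 201000),
                    stringsAsFactors = FALSE)
coding <- data.frame(id = c("geneA", "geneB", "geneC"),
                     chr = c("chr1", "chr1", "chr2"),
                     start = c(55000, 90000, 195000),
                     end = c(58000, 95000, 196000),
                     stringsAsFactors = FALSE)

# Illustrative helper: coding genes that overlap the window
# [start - upstream, end + downstream] of any input ncRNA.
find_neighbours <- function(ncrna, coding, upstream = 10000, downstream = 10000) {
  hits <- lapply(seq_len(nrow(ncrna)), function(i) {
    win_start <- ncrna$start[i] - upstream
    win_end <- ncrna$end[i] + downstream
    keep <- coding$chr == ncrna$chr[i] &
      coding$start <= win_end & coding$end >= win_start
    coding$id[keep]
  })
  unique(unlist(hits))
}

neighbours <- find_neighbours(ncrna, coding)
print(neighbours)
```

Widening upstream and downstream in the call enlarges the window and can only add genes to the result.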

NoRCE can also convert a .txt file or a data frame into .bed format with the readbed function, so that it can be used in region-based analysis.
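
For readers unfamiliar with the layout, a minimal BED file has three tab-delimited columns (chromosome, start, end) and no header. The base R sketch below writes and re-reads such a file; it does not call readbed, and the regions are hypothetical.

```r
# Hypothetical regions; BED needs at least chrom, chromStart, chromEnd.
regions <- data.frame(chrom = c("chr1", "chr2"),
                      chromStart = c(1000L, 5000L),
                      chromEnd = c(2000L, 7500L),
                      stringsAsFactors = FALSE)

bed_path <- tempfile(fileext = ".bed")
# BED files are tab-delimited and carry no header row or quotes.
write.table(regions, bed_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read the file back to confirm the three-column layout.
bed <- read.table(bed_path, sep = "\t", stringsAsFactors = FALSE,
                  col.names = c("chrom", "chromStart", "chromEnd"))
print(bed)
```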

Enrichment Analysis Based on Target Prediction

For a set of miRNA genes, target prediction is controlled by the target parameter. Once this parameter is set to TRUE, TargetScan predictions are used to curate the gene list that will be enriched.

mirGO<-mirnaGOEnricher(gene = brain_mirna, org_assembly='hg19', near=TRUE, target=TRUE)

In the example above, GO enrichment is performed on the coding genes that neighbour the brain miRNAs and are also targeted by the same brain miRNA gene set.
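
The way two analyses combine can be pictured as a set intersection. This is a conceptual sketch in base R with hypothetical gene names, not the code NoRCE runs internally.

```r
# Hypothetical gene sets produced by two separate analyses.
neighbour_genes <- c("geneA", "geneB", "geneC", "geneD")
target_genes <- c("geneB", "geneD", "geneE")

# With both near and target enabled, only genes supported by
# both analyses remain in the set passed to enrichment.
analyses <- list(near = neighbour_genes, target = target_genes)
enrich_set <- Reduce(intersect, analyses)
print(enrich_set)
```

Because Reduce folds over the whole list, adding a third analysis (for example a TAD-based set) only requires another list element.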

Enrichment Analysis Based on Topologically Associating Domain Analysis

Gene annotation based on topologically associating domain regions checks whether the ncRNAs fall into TAD regions, and only coding genes that lie in the same TAD region as an ncRNA are included in the neighbourhood coding gene set. If cell line(s) are specified for the TAD regions, only regions associated with the given cell line(s) are considered. Both user-defined and pre-defined TAD regions can be used to find the candidate gene set for enrichment. Pre-defined TAD regions are supplied for human, mouse and fruit fly; custom TAD regions must be in .bed format. Cell lines are controlled by the cellline parameter and can be listed with the listTAD function.

a<-listTAD(TADName = tad_hg19)

mirGO<-mirnaGOEnricher(gene = brain_mirna, org_assembly='hg19', near=TRUE, isTADSearch = TRUE, TAD = tad_hg19)

User-defined TAD regions can also be supplied, and gene enrichment is then performed based on these custom regions. The TAD parameter takes the .bed formatted TAD regions as input.
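
The same-TAD rule can be sketched in base R as follows. The TAD boundaries, gene coordinates and the helper tad_of are hypothetical; a single chromosome is assumed, and an interval is assigned to a TAD only if it lies entirely inside it, which is one possible convention rather than a statement about NoRCE's exact rule.

```r
# Hypothetical TADs and genes on a single chromosome.
tads <- data.frame(tad = c("TAD1", "TAD2"),
                   start = c(0, 100000),
                   end = c(100000, 250000),
                   stringsAsFactors = FALSE)
ncrna <- data.frame(id = "nc1", start = 120000, end = 121000,
                    stringsAsFactors = FALSE)
coding <- data.frame(id = c("geneA", "geneB", "geneC"),
                     start = c(90000, 150000, 240000),
                     end = c(95000, 155000, 245000),
                     stringsAsFactors = FALSE)

# Illustrative helper: name of the TAD that fully contains each
# interval, or NA when no TAD contains it.
tad_of <- function(start, end, tads) {
  vapply(seq_along(start), function(i) {
    hit <- tads$tad[tads$start <= start[i] & tads$end >= end[i]]
    if (length(hit) > 0) hit[1] else NA_character_
  }, character(1))
}

nc_tads <- tad_of(ncrna$start, ncrna$end, tads)
coding_tads <- tad_of(coding$start, coding$end, tads)

# Keep coding genes that share a TAD with at least one ncRNA.
same_tad <- coding$id[coding_tads %in% nc_tads[!is.na(nc_tads)]]
print(same_tad)
```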

Enrichment Analysis Based on Correlation Analysis

Enrichment based on correlation analysis is enabled with the express parameter. For a given cancer, pre-calculated Pearson correlation coefficients between miRNA-mRNA and miRNA-lncRNA expression can be used to augment or filter the results. The user can define the correlation coefficient cutoff and the cancer of interest with the minAbsCor and cancer parameters, respectively. The path to the pre-computed correlation database, miRCancer.db, must be passed to the databaseFile parameter.

Two custom expression datasets can also be used to augment or filter the coding genes found by the previous analyses. Each expression dataset must be a patient-by-gene table whose headers are gene names. If no header is defined, label1 and label2 must be used to define the headers. The correlation cutoff can be defined with the minAbsCor parameter.

Pathway Enrichment

As in GO enrichment analysis, pathway enrichment analysis can be performed based on gene neighbourhood, predicted targets, correlation coefficient and/or topological domain analysis. Each analysis is controlled by its related parameters, and HUGO symbols, ENSEMBL gene IDs, ENSEMBL transcript IDs, ENTREZ IDs and miRNA names are supported for the input gene list. Non-coding genes can be annotated and enriched with KEGG, Reactome and WikiPathways. Pathway enrichment can also be performed with a custom GMT file. The GMT file may use either ENTREZ IDs or gene symbols, which is controlled by the isSymbol parameter. The genePathwayEnricher and geneRegionPathwayEnricher functions perform pathway enrichment for genes and regions other than miRNA genes; for miRNAs, mirnaPathwayEnricher and mirnaRegionPathwayEnricher are used.

ncRNAPathway<-genePathwayEnricher(gene = brain_disorder_ncRNA, org_assembly='hg19', isTADSearch = TRUE,TAD = tad_hg19, genetype = 'Ensembl_gene')
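
For the custom GMT option, it may help to recall the file layout: one gene set per line, tab-delimited, with the set name first, a description second and the member genes after that. The base R sketch below parses two hypothetical lines; parse_gmt is an illustrative helper and not a NoRCE function.

```r
# Two hypothetical gene sets in GMT layout:
# set name <TAB> description <TAB> gene1 <TAB> gene2 ...
gmt_lines <- c("pathwayX\tdemo set\tgeneA\tgeneB\tgeneC",
               "pathwayY\tdemo set\tgeneC\tgeneD")

# Illustrative parser: named list mapping each set to its genes.
parse_gmt <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  sets <- lapply(fields, function(f) f[-(1:2)])  # drop name and description
  names(sets) <- vapply(fields, function(f) f[1], character(1))
  sets
}

gene_sets <- parse_gmt(gmt_lines)
print(gene_sets)
```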

Gene Enrichment Analysis

As a part of pathway analysis, NoRCE carries out a hypergeometric test (gene enrichment) for a set of noncoding genes based on gene neighbourhood, target prediction, correlation coefficient and/or topological domain analysis, against a given population of coding genes. The genes that form the population set should be provided with gmtName, and the population dataset should be a data frame.
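
As a reminder of what a hypergeometric enrichment test computes, here is a small base R sketch. The counts are invented, and a one-sided over-representation test is assumed; the exact formulation used inside NoRCE is not specified in this document.

```r
# Hypothetical counts.
N <- 20000  # coding genes in the population (background)
K <- 150    # population genes annotated to the term or pathway
n <- 300    # coding genes derived from the ncRNA input set
k <- 12     # derived genes that carry the annotation

# P(X >= k) under the hypergeometric distribution.
# phyper(q, m, n, k): m = annotated, n = not annotated, k = genes drawn.
p_value <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# The same tail probability, summed directly from the mass function.
p_check <- sum(dhyper(k:min(K, n), K, N - K, n))
print(p_value)
```

Passing k - 1 rather than k matters: with lower.tail = FALSE, phyper returns P(X > q), so q = k - 1 yields the probability of observing k or more annotated genes.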

Pre-processing Steps

Filter ncRNA Genes Based on Biotype Subsets

Different RNA biotypes have different characteristics. Biotype information for each gene is provided by GENCODE in .GTF format for the human and mouse genomes. The filterBiotype function in NoRCE extracts the genes that match the given biotypes and curates the input list accordingly.
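
The effect of biotype filtering can be sketched with a small table in base R. The annotation below is hypothetical and only mimics the gene and biotype fields one would extract from a GTF file; it does not call filterBiotype.

```r
# Hypothetical annotation table with the kind of fields a GTF provides.
annotation <- data.frame(gene = c("g1", "g2", "g3", "g4"),
                         biotype = c("lincRNA", "protein_coding",
                                     "antisense", "lincRNA"),
                         stringsAsFactors = FALSE)
input_genes <- c("g1", "g2", "g3")

# Keep only input genes whose biotype is among the requested ones.
wanted <- c("lincRNA", "antisense")
keep <- annotation$gene %in% input_genes & annotation$biotype %in% wanted
filtered <- annotation$gene[keep]
print(filtered)
```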

Co-expression Analysis

Pre-calculated Pearson correlation coefficients based on TCGA data, or correlation coefficients computed from user data, can be used either to filter the gene list used for annotation and enrichment or to carry out annotation and enrichment using co-expression analysis alone. The final set is determined by curating the gene list for a given p-value, correlation threshold and adjusted p-value.

Co-expression Analysis in The Cancer Genome Atlas

corrbased provides pre-computed Pearson correlations for lncRNA-mRNA and miRNA-mRNA interactions. The input list can be curated with the miRNA or lncRNA gene sets whose correlation values exceed the given threshold. Gene expression values are gathered from TCGA. To run this part, the miRCancer.db database must be downloaded locally. The input list can also be a set of mRNAs, which allows enrichment analysis of the miRNAs that are correlated with those mRNAs.

Co-expression Analysis in Custom Expression Data

Correlation coefficients between two custom expression datasets can be calculated in NoRCE, and possible interactions can be identified by filtering with a predefined threshold. The calculateCorr function takes two custom datasets and calculates the correlation between each pair of genes using the method defined by the corrMethod parameter. It then curates the genes based on cut-offs for the p-value, the correlation value and the adjusted p-value.

dataCor <- calculateCorr(exp1 =  mirna[,1:50], exp2 = mrna[,1:50])
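
The filtering logic described above can be approximated in base R as follows. This is only a sketch under stated assumptions: the expression values are simulated (sim_mirna and sim_mrna are not the package's example data), Pearson correlation and Benjamini-Hochberg adjustment are chosen for illustration, and the cut-off values are arbitrary.

```r
set.seed(1)
# Simulated expression: 30 patients (rows) by genes (columns).
sim_mirna <- matrix(rnorm(30 * 2), ncol = 2,
                    dimnames = list(NULL, c("mirA", "mirB")))
sim_mrna <- cbind(gene1 = 2 * sim_mirna[, "mirA"] + rnorm(30, sd = 0.1),  # built to track mirA
                  gene2 = rnorm(30))

# Correlate every miRNA-mRNA pair and record coefficient and p-value.
pairs <- expand.grid(mirna = colnames(sim_mirna), mrna = colnames(sim_mrna),
                     stringsAsFactors = FALSE)
tests <- lapply(seq_len(nrow(pairs)), function(i) {
  cor.test(sim_mirna[, pairs$mirna[i]], sim_mrna[, pairs$mrna[i]],
           method = "pearson")
})
pairs$cor <- vapply(tests, function(res) unname(res$estimate), numeric(1))
pairs$pval <- vapply(tests, function(res) res$p.value, numeric(1))
pairs$padj <- p.adjust(pairs$pval, method = "BH")

# Illustrative cut-offs on absolute correlation and adjusted p-value.
kept <- pairs[abs(pairs$cor) >= 0.9 & pairs$padj <= 0.05, ]
print(kept)
```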

Visualization

Results can be reported in a tabular format or with relevant graphs. Some plots are available only for GO enrichment analysis, and others only for pathway enrichment.

Tabular Format

Information about the enrichment results can be written in tabular format to a txt file. Results are sorted by p-value or adjusted p-value, and either all enrichment results or a user-defined number of top enrichments can be written. This function is suitable for both GO and pathway enrichment.
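
The sort-and-truncate behaviour described above can be mimicked in base R. The result table, column names and output path below are hypothetical and do not reflect the structure of NoRCE's result objects.

```r
# Hypothetical enrichment results.
results <- data.frame(term = c("termA", "termB", "termC", "termD"),
                      pval = c(0.03, 0.0004, 0.2, 0.008),
                      stringsAsFactors = FALSE)
results$padj <- p.adjust(results$pval, method = "BH")

# Sort by p-value and keep a user-defined number of top rows.
top_n <- 2
top <- head(results[order(results$pval), ], top_n)

# Write the selected rows as a tab-delimited text file.
out_path <- tempfile(fileext = ".txt")
write.table(top, out_path, sep = "\t", quote = FALSE, row.names = FALSE)
print(top)
```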

Dot Plot

A dot plot of a given number of top enrichments can be used for further analysis. For each GO term or pathway, the dot plot shows the number of genes that are annotated with the enriched term and occur in the input list, together with the p-value or the adjusted p-value for the selected correction method.

GO:mRNA Network

The relationships between the top enriched GO terms and mRNA genes are shown in an undirected network. The isNonCode parameter determines whether the list of enriched noncoding genes is used for the node names. Whether nodes are labelled with the GO term or the GO ID is determined by the takeID parameter; to label nodes with the GO term, it must be set to FALSE.

GO:ncRNA Gene Network

The relationships between the top enriched GO terms and noncoding genes are shown in an undirected network. The isNonCode parameter determines whether the list of enriched noncoding genes is used for the node names. Whether nodes are labelled with the GO term or the GO ID is determined by the takeID parameter; to label nodes with the GO term, it must be set to FALSE.

GO DAG Network

A directed acyclic graph of the top enriched GO terms can be illustrated with the getGoDag function. Enriched GO terms are marked with a range of colours based on the p-value or adjusted p-value. The p-value ranges can be changed with the p_range parameter.

KEGG and Reactome Pathway Map

A map of an enriched pathway can be displayed in the browser. Due to limitations of the pathway resources, each pathway must be handled separately for the pathway map. Genes in the enrichment gene set that match the corresponding pathway are marked with colour.

Citation

If you use NoRCE, please cite.