1 GSEA algorithm

A common approach to analyzing gene expression profiles is to identify differentially expressed genes that are deemed interesting. The enrichment analysis demonstrated in the Disease enrichment analysis vignette was based on these differentially expressed genes. This approach finds genes where the difference is large, but it will not detect a situation where the difference is small yet evidenced in a coordinated way across a set of related genes. Gene Set Enrichment Analysis (GSEA) [1] directly addresses this limitation. All genes can be used in GSEA; GSEA aggregates the per-gene statistics across genes within a gene set, making it possible to detect situations where all genes in a predefined set change in a small but coordinated way. This matters because many relevant phenotypic differences are likely to be manifested by small but consistent changes in a set of genes.

Genes are ranked by a statistic that reflects their association with the phenotype (for example, fold change). Given an a priori defined set of genes S (e.g., genes sharing the same DO category), the goal of GSEA is to determine whether the members of S are randomly distributed throughout the ranked gene list (L) or primarily found at the top or bottom.
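The question of whether the members of S cluster at the top or bottom of L is usually answered with a weighted running-sum statistic. The following base R code is a minimal sketch of that statistic as described by Subramanian et al. [1]; it is not the code used inside DOSE or fgsea, and the function name `enrichment_score`, the toy statistics and the gene names are all hypothetical.

```r
# Toy ranked list L: a named numeric vector (hypothetical values).
stats <- c(g1 = 3.0, g2 = 2.5, g3 = 1.2, g4 = 0.4,
           g5 = -0.3, g6 = -1.1, g7 = -2.0, g8 = -2.8)

# Sketch of the weighted running-sum enrichment score.
# 'exponent' weights each hit by |statistic|^exponent.
enrichment_score <- function(stats, gene_set, exponent = 1) {
  stats  <- sort(stats, decreasing = TRUE)      # ranked list L
  in_set <- names(stats) %in% gene_set          # TRUE where a gene is in S
  n_miss <- sum(!in_set)
  stopifnot(any(in_set), n_miss > 0)

  hit_weight <- abs(stats)^exponent * in_set
  step_up    <- hit_weight / sum(hit_weight)    # increments at hits
  step_down  <- (!in_set) / n_miss              # decrements at misses
  running    <- unname(cumsum(step_up - step_down))

  peak <- which.max(abs(running))               # maximum deviation from zero
  list(ES = running[peak], peak = peak, running = running)
}

top_set    <- enrichment_score(stats, c("g1", "g2", "g4"))
bottom_set <- enrichment_score(stats, c("g6", "g7", "g8"))
print(top_set$ES)
print(bottom_set$ES)
```

In this toy example the first set sits near the top of the list and yields a positive score, while the second sits at the bottom and yields a negative one. The running sum always returns to zero at the end of the list, so only the maximum deviation carries information.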

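There are three key elements of the GSEA method: calculation of an enrichment score (ES), estimation of the significance level of the ES, and adjustment for multiple hypothesis testing.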
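As a rough illustration of how significance estimation and multiple-testing adjustment fit around the enrichment score, the base R sketch below permutes gene labels to build a null distribution and then applies the Benjamini-Hochberg correction with `p.adjust`. It is a sketch under stated assumptions, not the procedure implemented in DOSE or fgsea: the helper `es`, the simulated statistics, the two gene sets and the sign-matched empirical p-value are all chosen for illustration.

```r
# Hypothetical helper: weighted running-sum enrichment score for a
# ranked statistic vector and a logical membership vector.
es <- function(stats, in_set) {
  w <- abs(stats) * in_set
  running <- cumsum(w / sum(w) - (!in_set) / sum(!in_set))
  unname(running[which.max(abs(running))])
}

set.seed(42)
# Simulated ranked gene list (assumption: 200 genes, standard normal scores).
stats <- sort(setNames(rnorm(200), paste0("g", 1:200)), decreasing = TRUE)

# Two illustrative gene sets: one concentrated at the top, one random.
gene_sets <- list(
  top    = names(stats)[1:15],
  random = sample(names(stats), 15)
)

n_perm <- 1000
p_values <- sapply(gene_sets, function(gs) {
  in_set   <- names(stats) %in% gs
  observed <- es(stats, in_set)
  # Null distribution: shuffle set membership across the ranked list.
  null_es  <- replicate(n_perm, es(stats, sample(in_set)))
  # Compare only against null scores with the same sign as the observed ES.
  same_sign <- null_es[sign(null_es) == sign(observed)]
  (sum(abs(same_sign) >= abs(observed)) + 1) / (length(same_sign) + 1)
})

# Adjust for testing several gene sets at once.
p_adjusted <- p.adjust(p_values, method = "BH")
print(cbind(p_values, p_adjusted))
```

The smallest attainable p-value is bounded by the number of permutations, which is why a small permutation count gives coarse p-values.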

We implemented the GSEA algorithm proposed by Subramanian et al. [1]. Alexey Sergushichev implemented an algorithm for fast GSEA analysis in the fgsea [2] package.

In DOSE [3], users can choose between the GSEA algorithm implemented in DOSE and the one in fgsea by specifying the parameter by="DOSE" or by="fgsea". By default, DOSE uses fgsea since it is much faster.

1.1 Leading edge analysis and core enriched genes

Leading edge analysis reports three statistics: Tags, the percentage of genes in the gene set that contribute to the enrichment score; List, the position in the ranked list at which the enrichment score is attained; and Signal, the enrichment signal strength.

It would also be very interesting to get the core enriched genes that contribute to the enrichment.

DOSE supports leading edge analysis and reports core enriched genes in GSEA analysis.
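To make the Tags, List and Signal statistics concrete, here is a minimal base R sketch that derives them, together with the core enriched genes, from the running enrichment score. It follows the commonly used GSEA definition Signal = Tags × (1 − List) × N / (N − N_set); the function `leading_edge_sketch`, the toy data and the exact handling of positions are assumptions for illustration and may differ in detail from what DOSE computes.

```r
# Toy ranked list (hypothetical values and gene names).
stats <- c(g1 = 3.0, g2 = 2.5, g3 = 1.2, g4 = 0.4,
           g5 = -0.3, g6 = -1.1, g7 = -2.0, g8 = -2.8)

leading_edge_sketch <- function(stats, gene_set) {
  stats  <- sort(stats, decreasing = TRUE)
  n      <- length(stats)
  in_set <- names(stats) %in% gene_set
  n_set  <- sum(in_set)

  # Weighted running sum; its peak defines the enrichment score.
  w       <- abs(stats) * in_set
  running <- cumsum(w / sum(w) - (!in_set) / (n - n_set))
  peak    <- which.max(abs(running))
  es      <- unname(running[peak])

  if (es >= 0) {
    core      <- in_set & seq_len(n) <= peak   # hits up to the peak
    list_frac <- peak / n
  } else {
    core      <- in_set & seq_len(n) > peak    # hits after the trough
    list_frac <- (n - peak) / n
  }
  tags   <- sum(core) / n_set
  signal <- tags * (1 - list_frac) * n / (n - n_set)

  list(
    ES = es,
    leading_edge = sprintf("tags=%.0f%%, list=%.0f%%, signal=%.0f%%",
                           100 * tags, 100 * list_frac, 100 * signal),
    core_enrichment = paste(names(stats)[core], collapse = "/")
  )
}

up   <- leading_edge_sketch(stats, c("g1", "g2", "g4"))
down <- leading_edge_sketch(stats, c("g6", "g7", "g8"))
print(up$leading_edge)
print(up$core_enrichment)
print(down$core_enrichment)
```

For a positively enriched set the core genes are those ranked at or before the peak of the running sum; for a negatively enriched set they are the set members ranked after the trough.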

1.2 gseDO function

In the following example, to speed up the compilation of this document, only gene sets with at least 120 genes were tested and only 100 permutations were performed.

library(DOSE)
data(geneList)
y <- gseDO(geneList, 
           nPerm         = 100, 
           minGSSize     = 120,
           pvalueCutoff  = 0.2, 
           pAdjustMethod = "BH",
           verbose       = FALSE)
head(y, 3)
##                        ID               Description setSize
## DOID:0060084 DOID:0060084 cell type benign neoplasm     439
## DOID:1492       DOID:1492    eye and adnexa disease     459
## DOID:5614       DOID:5614               eye disease     450
##              enrichmentScore       NES     pvalue   p.adjust    qvalues
## DOID:0060084      -0.2837045 -1.261653 0.01190476 0.07548077 0.03289474
## DOID:1492         -0.3105160 -1.387227 0.01190476 0.07548077 0.03289474
## DOID:5614         -0.3125247 -1.393922 0.01190476 0.07548077 0.03289474
##              rank                   leading_edge
## DOID:0060084 1890 tags=23%, list=15%, signal=20%
## DOID:1492    1793 tags=22%, list=14%, signal=19%
## DOID:5614    1768 tags=22%, list=14%, signal=19%
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  core_enrichment
## DOID:0060084 37/268/2735/6752/3082/5914/2878/7508/100507436/3791/1301/1543/1027/1028/1958/7173/7201/8743/7450/596/947/2034/11197/4314/324/3964/196/3675/595/6843/573/2246/3912/1902/7507/2308/8829/6817/2934/1277/3953/55384/8773/8991/2057/4311/2247/6414/4588/5243/5468/358/1012/6469/9457/1733/81029/3952/126/79068/7058/4488/1009/4313/56034/3625/2944/6925/2099/3480/6387/5159/57496/1462/1289/2690/6571/4487/1191/54361/10752/5744/4922/2487/247/7043/1811/2952/367/3572/1287/4582/7031/3479/6424/4629/652/10647/5241/4969
## DOID:1492             3371/3082/5914/2878/4153/3791/23247/1543/80184/6750/1958/2098/7450/596/9187/2034/482/948/1490/1280/3931/5737/4314/4881/2261/3426/187/629/6403/7042/6785/7507/2934/5176/4060/1277/7078/5950/2057/727/10516/4311/2247/1295/358/10203/2192/582/10218/57125/3485/585/1675/6310/2202/4313/2944/4254/3075/1501/2099/3480/4653/1195/6387/3305/1471/857/4016/1909/4053/6678/1296/7033/4915/55812/1191/5654/10631/2152/2697/7043/2952/6935/2200/3572/7177/7031/3479/2006/10451/9370/771/3117/125/652/4693/5346/1524
## DOID:5614                       3082/5914/2878/4153/3791/23247/1543/80184/6750/1958/2098/7450/596/9187/2034/482/948/1490/1280/3931/5737/4314/4881/2261/3426/187/629/6403/7042/6785/7507/2934/5176/4060/1277/7078/5950/2057/727/10516/4311/2247/1295/358/10203/2192/582/10218/57125/3485/585/1675/6310/2202/4313/2944/4254/3075/1501/2099/3480/4653/6387/3305/1471/857/4016/1909/4053/6678/1296/7033/4915/55812/1191/5654/10631/2152/2697/7043/2952/6935/2200/3572/7177/7031/3479/2006/10451/9370/771/3117/125/652/4693/5346/1524

1.3 gseNCG function

ncg <- gseNCG(geneList,
              nPerm         = 100, 
              minGSSize     = 120,
              pvalueCutoff  = 0.2, 
              pAdjustMethod = "BH",
              verbose       = FALSE)
ncg <- setReadable(ncg, 'org.Hs.eg.db')
head(ncg, 3)
##                ID Description setSize enrichmentScore       NES     pvalue
## breast     breast      breast     133      -0.4869070 -1.862866 0.01282051
## lung         lung        lung     173      -0.3880662 -1.526082 0.01315789
## lymphoma lymphoma    lymphoma     188       0.2999589  1.289014 0.08000000
##            p.adjust    qvalues rank                   leading_edge
## breast   0.03947368 0.02770083 2930 tags=33%, list=23%, signal=26%
## lung     0.03947368 0.02770083 2775 tags=31%, list=22%, signal=25%
## lymphoma 0.16000000 0.11228070 2087 tags=21%, list=17%, signal=18%
##                                                                                                                                                                                                         core_enrichment
## breast                                                                                   KMT2A/ERBB3/SETD2/ARID1A/GPS2/NCOR1/RB1/MAP2K4/NF1/TP53/PIK3R1/STK11/CDKN1B/PTGFR/APC/CCND1/TRAF5/MAP3K1/ESR1/TBX3/FOXA1/GATA3
## lung     SETD2/ATXN3L/LRP1B/BRD3/ARID1A/INHBA/RB1/ADCY1/LYRM9/NF1/CTNNB1/TP53/SATB2/STK11/CTIF/CTNNA3/KDR/COL11A1/FLT3/APC/ADGRL3/FGFR3/NCAM2/DIP2C/APLNR/SLIT2/EPHA3/RUNX1T1/ZMYND10/ZFHX4/GLI3/TNN/PLSCR4/DACH1/ERBB4
## lymphoma                                        DUSP2/EZH2/PRDM1/MYC/ZWILCH/IKZF3/PLCG2/IDH2/HIST1H1C/MAGEC3/CD79B/ETV6/HIST1H1E/HIST1H1B/IRF8/CD28/SLC29A2/DUSP9/TNFAIP3/DNMT3A/SYK/TNF/BCR/HIST1H1D/DSC3/UBE2A/PABPC1
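The core_enrichment column stores the core enriched genes of each term as a single string separated by "/". The base R sketch below shows one way to split such strings into gene vectors and compare terms. The two strings are only the first few genes of the breast and lung rows shown above, hard-coded for illustration; in a real session they would presumably be taken from the result object, for example after converting it with `as.data.frame()`.

```r
# First few core enriched genes of two terms, copied from the output above
# (truncated for illustration; not the complete gene lists).
core_enrichment <- c(
  breast = "KMT2A/ERBB3/SETD2/ARID1A",
  lung   = "SETD2/ATXN3L/LRP1B/BRD3/ARID1A"
)

# Split each "/"-separated string into a character vector of gene symbols.
core_genes <- strsplit(core_enrichment, "/", fixed = TRUE)

# Number of core genes per term, and genes shared by all terms.
n_core <- lengths(core_genes)
shared <- Reduce(intersect, core_genes)

print(n_core)
print(shared)
```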

1.4 gseDGN function

dgn <- gseDGN(geneList,
              nPerm         = 100, 
              minGSSize     = 120,
              pvalueCutoff  = 0.2, 
              pAdjustMethod = "BH",
              verbose       = FALSE)
dgn <- setReadable(dgn, 'org.Hs.eg.db')
head(dgn, 3)
##                          ID         Description setSize enrichmentScore
## umls:C0011570 umls:C0011570   Mental Depression     483      -0.2874181
## umls:C0011581 umls:C0011581 Depressive disorder     464      -0.2963136
## umls:C0151744 umls:C0151744 Myocardial Ischemia     418      -0.3013524
##                     NES     pvalue  p.adjust    qvalues rank
## umls:C0011570 -1.284093 0.01219512 0.1209677 0.07809847 2587
## umls:C0011581 -1.318177 0.01219512 0.1209677 0.07809847 2587
## umls:C0151744 -1.335898 0.01219512 0.1209677 0.07809847 2309
##                                 leading_edge
## umls:C0011570 tags=25%, list=21%, signal=20%
## umls:C0011581 tags=25%, list=21%, signal=21%
## umls:C0151744 tags=26%, list=18%, signal=22%
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            core_enrichment
## umls:C0011570 ETS2/RGN/GRIA1/PTGS1/NLGN1/PDE4A/ADAMTS2/EHD3/NR5A1/SORCS3/A2M/KCNQ1/CRY1/ADRB2/FZD1/MYOM2/ADCY1/POU6F1/MAPK3/BICC1/SLC6A4/AHI1/TP53/DBP/SLC12A2/BDNF/NR3C1/SRSF5/PCLO/GABRA6/WWC1/IL5/GLUL/ELK3/GAD1/RARA/GRM5/ASAH1/IMPACT/CHRM2/WFS1/TSPAN31/ARGLU1/HP/PVALB/HTR1A/GPM6A/CYP2A6/DUSP1/NLGN4Y/F2R/CD36/DBH/BECN1/CCND1/PER3/OXTR/SGCE/CFB/CLASP2/LPAR1/NRP1/AVPR1B/ARSD/GC/FAAH/BHLHE41/FGF2/CD1C/ABCB1/PPARG/SRPX/RAPGEF3/CRHBP/CDH13/HSPA2/BHLHE40/PDE1A/LEP/FTO/PER2/ALPK1/GSTM1/DIXDC1/XBP1/TCF4/ESR1/IGF1R/NTF3/CACNA1C/NR3C2/SLC18A2/NTRK2/RAPGEF4/F3/AGTR1/TAC1/GSTT1/AR/UCN/FBN1/MAOA/CARTPT/TAT/ADRA2A/MUC1/TGFBR3/TPH1/IGF1/MAOB/ADIPOQ/TBC1D9/ADH1B/EMX2/MAPT/CRY2/GATA3/TFAP2B
## umls:C0011581                       ETS2/HDAC5/RGN/GRIA1/PTGS1/PDE4A/SNCA/ADAMTS2/EHD3/NR5A1/SORCS3/CRY1/ADRB2/FZD1/MYOM2/ADCY1/POU6F1/MAPK3/BICC1/SLC6A4/AHI1/TP53/RNF103/SLC12A2/BDNF/NR3C1/SRSF5/PCLO/GABRA6/WWC1/IL5/GLUL/ELK3/GAD1/RARA/GRM5/KDR/ASAH1/IMPACT/CHRM2/WFS1/TSPAN31/HP/PVALB/HTR1A/BCL2/GPM6A/CYP2A6/DUSP1/NLGN4Y/F2R/CD36/NGFR/NPY2R/DBH/BECN1/CCND1/OXTR/SGCE/SELP/NGF/LPAR1/NRP1/AVPR1B/IFT88/ARSD/FAAH/NEFL/FGF2/CD1C/ABCB1/SRPX/RAPGEF3/CRHBP/HSPA2/LEP/FTO/PER2/ALPK1/GSTM1/DIXDC1/XBP1/ESR1/IGF1R/NTF3/CACNA1C/NR3C2/SLC18A2/NTRK2/SPDEF/RAPGEF4/ALB/NPY1R/F3/AGTR1/TAC1/AR/UCN/FBN1/MAOA/CARTPT/TAT/ADRA2A/MUC1/TGFBR3/TPH1/IGF1/ABAT/MAOB/ADIPOQ/TBC1D9/ADH1B/CRY2/GATA3/TFAP2B
## umls:C0151744                                                            ADRB2/HSPB1/ABCC6/ADD1/PECAM1/MAPK3/GRK5/VEGFC/AMPD1/AES/F7/HSPB2/ENTPD1/ID3/PRKAA2/ATP2B1/SOD3/PRKAB1/AMH/STAT6/RGCC/RXRG/GDF10/SLC9A1/HGF/SERPINA3/MBL2/KDR/EGR1/HSPB6/HBB/STAT5A/EEF1A2/VWF/BCL2/CD34/DUSP1/PRKG1/CD36/CTGF/MMP3/BECN1/NPR1/CCND1/GATM/LPA/EDIL3/RTN1/APLNR/PYGM/SELP/FGF1/NEDD4/ID1/ALDH6A1/FOXO1/SULT1A1/SNAP23/FGF2/DUSP6/ABCB1/PPARG/PDK4/SHH/HSPA2/BHLHE40/LPL/THBD/COL5A2/UGCG/KL/ADH1C/GSTM2/THBS2/PER2/ATXN1/MMP2/TXNIP/KITLG/CFH/ESR1/CXCL12/CIRBP/EDNRA/GHR/SPARC/GPD1L/ENPP1/ALB/F13A1/MEOX2/F3/AGTR1/ZEB1/TNFRSF11B/UCN/DCN/LTC4S/IL6ST/EPHX2/THBS4/IGF1/FXYD1/SFRP4/ELN/RAMP2/ADIPOQ/ADH1B/HMGCS2

2 Visualization

2.1 cnetplot

cnetplot(ncg, categorySize="pvalue", foldChange=geneList)

2.2 enrichMap

enrichMap(y, n=20)

2.3 gseaplot

gseaplot(y, geneSetID = y$ID[1], title=y$Description[1])

References

1. Subramanian, A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 102, 15545–15550 (2005).

2. Sergushichev, A. An algorithm for fast preranked gene set enrichment analysis using cumulative statistic calculation. bioRxiv doi:10.1101/060012 (2016).

3. Yu, G., Wang, L.-G., Yan, G.-R. & He, Q.-Y. DOSE: An R/Bioconductor package for disease ontology semantic and enrichment analysis. Bioinformatics 31, 608–609 (2015).