This vignette describes how to infer transcription factor activity from bulk transcriptome data by using DoRothEA’s curated regulons with viper.
dorothea 1.2.2
DoRothEA is a comprehensive resource containing a curated collection of transcription factors (TFs) and their transcriptional targets. The set of genes regulated by a specific transcription factor is known as its regulon. DoRothEA’s regulons were gathered from different types of evidence. Each TF-target interaction is assigned a confidence level based on the amount of supporting evidence. The confidence levels range from A (highest confidence) to E (lowest confidence) (Garcia-Alonso et al. 2019). While DoRothEA was originally developed for human data, it can also be applied to mouse data, with comparable performance but better coverage than dedicated mouse regulons (Holland, Szalai, and Saez-Rodriguez 2019).
DoRothEA regulons are usually coupled with the statistical method VIPER (Alvarez et al. 2016). In this context, the activity of a TF is computed from the mRNA expression levels of its targets. We can therefore consider TF activity as a proxy of a given transcriptional state (Dugourd and Saez-Rodriguez 2019). However, it is up to the user to decide which statistical method to use. Alternatives include, for instance, classical Gene Set Enrichment Analysis or a simple mean statistic.
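To make the idea of a "simple mean statistic" concrete, here is a minimal base R sketch on toy data. It is not VIPER and not part of the dorothea package: the expression values, the gene and TF names, and the helper `mean_activity()` are all made up for illustration, and we assume a long-format regulon with a `mor` (mode of regulation) column that is +1 for activation and -1 for repression. In practice one would typically standardize each gene across samples first.

```r
# Toy expression matrix: genes in rows, samples in columns (values are invented).
expr <- matrix(
  c(2.0, 4.0,
    1.0, 3.0,
    5.0, 1.0,
    0.5, 0.5),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("G1", "G2", "G3", "G4"), c("S1", "S2"))
)

# Toy regulon in long format (hypothetical TFs and targets).
toy_regulon <- data.frame(
  tf     = c("TF1", "TF1", "TF1", "TF2", "TF2"),
  target = c("G1", "G2", "G3", "G3", "G4"),
  mor    = c(1, 1, -1, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: signed mean expression of each TF's targets per sample.
mean_activity <- function(expr, regulon) {
  tfs <- unique(regulon$tf)
  scores <- lapply(tfs, function(tf) {
    # Keep only targets of this TF that are present in the expression matrix
    reg <- regulon[regulon$tf == tf & regulon$target %in% rownames(expr), ]
    # Multiply each target row by its mode of regulation, then average per sample
    colMeans(expr[reg$target, , drop = FALSE] * reg$mor)
  })
  out <- do.call(rbind, scores)
  rownames(out) <- tfs
  out
}

act <- mean_activity(expr, toy_regulon)
print(act)
```

The result is a TF-by-sample matrix. A repressed target with high expression lowers the score of its TF, which is the intuition shared by footprint-based methods.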
First of all, you need a current version of R (http://www.r-project.org). In addition, you need dorothea, a freely available package deposited on http://bioconductor.org/ and https://github.com/saezlab/dorothea.
You can install it by running the following commands on an R console:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("dorothea")
We also load the packages required to run this vignette:
## We load the required packages
library(dorothea)
library(bcellViper)
library(dplyr)
library(viper)
Following the vignette of the viper package, we demonstrate how to combine viper with regulons from DoRothEA.
Similar to the viper vignette, we use the gene expression matrix from the bcellViper package. More information about the gene expression matrix is available in the bcellViper package documentation. The regulons from DoRothEA are provided within the dorothea package and can be accessed via the data() function. As the gene expression matrix contains human data, we also load the human version of DoRothEA.
# accessing expression data from bcellViper
data(bcellViper, package = "bcellViper")
# accessing (human) dorothea regulons
# for mouse regulons: data(dorothea_mm, package = "dorothea")
data(dorothea_hs, package = "dorothea")
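To give a feeling for how such a regulon table can be explored, the sketch below builds a small hand-made stand-in and summarizes it with base R. All rows are invented. The column names (`tf`, `confidence`, `target`, `mor`) are our assumption about the layout of `dorothea_hs` and are worth checking with `head(dorothea_hs)`.

```r
# Hand-made stand-in for a regulon table (all rows are invented).
# Column names are assumed to match dorothea_hs: tf, confidence, target, mor.
toy_regulons <- data.frame(
  tf         = c("TF1", "TF1", "TF1", "TF2", "TF2", "TF3"),
  confidence = c("A", "A", "A", "B", "B", "E"),
  target     = c("G1", "G2", "G3", "G3", "G4", "G5"),
  mor        = c(1, 1, -1, 1, 1, -1),
  stringsAsFactors = FALSE
)

# One row per TF-target interaction
n_interactions <- nrow(toy_regulons)

# Number of interactions per confidence level
per_confidence <- table(toy_regulons$confidence)

# Number of distinct TFs and distinct target genes
n_tfs <- length(unique(toy_regulons$tf))
n_targets <- length(unique(toy_regulons$target))

# Share of repressive interactions (mor < 0)
share_repressed <- mean(toy_regulons$mor < 0)

print(per_confidence)
```

If the assumed column names hold, the same calls should work unchanged on `dorothea_hs` after loading it.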
We implemented a wrapper for the viper function that can deal with different input types such as matrix, data frame, ExpressionSet or Seurat objects (see the dedicated vignette for single-cell analysis). We subset DoRothEA to the confidence levels A and B to include only the high-quality regulons.
# dset is the ExpressionSet loaded by data(bcellViper)
regulons <- dorothea_hs %>%
  filter(confidence %in% c("A", "B"))

tf_activities <- run_viper(dset, regulons,
                           options = list(method = "scale", minsize = 4,
                                          eset.filter = FALSE, cores = 1,
                                          verbose = FALSE))
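The confidence filter and the `minsize` option together determine which TFs can receive an activity score. The base R sketch below illustrates this on invented data, under our assumption that a regulon is only used when at least `minsize` of its targets are present in the expression data. The TF and gene names and the `measured_genes` vector are hypothetical, and the code does not call viper.

```r
# Invented regulon table: TF1 has 5 targets (A), TF2 has 3 (B), TF3 has 4 (D).
toy_regulons <- data.frame(
  tf         = rep(c("TF1", "TF2", "TF3"), times = c(5, 3, 4)),
  confidence = rep(c("A", "B", "D"), times = c(5, 3, 4)),
  target     = c(paste0("G", 1:5), paste0("G", 3:5), paste0("G", 6:9)),
  stringsAsFactors = FALSE
)

# Hypothetical set of genes measured in the expression data
measured_genes <- paste0("G", 1:8)
minsize <- 4

# Step 1: keep only high-confidence interactions (levels A and B)
high_conf <- toy_regulons[toy_regulons$confidence %in% c("A", "B"), ]

# Step 2: keep only targets that are actually measured
usable <- high_conf[high_conf$target %in% measured_genes, ]

# Step 3: count remaining targets per TF and apply the minimum size
n_targets <- table(usable$tf)
kept_tfs <- names(n_targets)[as.vector(n_targets) >= minsize]

print(kept_tfs)
```

In this toy case only TF1 survives: TF3 is removed by the confidence filter and TF2 has too few targets. A stricter confidence filter therefore trades coverage for reliability, which is worth keeping in mind when choosing the levels.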
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] viper_1.24.0 dplyr_1.0.5 bcellViper_1.26.0
## [4] Biobase_2.50.0 BiocGenerics_0.36.1 dorothea_1.2.2
## [7] BiocStyle_2.18.1
##
## loaded via a namespace (and not attached):
## [1] bslib_0.2.4 compiler_4.0.5 pillar_1.6.0
## [4] BiocManager_1.30.12 jquerylib_0.1.3 mixtools_1.2.0
## [7] class_7.3-18 tools_4.0.5 digest_0.6.27
## [10] lattice_0.20-41 jsonlite_1.7.2 evaluate_0.14
## [13] lifecycle_1.0.0 tibble_3.1.1 pkgconfig_2.0.3
## [16] rlang_0.4.10 Matrix_1.3-2 DBI_1.1.1
## [19] yaml_2.2.1 xfun_0.22 e1071_1.7-6
## [22] stringr_1.4.0 knitr_1.32 generics_0.1.0
## [25] vctrs_0.3.7 sass_0.3.1 grid_4.0.5
## [28] segmented_1.3-3 tidyselect_1.1.0 glue_1.4.2
## [31] R6_2.5.0 fansi_0.4.2 survival_3.2-10
## [34] rmarkdown_2.7 bookdown_0.21 kernlab_0.9-29
## [37] purrr_0.3.4 magrittr_2.0.1 splines_4.0.5
## [40] MASS_7.3-53.1 htmltools_0.5.1.1 ellipsis_0.3.1
## [43] assertthat_0.2.1 KernSmooth_2.23-18 utf8_1.2.1
## [46] proxy_0.4-25 stringi_1.5.3 crayon_1.4.1
Alvarez, Mariano J, Yao Shen, Federico M Giorgi, Alexander Lachmann, B Belinda Ding, B Hilda Ye, and Andrea Califano. 2016. “Functional Characterization of Somatic Mutations in Cancer Using Network-Based Inference of Protein Activity.” Nature Genetics 48 (8): 838–47. https://doi.org/10.1038/ng.3593.
Dugourd, Aurelien, and Julio Saez-Rodriguez. 2019. “Footprint-Based Functional Analysis of Multiomic Data.” Current Opinion in Systems Biology 15 (June): 82–90. https://doi.org/10.1016/j.coisb.2019.04.002.
Garcia-Alonso, Luz, Christian H. Holland, Mahmoud M. Ibrahim, Denes Turei, and Julio Saez-Rodriguez. 2019. “Benchmark and Integration of Resources for the Estimation of Human Transcription Factor Activities.” Genome Research 29 (8): 1363–75. https://doi.org/10.1101/gr.240663.118.
Holland, Christian H., Bence Szalai, and Julio Saez-Rodriguez. 2019. “Transfer of Regulatory Knowledge from Human to Mouse for Functional Genomics Analysis.” Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms, September, 194431. https://doi.org/10.1016/j.bbagrm.2019.194431.