Introduction
From copy-number alteration (CNA), RNA-seq data, and pre-defined biological
pathway network, MPAC computes Inferred Pathway Levels (IPLs) for pathway
entities, clusters patient samples by their enriched GO terms, and identifies
key pathway proteins for each patient cluster.
Installation
From GitHub
Start R and enter:
devtools::install_github('pliu55/MPAC')
From Bioconductor
Start R and enter:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(version='devel')
BiocManager::install("MPAC")
For different versions of R, please refer to the appropriate
Bioconductor release.
Required R packages for this vignette
Please use the following code to load required packages for running this
vignette. Many intermediate objects in this vignette are SummarizedExperiment
objects.
require(SummarizedExperiment)
require(MPAC)
Main functions
ppCnInp()
: prepare CNA states from real CNA data
ppRnaInp()
: prepare RNA states from read RNA-seq data
ppRealInp()
: prepare CNA and RNA states from read CNA and RNA-seq data
ppPermInp():
prepare permuted CNA and RNA states
runPrd()
: run PARADIGM to compute IPLs from real CNA and RNA states
runPermPrd()
: run PARADIGM to compute IPLs from permuted CNA and RNA states
colRealIPL()
: collect IPLs from real data
colPermIPL()
: collect IPLs from permuted data
fltByPerm()
: filter IPLs from real data by IPLs from permuted data
subNtw()
: find the largest pathway network subset by filtered IPLs
ovrGMT()
: do gene set over-representation on a sample’s largest pathway
subset
clSamp()
: cluster samples by their gene set over-representation results
conMtf()
: find consensus pathway motifs within a cluster
pltNeiStt()
: plot a heatmap to identify a protein’s pathway determinants
Inferred pathway levels
PARADIGM binary
MPAC uses PARADIGM developed by Vaske et al. to predict
pathway levels. PARADIGM Binary is available to download at Github for
Linux and MacOS.
IPLs from real data
Example below shows input and output files for running PARADIGM on data from
real samples.
PARADIGM will generate multiple output files. All the file names will start with
the sample ID and a suffix indicating their types:
- Inferred pathway levles (IPLs): file with a suffix
ipl.txt
- Output log: file with a suffix
run.out
- Error log: file with a suffix
run.err
- Other auxiliary files, which can be skipped under common usage
real_se <- system.file('extdata/TcgaInp/inp_real.rds', package='MPAC') |>
readRDS()
fpth <- system.file('extdata/Pth/tiny_pth.txt', package='MPAC')
outdir <- tempdir()
paradigm_bin <- '/path/to/PARADIGM'
runPrd(real_se, fpth, outdir, paradigm_bin, sampleids=c('TCGA-CV-7100'))
PARADIGM on permuted data
For permuted input, PARADIGM will generate output in the same fashion as on
real data above, except that one permutation corresponds to one output folder
named as p$(index)
, where $index
is the index of that permutation. For
example, three permutations will generate folders p1
, p2
, and p3
.
permll <- system.file('extdata/TcgaInp/inp_perm.rds', package='MPAC') |>
readRDS()
fpth <- system.file('extdata/Pth/tiny_pth.txt', package='MPAC')
outdir <- tempdir()
paradigm_bin <- '/path/to/PARADIGM'
pat <- 'TCGA-CV-7100'
runPermPrd(permll, fpth, outdir, paradigm_bin, sampleids=c(pat))
Collecting IPLs
MPAC has PARADIGM run on individual sample in parallel to speed up calculation.
For the convenience of downstream analysis, PARADIGM results will be collected
and put together for all samples.
From PARADIGM on real data
indir <- system.file('/extdata/runPrd/', package='MPAC')
colRealIPL(indir) |> head()
From PARADIGM on permuted data
indir <- system.file('/extdata/runPrd/', package='MPAC')
n_perms <- 3
colPermIPL(indir, n_perms) |> head()
Filtering IPLs
MPAC uses PARADIGM runs on permuted data to generate a background distribution
of IPLs. This distribution is used to filter IPLs from real data to remove
those could be observed by chance.
realdt <- system.file('extdata/fltByPerm/real.rds', package='MPAC') |> readRDS()
permdt <- system.file('extdata/fltByPerm/perm.rds', package='MPAC') |> readRDS()
fltByPerm(realdt, permdt) |> head()
Find the largest pathway subset
MPAC decomposes the original pathway and identify the largest pathway subset
with all of its entities having filtered IPLs. This sub-pathway allows the user
to focus on the most altered pathway network. Note that, because the set of
entities having filtered IPLs are often different between samples, the
pathway subset will be different between samples as well and represent
sample-specific features.
fltmat <- system.file('extdata/fltByPerm/flt_real.rds', package='MPAC') |>
readRDS()
fpth <- system.file('extdata/Pth/tiny_pth.txt', package='MPAC')
fgmt <- system.file('extdata/ovrGMT/fake.gmt', package='MPAC')
subNtw(fltmat, fpth, fgmt, min_n_gmt_gns=1)
Gene set over-representation
To understand the biological functions of a sample’s largest sub-pathway, MPAC
performs gene set over-representation analysis on genes with non-zero IPLs in
the sub-pathway.
subntwl <- system.file('extdata/subNtw/subntwl.rds', package='MPAC') |>readRDS()
fgmt <- system.file('extdata/ovrGMT/fake.gmt', package='MPAC')
omic_gns <- system.file('extdata/TcgaInp/inp_focal.rds', package='MPAC') |>
readRDS() |> rownames()
ovrGMT(subntwl, fgmt, omic_gns)
Cluster samples by pathway over-representation
With gene set over-representation adjusted p-values, MPAC can cluster samples
as a way to investigate shared features between samples in the same cluster.
Note that, due to randomness introduced in the louvain clustering in igraph R
package version 1.3 (reported in its
Github issue #539), it is
recommended to run clustering multiple times to evaluate its variation. The
clSamp()
function has an argument, n_random_runs
, to specify the number of
random clustering jobs to run.
ovrmat <- system.file('extdata/clSamp/ovrmat.rds', package='MPAC') |> readRDS()
clSamp(ovrmat, n_random_runs=5)
Find consensus pathway motifs within a cluster
From the clustering results, MPAC can find consensus pathway motifs from samples
within the same clusters. These motifs will represent cluster-specific pathway
features. They often contain key proteins for further analysis.
subntwl <- system.file('extdata/conMtf/subntwl.rds', package='MPAC') |>readRDS()
omic_gns <- system.file('extdata/TcgaInp/inp_focal.rds', package='MPAC') |>
readRDS() |> rownames()
conMtf(subntwl, omic_gns, min_mtf_n_nodes=50)
Diagnostic plot
MPAC implemented a diagnostic function to plot a heatmap of the omic and
pathway states of a protein as well as the pathway states of this protein’s
pathway neighbors. This heatmap facilitates the identification of pathway
determinants of this protein.
protein <- 'CD86'
fpth <- system.file('extdata/Pth/tiny_pth.txt', package='MPAC')
real_se <- system.file('extdata/pltNeiStt/inp_real.rds', package='MPAC') |>
readRDS()
fltmat <- system.file('extdata/pltNeiStt/fltmat.rds', package='MPAC') |>
readRDS()
pltNeiStt(real_se, fltmat, fpth, protein)
Session Info
Below is the output of sessionInfo()
on the system on which this document
was compiled.
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8
#> [2] LC_NUMERIC=C
#> [3] LC_TIME=en_GB
#> [4] LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8
#> [6] LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8
#> [8] LC_NAME=C
#> [9] LC_ADDRESS=C
#> [10] LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8
#> [12] LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils
#> [6] datasets methods base
#>
#> other attached packages:
#> [1] SummarizedExperiment_1.36.0
#> [2] Biobase_2.66.0
#> [3] GenomicRanges_1.58.0
#> [4] GenomeInfoDb_1.42.0
#> [5] IRanges_2.40.0
#> [6] S4Vectors_0.44.0
#> [7] BiocGenerics_0.52.0
#> [8] MatrixGenerics_1.18.0
#> [9] matrixStats_1.4.1
#> [10] MPAC_1.0.0
#> [11] data.table_1.16.2
#>
#> loaded via a namespace (and not attached):
#> [1] rlang_1.1.4
#> [2] magrittr_2.0.3
#> [3] clue_0.3-65
#> [4] GetoptLong_1.0.5
#> [5] compiler_4.4.1
#> [6] png_0.1-8
#> [7] systemfonts_1.1.0
#> [8] vctrs_0.6.5
#> [9] pkgconfig_2.0.3
#> [10] shape_1.4.6.1
#> [11] crayon_1.5.3
#> [12] fastmap_1.2.0
#> [13] magick_2.8.5
#> [14] XVector_0.46.0
#> [15] scuttle_1.16.0
#> [16] utf8_1.2.4
#> [17] rmarkdown_2.28
#> [18] UCSC.utils_1.2.0
#> [19] xfun_0.48
#> [20] bluster_1.16.0
#> [21] zlibbioc_1.52.0
#> [22] cachem_1.1.0
#> [23] beachmat_2.22.0
#> [24] jsonlite_1.8.9
#> [25] highr_0.11
#> [26] DelayedArray_0.32.0
#> [27] BiocParallel_1.40.0
#> [28] irlba_2.3.5.1
#> [29] parallel_4.4.1
#> [30] cluster_2.1.6
#> [31] R6_2.5.1
#> [32] bslib_0.8.0
#> [33] RColorBrewer_1.1-3
#> [34] limma_3.62.0
#> [35] jquerylib_0.1.4
#> [36] Rcpp_1.0.13
#> [37] bookdown_0.41
#> [38] iterators_1.0.14
#> [39] knitr_1.48
#> [40] Matrix_1.7-1
#> [41] splines_4.4.1
#> [42] igraph_2.1.1
#> [43] tidyselect_1.2.1
#> [44] abind_1.4-8
#> [45] yaml_2.3.10
#> [46] doParallel_1.0.17
#> [47] codetools_0.2-20
#> [48] lattice_0.22-6
#> [49] tibble_3.2.1
#> [50] evaluate_1.0.1
#> [51] survival_3.7-0
#> [52] fitdistrplus_1.2-1
#> [53] circlize_0.4.16
#> [54] pillar_1.9.0
#> [55] foreach_1.5.2
#> [56] generics_0.1.3
#> [57] ggplot2_3.5.1
#> [58] munsell_0.5.1
#> [59] scales_1.3.0
#> [60] glue_1.8.0
#> [61] metapod_1.14.0
#> [62] tools_4.4.1
#> [63] BiocNeighbors_2.0.0
#> [64] ScaledMatrix_1.14.0
#> [65] fgsea_1.32.0
#> [66] locfit_1.5-9.10
#> [67] scran_1.34.0
#> [68] Cairo_1.6-2
#> [69] fastmatch_1.1-4
#> [70] cowplot_1.1.3
#> [71] grid_4.4.1
#> [72] edgeR_4.4.0
#> [73] colorspace_2.1-1
#> [74] SingleCellExperiment_1.28.0
#> [75] GenomeInfoDbData_1.2.13
#> [76] BiocSingular_1.22.0
#> [77] cli_3.6.3
#> [78] rsvd_1.0.5
#> [79] fansi_1.0.6
#> [80] S4Arrays_1.6.0
#> [81] svglite_2.1.3
#> [82] ComplexHeatmap_2.22.0
#> [83] dplyr_1.1.4
#> [84] gtable_0.3.6
#> [85] sass_0.4.9
#> [86] digest_0.6.37
#> [87] SparseArray_1.6.0
#> [88] dqrng_0.4.1
#> [89] rjson_0.2.23
#> [90] htmltools_0.5.8.1
#> [91] lifecycle_1.0.4
#> [92] httr_1.4.7
#> [93] GlobalOptions_0.1.2
#> [94] statmod_1.5.0
#> [95] MASS_7.3-61