JohnsonKinaseData 1.0.0
Johnson et al. (2023) published substrate affinities for 303 human serine/threonine-specific kinases in the form of position-specific weight matrices (PWMs). The JohnsonKinaseData package provides access to these PWMs, together with basic functionality to match user-provided phosphosites against all kinase PWMs. The aim is to give the user a simple way of predicting kinase-substrate relationships based on PWM-phosphosite matching. These predictions can serve to infer kinase activity from differential phospho-proteomic data.
The JohnsonKinaseData package can be installed using the following code:
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ExperimentHub")
BiocManager::install("JohnsonKinaseData")
The kinase PWMs can be accessed with the getKinasePWM() function. It returns a list with 303 human serine/threonine-specific PWMs.
library(JohnsonKinaseData)
pwms <- getKinasePWM()
#> see ?JohnsonKinaseData and browseVignettes('JohnsonKinaseData') for documentation
#> loading from cache
head(names(pwms))
#> [1] "AAK1" "ACVR2A" "ACVR2B" "AKT1" "AKT2" "AKT3"
Each PWM is a numeric matrix with amino acids as rows and positions as columns. Matrix elements are log2-odds scores measuring differential affinity relative to a random frequency of amino acids (Johnson et al. 2023).
pwms[["PLK2"]]
#> -5 -4 -3 -2 -1 0
#> A -0.036821844 -0.277009455 -0.83856373 -0.4463446 -0.186229068 NA
#> C 0.009633819 -0.034899138 -0.24690897 0.4799548 -0.467333943 NA
#> D 0.549718451 0.795766948 0.82130204 1.6459783 1.329410671 NA
#> E 0.614756952 1.127897364 2.86862751 1.2354207 0.689388627 NA
#> F 0.449006639 0.078199920 -0.41273103 -0.9773836 -0.602963759 NA
#> G 0.326652391 -0.151522275 -0.77793738 -0.6106535 -0.767584829 NA
#> H 0.148478616 -0.172018427 -0.67807191 -0.3219281 0.214995135 NA
#> I -0.311864412 -0.172018427 -1.65154094 -0.8406292 -0.519941731 NA
#> K -0.469329925 -0.647467443 -1.77349147 -1.7345631 -0.656307931 NA
#> L -0.245197993 0.144568518 -0.71785677 0.3032255 -0.511690664 NA
#> M -0.248793390 -0.206894852 -0.38948891 0.3123167 -0.194955239 NA
#> N -0.065823218 0.002018361 -0.54077824 0.9076598 0.307545102 NA
#> P -0.066578437 -0.108114249 -1.05139915 -0.4418303 0.542703792 NA
#> Q -0.530739153 -0.241782116 -0.48096139 -0.1800049 -0.264477823 NA
#> R -0.528032212 -0.715485867 -1.58640592 -1.1059389 -0.339345148 NA
#> S -0.065823218 -0.172018427 -0.77793738 -0.4463446 -0.194955239 0.00000000
#> T -0.065823218 -0.172018427 -0.77793738 -0.4463446 -0.194955239 -0.09585422
#> V -0.401253684 -0.367545642 -1.89324968 -1.3562361 -0.152804813 NA
#> W -0.034160317 -0.140189435 -1.05799229 -1.1256358 -1.093879047 NA
#> Y 0.083383588 -0.242293983 -1.12217724 -0.5640514 -0.004045212 NA
#> s 0.059632160 0.750692249 0.06873959 0.1075540 0.101650076 NA
#> t 0.059632160 0.750692249 0.06873959 0.1075540 0.101650076 NA
#> y 0.707878133 0.679784089 0.26351522 -0.1321035 2.184534212 NA
#> 1 2 3 4
#> A -0.812485602 -0.109981413 -0.53574997 -0.33515312
#> C -0.310253562 0.145612247 0.00000000 0.04362448
#> D -0.942307133 1.124791311 1.17957474 0.98389654
#> E -0.201410261 1.154194325 1.37389873 1.13638828
#> F 1.906390375 -0.122334266 -0.21541226 -0.12610808
#> G -0.918660373 -0.888701547 -0.30329392 -0.24827921
#> H -0.671163536 -0.002165667 -0.13020754 -0.01785518
#> I 0.374065718 -0.042308229 -0.25963366 -0.03785821
#> K -1.145924538 -2.141143704 -1.48196851 -1.17755536
#> L 0.032665112 -0.500013836 -0.19379970 -0.02664588
#> M 0.833902077 0.008200014 -0.23463499 -0.20273795
#> N -0.818579360 -0.015082595 0.07710624 -0.20706138
#> P -2.650181828 -0.911044318 -0.71667083 0.10218779
#> Q 0.266756562 -0.411003598 -0.01873185 -0.18852897
#> R -0.532824877 -1.190338611 -1.33715648 -1.18082233
#> S -0.532824877 -0.109981413 -0.21541226 -0.12610808
#> T -0.532824877 -0.109981413 -0.21541226 -0.12610808
#> V -0.008682243 -0.249993850 -0.38571419 -0.85152138
#> W -0.550465037 0.385154897 0.11769504 0.30836088
#> Y 0.360757558 0.526569660 0.07546417 -0.04751733
#> s 0.412402175 1.196984664 1.25574242 1.70655265
#> t 0.412402175 1.196984664 1.25574242 1.70655265
#> y 0.490467444 3.461305904 1.53012070 1.85199884
Besides the 20 standard amino acids, phosphorylated serine, threonine and tyrosine residues are also included (rows s, t and y). These phosphorylated residues are distinct from the central phospho-acceptor (serine/threonine at position 0) and can have a strong impact on the affinity of a given kinase-substrate pair (phospho-priming).
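To make the log2-odds scale concrete, the following sketch converts a few entries back to fold preferences over background. The small matrix `toy` is a hand-copied, rounded excerpt of the PLK2 matrix printed above and is only meant for illustration; it is not produced by the package.

```r
# Rounded excerpt of the PLK2 PWM shown above:
# rows are residues, columns are positions relative to the phospho-acceptor.
toy <- matrix(
  c( 0.82,  1.65,  1.33,   # D
     2.87,  1.24,  0.69,   # E
    -1.77, -1.73, -0.66),  # K
  nrow = 3, byrow = TRUE,
  dimnames = list(c("D", "E", "K"), c("-3", "-2", "-1"))
)

# A log2-odds value x corresponds to a 2^x fold preference over background.
fold <- 2^toy
round(fold["E", "-3"], 1)   # about 7.3-fold enrichment of E at position -3

# Most favored residue (among the three shown) at each position.
favored <- rownames(toy)[apply(toy, 2, which.max)]
names(favored) <- colnames(toy)
favored
```

Positive entries mark residues that are favored at a position, negative entries mark disfavored ones, and a value of 0 means no preference relative to background.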
The central phospho-acceptor site is located at position 0 and only measures the favorability of serine over threonine. The user can exclude this favorability measure by setting the parameter includeSTfavorability to FALSE, in which case the central position doesn’t contribute to the PWM score.
pwms2 <- getKinasePWM(includeSTfavorability=FALSE)
#> see ?JohnsonKinaseData and browseVignettes('JohnsonKinaseData') for documentation
#> loading from cache
Phosphorylated peptides are often represented in one of two formats: (1) phosphorylated residues are indicated by an asterisk, as in SAGLLS*DEDC, or (2) phosphorylated residues are given as lower-case letters, as in SAGLLsDEDC. To unify the phosphosite representation for PWM matching, JohnsonKinaseData provides the function processPhosphopeptides(). It takes a character vector of phosphopeptides, aligns them to the central phospho-acceptor position, and pads and/or truncates the surrounding residues such that each processed site consists of 5 upstream residues, a central acceptor and 4 downstream residues. The central phospho-acceptor position is defined as the left-closest phosphorylated position to the midpoint of the peptide, given by floor(nchar(sites)/2)+1. This midpoint is also the default alignment position if no phosphorylated residue is recognized.
ppeps <- c("SAGLLS*DEDC", "GDtND", "EKGDSN__", "HKRNyGsDER", "PEKS*GyNV")
sites <- processPhosphopeptides(ppeps)
#> Warning in processPhosphopeptides(ppeps): No S/T at central phospho-acceptor
#> position.
sites
#> # A tibble: 5 × 3
#> sites processed acceptor
#> <chr> <chr> <chr>
#> 1 SAGLLS*DEDC SAGLLSDEDC S
#> 2 GDtND ___GDTND__ T
#> 3 EKGDSN__ _EKGDSN___ S
#> 4 HKRNyGsDER _HKRNYGsDE Y
#> 5 PEKS*GyNV __PEKSGyNV S
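The alignment rule described above can be sketched in base R. This is not the package's implementation: the helper name `align_site()` is hypothetical, and two details are assumptions chosen to reproduce the output shown, namely that the midpoint is computed after the asterisk notation has been converted to lower case, and that a tie between two equally close phosphorylated residues is resolved toward the left.

```r
# Hypothetical helper: align one peptide to its phospho-acceptor and
# return a fixed-width site (5 upstream + acceptor + 4 downstream).
align_site <- function(pep, upstream = 5L, downstream = 4L) {
  # Convert the asterisk notation to lower case: "S*" becomes "s".
  pep <- gsub("([STY])\\*", "\\L\\1", pep, perl = TRUE)
  chars <- strsplit(pep, "")[[1]]

  # Midpoint as defined in the text: floor(nchar / 2) + 1.
  mid <- floor(length(chars) / 2) + 1

  # Phosphorylated residues; which.min() keeps the first (left) one on ties.
  phos <- which(chars %in% c("s", "t", "y"))
  centre <- if (length(phos) > 0) phos[which.min(abs(phos - mid))] else mid

  # Pad both ends, then cut a fixed window around the acceptor.
  padded <- c(rep("_", upstream), chars, rep("_", downstream))
  window <- padded[centre:(centre + upstream + downstream)]

  # The acceptor itself is written in upper case.
  window[upstream + 1L] <- toupper(window[upstream + 1L])
  paste(window, collapse = "")
}

ppeps <- c("SAGLLS*DEDC", "GDtND", "EKGDSN__", "HKRNyGsDER", "PEKS*GyNV")
aligned <- vapply(ppeps, align_site, character(1), USE.NAMES = FALSE)
aligned
```

For the five example peptides this sketch yields the same strings as the processed column above. Non-central phosphorylated residues stay in lower case, so they can later be matched against the s, t and y rows of a PWM.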
If a peptide contains several phosphorylated residues, the option onlyCentralAcceptor controls how the acceptor position is selected. Setting onlyCentralAcceptor=FALSE returns all possible aligned phosphosites for a given input peptide. Note that in this case the output is not parallel to the input and can contain more rows than there are input peptides.
sites <- processPhosphopeptides(ppeps, onlyCentralAcceptor=FALSE)
#> Warning in processPhosphopeptides(ppeps, onlyCentralAcceptor = FALSE): No S/T
#> at central phospho-acceptor position.
sites
#> # A tibble: 7 × 3
#> sites processed acceptor
#> <chr> <chr> <chr>
#> 1 SAGLLS*DEDC SAGLLSDEDC S
#> 2 GDtND ___GDTND__ T
#> 3 EKGDSN__ _EKGDSN___ S
#> 4 HKRNyGsDER _HKRNYGsDE Y
#> 5 HKRNyGsDER KRNyGSDER_ S
#> 6 PEKS*GyNV __PEKSGyNV S
#> 7 PEKS*GyNV PEKsGYNV__ Y
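Because the result is no longer parallel to the input, it can help to map each row back to the peptide it came from. The sketch below does this with base R's match() on the sites column. The table is copied by hand from the output above, with a plain data.frame standing in for the returned tibble, and the column name `input_index` is introduced here only for illustration. It assumes the input peptides are unique.

```r
ppeps <- c("SAGLLS*DEDC", "GDtND", "EKGDSN__", "HKRNyGsDER", "PEKS*GyNV")

# Result table copied from the output above.
sites <- data.frame(
  sites = c("SAGLLS*DEDC", "GDtND", "EKGDSN__", "HKRNyGsDER",
            "HKRNyGsDER", "PEKS*GyNV", "PEKS*GyNV"),
  processed = c("SAGLLSDEDC", "___GDTND__", "_EKGDSN___", "_HKRNYGsDE",
                "KRNyGSDER_", "__PEKSGyNV", "PEKsGYNV__"),
  acceptor = c("S", "T", "S", "Y", "S", "S", "Y")
)

# Index of the originating input peptide for every output row.
sites$input_index <- match(sites$sites, ppeps)

# Number of aligned phosphosites obtained per input peptide.
n_per_input <- table(factor(sites$input_index, levels = seq_along(ppeps)))
n_per_input
```

The last two peptides each contribute two rows, one per phosphorylated residue, which explains the 7 rows for 5 inputs.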
Once peptides are processed to sites, the function scorePhosphosites() can be used to create a matrix of kinase-substrate match scores.
selected <- sites |>
    dplyr::filter(acceptor %in% c('S','T')) |>
    dplyr::pull(processed)
scores <- scorePhosphosites(pwms, selected)
dim(scores)
#> [1] 5 303
scores[,1:5]
#> AAK1 ACVR2A ACVR2B AKT1 AKT2
#> SAGLLSDEDC -6.794078 -0.1666423 0.30390179 -5.8821117 -4.7783302
#> ___GDTND__ -4.803921 -1.0410203 -0.56120674 -2.8360934 -2.5125933
#> _EKGDSN___ -8.274386 -1.5402977 -0.92960511 -0.6188352 -0.8554523
#> KRNyGSDER_ -6.290564 -1.9202469 -1.38766899 -3.0601553 -1.7486155
#> __PEKSGyNV 1.695554 -0.1171313 0.06161951 -4.7296786 -3.6486856
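As a rough illustration of what such a match score represents, the sketch below scores a site against a toy PWM by summing the matrix entries selected by the residue at each position, which is the usual way a log-odds PWM is applied. It is not the package's code: `score_site()` is a hypothetical helper, the three-column matrix is a rounded excerpt of the PLK2 values shown earlier, and treating the padding character "_" as contributing nothing is an assumption.

```r
# Toy PWM with three positions; values are rounded PLK2 entries.
toy <- matrix(
  c( 1.33,    NA, -0.94,   # D
     0.69,    NA, -0.20,   # E
    -0.60,    NA,  1.91,   # F
    -0.19,  0.00, -0.53,   # S
    -0.19, -0.10, -0.53),  # T
  nrow = 5, byrow = TRUE,
  dimnames = list(c("D", "E", "F", "S", "T"), c("-1", "0", "1"))
)

# Hypothetical helper: sum of log2-odds entries along the site.
score_site <- function(pwm, site) {
  residues <- strsplit(site, "")[[1]]
  stopifnot(length(residues) == ncol(pwm))
  keep <- residues != "_"   # assumption: padding does not contribute
  idx <- cbind(match(residues[keep], rownames(pwm)), which(keep))
  sum(pwm[idx])
}

score_site(toy, "DSF")   # 1.33 + 0.00 + 1.91 = 3.24
score_site(toy, "_TE")   # -0.10 - 0.20 = -0.3
```

Summing log2-odds values corresponds to multiplying the underlying odds ratios across positions, so a single strongly disfavored residue can pull down an otherwise good match.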
The PWM scoring can be parallelized by supplying a BiocParallelParam object to the BPPARAM argument.
scores <- scorePhosphosites(pwms, selected, BPPARAM=BiocParallel::SerialParam())
By default, the resulting score is the log2-odds score of the PWM. Alternatively, by setting scoreType="percentile", a percentile rank of the log2-odds score is calculated. For each PWM, the background score distribution is derived by matching the PWM to the 85,603 unique phosphosites published in Johnson et al. (2023).
scores <- scorePhosphosites(pwms, selected, scoreType="percentile")
#> see ?JohnsonKinaseData and browseVignettes('JohnsonKinaseData') for documentation
#> loading from cache
scores[,1:5]
#> AAK1 ACVR2A ACVR2B AKT1 AKT2
#> SAGLLSDEDC 22.375586 79.73910 83.79933 14.73447 14.59609
#> ___GDTND__ 53.371824 67.48779 74.89617 56.34769 53.31220
#> _EKGDSN___ 7.927565 57.36739 69.80942 79.14942 74.56646
#> KRNyGSDER_ 29.304770 48.35330 61.93582 53.01150 64.98986
#> __PEKSGyNV 98.620247 80.26811 81.54857 28.17005 32.26440
Quantifying PWM matches by percentile rank was first described by Yaffe et al. (2001). It is also the matching score underlying the kinase activity predictions published by Johnson et al. (2023).
Note that these percentile ranks do not account for phospho-priming, because non-central phosphorylated residues are missing from the background sites published by Johnson et al. (2023). That is, the score distributions derived from the background sites do not reflect the impact of phospho-priming.
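The idea behind a percentile rank can be sketched with an empirical cumulative distribution function. The example below is only an illustration under stated assumptions: random numbers stand in for the background scores of a single kinase PWM, and the exact rank definition used by the package (for example its handling of ties) is not specified here.

```r
set.seed(1)

# Hypothetical background: log2-odds scores of one PWM over many sites.
# Simulated values replace the published background phosphosites.
background <- rnorm(1000, mean = -3, sd = 2)

# Empirical cumulative distribution of the background scores.
bg_cdf <- stats::ecdf(background)

# Percentile rank: share of background scores at or below the query score.
query <- c(-6.79, -0.17, 1.70)
percentile <- 100 * bg_cdf(query)
percentile
```

Because each PWM has its own background distribution, percentile ranks put kinases with very different raw score ranges on a common 0 to 100 scale.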
sessionInfo()
#> R version 4.4.0 beta (2024-04-15 r86425)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] JohnsonKinaseData_1.0.0 BiocStyle_2.32.0
#>
#> loaded via a namespace (and not attached):
#> [1] KEGGREST_1.44.0 xfun_0.43 bslib_0.7.0
#> [4] Biobase_2.64.0 vctrs_0.6.5 tools_4.4.0
#> [7] generics_0.1.3 stats4_4.4.0 curl_5.2.1
#> [10] parallel_4.4.0 tibble_3.2.1 fansi_1.0.6
#> [13] AnnotationDbi_1.66.0 RSQLite_2.3.6 blob_1.2.4
#> [16] pkgconfig_2.0.3 checkmate_2.3.1 dbplyr_2.5.0
#> [19] S4Vectors_0.42.0 lifecycle_1.0.4 GenomeInfoDbData_1.2.12
#> [22] stringr_1.5.1 compiler_4.4.0 Biostrings_2.72.0
#> [25] codetools_0.2-20 GenomeInfoDb_1.40.0 htmltools_0.5.8.1
#> [28] sass_0.4.9 yaml_2.3.8 tidyr_1.3.1
#> [31] pillar_1.9.0 crayon_1.5.2 jquerylib_0.1.4
#> [34] BiocParallel_1.38.0 cachem_1.0.8 mime_0.12
#> [37] ExperimentHub_2.12.0 AnnotationHub_3.12.0 tidyselect_1.2.1
#> [40] digest_0.6.35 stringi_1.8.3 purrr_1.0.2
#> [43] dplyr_1.1.4 bookdown_0.39 BiocVersion_3.19.1
#> [46] fastmap_1.1.1 cli_3.6.2 magrittr_2.0.3
#> [49] utf8_1.2.4 withr_3.0.0 backports_1.4.1
#> [52] filelock_1.0.3 UCSC.utils_1.0.0 rappdirs_0.3.3
#> [55] bit64_4.0.5 rmarkdown_2.26 XVector_0.44.0
#> [58] httr_1.4.7 bit_4.0.5 png_0.1-8
#> [61] memoise_2.0.1 evaluate_0.23 knitr_1.46
#> [64] IRanges_2.38.0 BiocFileCache_2.12.0 rlang_1.1.3
#> [67] glue_1.7.0 DBI_1.2.2 BiocManager_1.30.22
#> [70] BiocGenerics_0.50.0 jsonlite_1.8.8 R6_2.5.1
#> [73] zlibbioc_1.50.0
Johnson, Jared L., Tomer M. Yaron, Emily M. Huntsman, Alexander Kerelsky, Junho Song, Amit Regev, Ting-Yu Lin, et al. 2023. “An Atlas of Substrate Specificities for the Human Serine/Threonine Kinome.” Nature 613 (7945): 759–66. https://doi.org/10.1038/s41586-022-05575-3.
Yaffe, Michael B., German G. Leparc, Jack Lai, Toshiyuki Obata, Stefano Volinia, and Lewis C. Cantley. 2001. “A Motif-Based Profile Scanning Approach for Genome-Wide Prediction of Signaling Pathways.” Nature Biotechnology 19 (4): 348–53. https://doi.org/10.1038/86737.