Global centering-based normalization is a commonly used normalization approach in mass spectrometry (MS)-based label-free proteomics. It scales peptide abundances so that all samples have the same median intensity, based on the assumption that the majority of abundances remain the same across samples. However, in phosphoproteomics experiments in particular, this assumption can introduce bias, because the enrichment of phosphopeptides during sample preparation can mask large unidirectional biological changes. Pairwise normalization was therefore introduced to address this bias: it uses phosphopeptides quantified in both enriched and non-enriched samples to calculate factors that mitigate it (Kauko et al. 2015). The phosphonormalizer package implements pairwise normalization (Saraei et al., under review).
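To make the global centering step concrete, here is a minimal sketch of median normalization on a plain numeric matrix (one column per sample). It assumes a multiplicative scaling towards the mean of the sample medians. The helper `median_normalize` and the simulated data are hypothetical and are not how phosphonormalizer or any quantification software necessarily implements this step.

```r
# Minimal sketch of global median centering (hypothetical helper, not package code).
# Assumption: abundances are a numeric matrix with one column per sample.
median_normalize <- function(abundances) {
  col_medians <- apply(abundances, 2, median, na.rm = TRUE)
  target <- mean(col_medians)
  # Scale each column so that its median equals the common target
  sweep(abundances, 2, target / col_medians, `*`)
}

set.seed(1)
x <- matrix(rlnorm(30, meanlog = 20), ncol = 3,
            dimnames = list(NULL, c("s1", "s2", "s3")))
x[, "s2"] <- x[, "s2"] * 2  # simulate a global loading difference in sample 2
x_norm <- median_normalize(x)
apply(x_norm, 2, median)    # all three medians are now equal
```

Because every column is rescaled as a whole, a genuine global increase of phosphorylation in one sample would be removed together with any technical difference, which is the bias described above.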
The phosphonormalizer package (Saraei et al., under review) normalizes the enriched samples in label-free MS-based phosphoproteomics using the phosphopeptides that are present in both the enriched and the non-enriched data of the same samples. If the enriched and non-enriched data share no phosphopeptides, normalization is not possible and an error is raised.
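The requirement for shared phosphopeptides can be illustrated by matching peptides on sequence and modification together. The sketch below assumes columns named `Sequence` and `Modification`, as in the example datasets; the helper `common_phospho` is hypothetical and not part of the phosphonormalizer API, and the toy rows are partly invented (labelled in the comments) so that one match exists.

```r
# Hypothetical helper, not the phosphonormalizer API: returns the phosphopeptides
# quantified in both runs, matching on Sequence and Modification together.
common_phospho <- function(enriched, non.enriched) {
  make_key <- function(df) paste(df$Sequence, df$Modification, sep = "_")
  # Unmodified peptides (Modification is NA) cannot match a phosphopeptide
  non <- non.enriched[!is.na(non.enriched$Modification), , drop = FALSE]
  shared <- intersect(make_key(enriched), make_key(non))
  if (length(shared) == 0)
    stop("no common phosphopeptides between enriched and non-enriched data")
  shared
}

# Toy inputs: enriched rows copied from enriched.rd; the second
# non-enriched row is invented so that exactly one peptide matches
enr <- data.frame(
  Sequence = c("SSSPVNVKK", "RRSPSPYYSR"),
  Modification = c("[N-term] Acetyl; (N-Term)|[3] Phospho (S)",
                   "[3] Phospho; (S)|[5] Phospho (S)"),
  stringsAsFactors = FALSE)
non <- data.frame(
  Sequence = c("LLLPGELAK", "RRSPSPYYSR"),
  Modification = c(NA, "[3] Phospho; (S)|[5] Phospho (S)"),
  stringsAsFactors = FALSE)

common_phospho(enr, non)  # returns "RRSPSPYYSR_[3] Phospho; (S)|[5] Phospho (S)"
```

Because `intersect()` returns unique keys, duplicated peptide rows (such as rows 4 and 6 of enriched.rd) are counted once here; whether phosphonormalizer treats duplicates the same way is not shown by this sketch.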
To use the phosphonormalizer package, the experiment must have been conducted on both enriched and non-enriched samples. Both datasets must contain sequence, modification and abundance columns: the sequence and modification columns must be of type character and the abundance columns numeric. The algorithm expects the abundances to be pre-normalized with median normalization (Kauko et al. 2015). The package also supports the MSnSet class from the MSnbase package, which is used in the data preprocessing step of the Bioconductor mass spectrometry proteomics workflow (see more: https://www.bioconductor.org/help/workflows/proteomics/).
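A common pitfall is that text columns are read as factors (the default of `data.frame()` and `read.csv()` before R 4.0.0). The following is a minimal sketch of a pre-flight type check; `check_input` is a hypothetical helper, not a phosphonormalizer function, and the toy data frame reuses the first two rows shown for enriched.rd.

```r
# Hypothetical pre-flight check, not part of phosphonormalizer: verifies the
# column types described above before calling normalizePhospho().
check_input <- function(df, seq_col, mod_col, abundance_cols) {
  stopifnot(
    is.character(df[[seq_col]]),
    is.character(df[[mod_col]]),
    all(vapply(df[abundance_cols], is.numeric, logical(1)))
  )
  invisible(TRUE)
}

# Toy data frame built from the first two rows of enriched.rd
toy <- data.frame(
  Sequence = c("SSSPVNVKK", "TTSQKHRDFVAEPMGEKPVGSLAGIGEVLGK"),
  Modification = c("[N-term] Acetyl; (N-Term)|[3] Phospho (S)",
                   "[N-term] Acetyl; (N-Term)|[2] Phospho; (T)|[3] Phospho (S)"),
  gcNorm.ctrl2.1 = c(259694658, 1119159195),
  gcNorm.ctrl2.2 = c(457040590, 1705615152),
  stringsAsFactors = TRUE  # mimics the pre-R 4.0 default of data.frame()/read.csv()
)

# Factor columns would fail the check, so convert them to character first
toy[] <- lapply(toy, function(col) if (is.factor(col)) as.character(col) else col)
check_input(toy, "Sequence", "Modification", c("gcNorm.ctrl2.1", "gcNorm.ctrl2.2"))
```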
#Load the library
library(phosphonormalizer)
#Enriched data overview
head(enriched.rd)
## Sequence
## 1 SSSPVNVKK
## 2 TTSQKHRDFVAEPMGEKPVGSLAGIGEVLGK
## 3 RRSPSPYYSR
## 4 LRLSPSPTSQR
## 5 MEDLDQSPLVSSSDSPPRPQPAFK
## 6 LRLSPSPTSQR
## Modification
## 1 [N-term] Acetyl; (N-Term)|[3] Phospho (S)
## 2 [N-term] Acetyl; (N-Term)|[2] Phospho; (T)|[3] Phospho (S)
## 3 [3] Phospho; (S)|[5] Phospho (S)
## 4 [4] Phospho; (S)|[6] Phospho (S)
## 5 [N-term] Acetyl; (N-Term)|[15] Phospho (S)
## 6 [4] Phospho; (S)|[6] Phospho (S)
## gcNorm.ctrl2.1 gcNorm.ctrl2.2 gcNorm.ctrl2.3 gcNorm.ctrl1.1
## 1 259694658 457040590 587004777 230727898
## 2 1119159195 1705615152 1953963165 1078311545
## 3 336584109 193881363 276640765 148976414
## 4 134915349 167734763 197708821 229647325
## 5 720567253 630721302 568929214 606538462
## 6 54067728 68288843 89662099 95882948
## gcNorm.ctrl1.2 gcNorm.ctrl1.3 gcNorm.CIP2A.1 gcNorm.CIP2A.2
## 1 322501318 440483177 668689914 674423478
## 2 1773146888 1474056582 1190582658 676441788
## 3 174112138 153753909 287782373 287136785
## 4 191052464 166754233 220674695 194132906
## 5 723389269 682481415 874073517 1017771417
## 6 88779229 81051004 89833726 71876578
## gcNorm.CIP2A.3 gcNorm.RAS.1 gcNorm.RAS.2 gcNorm.RAS.3 gcNorm.OA.1
## 1 625035056 451234167 363667417 309494581 352078454
## 2 769588835 1372965516 1339147824 2078858839 1179294365
## 3 245112678 234516289 283827574 200462805 180598176
## 4 162583218 227425157 320668737 256416084 150749053
## 5 1007191248 427561343 662159439 681176179 436802093
## 6 68494304 114763943 146028525 111620925 83197600
## gcNorm.OA.2 gcNorm.OA.3
## 1 177105234 227394429
## 2 1167182710 2117750218
## 3 158187327 209212284
## 4 373474922 306284818
## 5 309956463 307752815
## 6 165557524 152691693
#Non-enriched data overview
head(non.enriched.rd)
## Sequence Modification gcNorm.ctrl2.1 gcNorm.ctrl2.2
## 1 LLLPGELAK <NA> 1134176418 974814910
## 2 AGLQFPVGR <NA> 826483607 715965777
## 3 AMGIMNSFVNDIFER <NA> 2528350640 2105237338
## 4 VTIAQGGVLPNIQAVLLPK <NA> 1263890433 1174074703
## 5 VTIAQGGVLPNIQAVLLPK <NA> 1715542498 1657401406
## 6 AGFAGDDAPR <NA> 410016382 334405218
## gcNorm.ctrl2.3 gcNorm.ctrl1.1 gcNorm.ctrl1.2 gcNorm.ctrl1.3
## 1 1228718539 1394059839 1527009279 2331374945
## 2 830605008 902406245 1107622809 1508853999
## 3 2495554811 1272528331 2944603758 2443519913
## 4 1306359138 1068498506 1116762968 1431064191
## 5 1626124071 1985173704 2236269438 2453708614
## 6 692078515 939263413 409181048 515946375
## gcNorm.CIP2A.1 gcNorm.CIP2A.2 gcNorm.CIP2A.3 gcNorm.RAS.1 gcNorm.RAS.2
## 1 3262876356 1475250309 1402379678 2331435624 1359941630
## 2 1051523142 1619993343 1374369052 1702157550 1021184670
## 3 3243241407 3981953390 3293949927 2804134445 3542050573
## 4 1586541519 2171400749 1334195938 1514303277 1828763181
## 5 2788681677 3975512588 2547762644 2568498719 3679284319
## 6 122490064 232386439 354316201 338037562 267232025
## gcNorm.RAS.3 gcNorm.OA.1 gcNorm.OA.2 gcNorm.OA.3
## 1 1337506496 1776762436 2068739879 1619286624
## 2 1094560285 1230922434 1253221279 1234030083
## 3 4403231723 3307460281 2235874137 2344112126
## 4 1619522966 1197144107 1204365265 1428068227
## 5 3297580183 2495127098 2084809203 2121807104
## 6 414121334 275091595 249007012 416661130
The normalization begins by loading the phosphonormalizer package. For demonstration, the example uses the “enriched.rd” and “non.enriched.rd” datasets, which are shipped with the package. Boxplots of the fold change distributions before and after pairwise normalization can also be generated by setting the plot.fc parameter (see the example).
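In the example, techRep groups abundance columns 3:17 into five samples of three technical replicates, so sample 1 corresponds to the ctrl2 columns and sample 3 to the CIP2A columns of enriched.rd. As a rough illustration of what a fold change distribution for "sample 3 vs 1" contains, the sketch below averages technical replicates and takes per-peptide log2 ratios on simulated data. The helper `group_mean` and the data are hypothetical, and the resulting figure is not the plot produced by normalizePhospho().

```r
# Hypothetical sketch of a fold change distribution for sample 3 vs sample 1;
# simulated data, not the plot produced by normalizePhospho().
set.seed(3)
# 50 peptides x 15 abundance columns, ordered like columns 3:17 of enriched.rd
ab <- matrix(rlnorm(50 * 15, meanlog = 19), ncol = 15)
# Same grouping as techRep in the example: 5 samples x 3 technical replicates
groups <- rep(1:5, each = 3)

# Mean abundance of one sample across its technical replicates
group_mean <- function(m, grp, g) rowMeans(m[, grp == g, drop = FALSE])

log2fc <- log2(group_mean(ab, groups, 3) / group_mean(ab, groups, 1))

boxplot(list(`3 vs 1` = log2fc), ylab = "log2 fold change")
abline(h = 0, lty = 2)  # if most peptides are unchanged, the box sits near 0
```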
To install this package, start R and enter:
#biocLite() has been replaced by BiocManager in current Bioconductor releases
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("phosphonormalizer")
#Load the library
library(phosphonormalizer)
#Specify the column numbers of abundances in the original data.frame,
#from both enriched and non-enriched runs
samplesCols <- data.frame(enriched=3:17, non.enriched=3:17)
#Specify the column numbers of sequence and modification in the original data.frame,
#from both enriched and non-enriched runs
modseqCols <- data.frame(enriched = 1:2, non.enriched = 1:2)
#The samples and their technical replicates
techRep <- factor(x = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5))
#If the parameter plot.fc is set, plots of the sample fold changes are produced
#Here, for demonstration, the fold change distributions are shown for samples 3 vs 1
plot.param <- list(control = c(1), samples = c(3))
#Call the function
norm <- normalizePhospho(enriched = enriched.rd, non.enriched = non.enriched.rd,
samplesCols = samplesCols, modseqCols = modseqCols, techRep = techRep,
plot.fc = plot.param)
## The number of peptides in the intersect is: 54
## 1 plots generated. Browse through them.
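The 54 peptides reported above are the phosphopeptides shared by the enriched and non-enriched runs. The sketch below illustrates, under simplifying assumptions, how such shared peptides can yield a correction factor for one pair of samples: since the non-enriched run is not affected by the enrichment step, the median difference between its log2 fold changes and those of the enriched run estimates the enrichment-induced bias. The helper `pairwise_factor` and the simulated matrices are hypothetical; how phosphonormalizer combines pairwise factors across all samples and technical replicates is not reproduced here.

```r
# Hypothetical helper illustrating the idea behind pairwise normalization
# (Kauko et al. 2015); NOT the phosphonormalizer implementation.
# enr, non: numeric matrices (peptides x samples) of abundances whose rows
# are the SAME shared phosphopeptides, in the same order.
pairwise_factor <- function(enr, non, i, j) {
  fc_enr <- log2(enr[, j] / enr[, i])  # log2 fold change j vs i, enriched run
  fc_non <- log2(non[, j] / non[, i])  # log2 fold change j vs i, non-enriched run
  # Median offset between the two runs, taken over the shared phosphopeptides
  median(fc_non - fc_enr, na.rm = TRUE)
}

# Simulated example: the enriched run under-reports sample 2 four-fold
set.seed(2)
non_ab <- matrix(rlnorm(40, meanlog = 20), ncol = 2)
enr_ab <- non_ab * 5            # enrichment changes the absolute scale ...
enr_ab[, 2] <- enr_ab[, 2] / 4  # ... and biases sample 2 relative to sample 1

f <- pairwise_factor(enr_ab, non_ab, i = 1, j = 2)  # close to log2(4) = 2

# Rescale sample 2 of the enriched run by the estimated factor
enr_corrected <- enr_ab
enr_corrected[, 2] <- enr_corrected[, 2] * 2^f
```

A constant offset on the log2 scale corresponds to a multiplicative factor on the raw scale, which is why the correction multiplies by `2^f`; using the median keeps the estimate robust to the few shared peptides whose ratios differ for other reasons.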