Package: peakPantheR
Authors: Arnaud Wolfer

1 Introduction

The peakPantheR package is designed for the detection, integration and reporting of pre-defined features in MS files (e.g. compounds, fragments, adducts, …).

The Real Time Annotation is set to detect and integrate multiple compounds in one file at a time. It can therefore be deployed on an LC-MS instrument to integrate a set of pre-defined features (e.g. spiked standards) as soon as the acquisition of a sample is completed.

Using the faahKO raw MS dataset as an example, this vignette will:

  • Detail the Real Time Annotation concept
  • Apply the Real Time Annotation to a subset of pre-defined features in the faahKO dataset

1.1 Abbreviations

  • ROI: Regions Of Interest
    • reference RT / m/z windows in which to search for a feature
  • uROI: updated Regions Of Interest
    • modified ROI adapted to the current dataset, which override the reference ROI
  • FIR: Fallback Integration Regions
    • RT / m/z window to integrate if no peak is found
  • TIC: Total Ion Chromatogram
    • the intensities summed across all masses for each scan
  • EIC: Extracted Ion Chromatogram
    • the intensities summed over a mass range, for each scan
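
The TIC and EIC definitions above can be illustrated with a minimal base R sketch. The toy data points, the `scans` table and the `mz_range` window are assumptions made for illustration only; they are not taken from the faahKO dataset and this is not how peakPantheR reads spectra internally.

```r
# Toy scan data: one row per (scan, m/z) data point. Values are made up.
scans <- data.frame(
    rt  = c(10, 10, 10, 11, 11, 11),
    mz  = c(100.0, 200.0, 300.0, 100.0, 200.0, 300.0),
    int = c(5, 20, 1, 7, 40, 2)
)

# TIC: sum the intensities of all masses within each scan (keyed here by rt)
tic <- aggregate(int ~ rt, data = scans, FUN = sum)

# EIC: keep only the data points inside an m/z range, then sum per scan
mz_range <- c(199.5, 200.5)
in_range <- scans$mz >= mz_range[1] & scans$mz <= mz_range[2]
eic <- aggregate(int ~ rt, data = scans[in_range, ], FUN = sum)
```

In this sketch `tic` and `eic` both hold one row per scan; the only difference is the m/z filter applied before summing.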

2 Real Time Annotation Concept

Real time compound integration is set to process multiple compounds in one file at a time.

To achieve this, peakPantheR will:

  • load a list of expected RT / m/z regions of interest (ROI)
  • detect features in each ROI and keep the highest intensity one
  • determine peak statistics for each feature
  • return:
    • TIC
    • a table with all detected compounds for that file (row: compound, col: statistic)
    • EIC for each ROI
    • sample acquisition date-time from the mzML metadata (if available)
  • optionally save EIC plots to disk
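
As a rough illustration of the steps listed above, the sketch below restricts a set of data points to each ROI and keeps the most intense one. It is a deliberate simplification: peakPantheR fits a peak shape to each EIC, whereas this sketch only takes the maximum. The helper `roi_apex()`, the toy `points` and the `roi` table are hypothetical.

```r
# Toy data points (rt in seconds, m/z, intensity). Values are made up.
points <- data.frame(
    rt  = c(100, 101, 102, 103, 100, 101, 102, 103),
    mz  = c(522.2, 522.2, 522.2, 522.2, 496.2, 496.2, 496.2, 496.2),
    int = c(10, 80, 30, 5, 2, 4, 90, 60)
)

# Two hypothetical ROI, one row per targeted compound
roi <- data.frame(
    cpdID = c("ID-1", "ID-2"),
    rtMin = c(100, 101), rtMax = c(103, 103),
    mzMin = c(522.1, 496.1), mzMax = c(522.3, 496.3),
    stringsAsFactors = FALSE
)

# Hypothetical helper: most intense data point inside one ROI (NA if empty)
roi_apex <- function(points, rtMin, rtMax, mzMin, mzMax) {
    inside <- points$rt >= rtMin & points$rt <= rtMax &
              points$mz >= mzMin & points$mz <= mzMax
    sub <- points[inside, ]
    if (nrow(sub) == 0) {
        return(data.frame(found = FALSE, rt = NA_real_, maxInt = NA_real_))
    }
    top <- sub[which.max(sub$int), ]
    data.frame(found = TRUE, rt = top$rt, maxInt = top$int)
}

# One result row per compound (row: compound, col: statistic)
apex_rows <- lapply(seq_len(nrow(roi)), function(i) {
    roi_apex(points, roi$rtMin[i], roi$rtMax[i], roi$mzMin[i], roi$mzMax[i])
})
peak_table <- cbind(roi["cpdID"], do.call(rbind, apex_rows))
```

The resulting `peak_table` follows the same layout as the table returned for a file: one row per compound, one column per statistic.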

3 Real Time Annotation Example

In the following example we will target two pre-defined features in a single raw MS spectra file from the faahKO package. For more details on the installation and input data employed, please consult the Getting Started with peakPantheR vignette.

3.1 Input Data

The path to an MS file from the faahKO package is located and used as input spectra:

library(faahKO)
## file paths
input_spectraPath  <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"))
input_spectraPath
#> [1] "/home/biocbuild/bbs-3.13-bioc/R/library/faahKO/cdf/KO/ko15.CDF"

Two targeted features (e.g. compounds, fragments, adducts, …) are defined and stored in a table with the following columns:

  • cpdID (character)
  • cpdName (character)
  • rtMin (sec)
  • rtMax (sec)
  • rt (sec, optional / NA)
  • mzMin (m/z)
  • mzMax (m/z)
  • mz (m/z, optional / NA)

# targetFeatTable
input_targetFeatTable <- data.frame(matrix(vector(), 2, 8, dimnames=list(c(), 
                        c("cpdID", "cpdName", "rtMin", "rt", "rtMax", "mzMin", 
                        "mz", "mzMax"))), stringsAsFactors=FALSE)
input_targetFeatTable[1,] <- c("ID-1", "Cpd 1", 3310., 3344.888, 3390., 
                                522.194778, 522.2, 522.205222)
input_targetFeatTable[2,] <- c("ID-2", "Cpd 2", 3280., 3385.577, 3440., 
                                496.195038, 496.2, 496.204962)
input_targetFeatTable[,c(3:8)] <- sapply(input_targetFeatTable[,c(3:8)], 
                                            as.numeric)

cpdID  cpdName  rtMin  rt        rtMax  mzMin       mz     mzMax
ID-1   Cpd 1    3310   3344.888  3390   522.194778  522.2  522.205222
ID-2   Cpd 2    3280   3385.577  3440   496.195038  496.2  496.204962
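
The same table can also be built directly with `data.frame()`, which keeps the numeric columns numeric without a later conversion. The sketch below is an alternative construction, not the vignette's own code; the name `target_tbl` and the sanity checks are assumptions added for illustration. It also shows that the m/z windows used in this example correspond to ±10 ppm around `mz`.

```r
# Alternative construction: columns are typed at creation
target_tbl <- data.frame(
    cpdID   = c("ID-1", "ID-2"),
    cpdName = c("Cpd 1", "Cpd 2"),
    rtMin   = c(3310, 3280),
    rt      = c(3344.888, 3385.577),
    rtMax   = c(3390, 3440),
    mzMin   = c(522.194778, 496.195038),
    mz      = c(522.2, 496.2),
    mzMax   = c(522.205222, 496.204962),
    stringsAsFactors = FALSE
)

# Hypothetical sanity checks on column names, types and window ordering
expected_cols <- c("cpdID", "cpdName", "rtMin", "rt", "rtMax",
                   "mzMin", "mz", "mzMax")
stopifnot(
    identical(colnames(target_tbl), expected_cols),
    all(sapply(target_tbl[, 3:8], is.numeric)),
    all(target_tbl$rtMin < target_tbl$rtMax),
    all(target_tbl$mzMin < target_tbl$mzMax)
)

# The m/z windows of this example match a +/- 10 ppm tolerance around mz
ppm <- 10
mz_low  <- target_tbl$mz * (1 - ppm / 1e6)
mz_high <- target_tbl$mz * (1 + ppm / 1e6)
```

Comparing `mz_low` and `mz_high` with the `mzMin` and `mzMax` columns is a quick way to confirm the tolerance used when a target table is inherited from someone else.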

3.2 Run Single File Annotation

peakPantheR_singleFileSearch() takes as input a singleSpectraDataPath pointing to the file to process and targetFeatTable defining the features to integrate. The resulting annotation contains all the fitting and integration properties:

library(peakPantheR)
annotation <- peakPantheR_singleFileSearch(
                                    singleSpectraDataPath = input_spectraPath,
                                    targetFeatTable = input_targetFeatTable,
                                    peakStatistic = TRUE,
                                    curveModel = 'skewedGaussian',
                                    verbose = TRUE)
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Reading data from 2 windows
#> Data read in: 3.76 secs
#> Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for mzMin/mzMax calculation, approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #1
#> Found 2/2 features in 0.02 secs
#> Peak statistics done in: 0 secs
#> Feature search done in: 5.16 secs
annotation$TIC
#> [1] 2410533091
## acquisition time cannot be extracted from NetCDF files
annotation$acquTime
#> [1] NA
annotation$peakTable
Table continues below

found  rtMin     rt        rtMax     mzMin     mz     mzMax
TRUE   3309.759  3346.828  3385.41   522.1948  522.2  522.2052
TRUE   3345.377  3386.529  3428.279  496.2     496.2  496.2

Table continues below

peakArea  peakAreaRaw  maxIntMeasured  maxIntPredicted  cpdID  cpdName
26133727  26071378     889280          901015.8         ID-1   Cpd 1
35472141  36498367     1128960         1113577          ID-2   Cpd 2

is_filled  ppm_error   rt_dev_sec  tailingFactor  asymmetryFactor
FALSE      0.02337616  1.939759    1.015357       1.026824
FALSE      0.02460103  0.9518072   1.005378       1.009318
annotation$curveFit
#> [[1]]
#> $amplitude
#> [1] 162404.8
#> 
#> $center
#> [1] 3341.888
#> 
#> $sigma
#> [1] 0.07878613
#> 
#> $gamma
#> [1] 0.00183361
#> 
#> $fitStatus
#> [1] 2
#> 
#> $curveModel
#> [1] "skewedGaussian"
#> 
#> attr(,"class")
#> [1] "peakPantheR_curveFit"
#> 
#> [[2]]
#> $amplitude
#> [1] 199249.1
#> 
#> $center
#> [1] 3382.577
#> 
#> $sigma
#> [1] 0.07490442
#> 
#> $gamma
#> [1] 0.00114719
#> 
#> $fitStatus
#> [1] 2
#> 
#> $curveModel
#> [1] "skewedGaussian"
#> 
#> attr(,"class")
#> [1] "peakPantheR_curveFit"
annotation$ROIsDataPoint
#> [[1]]
#>          rt    mz    int
#> 1  3315.154 522.2   2187
#> 2  3316.719 522.2   3534
#> 3  3318.284 522.2   6338
#> 4  3319.849 522.2  11718
#> 5  3321.414 522.2  21744
#> 6  3322.979 522.2  37872
#> 7  3324.544 522.2  62424
#> 8  3326.109 522.2  98408
#> 9  3327.673 522.2 152896
#> 10 3329.238 522.2 225984
#> 11 3330.803 522.2 308672
#> 12 3332.368 522.2 399360
#> 13 3333.933 522.2 504000
#> 14 3335.498 522.2 614656
#> 15 3337.063 522.2 711872
#> 16 3338.628 522.2 784704
#> 17 3340.193 522.2 836608
#> 18 3341.758 522.2 866304
#> 19 3343.323 522.2 882304
#> 20 3344.888 522.2 889280
#> 21 3346.453 522.2 888256
#> 22 3348.018 522.2 866816
#> 23 3349.583 522.2 827392
#> 24 3351.148 522.2 777728
#> 25 3352.713 522.2 727040
#> 26 3354.278 522.2 678464
#> 27 3355.843 522.2 629120
#> 28 3357.408 522.2 578048
#> 29 3358.973 522.2 524288
#> 30 3360.538 522.2 471040
#> 31 3362.102 522.2 416320
#> 32 3363.667 522.2 360064
#> 33 3365.232 522.2 302400
#> 34 3366.797 522.2 249152
#> 35 3368.362 522.2 202560
#> 36 3369.927 522.2 161024
#> 37 3371.492 522.2 123520
#> 38 3373.057 522.2  93160
#> 39 3374.622 522.2  71856
#> 40 3376.187 522.2  58392
#> 41 3377.752 522.2  51072
#> 42 3379.317 522.2  48376
#> 43 3380.882 522.2  49168
#> 44 3382.447 522.2  53120
#> 45 3384.012 522.2  62488
#> 46 3385.577 522.2  78680
#> 47 3387.142 522.2 102840
#> 48 3388.707 522.2 134656
#> 
#> [[2]]
#>          rt    mz     int
#> 1  3280.725 496.2    1349
#> 2  3290.115 496.2    2069
#> 3  3291.680 496.2    3103
#> 4  3293.245 496.2    5570
#> 5  3294.809 496.2   10730
#> 6  3296.374 496.2   20904
#> 7  3297.939 496.2   38712
#> 8  3299.504 496.2   64368
#> 9  3301.069 496.2   97096
#> 10 3302.634 496.2  136320
#> 11 3304.199 496.2  182080
#> 12 3305.764 496.2  233472
#> 13 3307.329 496.2  288064
#> 14 3308.894 496.2  341248
#> 15 3310.459 496.2  386560
#> 16 3312.024 496.2  419648
#> 17 3313.589 496.2  441344
#> 18 3315.154 496.2  454592
#> 19 3316.719 496.2  459776
#> 20 3318.284 496.2  456320
#> 21 3319.849 496.2  444736
#> 22 3321.414 496.2  425408
#> 23 3322.979 496.2  397888
#> 24 3324.544 496.2  365952
#> 25 3326.109 496.2  329856
#> 26 3327.673 496.2  288832
#> 27 3329.238 496.2  244864
#> 28 3330.803 496.2  207616
#> 29 3332.368 496.2  177216
#> 30 3333.933 496.2  148544
#> 31 3335.498 496.2  119240
#> 32 3337.063 496.2   94736
#> 33 3338.628 496.2   76832
#> 34 3340.193 496.2   64048
#> 35 3341.758 496.2   53744
#> 36 3343.323 496.2   45704
#> 37 3344.888 496.2   39576
#> 38 3346.453 496.2   35080
#> 39 3348.018 496.2   31432
#> 40 3349.583 496.2   28904
#> 41 3351.148 496.2   27064
#> 42 3352.713 496.2   25632
#> 43 3354.278 496.2   24480
#> 44 3355.843 496.2   23840
#> 45 3357.408 496.2   23688
#> 46 3358.973 496.2   24680
#> 47 3360.538 496.2   28608
#> 48 3362.102 496.2   39616
#> 49 3363.667 496.2   63344
#> 50 3365.232 496.2  107352
#> 51 3366.797 496.2  182144
#> 52 3368.362 496.2  295232
#> 53 3369.927 496.2  438016
#> 54 3371.492 496.2  592256
#> 55 3373.057 496.2  737408
#> 56 3374.622 496.2  858624
#> 57 3376.187 496.2  938368
#> 58 3377.752 496.2  982976
#> 59 3379.317 496.2 1015232
#> 60 3380.882 496.2 1057280
#> 61 3382.447 496.2 1095680
#> 62 3384.012 496.2 1122304
#> 63 3385.577 496.2 1128960
#> 64 3387.142 496.2 1111040
#> 65 3388.707 496.2 1058816
#> 66 3390.272 496.2  996608
#> 67 3391.837 496.2  942272
#> 68 3393.402 496.2  888640
#> 69 3394.966 496.2  820160
#> 70 3396.531 496.2  744448
#> 71 3398.096 496.2  674688
#> 72 3399.661 496.2  609600
#> 73 3401.226 496.2  543552
#> 74 3402.791 496.2  480896
#> 75 3404.356 496.2  428032
#> 76 3405.921 496.2  387520
#> 77 3407.486 496.2  350976
#> 78 3409.051 496.2  313984
#> 79 3410.616 496.2  276224
#> 80 3412.181 496.2  243776
#> 81 3413.746 496.2  219264
#> 82 3415.311 496.2  202368
#> 83 3416.876 496.2  189184
#> 84 3418.441 496.2  174272
#> 85 3420.006 496.2  158080
#> 86 3421.571 496.2  143744
#> 87 3423.136 496.2  135808
#> 88 3424.701 496.2  131648
#> 89 3426.266 496.2  126784
#> 90 3427.831 496.2  118328
#> 91 3429.395 496.2  109144
#> 92 3430.960 496.2  101944
#> 93 3432.525 496.2   96896
#> 94 3434.090 496.2   92048
#> 95 3435.655 496.2   86408
#> 96 3437.220 496.2   81288
#> 97 3438.785 496.2   78144
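
Two of the additional peak statistics in the peakTable above can be reproduced by hand. The sketch below assumes that rt_dev_sec is the absolute difference between the found and the targeted retention time, which agrees with the reported values to within the rounding of the printed table. The `ppm_error()` helper uses the conventional parts-per-million definition and is hypothetical; the exact formula used by peakPantheR is not shown in this vignette.

```r
# Values copied from the peakTable and the target table above
rt_found  <- c(3346.828, 3386.529)
rt_target <- c(3344.888, 3385.577)

# Retention time deviation in seconds (absolute difference assumed)
rt_dev_sec <- abs(rt_found - rt_target)

# Hypothetical helper: conventional mass error in parts per million
ppm_error <- function(mz_found, mz_target) {
    abs(mz_found - mz_target) / mz_target * 1e6
}
```

The printed `mz` column is rounded to one decimal, so the reported ppm_error values cannot be recomputed from the table alone; the helper would need the unrounded `mz` values from `annotation$peakTable`.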

peakPantheR_singleFileSearch() takes multiple parameters that can alter the file annotation:

  • peakStatistic if TRUE calculates additional peak statistics: ‘ppm_error’, ‘rt_dev_sec’, ‘tailingFactor’ and ‘asymmetryFactor’
  • plotEICsPath if not NA will save a .png of all ROI EICs at the path provided (expects 'filepath/filename.png' for example). If NA, no plot is saved
  • getAcquTime if TRUE the sample acquisition date-time is extracted from the mzML metadata. Acquisition time cannot be extracted from other file formats. The additional file access will impact run time
  • FIR if not NULL, defines the Fallback Integration Regions (FIR) to integrate when a feature is not found.
  • curveModel defines the peak-shape model to fit to each EIC. By default, a ‘skewedGaussian’ model is used; the alternative is the exponentially modified Gaussian ‘emgGaussian’ model
  • verbose if TRUE reports calculation progress, time taken and number of features found (total and matched to targets)
  • ... passes arguments to findTargetFeatures to alter peak-picking parameters (e.g. the curveModel, the sampling or fitting parameters)
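
The optional parameters listed above could be combined as in the following sketch. The argument names are those described in the list; everything else is an assumption for illustration: the wrapper `annotate_with_fallback()` is hypothetical, and the FIR table is assumed to hold one row per compound (in the same order as the targets) with columns rtMin, rtMax, mzMin and mzMax. The wrapper is only defined here, not executed, since running it requires peakPantheR and an input file.

```r
# Reference windows copied from the target table in section 3.1
target_tbl <- data.frame(
    cpdID = c("ID-1", "ID-2"),
    rtMin = c(3310, 3280), rtMax = c(3390, 3440),
    mzMin = c(522.194778, 496.195038), mzMax = c(522.205222, 496.204962),
    stringsAsFactors = FALSE
)

# Assumed FIR layout: one row per compound, same order as the targets.
# Here the fallback windows simply reuse the reference windows.
fir_tbl <- target_tbl[, c("rtMin", "rtMax", "mzMin", "mzMax")]

# Hypothetical wrapper, defined but not run here; needs peakPantheR installed
annotate_with_fallback <- function(spectra_path, target_feat_tbl, fir) {
    peakPantheR::peakPantheR_singleFileSearch(
        singleSpectraDataPath = spectra_path,
        targetFeatTable = target_feat_tbl,
        peakStatistic = TRUE,
        plotEICsPath = file.path(tempdir(), "EICs.png"),  # save the EIC plot
        getAcquTime = FALSE,       # only meaningful for mzML input
        FIR = fir,                 # integrated when a feature is not found
        curveModel = "skewedGaussian",
        verbose = FALSE)
}
```

With the objects from this vignette, a call would look like `annotate_with_fallback(input_spectraPath, input_targetFeatTable, fir_tbl)`. In practice the FIR windows would usually be chosen from prior knowledge of where each compound elutes rather than copied from the ROI.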

The summary plot generated by plotEICsPath, showing the EIC of each integrated region of interest, is as follows:

EICs plot: Each panel corresponds to a targeted feature, with the EIC extracted on the mzMin, mzMax range found. The red dot marks the RT peak apex, and the red line highlights the RT peak width range found (rtMin, rtMax)