This user’s guide provides an overview of the package ASICS
. ASICS
is a fully automated procedure to identify and quantify metabolites in \(^1\)H 1D-NMR spectra of biological mixtures (Tardivel et al., 2017). It will enable empowering NMR-based metabolomics by quickly and accurately helping experts to obtain metabolic profiles. In addition to the quantification method, several functions allowing spectrum preprocessing or statistical analyses of quantified metabolites are available.
library(ASICS)
library(ASICSdata)
In this user’s guide, a subset of the public datasets from Salek et al. (2007) is used. The experiment has been designed to improve the understanding of early stage of type 2 diabetes mellitus (T2DM) development. In the dataset used, \(^1\)H-NMR human metabolome was obtained from 25 healthy volunteers and 25 T2DM patients. Raw 1D Bruker spectral data files were found in the MetaboLights database (https://www.ebi.ac.uk/metabolights/, study MTBLS1).
For most of long time functions, a parallel implementation is available for unix-like OS using the BiocParallel package of Bioconductor. Number of cores is chosen with the option ncores
of the relevant functions (default to 1
, no parallel environment).
An object of class PureLibrary
with spectra of 191 pure metabolites is required to perform the quantification. These spectra are metabolite spectra used as references for quantification: only metabolites that are present in the library object can be quantified with ASICS.
A default library is available and is automatically loaded at package start. Available metabolites are displayed with:
head(getSampleName(pure_library), n = 8)
## [1] "1,3-Diaminopropane" "Levoglucosan" "1-Methylhydantoin"
## [4] "1-Methyl-L-Histidine" "QuinolinicAcid" "2-AminoAdipicAcid"
## [7] "2-AminobutyricAcid" "2-Deoxyadenosine"
This library can be complemented or another library can be created with new spectra of pure metabolite. These spectra are imported from Bruker files and a new library can be created with:
pure_spectra <- importSpectraBruker(system.file("extdata", "example_library",
package = "ASICS"))
new_pure_library <- createPureLibrary(pure_spectra,
nb.protons = c(5, 4))
A new library can also be created from txt or csv file, with samples in columns and chemical shifts in rows (see help page of createPureLibrary
function for all details).
The newly created library can be used for quantification or merged with another one:
merged_pure_library <- c(pure_library[1:10], new_pure_library)
The PureLibrary
merged_pure_library
contains the first ten spectra of the default library and the two newly imported spectra imported.
First, data are imported in a data frame from Bruker files with the importSpectraBruker
function. These spectra are baseline corrected (Wang et al, 2013) and normalised by the area under the curve.
spectra_data <- importSpectraBruker(system.file("extdata",
"Human_diabetes_example",
package = "ASICSdata"))
Data can also be imported from other file types with standard R
function. The only constraint is to have a data frame with spectra in columns (column names are sample names) and chemical shifts in rows (row names correspond to the ppm grid).
diabetes <- system.file("extdata", "spectra_diabetes_example.txt",
package = "ASICSdata")
spectra_data_txt <- read.table(diabetes, header = TRUE, row.names = 1)
Several functions for the preprocessing of spectra are also available: baseline correction, normalisation by the AUC and alignment on a reference spectrum (Vu et al., 2011). For example, here, spectra are baseline corrected:
spectra_base_cor <- baselineCorrection(spectra_data_txt)
Finally, from the data frame, a Spectra
object is created. This is a required step for the quantification.
spectra_obj <- createSpectra(spectra_base_cor)
Identification and quantification of metabolites can now be carried out using only the function ASICS
. This function takes approximately 2 minutes by spectrum to run.
# part of the spectrum to exclude (water and urea)
to_exclude <- matrix(c(4.5, 5.1, 5.5, 6.5), ncol = 2, byrow = TRUE)
ASICS_results <- ASICS(spectra_obj, exclusion.areas = to_exclude)
Summary of ASICS results:
ASICS_results
## An object of class ASICSResults
## It contains 50 spectra of 31087 points.
##
## ASICS results:
## 163 metabolites are identified for this set of spectra.
## Most concentrated metabolites are: Creatinine, Citrate, AceticAcid, L-Glycine, L-Arabitol, L-Cysteine
The quality of the results can be assessed by stacking the original and the reconstructed spectra on one plot. A pure metabolite spectrum can also be added for visual comparison. For example, the first spectrum with Creatinine:
plot(ASICS_results, idx = 1, xlim = c(2.8, 3.3), add.metab = "Creatinine")
Relative concentrations of identified metabolites are saved in a data frame accessible via the get_quantification
function:
head(getQuantification(ASICS_results), 10)[, 1:2]
## ADG10003u_007 ADG10003u_008
## Creatinine 0.064383687 0.0114721923
## Citrate 0.033192360 0.0059923342
## AceticAcid 0.010935686 0.0000000000
## L-Glycine 0.007948477 0.0014649545
## L-Arabitol 0.002586274 0.0008558143
## L-Cysteine 0.002355810 0.0000000000
## Indoxylsulfate 0.002251307 0.0000000000
## L-GlutamicAcid 0.002217101 0.0000000000
## Betaine 0.002195449 0.0000000000
## L-Alanine 0.002173974 0.0011605980
Some analysis functions are available in ASICS package.
First, a design data frame is imported. In this data frame, the first column needs to correspond to sample names of all spectra.
design <- read.table(system.file("extdata", "design_diabete_example.txt",
package = "ASICSdata"), header = TRUE)
Then, a preprocessing is performed on relative quantifications: metabolites with more than 75% of null quantifications are removed as well as two samples that are considered as outliers.
analysis_data <- formatForAnalysis(getQuantification(ASICS_results),
design = design, zero.threshold = 75,
zero.group = "condition",
outliers = c("ADG10003u_007",
"ADG19007u_163"))
To explore results of ASICS quantification, a PCA can be performed on results of preprocessing with:
resPCA <- pca(analysis_data)
plot(resPCA, graph = "ind", col.ind = "condition")
plot(resPCA, graph = "var")
It is also possible to find differences between two conditions with an OPLS-DA (Thevenot et al, 2015) or with Kruskall-Wallis tests:
resOPLSDA <- oplsda(analysis_data, condition = "condition", orthoI = 1)
resOPLSDA
## OPLS-DA performed on quantifications
## Cross validation error: 0.12
##
## Variable with the higher VIP:
## Control Group diabetes mellitus VIP influential
## GuanidinoaceticAcid 3.940119e-03 0.0009623509 2.247405 TRUE
## SyringicAcid 6.464693e-05 0.0005018759 2.113223 TRUE
## Creatinine 4.293709e-02 0.0331778002 1.928518 TRUE
## Dimethylamine 1.504707e-03 0.0011279812 1.855604 TRUE
## L-GlutamicAcid 2.176990e-03 0.0013432298 1.802912 TRUE
## Glycerol 6.206438e-04 0.0013831975 1.799578 TRUE
## 4-AminoHippuricAcid 4.456578e-05 0.0002599298 1.771490 TRUE
## D-GlucuronicAcid 2.791478e-04 0.0009820983 1.734255 TRUE
## 4-EthylPhenol 1.875378e-05 0.0003254873 1.577177 TRUE
## Myo-Inositol 3.940959e-04 0.0008022096 1.558489 TRUE
## [...]
plot(resOPLSDA)
Results of Kruskall-Wallis tests and Benjamini-Hochberg correction:
resTests <- kruskalWallis(analysis_data, "condition")
resTests
## Kruskal-Wallis tests performed on quantifications
## Variable with the lower adjusted p-value:
##
## Feature Adjusted.p.value
## 1 Glycerol 0.002245837
## 2 GuanidinoaceticAcid 0.003894254
## 3 SyringicAcid 0.015306160
## 4 Galactitol 0.025998511
## 5 4-EthylPhenol 0.036488597
## 6 HippuricAcid 0.104952881
## 7 D-Glucose 0.157968986
## 8 Glycogen 0.194961626
## 9 Uracil 0.215802678
## 10 ThreonicAcid 0.320150170
## [...]
plot(resTests)
An analysis on buckets can also be performed. An alignment is required before the spectrum bucketing:
spectra_align <- alignment(spectra_data_txt)
spectra_bucket <- binning(spectra_align)
Then, a SummarizedExperiment
object is created with the formatForAnalysis
function as for quantification:
analysis_data_bucket <- formatForAnalysis(spectra_bucket, design = design,
zero.threshold = 75)
Finally, all analyses can be carried out on this object with the parameter type.data
set to buckets
. For example, the OPLS-DA is performed with:
resOPLSDA_buckets <- oplsda(analysis_data_bucket, condition = "condition",
type.data = "buckets")
resOPLSDA_buckets
## OPLS-DA performed on buckets
## Cross validation error: 0.12
##
## Variable with the higher VIP:
## Control Group diabetes mellitus VIP influential
## 4.115 0.1014765671 1.353712e-01 2.224541 TRUE
## 8.335 0.0218728678 3.506186e-02 2.189436 TRUE
## 1.035 0.0405512805 3.001391e-02 2.129490 TRUE
## 3.385 0.0548331586 7.967688e-02 2.063928 TRUE
## 8.935 -0.0013382430 -1.068887e-05 2.053225 TRUE
## 3.695 1.0294490532 6.461326e-01 2.025428 TRUE
## 9.735 -0.0008476978 2.087961e-04 2.013951 TRUE
## 1.105 0.0730545942 1.026731e-01 2.011920 TRUE
## 6.665 0.0122508514 2.004659e-02 1.985841 TRUE
## 0.865 0.0554418249 6.620170e-02 1.972984 TRUE
## [...]
Moreover, another plot with the median spectrum and OPLS-DA results can be produced with the option graph = "buckets"
:
plot(resOPLSDA_buckets, graph = "buckets")
Tardivel P., Canlet C., Lefort G., Tremblay-Franco M., Debrauwer L., Concordet D., Servien R. (2017). ASICS: an automatic method for identification and quantification of metabolites in complex 1D 1H NMR spectra. Metabolomics, 13(10), 109. https://doi.org/10.1007/s11306-017-1244-5
Salek, R. M., Maguire, M. L., Bentley, E., Rubtsov, D. V., Hough, T., Cheeseman, M., … & Connor, S. C. (2007). A metabolomic comparison of urinary changes in type 2 diabetes in mouse, rat, and human. Physiological genomics, 29(2), 99-108.
Wang, K. C., Wang, S. Y., Kuo, C. H., Tseng, Y. J. (2013). Distribution-based classification method for baseline correction of metabolomic 1D proton nuclear magnetic resonance spectra. Analytical Chemistry, 85(2), 1231–1239.
Vu, T. N., Valkenborg, D., Smets, K., Verwaest, K. A., Dommisse, R., Lemiere, F., … & Laukens, K. (2011). An integrated workflow for robust alignment and simplified quantitative analysis of NMR spectrometry data. BMC bioinformatics, 12(1), 405.
Thevenot, E.A., Roux, A., Xu, Y., Ezan, E., Junot, C. 2015. Analysis of the human adult urinary metabolome variations with age, body mass index and gender by implementing a comprehensive workflow for univariate and OPLS statistical analyses. Journal of Proteome Research. 14, 3322-3335.
This user’s guide has been created with the following system configuration:
sessionInfo()
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
##
## locale:
## [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8
## [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8
## [7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ASICSdata_0.99.2 ASICS_1.0.1 BiocStyle_2.6.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.17 speaq_2.3.1
## [3] lattice_0.20-35 zoo_1.8-2
## [5] ropls_1.10.0 snow_0.4-2
## [7] rprojroot_1.3-2 digest_0.6.15
## [9] foreach_1.4.4 R6_2.2.2
## [11] GenomeInfoDb_1.14.0 plyr_1.8.4
## [13] backports_1.1.2 stats4_3.5.0
## [15] evaluate_0.10.1 httr_1.3.1
## [17] ggplot2_2.2.1 pillar_1.2.3
## [19] itertools_0.1-3 zlibbioc_1.24.0
## [21] rlang_0.2.1 lazyeval_0.2.1
## [23] data.table_1.11.4 S4Vectors_0.16.0
## [25] Matrix_1.2-14 missForest_1.4
## [27] rmarkdown_1.10 labeling_0.3
## [29] qtl_1.42-8 BiocParallel_1.12.0
## [31] stringr_1.3.1 RCurl_1.95-4.10
## [33] munsell_0.5.0 DelayedArray_0.4.1
## [35] compiler_3.5.0 xfun_0.1
## [37] BiocGenerics_0.24.0 htmltools_0.3.6
## [39] doSNOW_1.0.16 mQTL_1.0
## [41] SummarizedExperiment_1.8.1 tibble_1.4.2
## [43] gridExtra_2.3 GenomeInfoDbData_1.0.0
## [45] bookdown_0.7 quadprog_1.5-5
## [47] IRanges_2.12.0 codetools_0.2-15
## [49] matrixStats_0.53.1 randomForest_4.6-14
## [51] MASS_7.3-50 bitops_1.0-6
## [53] grid_3.5.0 MassSpecWavelet_1.44.0
## [55] gtable_0.2.0 magrittr_1.5
## [57] scales_0.5.0 stringi_1.2.3
## [59] impute_1.52.0 XVector_0.18.0
## [61] reshape2_1.4.3 xml2_1.2.0
## [63] RColorBrewer_1.1-2 iterators_1.0.9
## [65] tools_3.5.0 outliers_0.14
## [67] Biobase_2.38.0 parallel_3.5.0
## [69] yaml_2.1.19 colorspace_1.3-2
## [71] cluster_2.0.7-1 GenomicRanges_1.30.3
## [73] rvest_0.3.2 knitr_1.20