transformCounts {mia} | R Documentation |
These functions provide a variety of options for transforming abundance data.
Each function calculates a transformed table and stores it in assay.
transformSamples does the transformation sample-wise, i.e., column-wise.
It is an alias for transformCounts. transformFeatures does the transformation
feature-wise, i.e., row-wise. ZTransform is a shortcut for Z-transformation.
relAbundanceCounts is a shortcut for fetching the relative abundance table.
transformSamples(
  x,
  abund_values = "counts",
  method = c("clr", "rclr", "hellinger", "log10", "pa", "rank", "relabundance"),
  name = method,
  pseudocount = FALSE,
  threshold = 0
)

## S4 method for signature 'SummarizedExperiment'
transformSamples(
  x,
  abund_values = "counts",
  method = c("clr", "rclr", "hellinger", "log10", "pa", "rank", "relabundance"),
  name = method,
  pseudocount = FALSE,
  threshold = 0
)

transformCounts(
  x,
  abund_values = "counts",
  method = c("clr", "rclr", "hellinger", "log10", "pa", "rank", "relabundance"),
  name = method,
  pseudocount = FALSE,
  threshold = 0
)

## S4 method for signature 'SummarizedExperiment'
transformCounts(
  x,
  abund_values = "counts",
  method = c("clr", "rclr", "hellinger", "log10", "pa", "rank", "relabundance"),
  name = method,
  pseudocount = FALSE,
  threshold = 0
)

transformFeatures(
  x,
  abund_values = "counts",
  method = c("log10", "pa", "z"),
  name = method,
  pseudocount = FALSE,
  threshold = 0
)

## S4 method for signature 'SummarizedExperiment'
transformFeatures(
  x,
  abund_values = "counts",
  method = c("log10", "pa", "z"),
  name = method,
  pseudocount = FALSE,
  threshold = 0
)

ZTransform(x, ...)

## S4 method for signature 'SummarizedExperiment'
ZTransform(x, ...)

relAbundanceCounts(x, ...)

## S4 method for signature 'SummarizedExperiment'
relAbundanceCounts(x, ...)
x | A SummarizedExperiment object. |
abund_values | A single character value for selecting the assay to be transformed. |
method | A single character value for selecting the transformation method. |
name | A single character value specifying the name of the transformed abundance table. |
pseudocount | FALSE or a numeric value deciding whether a pseudocount is added. A numeric value specifies the value of the pseudocount. (Only used for methods |
threshold | A numeric value for setting the threshold for pa transformation. By default it is 0. (Only used for method = "pa".) |
... | additional arguments |
transformCounts (or transformSamples) and transformFeatures apply a transformation to the abundance table. The provided transformation methods include:
'clr' Centered log ratio (clr) transformation can be used for reducing the skewness of data and for centering it. (See e.g. Gloor et al. 2017.)
clr = log10(x_r/g(x_r)) = log10 x_r - μ_r
where x_r is a single relative value, g(x_r) is the geometric mean of sample-wide relative values, and μ_r is the arithmetic mean of the log10-transformed sample-wide relative values, so that μ_r = log10 g(x_r).
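The two ways of writing clr (ratio to the geometric mean, and log10 values centered by their per-sample mean) can be checked numerically. The following is a minimal base-R sketch on a hypothetical toy matrix `counts`; it illustrates the formula only and is not mia's implementation. The toy data contain no zeros, so no pseudocount is needed.

```r
# Toy count matrix (hypothetical data): rows are taxa, columns are samples.
counts <- matrix(c(10, 20, 70,
                    5,  5, 90),
                 nrow = 3,
                 dimnames = list(c("taxA", "taxB", "taxC"), c("s1", "s2")))

# Relative abundances per sample (column).
rel <- sweep(counts, 2, colSums(counts), "/")

# Geometric mean of each sample, computed on the log10 scale.
geo_mean <- 10^colMeans(log10(rel))

# clr as the log-ratio to the geometric mean ...
clr_ratio <- log10(sweep(rel, 2, geo_mean, "/"))

# ... and as log10 values centered by their per-sample mean.
clr_centered <- sweep(log10(rel), 2, colMeans(log10(rel)), "-")

# Both forms agree, and each sample's clr values sum to zero.
all.equal(clr_ratio, clr_centered)
colSums(clr_ratio)
```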
'rclr' rclr, or robust clr, is similar to regular clr. The problem with regular clr is that logarithmic transformations lead to undefined values when zeros are present in the data. In rclr, values are divided by the geometric mean of the observed taxa, and zero values are not taken into account. Zero values stay as zeros.
Because of the high dimensionality of the data, rclr's geometric mean of observed taxa is a good approximation of the true geometric mean. (See e.g. Martino et al. 2019.)
rclr = log10(x_r/g(x_r > 0))
where x_r is a single relative value, and g(x_r > 0) is the geometric mean of the sample-wide relative values that are over 0.
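As an illustration of the rclr formula, the sketch below computes it in base R for a hypothetical toy matrix `counts` that contains zeros. It is a sketch of the formula as described here, not mia's own code; all object names are made up for the example.

```r
# Toy count matrix with zeros (hypothetical data): taxa in rows, samples in columns.
counts <- matrix(c(10,  0, 90,
                    0, 25, 75),
                 nrow = 3,
                 dimnames = list(c("taxA", "taxB", "taxC"), c("s1", "s2")))

rel <- sweep(counts, 2, colSums(counts), "/")

# log10 of the observed (non-zero) values only; zeros are masked as NA for now.
log_rel <- log10(rel)
log_rel[rel == 0] <- NA

# Mean of the logs over observed taxa equals log10 of their geometric mean.
log_geo_mean <- colMeans(log_rel, na.rm = TRUE)

# Center the observed values, then put zeros back where taxa were unobserved.
rclr <- sweep(log_rel, 2, log_geo_mean, "-")
rclr[is.na(rclr)] <- 0

rclr
```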
'hellinger' Hellinger transformation can be used to reduce the impact of extreme data points. It can be utilized for clustering or ordination analysis. (See e.g. Legendre & Gallagher 2001.)
hellinger = sqrt(x/x_tot)
where x is a single value and x_tot is the sum of all values in the sample.
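A small base-R sketch of the Hellinger formula follows. It assumes that x_tot is the per-sample (column) total, which matches the sample-wise nature of transformSamples; the matrix `counts` is hypothetical toy data.

```r
# Toy count matrix (hypothetical data): taxa in rows, samples in columns.
counts <- matrix(c(1, 3, 12,
                   0, 9, 16),
                 nrow = 3,
                 dimnames = list(c("taxA", "taxB", "taxC"), c("s1", "s2")))

# Square root of the per-sample relative abundance.
hellinger <- sqrt(sweep(counts, 2, colSums(counts), "/"))

hellinger

# The squared Hellinger values of each sample sum to 1.
colSums(hellinger^2)
```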
'log10' log10 transformation can be used for reducing the skewness of the data.
log10 = log10(x)
where x is a single value of data.
'pa' Transforms the table to a presence/absence table. All abundances higher than the threshold ε are transformed to 1 (present), and all others to 0 (absent). By default, the threshold is 0.
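The thresholding rule can be sketched in base R as below. The matrix `counts` and the value of `threshold` are hypothetical; note that a value equal to the threshold counts as absent, because only abundances strictly higher than the threshold are marked present.

```r
# Toy count matrix (hypothetical data): taxa in rows, samples in columns.
counts <- matrix(c( 0,  3, 40,
                   35, 36,  0),
                 nrow = 3,
                 dimnames = list(c("taxA", "taxB", "taxC"), c("s1", "s2")))

threshold <- 35

# 1 where the abundance is strictly above the threshold, 0 otherwise.
pa <- (counts > threshold) * 1

pa
```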
'rank' Rank returns ranks of taxa. For each sample, the least abundant taxa get the lowest ranks and more abundant taxa get higher ranks. The implementation is based on the colRanks function with ties.method="first".
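To show what ties.method="first" means in practice, the sketch below uses base R's rank() applied per column as a stand-in for colRanks. That the two handle ties identically is an assumption of this example, and `counts` is hypothetical toy data.

```r
# Toy count matrix with ties (hypothetical data): taxa in rows, samples in columns.
counts <- matrix(c(5, 5, 1,
                   2, 9, 9),
                 nrow = 3,
                 dimnames = list(c("taxA", "taxB", "taxC"), c("s1", "s2")))

# Rank within each sample; tied values are ranked in order of appearance.
ranks <- apply(counts, 2, rank, ties.method = "first")

ranks
```

In sample s1 the two tied taxa receive ranks 2 and 3 in row order, rather than sharing an averaged rank.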
'relabundance' Transforms abundances to relative abundances. Generally, all microbiome data are compositional, for example because all measuring instruments have capacity limits. To make results comparable with other results, values must be relative. (See e.g. Gloor et al. 2017.)
relabundance = x/x_tot
where x is a single value and x_tot is the sum of all values in the sample.
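A minimal base-R sketch of the relative abundance formula, assuming x_tot is the per-sample (column) total; `counts` is hypothetical toy data, not output from mia.

```r
# Toy count matrix (hypothetical data): taxa in rows, samples in columns.
counts <- matrix(c(10, 30, 60,
                    1,  1,  2),
                 nrow = 3,
                 dimnames = list(c("taxA", "taxB", "taxC"), c("s1", "s2")))

# Divide every value by the total of its sample.
relabundance <- sweep(counts, 2, colSums(counts), "/")

relabundance

# Each sample now sums to 1, regardless of its original sequencing depth.
colSums(relabundance)
```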
'z' Z-transformation, Z score transformation, or Z-standardization normalizes
the data by shifting (subtracting the mean μ) and scaling (dividing by the standard deviation σ).
Z-transformation can be done with function ZTransform. It is done per row (feature / taxon),
unlike most other transformations. It is often preceded by log10 (with a pseudocount) or clr transformation.
In other words, a single value is standardized with respect to the feature's values.
z = (x - μ)/σ
where x is a single value, μ is the mean of the feature, and σ is the standard deviation of the feature.
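The row-wise standardization can be sketched in base R as follows. The example assumes the sample standard deviation (as returned by sd(), with n - 1 in the denominator); whether mia uses the same estimator is not stated here. The matrix `counts` is hypothetical toy data.

```r
# Toy abundance matrix (hypothetical data): taxa in rows, samples in columns.
counts <- matrix(c( 1,  2,  3,
                   10, 20, 60),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("taxA", "taxB"), c("s1", "s2", "s3")))

# Per-feature (row) mean and standard deviation.
mu <- rowMeans(counts)
sigma <- apply(counts, 1, sd)

# Vectors of length nrow recycle down the columns, so row i uses mu[i] and sigma[i].
z <- (counts - mu) / sigma

z
```

After the transformation every feature has mean 0 and standard deviation 1 across samples.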
transformCounts, transformSamples, transformFeatures, relAbundanceCounts, and ZTransform
return x with an additional, transformed abundance table named name in the assay.
Leo Lahti and Tuomas Borman. Contact: microbiome.github.io
Gloor GB, Macklaim JM, Pawlowsky-Glahn V & Egozcue JJ (2017) Microbiome Datasets Are Compositional: And This Is Not Optional. Frontiers in Microbiology 8: 2224. doi: 10.3389/fmicb.2017.02224
Legendre P & Gallagher ED (2001) Ecologically meaningful transformations for ordination of species data. Oecologia 129: 271-280.
Martino C, Morton JT, Marotz CA, Thompson LR, Tripathi A, Knight R & Zengler K (2019) A Novel Sparse Compositional Technique Reveals Microbial Perturbations. mSystems 4: 1. doi: 10.1128/mSystems.00016-19
data(esophagus)
x <- esophagus

# By specifying 'method', it is possible to apply different transformations,
# e.g. clr transformation.
# Pseudocount can be added by specifying 'pseudocount'.
x <- transformSamples(x, method="clr", pseudocount=1)
head(assay(x, "clr"))

# Also, the target of transformation
# can be specified with "abund_values".
x <- transformSamples(x, method="relabundance")
x <- transformSamples(x, method="clr", abund_values="relabundance",
                      pseudocount = min(assay(x, "relabundance")[assay(x, "relabundance")>0]))
x2 <- transformSamples(x, method="clr", abund_values="counts", pseudocount = 1)
head(assay(x, "clr"))

# Different pseudocounts are used for counts and relative abundances
x <- transformSamples(x, method="relabundance")
mat <- assay(x, "relabundance")
pseudonumber <- min(mat[mat>0])
x <- transformSamples(x, method="clr", abund_values = "relabundance",
                      pseudocount=pseudonumber)
x <- transformSamples(x, method="clr", abund_values = "counts", pseudocount=1)

# Name of the stored table can be specified.
x <- transformSamples(x, method="hellinger", name="test")
head(assay(x, "test"))

# pa returns presence absence table. With 'threshold', it is possible to set the
# threshold to a desired level. By default, it is 0.
x <- transformSamples(x, method="pa", threshold=35)
head(assay(x, "pa"))

# rank returns ranks of taxa. It is calculated column-wise, i.e., per sample
# and using the ties.method="first" from the colRanks function
x <- transformSamples(x, method="rank")
head(assay(x, "rank"))

# transformCounts is an alias for transformSamples
x <- transformCounts(x, method="relabundance", name="test2")
head(assay(x, "test2"))

# In order to use other ranking variants, modify the chosen assay directly:
assay(x, "rank_average", withDimnames = FALSE) <- colRanks(assay(x, "counts"),
                                                           ties.method="average",
                                                           preserveShape = TRUE)

# If you want to do the transformation for features, you can do that by using
# transformFeatures
x <- transformFeatures(x, method="log10", name="log10_features", pseudocount = 1)
head(assay(x, "log10_features"))

# Z-transform can be done for features by using shortcut function
x <- ZTransform(x)
head(assay(x, "z"))

# For visualization purposes it is sometimes done by applying CLR for samples,
# followed by Z transform for taxa. The Z transform must read the "clr" assay,
# because the default for abund_values is "counts".
x <- transformCounts(x, method="clr", abund_values = "counts", pseudocount = 1)
x <- transformFeatures(x, method="z", abund_values = "clr", name = "clr_z")

# Relative abundances can be also calculated with the dedicated
# relAbundanceCounts function.
x <- relAbundanceCounts(x)
head(assay(x, "relabundance"))