SummarizedBenchmark

The SummarizedBenchmark package provides a framework for organizing benchmark comparisons, making it easier to both reproduce the original benchmark and replicate the comparison with new data. This vignette introduces the general approach and features of the package using two examples.

SummarizedBenchmark package version: 1.0.4
With SummarizedBenchmark, a complete benchmarking workflow comprises three primary components: the data, the methods to be compared, and the performance metrics used to evaluate them. The first two (data and methods) are necessary for carrying out the benchmark experiment, and the last (performance metrics) is essential for evaluating its results. Following this approach, the SummarizedBenchmark package defines two types of objects: BenchDesign objects and SummarizedBenchmark objects. BenchDesign objects contain only the design of the benchmark experiment, namely the data and methods, where a method is defined as the combination of a function or algorithm and its parameter settings. After constructing a BenchDesign, the experiment is executed to create a SummarizedBenchmark containing the results of applying the methods to the data. SummarizedBenchmark objects extend the Bioconductor SummarizedExperiment class, with the additional capability of working with performance metrics.
The basic framework is illustrated in the figure below. A BenchDesign is created with methods and combined with data to create a SummarizedBenchmark, which contains the output of applying the methods to the data. This output is then paired with performance metrics specified by the user. Note that the same BenchDesign can be combined with several data sets to produce several SummarizedBenchmark objects with the corresponding outputs. For convenience, several default performance metrics are implemented in the package, and can be added to SummarizedBenchmark objects using simple commands.
In this vignette, we first illustrate the basic use of both the BenchDesign and SummarizedBenchmark classes with a simple comparison of methods for p-value correction in the context of multiple hypothesis testing. Then, we describe more advanced features of the package with a case study comparing three methods for differential expression analysis.
To illustrate the basic use of the BenchDesign class, we use the tdat data set included with this package. The data set is a data.frame containing the results of 50 two-sample t-tests. The tests were performed using independently simulated sets of 20 observations drawn from a single standard Normal distribution (when H = 0) or two mean-shifted Normal distributions (when H = 1).
## H test_statistic effect_size pval SE
## 1 1 -3.2083437 -1.17151466 4.872282e-03 0.3651463
## 2 0 0.1692236 0.07321912 8.675080e-01 0.4326768
## 3 1 -5.7077940 -1.81715381 2.061521e-05 0.3183636
## 4 1 -1.9805856 -1.09107836 6.313031e-02 0.5508867
## 5 1 -1.0014358 -0.37726058 3.298895e-01 0.3767197
## 6 1 -0.9190433 -0.47583669 3.702252e-01 0.5177522
Several approaches have been proposed and implemented to compute adjusted p-values and q-values with the goal of controlling the total number of false discoveries across a collection of tests. In this example, we compare three such methods:

- Bonferroni correction (p.adjust w/ method = "bonferroni") (Dunn 1961),
- Benjamini-Hochberg (p.adjust w/ method = "BH") (Benjamini and Hochberg 1995), and
- Storey’s FDR q-value (qvalue::qvalue) (Storey 2002).

First, consider how benchmarking the three methods might look without the SummarizedBenchmark framework.
To compare methods, each is applied to tdat, and the results are stored in separate variables.
adj_bonf <- p.adjust(p = tdat$pval, method = "bonferroni")
adj_bh <- p.adjust(p = tdat$pval, method = "BH")
qv <- qvalue::qvalue(p = tdat$pval)
adj_qv <- qv$qvalues
Since the values of interest are available from the output of each method as a vector of length 50 (the number of hypotheses tested), they can be combined into a single data.frame to keep things clean.
## adj_bonf adj_bh adj_qv
## 1 0.24361409 0.02191765 0.0079813228
## 2 1.00000000 0.90365415 0.3290660527
## 3 0.00103076 0.00103076 0.0003753518
## 4 1.00000000 0.12140444 0.0442094809
## 5 1.00000000 0.47127065 0.1716134119
## 6 1.00000000 0.48250104 0.1757029652
The data.frame of adjusted p-values and q-values can be used to compare the methods, either by directly parsing the table or using a framework like iCOBRA. Additionally, the data.frame can be saved as an RDS or Rdata object for future reference, eliminating the need to recompute the results on the original data.
While this approach can work well for smaller comparisons, it can quickly become overwhelming and unwieldy as the number of methods and parameters increases. Furthermore, once each method is applied and the final data.frame (adj) is constructed, there is no way to determine how each value was calculated. While an informative name can be used to “label” each method (as done above), this does not capture the full complexity, e.g. the parameters and context in which the function was evaluated. One solution might involve manually recording function calls and parameters in a separate data.frame with the hope of maintaining synchrony with the output data.frame. However, this is prone to errors, e.g. during fast “copy and paste” operations or additions and deletions of parameter combinations. An alternative (and hopefully better) solution is to use the framework of the SummarizedBenchmark package.
In the SummarizedBenchmark approach, a BenchDesign is first constructed with the data as the sole input. (A BenchDesign can also be constructed without any data input. This approach is described in a later section.)
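For example, the construction using tdat mirrors the BenchDesign(mydat) call used in the case study later in this vignette; the object name b matches the calls below.

library(SummarizedBenchmark)
b <- BenchDesign(tdat)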
Then, each method of interest is added to the BenchDesign using addMethod().
b <- addMethod(bd = b, label = "bonf", func = p.adjust,
params = rlang::quos(p = pval, method = "bonferroni"))
At a minimum, addMethod() requires three parameters:

- bd: the BenchDesign object to modify,
- label: a character name for the method, and
- func: the function to be called.
: the function to be called.After the minimum parameters are specified, any parameters needed by the func
method should be passed as named parameters, e.g. p = pval, method = "bonferroni"
, to params =
as a list of quosures using rlang::quos(..)
. Notice here that the pval
wrapped in rlang::quos(..)
does not need to be specified as tdat$pval
for the function to access the column in the data.frame. For readers familiar with the ggplot2 package, the use of params = rlang::quos(..)
here should be viewed similar to the use of aes = aes(..)
in ggplot2
for mapping between the data and plotting (or benchmarking) parameters.
The process of adding methods can be written more concisely using the pipe operators from the magrittr package.
b <- b %>%
addMethod(label = "BH",
func = p.adjust,
params = rlang::quos(p = pval, method = "BH")) %>%
addMethod(label = "qv",
func = qvalue::qvalue,
params = rlang::quos(p = pval),
post = function(x) { x$qvalues })
For some methods, such as the q-value approach above, it may be necessary to call a “post-processing” function on the primary method to extract the desired output (here, the q-values). This should be specified using the optional post = parameter.
Now, the BenchDesign object contains three methods. This can be verified using the printMethods() function.
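Assuming the object b from above, the summary below is produced by:

printMethods(b)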
## bonf -------------------------------------------------------
## func:
## p.adjust
## post:
## NULL
## meta:
## parameters:
## p : pval
## method : "bonferroni"
## BH ---------------------------------------------------------
## func:
## p.adjust
## post:
## NULL
## meta:
## parameters:
## p : pval
## method : "BH"
## qv ---------------------------------------------------------
## func:
## qvalue::qvalue
## post:
## function(x) {
## x$qvalues
## }
## meta:
## parameters:
## p : pval
While the BenchDesign now includes all the information necessary for performing the benchmarking study, the actual adjusted p-values and q-values have not yet been calculated. To do this, we simply call buildBench(). While buildBench() does not require any inputs other than the BenchDesign object, when the corresponding ground truth is known, the truthCols = parameter should be specified. In this example, the H column of the tdat data.frame contains the true null or alternative status of each simulated hypothesis test. Note that if any of the methods are defined in a separate package, that package must be installed and loaded before running the experiment.
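With the truth column specified, the call takes the following form; the object name sb matches the examples below.

sb <- buildBench(b, truthCols = "H")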
The returned object is of the SummarizedBenchmark class, an extension of the SummarizedExperiment class. The table of adjusted p-values and q-values is contained in a single “assay” of the object, with each method added using addMethod() appearing as a column named by the corresponding label.
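The values shown below can be accessed with the standard SummarizedExperiment accessor:

head(assay(sb))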
## bonf BH qv
## [1,] 0.24361409 0.02191765 0.0079813228
## [2,] 1.00000000 0.90365415 0.3290660527
## [3,] 0.00103076 0.00103076 0.0003753518
## [4,] 1.00000000 0.12140444 0.0442094809
## [5,] 1.00000000 0.47127065 0.1716134119
## [6,] 1.00000000 0.48250104 0.1757029652
Metadata for the methods is contained in the colData() of the same object, with each row corresponding to one method in the comparison.
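The metadata table below is retrieved with:

colData(sb)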
## DataFrame with 3 rows and 9 columns
## func post func_anon vers_src
## <character> <character> <logical> <character>
## bonf p.adjust NA FALSE func
## BH p.adjust NA FALSE func
## qv qvalue::qvalue function(x) {; x$qvalues;} FALSE func
## pkg_name pkg_vers param.p param.method label
## <character> <character> <character> <character> <character>
## bonf stats 3.5.1 pval "bonferroni" bonf
## BH stats 3.5.1 pval "BH" BH
## qv qvalue 2.12.0 pval NA qv
In addition to columns for the functions and parameters specified with addMethod() (func, post, label, param.*), the colData() includes several other columns added during the buildBench() process. Most notable are the columns recording the package name and version of func, when available (pkg_name, pkg_vers).
When available, ground truth data is contained in the rowData() of the SummarizedBenchmark object.
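The truth table below is retrieved with:

rowData(sb)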
## DataFrame with 50 rows and 1 column
## H
## <numeric>
## 1 1
## 2 0
## 3 1
## 4 1
## 5 1
## ... ...
## 46 0
## 47 1
## 48 0
## 49 1
## 50 1
An important advantage of building on the existing SummarizedExperiment class and Bioconductor infrastructure to save the results is that the metadata is tightly linked to the data. It is therefore possible to subset the data while keeping the link to its respective metadata in a single step. For example, the code below extracts the data for only the first two methods.
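A subsetting call along these lines produces the two-method metadata table below:

colData(sb[, c("bonf", "BH")])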
## DataFrame with 2 rows and 9 columns
## func post func_anon vers_src pkg_name pkg_vers
## <character> <character> <logical> <character> <character> <character>
## bonf p.adjust NA FALSE func stats 3.5.1
## BH p.adjust NA FALSE func stats 3.5.1
## param.p param.method label
## <character> <character> <character>
## bonf pval "bonferroni" bonf
## BH pval "BH" BH
In addition, the SummarizedBenchmark class contains a dedicated slot where users can define performance metrics to evaluate the different methods.
Since different benchmarking experiments may require different metrics to evaluate the performance of the methods, the SummarizedBenchmark class provides a flexible way to define performance metrics. We can define a performance metric using the function addPerformanceMetric() by providing a SummarizedBenchmark object, a name for the metric, an assay name, and the function that defines the metric. Importantly, the function must contain the following two arguments: query (referring to a vector of values being evaluated, i.e. the output of one method) and truth (referring to the vector of ground truths). If further arguments are provided to the performance function, they must have default values.
For our example, we define the performance metric “TPR” (True Positive Rate) that calculates the fraction of true positives recovered given an alpha value. This performance metric uses the H assay of our example SummarizedBenchmark object.
sb <- addPerformanceMetric(
object = sb,
assay = "H",
evalMetric = "TPR",
evalFunction = function(query, truth, alpha = 0.1) {
goodHits <- sum((query < alpha) & truth == 1)
goodHits / sum(truth == 1)
}
)
performanceMetrics(sb)[["H"]]
## $TPR
## function (query, truth, alpha = 0.1)
## {
## goodHits <- sum((query < alpha) & truth == 1)
## goodHits/sum(truth == 1)
## }
Having defined all the desired performance metrics, the function estimatePerformanceMetrics() calculates these for each method. Parameters for the performance functions can be passed here. In the case below, we specify several alpha = values to be used for calculating the performance metrics with each function.
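The table below is produced by a call along the following lines; the variable name resWide is illustrative.

resWide <- estimatePerformanceMetrics(sb, alpha = c(0.05, 0.1, 0.2))
resWide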
## DataFrame with 3 rows and 12 columns
## func post func_anon vers_src
## <character> <character> <logical> <character>
## bonf p.adjust NA FALSE func
## BH p.adjust NA FALSE func
## qv qvalue::qvalue function(x) {; x$qvalues;} FALSE func
## pkg_name pkg_vers param.p param.method label TPR.1
## <character> <character> <character> <character> <character> <numeric>
## bonf stats 3.5.1 pval "bonferroni" bonf 0.125
## BH stats 3.5.1 pval "BH" BH 0.375
## qv qvalue 2.12.0 pval NA qv 0.6
## TPR.2 TPR.3
## <numeric> <numeric>
## bonf 0.2 0.225
## BH 0.55 0.625
## qv 0.7 0.9
By default, the function above returns a DataFrame, with the parameters of the performance function stored in its elementMetadata().
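Assuming the resWide object from above:

elementMetadata(resWide)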
## DataFrame with 12 rows and 4 columns
## colType assay performanceMetric alpha
## <character> <character> <character> <numeric>
## 1 methodInformation NA NA NA
## 2 methodInformation NA NA NA
## 3 methodInformation NA NA NA
## 4 methodInformation NA NA NA
## 5 methodInformation NA NA NA
## ... ... ... ... ...
## 8 methodInformation NA NA NA
## 9 methodInformation NA NA NA
## 10 performanceMetric H TPR 0.05
## 11 performanceMetric H TPR 0.1
## 12 performanceMetric H TPR 0.2
A second possibility is to set the parameter addColData = TRUE, so that the results are stored in the colData() of the SummarizedBenchmark object.
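Reusing the same alpha values, the two tables below correspond to the updated colData() and its elementMetadata():

sb <- estimatePerformanceMetrics(sb, alpha = c(0.05, 0.1, 0.2),
                                 addColData = TRUE)
colData(sb)
elementMetadata(colData(sb))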
## DataFrame with 3 rows and 12 columns
## func post func_anon vers_src
## <character> <character> <logical> <character>
## bonf p.adjust NA FALSE func
## BH p.adjust NA FALSE func
## qv qvalue::qvalue function(x) {; x$qvalues;} FALSE func
## pkg_name pkg_vers param.p param.method label TPR.1
## <character> <character> <character> <character> <character> <numeric>
## bonf stats 3.5.1 pval "bonferroni" bonf 0.125
## BH stats 3.5.1 pval "BH" BH 0.375
## qv qvalue 2.12.0 pval NA qv 0.6
## TPR.2 TPR.3
## <numeric> <numeric>
## bonf 0.2 0.225
## BH 0.55 0.625
## qv 0.7 0.9
## DataFrame with 12 rows and 4 columns
## colType assay performanceMetric alpha
## <character> <character> <character> <numeric>
## 1 methodInformation NA NA NA
## 2 methodInformation NA NA NA
## 3 methodInformation NA NA NA
## 4 methodInformation NA NA NA
## 5 methodInformation NA NA NA
## ... ... ... ... ...
## 8 methodInformation NA NA NA
## 9 methodInformation NA NA NA
## 10 performanceMetric H TPR 0.05
## 11 performanceMetric H TPR 0.1
## 12 performanceMetric H TPR 0.2
Finally, if the user prefers tidier formats, setting the parameter tidy = TRUE makes the function return a long-format version of the results.
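For example:

estimatePerformanceMetrics(sb, alpha = c(0.05, 0.1, 0.2), tidy = TRUE)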
## func post func_anon vers_src pkg_name
## 1 p.adjust <NA> FALSE func stats
## 2 p.adjust <NA> FALSE func stats
## 3 qvalue::qvalue function(x) {; x$qvalues;} FALSE func qvalue
## 4 p.adjust <NA> FALSE func stats
## 5 p.adjust <NA> FALSE func stats
## 6 qvalue::qvalue function(x) {; x$qvalues;} FALSE func qvalue
## 7 p.adjust <NA> FALSE func stats
## 8 p.adjust <NA> FALSE func stats
## 9 qvalue::qvalue function(x) {; x$qvalues;} FALSE func qvalue
## pkg_vers param.p param.method label key value assay performanceMetric
## 1 3.5.1 pval "bonferroni" bonf TPR.1 0.125 H TPR
## 2 3.5.1 pval "BH" BH TPR.1 0.375 H TPR
## 3 2.12.0 pval <NA> qv TPR.1 0.600 H TPR
## 4 3.5.1 pval "bonferroni" bonf TPR.2 0.200 H TPR
## 5 3.5.1 pval "BH" BH TPR.2 0.550 H TPR
## 6 2.12.0 pval <NA> qv TPR.2 0.700 H TPR
## 7 3.5.1 pval "bonferroni" bonf TPR.3 0.225 H TPR
## 8 3.5.1 pval "BH" BH TPR.3 0.625 H TPR
## 9 2.12.0 pval <NA> qv TPR.3 0.900 H TPR
## alpha
## 1 0.05
## 2 0.05
## 3 0.05
## 4 0.10
## 5 0.10
## 6 0.10
## 7 0.20
## 8 0.20
## 9 0.20
As an alternative way to obtain the same data.frame as in the previous chunk, we can call the function tidyUpMetrics() on the saved results of a SummarizedBenchmark object.
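The first rows of the tidied table are shown below:

head(tidyUpMetrics(sb))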
## func post func_anon vers_src pkg_name
## 1 p.adjust <NA> FALSE func stats
## 2 p.adjust <NA> FALSE func stats
## 3 qvalue::qvalue function(x) {; x$qvalues;} FALSE func qvalue
## 4 p.adjust <NA> FALSE func stats
## 5 p.adjust <NA> FALSE func stats
## 6 qvalue::qvalue function(x) {; x$qvalues;} FALSE func qvalue
## pkg_vers param.p param.method label key value assay performanceMetric
## 1 3.5.1 pval "bonferroni" bonf TPR.1 0.125 H TPR
## 2 3.5.1 pval "BH" BH TPR.1 0.375 H TPR
## 3 2.12.0 pval <NA> qv TPR.1 0.600 H TPR
## 4 3.5.1 pval "bonferroni" bonf TPR.2 0.200 H TPR
## 5 3.5.1 pval "BH" BH TPR.2 0.550 H TPR
## 6 2.12.0 pval <NA> qv TPR.2 0.700 H TPR
## alpha
## 1 0.05
## 2 0.05
## 3 0.05
## 4 0.10
## 5 0.10
## 6 0.10
For example, the code below extracts the TPR for an alpha of 0.1 for the Bonferroni method.
tidyUpMetrics(sb) %>%
    dplyr::filter(label == "bonf", alpha == 0.1, performanceMetric == "TPR") %>%
    dplyr::select(value)
## value
## 1 0.2
In this more advanced case study, we use a simulated data set from Soneson and Robinson (2016) to demonstrate how the SummarizedBenchmark package can be used to benchmark methods for differential expression analysis. Namely, we compare the methods implemented in the DESeq2, edgeR, and limma packages. The simulated data set includes six samples, three replicates from each of two conditions. For each sample, transcript-level expression is provided as transcripts per million (TPM) values for 15,677 transcripts from human chromosome 1 (Ensembl GRCh37.71). A more complete description of the data, including the code used to generate it, is available in the Supplementary Materials of Soneson and Robinson (2016). We provide precomputed objects containing the count and ground truth data.
## A1 A2 A3 B1 B2
## ENST00000367770 1.670571 0.8064829 5.472561 58.700418 32.89297
## ENST00000367771 558.834722 458.8887676 662.352695 5.299743 20.73813
## ENST00000367772 155.881534 110.6033685 183.417201 0.000000 0.00000
## ENST00000423670 11.809207 16.4752934 10.426669 20.392491 1.26733
## ENST00000470238 96.489863 34.2755231 32.489730 785.514128 614.71261
## ENST00000286031 12.442872 13.4797855 4.781290 1.152118 20.50770
## B3
## ENST00000367770 16.648100
## ENST00000367771 14.862318
## ENST00000367772 0.000000
## ENST00000423670 7.546371
## ENST00000470238 815.123229
## ENST00000286031 5.760588
## # A tibble: 6 x 13
## transcript status logFC_cat logFC avetpm length eff_length tpm1 tpm2
## <chr> <int> <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 ENST00000… 1 [ 3.38,3… 4.17 0.225 2916 2771. 0.0237 0.427
## 2 ENST00000… 1 [ 3.38,3… 5.25 3.20 2921 2776. 6.23 0.164
## 3 ENST00000… 1 Inf Inf 0.817 3477 3332. 1.63 0
## 4 ENST00000… 1 [ 0.00, … 0.322 0.203 2077 1932. 0.181 0.226
## 5 ENST00000… 1 [ 3.38,3… 4.07 4.08 1538 1393. 0.460 7.71
## 6 ENST00000… 0 [ 0.00, … 0 0.154 4355 4210. 0.154 0.154
## # ... with 4 more variables: isopct1 <dbl>, isopct2 <dbl>, gene <chr>,
## # avetpm_cat <chr>
We begin the benchmarking process by creating our BenchDesign object with the data set. The BenchDesign can be initialized with a data.frame (as in the case study above), or more generally, with a list object. In this case study, since methods for differential expression require more than just the expression counts, e.g. the experimental design, we construct a list containing each of these inputs as a named entry.
The scaled TPM values are rounded before passing to the differential expression methods.
Here, we simply use the conditions for each sample to define the experimental design. The design matrix is stored as a data.frame, mycoldat.
mycoldat <- data.frame(condition = factor(rep(c(1, 2), each = 3)))
rownames(mycoldat) <- colnames(mycounts)
The data object for the benchmark experiment is now constructed with both the counts and the design matrix, along with some ground truth information (“status”: the true presence or absence of differential expression between conditions, and “lfc”: the expected log-fold change between conditions).
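A construction consistent with the parameter names used in the addMethod() calls below (cntdat, coldat) and the truth columns (status, lfc) would look like the following; the name truthdat for the truth table shown above is an assumption.

mydat <- list(cntdat = mycounts,
              coldat = mycoldat,
              status = truthdat$status,
              lfc = truthdat$logFC)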
As before, the BenchDesign is constructed with the data as the sole input.
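In code:

bd <- BenchDesign(mydat)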
For simplicity, we focus on comparing only the p-values returned by each method after testing for differential expression between the two conditions. However, later in this vignette, we also show how multiple metrics (p-values and log-fold change) can be compared in a single BenchDesign object.
Since each method requires running multiple steps, we write wrapper functions which return only the vector of p-values for each method.
deseq2_pvals <- function(countData, colData, design, contrast) {
dds <- DESeqDataSetFromMatrix(countData,
colData = colData,
design = design)
dds <- DESeq(dds)
res <- results(dds, contrast = contrast)
res$pvalue
}
edgeR_pvals <- function(countData, group, design) {
y <- DGEList(countData, group = group)
y <- calcNormFactors(y)
des <- model.matrix(design)
y <- estimateDisp(y, des)
fit <- glmFit(y, des)
lrt <- glmLRT(fit, coef=2)
lrt$table$PValue
}
voom_pvals <- function(countData, group, design) {
y <- DGEList(countData, group = group)
y <- calcNormFactors(y)
des <- model.matrix(design)
y <- voom(y, des)
eb <- eBayes(lmFit(y, des))
eb$p.value[, 2]
}
Next, each method is added to the BenchDesign using addMethod(), with the corresponding wrapper function passed as func. (For a review of the basic usage of addMethod(), revisit Section 2.) We again use the pipe notation for compactness.
bd <- bd %>%
addMethod(label = "deseq2",
func = deseq2_pvals,
params = rlang::quos(countData = cntdat,
colData = coldat,
design = ~condition,
contrast = c("condition", "2", "1"))
) %>%
addMethod(label = "edgeR",
func = edgeR_pvals,
params = rlang::quos(countData = cntdat,
group = coldat$condition,
design = ~coldat$condition)
) %>%
addMethod(label = "voom",
func = voom_pvals,
params = rlang::quos(countData = cntdat,
group = coldat$condition,
design = ~coldat$condition)
)
So far, none of the methods have been executed. The BenchDesign object simply serves as a container describing how the methods should be executed. The methods are applied by a simple call to buildBench(). Since the ground truth is known and available in mydat$status, it is specified via truthCols =.
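That is:

sb <- buildBench(bd, truthCols = "status")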
We can inspect the results.
## class: SummarizedBenchmark
## dim: 15677 3
## metadata(1): sessionInfo
## assays(1): status
## rownames(15677): ENST00000367770 ENST00000367771 ... ENST00000470694
## ENST00000499986
## rowData names(1): status
## colnames(3): deseq2 edgeR voom
## colData names(12): func post ... param.group label
By running the code above, the results of the three differential expression methods (edgeR, limma-voom, and DESeq2) are stored in a SummarizedBenchmark container. The next step is to define the metrics used to evaluate the performance of these three methods. This can be done with the function addPerformanceMetric(), as described in Section 2. However, the package also provides implementations of several ‘default’ metrics that are commonly used to evaluate methods. The function availableMetrics() returns a data.frame of these metrics.
## functions description
## 1 rejections Number of rejections
## 2 TPR True Positive Rate
## 3 TNR True Negative Rate
## 4 FDR False Discovery Rate (estimated)
## 5 FNR False Negative Rate
## 6 correlation Pearson correlation
## 7 sdad Standard Deviation of the Absolute Difference
## 8 hamming Hamming distance
## 9 LPnorm L_{p} norm
## 10 adjustedRandIndex Adjusted Rand Index
## requiresTruth
## 1 FALSE
## 2 TRUE
## 3 TRUE
## 4 TRUE
## 5 TRUE
## 6 TRUE
## 7 TRUE
## 8 TRUE
## 9 TRUE
## 10 TRUE
For example, the predefined metrics rejections, TPR, TNR, FDR, and FNR can be added to the status assay of our object using the following code:
sb <- addPerformanceMetric( sb,
evalMetric=c("rejections", "TPR", "TNR", "FDR", "FNR"),
assay="status" )
names(performanceMetrics(sb)[["status"]])
## [1] "rejections" "TPR" "TNR" "FDR" "FNR"
Having defined the desired performance metrics, the function estimatePerformanceMetrics() will calculate these metrics for each of the three methods.
estimatePerformanceMetrics(
    sb,
    alpha = c(0.01, 0.05, 0.1, 0.2),
    tidy = TRUE) %>%
    dplyr::select(label, value, performanceMetric, alpha) %>%
    tail()
## label value performanceMetric alpha
## 55 deseq2 0.10988246 FNR 0.1
## 56 edgeR 0.09381113 FNR 0.1
## 57 voom 0.11381970 FNR 0.1
## 58 deseq2 0.09223211 FNR 0.2
## 59 edgeR 0.07986767 FNR 0.2
## 60 voom 0.08741018 FNR 0.2
Furthermore, the functions plotMethodsOverlap() and plotROC() are helpful for visualizing the performance of the different methods when the methods output q-values. plotMethodsOverlap() is a wrapper for the function upset() from the UpSetR package and helps visualize the overlaps between the hits of the different methods at a given alpha value.
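A call along these lines draws the overlap plot discussed below; the assay and alpha arguments are assumptions based on the assay name of this object.

plotMethodsOverlap(sb, assay = "status", alpha = 0.1)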
From the plot above, it is evident that a large number of transcripts are detected as differentially expressed by all three methods. There are also smaller sets of transcripts that are detected uniquely by a single method or subset of methods. Another typical way to compare the performance of different methods is the Receiver Operating Characteristic (ROC) curve. The function plotROC() takes a SummarizedBenchmark object as input and draws the ROC curves for all methods contained in it.
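A sketch of the corresponding call, with the assay argument assumed as above:

plotROC(sb, assay = "status")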
Here, we describe several additional features implemented in SummarizedBenchmark for building on the standard workflow described in the previous sections. The features are illustrated using the same differential expression example from above.
The differential expression case study described above assumed that we are interested in a single numeric vector for each method, namely a vector of p-values. These p-values are stored as the sole assay in the SummarizedBenchmark object returned by buildBench(). However, in many cases there are multiple values of interest to be compared across methods. For example, looking at the estimated log-fold changes in addition to p-values may be informative when comparing methods for differential expression.
The BenchDesign framework supports multiple assays through the post = parameter of the addMethod() call. When zero or one function is specified to post = for all methods, as in the examples above, the results are stored as a single assay. However, if post = is passed a named list of functions, separate assays are created using the names and functions in the list. Since the assay names are taken from post =, all entries in the list must be named. Furthermore, if more than one assay is desired, the post = parameter must be specified for all methods. This is strictly enforced to avoid ambiguities when combining results across methods.
To track both p-values and log-fold change values for each method, we write new wrapper functions: one per method returning the primary analysis results, plus accessor functions that extract the p-values and log-fold changes from those results.
deseq2_run <- function(countData, colData, design, contrast) {
dds <- DESeqDataSetFromMatrix(countData,
colData = colData,
design = design)
dds <- DESeq(dds)
results(dds, contrast = contrast)
}
deseq2_pv <- function(x) {
x$pvalue
}
deseq2_lfc <- function(x) {
x$log2FoldChange
}
edgeR_run <- function(countData, group, design) {
y <- DGEList(countData, group = group)
y <- calcNormFactors(y)
des <- model.matrix(design)
y <- estimateDisp(y, des)
fit <- glmFit(y, des)
glmLRT(fit, coef=2)
}
edgeR_pv <- function(x) {
x$table$PValue
}
edgeR_lfc <- function(x) {
x$table$logFC
}
voom_run <- function(countData, group, design) {
y <- DGEList(countData, group = group)
y <- calcNormFactors(y)
des <- model.matrix(design)
y <- voom(y, des)
eBayes(lmFit(y, des))
}
voom_pv <- function(x) {
x$p.value[, 2]
}
voom_lfc <- function(x) {
x$coefficients[, 2]
}
The primary wrapper function and a list of accessor functions are then passed to func = and post =, respectively.
bd <- BenchDesign(mydat) %>%
addMethod(label = "deseq2",
func = deseq2_run,
post = list(pv = deseq2_pv,
lfc = deseq2_lfc),
params = rlang::quos(countData = cntdat,
colData = coldat,
design = ~condition,
contrast = c("condition", "2", "1"))
) %>%
addMethod(label = "edgeR",
func = edgeR_run,
post = list(pv = edgeR_pv,
lfc = edgeR_lfc),
params = rlang::quos(countData = cntdat,
group = coldat$condition,
design = ~coldat$condition)
) %>%
addMethod(label = "voom",
func = voom_run,
post = list(pv = voom_pv,
lfc = voom_lfc),
params = rlang::quos(countData = cntdat,
group = coldat$condition,
design = ~coldat$condition)
)
When the BenchDesign is evaluated using buildBench(), the resulting SummarizedBenchmark is generated with two assays: "pv" and "lfc". As before, the ground truth can be specified using the truthCols = parameter. When multiple assays are used, truthCols = expects a named vector of assay-name = "column-name" pairs.
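For this example, a call along these lines produces the object summarized below:

sb <- buildBench(bd, truthCols = c(pv = "status", lfc = "lfc"))
sb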
## class: SummarizedBenchmark
## dim: 15677 3
## metadata(1): sessionInfo
## assays(2): pv lfc
## rownames: NULL
## rowData names(2): pv lfc
## colnames(3): deseq2 edgeR voom
## colData names(12): func post ... param.group label
We can verify that the two assays contain the expected values.
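For the p-values:

head(assay(sb, "pv"))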
## deseq2 edgeR voom
## [1,] 2.875541e-04 2.074073e-04 6.103023e-03
## [2,] 2.799371e-23 2.121422e-16 1.239445e-04
## [3,] 8.450676e-14 3.109331e-18 5.221157e-06
## [4,] 6.930834e-01 6.621236e-01 3.357040e-01
## [5,] 2.479616e-11 2.468951e-09 3.978989e-04
## [6,] 9.057130e-01 8.947438e-01 5.199404e-01
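And for the log-fold changes:

head(assay(sb, "lfc"))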
## deseq2 edgeR voom
## [1,] 3.7329019 3.6767779 3.5232524
## [2,] -5.3905346 -5.3773940 -5.5543949
## [3,] -9.7125453 -10.2467488 -8.2533275
## [4,] -0.4706410 -0.4547095 -1.0510456
## [5,] 3.7048724 3.7024426 3.9084940
## [6,] -0.1554938 -0.1504299 -0.7607029
The simple examples considered in this vignette were constructed to be computationally manageable with only one core. However, when working with larger data sets, running each method serially on a single machine is often undesirable. Since constructing a BenchDesign object requires no computation, the bottleneck only appears at the buildBench() step of the process. Parallelization of this step is enabled using the BiocParallel package.
By default, parallel evaluation is disabled, but it can easily be enabled by setting parallel = TRUE and optionally specifying the BPPARAM = parameter. If BPPARAM = is not specified, the default back-end will be used. The default back-end can be checked with bpparam().
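For example:

BiocParallel::bpparam()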
## class: MulticoreParam
## bpisup: FALSE; bpnworkers: 4; bptasks: 0; bpjobname: BPJOB
## bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
## bptimeout: 2592000; bpprogressbar: FALSE
## bpRNGseed:
## bplogdir: NA
## bpresultdir: NA
## cluster type: FORK
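A parallel run consistent with the output below would look like the following; the object name sbp is illustrative.

sbp <- buildBench(bd, truthCols = c(pv = "status", lfc = "lfc"),
                  parallel = TRUE)
sbp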
## class: SummarizedBenchmark
## dim: 15677 3
## metadata(1): sessionInfo
## assays(2): pv lfc
## rownames: NULL
## rowData names(2): pv lfc
## colnames(3): deseq2 edgeR voom
## colData names(12): func post ... param.group label
The results, as expected, are the same as when buildBench() was called without parallelization.
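One way to check, assuming the serial and parallel objects sb and sbp from above (na.rm = TRUE handles methods that return NA p-values):

all(assay(sb, "pv") == assay(sbp, "pv"), na.rm = TRUE)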
## [1] TRUE
Details on how to specify the parallelization back-end can be found in the Introduction to BiocParallel vignette of the BiocParallel package. Parallelization of buildBench() is carried out across the set of methods specified with addMethod(). There is no benefit to specifying more cores than the number of methods.
Metadata for methods are stored in the colData() of SummarizedBenchmark objects. As mentioned above, several default metadata columns are populated in the colData() of the SummarizedBenchmark object generated by a call to buildBench(). Sometimes it may be useful to include additional metadata columns beyond the default set. While this can be accomplished manually by modifying the colData() of the SummarizedBenchmark object post hoc, method metadata can also be specified at the addMethod() step using the optional meta = parameter. The meta = parameter accepts a named list of metadata information, and each list entry is added to the colData() as a new column. To avoid collisions between metadata columns specified with meta = and the default set of columns, metadata specified using meta = is added to the colData() with meta. prefixed to the column name.
As an example, we construct a BenchDesign object again using the differential expression example. The BenchDesign is created with two methods, DESeq2 and edgeR, each specified with the optional meta = parameter. We can verify that the manually defined metadata column (meta.reason) is available in the colData() of the generated SummarizedBenchmark.
BenchDesign(mydat) %>%
addMethod(label = "deseq2",
func = deseq2_pvals,
meta = list(reason = "recommended by friend X"),
params = rlang::quos(countData = cntdat,
colData = coldat,
design = ~condition,
contrast = c("condition", "2", "1"))
) %>%
addMethod(label = "edgeR",
func = edgeR_pvals,
meta = list(reason = "recommended by friend Y"),
params = rlang::quos(countData = cntdat,
group = coldat$condition,
design = ~coldat$condition)
) %>%
buildBench() %>%
colData()
## DataFrame with 2 rows and 13 columns
## func post func_anon vers_src pkg_name
## <character> <character> <logical> <character> <character>
## deseq2 deseq2_pvals NA TRUE func NA
## edgeR edgeR_pvals NA TRUE func NA
## pkg_vers meta.reason param.countData param.colData
## <character> <character> <character> <character>
## deseq2 NA recommended by friend X cntdat coldat
## edgeR NA recommended by friend Y cntdat NA
## param.design param.contrast param.group
## <character> <character> <character>
## deseq2 ~condition c("condition", "2", "1") NA
## edgeR ~coldat$condition NA coldat$condition
## label
## <character>
## deseq2 deseq2
## edgeR edgeR
While all methods in this example had the meta = option specified, this is not necessary. It is completely acceptable to specify the meta = parameter for only a subset of methods.
Arguably, two of the most important pieces of metadata stored in the colData() of the SummarizedBenchmark returned by buildBench() are the relevant package name and version (pkg_name, pkg_vers). Determining the package name and version requires that the primary “workhorse” function of the method be directly specified as func = in the addMethod() call. In some cases this may not be possible, e.g. if the “workhorse” function is a wrapper, as in the differential expression example above. However, there may still be an important function for which we would like to track the package name and version. The meta = parameter can help.
The meta = parameter handles the following named list entries as special values: pkg_name, pkg_vers, and pkg_func. First, if values are specified for pkg_name and pkg_vers in meta =, these overwrite the values determined from func =. To trace the source of the pkg_name and pkg_vers information, the vers_src column of the colData is set to "meta_manual" (the default value is "func"). Alternatively, a function can be passed to meta = as pkg_func. This function is used to determine both pkg_name and pkg_vers, and takes precedence over manually specified pkg_name and pkg_vers values. If pkg_func is specified, it is included in the colData() as a new column with the same name, and the vers_src column is set to "meta_func". Note: the function should be wrapped in rlang::quo() to be properly parsed.
The following example illustrates the behavior when using either pkg_func, or pkg_name and pkg_vers, with the optional meta = parameter.
BenchDesign(mydat) %>%
addMethod(label = "deseq2",
func = deseq2_pvals,
meta = list(pkg_func = rlang::quo(DESeq2::DESeq)),
params = rlang::quos(countData = cntdat,
colData = coldat,
design = ~condition,
contrast = c("condition", "2", "1"))
) %>%
addMethod(label = "edgeR",
func = edgeR_pvals,
meta = list(pkg_name = "edgeR",
pkg_vers = as.character(packageVersion("edgeR"))),
params = rlang::quos(countData = cntdat,
group = coldat$condition,
design = ~coldat$condition)
) %>%
buildBench() %>%
colData()
## DataFrame with 2 rows and 13 columns
## func post func_anon vers_src pkg_name
## <character> <character> <logical> <character> <character>
## deseq2 deseq2_pvals NA TRUE meta_func NA
## edgeR edgeR_pvals NA TRUE meta_manual edgeR
## pkg_vers pkg_func param.countData param.colData
## <character> <character> <character> <character>
## deseq2 NA DESeq2::DESeq cntdat coldat
## edgeR 3.22.3 NA cntdat NA
## param.design param.contrast param.group
## <character> <character> <character>
## deseq2 ~condition c("condition", "2", "1") NA
## edgeR ~coldat$condition NA coldat$condition
## label
## <character>
## deseq2 deseq2
## edgeR edgeR
Modifying the definition of a method after it has been added to a BenchDesign is supported by the modifyMethod() function. The BenchDesign object created in the differential expression example above includes a method labeled deseq2. We can check the definition of this method using printMethod().
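That is:

printMethod(bd, "deseq2")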
## deseq2 -----------------------------------------------------
## func:
## deseq2_run
## post:
## list(pv = deseq2_pv, lfc = deseq2_lfc)
## meta:
## parameters:
## countData : cntdat
## colData : coldat
## design : ~condition
## contrast : c("condition", "2", "1")
Suppose we wish to both flip the order of the contrast and add a metadata tag. This can be easily accomplished by passing both new parameters to modifyMethod() as a rlang::quos(..) list, similar to how they would be passed to addMethod(). If the func, post, or meta values should be modified, they should be included in the list using the special keywords bd.func =, bd.post =, or bd.meta =, as shown in the example below.
modifyMethod(bd, label = "deseq2",
params = rlang::quos(contrast = c("condition", "1", "2"),
bd.meta = list(note = "modified post hoc"))
) %>%
printMethod("deseq2")
## deseq2 -----------------------------------------------------
## func:
## deseq2_run
## post:
## list(pv = deseq2_pv, lfc = deseq2_lfc)
## meta:
## note : "modified post hoc"
## parameters:
## countData : cntdat
## colData : coldat
## design : ~condition
## contrast : c("condition", "1", "2")
Sometimes it may be desirable to completely overwrite all function parameters for a method, e.g. countData, colData, design, and contrast in the case of DESeq2. This may occur if some parameters were optional and originally specified, but are no longer necessary. All function parameters can be overwritten by specifying .overwrite = TRUE.
modifyMethod(bd, label = "deseq2",
params = rlang::quos(contrast = c("condition", "1", "2"),
bd.meta = list(note = "modified post hoc")),
.overwrite = TRUE) %>%
printMethod("deseq2")
## deseq2 -----------------------------------------------------
## func:
## deseq2_run
## post:
## list(pv = deseq2_pv, lfc = deseq2_lfc)
## meta:
## note : "modified post hoc"
## parameters:
## contrast : c("condition", "1", "2")
Notice that all parameters other than contrast = c("condition", "1", "2") have been dropped.
In addition to comparing multiple methods, a benchmark study may also involve comparing a single method across several parameter settings. The expandMethod() function takes a method already defined in the BenchDesign and expands it into multiple methods with differing parameter values. In the following example, expandMethod() is used to duplicate the DESeq2 method with only the "contrast" parameter modified.
bde <- expandMethod(bd, label = "deseq2",
onlyone = "contrast",
params = rlang::quos(deseq2_v1 = c("condition", "1", "2"),
deseq2_v2 = c("condition", "2", "2"))
)
printMethod(bde, "deseq2_v1")
## deseq2_v1 --------------------------------------------------
## func:
## deseq2_run
## post:
## list(pv = deseq2_pv, lfc = deseq2_lfc)
## meta:
## parameters:
## countData : cntdat
## colData : coldat
## design : ~condition
## contrast : c("condition", "1", "2")
## deseq2_v2 --------------------------------------------------
## func:
## deseq2_run
## post:
## list(pv = deseq2_pv, lfc = deseq2_lfc)
## meta:
## parameters:
## countData : cntdat
## colData : coldat
## design : ~condition
## contrast : c("condition", "2", "2")
Notice that the parameter to be modified is specified with onlyone = (denoting that “only one” parameter will be changed), and that the new method names are taken from the names of the quosure list passed to params = in the expandMethod() call. To modify more than a single parameter in the duplicated methods, the onlyone = parameter can be omitted, and the new parameter values specified to params = as a list of quosure lists. Below, both the "contrast" and meta parameters are modified in the expanded methods.
bde <- expandMethod(bd, label = "deseq2",
params = list(deseq2_v1 = rlang::quos(contrast = c("condition", "1", "2"),
meta = list(note = "flip order")),
deseq2_v2 = rlang::quos(contrast = c("condition", "2", "2"),
meta = list(note = "nonsensical order"))
)
)
printMethod(bde, "deseq2_v1")
## deseq2_v1 --------------------------------------------------
## func:
## deseq2_run
## post:
## list(pv = deseq2_pv, lfc = deseq2_lfc)
## meta:
## parameters:
## countData : cntdat
## colData : coldat
## design : ~condition
## contrast : c("condition", "1", "2")
## meta : list(note = "flip order")
## deseq2_v2 --------------------------------------------------
## func:
## deseq2_run
## post:
## list(pv = deseq2_pv, lfc = deseq2_lfc)
## meta:
## parameters:
## countData : cntdat
## colData : coldat
## design : ~condition
## contrast : c("condition", "2", "2")
## meta : list(note = "nonsensical order")
After constructing a BenchDesign, it may become clear that a single method or parameter setting is no longer relevant to the comparison. In this case, the method can be easily dropped from the BenchDesign by passing the BenchDesign and the method label to dropMethod().
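For example, dropping the deseq2 method leaves the two methods shown below:

bd <- dropMethod(bd, label = "deseq2")
printMethods(bd)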
## edgeR ------------------------------------------------------
## func:
## edgeR_run
## post:
## list(pv = edgeR_pv, lfc = edgeR_lfc)
## meta:
## parameters:
## countData : cntdat
## group : coldat$condition
## design : ~coldat$condition
## voom -------------------------------------------------------
## func:
## voom_run
## post:
## list(pv = voom_pv, lfc = voom_lfc)
## meta:
## parameters:
## countData : cntdat
## group : coldat$condition
## design : ~coldat$condition
When benchmarking several methods, it is generally considered good practice to apply the methods to more than just a single data set. Under the SummarizedBenchmark framework, this naturally translates to recycling the same set of methods defined in a single BenchDesign object across multiple data sets. While the BenchDesign objects in the examples above were all initialized with a data set, this is not necessary.
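An empty design is created by calling the constructor without any data:

bdnull <- BenchDesign()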
## BenchDesign object -----------------------------------------
## benchmark data:
## NULL
## benchmark methods:
## none
As before, methods can be added to the BenchDesign with addMethod(), and the benchmark experiment run using buildBench().
bdnull <- bdnull %>%
addMethod(label = "bonf",
func = p.adjust,
params = rlang::quos(p = pval,
method = "bonferroni")) %>%
addMethod(label = "BH",
func = p.adjust,
params = rlang::quos(p = pval,
method = "BH"))
While not mentioned above, buildBench() accepts an optional data = parameter. When specified, this data set is used to run the experiment, taking precedence over any data set specified in (or missing from) the BenchDesign object.
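For example, applying the null design to the tdat data from the first case study:

buildBench(bdnull, data = tdat)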
## class: SummarizedBenchmark
## dim: 50 2
## metadata(1): sessionInfo
## assays(1): bench
## rownames: NULL
## rowData names(1): bench
## colnames(2): bonf BH
## colData names(9): func post ... param.method label
By specifying data during the buildBench() step, the exact same benchmark comparison, as defined in the common BenchDesign object, can be carried out consistently across multiple data sets. While this approach works even if the common BenchDesign object contains a default data set, it is recommended that the BenchDesign be created without any data to avoid errors if the design is reused across data sets.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society. Series B (Methodological), 289–300.
Dunn, Olive Jean. 1961. “Multiple Comparisons Among Means.” Journal of the American Statistical Association 56 (293): 52–64.
Soneson, Charlotte, and Mark D Robinson. 2016. “iCOBRA: Open, Reproducible, Standardized and Live Method Benchmarking.” Nature Methods 13 (4): 283–85. https://doi.org/10.1038/nmeth.3805.
Storey, John D. 2002. “A Direct Approach to False Discovery Rates.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64 (3): 479–98.