LOND {onlineFDR} | R Documentation |
Implements the LOND algorithm for online FDR control, where LOND stands for (significance) Levels based On Number of Discoveries, as presented by Javanmard and Montanari (2015).
LOND(
  d,
  alpha = 0.05,
  betai,
  dep = FALSE,
  random = TRUE,
  display_progress = FALSE,
  date.format = "%Y-%m-%d",
  original = TRUE
)
d |
Either a vector of p-values, or a dataframe with three columns: an identifier (‘id’), date (‘date’) and p-value (‘pval’). If no column of dates is provided, then the p-values are treated as being ordered sequentially with no batches. |
alpha |
Overall significance level of the FDR procedure, the default is 0.05. |
betai |
Optional vector of β_i. A default is provided as proposed by Javanmard and Montanari (2018), equation 31. |
dep |
Logical. If |
random |
Logical. If |
display_progress |
Logical. If |
date.format |
Optional string giving the format that is used for dates. |
original |
Logical. If |
The function takes as its input either a vector of p-values, or a dataframe with three columns: an identifier (‘id’), date (‘date’) and p-value (‘pval’). The case where p-values arrive in batches corresponds to multiple instances of the same date. If no column of dates is provided, then the p-values are treated as being ordered sequentially with no batches.
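As a small illustration of the batch convention, repeated dates can be counted with base R. The data frame `toy` below is made up for this sketch and only mirrors the three-column layout described above; it is not part of the package.

```r
# Hypothetical input in the id/date/pval layout; rows with equal dates form one batch.
toy <- data.frame(
  id   = c("A1", "A2", "A3", "A4"),
  date = as.Date(c("2014-12-01", "2014-12-01", "2015-09-21", "2016-05-19")),
  pval = c(0.001, 0.2, 0.03, 0.5)
)

# Number of p-values sharing each date (a count above 1 indicates a batch).
batch_sizes <- table(toy$date)
print(batch_sizes)
```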
The LOND algorithm controls the FDR for independent p-values (see below for the modification for dependent p-values). Given an overall significance level α, we choose a sequence of non-negative numbers β_i such that they sum to α. The values of the adjusted significance thresholds α_i are chosen as follows:
α_i = (D(i-1) + 1)β_i
where D(n) denotes the number of discoveries in the first n hypotheses.
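The threshold rule can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name `lond_sketch` is hypothetical, the sequence β_i = 6α/(π² i²) is an illustrative choice (non-negative, summing to α over all i) rather than the package default, and a hypothesis is assumed to count as a discovery when p_i ≤ α_i.

```r
# Minimal sketch of the LOND threshold rule (hypothetical helper, not onlineFDR code).
lond_sketch <- function(pval, alpha = 0.05) {
  n <- length(pval)
  # Illustrative beta sequence (an assumption): beta_i = 6 * alpha / (pi^2 * i^2).
  betai <- 6 * alpha / (pi^2 * seq_len(n)^2)
  alphai <- numeric(n)
  R <- logical(n)
  D <- 0  # D(i-1): number of discoveries among the first i-1 hypotheses
  for (i in seq_len(n)) {
    alphai[i] <- (D + 1) * betai[i]  # alpha_i = (D(i-1) + 1) * beta_i
    R[i] <- pval[i] <= alphai[i]     # assumed rejection rule
    D <- D + R[i]
  }
  data.frame(pval = pval, alphai = alphai, R = R)
}

res <- lond_sketch(c(2.90e-08, 0.06743, 0.01514, 0.08174, 0.00171))
print(res)
```

Each discovery raises the multiplier applied to later β_i, so thresholds later in the sequence depend on how many rejections came before them.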
A slightly modified version of LOND with thresholds α_i = max(D(i-1), 1)β_i provably controls the FDR under positive dependence (the PRDS condition); see Zrnic et al. (2021).
For arbitrarily dependent p-values, LOND controls the FDR if it is modified with β_i / H(i) in place of β_i, where H(i) is the i-th harmonic number.
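The adjustment for arbitrary dependence amounts to dividing each β_i by the harmonic number H(i) = 1 + 1/2 + ... + 1/i. A minimal base R sketch follows; the sequence `beta` is an illustrative (hypothetical) choice, not the package default, and the variable names are introduced here only for the example.

```r
# Harmonic-number adjustment for arbitrarily dependent p-values (illustrative sketch).
alpha <- 0.05
i <- seq_len(5)
beta <- 6 * alpha / (pi^2 * i^2)  # hypothetical beta sequence, sums to at most alpha
H <- cumsum(1 / i)                # H(i) = 1 + 1/2 + ... + 1/i
beta_dep <- beta / H              # used in place of beta_i under arbitrary dependence
print(data.frame(i = i, beta = beta, H = H, beta_dep = beta_dep))
```

Since H(i) ≥ 1 and grows with i, the adjusted values are never larger than the original β_i, which makes the procedure more conservative.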
Further details of the LOND algorithm can be found in Javanmard and Montanari (2015).
out |
A dataframe with the original data |
Javanmard, A. and Montanari, A. (2015) On Online Control of False Discovery Rate. arXiv preprint, https://arxiv.org/abs/1502.06197.
Javanmard, A. and Montanari, A. (2018) Online Rules for Control of False Discovery Rate and False Discovery Exceedance. Annals of Statistics, 46(2):526-554.
Zrnic, T., Ramdas, A. and Jordan, M.I. (2021) Asynchronous Online Testing of Multiple Hypotheses. Journal of Machine Learning Research (to appear), https://arxiv.org/abs/1812.05068.
LONDstar presents versions of LOND for asynchronous p-values, i.e. where a test can start before the previous test has finished.
sample.df <- data.frame(
    id = c('A15432', 'B90969', 'C18705', 'B49731', 'E99902',
           'C38292', 'A30619', 'D46627', 'E29198', 'A41418',
           'D51456', 'C88669', 'E03673', 'A63155', 'B66033'),
    date = as.Date(c(rep('2014-12-01',3),
                     rep('2015-09-21',5),
                     rep('2016-05-19',2),
                     '2016-11-12',
                     rep('2017-03-27',4))),
    pval = c(2.90e-08, 0.06743, 0.01514, 0.08174, 0.00171,
             3.60e-05, 0.79149, 0.27201, 0.28295, 7.59e-08,
             0.69274, 0.30443, 0.00136, 0.72342, 0.54757))

set.seed(1); LOND(sample.df)
LOND(sample.df, random=FALSE)
set.seed(1); LOND(sample.df, alpha=0.1)