1 Simulation setting and methods compared

As IHW has been benchmarked thoroughly elsewhere (for example, the rest of this package contains many such simulations), here we restrict ourselves to a small study to illustrate the effects of censoring (as the censoring threshold $\tau$ varies) and null-proportion adaptivity.

In particular, we consider the following example of a simple conditional two-groups model (for two different values of $\pi_0 \in \{0.7, 0.9\})$ :

$\begin{aligned} &X_i \sim U[0,2.5]\\ &H_i \mid X_i \sim \text{Bernoulli}(\pi_0)\\ &Z_i \mid (H_i = 0, X_i) \sim \mathcal{N}(0,1)\\ &Z_i \mid (H_i = 1, X_i) \sim \mathcal{N}(X_i,1)\\ &P_i = 1 - \Phi(X_i) \end{aligned}$ We compare the following approaches, using the nominal level $\alpha=0.1$ , $P_i$ as the p-values and $X_i$ as the covariates (except for BH which does not account for covariates):

BH (Benjamini and Hochberg 1995)
IHW
IHW with Storey $\pi_0$ correction ( $\tau'=0.5$ )
IHWc with different censoring thresholds $\tau$
IHWc with Storey $\pi_0$ correction ( $\tau'=0.5$ ) and different censoring thresholds $\tau$ .

The censored IHW (IHWc) variant was implemented as follows:

We censor the p-values, i.e. we set all p-values below $\tau$ to 0.
As in classical IHW, we discretize the covariate into a finite number of strata (20) and also randomly split the hypotheses into 5 folds.
Within each combination of stratum and fold, we fit the Beta-Uniform model with the EM-method of (Markitsis and Lai 2010), which can take censoring into account.
For each censored p-value, we draw a new p-value from its fitted Beta-Uniform model conditionally on it being $\leq \tau$ , i.e. we impute all censored p-values.
Then we use regular IHW to learn the weight functions.
We use weighted BH after cross-weighting the original (i.e. non-censored/non-imputed) p-values, making sure we do not reject any p-value $\leq \tau$ .

For the exact details of the simulation and also to reproduce it, see the folder “inst/theory_paper/ihwc_wasserman_normal_simulation.R” of this package.

2 Simulation results

library("ggplot2")
library("grid")
library("dplyr")
library("tidyr")
library("cowplot")
library("IHWpaper")

# color from https://github.com/dill/beyonce 
# beyonce::beyonce_palette(30,5)
colors <- c("#040320", "#352140", "#871951", "#EB4A60", "#CFAB7A")

sim_folder <- system.file("simulation_benchmarks/theory_paper",
                              package = "IHWpaper")
sim_df <- readRDS(file.path(sim_folder, "ihwc_wasserman_normal_sim.Rds"))

taus_df <- expand.grid(fdr_pars = unique(sim_df$fdr_pars), 
                       fdr_method = c("BH","IHW","IHW-Storey"),
                       sim_pars = unique(sim_df$sim_pars)) %>%
           na.exclude() 

df1 <- left_join(taus_df, select(sim_df, -fdr_pars), by=c("fdr_method","sim_pars"))  %>%
             select(-fdr_pars.y) %>% rename(fdr_pars = fdr_pars.x)

## Warning: Detecting old grouped_df format, replacing `vars` attribute by
## `groups`

## Adding missing grouping variables: `fdr_pars`

## Warning: Column `fdr_method` joining factor and character vector, coercing
## into character vector

## Warning: Column `sim_pars` joining factor and character vector, coercing into
## character vector

df <- full_join(df1, filter(sim_df, fdr_method %in% c("IHWc_Storey","IHWc"))) %>%
      mutate( fdr_method = ifelse(fdr_method=="IHWc_Storey","IHWc-Storey", fdr_method))

## Joining, by = c("fdr_pars", "fdr_method", "sim_pars", "alpha", "FDR", "power", "rj_ratio", "FPR", "FDR_sd", "FWER", "nsuccessful", "sim_method", "m")

pi0s <- sapply(strsplit(df$sim_pars, split="[[:punct:]]"), function(x) paste0(x[[2]],".",x[[3]]))
df$pi0s <- pi0s

fdr_plot <- ggplot(df, aes(x=fdr_pars, y=FDR, col=fdr_method)) + geom_line() + 
                facet_grid(.~pi0s, labeller=label_bquote(cols=pi[0] == .(pi0s))) + 
                scale_x_log10() + 
                scale_color_manual(values=colors) +
                theme(legend.title=element_blank()) + 
                xlab(expression(tau~"(censoring threshold)")) + 
                ylab("FDR")
fdr_plot

power_plot <- ggplot(df, aes(x=fdr_pars, y=power, col=fdr_method)) + geom_line() + 
                facet_grid(.~pi0s, labeller=label_bquote(cols=pi[0] == .(pi0s))) + 
                scale_x_log10() + 
                scale_color_manual(values=colors) +
                theme(legend.title=element_blank()) + 
                xlab(expression(tau~"(censoring threshold)")) + 
                ylab("Power")
power_plot

3 Discussion of simulation results

The first of the above figures shows the FDR of the procedures as $\tau$ varies (note that BH/IHW/IHW-Storey do not depend on $\tau$ ). Formally, only IHWc and IHWc-Storey have provable finite-sample control. However, we see that all methods are below the nominal $\alpha=0.1$ . Furthermore, IHW-Storey is at essentially exactly $0.1$ , i.e. by estimating $\pi_0$ it controls FDR at exactly the nominal level. This is also the case for IHWc-Storey when it is properly tuned. IHWc and IHWc can have FDR way below $\alpha$ for improper choice of the tuning parameter $\tau$ .

The second plot shows the power of the procedures (defined as the expected proportion of true discoveries among all false null hypotheses). Of course for $\pi_0 = 0.7$ all methods have a lot more power than for $\pi_0=0.9$ . We also observe the theoretically expected trade-off for the choice of censoring threshold $\tau$ : If it is chosen way too large, then we cannot estimate the underlying conditional two-groups model well, thus the weights are suboptimal and we lose power. On the other hand, if $\tau$ is too small, then we lose a lot of power since we cannot reject the p-values below $\tau$ . BH has a lot less power compared to IHW/IHW-Storey and also IHWc/IHWc-Storey for a wide range of censoring thresholds $\tau$ .

References

Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society. Series B (Statistical Methodology). JSTOR, 289–300.

Markitsis, Anastasios, and Yinglei Lai. 2010. “A Censored Beta Mixture Model for the Estimation of the Proportion of Non-Differentially Expressed Genes.” Bioinformatics 26 (5). Oxford Univ Press:640–46.

IHW-censoring & Storey simulations

5 November 2019

Package

Contents

1 Simulation setting and methods compared

2 Simulation results

3 Discussion of simulation results

References