upSample {CNPBayes}                R Documentation
Description

For large datasets (several thousand subjects), the computational burden of fitting Bayesian mixture models can be high. Downsampling can reduce this burden with little effect on inference. The function tileMedians places the median log R ratios for each subject into buckets; the observations in each bucket are then averaged. This is done independently for each batch, and the range of median log R ratios within each bucket is guaranteed to be less than 0.05. Note that this function requires specification of a batch variable. If the study was small enough that all samples were processed in a single batch, downsampling would not be needed. Because the observations are summarized within each batch, the SingleBatchModels (SB or SBP) and MultiBatchModels (MB or MBP) are fit to the same data and remain comparable by marginal likelihoods or Bayes factors.
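The bucketing can be pictured with a minimal sketch. This is not the CNPBayes implementation (it assumes dplyr and does not enforce the 0.05 range guarantee), but it shows the per-batch tiling and averaging:

library(dplyr)

## simulated median log R ratios for 3000 subjects in two batches
set.seed(123)
dat <- tibble(lrr   = rnorm(3000, sd = 0.2),
              batch = sample(c("A", "B"), 3000, replace = TRUE))

## within each batch, cut the values into nt buckets of roughly
## equal size, then average the observations in each bucket
nt <- 200
summaries <- dat %>%
  group_by(batch) %>%
  mutate(tile = ntile(lrr, nt)) %>%
  group_by(batch, tile) %>%
  summarize(avgLRR = mean(lrr), n = n(), .groups = "drop")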
Usage

upSample(model, tiles)

tileMedians(y, nt, batch)

tileSummaries(tiles)

## S4 method for signature 'MultiBatchModel'
upSample(model, tiles)

## S4 method for signature 'MixtureModel'
upSample(model, tiles)
Arguments

model
    an SB, MB, SBP, or MBP model

tiles
    a tibble as constructed by tileMedians

y
    a vector containing the data (median log R ratios)

nt
    the number of observations per batch

batch
    a vector of labels indicating the batch from which each observation came
Value

upSample returns an SB, MB, SBP, or MBP model.

tileMedians returns a tibble with a tile assigned to each log R ratio.
Examples

mb <- MultiBatchModelExample
## downsample: bucket the log R ratios by batch and average each bucket
tiled.medians <- tileMedians(y(mb), 200, batch(mb))
tile.summaries <- tileSummaries(tiled.medians)
## fit the model to the tile summaries rather than the full data
mp <- McmcParams(iter = 50, burnin = 100)
mb <- MultiBatchModel2(dat = tile.summaries$avgLRR,
                       batches = tile.summaries$batch,
                       mp = mp)
mb <- posteriorSimulation(mb)
ggMixture(mb)
## map the fit back to the original observations
mb2 <- upSample(mb, tiled.medians)
ggMixture(mb2)
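A quick sanity check can follow tileMedians. The sketch below assumes the tiles tibble has per-observation columns named logratio, batch, and tile; these names are hypothetical, so inspect tiled.medians for the actual column names:

library(dplyr)

## spread of log R ratios within each bucket; tileMedians guarantees
## that each bucket spans a range of less than 0.05
tiled.medians %>%
  group_by(batch, tile) %>%                    # hypothetical column names
  summarize(spread = diff(range(logratio)), .groups = "drop") %>%
  summarize(max_spread = max(spread))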