.prior_flowClust1d {openCyto} | R Documentation |
We elicit data-driven prior parameters from a flowSet object for a
specified channel. For each sample in the flowSet object, we apply a
kernel-density estimator (KDE) and identify its local maxima (peaks).
We then aggregate these peaks to elicit prior parameters for each of K
mixture components.
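The per-sample peak-finding step can be sketched in base R as follows. This is a minimal illustration, not openCyto's internal code. The helper find_kde_peaks() is hypothetical, the samples are simulated numeric vectors standing in for one channel of a flowSet, and a peak is assumed to be any grid point of stats::density() that is higher than both of its neighbours.

```r
# Hypothetical helper (not part of openCyto): locations of the local maxima
# of a Gaussian KDE computed with stats::density().
find_kde_peaks <- function(x, adjust = 2) {
  dens <- stats::density(x, adjust = adjust)
  y <- dens$y
  # sign(diff(y)) switches from +1 to -1 exactly at a local maximum,
  # so the second difference equals -2 there; +1 shifts to the peak index
  is_peak <- which(diff(sign(diff(y))) == -2) + 1
  dens$x[is_peak]
}

# Simulated stand-in for a flowSet: one numeric vector per sample
set.seed(1)
samples <- list(
  s1 = c(rnorm(500, 0, 1), rnorm(500, 6, 1)),
  s2 = c(rnorm(500, 0.5, 1), rnorm(500, 6.5, 1))
)
peaks <- lapply(samples, find_kde_peaks)
```

Each element of peaks holds the peak locations for one sample; in general the samples need not return the same number of peaks.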
.prior_flowClust1d(flow_set, channel, K = NULL, hclust_height = NULL, clust_method = c("kmeans", "hclust"), hclust_method = "complete", artificial = NULL, nu0 = 4, w0 = 10, adjust = 2, min = -200, max = NULL, vague = TRUE)
flow_set |
a flowSet object |
channel |
the channel in the flowSet object for which the prior parameters are elicited |
K |
the number of mixture components to identify. By default, this value
is NULL and is ignored; see Details. If provided, it takes priority over
hclust_height. |
hclust_height |
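the height at which to cut the hclust tree of peaks. If NULL (the
default), the median of the distances between adjacent peaks within each
sample is used; see Details. |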
clust_method |
the method used to cluster peaks together for prior elicitation, either
"kmeans" or "hclust". By default, "kmeans" is used. |
hclust_method |
the agglomeration method used in the hierarchical
clustering. This value is passed directly to hclust. |
artificial |
a numeric vector containing prior means for artificial
mixture components. The remaining prior parameters for the artificial
components are copied directly from the most informative elicited prior
component. If NULL (the default), no artificial components are added. |
nu0 |
prior degrees of freedom of the Student's t mixture components. |
w0 |
the number of prior pseudocounts of the Student's t mixture components. |
adjust |
the bandwidth to use in the kernel density estimation. See
density for details. |
min |
a numeric value that sets the lower bound for data filtering. If
NULL, no lower bound is applied. |
max |
a numeric value that sets the upper bound for data filtering. If
NULL (the default), no upper bound is applied. |
vague |
a logical value (default TRUE). |
Here, we outline the approach used for prior elicitation. First, we apply a
KDE to each sample and extract all of its peaks (local maxima). Because
different samples may have different numbers of peaks, we must align the
peaks before aggregating the information across samples. To do this, we use
a technique similar to the peak probability contrasts (PPC) method of
Tibshirani et al. (2004): we apply hierarchical clustering to the peaks
pooled from all samples to find clusters of peaks. Within each cluster, the
sample mean and sample variance of the peaks give, respectively, the prior
mean and the hyperprior variance of that mean for one flowClust mixture
component. We elicit the prior variance for each mixture component by first
assigning the observations within each sample to the nearest prior mean and
then computing the variance of the observations assigned to each component.
Finally, we average these variances for each mixture component across all
samples in the flowSet object.
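A minimal sketch of the aggregation step follows, assuming the peaks have already been assigned to clusters (the labels 1 to K in peak_cluster) and that each sample is a plain numeric vector. The helper elicit_priors() and the names of its returned list elements are hypothetical and do not describe the list this function actually returns; the peak locations are made up for illustration.

```r
# Hypothetical helper (not part of openCyto). Assumes peaks are already
# clustered and every cluster contains at least two peaks.
elicit_priors <- function(samples, peaks, peak_cluster) {
  pooled <- unlist(peaks, use.names = FALSE)
  # Prior mean and hyperprior variance: mean and variance of the peaks
  # within each cluster of peaks
  prior_mean <- as.numeric(tapply(pooled, peak_cluster, mean))
  hyperprior_var <- as.numeric(tapply(pooled, peak_cluster, var))
  K <- length(prior_mean)

  # Prior variance: assign each observation to its nearest prior mean,
  # compute the variance of each resulting group within the sample,
  # then average these variances across samples
  per_sample <- sapply(samples, function(x) {
    nearest <- apply(abs(outer(x, prior_mean, "-")), 1, which.min)
    sapply(seq_len(K), function(k) var(x[nearest == k]))
  })
  prior_var <- rowMeans(matrix(per_sample, nrow = K))

  list(prior_mean = prior_mean, hyperprior_var = hyperprior_var,
       prior_var = prior_var)
}

set.seed(1)
samples <- list(s1 = c(rnorm(500, 0, 1), rnorm(500, 6, 1)),
                s2 = c(rnorm(500, 0.4, 1), rnorm(500, 6.2, 1)))
peaks <- list(s1 = c(0, 6), s2 = c(0.4, 6.2))  # made-up peak locations
priors <- elicit_priors(samples, peaks, peak_cluster = c(1, 2, 1, 2))
```

Here priors$prior_mean is c(0.2, 6.1) and priors$hyperprior_var is c(0.08, 0.02), the mean and variance of the two peaks in each cluster.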
Following Tibshirani et al. (2004), we cluster the peaks pooled from all
samples using complete-linkage hierarchical clustering. The linkage type can
be changed via the hclust_method argument, which is passed directly to
hclust.
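For illustration, pooling the peaks across samples and building the tree with two different linkage types might look like this in base R. The peak values are made up; only stats::dist() and stats::hclust() are real functions here, and method plays the role that hclust_method plays in this function.

```r
# Made-up peak locations for three samples; samples may differ in peak count
peaks <- list(s1 = c(0.1, 5.9), s2 = c(0.3, 3.1, 6.2), s3 = c(-0.2, 6.0))
pooled <- unlist(peaks, use.names = FALSE)

# Complete linkage (the documented default) versus average linkage
tree_complete <- hclust(dist(pooled), method = "complete")
tree_average  <- hclust(dist(pooled), method = "average")
```

Both trees have one merge height per join, that is, one fewer than the number of pooled peaks.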
To cluster the peaks, we must cut the hierarchical tree, either by selecting
a value for K or by providing the height at which to cut the tree. By
default, we cut the tree at a height equal to the median of the distances
between adjacent peaks within each sample. This value can be changed via the
hclust_height argument, which, if provided, is passed to cutree. Also, by
default, the number of mixture components K is NULL and is ignored. However,
if K is provided, it takes priority over hclust_height and is passed directly
to cutree instead.
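The cutting rule above can be sketched with a hypothetical helper, cut_peak_tree(), which is not part of openCyto. It assumes at least one sample has two or more peaks, so that the default height (the median of the adjacent-peak distances, pooled over samples) is defined.

```r
# Hypothetical helper (not part of openCyto) mirroring the rule in the text.
cut_peak_tree <- function(peaks, K = NULL, hclust_height = NULL,
                          hclust_method = "complete") {
  pooled <- unlist(peaks, use.names = FALSE)
  tree <- hclust(dist(pooled), method = hclust_method)
  # A supplied K takes priority over any height
  if (!is.null(K)) {
    return(cutree(tree, k = K))
  }
  if (is.null(hclust_height)) {
    # Distances between adjacent sorted peaks within each sample;
    # a sample with a single peak contributes no distances
    gaps <- unlist(lapply(peaks, function(p) diff(sort(p))))
    hclust_height <- median(gaps)
  }
  cutree(tree, h = hclust_height)
}

peaks <- list(s1 = c(0.1, 5.9), s2 = c(0.3, 3.1, 6.2), s3 = c(-0.2, 6.0))
default_cut <- cut_peak_tree(peaks)   # height = median(c(5.8, 2.8, 3.1, 6.2))
k3_cut <- cut_peak_tree(peaks, K = 3, hclust_height = 100)
```

With these made-up peaks, the default height of 4.45 yields two clusters, while k3_cut yields three because K overrides hclust_height.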
To ensure that the KDEs are smooth, we recommend setting the adjust argument
to a sufficiently large value; its default is 2. If the bandwidth is too
small, the KDE may contain numerous bumps, resulting in spurious peaks.
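The effect of the bandwidth on spurious peaks can be checked in base R. In this sketch, count_kde_peaks() is a hypothetical helper, the data are a simulated two-component mixture, and we assume adjust is passed to stats::density(), where it multiplies the default bandwidth.

```r
# Hypothetical helper (not part of openCyto): number of local maxima of a KDE
count_kde_peaks <- function(x, adjust) {
  y <- stats::density(x, adjust = adjust)$y
  sum(diff(sign(diff(y))) == -2)
}

set.seed(42)
x <- c(rnorm(300, 0, 1), rnorm(300, 5, 1))   # two true modes
n_rough  <- count_kde_peaks(x, adjust = 0.2) # small bandwidth: bumpy KDE
n_smooth <- count_kde_peaks(x, adjust = 2)   # the documented default
```

With the larger multiplier the KDE recovers the two true modes, whereas the small multiplier typically adds extra local maxima that would be treated as peaks.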
A list of prior parameters.
Tibshirani, R. et al. (2004), "Sample classification from protein mass spectrometry, by 'peak probability contrasts'," Bioinformatics, 20(17), 3034-3044. http://bioinformatics.oxfordjournals.org/content/20/17/3034.