optimize.sd_selection {BioTIP} | R Documentation |
optimize.sd_selection filters a multi-state dataset based on a cutoff value for the standard deviation per state, and optimizes the selection by repeating it over resampled subsets of samples (see B and percent).
By default, a cutoff value of 0.01 is used. This function is suggested when each state contains more than 10 samples.
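The per-state filtering described above can be illustrated with a minimal base R sketch. This is not BioTIP's implementation: `top_sd_rows` is a hypothetical helper, the toy matrix is simulated, and rounding the cutoff fraction to a row count is an assumption made for illustration.

```r
# Hypothetical helper (not part of BioTIP): within one state, keep the
# fraction `cutoff` of rows with the highest standard deviation.
top_sd_rows <- function(mat, samples, cutoff = 0.01) {
  sds <- apply(mat[, samples, drop = FALSE], 1, sd)   # SD of each row in this state
  n_keep <- max(1, round(cutoff * nrow(mat)))          # assumed rounding rule
  names(sort(sds, decreasing = TRUE))[seq_len(n_keep)]
}

set.seed(1)
# Simulated data: 200 transcripts (rows) by 12 samples (columns)
mat <- matrix(rnorm(200 * 12), nrow = 200,
              dimnames = list(paste0("gene", 1:200), paste0("s", 1:12)))
samplesL <- list(state1 = paste0("s", 1:6), state2 = paste0("s", 7:12))

# One vector of selected transcript IDs per state
selected <- lapply(samplesL, function(s) top_sd_rows(mat, s, cutoff = 0.05))
lengths(selected)
```

The selection is done separately for each state, so a transcript can be retained in one state and dropped in another.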
optimize.sd_selection(
  df,
  samplesL,
  B = 100,
  percent = 0.8,
  times = 0.8,
  cutoff = 0.01,
  method = c("other", "reference", "previous", "itself", "longitudinal reference"),
  control_df = NULL,
  control_samplesL = NULL
)
df |
A dataframe of numerics. The rows and columns represent unique transcript IDs (geneID) and sample names, respectively. |
samplesL |
A list of n vectors, where n equals the number of states. Each vector gives the sample names in a state. Note that the sample names in these vectors must be among the column names of the R object 'df'. |
B |
An integer indicating the number of times to run this optimization. Default is 100. |
percent |
A numeric value indicating the fraction of samples that will be selected in each round of simulation. Default is 0.8. |
times |
A numeric value indicating the percentage of |
cutoff |
A positive numeric value. Default is 0.01. If < 1, the function automatically
selects the top x percent of transcripts using a selection method (which is
either the |
method |
Selection of methods from
|
control_df |
A count matrix with unique loci as row names and the sample names
of control samples as column names, only used for method |
control_samplesL |
A list of character vectors, named by stage, giving the names of the control samples; required for method 'longitudinal reference'. |
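To make the roles of df, samplesL, B, percent and cutoff more concrete, here is a minimal base R sketch of repeated subsampling for a single state. It is an illustration under stated assumptions, not the package's algorithm: the data are simulated, the rounding rules are assumed, and `min_fraction` is a hypothetical threshold introduced only for this example (no correspondence with the `times` argument is claimed).

```r
set.seed(42)
# Simulated input: 50 loci (rows) by 20 samples (columns)
df <- as.data.frame(matrix(rnorm(50 * 20), nrow = 50,
        dimnames = list(paste0("loci", 1:50), paste0("s", 1:20))))
samplesL <- list(state1 = paste0("s", 1:10), state2 = paste0("s", 11:20))

# Every sample name in samplesL must be a column name of df
stopifnot(all(unlist(samplesL) %in% colnames(df)))

B <- 20              # number of resampling rounds (kept small for the sketch)
percent <- 0.8       # fraction of samples drawn in each round
cutoff <- 0.1        # fraction of rows kept in each round
min_fraction <- 0.8  # hypothetical: minimum share of rounds a row must be kept in

state <- samplesL$state1
n_keep <- max(1, round(cutoff * nrow(df)))

# One logical column per round: TRUE where the row is among the top-SD rows
hits <- replicate(B, {
  sub <- sample(state, size = round(percent * length(state)))
  sds <- apply(df[, sub, drop = FALSE], 1, sd)
  rownames(df) %in% names(sort(sds, decreasing = TRUE))[seq_len(n_keep)]
})

selection_rate <- rowMeans(hits)
names(selection_rate) <- rownames(df)
stable <- names(selection_rate)[selection_rate >= min_fraction]
```

Rows that are kept in most rounds are less likely to owe their high standard deviation to one or two samples, which is the motivation for resampling before filtering.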
A list of data frames of filtered transcripts: the transcripts with the highest standard
deviation are selected from df based on the assigned cutoff value. Each
resulting data frame is a subset of the raw input df.
Zhezhen Wang zhezhen@uchicago.edu
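# Toy count matrix: 2 loci (rows) by 30 samples (columns).
# 60 values are drawn so that the 2 x 30 matrix is filled without recycling.
counts = matrix(sample(1:100, 60), 2, 30)
colnames(counts) = 1:30
row.names(counts) = paste0('loci', 1:2)
# Sample annotation: 10 samples in each of three states
cli = cbind(1:30, rep(c('state1', 'state2', 'state3'), each = 10))
colnames(cli) = c('samples', 'group')
# One vector of sample names per state
samplesL <- split(cli[, 1], f = cli[, 'group'])
test_sd_selection <- optimize.sd_selection(counts, samplesL, B = 3, cutoff = 0.01)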