DaMiR.FSelect {DaMiRseq} | R Documentation |
This function identifies the class-correlated principal components (PCs), which are then used to implement a backward variable elimination procedure for the removal of non-informative features.
DaMiR.FSelect(data, df, th.corr = 0.6, type = c("spearman", "pearson"), th.VIP = 3, nPlsIter = 1)
data |
A transposed data frame or a matrix of normalized expression data. Rows and columns should be observations and features, respectively |
df |
A data frame with known variables; at least one column with 'class' label must be included |
th.corr |
Minimum threshold of correlation between class and PCs; default is 0.6 |
type |
Type of correlation metric; default is "spearman" |
th.VIP |
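Threshold on the variable importance in projection (VIP) used by bve_pls to remove non-important variables; default is 3 |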
nPlsIter |
Number of times that bve_pls has to run. Each iteration produces a set of selected features; the sets are usually similar to each other but not exactly the same. When nPlsIter is > 1, the intersection of the sets of selected features is taken, so that only the most robust features are selected. Default is 1 |
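The intersection step described for nPlsIter can be illustrated with a minimal base R sketch. The feature names and the three sets below are hypothetical stand-ins for the output of repeated bve_pls runs; this is not the package's internal code.

```r
# Hypothetical feature sets, standing in for the output of three bve_pls runs.
selected_per_iter <- list(
  iter1 = c("geneA", "geneB", "geneC", "geneD"),
  iter2 = c("geneB", "geneC", "geneD", "geneE"),
  iter3 = c("geneA", "geneB", "geneD")
)

# Keep only the features that were selected in every iteration.
robust_features <- Reduce(intersect, selected_per_iter)
print(robust_features)
```

Because a feature must appear in every set to survive, increasing the number of iterations can only keep the same features or fewer.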
The function aims to reduce the number of features to obtain the most informative variables for classification purposes. First, PCs obtained by principal component analysis (PCA) are correlated with "class". The minimum correlation is set by the user through the th.corr argument: the higher the threshold, the fewer PCs are returned. Users should set th.corr carefully, because it also affects the total number of selected features, which ultimately depends on the number of PCs.
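The effect of th.corr on the number of retained PCs can be sketched in base R. This is an illustration under stated assumptions, not DaMiRseq's internal implementation: the simulated matrix x, the factor cls, and the variable th_corr are hypothetical, PCA is done with stats::prcomp, and the class is coded numerically before computing the correlation.

```r
set.seed(1)
n <- 40
cls <- factor(rep(c("A", "B"), each = n / 2))

# Hypothetical expression matrix: observations in rows, features in columns.
x <- matrix(rnorm(n * 20), nrow = n,
            dimnames = list(NULL, paste0("gene", 1:20)))
# Shift the first five features in class "B" so that some PCs track the class.
x[cls == "B", 1:5] <- x[cls == "B", 1:5] + 4

# PCA on the centered and scaled data.
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Absolute correlation of each PC score vector with the numerically coded class.
pc_class_cor <- abs(cor(pca$x, as.numeric(cls), method = "spearman"))[, 1]

# PCs whose correlation with class reaches the threshold.
th_corr <- 0.6
selected_pcs <- names(pc_class_cor)[pc_class_cor >= th_corr]

# A stricter threshold keeps the same PCs or a subset of them.
stricter_pcs <- names(pc_class_cor)[pc_class_cor >= 0.8]

print(selected_pcs)
print(stricter_pcs)
```

Raising the threshold can only shrink the set of retained PCs, which is why the choice of th.corr propagates to the number of features selected downstream.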
The bve_pls function of the plsVarSel package is then applied. This function exploits a backward variable elimination procedure coupled with a partial least squares approach to remove those variables which are less informative with respect to class. The returned vector of variables is further reduced by the subsequent DaMiR.FReduct function in order to obtain a subset of non-correlated putative predictors.
A list containing:
An expression matrix with only informative features.
A data frame with class and optional variables information.
Mattia Chiesa, Luca Piacentini
Tahir Mehmood, Kristian Hovde Liland, Lars Snipen and Solve Saebo (2011). A review of variable selection methods in Partial Least Squares Regression. Chemometrics and Intelligent Laboratory Systems 118, pp. 62-69.
# use example data:
data(data_norm)
data(df)
# extract expression data from SummarizedExperiment object
# and transpose the matrix:
t_data <- t(assay(data_norm))
t_data <- t_data[, seq_len(100)]
# select class-related features
data_reduced <- DaMiR.FSelect(t_data, df, th.corr = 0.7, type = "spearman", th.VIP = 1)