dAllocate {DepecheR}    R Documentation
This function allocates the observations of a dataset to a set of pre-established cluster centers. It is intended for the test set in train-test dataset situations.
dAllocate(inDataFrame, depModel)
inDataFrame
    The dataset to be allocated to a set of cluster centers, for example a richer but less representative dataset that contains all data points from all donors, rather than only a fixed number of observations from each.
depModel
    The result of the original application of the depeche function to the associated, more representative dataset.
A vector with the same length as the number of rows in inDataFrame, giving the cluster identity of each observation.
# Retrieve some example data
data(testData)
## Not run: 
# Now arbitrarily (for the sake of the example) divide the data into a
# training- and a test set.
testDataSample <- sample(1:nrow(testData), size = 10000)
testDataTrain <- testData[testDataSample, ]
testDataTest <- testData[-testDataSample, ]

# Run the depeche function for the train set
depeche_train <- depeche(testDataTrain[, 2:15],
    maxIter = 20,
    sampleSize = 1000
)

# Allocate the test dataset to the centers of the train dataset
depeche_test <- dAllocate(testDataTest[, 2:15], depeche_train)

# And finally plot the two groups to see how great the overlap was:
clustVecList <- list(
    list("Ids" = testDataTrain$ids, "Clusters" = depeche_train$clusterVector),
    list("Ids" = testDataTest$ids, "Clusters" = depeche_test)
)
tablePerId <- do.call("rbind", lapply(seq_along(clustVecList), function(x) {
    locDat <- clustVecList[[x]]
    locRes <- apply(
        as.matrix(table(locDat$Ids, locDat$Clusters)), 1,
        function(y) y / sum(y)
    )
    locResLong <- reshape2::melt(locRes)
    colnames(locResLong) <- c("Cluster", "Donor", "Fraction")
    locResLong$Group <- x
    locResLong
}))
tablePerId$Cluster <- as.factor(tablePerId$Cluster)
tablePerId$Group <- as.factor(tablePerId$Group)
library(ggplot2)
ggplot(data = tablePerId, aes(x = Cluster, y = Fraction, fill = Group)) +
    geom_boxplot() +
    theme_bw()

## End(Not run)
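
The general idea of allocating new observations to pre-established cluster centers can be illustrated without DepecheR. The following is a minimal conceptual sketch in base R, assuming plain squared Euclidean distance and centers obtained from stats::kmeans. The helper allocateToNearestCenter and the toy data are hypothetical, introduced only for illustration. The sketch does not reproduce the internals or any preprocessing of dAllocate.

```r
# Conceptual sketch only: allocate each row of a numeric data frame to the
# nearest of a set of pre-established centers (squared Euclidean distance).
# allocateToNearestCenter is a hypothetical helper, not part of DepecheR.
allocateToNearestCenter <- function(inDataFrame, centers) {
    dataMat <- as.matrix(inDataFrame)
    centerMat <- as.matrix(centers)
    stopifnot(ncol(dataMat) == ncol(centerMat))
    # Squared distance from every observation (rows) to every center (columns)
    distMat <- vapply(seq_len(nrow(centerMat)), function(k) {
        rowSums(sweep(dataMat, 2, centerMat[k, ])^2)
    }, numeric(nrow(dataMat)))
    # Keep the matrix shape even when there is only a single observation
    distMat <- matrix(distMat, nrow = nrow(dataMat))
    # Index of the closest center for each observation
    unname(apply(distMat, 1, which.min))
}

# Toy training set with two well-separated groups (made-up data)
set.seed(1)
trainSet <- data.frame(
    x = c(rnorm(50, 0), rnorm(50, 5)),
    y = c(rnorm(50, 0), rnorm(50, 5))
)
# Toy test set that was not used to establish the centers
testSet <- data.frame(x = c(0.1, 4.9, 5.2), y = c(-0.2, 5.1, 4.8))

# Establish the centers on the training set, then allocate the test set
trainModel <- kmeans(trainSet, centers = 2)
testClusters <- allocateToNearestCenter(testSet, trainModel$centers)

# As with dAllocate, the result has one cluster identity per input row
length(testClusters) == nrow(testSet)
```

The returned vector has the same shape as the value described above: one cluster identity per row of the input, numbered according to the rows of the center matrix.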