skKMeans {BiocSklearn} | R Documentation |
interface to sklearn.cluster.KMeans using basilisk discipline
skKMeans(mat, ...)
mat    a matrix-like object, or a reference to one
...    additional arguments passed to sklearn.cluster.KMeans
A list with cluster assignments (integers starting at zero) and the asserted cluster centers.
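Because the cluster assignments start at zero while R indexes from one, the labels usually need shifting before they can be used to subset R objects. The following is a minimal sketch in base R only; `res` is a hypothetical stand-in for a returned list, with made-up values, and the component names `labels` and `centers` are assumed from the example in this page.

```r
# Hypothetical stand-in for the list returned by skKMeans; the values are
# invented for illustration and the component names are an assumption.
res <- list(
  labels  = c(0L, 0L, 1L, 2L, 1L, 0L),
  centers = matrix(c(1, 5, 9, 2, 6, 10), nrow = 3)
)

# Shift zero-based labels to one-based so they can index R objects.
cl <- res$labels + 1L

# Look up the center assigned to each observation (one row per observation).
assigned <- res$centers[cl, , drop = FALSE]

# Cluster sizes, using the one-based labels.
sizes <- table(cluster = cl)
print(sizes)
```

Keeping the original zero-based labels alongside the shifted copy makes it easier to compare results with output produced directly in Python.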
You can use py_help(SklearnEls()$skcl$KMeans) to get Python documentation on parameters and return structure.
This is a demonstrative interface to the resources of sklearn.cluster.
In this particular interface, we are using sklearn.cluster.k_means_.KMeans.
There are many other possibilities in sklearn.cluster: _dbscan_inner, feature_agglomeration, hierarchical, k_means, k_means_elkan, affinity_propagation, bicluster, birch, dbscan, hierarchical, k_means, mean_shift, setup, spectral.
## Not run: 
# This blocked example shows a risky approach evading basilisk discipline and is
# to be used at your own risk.
# start with numpy array reference as data
irloc = system.file("csv/iris.csv", package="BiocSklearn")
skels = SklearnEls()
irismat = skels$np$genfromtxt(irloc, delimiter=',')
ans = skKMeans(irismat, n_clusters=2L)
names(ans) # names of available result components
table(iris$Species, ans$labels_)
# now use an HDF5 reference
irh5 = system.file("hdf5/irmat.h5", package="BiocSklearn")
fref = skels$h5py$File(irh5)
ds = fref$`__getitem__`("quants") # thanks Samuela Pollack!
ans2 = skKMeans(skels$np$array(ds)$T, n_clusters=2L) # HDF5 matrix is transposed relative to python array layout! Is the np$array conversion unduly costly?
table(ans$labels_, ans2$labels_)
ans3 = skKMeans(skels$np$array(ds)$T, n_clusters=8L, max_iter=200L, algorithm="full", random_state=20L)

## End(Not run)

dem = skKMeans(iris[,1:4], n_clusters=3L, max_iter=100L, algorithm="full", random_state=20L)
str(dem)
tab = table(iris$Species, dem$labels)
tab
plot(iris[,1], iris[,3], col=as.numeric(factor(iris$Species)))
points(dem$centers[,1], dem$centers[,3], pch=19, col=apply(tab,2,which.max))
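The final points() call colors each cluster center by the species that dominates that cluster, using apply(tab, 2, which.max). The sketch below isolates that step so it can be run without Python. It substitutes stats::kmeans from base R for skKMeans, so the cluster labels are one-based and the fitted clusters may differ from the sklearn result; the seed and the nstart value are arbitrary choices for illustration.

```r
# stats::kmeans is a base-R stand-in here, not the sklearn implementation.
set.seed(20)
km <- kmeans(iris[, 1:4], centers = 3, nstart = 10)

# Rows are species (in factor-level order), columns are clusters.
tab <- table(iris$Species, km$cluster)

# For each cluster (column), the row index of the species with the most members.
majority <- apply(tab, 2, which.max)

# Translate the row indices back to species names.
mapping <- data.frame(
  cluster = colnames(tab),
  species = levels(iris$Species)[majority]
)
print(mapping)
```

Because the row indices coincide with as.numeric(factor(iris$Species)), the same vector can be passed as col so that each center takes the color of its majority species.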