BiocNeighbors 1.18.0
Another application of the KMKNN or VP tree algorithms is to identify all neighboring points within a certain distance of the current point. The default distance here is Euclidean, but again, we can set distance="Manhattan" in the BNPARAM object if so desired.
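As a minimal sketch of the distance option mentioned above, the block below passes a KmknnParam object with distance="Manhattan" through the BNPARAM argument. The small matrix `toy` and the threshold of 2 are illustrative choices and do not come from this vignette; the brute-force check at the end is included only to show what "within a certain distance" means under the Manhattan metric.

```r
library(BiocNeighbors)

set.seed(100)
# Hypothetical toy data: 200 points in 5 dimensions.
toy <- matrix(runif(200 * 5), ncol=5)

# Range search under Manhattan (L1) distance instead of the Euclidean default.
man <- findNeighbors(toy, threshold=2,
    BNPARAM=KmknnParam(distance="Manhattan"))

# Brute-force Manhattan distances from the first point to every point.
d1 <- colSums(abs(t(toy) - toy[1,]))
expected <- which(d1 <= 2)

# The reported order is arbitrary, so sort before comparing.
identical(sort(as.integer(man$index[[1]])), expected)
```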
We first mock up some data:
nobs <- 10000
ndim <- 20
data <- matrix(runif(nobs*ndim), ncol=ndim)
We apply the findNeighbors() function to data:
fout <- findNeighbors(data, threshold=1)
head(fout$index)
## [[1]]
## [1] 8858 9358 9130 8420 31 1 3750 6487 4818 7327 6041 4153 3489 7689
##
## [[2]]
## [1] 8767 2030 9066 731 2 9521 2895 8189
##
## [[3]]
## [1] 6923 127 7562 9453 2350 3 425 7667 3491 2670 7094 3430 3354 9747 1973
## [16] 3392 1564 9201 6013 6833 3233 4857 8230 5071 9524 4195 7785 8506 253 3437
## [31] 3465 2102 5239
##
## [[4]]
## [1] 4 3167 5757
##
## [[5]]
## [1] 5
##
## [[6]]
## [1] 5589 1199 8223 8001 9281 6
head(fout$distance)
## [[1]]
## [1] 0.9942257 0.8791539 0.9305256 0.9841900 0.8169466 0.0000000 0.9331852
## [8] 0.9493077 0.8917474 0.9483454 0.9564943 0.9743096 0.8555653 0.8494570
##
## [[2]]
## [1] 0.9548969 0.9586163 0.9150596 0.9576625 0.0000000 0.9501475 0.9806113
## [8] 0.8480450
##
## [[3]]
## [1] 0.9558596 0.8863850 0.9948984 0.9448565 0.8580069 0.0000000 0.9735133
## [8] 0.8381879 0.9594610 0.9369485 0.9908432 0.9106234 0.9688437 0.8544111
## [15] 0.9168398 0.9812346 0.8411358 0.9297719 0.8405916 0.9300225 0.8865665
## [22] 0.9936256 0.9531833 0.9221648 0.8490360 0.9591853 0.9759956 0.9621819
## [29] 0.9892061 0.9626584 0.9288345 0.8770389 0.9150011
##
## [[4]]
## [1] 0.0000000 0.7935779 0.9971574
##
## [[5]]
## [1] 0
##
## [[6]]
## [1] 0.9081928 0.9685665 0.9601649 0.9757782 0.8661774 0.0000000
Each entry of the index list corresponds to a point in data and contains the row indices of all points in data that lie within threshold of it. This includes the point itself, at a distance of zero.
For example, the 3rd point in data has the following neighbors:
fout$index[[3]]
## [1] 6923 127 7562 9453 2350 3 425 7667 3491 2670 7094 3430 3354 9747 1973
## [16] 3392 1564 9201 6013 6833 3233 4857 8230 5071 9524 4195 7785 8506 253 3437
## [31] 3465 2102 5239
… with the following distances to those neighbors:
fout$distance[[3]]
## [1] 0.9558596 0.8863850 0.9948984 0.9448565 0.8580069 0.0000000 0.9735133
## [8] 0.8381879 0.9594610 0.9369485 0.9908432 0.9106234 0.9688437 0.8544111
## [15] 0.9168398 0.9812346 0.8411358 0.9297719 0.8405916 0.9300225 0.8865665
## [22] 0.9936256 0.9531833 0.9221648 0.8490360 0.9591853 0.9759956 0.9621819
## [29] 0.9892061 0.9626584 0.9288345 0.8770389 0.9150011
Note that, for this function, the reported neighbors are not sorted by distance. The order of the output is completely arbitrary and will vary depending on the random seed. However, the identity of the neighbors is fully deterministic.
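If a distance-sorted ordering is needed downstream, the neighbors can be reordered after the search. The following base R sketch shows one way to do so. Here `res` is a small hand-written stand-in with the same index/distance list structure as the findNeighbors() output, and sortByDistance() is a hypothetical helper name, not a function from BiocNeighbors.

```r
# Hand-written stand-in mimicking the structure of findNeighbors() output.
res <- list(
    index=list(c(7L, 1L, 4L), c(2L), c(9L, 3L)),
    distance=list(c(0.8, 0.0, 0.5), c(0.0), c(0.3, 0.0))
)

# Hypothetical helper: reorder each point's neighbors by increasing distance.
sortByDistance <- function(x) {
    o <- lapply(x$distance, order)
    list(
        index=Map(function(i, ord) i[ord], x$index, o),
        distance=Map(function(d, ord) d[ord], x$distance, o)
    )
}

sorted <- sortByDistance(res)
sorted$index[[1]]
sorted$distance[[1]]
```

The same helper could be applied to a real result such as fout, provided both the indices and the distances were requested.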
The queryNeighbors() function is also provided for identifying all points within a certain distance of a query point.
Given a query data set:
nquery <- 1000
ndim <- 20
query <- matrix(runif(nquery*ndim), ncol=ndim)
… we apply the queryNeighbors() function:
qout <- queryNeighbors(data, query, threshold=1)
length(qout$index)
## [1] 1000
… where each entry of qout$index corresponds to a row of query and contains its neighbors in data.
Again, the order of the output is arbitrary but the identity of the neighbors is deterministic.
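To make the meaning of this output concrete, the following base R sketch performs the same kind of range query by brute force under Euclidean distance. bruteQuery() is a hypothetical helper written for illustration only. It does not reflect how queryNeighbors() works internally and would be far slower on large data; the small matrices are likewise made up for the example.

```r
# Hypothetical brute-force range query using Euclidean distance.
bruteQuery <- function(data, query, threshold) {
    lapply(seq_len(nrow(query)), function(i) {
        # Distance from the i-th query point to every row of 'data'.
        d <- sqrt(colSums((t(data) - query[i,])^2))
        which(d <= threshold)
    })
}

set.seed(42)
small.data <- matrix(runif(500 * 3), ncol=3)
small.query <- matrix(runif(10 * 3), ncol=3)

# One entry per query point, each holding row indices of 'small.data'.
ref <- bruteQuery(small.data, small.query, threshold=0.2)
length(ref)
```

Because the order returned by queryNeighbors() is arbitrary, any comparison against such a reference should sort both index vectors first.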
Most of the options described for findKNN() are also applicable here. For example:
- subset to identify neighbors for a subset of points.
- get.distance to avoid retrieving distances when unnecessary.
- BPPARAM to parallelize the calculations across multiple workers.
- raw.index to return the raw indices from a precomputed index.
Note that the argument for a precomputed index is BNINDEX:
pre <- buildIndex(data, BNPARAM=KmknnParam())
fout.pre <- findNeighbors(BNINDEX=pre, threshold=1)
qout.pre <- queryNeighbors(BNINDEX=pre, query=query, threshold=1)
Users are referred to the documentation of each function for specific details.
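A hedged sketch of two of the options listed above follows, assuming they behave for findNeighbors() as described for findKNN(): subset restricts the search to selected points, and get.distance=FALSE skips returning distances. The toy matrix, the threshold of 0.8 and the choice of points 1 to 5 are illustrative.

```r
library(BiocNeighbors)

set.seed(1)
# Hypothetical toy data: 1000 points in 10 dimensions.
toy <- matrix(runif(1000 * 10), ncol=10)

# Full search, for reference.
full <- findNeighbors(toy, threshold=0.8)

# Only report neighbors for the first five points, without distances.
sub <- findNeighbors(toy, threshold=0.8, subset=1:5, get.distance=FALSE)

# Output order is arbitrary, so sort before comparing.
same <- vapply(seq_len(5), function(i) {
    identical(sort(sub$index[[i]]), sort(full$index[[i]]))
}, logical(1))
all(same)
```

Restricting the search with subset is mainly useful when only a few points are of interest, since the neighbors of the remaining points are never computed.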
sessionInfo()
## R version 4.3.0 RC (2023-04-13 r84269)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] BiocParallel_1.34.0 BiocNeighbors_1.18.0 knitr_1.42
## [4] BiocStyle_2.28.0
##
## loaded via a namespace (and not attached):
## [1] cli_3.6.1 rlang_1.1.0 xfun_0.39
## [4] jsonlite_1.8.4 S4Vectors_0.38.0 htmltools_0.5.5
## [7] stats4_4.3.0 sass_0.4.5 rmarkdown_2.21
## [10] grid_4.3.0 evaluate_0.20 jquerylib_0.1.4
## [13] fastmap_1.1.1 yaml_2.3.7 bookdown_0.33
## [16] BiocManager_1.30.20 compiler_4.3.0 codetools_0.2-19
## [19] Rcpp_1.0.10 lattice_0.21-8 digest_0.6.31
## [22] R6_2.5.1 parallel_4.3.0 bslib_0.4.2
## [25] Matrix_1.5-4 tools_4.3.0 BiocGenerics_0.46.0
## [28] cachem_1.0.7