TileDBArray 1.14.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.2245533 3.3982749 -0.4767668 . 1.31737733 -0.64929472
## [2,] 0.5388409 0.7858693 -0.7635974 . -0.41601281 -0.70212208
## [3,] -2.3008874 -0.8752345 0.0161724 . 0.08831344 0.03899068
## [4,] 0.3395164 -0.3047107 1.7153126 . -0.90008163 -0.55188032
## [5,] -1.0849716 -0.5851945 0.4255562 . 0.94291946 0.09697986
## ... . . . . . .
## [96,] -1.73417132 -0.40809825 -0.13162415 . -0.59106027 0.06529277
## [97,] -1.49898008 -1.12567295 -0.24058434 . -1.15869364 0.62179628
## [98,] -0.24970471 -2.18096607 1.55440330 . 0.35336278 1.15895689
## [99,] -1.05605037 1.16053607 0.62008660 . 1.06748727 0.06991999
## [100,] -1.15530108 -0.10971338 -0.07221385 . -1.14217863 -0.07769031
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.2245533 3.3982749 -0.4767668 . 1.31737733 -0.64929472
## [2,] 0.5388409 0.7858693 -0.7635974 . -0.41601281 -0.70212208
## [3,] -2.3008874 -0.8752345 0.0161724 . 0.08831344 0.03899068
## [4,] 0.3395164 -0.3047107 1.7153126 . -0.90008163 -0.55188032
## [5,] -1.0849716 -0.5851945 0.4255562 . 0.94291946 0.09697986
## ... . . . . . .
## [96,] -1.73417132 -0.40809825 -0.13162415 . -0.59106027 0.06529277
## [97,] -1.49898008 -1.12567295 -0.24058434 . -1.15869364 0.62179628
## [98,] -0.24970471 -2.18096607 1.55440330 . 0.35336278 1.15895689
## [99,] -1.05605037 1.16053607 0.62008660 . 1.06748727 0.06991999
## [100,] -1.15530108 -0.10971338 -0.07221385 . -1.14217863 -0.07769031
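As a quick sanity check on either route, we can realize the TileDB-backed matrix back into memory and compare it with the original. This is only a sketch: the names `tdb` and `roundtrip` are illustrative, and `check.attributes=FALSE` is used so that only the values are compared.

```r
library(TileDBArray)

# Ordinary in-memory matrix to write to the backend.
X <- matrix(rnorm(1000), ncol=10)

# Write to a TileDB backend (a temporary location unless a path is set).
tdb <- writeTileDBArray(X)

# The result is a DelayedArray subclass.
is(tdb, "DelayedArray")

# Realize the full matrix in memory and compare values with the input.
roundtrip <- as.matrix(tdb)
all.equal(roundtrip, X, check.attributes=FALSE)
```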
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
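Because the printed corners of a sparse matrix are almost always zero, the display alone does not confirm that the non-zero entries were stored. The sketch below checks the sparsity flag and counts non-zero entries with a delayed comparison; `Y_tdb` is an illustrative name, and `is_sparse()` is assumed to be available from the DelayedArray stack attached alongside TileDBArray.

```r
library(TileDBArray)

# Sparse input with roughly 1% non-zero entries.
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
Y_tdb <- writeTileDBArray(Y)

# The backend should be flagged as sparse.
is_sparse(Y_tdb)

# Count non-zero entries through a delayed comparison and
# compare against the count from the in-memory matrix.
sum(Y_tdb != 0)
Matrix::nnzero(Y)
```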
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
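The example above covers the logical case only. A comparable sketch for an integer matrix follows; `Z` and `Z_tdb` are illustrative names, and `type()` is the DelayedArray accessor for the element type.

```r
library(TileDBArray)

# rpois() returns integers, so Z is an integer matrix.
Z <- matrix(rpois(1000, lambda=5), ncol=10)
Z_tdb <- writeTileDBArray(Z)

# Report the element type of the TileDB-backed matrix.
type(Z_tdb)
```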
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.2245533 3.3982749 -0.4767668 . 1.31737733 -0.64929472
## GENE_2 0.5388409 0.7858693 -0.7635974 . -0.41601281 -0.70212208
## GENE_3 -2.3008874 -0.8752345 0.0161724 . 0.08831344 0.03899068
## GENE_4 0.3395164 -0.3047107 1.7153126 . -0.90008163 -0.55188032
## GENE_5 -1.0849716 -0.5851945 0.4255562 . 0.94291946 0.09697986
## ... . . . . . .
## GENE_96 -1.73417132 -0.40809825 -0.13162415 . -0.59106027 0.06529277
## GENE_97 -1.49898008 -1.12567295 -0.24058434 . -1.15869364 0.62179628
## GENE_98 -0.24970471 -2.18096607 1.55440330 . 0.35336278 1.15895689
## GENE_99 -1.05605037 1.16053607 0.62008660 . 1.06748727 0.06991999
## GENE_100 -1.15530108 -0.10971338 -0.07221385 . -1.14217863 -0.07769031
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -0.2245533 0.5388409 -2.3008874 0.3395164 -1.0849716 0.1911018
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -0.22455329 3.39827488 -0.47676685 -0.08100085 2.52155471
## GENE_2 0.53884094 0.78586933 -0.76359744 0.48013142 0.53182864
## GENE_3 -2.30088741 -0.87523453 0.01617240 0.43977291 -0.91609543
## GENE_4 0.33951642 -0.30471075 1.71531255 -0.91573012 1.50344719
## GENE_5 -1.08497162 -0.58519455 0.42555615 0.46471113 -0.37692090
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.44910657 6.79654976 -0.95353370 . 2.63475467 -1.29858944
## GENE_2 1.07768187 1.57173866 -1.52719487 . -0.83202563 -1.40424417
## GENE_3 -4.60177481 -1.75046906 0.03234481 . 0.17662688 0.07798137
## GENE_4 0.67903283 -0.60942150 3.43062510 . -1.80016327 -1.10376065
## GENE_5 -2.16994324 -1.17038909 0.85111231 . 1.88583893 0.19395972
## ... . . . . . .
## GENE_96 -3.4683426 -0.8161965 -0.2632483 . -1.1821205 0.1305855
## GENE_97 -2.9979602 -2.2513459 -0.4811687 . -2.3173873 1.2435926
## GENE_98 -0.4994094 -4.3619321 3.1088066 . 0.7067256 2.3179138
## GENE_99 -2.1121007 2.3210721 1.2401732 . 2.1349745 0.1398400
## GENE_100 -2.3106022 -0.2194268 -0.1444277 . -2.2843573 -0.1553806
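To make the delayed behaviour concrete, the sketch below applies an arithmetic operation, realizes the result with `as.matrix()`, and confirms that the stored data are unchanged. The names `doubled` and `realized` are illustrative.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# The multiplication is recorded, not executed; nothing is written to disk.
doubled <- out * 2
class(doubled)

# Values are computed only when explicitly requested.
realized <- as.matrix(doubled)
all.equal(realized, X * 2, check.attributes=FALSE)

# The backend still holds the original values.
all.equal(as.matrix(out), X, check.attributes=FALSE)
```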
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7 SAMP_8
## -1.815403 -4.475513 4.588793 1.306531 5.368829 5.944767 4.398826 -6.303318
## SAMP_9 SAMP_10
## 18.657668 -4.839989
out %*% runif(ncol(out))
## [,1]
## GENE_1 0.8688192963
## GENE_2 -0.1898830770
## GENE_3 -1.7908490606
## GENE_4 -0.3716491637
## GENE_5 -1.6721858040
## GENE_6 0.0006166954
## GENE_7 -2.8416608017
## GENE_8 1.0766219488
## GENE_9 1.2179346916
## GENE_10 1.3430648662
## GENE_11 2.7331651359
## GENE_12 0.2592691785
## GENE_13 -1.5086141130
## GENE_14 -1.5641968739
## GENE_15 -0.1175903802
## GENE_16 5.1072565623
## GENE_17 -0.5916499421
## GENE_18 0.9798555530
## GENE_19 1.0063625397
## GENE_20 3.8200735848
## GENE_21 0.7596058352
## GENE_22 1.2439378061
## GENE_23 0.6949814138
## GENE_24 -2.3441690314
## GENE_25 -1.3792303207
## GENE_26 0.2414108982
## GENE_27 1.1504281769
## GENE_28 -0.4889754694
## GENE_29 -1.9452986175
## GENE_30 2.7997621312
## GENE_31 -0.3299243026
## GENE_32 -0.9362742859
## GENE_33 4.0040594701
## GENE_34 -0.4627129412
## GENE_35 -2.4487002553
## GENE_36 -2.0876673302
## GENE_37 0.3535499116
## GENE_38 -0.9406849682
## GENE_39 0.7384988031
## GENE_40 0.1689928945
## GENE_41 1.5657763638
## GENE_42 1.4605836999
## GENE_43 0.6607338874
## GENE_44 -1.1846469600
## GENE_45 -1.0858536139
## GENE_46 -4.2713501226
## GENE_47 -2.0644089768
## GENE_48 0.1864473868
## GENE_49 1.7415726712
## GENE_50 -2.4955078820
## GENE_51 1.2425956118
## GENE_52 -0.9989060588
## GENE_53 -0.9493043489
## GENE_54 -0.0341353362
## GENE_55 -1.2153121174
## GENE_56 0.5708281912
## GENE_57 2.4687255728
## GENE_58 -1.0034725837
## GENE_59 3.5295035644
## GENE_60 1.1112049784
## GENE_61 -2.0486982315
## GENE_62 -1.1113539974
## GENE_63 -0.9653691143
## GENE_64 1.9506894869
## GENE_65 -2.8161606026
## GENE_66 2.7626664631
## GENE_67 -0.1004561565
## GENE_68 0.9474568207
## GENE_69 2.0952155640
## GENE_70 0.3518496193
## GENE_71 1.8798068099
## GENE_72 0.0638856141
## GENE_73 1.0780263827
## GENE_74 0.5103160276
## GENE_75 2.7945361529
## GENE_76 -0.7909069005
## GENE_77 -0.2441733194
## GENE_78 -2.2566761743
## GENE_79 -0.8163043088
## GENE_80 0.6471854737
## GENE_81 -1.7621690881
## GENE_82 0.1639373722
## GENE_83 1.0493489619
## GENE_84 1.5176873594
## GENE_85 2.0274293145
## GENE_86 -2.0054388370
## GENE_87 -1.0040332046
## GENE_88 0.0720300281
## GENE_89 -0.7255565604
## GENE_90 0.8549762616
## GENE_91 -1.0893451042
## GENE_92 1.7687251721
## GENE_93 2.1639017934
## GENE_94 -1.4922864284
## GENE_95 0.8969532511
## GENE_96 -0.7489882689
## GENE_97 -1.2962150294
## GENE_98 1.7431971412
## GENE_99 0.2346769776
## GENE_100 -1.8347399503
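One way to gain confidence in these operations is to compare them with their in-memory equivalents. The sketch below fixes the random vector (`v`, an illustrative name) so that the same multiplication can be run on both the TileDB-backed and the ordinary matrix.

```r
library(TileDBArray)

set.seed(100)
X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# Column sums from the backend versus the in-memory matrix.
all.equal(colSums(out), colSums(X))

# Matrix-vector product with the same vector on both sides.
v <- runif(ncol(out))
all.equal(as.matrix(out %*% v), X %*% v, check.attributes=FALSE)
```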
We can adjust some parameters for creating the backend by passing the appropriate arguments to writeTileDBArray().
For example, the code below controls the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.1481521 -0.7368929 0.6111708 . 0.325901791 -0.945709865
## [2,] 0.0372808 -2.7426956 -0.3007063 . -0.250381016 -0.904252374
## [3,] -0.8818251 -0.4202735 -1.2364513 . 0.863748893 -0.105099686
## [4,] -0.4459568 -0.6416005 1.1553420 . -0.125705801 0.008882491
## [5,] 1.6799410 0.9850412 -2.0455321 . -1.732964894 -0.531972993
## ... . . . . . .
## [96,] -1.5257753 -0.4027286 1.1449621 . 2.0164418 -0.8116160
## [97,] -0.3012900 -1.2943056 -0.4378370 . -0.2811415 -0.4007702
## [98,] 0.4073845 1.3089640 0.3290174 . 2.0203369 0.9065849
## [99,] 0.1462974 -0.2962213 1.4124186 . 1.1702711 0.2736539
## [100,] -0.2723442 2.3863290 1.3683312 . 0.9597713 -0.9383583
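A known path is mainly useful for reopening the array later. The sketch below assumes that the `TileDBArray()` constructor accepts a path together with an `attr=` argument naming the attribute to read; `written` and `reopened` are illustrative names.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
written <- writeTileDBArray(X, path=path, attr="WHEE")

# Reopen the same backend from its path
# (assumes the constructor takes a path and an 'attr' argument).
reopened <- TileDBArray(path, attr="WHEE")
dim(reopened)
all.equal(as.matrix(reopened), X, check.attributes=FALSE)
```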
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.1481521 -0.7368929 0.6111708 . 0.325901791 -0.945709865
## [2,] 0.0372808 -2.7426956 -0.3007063 . -0.250381016 -0.904252374
## [3,] -0.8818251 -0.4202735 -1.2364513 . 0.863748893 -0.105099686
## [4,] -0.4459568 -0.6416005 1.1553420 . -0.125705801 0.008882491
## [5,] 1.6799410 0.9850412 -2.0455321 . -1.732964894 -0.531972993
## ... . . . . . .
## [96,] -1.5257753 -0.4027286 1.1449621 . 2.0164418 -0.8116160
## [97,] -0.3012900 -1.2943056 -0.4378370 . -0.2811415 -0.4007702
## [98,] 0.4073845 1.3089640 0.3290174 . 2.0203369 0.9065849
## [99,] 0.1462974 -0.2962213 1.4124186 . 1.1702711 0.2736539
## [100,] -0.2723442 2.3863290 1.3683312 . 0.9597713 -0.9383583
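The global settings can be inspected and cleared as well as set. The sketch below assumes that `getTileDBPath()` returns the currently configured path, that `setTileDBAttr()` is the attribute-name counterpart of `setTileDBPath()`, and that passing `NULL` to either setter restores the default; `path3` is an illustrative name.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)

# Set the global path and attribute name used by coercion.
path3 <- tempfile()
setTileDBPath(path3)
setTileDBAttr("WHEE")
getTileDBPath()

# Coercion now writes to path3 under the attribute "WHEE".
coerced <- as(X, "TileDBArray")

# Unset the globals so later calls fall back to the defaults
# (assumes NULL restores the default behaviour).
setTileDBPath(NULL)
setTileDBAttr(NULL)
```

Unsetting the globals after use avoids later coercions in the same session silently reusing the same location.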
sessionInfo()
## R version 4.4.0 beta (2024-04-15 r86425 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows Server 2022 x64 (build 20348)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=C
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.17 TileDBArray_1.14.0 DelayedArray_0.30.0
## [4] SparseArray_1.4.0 S4Arrays_1.4.0 abind_1.4-5
## [7] IRanges_2.38.0 S4Vectors_0.42.0 MatrixGenerics_1.16.0
## [10] matrixStats_1.3.0 BiocGenerics_0.50.0 Matrix_1.7-0
## [13] BiocStyle_2.32.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.0.5 jsonlite_1.8.8 compiler_4.4.0
## [4] BiocManager_1.30.22 crayon_1.5.2 Rcpp_1.0.12
## [7] nanoarrow_0.4.0.1 jquerylib_0.1.4 yaml_2.3.8
## [10] fastmap_1.1.1 lattice_0.22-6 R6_2.5.1
## [13] RcppCCTZ_0.2.12 XVector_0.44.0 tiledb_0.26.0
## [16] knitr_1.46 bookdown_0.39 bslib_0.7.0
## [19] rlang_1.1.3 cachem_1.0.8 xfun_0.43
## [22] sass_0.4.9 bit64_4.0.5 cli_3.6.2
## [25] zlibbioc_1.50.0 spdl_0.0.5 digest_0.6.35
## [28] grid_4.4.0 lifecycle_1.0.4 data.table_1.15.4
## [31] evaluate_0.23 nanotime_0.3.7 zoo_1.8-12
## [34] rmarkdown_2.26 tools_4.4.0 htmltools_0.5.8.1