TileDBArray 1.15.1
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction, thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.26519148 0.27685001 0.52538785 . 0.6456617 0.6000798
## [2,] -1.05841804 0.12044974 0.14446773 . -0.5170676 1.5843966
## [3,] 0.16383479 -0.76019113 0.93658852 . -0.5174823 -1.4866183
## [4,] 0.48436319 -0.97025555 0.05844534 . -1.5484419 0.1370955
## [5,] 1.32045022 -0.48898061 -0.64330710 . 0.1110501 0.7032928
## ... . . . . . .
## [96,] 0.09189449 -0.65666702 -0.33371106 . 1.1025217 -0.8173443
## [97,] -0.47135014 -0.65877525 1.45952043 . -0.3138286 1.8284152
## [98,] 0.23515305 -0.10341025 -0.56998933 . -0.2557391 -0.7927358
## [99,] 0.15752575 -0.68511595 0.72677378 . -1.8671554 0.8908641
## [100,] -1.05808021 0.30858283 0.35268412 . -0.8319163 1.9757953
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.26519148 0.27685001 0.52538785 . 0.6456617 0.6000798
## [2,] -1.05841804 0.12044974 0.14446773 . -0.5170676 1.5843966
## [3,] 0.16383479 -0.76019113 0.93658852 . -0.5174823 -1.4866183
## [4,] 0.48436319 -0.97025555 0.05844534 . -1.5484419 0.1370955
## [5,] 1.32045022 -0.48898061 -0.64330710 . 0.1110501 0.7032928
## ... . . . . . .
## [96,] 0.09189449 -0.65666702 -0.33371106 . 1.1025217 -0.8173443
## [97,] -0.47135014 -0.65877525 1.45952043 . -0.3138286 1.8284152
## [98,] 0.23515305 -0.10341025 -0.56998933 . -0.2557391 -0.7927358
## [99,] 0.15752575 -0.68511595 0.72677378 . -1.8671554 0.8908641
## [100,] -1.05808021 0.30858283 0.35268412 . -0.8319163 1.9757953
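As a quick sanity check, the on-disk copy can be read back into memory with as.matrix() and compared with the original. This sketch assumes a fresh session in which no global path has been set, so each call writes to its own temporary location.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)

# Two equivalent ways of creating an on-disk copy of X.
via.write <- writeTileDBArray(X)
via.coerce <- as(X, "TileDBArray")

# as.matrix() reads the values back from TileDB into an ordinary matrix.
back1 <- as.matrix(via.write)
back2 <- as.matrix(via.coerce)
```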
This also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0.00 0.00
## [997,] 0 0 0 . 0.62 0.00
## [998,] 0 0 0 . 0.00 0.00
## [999,] 0 0 0 . 0.00 0.00
## [1000,] 0 0 0 . 0.00 0.00
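To confirm that the sparse layout was kept, we can query the is_sparse() generic used by DelayedArray-based packages. The values can also be compared after realizing both objects as ordinary dense matrices. That is fine at this size, but it is not something to do with large arrays.

```r
library(TileDBArray)
library(Matrix)

Y <- rsparsematrix(1000, 1000, density=0.01)
out <- writeTileDBArray(Y)

# is_sparse() reports whether the object is backed by a sparse array.
is_sparse(out)

# Realize both objects as dense matrices to compare their values.
same <- isTRUE(all.equal(as.matrix(out), as.matrix(Y)))
```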
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . TRUE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
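The call above only shows a logical matrix. An integer matrix is handled the same way. In this sketch the integers come from rpois(), and type() is used to inspect the storage type reported by the resulting object.

```r
library(TileDBArray)

# rpois() returns an integer vector, so Z is an integer matrix.
Z <- matrix(rpois(1000, lambda=5), ncol=10)
zi <- writeTileDBArray(Z)

# Storage type as reported by the TileDB-backed object.
type(zi)
```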
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.26519148 0.27685001 0.52538785 . 0.6456617 0.6000798
## GENE_2 -1.05841804 0.12044974 0.14446773 . -0.5170676 1.5843966
## GENE_3 0.16383479 -0.76019113 0.93658852 . -0.5174823 -1.4866183
## GENE_4 0.48436319 -0.97025555 0.05844534 . -1.5484419 0.1370955
## GENE_5 1.32045022 -0.48898061 -0.64330710 . 0.1110501 0.7032928
## ... . . . . . .
## GENE_96 0.09189449 -0.65666702 -0.33371106 . 1.1025217 -0.8173443
## GENE_97 -0.47135014 -0.65877525 1.45952043 . -0.3138286 1.8284152
## GENE_98 0.23515305 -0.10341025 -0.56998933 . -0.2557391 -0.7927358
## GENE_99 0.15752575 -0.68511595 0.72677378 . -1.8671554 0.8908641
## GENE_100 -1.05808021 0.30858283 0.35268412 . -0.8319163 1.9757953
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -0.2651915 -1.0584180 0.1638348 0.4843632 1.3204502 1.0890060
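Subsetting by dimension names follows the same conventions as for an ordinary matrix. This sketch reuses the GENE_/SAMP_ naming from above. Setting drop=FALSE keeps a single selected column as a one-column matrix rather than a vector.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
out <- as(X, "TileDBArray")

# Select rows and columns by name, exactly as for an ordinary matrix.
sub <- out[c("GENE_2", "GENE_10"), c("SAMP_1", "SAMP_3")]

# drop=FALSE keeps the single column as a 100 x 1 matrix.
col1 <- out[, "SAMP_1", drop=FALSE]
```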
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend; rather, they are delayed until the values are explicitly required, hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -0.26519148 0.27685001 0.52538785 1.23622270 0.67567813
## GENE_2 -1.05841804 0.12044974 0.14446773 -0.17723756 0.82049939
## GENE_3 0.16383479 -0.76019113 0.93658852 1.32964687 1.24683844
## GENE_4 0.48436319 -0.97025555 0.05844534 -0.50633785 0.75250035
## GENE_5 1.32045022 -0.48898061 -0.64330710 -0.79663551 -1.08134539
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.5303830 0.5537000 1.0507757 . 1.2913234 1.2001597
## GENE_2 -2.1168361 0.2408995 0.2889355 . -1.0341353 3.1687932
## GENE_3 0.3276696 -1.5203823 1.8731770 . -1.0349647 -2.9732366
## GENE_4 0.9687264 -1.9405111 0.1168907 . -3.0968838 0.2741909
## GENE_5 2.6409004 -0.9779612 -1.2866142 . 0.2221001 1.4065856
## ... . . . . . .
## GENE_96 0.1837890 -1.3133340 -0.6674221 . 2.2050434 -1.6346887
## GENE_97 -0.9427003 -1.3175505 2.9190409 . -0.6276572 3.6568304
## GENE_98 0.4703061 -0.2068205 -1.1399787 . -0.5114782 -1.5854716
## GENE_99 0.3150515 -1.3702319 1.4535476 . -3.7343108 1.7817283
## GENE_100 -2.1161604 0.6171657 0.7053682 . -1.6638327 3.9515906
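DelayedArray's showtree() makes the deferral visible: it prints the operations stacked on top of the TileDB seed. Realizing the result with as.matrix() forces evaluation, while the original TileDB-backed object stays unchanged.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# Subsetting and scaling only record operations on top of the TileDB seed.
delayed <- out[1:5, 1:5] * 2
showtree(delayed)  # prints the tree of pending delayed operations

# as.matrix() forces the delayed operations to be evaluated.
vals <- as.matrix(delayed)
```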
We can also perform more complex matrix operations that are supported by DelayedArray, such as column sums and matrix multiplication:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7 SAMP_8
## 4.819871 10.522706 -3.956068 -6.898352 5.258639 4.220472 3.791705 -7.821431
## SAMP_9 SAMP_10
## -6.550939 2.344161
out %*% runif(ncol(out))
## [,1]
## GENE_1 3.288666139
## GENE_2 0.254744345
## GENE_3 -0.164266991
## GENE_4 -4.087662079
## GENE_5 -1.893139026
## GENE_6 -1.698235852
## GENE_7 1.455868109
## GENE_8 -0.494683250
## GENE_9 1.820065087
## GENE_10 0.629884878
## GENE_11 0.362327116
## GENE_12 -0.163045423
## GENE_13 -0.081818121
## GENE_14 1.414293887
## GENE_15 0.767545165
## GENE_16 1.762497719
## GENE_17 1.128085802
## GENE_18 -1.807172637
## GENE_19 -1.250762147
## GENE_20 2.426849928
## GENE_21 0.541626925
## GENE_22 -0.203375690
## GENE_23 3.534471631
## GENE_24 0.823928727
## GENE_25 -1.683473733
## GENE_26 0.193343954
## GENE_27 -1.743013032
## GENE_28 0.614488523
## GENE_29 -2.400367439
## GENE_30 0.501694230
## GENE_31 -0.068208238
## GENE_32 -0.294624711
## GENE_33 0.256911174
## GENE_34 -1.392229207
## GENE_35 -1.599630389
## GENE_36 -0.464061373
## GENE_37 0.344727742
## GENE_38 -0.519202790
## GENE_39 -0.800071881
## GENE_40 0.106575765
## GENE_41 -0.411411974
## GENE_42 1.571471658
## GENE_43 0.556682215
## GENE_44 3.059243384
## GENE_45 -0.172752621
## GENE_46 -3.530150146
## GENE_47 4.375921552
## GENE_48 -1.766503217
## GENE_49 -4.809364035
## GENE_50 -0.895783093
## GENE_51 0.613670069
## GENE_52 1.737947254
## GENE_53 -1.451178099
## GENE_54 -1.195437707
## GENE_55 2.181853069
## GENE_56 3.359036554
## GENE_57 -1.926517906
## GENE_58 -1.763785874
## GENE_59 0.490548241
## GENE_60 0.017332672
## GENE_61 2.534315023
## GENE_62 1.246616386
## GENE_63 1.363242515
## GENE_64 4.078715330
## GENE_65 -2.704178265
## GENE_66 -0.603127716
## GENE_67 0.664226196
## GENE_68 -0.445124006
## GENE_69 -5.564009714
## GENE_70 -1.959996949
## GENE_71 -0.435317653
## GENE_72 1.342505832
## GENE_73 0.364627541
## GENE_74 0.648644083
## GENE_75 2.298638718
## GENE_76 1.845317067
## GENE_77 0.703129154
## GENE_78 -1.961138719
## GENE_79 0.078023056
## GENE_80 0.481100329
## GENE_81 1.562452154
## GENE_82 -0.360863376
## GENE_83 2.275491742
## GENE_84 1.058574702
## GENE_85 -0.549675839
## GENE_86 -1.317658463
## GENE_87 -1.568615816
## GENE_88 2.102890713
## GENE_89 -2.173014778
## GENE_90 0.007307893
## GENE_91 0.303697190
## GENE_92 0.459935359
## GENE_93 -3.010334688
## GENE_94 1.976000241
## GENE_95 0.747452964
## GENE_96 -0.945098327
## GENE_97 -0.410157229
## GENE_98 -0.899173025
## GENE_99 -0.978194106
## GENE_100 1.131220350
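These results should match the same computations on the in-memory matrix, up to floating-point rounding. The sketch below checks this with all.equal(), which tolerates small numerical differences. Here X and v are freshly generated rather than reused from above.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
out <- as(X, "TileDBArray")
v <- runif(ncol(out))

# Results computed from the TileDB-backed matrix...
disk.sums <- colSums(out)
disk.prod <- out %*% v

# ...compared with the same operations on the in-memory matrix.
ok.sums <- isTRUE(all.equal(disk.sums, colSums(X)))
ok.prod <- isTRUE(all.equal(as.vector(disk.prod), as.vector(X %*% v)))
```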
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
For example, the code below controls the path to the backend as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.7335479 -0.4430717 2.4085432 . -0.5064213 -0.9446351
## [2,] -0.1536984 0.3903943 0.6503098 . 0.5177662 0.0268979
## [3,] 1.3056942 -1.4229694 -1.0225693 . 0.1568990 -0.9689682
## [4,] 0.2960689 -1.8457948 1.2167076 . -0.3045679 -0.3684436
## [5,] -0.1761430 0.5852646 -0.7990274 . -1.3822230 1.6802569
## ... . . . . . .
## [96,] 0.28490815 -0.03353880 0.25061874 . 0.90214093 -0.65275733
## [97,] -0.06698369 0.84000785 0.31761169 . 1.53862293 -0.34885141
## [98,] 0.99598801 1.02097654 -1.14645699 . -0.12170148 -0.19210220
## [99,] 1.13149995 -0.51986392 -0.10813220 . 0.14837376 1.31205819
## [100,] -0.20144229 -1.06870947 -0.24955879 . 0.97613212 0.01759711
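Because the data live at a known path, they can be reattached later, for example in another R session. This sketch assumes that the TileDBArray() constructor accepts a path together with an attr argument naming the attribute to read. Check ?TileDBArray for the exact signature in your installed version.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")

# Reattach to the array on disk, naming the attribute that holds the data.
# Assumption: TileDBArray() takes the path plus an 'attr' argument.
reloaded <- TileDBArray(path, attr="WHEE")
```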
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.7335479 -0.4430717 2.4085432 . -0.5064213 -0.9446351
## [2,] -0.1536984 0.3903943 0.6503098 . 0.5177662 0.0268979
## [3,] 1.3056942 -1.4229694 -1.0225693 . 0.1568990 -0.9689682
## [4,] 0.2960689 -1.8457948 1.2167076 . -0.3045679 -0.3684436
## [5,] -0.1761430 0.5852646 -0.7990274 . -1.3822230 1.6802569
## ... . . . . . .
## [96,] 0.28490815 -0.03353880 0.25061874 . 0.90214093 -0.65275733
## [97,] -0.06698369 0.84000785 0.31761169 . 1.53862293 -0.34885141
## [98,] 0.99598801 1.02097654 -1.14645699 . -0.12170148 -0.19210220
## [99,] 1.13149995 -0.51986392 -0.10813220 . 0.14837376 1.31205819
## [100,] -0.20144229 -1.06870947 -0.24955879 . 0.97613212 0.01759711
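The global settings can also be inspected and restored. This sketch sets both the path and the attribute name before coercing. It assumes that getTileDBPath(), getTileDBAttr() and setTileDBAttr() are the companions of setTileDBPath(). It also assumes that passing NULL to setTileDBPath() unsets the path, so later writes fall back to a fresh temporary location; verify this against ?setTileDBPath.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)

# Remember the current attribute name so it can be restored afterwards.
old.attr <- getTileDBAttr()

path3 <- tempfile()
setTileDBPath(path3)
setTileDBAttr("WHEE")
getTileDBPath()  # now returns path3

# Coercion picks up both global settings.
coerced <- as(X, "TileDBArray")

# Assumption: NULL unsets the global path; the attribute name is restored
# explicitly from the saved value.
setTileDBPath(NULL)
setTileDBAttr(old.attr)
```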
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.17 TileDBArray_1.15.1 DelayedArray_0.31.8
## [4] SparseArray_1.5.21 S4Arrays_1.5.4 IRanges_2.39.1
## [7] abind_1.4-5 S4Vectors_0.43.1 MatrixGenerics_1.17.0
## [10] matrixStats_1.3.0 BiocGenerics_0.51.0 Matrix_1.7-0
## [13] BiocStyle_2.33.1
##
## loaded via a namespace (and not attached):
## [1] bit_4.0.5 jsonlite_1.8.8 compiler_4.4.1
## [4] BiocManager_1.30.23 crayon_1.5.3 Rcpp_1.0.12
## [7] nanoarrow_0.5.0.1 jquerylib_0.1.4 yaml_2.3.9
## [10] fastmap_1.2.0 lattice_0.22-6 R6_2.5.1
## [13] RcppCCTZ_0.2.12 XVector_0.45.0 tiledb_0.28.0
## [16] knitr_1.48 bookdown_0.40 bslib_0.7.0
## [19] rlang_1.1.4 cachem_1.1.0 xfun_0.45
## [22] sass_0.4.9 bit64_4.0.5 cli_3.6.3
## [25] zlibbioc_1.51.1 spdl_0.0.5 digest_0.6.36
## [28] grid_4.4.1 lifecycle_1.0.4 data.table_1.15.4
## [31] evaluate_0.24.0 nanotime_0.3.9 zoo_1.8-12
## [34] rmarkdown_2.27 tools_4.4.1 htmltools_0.5.8.1