saveHDF5SummarizedExperiment {HDF5Array}    R Documentation
saveHDF5SummarizedExperiment and loadHDF5SummarizedExperiment can be used to save/load an HDF5-based SummarizedExperiment object to/from disk.
saveHDF5SummarizedExperiment(x, dir="my_h5_se", replace=FALSE,
                             chunkdim=NULL, level=NULL, verbose=FALSE)

loadHDF5SummarizedExperiment(dir="my_h5_se")
x: A SummarizedExperiment object.

dir: The path (as a single string) to the directory in which to save the HDF5-based SummarizedExperiment object, or from which to load it. When saving, the directory will be created, so it should not already exist unless replace is set to TRUE.

replace: If directory dir already exists, should it be replaced with a new one? The content of the existing directory will be lost!

chunkdim, level: The dimensions of the chunks and the compression level to use for writing the assay data to disk. Passed to the internal calls to writeHDF5Array.

verbose: Set to TRUE to make the function display progress.
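A minimal sketch of how replace, chunkdim and level interact in practice, assuming the HDF5Array and SummarizedExperiment packages are installed. The toy matrix, its 200 x 6 shape, the chunk shape c(50, 6) and compression level 6 are arbitrary illustrative choices; tempfile() is used only because it returns a path that does not exist yet.

```r
library(HDF5Array)
library(SummarizedExperiment)

## Toy data; sizes and values are arbitrary
m <- matrix(runif(200 * 6), nrow=200)
se <- SummarizedExperiment(assays=list(counts=m))

## tempfile() gives a path that does not exist yet, as required when replace=FALSE
dir <- tempfile("h5_se_args_")

## chunkdim: one chunk length per assay dimension (here 50 x 6);
## level: compression level, forwarded to writeHDF5Array()
h5_se <- saveHDF5SummarizedExperiment(se, dir, chunkdim=c(50, 6), level=6)

## A second save into the same directory fails while replace=FALSE ...
err <- tryCatch(saveHDF5SummarizedExperiment(se, dir), error=function(e) e)
inherits(err, "error")

## ... and succeeds with replace=TRUE, which discards the old content of 'dir'
h5_se <- saveHDF5SummarizedExperiment(se, dir, replace=TRUE,
                                      chunkdim=c(50, 6), level=6)
```

Because replace=TRUE wipes the existing directory, it is safest to leave it at its default FALSE unless overwriting is really intended.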
These functions use functionality from the SummarizedExperiment package internally and therefore require this package to be installed.
saveHDF5SummarizedExperiment creates the directory specified through the dir argument and then populates it with the HDF5 datasets (one per assay in x) plus a serialized version of x that contains pointers to these datasets. This directory provides a self-contained HDF5-based representation of x that can then be loaded back in R with loadHDF5SummarizedExperiment.
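To see what the save step leaves behind, the sketch below lists the new directory and checks which file an assay points to. It assumes HDF5Array and SummarizedExperiment are installed; the exact file names written into dir are treated as an internal detail, so the code prints them rather than assuming them. path() is the BiocGenerics accessor that returns the file backing an HDF5-based array.

```r
library(HDF5Array)
library(SummarizedExperiment)

## Two small illustrative assays
se <- SummarizedExperiment(assays=list(counts=matrix(runif(60), nrow=10),
                                       logcounts=matrix(runif(60), nrow=10)))
dir <- tempfile("h5_se_layout_")
h5_se <- saveHDF5SummarizedExperiment(se, dir)

## HDF5 data plus the serialized object: print the names instead of guessing
list.files(dir)

## Each assay is now an HDF5-backed matrix whose seed refers to a file in 'dir'
class(assay(h5_se, "counts", withDimnames=FALSE))
path(assay(h5_se, "counts", withDimnames=FALSE))
```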
Note that this directory is relocatable, i.e., it can be moved (or copied) to a different place, on the same or a different computer, before calling loadHDF5SummarizedExperiment on it. For convenient sharing with collaborators, it is suggested to turn it into a tarball (with the Unix command tar) or a zip file before the transfer.
Please keep in mind that saveHDF5SummarizedExperiment and loadHDF5SummarizedExperiment don't know how to produce/read tarballs or zip files at the moment, so packaging/extracting the tarball or zip file is entirely the user's responsibility. It is typically done from outside R.
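Packaging can also be done from within R with base utils::tar() and utils::untar(). The sketch below, assuming HDF5Array and SummarizedExperiment are installed, archives the directory with a relative path, extracts it into a different location to simulate a collaborator's machine, and loads it from there. The temporary paths are illustrative only.

```r
library(HDF5Array)
library(SummarizedExperiment)

se <- SummarizedExperiment(assays=list(counts=matrix(runif(60), nrow=10)))
dir <- tempfile("h5_se_share_")
saveHDF5SummarizedExperiment(se, dir)

## Archive the directory under its relative name so it unpacks anywhere
tarball <- tempfile(fileext=".tar.gz")
old_wd <- setwd(dirname(dir))
utils::tar(tarball, files=basename(dir), compression="gzip")
setwd(old_wd)

## Simulate the recipient extracting the archive somewhere else
dest <- tempfile("unpacked_")
dir.create(dest)
utils::untar(tarball, exdir=dest)

## The relocated copy loads like the original
h5_se2 <- loadHDF5SummarizedExperiment(file.path(dest, basename(dir)))
```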
Finally, note that depending on the size of the data to write to disk and the performance of the disk, saveHDF5SummarizedExperiment can take a long time to complete. Use verbose=TRUE to see its progress.
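A small sketch for checking how long a save takes on a given disk, assuming HDF5Array and SummarizedExperiment are installed. The 50,000 x 10 matrix is an arbitrary stand-in for real assay data; verbose=TRUE prints progress messages while the assay is written, and base system.time() reports the total elapsed time.

```r
library(HDF5Array)
library(SummarizedExperiment)

## Illustrative data: 50,000 x 10 doubles (about 4 MB in memory)
m <- matrix(runif(5e4 * 10), ncol=10)
se <- SummarizedExperiment(assays=list(counts=m))
dir <- tempfile("h5_se_verbose_")

## The assignment inside system.time() still creates 'h5_se' in this environment
timing <- system.time(
    h5_se <- saveHDF5SummarizedExperiment(se, dir, verbose=TRUE)
)
timing[["elapsed"]]   # seconds; depends on the machine and disk
```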
loadHDF5SummarizedExperiment is generally very fast, even if the assay data is large, because all the assays in the returned object are HDF5Array objects pointing to the on-disk HDF5 datasets located in dir. HDF5Array objects are typically lightweight in memory.
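One way to observe the "lightweight" claim, assuming HDF5Array and SummarizedExperiment are installed: compare the in-memory footprint of an ordinary matrix with that of the HDF5Matrix returned by loadHDF5SummarizedExperiment. The matrix dimensions are arbitrary; the reported sizes will vary by platform, but the HDF5Matrix only carries metadata such as the file path, dataset name and dimensions.

```r
library(HDF5Array)
library(SummarizedExperiment)

m <- matrix(runif(2e4 * 10), ncol=10)    # about 1.6 MB of doubles
dir <- tempfile("h5_se_light_")
saveHDF5SummarizedExperiment(SummarizedExperiment(assays=list(counts=m)), dir)

h5_se <- loadHDF5SummarizedExperiment(dir)
a <- assay(h5_se, withDimnames=FALSE)

## In-memory size of the plain matrix vs. the on-disk-backed HDF5Matrix
print(object.size(m), units="Kb")
print(object.size(a), units="Kb")

## Values are read from disk only when needed, e.g. for a reduction
## or when realizing a subset as an ordinary matrix
colSums(a)
as.matrix(a[1:5, ])
```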
saveHDF5SummarizedExperiment returns an invisible SummarizedExperiment object where all the assays are HDF5Array objects pointing to the HDF5 datasets saved in dir. It is in fact the same object as the one that would be returned by calling loadHDF5SummarizedExperiment on dir.
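The sketch below checks both halves of that statement, assuming HDF5Array and SummarizedExperiment are installed: base withVisible() confirms that the result is returned invisibly, and the returned object is then compared with what loadHDF5SummarizedExperiment reads back from the same directory. The toy data are illustrative only.

```r
library(HDF5Array)
library(SummarizedExperiment)

se <- SummarizedExperiment(assays=list(counts=matrix(runif(60), nrow=10)))
dir <- tempfile("h5_se_value_")

## The value is invisible: nothing is auto-printed at the prompt
res <- withVisible(saveHDF5SummarizedExperiment(se, dir))
res$visible   # FALSE
h5_se <- res$value

## Loading the same directory gives an equivalent object
h5_se_b <- loadHDF5SummarizedExperiment(dir)
all.equal(as.matrix(assay(h5_se, withDimnames=FALSE)),
          as.matrix(assay(h5_se_b, withDimnames=FALSE)))
```

Wrapping the call in parentheses, as in (saveHDF5SummarizedExperiment(se, dir, replace=TRUE)), is the usual R idiom for printing an invisible result.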
Author: Hervé Pagès
See also: SummarizedExperiment and RangedSummarizedExperiment objects in the SummarizedExperiment package; the writeHDF5Array function, which saveHDF5SummarizedExperiment uses internally to write the assay data to disk.
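For a single matrix, writeHDF5Array can be called directly with the same chunkdim and level knobs that saveHDF5SummarizedExperiment forwards to it. This is a sketch assuming HDF5Array is installed; the file path and the dataset name "counts" are illustrative, and the file layout that saveHDF5SummarizedExperiment itself uses is an internal detail not reproduced here.

```r
library(HDF5Array)

m <- matrix(runif(200 * 6), nrow=200)
h5file <- tempfile(fileext=".h5")

## Write one dataset with explicit chunking and compression;
## the result is an HDF5Matrix backed by 'h5file'
h5m <- writeHDF5Array(m, filepath=h5file, name="counts",
                      chunkdim=c(50, 6), level=6)
h5m

## Chunk geometry recorded for the on-disk dataset
chunkdim(h5m)
```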
library(HDF5Array)
library(SummarizedExperiment)

nrows <- 200; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
se0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                            colData=colData)
se0

## Save 'se0' as an HDF5-based SummarizedExperiment object:
dir <- sub("file", "h5_se0_", tempfile())
h5_se0 <- saveHDF5SummarizedExperiment(se0, dir)
h5_se0
assay(h5_se0, withDimnames=FALSE)   # HDF5Matrix object

h5_se0b <- loadHDF5SummarizedExperiment(dir)
h5_se0b
assay(h5_se0b, withDimnames=FALSE)  # HDF5Matrix object

## Sanity checks:
stopifnot(is(assay(h5_se0, withDimnames=FALSE), "HDF5Matrix"))
stopifnot(all(DelayedArray(assay(se0)) == assay(h5_se0)))
stopifnot(is(assay(h5_se0b, withDimnames=FALSE), "HDF5Matrix"))
stopifnot(all(DelayedArray(assay(se0)) == assay(h5_se0b)))

## ---------------------------------------------------------------------
## More sanity checks
## ---------------------------------------------------------------------

## Make a copy of directory 'dir':
somedir <- sub("file", "somedir", tempfile())
dir.create(somedir)
file.copy(dir, somedir, recursive=TRUE)
dir2 <- list.files(somedir, full.names=TRUE)

## 'dir2' contains a copy of 'dir'. Call loadHDF5SummarizedExperiment()
## on it.
h5_se0c <- loadHDF5SummarizedExperiment(dir2)
stopifnot(is(assay(h5_se0c, withDimnames=FALSE), "HDF5Matrix"))
stopifnot(all(DelayedArray(assay(se0)) == assay(h5_se0c)))