1 Overview

This package provides a lightweight interface between the Bioconductor SingleCellExperiment data structure and the Python AnnData-based single-cell analysis environment. The aim is to let users and developers move data easily between these frameworks, so that a single analysis pipeline can combine R/Bioconductor and Python.

2 Reading and writing H5AD files

The readH5AD() function can be used to read a SingleCellExperiment from an H5AD file. The resulting object can be manipulated in the usual way, as described in the SingleCellExperiment documentation.

library(zellkonverter)

# Obtaining an example H5AD file.
example_h5ad <- system.file("extdata", "krumsiek11.h5ad",
                            package = "zellkonverter")
readH5AD(example_h5ad)
## class: SingleCellExperiment 
## dim: 11 640 
## metadata(2): highlights iroot
## assays(1): X
## rownames(11): Gata2 Gata1 ... EgrNab Gfi1
## rowData names(0):
## colnames(640): 0 1 ... 158-3 159-3
## colData names(1): cell_type
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
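To illustrate what manipulating the object "in the usual way" looks like, here is a minimal sketch that inspects and subsets the object returned by readH5AD(). It assumes the same krumsiek11.h5ad example file and uses only standard SummarizedExperiment/SingleCellExperiment accessors; the variable names are our own.

```r
library(zellkonverter)
library(SingleCellExperiment)

# Locate and read the example file shipped with zellkonverter
example_h5ad <- system.file("extdata", "krumsiek11.h5ad", package = "zellkonverter")
sce <- readH5AD(example_h5ad)

# The AnnData main matrix appears as the assay named "X"
assayNames(sce)

# Per-cell annotations from the AnnData obs table appear as colData columns
table(sce$cell_type)

# Ordinary SingleCellExperiment subsetting works, e.g. keep the first five genes
sce_small <- sce[1:5, ]
dim(sce_small)
```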

We can also write a SingleCellExperiment to an H5AD file with the writeH5AD() function. This is demonstrated below on the classic Zeisel mouse brain dataset from the scRNAseq package. The resulting file can then be used directly in compatible Python-based analysis frameworks.

library(scRNAseq)

sce_zeisel <- ZeiselBrainData()
out_path <- tempfile(fileext = ".h5ad")
writeH5AD(sce_zeisel, file = out_path)
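One simple way to check a conversion is to write a file and read it back. The sketch below does this on the small krumsiek11.h5ad example rather than the Zeisel data, to avoid a download. It only compares dimensions and dimnames, on the assumption that writeH5AD() and readH5AD() preserve these; it is not an exhaustive check of every slot.

```r
library(zellkonverter)
library(SingleCellExperiment)

# Read the small example dataset shipped with zellkonverter
sce <- readH5AD(system.file("extdata", "krumsiek11.h5ad", package = "zellkonverter"))

# Write it to a temporary file; 'fileext' gives the file name a .h5ad suffix
tmp <- tempfile(fileext = ".h5ad")
writeH5AD(sce, file = tmp)

# Read the file back in and compare the basic structure with the original
sce_back <- readH5AD(tmp)
identical(dim(sce_back), dim(sce))
identical(rownames(sce_back), rownames(sce))
identical(colnames(sce_back), colnames(sce))
```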

3 Converting between SingleCellExperiment and AnnData objects

Developers and power users who control their own Python environments can convert directly between SingleCellExperiment and AnnData objects using the SCE2AnnData() and AnnData2SCE() utilities. These functions assume that reticulate has already been loaded with a Python environment that provides an appropriate version of the anndata package. We suggest using the basilisk package to set up this Python environment before calling these functions.

library(basilisk)
library(scRNAseq)

seger <- SegerstolpePancreasData()
roundtrip <- basiliskRun(fun = function(sce) {
    # Convert SCE to AnnData:
    adata <- SCE2AnnData(sce)

    # Optionally, do some work in Python on 'adata' here.

    # Convert back to an SCE:
    AnnData2SCE(adata)
}, env = zellkonverterAnnDataEnv, sce = seger)
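A point worth keeping in mind when moving between the two objects is that AnnData stores cells as rows (obs) and features as columns (var), the transpose of a SingleCellExperiment. The sketch below checks this by returning the AnnData shape from inside basiliskRun(). It assumes the zellkonverter 1.2 interface used above, where zellkonverterAnnDataEnv is an environment object, and converts the Python tuple with reticulate::py_to_r() so that only plain R values are returned from the basilisk process.

```r
library(basilisk)
library(zellkonverter)

# Use the small example dataset shipped with zellkonverter
sce <- readH5AD(system.file("extdata", "krumsiek11.h5ad", package = "zellkonverter"))

shape <- basiliskRun(fun = function(sce) {
    adata <- SCE2AnnData(sce)
    # adata.shape is a Python tuple (n_obs, n_vars); convert it to a plain R vector
    unlist(reticulate::py_to_r(adata$shape))
}, env = zellkonverterAnnDataEnv, sce = sce)

# Cells are rows in AnnData, so this should match c(ncol(sce), nrow(sce))
shape
```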

Package developers can ensure that they use the same versions of Python packages as zellkonverter by passing the .AnnDataDependencies variable when setting up their own Python environments.

.AnnDataDependencies
## [1] "anndata==0.7.6"  "h5py==3.2.1"     "hdf5==1.10.6"    "natsort==7.1.1" 
## [5] "numpy==1.20.2"   "packaging==20.9" "pandas==1.2.4"   "scipy==1.6.3"   
## [9] "sqlite==3.35.5"
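As a sketch of how a package might do this, the snippet below defines a basilisk environment whose Python packages are exactly the pins in .AnnDataDependencies. The environment name "my_anndata_env" and package name "mypkg" are hypothetical placeholders. In a real package this definition would normally live in the package's R/ code, and by default basilisk only creates the environment when it is first used.

```r
library(basilisk)
library(zellkonverter)

# Hypothetical environment for a hypothetical package "mypkg",
# reusing the exact Python package versions pinned by zellkonverter
my_env <- BasiliskEnvironment(
    envname = "my_anndata_env",
    pkgname = "mypkg",
    packages = .AnnDataDependencies
)

my_env
```

Any additional Python packages the hypothetical package needs would be appended to the packages vector with their own pinned versions.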

4 Session information

sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] basilisk_1.4.0              scRNAseq_2.6.1             
##  [3] SingleCellExperiment_1.14.1 SummarizedExperiment_1.22.0
##  [5] Biobase_2.52.0              GenomicRanges_1.44.0       
##  [7] GenomeInfoDb_1.28.0         IRanges_2.26.0             
##  [9] S4Vectors_0.30.0            BiocGenerics_0.38.0        
## [11] MatrixGenerics_1.4.0        matrixStats_0.59.0         
## [13] zellkonverter_1.2.1         knitr_1.33                 
## [15] BiocStyle_2.20.2           
## 
## loaded via a namespace (and not attached):
##  [1] ProtGenerics_1.24.0           bitops_1.0-7                 
##  [3] bit64_4.0.5                   filelock_1.0.2               
##  [5] progress_1.2.2                httr_1.4.2                   
##  [7] tools_4.1.0                   bslib_0.2.5.1                
##  [9] utf8_1.2.1                    R6_2.5.0                     
## [11] lazyeval_0.2.2                DBI_1.1.1                    
## [13] withr_2.4.2                   tidyselect_1.1.1             
## [15] prettyunits_1.1.1             bit_4.0.4                    
## [17] curl_4.3.1                    compiler_4.1.0               
## [19] basilisk.utils_1.4.0          xml2_1.3.2                   
## [21] DelayedArray_0.18.0           rtracklayer_1.52.0           
## [23] bookdown_0.22                 sass_0.4.0                   
## [25] rappdirs_0.3.3                Rsamtools_2.8.0              
## [27] stringr_1.4.0                 digest_0.6.27                
## [29] rmarkdown_2.9                 XVector_0.32.0               
## [31] pkgconfig_2.0.3               htmltools_0.5.1.1            
## [33] ensembldb_2.16.0              dbplyr_2.1.1                 
## [35] fastmap_1.1.0                 rlang_0.4.11                 
## [37] RSQLite_2.2.7                 shiny_1.6.0                  
## [39] BiocIO_1.2.0                  jquerylib_0.1.4              
## [41] generics_0.1.0                jsonlite_1.7.2               
## [43] BiocParallel_1.26.0           dplyr_1.0.7                  
## [45] RCurl_1.98-1.3                magrittr_2.0.1               
## [47] GenomeInfoDbData_1.2.6        Matrix_1.3-4                 
## [49] Rcpp_1.0.6                    fansi_0.5.0                  
## [51] reticulate_1.20               lifecycle_1.0.0              
## [53] stringi_1.6.2                 yaml_2.2.1                   
## [55] zlibbioc_1.38.0               BiocFileCache_2.0.0          
## [57] AnnotationHub_3.0.1           grid_4.1.0                   
## [59] blob_1.2.1                    promises_1.2.0.1             
## [61] ExperimentHub_2.0.0           crayon_1.4.1                 
## [63] dir.expiry_1.0.0              lattice_0.20-44              
## [65] Biostrings_2.60.1             GenomicFeatures_1.44.0       
## [67] hms_1.1.0                     KEGGREST_1.32.0              
## [69] pillar_1.6.1                  rjson_0.2.20                 
## [71] biomaRt_2.48.1                XML_3.99-0.6                 
## [73] glue_1.4.2                    BiocVersion_3.13.1           
## [75] evaluate_0.14                 BiocManager_1.30.16          
## [77] png_0.1-7                     vctrs_0.3.8                  
## [79] httpuv_1.6.1                  purrr_0.3.4                  
## [81] assertthat_0.2.1              cachem_1.0.5                 
## [83] xfun_0.24                     mime_0.10                    
## [85] xtable_1.8-4                  AnnotationFilter_1.16.0      
## [87] restfulr_0.0.13               later_1.2.0                  
## [89] tibble_3.1.2                  GenomicAlignments_1.28.0     
## [91] AnnotationDbi_1.54.1          memoise_2.0.0                
## [93] ellipsis_0.3.2                interactiveDisplayBase_1.30.0