What is the Human Cell Atlas?

From the Human Cell Atlas (HCA) website:

The cell is the core unit of the human body—the key to understanding the biology of health and the ways in which molecular dysfunction leads to disease. Yet our characterization of the hundreds of types and subtypes of cells in the human body is limited, based partly on techniques that have limited resolution and classifications that do not always map neatly to each other. Genomics has offered a systematic approach, but it has largely been applied in bulk to many cell types at once—masking critical differences between cells—and in isolation from other valuable sources of data.

Recent advances in single-cell genomic analysis of cells and tissues have put systematic, high-resolution and comprehensive reference maps of all human cells within reach. In other words, we can now realistically envision a human cell atlas to serve as a basis for both understanding human health and diagnosing, monitoring, and treating disease.

At its core, a cell atlas would be a collection of cellular reference maps, characterizing each of the thousands of cell types in the human body and where they are found. It would be an extremely valuable resource to empower the global research community to systematically study the biological changes associated with different diseases, understand where genes associated with disease are active in our bodies, analyze the molecular mechanisms that govern the production and activity of different cell types, and sort out how different cell types combine and work together to form tissues.

The Human Cell Atlas supports queries against its data coordination platform (DCP) through a RESTful API.

Installation

To install this package, use Bioconductor's BiocManager package.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install('HCAExplorer')

library(HCAExplorer)
library(dplyr)  ## provides the %>% pipe used throughout this vignette

Obtaining Metadata files from the HCAExplorer

One of the primary tasks of HCAExplorer is to obtain the metadata files of projects and pass them on to other pipelines that download useful data such as expression matrices. To illustrate the package, we will work toward obtaining expression matrices for a selection of projects. We will first create an HCAExplorer object, then look at functions for navigating it, and finally download a file manifest and the expression matrices, the latter as LoomExperiment objects.

Connecting to the Human Cell Atlas

The HCAExplorer package requires network connectivity, and the Human Cell Atlas digest it connects to must be online. The backend used here is referred to as the “azul backend”. This package is meant to mirror the functionality of the HCA Data Explorer.

The HCAExplorer object serves as the representation of the Human Cell Atlas. Upon creation, it automatically performs a cursory query and displays a small table showing the first few projects of the entire HCA. This initial table contains the columns that we have determined are most useful to users. The output also displays the URL of the HCA digest instance being used, the current query, summary information about the quantity of data being displayed, and finally the table of projects.

By default, 15 entries are displayed per page and the default URL of the HCA DCP is used. Both values can be changed through the url and per_page arguments of the constructor, or later using methods.

If the HCA cannot be reached, an error will be thrown displaying the status of the request.

hca <- HCAExplorer(url = 'https://service.explore.data.humancellatlas.org', per_page = 15)
hca

## class: HCAExplorer 
## Using azul backend at:
##   https://service.explore.data.humancellatlas.org 
## 
## Donor count: 289 
## Specimens: 780 
## Estimated Cells: 4535231 
## Files: 536961 
## File Size: 30.2 Tb 
## 
## Showing projects with 15 results per page.# A tibble: 15 x 7
##    projects.projec… samples.sampleE… samples.organ protocols.libra…
##    <chr>            <chr>            <chr>         <chr>           
##  1 1.3 Million Bra… specimens        brain         10X v2 sequenci…
##  2 A Single-Cell T… specimens        pancreas      inDrop, NULL    
##  3 A single-cell m… specimens        embryo        10X 3' v1 seque…
##  4 A single-cell r… specimens        blood, hemat… 10X v2 sequenci…
##  5 A single-cell t… specimens        eye           10X v2 sequenci…
##  6 Assessing the r… organoids        <NA>          10X v2 sequenci…
##  7 Bone marrow pla… specimens        hematopoieti… MARS-seq, NULL  
##  8 Cell hashing wi… specimens        blood         CITE-seq, NULL  
##  9 Census of Immun… specimens        blood, immun… 10X v2 sequenci…
## 10 Comparison, cal… cellLines        <NA>          DroNc-Seq, Drop…
## 11 Dissecting the … specimens        liver         10X v2 sequenci…
## 12 Ischaemic sensi… specimens        esophagus, l… 10X v2 sequenci…
## 13 Melanoma infilt… specimens        lymph node, … Smart-seq2, NULL
## 14 Precursors of h… specimens        blood         Smart-seq2, NULL
## 15 Profiling of CD… cellLines        <NA>          10X v2 sequenci…
## # … with 3 more variables: protocols.pairedEnd <chr>,
## #   donorOrganisms.genusSpecies <chr>, samples.disease <chr>
## Showing page 1 of 2

Upon displaying the object, several fields can be seen:

The class: HCAExplorer
The azul backend address that is currently being used.
Summary counts for the current query: donors, specimens, estimated cells, files, and total file size.
The entity being shown (here, projects) and the number of results shown per page (per_page).
The results tibble of the query. (This table is abbreviated to show only the columns that we determined are most useful to the user.)
The current page and the total number of pages, which tells you whether more results are available.

The results tibble can be obtained using the results() method.

results(hca)

## # A tibble: 15 x 52
##    protocols.libra… protocols.instr… protocols.paire… protocols.workf…
##    <chr>            <chr>            <chr>            <chr>           
##  1 10X v2 sequenci… Illumina HiSeq … FALSE, NULL      NULL            
##  2 inDrop, NULL     Illumina HiSeq … TRUE, NULL       NULL            
##  3 10X 3' v1 seque… Illumina HiSeq … FALSE, NULL      NULL            
##  4 10X v2 sequenci… Illumina HiSeq … FALSE, NULL      optimus_v1.3.5,…
##  5 10X v2 sequenci… Illumina HiSeq … FALSE, NULL      optimus_v1.3.5,…
##  6 10X v2 sequenci… Illumina HiSeq … TRUE, NULL       optimus_v1.3.1,…
##  7 MARS-seq, NULL   Illumina NextSe… TRUE, NULL       NULL            
##  8 CITE-seq, NULL   Illumina Hiseq … FALSE, NULL      NULL            
##  9 10X v2 sequenci… Illumina HiSeq … FALSE, NULL      optimus_v1.3.2,…
## 10 DroNc-Seq, Drop… Illumina NextSe… TRUE, NULL       NULL            
## 11 10X v2 sequenci… Illumina Hiseq … FALSE, NULL      optimus_v1.3.5,…
## 12 10X v2 sequenci… Illumina HiSeq … FALSE, TRUE, NU… optimus_v1.3.5,…
## 13 Smart-seq2, NULL Illumina HiSeq … TRUE, NULL       NULL            
## 14 Smart-seq2, NULL Illumina HiSeq … FALSE, NULL      NULL            
## 15 10X v2 sequenci… Illumina HiSeq … FALSE, NULL      optimus_v1.3.1,…
## # … with 48 more variables: protocols.assayType <chr>, entryId <chr>,
## #   projects.projectTitle <chr>, projects.projectShortname <chr>,
## #   projects.laboratory <chr>, projects.arrayExpressAccessions <chr>,
## #   projects.geoSeriesAccessions <chr>, projects.insdcProjectAccessions <chr>,
## #   projects.insdcStudyAccessions <chr>, projects.supplementaryLinks <chr>,
## #   samples.sampleEntityType <chr>, samples.effectiveOrgan <chr>,
## #   samples.organ <chr>, samples.id <chr>, samples.preservationMethod <chr>,
## #   samples.source <chr>, samples.organPart <chr>, samples.disease <chr>,
## #   specimens.id <chr>, specimens.organ <chr>, specimens.organPart <chr>,
## #   specimens.disease <chr>, specimens.preservationMethod <chr>,
## #   specimens.source <chr>, donorOrganisms.id <chr>,
## #   donorOrganisms.donorCount <chr>, donorOrganisms.genusSpecies <chr>,
## #   donorOrganisms.organismAge <chr>, donorOrganisms.organismAgeUnit <chr>,
## #   donorOrganisms.organismAgeRange <chr>, donorOrganisms.biologicalSex <chr>,
## #   donorOrganisms.disease <chr>, cellSuspensions.organ <chr>,
## #   cellSuspensions.organPart <chr>, cellSuspensions.selectedCellType <chr>,
## #   cellSuspensions.totalCells <chr>, fileTypeSummaries.fileType <chr>,
## #   fileTypeSummaries.count <chr>, fileTypeSummaries.totalSize <chr>,
## #   samples.modelOrganPart <chr>, samples.modelOrgan <chr>, cellLines.id <chr>,
## #   cellLines.cellLineType <chr>, cellLines.modelOrgan <chr>,
## #   organoids.id <chr>, organoids.modelOrgan <chr>,
## #   organoids.modelOrganPart <chr>, samples.cellLineType <chr>
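Judging from the output above, most column names follow an entity.field pattern (for example projects.projectTitle or samples.organ). The base R sketch below groups a handful of the column names printed above by their entity prefix. It works on a plain character vector and does not call HCAExplorer.

```r
## A few column names copied from the results(hca) output above
cols <- c("projects.projectTitle", "projects.laboratory",
          "samples.organ", "samples.disease",
          "donorOrganisms.genusSpecies", "donorOrganisms.biologicalSex",
          "cellSuspensions.totalCells", "entryId")

## Keep the text before the first '.'; names without a '.' are left unchanged
entity <- sub("\\..*$", "", cols)

## Number of columns per entity
column_counts <- table(entity)
column_counts

## Columns that belong to a single entity, here samples
cols[entity == "samples"]
```

The same grouping applied to names(results(hca)) would show which entities contribute the most columns.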

An HCAExplorer object can display many columns, but only a few are shown by default. Use select() to change which columns are shown. For example, the following shows only the projects.projectTitle and samples.organ columns when the object is printed.

hca <- hca %>% select('projects.projectTitle', 'samples.organ')
hca

## class: HCAExplorer 
## Using azul backend at:
##   https://service.explore.data.humancellatlas.org 
## 
## Donor count: 289 
## Specimens: 780 
## Estimated Cells: 4535231 
## Files: 536961 
## File Size: 30.2 Tb 
## 
## Showing projects with 15 results per page.# A tibble: 15 x 2
##    projects.projectTitle                        samples.organ                   
##    <chr>                                        <chr>                           
##  1 1.3 Million Brain Cells from E18 Mice        brain                           
##  2 A Single-Cell Transcriptomic Map of the Hum… pancreas                        
##  3 A single-cell molecular map of mouse gastru… embryo                          
##  4 A single-cell reference map of transcriptio… blood, hematopoietic system, lu…
##  5 A single-cell transcriptome atlas of the ad… eye                             
##  6 Assessing the relevance of organoids to mod… <NA>                            
##  7 Bone marrow plasma cells from hip replaceme… hematopoietic system            
##  8 Cell hashing with barcoded antibodies enabl… blood                           
##  9 Census of Immune Cells                       blood, immune system            
## 10 Comparison, calibration, and benchmarking o… <NA>                            
## 11 Dissecting the human liver cellular landsca… liver                           
## 12 Ischaemic sensitivity of human tissue by si… esophagus, lung, spleen         
## 13 Melanoma infiltration of stromal and immune… lymph node, skin of body, tumor 
## 14 Precursors of human CD4+ cytotoxic T lympho… blood                           
## 15 Profiling of CD34+ cells from human bone ma… <NA>                            
## Showing page 1 of 2

The original column selection can be restored with resetSelect().

hca <- resetSelect(hca)
hca

## class: HCAExplorer 
## Using azul backend at:
##   https://service.explore.data.humancellatlas.org 
## 
## Donor count: 289 
## Specimens: 780 
## Estimated Cells: 4535231 
## Files: 536961 
## File Size: 30.2 Tb 
## 
## Showing projects with 15 results per page.# A tibble: 15 x 7
##    projects.projec… samples.sampleE… samples.organ protocols.libra…
##    <chr>            <chr>            <chr>         <chr>           
##  1 1.3 Million Bra… specimens        brain         10X v2 sequenci…
##  2 A Single-Cell T… specimens        pancreas      inDrop, NULL    
##  3 A single-cell m… specimens        embryo        10X 3' v1 seque…
##  4 A single-cell r… specimens        blood, hemat… 10X v2 sequenci…
##  5 A single-cell t… specimens        eye           10X v2 sequenci…
##  6 Assessing the r… organoids        <NA>          10X v2 sequenci…
##  7 Bone marrow pla… specimens        hematopoieti… MARS-seq, NULL  
##  8 Cell hashing wi… specimens        blood         CITE-seq, NULL  
##  9 Census of Immun… specimens        blood, immun… 10X v2 sequenci…
## 10 Comparison, cal… cellLines        <NA>          DroNc-Seq, Drop…
## 11 Dissecting the … specimens        liver         10X v2 sequenci…
## 12 Ischaemic sensi… specimens        esophagus, l… 10X v2 sequenci…
## 13 Melanoma infilt… specimens        lymph node, … Smart-seq2, NULL
## 14 Precursors of h… specimens        blood         Smart-seq2, NULL
## 15 Profiling of CD… cellLines        <NA>          10X v2 sequenci…
## # … with 3 more variables: protocols.pairedEnd <chr>,
## #   donorOrganisms.genusSpecies <chr>, samples.disease <chr>
## Showing page 1 of 2

Use the activate() method to choose whether projects, samples, or files are displayed in the tibble.

## The HCAExplorer object is activated here by 'samples'
hca <- hca %>% activate('samples')
hca

## class: HCAExplorer 
## Using azul backend at:
##   https://service.explore.data.humancellatlas.org 
## 
## Donor count: 289 
## Specimens: 780 
## Estimated Cells: 4535231 
## Files: 536961 
## File Size: 30.2 Tb 
## 
## Showing projects with 15 results per page.# A tibble: 15 x 12
##    samples.id projects.projec… samples.sampleE… samples.organ samples.organPa…
##    <chr>      <chr>            <chr>            <chr>         <chr>           
##  1 E18_20160… 1.3 Million Bra… specimens        brain         cortex          
##  2 H1_pancre… A Single-Cell T… specimens        pancreas      NULL            
##  3 embryo_po… A single-cell m… specimens        embryo        NULL            
##  4 PP001, PP… A single-cell r… specimens        blood, hemat… Left lateral ba…
##  5 17-010-R,… A single-cell t… specimens        eye           retinal neural …
##  6 Org_HPSI0… Assessing the r… organoids        <NA>          <NA>            
##  7 Hip10_spe… Bone marrow pla… specimens        hematopoieti… bone marrow     
##  8 Specimen_… Cell hashing wi… specimens        blood         NULL            
##  9 1_BM2, 1_… Census of Immun… specimens        blood, immun… bone marrow, um…
## 10 cell_line… Comparison, cal… cellLines        <NA>          <NA>            
## 11 P1TLH_liv… Dissecting the … specimens        liver         caudate lobe    
## 12 302C72hSp… Ischaemic sensi… specimens        esophagus, l… esophagus mucos…
## 13 1104_LN, … Melanoma infilt… specimens        lymph node, … NULL            
## 14 Subject10… Precursors of h… specimens        blood         peripheral bloo…
## 15 HS_BM_1_c… Profiling of CD… cellLines        <NA>          <NA>            
## # … with 7 more variables: cellSuspensions.selectedCellType <chr>,
## #   protocols.libraryConstructionApproach <chr>, protocols.pairedEnd <chr>,
## #   donorOrganisms.genusSpecies <chr>, donorOrganisms.organismAge <chr>,
## #   donorOrganisms.biologicalSex <chr>, samples.disease <chr>
## Showing page 1 of 2

## Revert back to showing projects with 'projects'
hca <- hca %>% activate('projects')
hca

## class: HCAExplorer 
## Using azul backend at:
##   https://service.explore.data.humancellatlas.org 
## 
## Donor count: 289 
## Specimens: 780 
## Estimated Cells: 4535231 
## Files: 536961 
## File Size: 30.2 Tb 
## 
## Showing projects with 15 results per page.# A tibble: 15 x 7
##    projects.projec… samples.sampleE… samples.organ protocols.libra…
##    <chr>            <chr>            <chr>         <chr>           
##  1 1.3 Million Bra… specimens        brain         10X v2 sequenci…
##  2 A Single-Cell T… specimens        pancreas      inDrop, NULL    
##  3 A single-cell m… specimens        embryo        10X 3' v1 seque…
##  4 A single-cell r… specimens        blood, hemat… 10X v2 sequenci…
##  5 A single-cell t… specimens        eye           10X v2 sequenci…
##  6 Assessing the r… organoids        <NA>          10X v2 sequenci…
##  7 Bone marrow pla… specimens        hematopoieti… MARS-seq, NULL  
##  8 Cell hashing wi… specimens        blood         CITE-seq, NULL  
##  9 Census of Immun… specimens        blood, immun… 10X v2 sequenci…
## 10 Comparison, cal… cellLines        <NA>          DroNc-Seq, Drop…
## 11 Dissecting the … specimens        liver         10X v2 sequenci…
## 12 Ischaemic sensi… specimens        esophagus, l… 10X v2 sequenci…
## 13 Melanoma infilt… specimens        lymph node, … Smart-seq2, NULL
## 14 Precursors of h… specimens        blood         Smart-seq2, NULL
## 15 Profiling of CD… cellLines        <NA>          10X v2 sequenci…
## # … with 3 more variables: protocols.pairedEnd <chr>,
## #   donorOrganisms.genusSpecies <chr>, samples.disease <chr>
## Showing page 1 of 2

The bottom of the output shows that more pages of results are available. The next set of entries can be obtained using the nextResults() method.

hca <- nextResults(hca)
hca

## class: HCAExplorer 
## Using azul backend at:
##   https://service.explore.data.humancellatlas.org 
## 
## Donor count: 289 
## Specimens: 780 
## Estimated Cells: 4535231 
## Files: 536961 
## File Size: 30.2 Tb 
## 
## Showing projects with 15 results per page.# A tibble: 13 x 7
##    projects.projec… samples.sampleE… samples.organ protocols.libra…
##    <chr>            <chr>            <chr>         <chr>           
##  1 Reconstructing … specimens        blood, decid… 10X v2 sequenci…
##  2 Single Cell Tra… specimens        kidney        inDrop, NULL    
##  3 Single cell pro… cellLines, spec… embryo, NULL  10X v2 sequenci…
##  4 Single cell tra… specimens        pancreas      Smart-seq2, NULL
##  5 Single-cell RNA… cellLines        <NA>          Smart-seq2, NULL
##  6 Single-cell RNA… specimens        pancreas      Smart-seq2, NULL
##  7 Spatio-temporal… specimens        kidney        10X v2 sequenci…
##  8 Structural Remo… specimens        colon         10X 3' v2 seque…
##  9 Systematic comp… specimens        blood, brain  10X v2 sequenci…
## 10 Tabula Muris: T… specimens        adipose tiss… Smart-seq2, NULL
## 11 The Single Cell… specimens        kidney        10X 5' v2 seque…
## 12 The emergent la… specimens        embryo, endo… 10X v2 sequenci…
## 13 Transcriptomic … specimens        eye           10x 3' v3 seque…
## # … with 3 more variables: protocols.pairedEnd <chr>,
## #   donorOrganisms.genusSpecies <chr>, samples.disease <chr>
## Showing page 2 of 2
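Paging uses a fixed page size: with per_page = 15, the 28 projects seen here (15 on page 1, 13 on page 2) fill two pages. The hypothetical helper page_rows() below reproduces that arithmetic in base R. It only illustrates the paging logic and is not HCAExplorer's implementation, which retrieves each page from the server.

```r
## Hypothetical helper (not part of HCAExplorer):
## row indices that fall on a given page
page_rows <- function(n_total, per_page, page) {
    n_pages <- ceiling(n_total / per_page)
    if (page < 1 || page > n_pages)
        stop("page must be between 1 and ", n_pages)
    first <- (page - 1) * per_page + 1
    last <- min(page * per_page, n_total)
    seq(first, last)
}

n_total <- 28   # 15 projects on page 1 plus 13 on page 2, as printed above
ceiling(n_total / 15)                # number of pages
length(page_rows(n_total, 15, 1))    # rows on page 1
length(page_rows(n_total, 15, 2))    # rows on page 2
```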

Querying the HCAExplorer

Once the HCAExplorer object is created, one can begin browsing the data present in the Human Cell Atlas.

Suppose we would like to search for projects that have samples taken from a particular organ. First, it is helpful to know which fields are available to query. To list them, use the fields() method.

hca <- HCAExplorer()
fields(hca)

##  [1] "organ"                       "sampleEntityType"           
##  [3] "project"                     "assayType"                  
##  [5] "instrumentManufacturerModel" "institution"                
##  [7] "donorDisease"                "organismAgeUnit"            
##  [9] "organismAge"                 "pairedEnd"                  
## [11] "preservationMethod"          "genusSpecies"               
## [13] "projectTitle"                "modelOrganPart"             
## [15] "disease"                     "specimenOrganPart"          
## [17] "workflow"                    "contactName"                
## [19] "specimenOrgan"               "effectiveOrgan"             
## [21] "organPart"                   "publicationTitle"           
## [23] "cellLineType"                "libraryConstructionApproach"
## [25] "biologicalSex"               "laboratory"                 
## [27] "projectDescription"          "selectedCellType"           
## [29] "specimenDisease"             "fileFormat"                 
## [31] "modelOrgan"

This function returns all fields that can be queried. We can see that there is a field named “organ”. To find which values are available for querying on organs, use the values() method.

values(hca, 'organ')

## # A tibble: 33 x 2
##    value hits                
##    <chr> <chr>               
##  1 blood 6                   
##  2 5     kidney              
##  3 4     pancreas            
##  4 4     brain               
##  5 3     embryo              
##  6 3     lung                
##  7 3     eye                 
##  8 2     hematopoietic system
##  9 2     liver               
## 10 2     skin of body        
## # … with 23 more rows

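We can now see all values of 'organ' across all projects, along with their frequency. (In the printed tibble above, the columns are misaligned after the first row: the counts appear under value and the organ names under hits.) Suppose we would like to see projects that involve either blood or brain samples. The next step is to perform the query.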
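To make the hit counts concrete, the base R sketch below tallies organs from comma-separated organ strings. It assumes that a hit means one project listing that value, which this vignette does not state explicitly. The input uses complete organ strings for a few projects from the select() output above, so the counts are for this subset only and will not match the full table.

```r
## Organ strings for a few projects, copied from the select() output above
organ <- c("brain", "pancreas", "blood", "blood, immune system",
           "esophagus, lung, spleen", "blood")

## Split each project's comma-separated list into individual organs,
## de-duplicating within a project so each project counts at most once
per_project <- lapply(strsplit(organ, ", ", fixed = TRUE), unique)

## Count how many projects mention each organ, most frequent first
hits <- sort(table(unlist(per_project)), decreasing = TRUE)
hits
```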

HCAExplorer extends the functionality of the dplyr package's filter() and select() methods.

The filter() method allows the user to query the Human Cell Atlas by relating fields to values. Character fields can be queried using the operators:

==
%in%

Combination operators, such as &, can be used to combine queries.

We can use either the == or %in% operator in a filter statement to construct a query.

## Either operator can be used; hca2 holds the == version for comparison
hca2 <- hca %>% filter(organ == c('blood', 'brain'))
hca <- hca %>% filter(organ %in% c('blood', 'brain'))
hca

## class: HCAExplorer 
## Using azul backend at:
##   https://service.explore.data.humancellatlas.org 
## 
## Donor count: 53 
## Specimens: 161 
## Estimated Cells: 1676505 
## Files: 134443 
## File Size: 7.1 Tb 
## 
## Showing projects with 15 results per page.# A tibble: 8 x 7
##   projects.projec… samples.sampleE… samples.organ protocols.libra…
##   <chr>            <chr>            <chr>         <chr>           
## 1 1.3 Million Bra… specimens        brain         10X v2 sequenci…
## 2 A single-cell r… specimens        blood, hemat… 10X v2 sequenci…
## 3 Cell hashing wi… specimens        blood         CITE-seq, NULL  
## 4 Census of Immun… specimens        blood, immun… 10X v2 sequenci…
## 5 Precursors of h… specimens        blood         Smart-seq2, NULL
## 6 Reconstructing … specimens        blood, decid… 10X v2 sequenci…
## 7 Systematic comp… specimens        blood, brain  10X v2 sequenci…
## 8 Tabula Muris: T… specimens        adipose tiss… Smart-seq2, NULL
## # … with 3 more variables: protocols.pairedEnd <chr>,
## #   donorOrganisms.genusSpecies <chr>, samples.disease <chr>
## Showing page 1 of 1
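As noted above, either == or %in% can be used inside HCAExplorer's filter(). On ordinary R vectors, however, the two operators behave differently, which is why %in% is the clearer choice for membership tests: == compares element by element, recycling the shorter vector, while %in% checks whether each element appears anywhere in the set. The toy vectors below are hypothetical and only illustrate base R semantics, including how & combines two conditions.

```r
organ <- c("blood", "brain", "brain", "liver")
wanted <- c("blood", "brain")

## == recycles 'wanted' to length 4 and compares position by position:
## "blood"=="blood", "brain"=="brain", "brain"=="blood", "liver"=="brain"
organ == wanted

## %in% tests membership of each element in the whole set
organ %in% wanted

## & keeps only positions where both conditions hold
disease <- c("normal", "orofaciodigital syndrome VIII", "normal", "normal")
organ %in% wanted & disease == "normal"
```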

Suppose we also wish to search for results based on disease. We already know from the output of fields() that a “disease” field exists. Now we can see which disease values are present in our current results.

values(hca, 'disease')

## # A tibble: 3 x 2
##   value hits                         
##   <chr> <chr>                        
## 1 5     normal                       
## 2 3     orofaciodigital syndrome VIII
## 3 1     5

These values apply only to the results of our previous search. Now suppose we would like to search only for projects that have samples with no disease (values() shows that this is labeled “normal”). Any of the following searches accomplishes this. To show multiple searches, we also use the methods undoQuery() and resetQuery() to step back through or reset our search: undoQuery() steps back one or more queries, and resetQuery() undoes all queries.

hca <- hca %>% filter(disease == 'normal')
hca <- undoQuery(hca, n = 2L)

hca <- hca %>% filter(organ %in% c('Brain', 'brain'), disease == 'normal')
hca <- resetQuery(hca)

hca <- hca %>% filter(organ %in% c('Brain', 'brain') & disease == 'normal')
hca

## class: HCAExplorer 
## Using azul backend at:
##   https://service.explore.data.humancellatlas.org 
## 
## Donor count: 4 
## Specimens: 4 
## Estimated Cells: 1345175 
## Files: 48909 
## File Size: 3.4 Tb 
## 
## Showing projects with 15 results per page.# A tibble: 2 x 7
##   projects.projec… samples.sampleE… samples.organ protocols.libra…
##   <chr>            <chr>            <chr>         <chr>           
## 1 1.3 Million Bra… specimens        brain         10X v2 sequenci…
## 2 Systematic comp… specimens        blood, brain  10X v2 sequenci…
## # … with 3 more variables: protocols.pairedEnd <chr>,
## #   donorOrganisms.genusSpecies <chr>, samples.disease <chr>
## Showing page 1 of 1
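undoQuery() and resetQuery() can be thought of as operations on a stack of queries. The sketch below models that idea with a plain R list. The functions push_query(), undo_query(), and reset_query() are hypothetical and do not reflect how HCAExplorer actually stores its queries.

```r
## Hypothetical model (not HCAExplorer's internals):
## a query history kept as a list and used like a stack
push_query <- function(history, query) c(history, list(query))

## Drop the last n queries, never going below an empty history
undo_query <- function(history, n = 1L) {
    keep <- max(length(history) - n, 0L)
    history[seq_len(keep)]
}

## Discard every query
reset_query <- function(history) list()

history <- list()
history <- push_query(history, quote(organ %in% c("blood", "brain")))
history <- push_query(history, quote(disease == "normal"))
length(history)

history <- undo_query(history, n = 2L)   # analogous to undoQuery(hca, n = 2L)
length(history)

history <- push_query(history, quote(organ %in% c("Brain", "brain")))
history <- reset_query(history)          # analogous to resetQuery(hca)
length(history)
```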

We can refine our search further by subsetting to include only a few results. The [ operator can be used to select particular rows either by index or by project name. These selections are added to our search as a query against the “projectId”. Here we take the first two results from our HCAExplorer object.

hca <- hca[1:2,]
hca

## class: HCAExplorer 
## Using azul backend at:
##   https://service.explore.data.humancellatlas.org 
## 
## Donor count: 4 
## Specimens: 4 
## Estimated Cells: 1345175 
## Files: 48909 
## File Size: 3.4 Tb 
## 
## Showing projects with 15 results per page.# A tibble: 2 x 7
##   projects.projec… samples.sampleE… samples.organ protocols.libra…
##   <chr>            <chr>            <chr>         <chr>           
## 1 1.3 Million Bra… specimens        brain         10X v2 sequenci…
## 2 Systematic comp… specimens        blood, brain  10X v2 sequenci…
## # … with 3 more variables: protocols.pairedEnd <chr>,
## #   donorOrganisms.genusSpecies <chr>, samples.disease <chr>
## Showing page 1 of 1

Obtaining manifest files from the HCAExplorer

Now that we have completed our query, we can obtain the file manifest of our selected projects. First, we must find which file formats are available for download. To do this, we use the getManifestFileFormats() method.

formats <- getManifestFileFormats(hca)
formats

##  [1] "fastq"    "csv"      "txt"      "fastq.gz" "bam"      "results" 
##  [7] "matrix"   "bai"      "unknown"  "csv.gz"   "npy"      "npz"

Now that we have the possible file formats, we can download the manifest as a tibble. To do this, we use the getManifest() method.

manifest <- getManifest(hca, fileFormat = formats[1])
manifest

## # A tibble: 16,377 x 44
##    bundle_uuid bundle_version      file_name file_format read_index file_size
##    <chr>       <dttm>              <chr>     <chr>       <chr>          <dbl>
##  1 005e1897-7… 2019-05-16 21:18:13 E18_2016… fastq       read2      470298189
##  2 08e574ca-0… 2019-05-16 21:18:13 E18_2016… fastq       index1      43976974
##  3 08ff63dc-5… 2019-05-16 21:18:13 E18_2016… fastq       read2      365789618
##  4 08ff63dc-5… 2019-05-16 21:18:13 E18_2016… fastq       read1       84924334
##  5 005e1897-7… 2019-05-16 21:18:13 E18_2016… fastq       read1      109170151
##  6 005e1897-7… 2019-05-16 21:18:13 E18_2016… fastq       index1      46689743
##  7 08e574ca-0… 2019-05-16 21:18:13 E18_2016… fastq       read2      481091238
##  8 5b1517a5-0… 2019-05-16 21:18:13 E18_2016… fastq       read1      103606544
##  9 7831bf6f-c… 2019-05-16 21:18:13 E18_2016… fastq       read1       82449358
## 10 0ef1b7a4-4… 2019-05-16 21:18:13 E18_2016… fastq       index1      32305229
## # … with 16,367 more rows, and 38 more variables: file_uuid <chr>,
## #   file_version <dttm>, file_sha256 <chr>, file_content_type <chr>,
## #   cell_suspension.provenance.document_id <chr>,
## #   cell_suspension.biomaterial_core.biomaterial_id <chr>,
## #   cell_suspension.estimated_cell_count <dbl>,
## #   cell_suspension.selected_cell_type <chr>,
## #   sequencing_process.provenance.document_id <chr>,
## #   sequencing_protocol.instrument_manufacturer_model <chr>,
## #   sequencing_protocol.paired_end <lgl>,
## #   library_preparation_protocol.library_construction_approach <chr>,
## #   project.provenance.document_id <chr>,
## #   project.contributors.institution <chr>,
## #   project.contributors.laboratory <chr>,
## #   project.project_core.project_short_name <chr>,
## #   project.project_core.project_title <chr>,
## #   specimen_from_organism.provenance.document_id <chr>,
## #   specimen_from_organism.diseases <chr>, specimen_from_organism.organ <chr>,
## #   specimen_from_organism.organ_part <chr>,
## #   specimen_from_organism.preservation_storage.preservation_method <chr>,
## #   donor_organism.sex <chr>,
## #   donor_organism.biomaterial_core.biomaterial_id <chr>,
## #   donor_organism.provenance.document_id <chr>,
## #   donor_organism.genus_species <chr>, donor_organism.diseases <chr>,
## #   donor_organism.organism_age <dbl>, donor_organism.organism_age_unit <chr>,
## #   cell_line.provenance.document_id <lgl>,
## #   cell_line.biomaterial_core.biomaterial_id <lgl>,
## #   organoid.provenance.document_id <lgl>,
## #   organoid.biomaterial_core.biomaterial_id <lgl>, organoid.model_organ <lgl>,
## #   organoid.model_organ_part <lgl>, `_entity_type` <chr>,
## #   sample.provenance.document_id <chr>,
## #   sample.biomaterial_core.biomaterial_id <chr>
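The manifest is an ordinary tibble, so standard R tools can summarize it. For example, the sketch below totals file_size by read_index using the ten rows printed above. To apply it to the full manifest, you would pass manifest$file_size and manifest$read_index to the same tapply() call, assuming those columns are present as they are in the output shown.

```r
## file_size and read_index for the ten manifest rows printed above
read_index <- c("read2", "index1", "read2", "read1", "read1",
                "index1", "read2", "read1", "read1", "index1")
file_size <- c(470298189, 43976974, 365789618, 84924334, 109170151,
               46689743, 481091238, 103606544, 82449358, 32305229)

## Total bytes per read type
bytes_by_read <- tapply(file_size, read_index, sum)
bytes_by_read

## Grand total, converted to gigabytes (1 GB = 1e9 bytes)
sum(file_size) / 1e9
```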

Downloading Expression Matrices

HCAExplorer can download the expression matrices available on the HCA Data Portal site. These are precomputed matrices. Users who want to generate their own matrices should use the HCAMatrixBrowser package instead.

The checkExpressionMatricesAvailability() method returns a tibble displaying whether the projects in the HCAExplorer object are available for download.

hca <- HCAExplorer()
checkExpressionMatricesAvailability(hca, format = "loom")

## # A tibble: 15 x 3
##    projects.projectTitle                   donorOrganisms.genu… matrix.availiab…
##    <chr>                                   <chr>                <lgl>           
##  1 1.3 Million Brain Cells from E18 Mice   Mus musculus         FALSE           
##  2 A Single-Cell Transcriptomic Map of th… Homo sapiens, Mus m… FALSE           
##  3 A single-cell molecular map of mouse g… Mus musculus         FALSE           
##  4 A single-cell reference map of transcr… Homo sapiens         TRUE            
##  5 A single-cell transcriptome atlas of t… Homo sapiens         TRUE            
##  6 Assessing the relevance of organoids t… Homo sapiens         TRUE            
##  7 Bone marrow plasma cells from hip repl… Homo sapiens         FALSE           
##  8 Cell hashing with barcoded antibodies … Homo sapiens         FALSE           
##  9 Census of Immune Cells                  Homo sapiens         TRUE            
## 10 Comparison, calibration, and benchmark… Homo sapiens         FALSE           
## 11 Dissecting the human liver cellular la… Homo sapiens         TRUE            
## 12 Ischaemic sensitivity of human tissue … Homo sapiens         TRUE            
## 13 Melanoma infiltration of stromal and i… Mus musculus         FALSE           
## 14 Precursors of human CD4+ cytotoxic T l… Homo sapiens         FALSE           
## 15 Profiling of CD34+ cells from human bo… Homo sapiens         TRUE
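The matrix availability column is logical, so it can be converted into row positions and passed to the [ method described earlier. The sketch below copies the availability values printed above for page 1. The hca[...] call appears only as a comment because it needs a live connection, and using a vector of positions there is an assumption based on the earlier hca[1:2,] example.

```r
## Matrix availability for the 15 projects printed above (page 1)
available <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE,
               TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)

## Row positions of projects with a downloadable loom matrix
idx <- which(available)
idx

## With a live connection, these positions could subset the object, e.g.
## hca_available <- hca[idx, ]
```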

The downloadExpressionMatrices() method downloads the expression matrices and returns them in the format given by the format argument:

If format is "loom", a list of LoomExperiment objects is returned.
If format is "csv", a list of tibbles is returned.
If format is "mtx", a list of SingleCellExperiment objects is returned.
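The mapping from format to return type can be restated as a small lookup. The expected_class() function below is hypothetical and not part of HCAExplorer. It encodes the three cases listed above, with "tbl_df" as the class name of a tibble, and rejects any other format through match.arg().

```r
## Hypothetical helper (not part of HCAExplorer):
## class of each list element returned for a given format
expected_class <- function(format = c("loom", "csv", "mtx")) {
    format <- match.arg(format)
    switch(format,
           loom = "LoomExperiment",
           csv  = "tbl_df",                # a tibble
           mtx  = "SingleCellExperiment")
}

expected_class("loom")
expected_class("mtx")
```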

Some entries contain matrices for more than one organism, usually "Homo sapiens" and/or "Mus musculus". If the organism argument is not specified, the method attempts to download the matrices for all organisms.

By default, expression matrices are saved using BiocFileCache, which keeps a persistent copy of each file between sessions. This behavior is controlled by the useBiocFileCache argument: if useBiocFileCache = FALSE, only a temporary copy of the expression matrices is saved. Although using BiocFileCache is recommended, we specify useBiocFileCache = FALSE here so that this example does not create a persistent copy of the file.

## Create HCAExplorer object
hca <- HCAExplorer()

## Obtain the fifth project by subsetting
hca <- hca[5]

## Download project's expression matrix file as a LoomExperiment object
le <- downloadExpressionMatrices(hca, format = "loom", useBiocFileCache = FALSE)
le

sessionInfo()

## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] HCAExplorer_1.4.0 dplyr_1.0.2       BiocStyle_2.18.0  knitr_1.30       
## 
## loaded via a namespace (and not attached):
##  [1] MatrixGenerics_1.2.0        Biobase_2.50.0             
##  [3] httr_1.4.2                  tidyr_1.1.2                
##  [5] tidygraph_1.2.0             bit64_4.0.5                
##  [7] jsonlite_1.7.1              assertthat_0.2.1           
##  [9] BiocManager_1.30.10         stats4_4.0.3               
## [11] BiocFileCache_1.14.0        blob_1.2.1                 
## [13] GenomeInfoDbData_1.2.4      Rsamtools_2.6.0            
## [15] yaml_2.2.1                  pillar_1.4.6               
## [17] RSQLite_2.2.1               lattice_0.20-41            
## [19] glue_1.4.2                  digest_0.6.27              
## [21] GenomicRanges_1.42.0        XVector_0.30.0             
## [23] htmltools_0.5.0             Matrix_1.2-18              
## [25] plyr_1.8.6                  XML_3.99-0.5               
## [27] pkgconfig_2.0.3             zlibbioc_1.36.0            
## [29] purrr_0.3.4                 HDF5Array_1.18.0           
## [31] BiocParallel_1.24.0         tibble_3.0.4               
## [33] generics_0.0.2              IRanges_2.24.0             
## [35] ellipsis_0.3.1              SummarizedExperiment_1.20.0
## [37] BiocGenerics_0.36.0         cli_2.1.0                  
## [39] magrittr_1.5                crayon_1.3.4               
## [41] ps_1.4.0                    memoise_1.1.0              
## [43] evaluate_0.14               fansi_0.4.1                
## [45] xml2_1.3.2                  tools_4.0.3                
## [47] hms_0.5.3                   lifecycle_0.2.0            
## [49] matrixStats_0.57.0          stringr_1.4.0              
## [51] Rhdf5lib_1.12.0             S4Vectors_0.28.0           
## [53] DelayedArray_0.16.0         Biostrings_2.58.0          
## [55] compiler_4.0.3              GenomeInfoDb_1.26.0        
## [57] rlang_0.4.8                 rhdf5_2.34.0               
## [59] grid_4.0.3                  RCurl_1.98-1.2             
## [61] rstudioapi_0.11             rhdf5filters_1.2.0         
## [63] LoomExperiment_1.8.0        rappdirs_0.3.1             
## [65] SingleCellExperiment_1.12.0 igraph_1.2.6               
## [67] bitops_1.0-6                rmarkdown_2.5              
## [69] DBI_1.1.0                   curl_4.3                   
## [71] R6_2.4.1                    GenomicAlignments_1.26.0   
## [73] rtracklayer_1.50.0          utf8_1.1.4                 
## [75] bit_4.0.4                   readr_1.4.0                
## [77] stringi_1.5.3               parallel_4.0.3             
## [79] Rcpp_1.0.5                  vctrs_0.3.4                
## [81] dbplyr_1.4.4                tidyselect_1.1.0           
## [83] xfun_0.18

Developer notes

The S3 object-oriented programming paradigm is used.
Methods from the dplyr package can be used to manipulate objects in the HCAExplorer package.
In the future, we wish to expand this package to cover the remaining functionality of the HCA DCP API.