Installation

library(cBioPortalData)
library(AnVIL)
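
The calls above assume the packages are already installed. A minimal sketch of installing the package through BiocManager, the standard Bioconductor installer (this assumes a working internet connection and an R version supported by the current Bioconductor release):

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# Install cBioPortalData from Bioconductor
BiocManager::install("cBioPortalData")
```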

Introduction

This vignette describes the two main user-facing functions for downloading and representing data from cBioPortal. cBioDataPack uses cBioPortal's legacy data distribution method (study tarballs). cBioPortalData offers a more flexible approach, obtaining data from the cBioPortal API based on several parameters, including the available molecular profiles.

Two main interfaces

cBioDataPack: Obtain Study Data as Zipped Tarballs

This function accesses the packaged study data from cBioPortal and returns an integrative MultiAssayExperiment representation.

## Use ask=FALSE for non-interactive use
cBioDataPack("laml_tcga", ask = FALSE)
## A MultiAssayExperiment object of 12 listed
##  experiments with user-defined names and respective classes.
##  Containing an ExperimentList class object of length 12:
##  [1] CNA: SummarizedExperiment with 24776 rows and 191 columns
##  [2] RNA_Seq_expression_median: SummarizedExperiment with 19720 rows and 179 columns
##  [3] RNA_Seq_mRNA_median_all_sample_Zscores: SummarizedExperiment with 19720 rows and 179 columns
##  [4] RNA_Seq_v2_expression_median: SummarizedExperiment with 20531 rows and 173 columns
##  [5] RNA_Seq_v2_mRNA_median_Zscores: SummarizedExperiment with 20440 rows and 173 columns
##  [6] RNA_Seq_v2_mRNA_median_all_sample_Zscores: SummarizedExperiment with 20531 rows and 173 columns
##  [7] cna_hg19.seg: RaggedExperiment with 13571 rows and 191 columns
##  [8] linear_CNA: SummarizedExperiment with 24776 rows and 191 columns
##  [9] methylation_hm27: SummarizedExperiment with 10919 rows and 194 columns
##  [10] methylation_hm450: SummarizedExperiment with 10919 rows and 194 columns
##  [11] mutations_extended: RaggedExperiment with 2584 rows and 197 columns
##  [12] mutations_mskcc: RaggedExperiment with 2584 rows and 197 columns
## Features:
##  experiments() - obtain the ExperimentList instance
##  colData() - the primary/phenotype DFrame
##  sampleMap() - the sample availability DFrame
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment
##  *Format() - convert into a long or wide DFrame
##  assays() - convert ExperimentList to a SimpleList of matrices
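
Once the study is loaded, the usual MultiAssayExperiment accessors apply. The following is a sketch only: it assumes network access (or a populated cache), and the assay names are taken from the listing above, so they may differ if the study data changes.

```r
library(cBioPortalData)
library(MultiAssayExperiment)

# Download the study, or load it from the local cache
laml <- cBioDataPack("laml_tcga", ask = FALSE)

# Names of the assays contained in the object
names(laml)

# Extract a single assay by name ("CNA" is assumed from the listing above)
cna <- laml[["CNA"]]
dim(cna)

# Dimensions of the patient-level clinical data
dim(colData(laml))

# Keep only two of the expression assays, selected by name
laml[, , c("RNA_Seq_expression_median", "RNA_Seq_v2_expression_median")]
```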

cBioPortalData: Obtain data from the cBioPortal API

This function provides a more flexible and granular way to request a MultiAssayExperiment object from a study ID, molecular profile, gene panel, and sample list.

cbio <- cBioPortal()
acc <- cBioPortalData(api = cbio, by = "hugoGeneSymbol", studyId = "acc_tcga",
    genePanelId = "IMPACT341",
    molecularProfileIds = c("acc_tcga_rppa", "acc_tcga_linear_CNA")
)
## harmonizing input:
##   removing 1 colData rownames not in sampleMap 'primary'
acc
## A MultiAssayExperiment object of 2 listed
##  experiments with user-defined names and respective classes.
##  Containing an ExperimentList class object of length 2:
##  [1] acc_tcga_rppa: SummarizedExperiment with 57 rows and 46 columns
##  [2] acc_tcga_linear_CNA: SummarizedExperiment with 339 rows and 90 columns
## Features:
##  experiments() - obtain the ExperimentList instance
##  colData() - the primary/phenotype DFrame
##  sampleMap() - the sample availability DFrame
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment
##  *Format() - convert into a long or wide DFrame
##  assays() - convert ExperimentList to a SimpleList of matrices
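
The identifiers passed to cBioPortalData can be looked up through the API object. The sketch below uses the package's query helpers; it assumes network access, and the column names (studyId, molecularProfileId) are assumed from the API response fields.

```r
library(cBioPortalData)

cbio <- cBioPortal()

# Studies available through the API
studies <- getStudies(cbio)
head(studies[["studyId"]])

# Molecular profiles defined for one study
profiles <- molecularProfiles(cbio, studyId = "acc_tcga")
profiles[["molecularProfileId"]]

# Gene panels known to the API, and sample lists for the study
genePanels(cbio)
sampleLists(cbio, studyId = "acc_tcga")
```

The values returned here are the candidates for the studyId, molecularProfileIds, and genePanelId arguments used in the call above.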

Clearing the cache

cBioDataPack

An interrupted download can leave the cache corrupted. To clear the cache for a particular study, use the removeCache function. Note that this function only works for data downloaded through the cBioDataPack function.

removeCache("laml_tcga")

cBioPortalData

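To clear the entire cBioPortalData cache, delete the cache directory. The path below is the usual location on Linux and may differ on other platforms. Note that unlink only removes directories when recursive = TRUE:

unlink("~/.cache/cBioPortalData/", recursive = TRUE)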
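
In base R, unlink() removes a directory only when recursive = TRUE. A minimal sketch on a throwaway temporary directory (a hypothetical stand-in, not the real cache) shows the difference:

```r
# Hypothetical stand-in for a cache directory; a temporary path is used
# so that nothing real is deleted
cache_dir <- file.path(tempdir(), "cBioPortalData-demo-cache")
dir.create(cache_dir, recursive = TRUE)
writeLines("placeholder", file.path(cache_dir, "placeholder.txt"))

# Without recursive = TRUE the directory is left in place
unlink(cache_dir)
still_there <- dir.exists(cache_dir)

# With recursive = TRUE the directory and its contents are removed
unlink(cache_dir, recursive = TRUE)
removed <- !dir.exists(cache_dir)

c(still_there = still_there, removed = removed)
```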

sessionInfo

sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows Server 2012 R2 x64 (build 9600)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=C                          
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] cBioPortalData_2.0.10       MultiAssayExperiment_1.14.0
##  [3] SummarizedExperiment_1.18.2 DelayedArray_0.14.1        
##  [5] matrixStats_0.57.0          Biobase_2.48.0             
##  [7] GenomicRanges_1.40.0        GenomeInfoDb_1.24.2        
##  [9] IRanges_2.22.2              S4Vectors_0.26.1           
## [11] BiocGenerics_0.34.0         AnVIL_1.0.3                
## [13] dplyr_1.0.2                 BiocStyle_2.16.1           
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-6              bit64_4.0.5              
##  [3] progress_1.2.2            httr_1.4.2               
##  [5] GenomicDataCommons_1.12.0 tools_4.0.3              
##  [7] R6_2.4.1                  DBI_1.1.0                
##  [9] tidyselect_1.1.0          prettyunits_1.1.1        
## [11] TCGAutils_1.8.1           bit_4.0.4                
## [13] curl_4.3                  compiler_4.0.3           
## [15] cli_2.1.0                 rvest_0.3.6              
## [17] formatR_1.7               xml2_1.3.2               
## [19] rtracklayer_1.48.0        bookdown_0.21            
## [21] readr_1.4.0               askpass_1.1              
## [23] rappdirs_0.3.1            rapiclient_0.1.3         
## [25] RCircos_1.2.1             stringr_1.4.0            
## [27] digest_0.6.25             Rsamtools_2.4.0          
## [29] rmarkdown_2.4             XVector_0.28.0           
## [31] pkgconfig_2.0.3           htmltools_0.5.0          
## [33] dbplyr_1.4.4              limma_3.44.3             
## [35] rlang_0.4.8               rstudioapi_0.11          
## [37] RSQLite_2.2.1             generics_0.0.2           
## [39] jsonlite_1.7.1            BiocParallel_1.22.0      
## [41] RCurl_1.98-1.2            magrittr_1.5             
## [43] GenomeInfoDbData_1.2.3    futile.logger_1.4.3      
## [45] Matrix_1.2-18             Rcpp_1.0.5               
## [47] fansi_0.4.1               lifecycle_0.2.0          
## [49] stringi_1.5.3             yaml_2.2.1               
## [51] RaggedExperiment_1.12.0   RJSONIO_1.3-1.4          
## [53] zlibbioc_1.34.0           BiocFileCache_1.12.1     
## [55] grid_4.0.3                blob_1.2.1               
## [57] crayon_1.3.4              lattice_0.20-41          
## [59] Biostrings_2.56.0         splines_4.0.3            
## [61] GenomicFeatures_1.40.1    hms_0.5.3                
## [63] ps_1.4.0                  knitr_1.30               
## [65] pillar_1.4.6              codetools_0.2-16         
## [67] biomaRt_2.44.4            futile.options_1.0.1     
## [69] XML_3.99-0.5              glue_1.4.2               
## [71] evaluate_0.14             lambda.r_1.2.4           
## [73] data.table_1.13.0         BiocManager_1.30.10      
## [75] vctrs_0.3.4               tidyr_1.1.2              
## [77] openssl_1.4.3             purrr_0.3.4              
## [79] assertthat_0.2.1          xfun_0.18                
## [81] survival_3.2-7            tibble_3.0.4             
## [83] RTCGAToolbox_2.18.0       GenomicAlignments_1.24.0 
## [85] AnnotationDbi_1.50.3      memoise_1.1.0            
## [87] ellipsis_0.3.1