The RaggedExperiment package provides a flexible data representation for copy number, mutation and other ragged array schema for genomic location data. It aims to provide a framework for a set of samples that have differing numbers of genomic ranges.
The RaggedExperiment
class derives from a GRangesList
representation and provides a semblance of a rectangular dataset. The row and column dimensions of the RaggedExperiment
correspond to the number of ranges in the entire dataset and the number of samples represented in the data, respectively.
source("https://bioconductor.org/biocLite.R")
BiocInstaller::biocLite("RaggedExperiment")
Loading the package:
library(RaggedExperiment)
RaggedExperiment
objectWe start with a couple of GRanges
objects, each representing an individual sample:
sample1 <- GRanges(
c(GENEA = "chr1:1-10:-", GENEB = "chr2:15-18:+"),
score = 3:4)
sample2 <- GRanges(
c(GENEC = "chr1:1-10:-", GENED = "chr2:11-18:+"),
score = 1:2)
Include column data colData
to describe the samples:
colDat <- DataFrame(id = 1:2)
GRanges
objectsragexp <- RaggedExperiment(sample1 = sample1,
sample2 = sample2,
colData = colDat)
ragexp
## class: RaggedExperiment
## dim: 4 2
## assays(1): score
## rownames(4): GENEA GENEB GENEC GENED
## colnames(2): sample1 sample2
## colData names(1): id
GRangesList
instancegrl <- GRangesList(sample1 = sample1, sample2 = sample2)
RaggedExperiment(grl, colData = colDat)
## class: RaggedExperiment
## dim: 4 2
## assays(1): score
## rownames(4): GENEA GENEB GENEC GENED
## colnames(2): sample1 sample2
## colData names(1): id
list
of GRanges
rangeList <- list(sample1 = sample1, sample2 = sample2)
RaggedExperiment(rangeList, colData = colDat)
## class: RaggedExperiment
## dim: 4 2
## assays(1): score
## rownames(4): GENEA GENEB GENEC GENED
## colnames(2): sample1 sample2
## colData names(1): id
List
of GRanges
with metadataNote: In cases where a SimpleGenomicRangesList
is provided along with accompanying metadata (accessed by mcols
), the metadata is used as the colData
for the RaggedExperiment
.
grList <- List(sample1 = sample1, sample2 = sample2)
mcols(grList) <- colDat
RaggedExperiment(grList)
## class: RaggedExperiment
## dim: 4 2
## assays(1): score
## rownames(4): GENEA GENEB GENEC GENED
## colnames(2): sample1 sample2
## colData names(1): id
rowRanges(ragexp)
## GRanges object with 4 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## GENEA chr1 [ 1, 10] -
## GENEB chr2 [15, 18] +
## GENEC chr1 [ 1, 10] -
## GENED chr2 [11, 18] +
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
dimnames(ragexp)
## [[1]]
## [1] "GENEA" "GENEB" "GENEC" "GENED"
##
## [[2]]
## [1] "sample1" "sample2"
colData
colData(ragexp)
## DataFrame with 2 rows and 1 column
## id
## <integer>
## sample1 1
## sample2 2
Subsetting a RaggedExperiment
is akin to subsetting a matrix
object. Rows correspond to genomic ranges and columns to samples or specimen. It is possible to subset using integer
, character
, and logical
indices.
The overlapsAny
and subsetByOverlaps
functionalities are available for use for RaggedExperiment
. Please see the corresponding documentation in RaggedExperiment
and GenomicRanges
.
RaggedExperiment
package provides several different functions for representing ranged data in a rectangular matrix via the *Assay
methods.
The most straightforward matrix representation of a RaggedExperiment
will return a matrix of dimensions equal to the product of the number of ranges and samples.
dim(ragexp)
## [1] 4 2
Reduce(`*`, dim(ragexp))
## [1] 8
sparseAssay(ragexp)
## sample1 sample2
## GENEA 3 NA
## GENEB 4 NA
## GENEC NA 1
## GENED NA 2
length(sparseAssay(ragexp))
## [1] 8
Samples with identical ranges are placed in the same row. Non-disjoint ranges are not collapsed.
compactAssay(ragexp)
## sample1 sample2
## chr1:1-10:- 3 1
## chr2:11-18:+ NA 2
## chr2:15-18:+ 4 NA
This function returns a matrix of disjoint ranges across all samples. Elements of the matrix are summarized by applying the simplify
functional argument to assay values of overlapping ranges.
disjoinAssay(ragexp, simplify = mean)
## sample1 sample2
## chr1:1-10:- 3 1
## chr2:11-14:+ NA 2
## chr2:15-18:+ 4 2
The qreduceAssay
function works with a query
parameter that highlights a window of ranges for the resulting matrix. The returned matrix will have dimensions length(query)
by ncol(x)
. Elements contain assay values for the i th query range and the j th sample, summarized according to the simplify
functional argument.
First we define our summary function that calculates a weighted average score per query range. Note that there are three arguments to this function. Please see the documentation ?qreduceAssay
for more details.
weightedmean <- function(scores, ranges, qranges)
sum(scores * width(ranges)) / sum(width(ranges))
A call to qreduceAssay
involves the RaggedExperiment
, the GRanges
query and the simplify
functional argument.
qreduceAssay(ragexp,
query = GRanges(c("chr1:1-10:-", "chr2:11-18:+")),
simplify = weightedmean)
## sample1 sample2
## chr1:1-10:- 3 1
## chr2:11-18:+ 4 2
The RaggedExperiment
provides a family of parallel functions for coercing to the SummarizedExperiment
class. By selecting a particular assay index (i
), a parallel assay coercion method can be achieved.
Here is the list of function names:
sparseSummarizedExperiment
compactSummarizedExperiment
disjoinSummarizedExperiment
qreduceSummarizedExperiment
See the documentation for details.
sessionInfo()
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.5-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.5-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] RaggedExperiment_1.0.0 GenomicRanges_1.28.0 GenomeInfoDb_1.12.0
## [4] IRanges_2.10.0 S4Vectors_0.14.0 BiocGenerics_0.22.0
## [7] BiocStyle_2.4.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.10 knitr_1.15.1
## [3] XVector_0.16.0 magrittr_1.5
## [5] zlibbioc_1.22.0 lattice_0.20-35
## [7] stringr_1.2.0 tools_3.4.0
## [9] grid_3.4.0 SummarizedExperiment_1.6.0
## [11] Biobase_2.36.0 matrixStats_0.52.2
## [13] htmltools_0.3.5 yaml_2.1.14
## [15] rprojroot_1.2 digest_0.6.12
## [17] bookdown_0.3 Matrix_1.2-9
## [19] GenomeInfoDbData_0.99.0 bitops_1.0-6
## [21] RCurl_1.95-4.8 evaluate_0.10
## [23] rmarkdown_1.4 DelayedArray_0.2.0
## [25] stringi_1.1.5 compiler_3.4.0
## [27] backports_1.0.5