A number of recently developed next-generation sequencing based methods (e.g. PAR-CLIP, Bisulphite sequencing, RRBS) specifically induce nucleotide substitutions within the short reads with respect to the reference genome. wavClusteR provides functions for the analysis of the data obtained by such methods with a major focus on PAR-CLIP. The package leverages on experimentally induced substitutions to identify high confidence signals (e.g. RNA-binding sites) in the data. A wavClusteR workflow consists of two steps:
The package supports multicore computing. For a detailed description of the method please refer to Sievers et al., 2012; Comoglio et al., 2015.
wavClusteR expects a sorted and indexed BAM file as input. A streamlined workflow to generate the required input from a fastq file is outlined below (line 1 is pseudo code, replace it with the aligner specific syntax).
#ALIGN:
sample.fastq -> sample.sam
#CONVERT:
samtools view -b -S sample.sam -o sample.bam
#SORT:
samtools sort sample.bam sample_sorted
#INDEXING:
samtools index sample_sorted.bam
samtools view
from SAMtoolssamtools sort
samtools index
.wavClusteR provides a chunk of a human Argonaute 2 (AGO2) PAR-CLIP data set from Kishore et al., 2011. Reads in the chunk map to the genomic interval chrX:23996166-24023263. This data set is used below to illustrate a workflow for PAR-CLIP data analysis.
A sorted and indexed BAM file can be loaded into the R session with readSortedBam. This function extracts aligned reads, sequences and the mismatch MD field, and returns a GRanges object.
library(wavClusteR)
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
##
## IQR, mad, xtabs
## The following objects are masked from 'package:base':
##
## Filter, Find, Map, Position, Reduce, anyDuplicated, append,
## as.data.frame, cbind, colnames, do.call, duplicated, eval,
## evalq, get, grep, grepl, intersect, is.unsorted, lapply,
## lengths, mapply, match, mget, order, paste, pmax, pmax.int,
## pmin, pmin.int, rank, rbind, rownames, sapply, setdiff, sort,
## table, tapply, union, unique, unsplit, which, which.max,
## which.min
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:base':
##
## colMeans, colSums, expand.grid, rowMeans, rowSums
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Loading required package: Rsamtools
## Loading required package: Biostrings
## Loading required package: XVector
filename <- system.file( "extdata", "example.bam", package = "wavClusteR" )
Bam <- readSortedBam(filename = filename)
Bam
## GRanges object with 5358 ranges and 2 metadata columns:
## seqnames ranges strand | qseq
## <Rle> <IRanges> <Rle> | <DNAStringSet>
## [1] chrX [24001819, 24001844] - | CAGAGATAAA...TATTTTAAAG
## [2] chrX [24001819, 24001843] - | CAGAGATAAA...ATATTTTAAG
## [3] chrX [24001834, 24001863] - | ATATTTTAGA...ATTTTATTTA
## [4] chrX [24001836, 24001865] - | TTTTTAAAGA...TTTATTTAAA
## [5] chrX [24001841, 24001876] - | AAAGATTAAA...TTTTCTTCAT
## ... ... ... ... . ...
## [5354] chrX [24023018, 24023051] - | GTTTCACAGC...AAAAATATGT
## [5355] chrX [24023018, 24023051] - | GTTTCACAGC...AAAAATATGT
## [5356] chrX [24023019, 24023051] - | TTTCACAGCG...AAAAATATGT
## [5357] chrX [24023019, 24023051] - | TTTCACAGCG...AAAAATATGT
## [5358] chrX [24023067, 24023090] - | CAAAGGCGCG...GGTTTATTTT
## MD
## <character>
## [1] 26
## [2] 24A0
## [3] 8A21
## [4] 0A13A15
## [5] 24A11
## ... ...
## [5354] 10A23
## [5355] 10A23
## [5356] 9A23
## [5357] 9A23
## [5358] 9A14
## -------
## seqinfo: 25 sequences from an unspecified genome; no seqlengths
Prior to model fitting, genome-wide substitutions need to be identified and filtered based on a coverage threshold. The getAllSub function identifies all genomic positions exhibiting at least one substitution and coverage above this threshold.
countTable <- getAllSub( Bam, minCov = 10 )
## Loading required package: doMC
## Loading required package: foreach
## Loading required package: iterators
## Considering substitutions, n = 497, processing in 1 chunks
## chunk #: 1
## considering the + strand
## Computing local coverage at substitutions...
## considering the - strand
## Computing local coverage at substitutions...
head( countTable )
## GRanges object with 6 ranges and 3 metadata columns:
## seqnames ranges strand | substitutions coverage
## <Rle> <IRanges> <Rle> | <character> <numeric>
## [1] chrX [24001959, 24001959] - | TC 17
## [2] chrX [24001973, 24001973] - | TC 17
## [3] chrX [24001977, 24001977] - | TC 13
## [4] chrX [24002046, 24002046] - | TC 10
## [5] chrX [24002057, 24002057] - | TC 10
## [6] chrX [24002147, 24002147] - | TC 22
## count
## <integer>
## [1] 2
## [2] 12
## [3] 1
## [4] 1
## [5] 6
## [6] 3
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
The function returns a GRanges object specifying genomic position, strand, substitution type (e.g. “TC”: T in the reference genome; C in the read), strand-specific coverage and number of observed substitutions at the position.
The coverage threshold minCov
defines the genomic positions to be used for parameter estimation, thus providing a means to tune the stringency of the analysis. Currently, wavClusteR does not allow to learn this threshold from the data. However, since the model is based on relative substitution frequencies, the value of minCov
will influence the variance of the estimated parameters. Therefore, values smaller than default (minCov=10
) are not recommended.
Once all substitutions are computed, a diagnostic substitution profile can be plotted with plotSubstitutions.
plotSubstitutions( countTable, highlight = "TC" )
The function returns a barplot showing the total number of genomic positions that exhibit a given type of substitution and highlights the substitution type that is expected to be generated by the experimental procedure. The percentage of substitution of this type is also shown. This plot can be readily used to assess the quality of the sequenced libraries and can be used to compare different data sets generated under the same experimental condition.
wavClusteR uses the identified genomic positions to learn a non-parametric mixture model discriminating PAR-CLIP-specific from extrinsic transitions. Model parameters are estimated by fitMixtureModel.
model <- fitMixtureModel(countTable, substitution = "TC")
The function returns a list of:
As the small size of our example data set does not to estimate the model reliably, the mixture model for the entire AGO2 dataset has been precomputed and is provided by the package.
data(model)
str(model)
## List of 5
## $ l1: Named num 0.181
## ..- attr(*, "names")= chr "TC"
## $ l2: Named num 0.819
## ..- attr(*, "names")= chr "TC"
## $ p : num [1:999] 7.52 9.44 10.05 10.38 10.48 ...
## $ p1: num [1:999] 89.6 64.4 50.4 41.5 35.3 ...
## $ p2: num [1:999] 0 0 1.14 3.51 5 ...
The model fit can be inspected using getExpInterval.
(support <- getExpInterval( model, bayes = TRUE ) )
## $supportStart
## [1] 0.007
##
## $supportEnd
## [1] 0.98
The function returns two diagnostic plots. The first plot illustrates the estimated densities \(p\), \(p_1\) and \(p_2\), and log odds
\[ o=]
The second plot shows the resulting posterior class probability, i.e. the probability that a given relative substitution frequency (RSF, horizontal axis) has been induced by PAR-CLIP. The area under the curve corresponding to the returned RSF interval is colored, and the RSF interval indicated. By default, getExpInterval returns the RSF interval according to the Bayes classifier, i.e. a posterior probability cutoff of 0.5. However, the user can specify a custom RSF interval:
By means of the rightProb and leftProb parameters, e.g.
(support <- getExpInterval( model, bayes = FALSE, leftProb = 0.9, rightProb = 0.9 ) )
## $supportStart
## [1] 0.076
##
## $supportEnd
## [1] 0.905
By inspecting the posterior class probability density and passing the RSF interval boundaries when calling high-confidence substitutions (see point 6).
Finally, the model can be used to produce further diagnostic plots. Particularly, the total number of reads exhibitng a given substitution and an RSF-based partitioning of genomic positions with substitutions can be obtained by
plotSubstitutions( countTable, highlight = "TC", model )
The RSF support is used to filter all observed transitions to define a set of high-confidence, PAR-CLIP induced transitions. These are identified by getHighConfSub. The function returns a GRanges object with genomic position, strand, strand-specific coverage, occurence (count), and relative substitution frequency (rsf) for each high-confidence substitution. In a call to getHighConfSub, the RSF interval returned by getExpInterval can be supplied as support
argument directly
highConfSub <- getHighConfSub( countTable,
support = support,
substitution = "TC" )
head( highConfSub )
## GRanges object with 6 ranges and 3 metadata columns:
## seqnames ranges strand | coverage count
## <Rle> <IRanges> <Rle> | <numeric> <integer>
## [1] chrX [24001959, 24001959] - | 17 2
## [2] chrX [24001973, 24001973] - | 17 12
## [3] chrX [24001977, 24001977] - | 13 1
## [4] chrX [24002046, 24002046] - | 10 1
## [5] chrX [24002057, 24002057] - | 10 6
## [6] chrX [24002147, 24002147] - | 22 3
## rsf
## <numeric>
## [1] 0.117647058823529
## [2] 0.705882352941177
## [3] 0.0769230769230769
## [4] 0.1
## [5] 0.6
## [6] 0.136363636363636
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
or, alternatively, by specifying supportStart
and supportEnd
, which define the range of RSF of interest.
highConfSub <- getHighConfSub( countTable,
supportStart = 0.2,
supportEnd = 0.7,
substitution = "TC" )
head( highConfSub )
Binding sites (clusters) are identified by getClusters. The function takes high-confidence substitution sites and the coverage function
coverage <- coverage( Bam )
coverage$chrX
## integer-Rle of length 24023090 with 914 runs
## Lengths: 24001818 15 2 ... 1 15 24
## Values : 0 2 3 ... 21 0 1
as an input. From package version 2.0, cluster boundaries are resolved using the Mini-Rank Norm (MRN) Comoglio et al., 2015, which is up to 10x faster and more sensitive than the previously adopted algorithm based on continuous wavelet transform of the coverage function Sievers et al., 2012. Briefly, the MRN algorithm finds an optimal cluster boundary for each high-confidence substitution by solving an optimization problem that integrates prior knowledge on the geometry of PAR-CLIP clusters. Two options are available:
Hard thresholding, i.e. the coverage function is denoised using a global threshold. Empirically, minCov=1
worked well on all tested datasets for which minCov = 10
. Alternatively, 10% of the mode of the coverage distribution at high-confidence substitutions represents a possible choice.
clusters <- getClusters( highConfSub = highConfSub,
coverage = coverage,
sortedBam = Bam,
method = "mrn",
threshold = 1,
cores = 1 )
## Computing start/end read positions
## Number of chromosomes exhibiting high confidence transitions: 1
## ...Processing = chrX
clusters
## GRanges object with 184 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chrX [24001953, 24001975] -
## [2] chrX [24001953, 24001976] -
## [3] chrX [24001953, 24001977] -
## [4] chrX [24002044, 24002057] -
## [5] chrX [24002044, 24002057] -
## ... ... ... ...
## [180] chrX [24006559, 24006573] -
## [181] chrX [24006560, 24006565] -
## [182] chrX [24007061, 24007068] -
## [183] chrX [24007061, 24007083] -
## [184] chrX [24007061, 24007083] -
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
Local thresholding, based on a global estimation of background levels via a Gaussian mixture model. Omitting the threshold parameter in the call to getClusters enables local thresholding
clusters <- getClusters( highConfSub = highConfSub,
coverage = coverage,
sortedBam = Bam,
method = "mrn",
cores = 1 )
## Computing start/end read positions
## Learning background threshold by fitting a GMM
## Estimated threshold (% of maximum local coverage differences) from 184 sampled transitions: 5.26
## Number of chromosomes exhibiting high confidence transitions: 1
## ...Processing = chrX
clusters
## GRanges object with 184 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chrX [24001953, 24001959] -
## [2] chrX [24001963, 24001973] -
## [3] chrX [24001972, 24001977] -
## [4] chrX [24002044, 24002057] -
## [5] chrX [24002046, 24002057] -
## ... ... ... ...
## [180] chrX [24006569, 24006574] -
## [181] chrX [24006569, 24006574] -
## [182] chrX [24007062, 24007068] -
## [183] chrX [24007073, 24007076] -
## [184] chrX [24007074, 24007077] -
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
Note: the CWT-based method is maintained for reproducibility and can be called by specifying method = "cwt"
in the function call above.
Once the clusters are identified, the corresponding genomic regions can be merged in a strand-specific manner. Statistics for each merged cluster (a wavCluster), can be computed using filterClusters.
require(BSgenome.Hsapiens.UCSC.hg19)
## Loading required package: BSgenome.Hsapiens.UCSC.hg19
## Loading required package: BSgenome
## Loading required package: rtracklayer
wavclusters <- filterClusters( clusters = clusters,
highConfSub = highConfSub,
coverage = coverage,
model = model,
genome = Hsapiens,
refBase = "T",
minWidth = 12)
## Computing log odds...
## Refining cluster sizes...
## Combining clusters...
## Quantifying transitions within clusters...
## Computing statistics...
##
|
|= | 2%
|
|=== | 5%
|
|==== | 7%
|
|====== | 9%
|
|======= | 11%
|
|========= | 14%
|
|========== | 16%
|
|============ | 18%
|
|============= | 20%
|
|=============== | 23%
|
|================ | 25%
|
|================== | 27%
|
|=================== | 30%
|
|===================== | 32%
|
|====================== | 34%
|
|======================== | 36%
|
|========================= | 39%
|
|=========================== | 41%
|
|============================ | 43%
|
|============================== | 45%
|
|=============================== | 48%
|
|================================ | 50%
|
|================================== | 52%
|
|=================================== | 55%
|
|===================================== | 57%
|
|====================================== | 59%
|
|======================================== | 61%
|
|========================================= | 64%
|
|=========================================== | 66%
|
|============================================ | 68%
|
|============================================== | 70%
|
|=============================================== | 73%
|
|================================================= | 75%
|
|================================================== | 77%
|
|==================================================== | 80%
|
|===================================================== | 82%
|
|======================================================= | 84%
|
|======================================================== | 86%
|
|========================================================== | 89%
|
|=========================================================== | 91%
|
|============================================================= | 93%
|
|============================================================== | 95%
|
|================================================================ | 98%
|
|=================================================================| 100%
## Consolidating results...
wavclusters
## GRanges object with 44 ranges and 7 metadata columns:
## seqnames ranges strand | Ntransitions MeanCov
## <Rle> <IRanges> <Rle> | <integer> <numeric>
## [1] chrX [24001950, 24001980] - | 3 14.870968
## [2] chrX [24002044, 24002057] - | 2 9.642857
## [3] chrX [24002139, 24002161] - | 3 21.956522
## [4] chrX [24002323, 24002352] - | 4 9.766667
## [5] chrX [24002670, 24002687] - | 3 14.333333
## ... ... ... ... . ... ...
## [40] chrX [24005918, 24005971] - | 5 15.14815
## [41] chrX [24006171, 24006191] - | 3 11.19048
## [42] chrX [24006537, 24006554] - | 3 32.00000
## [43] chrX [24006557, 24006577] - | 3 26.80952
## [44] chrX [24007059, 24007081] - | 3 16.30435
## NbasesInRef CrossLinkEff
## <integer> <numeric>
## [1] 12 0.25
## [2] 4 0.50
## [3] 8 0.38
## [4] 11 0.36
## [5] 7 0.43
## ... ... ...
## [40] 15 0.33
## [41] 7 0.43
## [42] 5 0.60
## [43] 7 0.43
## [44] 7 0.43
## Sequence SumLogOdds
## <factor> <numeric>
## [1] CGGCTTGGGAAAAATTACCTGACAAAAATGT 7.142001
## [2] CTAGGATTATTTGA 4.972714
## [3] TTAGGTAGAAAATAACCTCTCCC 7.457388
## [4] AATATTGAAGTTATACGGTGTACTGAAAGG 10.534644
## [5] TTAAATTATGAATTCTCA 7.795383
## ... ... ...
## [40] CAATGTTAGACCAATGGCTTTGATAGTAAAACTGGTGAGGTTTTTCTTTATATG 12.922715
## [41] AGGATGGAATCGCTGTAATGA 7.468965
## [42] GTGGAAGATGAGGTGATT 7.632600
## [43] ACCGGTGATAAACAATTTGTT 7.457649
## [44] TGCTGGTGAACATTCTGAAAGTA 8.298195
## RelLogOdds
## <numeric>
## [1] 0.5951667
## [2] 1.2431784
## [3] 0.9321735
## [4] 0.9576949
## [5] 1.1136261
## ... ...
## [40] 0.8615143
## [41] 1.0669950
## [42] 1.5265200
## [43] 1.0653784
## [44] 1.1854564
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
The function takes as input the following elements:
and it returns a GRanges object with the following metadata:
refBase
(NbasesInRef)The relative log odds can be used to rank clusters according to statistical confidence.
wavClusteR provides a number of functions (summarized in the table below) for post-processing of the identified binding sites.
Task | Function | Output format |
---|---|---|
Export all identified substitutions or high-confidence substitutions | exportHighConfSub | BED |
Export clusters | exportClusters | BED |
Export coverage function | exportCoverage | BigWig |
Visualize the size distribution of wavClusters | plotSizeDistribution | histogram |
Annotate clusters with respect to genomic features (e.g. CDS, introns, 3’-UTRs, 5’-UTRs) in a strand-specific manner | annotateClusters | dot chart, vector |
Compute metagene profiles of wavClusters, where the density of wavClusters is represented as a function of a reference genomic coordinates | getMetaGene | line plot, vector |
Compute metaTSS profiles based on all aligned reads in the input BAM file | line plot, vector | |
Visualize wavClusteR statistics and meta data to learn pairwise relationships between variables | plotStatistics | pairs plot |
High-confidence substitutions can be exported with
exportHighConfSub( highConfSub = highConfSub,
filename = "hcTC.bed",
trackname = "hcTC",
description = "hcTC" )
where trackname
and description
set the corresponding attributes in the UCSC BED file. By replacing highConfSub
with another set of substitutions (e.g. all identified substitutions of a given type), those can be exported likewise.
wavClusters can be exported with
exportClusters( clusters = wavclusters,
filename = "wavClusters.bed",
trackname = "wavClusters",
description = "wavClusters" )
and the coverage function can be exported with
exportCoverage( coverage = coverage, filename = "coverage.bigWig" )
wavClusters can be annotated with respect to genomic features using annotateClusters. This function generates a strand-specific dot chart representing wavClusters annotation. annotateClusters takes as an input the wavClusters and a transcriptDB object containing transcript annotations. The latter can be generated using makeTxDbFromUCSC (GenomicFeatures package)
txDB <- makeTxDbFromUCSC(genome = "hg19", tablename = "ensGene")
and is automatically downloaded by annotateClusters if missing.
Then, the annotateClusters can be called as follows
annotateClusters( clusters = wavclusters,
txDB = txDB,
plot = TRUE,
verbose = TRUE)
Four dot charts are returned by the function showing the percentage of clusters mapping to different transcript features localized on the same (sense) or on the opposite (antisense) strand, the relative sequence length of different compartments relative to the total transcriptome length and the normalized enrichment of clusters across functional compartments.
Note: multiple hits, i.e. wavClusters that overlap with more than one genomic feature, are reported as “multiple”, whereas wavClusters that map outside of the considered features are labeled as “other”. The latter are then annotated with respect to features on the antisense strand.
A graphical representation of the density of wavClusters as a function of a binning of genomic coordinates across all annotated genes can be obtained using the getMetaGene function.
getMetaGene( clusters = wavclusters,
txDB = txDB,
upstream = 1e3,
downstream = 1e3,
nBins = 40,
nBinsUD = 10,
minLength = 1,
plot = TRUE,
verbose = TRUE )
In this example, genes were divided in 40 bins (nBins
) and an upstream
/downstream
region spanning 1kb was considered. This region was subdivided in 10 bins (nBinsUD
). No restriction on gene length was applied (minLength
). getMetaGene returns a numeric vector of length nBins
+ 2*nBinsUD
with normalized counts, which can be used, for instance, to compare the distribution of wavClusters across several PAR-CLIP samples.
In addition to metagene profiles, meta transcription start site (TSS) profiles based on all mapped reads can be generated using getMetaTSS.
getMetaTSS( sortedBam = Bam,
txDB = txDB,
upstream = 1e3,
downstream = 1e3,
nBins = 40,
unique = FALSE,
plot = TRUE,
verbose = TRUE )
Here, the upstream
and downstream
parameters control the width of the window to be considered, and nBins
controls the resolution of the profile. If unique=TRUE
, then overlapping windows are discarded. getMetaTSS returns a numeric vector of length nBins
with normalized read counts.
The size distribution of wavClusters can be visualized by
plotSizeDistribution( clusters = wavclusters, showCov = TRUE, col = "skyblue2" )
Cluster statistics can be plotted as pairs plot using
plotStatistics( clusters = wavclusters,
corMethod = "spearman",
lower = panel.smooth )
sessionInfo()
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.1 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.42.0
## [3] rtracklayer_1.34.0 doMC_1.3.4
## [5] iterators_1.0.8 foreach_1.4.3
## [7] wavClusteR_2.8.0 Rsamtools_1.26.0
## [9] Biostrings_2.42.0 XVector_0.14.0
## [11] GenomicRanges_1.26.0 GenomeInfoDb_1.10.0
## [13] IRanges_2.8.0 S4Vectors_0.12.0
## [15] BiocGenerics_0.20.0 BiocStyle_2.2.0
##
## loaded via a namespace (and not attached):
## [1] SummarizedExperiment_1.4.0 splines_3.3.1
## [3] lattice_0.20-34 colorspace_1.2-7
## [5] htmltools_0.3.5 yaml_2.1.13
## [7] GenomicFeatures_1.26.0 chron_2.3-47
## [9] XML_3.98-1.4 survival_2.39-5
## [11] foreign_0.8-67 DBI_0.5-1
## [13] BiocParallel_1.8.0 RColorBrewer_1.1-2
## [15] splus2R_1.2-2 plyr_1.8.4
## [17] stringr_1.1.0 zlibbioc_1.20.0
## [19] munsell_0.4.3 gtable_0.2.0
## [21] codetools_0.2-15 evaluate_0.10
## [23] labeling_0.3 latticeExtra_0.6-28
## [25] Biobase_2.34.0 knitr_1.14
## [27] biomaRt_2.30.0 wmtsa_2.0-2
## [29] AnnotationDbi_1.36.0 Rcpp_0.12.7
## [31] acepack_1.3-3.3 scales_0.4.0
## [33] formatR_1.4 Hmisc_3.17-4
## [35] gridExtra_2.2.1 ggplot2_2.1.0
## [37] digest_0.6.10 stringi_1.1.2
## [39] ade4_1.7-4 grid_3.3.1
## [41] tools_3.3.1 bitops_1.0-6
## [43] magrittr_1.5 ifultools_2.0-4
## [45] RCurl_1.95-4.8 tibble_1.2
## [47] RSQLite_1.0.0 Formula_1.2-1
## [49] cluster_2.0.5 seqinr_3.3-3
## [51] MASS_7.3-45 Matrix_1.2-7.1
## [53] data.table_1.9.6 assertthat_0.1
## [55] rmarkdown_1.1 mclust_5.2
## [57] rpart_4.1-10 compiler_3.3.1
## [59] GenomicAlignments_1.10.0 nnet_7.3-12
Comoglio, F., Sievers, C. & Paro, R. (2015) Sensitive and highly resolved inidentification of RNA-protein interaction sites in PAR-CLIP data. BMC Bioinformatics, 16, 32
Langmead,B., Trapnell,C., Pop,M. & Salzberg,S.L. (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10, R25
Kishore, S. et al. (2011) A quantitative analysis of CLIP methods for identifying binding sites of RNA-binding proteins. Nature Methods 8(7), 559-564
Sievers, C., Schlumpf, T., Sawarkar, R., Comoglio, F. & Paro, R. (2012) Mixture models and wavelet transforms reveal high confidence RNA-protein interaction sites in MOV10 PAR-CLIP data. Nucleic Acids Res 40(2), e160