summarize_numerical {annotatr} | R Documentation |
Given a GRanges of annotated regions, summarize numerical data columns based on a grouping.
summarize_numerical(
  annotated_regions,
  by = c("annot.type", "annot.id"),
  over,
  quiet = FALSE
)
annotated_regions |
The |
by |
A character vector of the columns of |
over |
A character vector of the numerical columns in |
quiet |
Print progress messages (FALSE) or not (TRUE). |
NOTE: We do not take the distinct values of seqnames, start, end, annot.type as in the other summarize_*() functions because in the case of a region that intersected two distinct exons, using distinct_() would destroy the information of the mean of the numerical column over one of the exons, which is not desirable.
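The point of the note can be seen on a toy table. The sketch below uses base R only and invented values; it is an illustration of the de-duplication problem, not annotatr's implementation, and the column values (`exon:1`, `coverage`, and so on) are assumptions made up for the example.

```r
# Toy table (made-up values): region chr1:100-200 overlaps two distinct exons,
# and region chr1:500-600 overlaps one exon.
toy <- data.frame(
  seqnames   = c("chr1", "chr1", "chr1"),
  start      = c(100, 100, 500),
  end        = c(200, 200, 600),
  annot.type = c("exons", "exons", "exons"),
  annot.id   = c("exon:1", "exon:2", "exon:3"),
  coverage   = c(10, 10, 30),
  stringsAsFactors = FALSE
)

# De-duplicate on region coordinates and annot.type only.
# The second row (exon:2) has the same key as the first, so it is dropped.
key <- c("seqnames", "start", "end", "annot.type")
deduped <- toy[!duplicated(toy[, key]), ]

# Mean coverage per annot.id, with and without de-duplication.
mean_all     <- tapply(toy$coverage, toy$annot.id, mean)
mean_deduped <- tapply(deduped$coverage, deduped$annot.id, mean)

print(mean_all)      # one mean for each of the three exons
print(mean_deduped)  # exon:2 no longer has a mean at all
```

After de-duplication the summary for one of the two exons is gone entirely, which is the loss of information the note describes.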
A grouped dplyr::tbl_df giving the count, mean, and sd of the over columns within each by grouping.
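As a rough picture of that kind of result, the following base R sketch computes a count, mean, and standard deviation for one numerical column within each group. The data are invented, and the output column names (`n`, `mean`, `sd`) are chosen for illustration only; they are not claimed to match what summarize_numerical() returns.

```r
# Toy annotated data (made-up values): one grouping column, one numerical column.
toy <- data.frame(
  annot.type = c("exons", "exons", "introns", "introns", "introns"),
  coverage   = c(10, 20, 5, 7, 9),
  stringsAsFactors = FALSE
)

# Split the numerical column by group, then summarize each piece.
groups <- split(toy$coverage, toy$annot.type)

summary_tbl <- data.frame(
  annot.type = names(groups),
  n    = unname(vapply(groups, length, integer(1))),  # rows per group
  mean = unname(vapply(groups, mean, numeric(1))),    # group mean
  sd   = unname(vapply(groups, sd, numeric(1))),      # group standard deviation
  stringsAsFactors = FALSE
)

print(summary_tbl)
```

Grouping by more than one column works the same way in principle: each distinct combination of the grouping values becomes one row of the summary.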
### Test on a very simple bed file to demonstrate different options

# Get premade CpG annotations
data('annotations', package = 'annotatr')

r_file = system.file('extdata', 'test_read_multiple_data_nohead.bed', package='annotatr')
extraCols = c(pval = 'numeric', mu1 = 'integer', mu0 = 'integer', diff_exp = 'character')
r = read_regions(con = r_file, genome = 'hg19', extraCols = extraCols, rename_score = 'coverage')

a = annotate_regions(
    regions = r,
    annotations = annotations,
    ignore.strand = TRUE)

# Testing over normal by
sn1 = summarize_numerical(
    annotated_regions = a,
    by = c('annot.type', 'annot.id'),
    over = c('coverage', 'mu1', 'mu0'),
    quiet = FALSE)

# Testing over a different by
sn2 = summarize_numerical(
    annotated_regions = a,
    by = c('diff_exp'),
    over = c('coverage', 'mu1', 'mu0'))