binNdimensions {BRGenomics} | R Documentation |
Divide data along different dimensions into equally spaced bins, and summarize the datapoints that fall into any of these n-dimensional bins.
binNdimensions(
  dims.df,
  nbins = 10,
  use_bin_numbers = TRUE,
  ncores = getOption("mc.cores", 2L)
)

aggregateByNdimBins(
  x,
  dims.df,
  nbins = 10,
  FUN = mean,
  ...,
  ignore.na = TRUE,
  drop = FALSE,
  empty = NA,
  use_bin_numbers = TRUE,
  ncores = getOption("mc.cores", 2L)
)

densityInNdimBins(
  dims.df,
  nbins = 10,
  use_bin_numbers = TRUE,
  ncores = getOption("mc.cores", 2L)
)
dims.df |
A dataframe containing one or more columns of numerical data for which bins will be generated. |
nbins |
Either a number giving the number of bins to use for all dimensions (default = 10), or a vector containing the number of bins to use for each dimension of input data given. |
use_bin_numbers |
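A logical indicating if ordinal bin numbers should be returned (TRUE, the default), or if the center value of each bin should be returned in place of its bin number (FALSE). |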
ncores |
Number of cores to use for computations. |
x |
The name of the dimension in dims.df to aggregate, or a separate vector or dataframe of data to be aggregated. |
FUN |
A function to use for aggregating data within each bin. |
... |
Additional arguments passed to FUN. |
ignore.na |
Logical indicating if NA values in x should be ignored (default = TRUE). |
drop |
A logical indicating if empty bin combinations should be removed from the output. By default (FALSE), empty bin combinations are kept in the output. |
empty |
When drop = FALSE, the value returned for empty bins (default = NA). |
These functions take in data along one or more dimensions and, for
each dimension, divide the data into evenly sized bins spanning the minimum
value to the maximum value. For instance, if each row of dims.df
were a gene, the columns (the different dimensions) would be various
quantitative measures of that gene, e.g. expression level, number of exons,
length, etc. If plotted in Cartesian coordinates, each gene would be a
single datapoint, and each measurement would be a separate dimension.
binNdimensions returns the bin numbers themselves. The output
dataframe has the same dimensions as the input dims.df, but each
input value has been replaced by its bin number (an integer). If
use_bin_numbers = FALSE, the center points of the bins are returned
instead of the bin numbers.
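To make the distinction between bin numbers and bin centers concrete, here is a minimal base-R sketch of equal-width binning. It is not the package's implementation: the helpers bin_one and bin_centers and the toy dataframe are hypothetical, and the handling of values that fall exactly on a bin edge is an assumption that may differ from binNdimensions.

```r
# Toy stand-in for dims.df: two numeric measurements per row (illustrative only)
dims <- data.frame(a = c(0, 1, 2, 5, 10),
                   b = c(10, 20, 30, 40, 50))
nbins <- 5

# Hypothetical helper: assign each value to one of `nbins` equal-width bins
# spanning min(v) to max(v)
bin_one <- function(v, nbins) {
    breaks <- seq(min(v), max(v), length.out = nbins + 1)
    # include.lowest = TRUE keeps the minimum value in bin 1
    cut(v, breaks = breaks, labels = FALSE, include.lowest = TRUE)
}

# Hypothetical helper: midpoint of each bin, indexed by bin number
bin_centers <- function(v, nbins) {
    breaks <- seq(min(v), max(v), length.out = nbins + 1)
    (breaks[-1] + breaks[-length(breaks)]) / 2
}

# Analogous to use_bin_numbers = TRUE: same shape as the input, integer bins
bin_nums <- as.data.frame(lapply(dims, bin_one, nbins = nbins))

# Analogous to use_bin_numbers = FALSE: bin numbers replaced by bin centers
bin_mids <- as.data.frame(Map(function(v, b) bin_centers(v, nbins)[b],
                              dims, bin_nums))
```

Both results have one row per input row and one column per input dimension, so they can be bound back onto the original data for plotting or grouping.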
aggregateByNdimBins summarizes some input data x in each
combination of bins, i.e. in each n-dimensional bin. Each row of the output
dataframe is a unique combination of the input bins (i.e. each
n-dimensional bin), and the output columns are identical to those in
dims.df, with the addition of one or more columns containing the
aggregated data in each n-dimensional bin. If the input x was a
vector, the column is named "value"; if the input x was a dataframe,
the column names from x are maintained.
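The shape of this output can be illustrated with a base-R sketch that bins two dimensions, averages a separate vector within each occupied bin combination, and then fills in the unoccupied combinations with NA. This is an assumption-laden illustration rather than the package's code: bin_one, the toy data, and the use of aggregate and merge are all choices made here for clarity.

```r
# Toy data (illustrative only): two dimensions to bin, plus values to summarize
dims <- data.frame(a = c(0, 1, 9, 10),
                   b = c(0, 1, 9, 10))
x <- c(2, 4, 6, 8)
nbins <- 2

# Hypothetical helper: equal-width bin number for each value
bin_one <- function(v, nbins) {
    breaks <- seq(min(v), max(v), length.out = nbins + 1)
    cut(v, breaks = breaks, labels = FALSE, include.lowest = TRUE)
}
bins <- as.data.frame(lapply(dims, bin_one, nbins = nbins))

# Summarize x within each occupied combination of bins
occupied <- aggregate(list(value = x), by = bins, FUN = mean)

# Enumerate every bin combination; unoccupied ones get NA after the merge
all_bins <- expand.grid(a = seq_len(nbins), b = seq_len(nbins))
result <- merge(all_bins, occupied, all.x = TRUE)
```

Here result has one row per two-dimensional bin, with the binned dimensions as columns and a "value" column holding the mean of x, which mirrors the layout described above for a vector input with empty bins retained.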
densityInNdimBins returns a dataframe just like
aggregateByNdimBins, except the "value" column contains the number
of observations that fall into each n-dimensional bin.
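Counting observations per bin combination can be sketched in base R with table. As before, this is only an illustration under stated assumptions: bin_one and the toy data are hypothetical, and reporting empty combinations as 0 is a choice made in this sketch, not a statement about how densityInNdimBins treats them.

```r
# Toy data (illustrative only)
dims <- data.frame(a = c(0, 1, 9, 10),
                   b = c(0, 1, 9, 10))
nbins <- 2

# Hypothetical helper: equal-width bin number for each value
bin_one <- function(v, nbins) {
    breaks <- seq(min(v), max(v), length.out = nbins + 1)
    cut(v, breaks = breaks, labels = FALSE, include.lowest = TRUE)
}

# Factors carrying every bin level, so unoccupied combinations are tabulated
bins <- lapply(dims, function(v) {
    factor(bin_one(v, nbins), levels = seq_len(nbins))
})

# One row per bin combination; "value" holds the number of observations
counts <- as.data.frame(table(bins), responseName = "value")
```

The counts sum to the number of input rows, which is a quick sanity check when adapting the sketch to real data.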
A dataframe.
Mike DeBerardine
data("PROseq") # import included PROseq data
data("txs_dm6_chr4") # import included transcripts

#--------------------------------------------------#
# find counts in promoter, early genebody, and near CPS
#--------------------------------------------------#

pr <- promoters(txs_dm6_chr4, 0, 100)
early_gb <- genebodies(txs_dm6_chr4, 500, 1000, fix.end = "start")
cps <- genebodies(txs_dm6_chr4, -500, 500, fix.start = "end")

df <- data.frame(counts_pr = getCountsByRegions(PROseq, pr),
                 counts_gb = getCountsByRegions(PROseq, early_gb),
                 counts_cps = getCountsByRegions(PROseq, cps))

#--------------------------------------------------#
# divide genes into 20 bins for each measurement
#--------------------------------------------------#

bin3d <- binNdimensions(df, nbins = 20, ncores = 1)

length(txs_dm6_chr4)
nrow(bin3d)
bin3d[1:6, ]

#--------------------------------------------------#
# get number of genes in each bin
#--------------------------------------------------#

bin_counts <- densityInNdimBins(df, nbins = 20, ncores = 1)

bin_counts[1:6, ]

#--------------------------------------------------#
# get mean cps reads in bins of promoter and genebody reads
#--------------------------------------------------#

bin2d_cps <- aggregateByNdimBins("counts_cps", df, nbins = 20,
                                 ncores = 1)

bin2d_cps[1:6, ]

subset(bin2d_cps, is.finite(counts_cps))[1:6, ]

#--------------------------------------------------#
# get median cps reads for those bins
#--------------------------------------------------#

bin2d_cps_med <- aggregateByNdimBins("counts_cps", df, nbins = 20,
                                     FUN = median, ncores = 1)

bin2d_cps_med[1:6, ]

subset(bin2d_cps_med, is.finite(counts_cps))[1:6, ]