compute_kmer_enrichment {transite} | R Documentation |
Compares foreground sequence set to background sequence set and computes enrichment values for each possible k-mer.
compute_kmer_enrichment( foreground_kmers, background_kmers, permutation = FALSE, chisq_p_value_threshold = 0.05, p_adjust_method = "BH" )
foreground_kmers |
k-mer counts of the foreground set
(generated by |
background_kmers |
k-mer counts of the background set
(generated by |
permutation |
if |
chisq_p_value_threshold |
threshold below which Fisher's exact test is used instead of Pearson's chi-squared test |
p_adjust_method |
see |
Usually uses Pearson's chi-squared test, but recalculates p-values
with Fisher's exact test
for Pearson's chi-squared test p-values <= chisq_p_value_threshold
.
The reason this is done is
computational efficiency. Fisher's exact tests are computationally
demanding and are only
performed in situations, where exact p-values are preferred, e.g.,
if expected < 5 or
significant p-values.
enrichment of k-mers in specified foreground sequences. A data frame with the following columns is returned:
foreground_count | foreground counts for each k-mer |
background_count | background counts for each k-mer |
enrichment | k-mer enrichment |
p_value | p-value of k-mer enrichment (either from Fisher's exact test or Pearson's chi-squared test) |
adj_p_value | multiple testing corrected p-value |
Other k-mer functions:
calculate_kmer_enrichment()
,
check_kmers()
,
count_homopolymer_corrected_kmers()
,
draw_volcano_plot()
,
estimate_significance_core()
,
estimate_significance()
,
generate_kmers()
,
generate_permuted_enrichments()
,
run_kmer_spma()
,
run_kmer_tsma()
# define simple sequence sets for foreground and background foreground_set <- c( "CAACAGCCUUAAUU", "CAGUCAAGACUCC", "CUUUGGGGAAU", "UCAUUUUAUUAAA", "AAUUGGUGUCUGGAUACUUCCCUGUACAU", "AUCAAAUUA", "AGAU", "GACACUUAAAGAUCCU", "UAGCAUUAACUUAAUG", "AUGGA", "GAAGAGUGCUCA", "AUAGAC", "AGUUC", "CCAGUAA" ) background_set <- c( "CAACAGCCUUAAUU", "CAGUCAAGACUCC", "CUUUGGGGAAU", "UCAUUUUAUUAAA", "AAUUGGUGUCUGGAUACUUCCCUGUACAU", "AUCAAAUUA", "AGAU", "GACACUUAAAGAUCCU", "UAGCAUUAACUUAAUG", "AUGGA", "GAAGAGUGCUCA", "AUAGAC", "AGUUC", "CCAGUAA", "UUAUUUA", "AUCCUUUACA", "UUUUUUU", "UUUCAUCAUU", "CCACACAC", "CUCAUUGGAG", "ACUUUGGGACA", "CAGGUCAGCA" ) foreground_kmers <- generate_kmers(foreground_set, 6) background_kmers <- generate_kmers(background_set, 6) kmer_enrichment_values <- compute_kmer_enrichment(foreground_kmers, background_kmers)