run_kmer_spma {transite} | R Documentation |
SPMA helps to illuminate the relationship between RBP binding evidence and the transcript sorting criterion, e.g., fold change between treatment and control samples.
run_kmer_spma(
  sorted_transcript_sequences,
  sorted_transcript_values = NULL,
  transcript_values_label = "transcript value",
  motifs = NULL,
  k = 6,
  n_bins = 40,
  midpoint = 0,
  x_value_limits = NULL,
  max_model_degree = 1,
  max_cs_permutations = 1e+07,
  min_cs_permutations = 5000,
  fg_permutations = 5000,
  p_adjust_method = "BH",
  p_combining_method = "fisher",
  n_cores = 1
)
sorted_transcript_sequences |
character vector of ranked sequences, either DNA (only containing upper case characters A, C, G, T) or RNA (A, C, G, U).
The sequences in |
sorted_transcript_values |
vector of sorted transcript values, i.e., the fold change or signal-to-noise ratio or any other quantity that was used to sort the transcripts that were passed to |
transcript_values_label |
label of transcript sorting criterion
(e.g., |
motifs |
a list of motifs that is used to score the specified sequences.
If |
k |
length of k-mer, either |
n_bins |
specifies the number of bins into which the sequences will be divided; valid values are between 7 and 100 |
midpoint |
for enrichment values the midpoint should be |
x_value_limits |
sets limits of the x-value color scale (used to harmonize color scales of different spectrum plots), see |
max_model_degree |
maximum degree of polynomial |
max_cs_permutations |
maximum number of permutations performed in the Monte Carlo test for the consistency score |
min_cs_permutations |
minimum number of permutations performed in the Monte Carlo test for the consistency score |
fg_permutations |
number of foreground permutations |
p_adjust_method |
see |
p_combining_method |
one of the following: Fisher (1932)
( |
n_cores |
number of computing cores to use |
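The defaults `p_combining_method = "fisher"` and `p_adjust_method = "BH"` refer to two standard procedures. The following is a minimal base R sketch of both, assuming that `"BH"` denotes the Benjamini-Hochberg adjustment as provided by `stats::p.adjust`. The helper `fisher_combine` and the p-values are hypothetical and only illustrate the statistics; they are not transite's internal code.

```r
# Fisher's method: combine n independent p-values into a single p-value.
# Under the null hypothesis, -2 * sum(log(p)) follows a chi-squared
# distribution with 2 * n degrees of freedom.
# NOTE: fisher_combine is a hypothetical helper, not a transite function.
fisher_combine <- function(p) {
  statistic <- -2 * sum(log(p))
  pchisq(statistic, df = 2 * length(p), lower.tail = FALSE)
}

# Hypothetical p-values, e.g., one per k-mer associated with a motif
p_values <- c(0.01, 0.04, 0.20)
combined <- fisher_combine(p_values)

# Benjamini-Hochberg adjustment of a set of (hypothetical) p-values
adjusted <- p.adjust(c(0.001, 0.02, 0.03, 0.5), method = "BH")
```

With a single p-value, Fisher's method returns that p-value unchanged, which is a convenient sanity check.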
In order to investigate how motif targets are distributed across a spectrum of transcripts (e.g., all transcripts of a platform, ordered by fold change), Spectrum Motif Analysis visualizes the gradient of RBP binding evidence across all transcripts.
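The idea of dividing a ranked list of transcripts into bins can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's own binning code (the package lists `subdivide_data()` among its SPMA functions): the values are made up, and the bins are simply consecutive, equally sized chunks of the ranking.

```r
# Hypothetical sorting criterion (e.g., fold change), already in rank order
values <- seq(-3.25, 3.25, by = 0.5)   # 14 made-up transcript values
n_bins <- 7                            # smallest value the documentation allows

# Assign each rank to one of n_bins consecutive bins of equal size
bin_id <- ceiling(seq_along(values) * n_bins / length(values))

# One vector of transcript values per bin, in spectrum order
bins <- split(values, bin_id)
bin_sizes <- lengths(bins)
```

Each bin then serves as a foreground set whose motif evidence can be compared across the spectrum.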
The k-mer-based approach differs from the matrix-based approach in how the sequences are scored. Here, sequences are broken into k-mers, i.e., oligonucleotide sequences of k bases. Only statistically significantly enriched or depleted k-mers are then used to calculate a score for each RNA-binding protein, which quantifies its target overrepresentation.
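Breaking a sequence into overlapping k-mers can be sketched in base R as shown below. The helper `count_kmers` is hypothetical and performs a plain overlapping count; it does not reproduce the package's homopolymer-corrected counting or its enrichment statistics, and `k = 3` is used only to keep the toy example readable.

```r
# Count all overlapping k-mers in a single sequence.
# NOTE: count_kmers is a hypothetical helper, not a transite function.
count_kmers <- function(sequence, k = 6) {
  n <- nchar(sequence)
  if (n < k) {
    return(table(character(0)))        # sequence too short for any k-mer
  }
  starts <- seq_len(n - k + 1)         # start position of every window
  kmers <- substring(sequence, starts, starts + k - 1)
  table(kmers)
}

# Toy RNA sequence with a repeated AU-rich element
counts <- count_kmers("AUUUAUUUA", k = 3)
```

Here `counts` tabulates the seven overlapping 3-mers of the toy sequence; comparing such counts between a foreground bin and the background is the basis for calling a k-mer enriched or depleted.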
A list with the following components:
foreground_scores | the result of run_kmer_tsma for the binned data |
spectrum_info_df | a data frame with the SPMA results |
spectrum_plots | a list of spectrum plots, as generated by score_spectrum |
classifier_scores | a list of classifier scores, as returned by classify_spectrum |
Other SPMA functions: classify_spectrum(), run_matrix_spma(), score_spectrum(), subdivide_data()
Other k-mer functions: calculate_kmer_enrichment(), check_kmers(), compute_kmer_enrichment(), count_homopolymer_corrected_kmers(), draw_volcano_plot(), estimate_significance_core(), estimate_significance(), generate_kmers(), generate_permuted_enrichments(), run_kmer_tsma()
# example data set
background_df <- transite:::ge$background_df

# sort sequences by signal-to-noise ratio
background_df <- dplyr::arrange(background_df, value)

# character vector of named and ranked (by signal-to-noise ratio) sequences
background_seqs <- gsub("T", "U", background_df$seq)
names(background_seqs) <- paste0(background_df$refseq, "|",
  background_df$seq_type)

results <- run_kmer_spma(background_seqs,
  sorted_transcript_values = background_df$value,
  transcript_values_label = "signal-to-noise ratio",
  motifs = get_motif_by_id("M178_0.6"),
  n_bins = 20,
  fg_permutations = 10)

## Not run: 
results <- run_kmer_spma(background_seqs,
  sorted_transcript_values = background_df$value,
  transcript_values_label = "signal-to-noise ratio")

## End(Not run)