create_data {BANDITS} | R Documentation |
create_data imports the equivalence classes and creates a 'BANDITS_data' object.
create_data(salmon_or_kallisto, gene_to_transcript, salmon_path_to_eq_classes = NULL, kallisto_equiv_classes = NULL, kallisto_equiv_counts = NULL, kallisto_counts = NULL, eff_len, n_cores = NULL, transcripts_to_keep = NULL, max_genes_per_group = 50)
salmon_or_kallisto |
a character string indicating the input data: 'salmon' or 'kallisto'. |
gene_to_transcript |
a matrix or data.frame with a list of gene-to-transcript correspondences. The first column contains the gene id, and the second column contains the transcript id. |
salmon_path_to_eq_classes |
(for salmon input only) a vector of length equal to the number of samples: each element indicates the path to the equivalence classes of the respective sample (computed by salmon). |
kallisto_equiv_classes |
(for kallisto input only) a vector of length equal to the number of samples: each element indicates the path to the equivalence classes ('.ec' files) of the respective sample (computed by kallisto). |
kallisto_equiv_counts |
(for kallisto input only) a vector of length equal to the number of samples: each element indicates the path to the counts of the equivalence classes ('.tsv' files) of the respective sample (computed by kallisto). |
kallisto_counts |
(for kallisto input only) a matrix or data.frame, with 1 column per sample and 1 row per transcript, containing the estimated abundances for each transcript in each sample, computed by kallisto. The matrix must be unfiltered and the order of rows must be unchanged. |
eff_len |
a vector containing the effective length of transcripts; the vector names indicate the transcript ids.
Ideally, created via |
n_cores |
the number of cores to parallelize the tasks on. It is strongly recommended to use at least one core per sample (the default when not specified by the user). |
transcripts_to_keep |
a vector containing the list of transcripts to keep.
Ideally, created via |
max_genes_per_group |
an integer specifying the maximum number of genes that each group can contain. When equivalence classes contain transcripts from distinct genes, these genes are analyzed together; for computational reasons, 'max_genes_per_group' caps the number of genes in each such group. |
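To make the idea behind 'max_genes_per_group' concrete, the following is a minimal base-R sketch of how genes can end up grouped together when an equivalence class spans transcripts of more than one gene. It is an illustration only and not the BANDITS implementation: all ids, the toy equivalence classes, and the variable names (eq_classes, tx2gene, gene_groups, too_large) are made up for this example.

```r
# Toy gene-to-transcript table (hypothetical ids):
# gene id in the first column, transcript id in the second.
gene_to_transcript <- data.frame(
  gene_id       = c("geneA", "geneA", "geneB", "geneB", "geneC", "geneD"),
  transcript_id = c("txA1", "txA2", "txB1", "txB2", "txC1", "txD1"),
  stringsAsFactors = FALSE
)

# Toy equivalence classes (hypothetical): each element is the set of
# transcripts that a group of reads is compatible with.
eq_classes <- list(
  c("txA1", "txA2"),  # transcripts of gene A only
  c("txA2", "txB1"),  # spans genes A and B
  c("txC1"),
  c("txB1", "txB2"),  # transcripts of gene B only
  c("txD1")
)

# Lookup from transcript id to gene id.
tx2gene <- setNames(gene_to_transcript$gene_id,
                    gene_to_transcript$transcript_id)

# Start with every gene in its own group, then merge the groups of all
# genes that appear together in an equivalence class.
genes <- unique(gene_to_transcript$gene_id)
group <- setNames(seq_along(genes), genes)
for (ec in eq_classes) {
  genes_in_ec <- unique(unname(tx2gene[ec]))
  ids <- unique(group[genes_in_ec])
  group[group %in% ids] <- min(ids)
}

# Genes that would be analyzed together in this toy setting.
gene_groups <- unname(split(names(group), group))
print(gene_groups)

# Flag groups that exceed a chosen limit (50 mirrors the default in the usage).
max_genes_per_group <- 50
too_large <- lengths(gene_groups) > max_genes_per_group
```

In this toy example genes A and B share an equivalence class and therefore fall into one group, while genes C and D remain on their own. How create_data treats groups that exceed the limit is not covered by this sketch.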
A BANDITS_data object.
Simone Tiberi simone.tiberi@uzh.ch
eff_len_compute, filter_transcripts, filter_genes, BANDITS_data
# specify the directory of the internal data:
data_dir = system.file("extdata", package = "BANDITS")

# load gene_to_transcript matching:
data("gene_tr_id", package = "BANDITS")

# Specify the directory of the transcript level estimated counts.
sample_names = paste0("sample", seq_len(4))
quant_files = file.path(data_dir, "STAR-salmon", sample_names, "quant.sf")

# Load the transcript level estimated counts via tximport:
library(tximport)
txi = tximport(files = quant_files, type = "salmon", txOut = TRUE)
counts = txi$counts

# Optional (recommended): transcript pre-filtering
transcripts_to_keep = filter_transcripts(gene_to_transcript = gene_tr_id,
                                         transcript_counts = counts,
                                         min_transcript_proportion = 0.01,
                                         min_transcript_counts = 10,
                                         min_gene_counts = 20)

# compute the Median estimated effective length for each transcript:
eff_len = eff_len_compute(x_eff_len = txi$length)

# specify the path to the equivalence classes:
equiv_classes_files = file.path(data_dir, "STAR-salmon", sample_names,
                                "aux_info", "eq_classes.txt")

# create data from 'salmon' and filter internally lowly abundant transcripts:
input_data = create_data(salmon_or_kallisto = "salmon",
                         gene_to_transcript = gene_tr_id,
                         salmon_path_to_eq_classes = equiv_classes_files,
                         eff_len = eff_len,
                         n_cores = 2,
                         transcripts_to_keep = transcripts_to_keep)
input_data

# create data from 'kallisto' and filter internally lowly abundant transcripts:
kallisto_equiv_classes = file.path(data_dir, "kallisto", sample_names,
                                   "pseudoalignments.ec")
kallisto_equiv_counts = file.path(data_dir, "kallisto", sample_names,
                                  "pseudoalignments.tsv")

input_data_2 = create_data(salmon_or_kallisto = "kallisto",
                           gene_to_transcript = gene_tr_id,
                           kallisto_equiv_classes = kallisto_equiv_classes,
                           kallisto_equiv_counts = kallisto_equiv_counts,
                           kallisto_counts = counts,
                           eff_len = eff_len,
                           n_cores = 2,
                           transcripts_to_keep = transcripts_to_keep)
input_data_2
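
The 'eff_len' argument expects a vector of effective lengths named by transcript id. As a rough base-R illustration of that shape, and assuming (from the comment in the example above) that the value is the per-transcript median across samples, the sketch below builds such a vector from a toy matrix. The ids, the values and the names len_mat and eff_len_sketch are made up; eff_len_compute may differ in its details.

```r
# Toy matrix of per-sample effective lengths (made-up values):
# one row per transcript, one column per sample, laid out like txi$length.
len_mat <- matrix(
  c(1200, 1210, 1190, 1205,
     980,  975,  990,  985,
    2100, 2095, 2110, 2105),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("txA1", "txA2", "txB1"), paste0("sample", 1:4))
)

# Row-wise median across samples; apply() carries the row names over,
# so the result is a numeric vector named by transcript id.
eff_len_sketch <- apply(len_mat, 1, median)
print(eff_len_sketch)
```

The point of the sketch is the output format: a plain numeric vector whose names are the transcript ids.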