importAllelicCounts {fishpond} | R Documentation
Read in Salmon quantification of allelic counts from a
diploid transcriptome. Assumes that diploid transcripts
are marked with the following suffix: an underscore and
a consistent symbol for each of the two alleles,
e.g. ENST123_M and ENST123_P, or ENST123_alt and ENST123_ref.
There must be exactly two alleles for each transcript,
and the --keepDuplicates option should be used in
Salmon indexing to avoid removing transcripts with identical sequence.
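The naming convention above can be checked before import. The following is a minimal base R sketch, not part of fishpond: the transcript names and the variable names (txps, base_id, allele_table) are hypothetical and chosen only for illustration. It verifies that every name ends in one of the two allele symbols and that each transcript has exactly one entry per allele.

```r
# Hypothetical transcript names from a diploid transcriptome (illustration only)
txps <- c("ENST123_alt", "ENST123_ref", "ENST456_alt", "ENST456_ref")
a1 <- "alt"  # symbol for the effect allele
a2 <- "ref"  # symbol for the non-effect allele

# Pattern matching a trailing underscore plus one of the two allele symbols
suffix_pattern <- paste0("_(", a1, "|", a2, ")$")

# Every name must end in "_<a1>" or "_<a2>"
stopifnot(all(grepl(suffix_pattern, txps)))

# Strip the suffix to recover the base transcript ID
base_id <- sub(suffix_pattern, "", txps)
# Keep only the allele symbol after the final underscore
allele <- sub(".*_", "", txps)

# Each base transcript must appear exactly once per allele
allele_table <- table(base_id, factor(allele, levels = c(a1, a2)))
stopifnot(all(allele_table == 1))
```

If either check fails, the transcriptome likely contains transcripts with a missing or extra allele, which does not satisfy the two-alleles-per-transcript requirement.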
The output object has half the number of transcripts,
with the two alleles either stored in a "wide" object,
or as re-named "assays". Note carefully that the symbol
provided to a1 is used as the effect allele,
and a2 is used as the non-effect allele
(see the format argument description and Value description below).
importAllelicCounts(coldata, a1, a2, format = c("wide", "assays"), tx2gene = NULL, ...)
coldata | a data.frame as used in
a1 | the symbol for the effect allele
a2 | the symbol for the non-effect allele
format | either "wide" or "assays"
tx2gene | optional, a data.frame with first column indicating transcripts, second column indicating genes (or any other transcript grouping). Either this should include the
... | any arguments to pass to tximeta
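A call using the arguments above might look like the following sketch. The sample names and file paths are hypothetical, and the files and names columns follow the coldata layout that tximeta expects. The import is only attempted when fishpond, tximeta, and the quantification files are actually available, so the block is safe to run as written.

```r
# Hypothetical sample table; tximeta expects `files` and `names` columns
samples <- c("s1", "s2")
coldata <- data.frame(
  files = file.path("quants", samples, "quant.sf"),  # assumed Salmon output paths
  names = samples,
  stringsAsFactors = FALSE
)

# Only attempt the import when the packages and files are present
can_run <- requireNamespace("fishpond", quietly = TRUE) &&
  requireNamespace("tximeta", quietly = TRUE) &&
  all(file.exists(coldata$files))

if (can_run) {
  se <- fishpond::importAllelicCounts(
    coldata,
    a1 = "alt",       # suffix marking the effect allele
    a2 = "ref",       # suffix marking the non-effect allele
    format = "wide"   # or "assays"
  )
}
```

Here "alt" and "ref" assume transcripts named like ENST123_alt and ENST123_ref; for the ENST123_M / ENST123_P convention, the corresponding symbols would be passed instead.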
Requires the tximeta package.
skipMeta=TRUE is used, as it is assumed
the diploid transcriptome does not match any reference
transcript collection. This may change in future iterations
of the function, depending on developments in upstream
software.
a SummarizedExperiment, with allele counts (and other data)
combined into a wide matrix [a2 | a1], or as assays (a1, then a2).
The original strings associated with a1 and a2 are stored in the
metadata of the object, in the alleles list element.
Note the ref level of se$allele will be "a2",
such that comparisons by default will be a1 vs a2 (effect vs non-effect).
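The wide layout and the reference level can be illustrated with a small base R sketch. This does not reproduce the internals of importAllelicCounts: the counts are made up, and the column naming ("s1-a2", and so on) is an assumption used only to make the [a2 | a1] ordering visible.

```r
# Hypothetical allelic counts for 2 transcripts x 2 samples (illustration only)
a1_counts <- matrix(c(10, 5, 12, 7), nrow = 2,
                    dimnames = list(c("ENST123", "ENST456"), c("s1", "s2")))
a2_counts <- matrix(c(8, 6, 9, 4), nrow = 2,
                    dimnames = dimnames(a1_counts))

# Wide layout: non-effect allele (a2) columns first, then effect allele (a1)
wide <- cbind(a2_counts, a1_counts)
# Assumed column naming, chosen here only to label the two halves
colnames(wide) <- c(paste0(colnames(a2_counts), "-a2"),
                    paste0(colnames(a1_counts), "-a1"))

# Allele factor with "a2" as the first (reference) level
allele <- factor(rep(c("a2", "a1"), each = ncol(a1_counts)),
                 levels = c("a2", "a1"))

# With treatment contrasts, the non-intercept coefficient is a1 relative to a2
design <- model.matrix(~ allele)
colnames(design)
```

Because "a2" is the reference level, the non-intercept column of the design matrix is allelea1, which corresponds to the effect vs non-effect comparison described above.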