APA sites can be detected using the PAT-Seq protocol. This protocol produces 3’-end focussed reads. We examine GSE83162. This is a time-series experiment in which two strains of yeast are released into synchronized cell cycling and observed through two cycles. Yeast are treated with \(\alpha\)-factor, which causes them to stop dividing in antici… pation of a chance to mate. When the \(\alpha\)-factor is washed away, they resume cycling.
Each gene has several APA sites, ordered from furthest upstrand to furthest downstrand. For each sample, we have a read count at each site.
For each gene:
We define the “shift” of a particular sample relative to all reads from all samples.
A “shift” score is first assigned to each site, being the proportion of all reads upstrand of the site minus the proportion of all reads downstrand of it (i.e. an average over all reads where upstrand reads are 1, downstrand reads are -1 and reads from the site itself are 0). The measurement for each sample is then an average over the site score for each read. The weight is the number of reads.
Shifts scores range from -1 to 1, and summarize whether upstrand (more negative) or downstrand (more positive) sites are being favoured. The weighted average score is zero.
The weights are the number of reads, however for a randomly chosen read we can estimate its variance based on the number of reads at each site and the site scores. (This estimate of variance includes any biological signal, so it’s not exactly a residual variance.) This is stored in the rowData()
of the weitrix, and can be used to further calibrate weights. We prefer to defer this sort of calibration until after we’ve discoverd components of variation, as it tends to give high weight to genes with little APA. There are clearly some alternative choices to how weighting could be performed, and we hope the weitrix package gives you basic building blocks with which you can experiment!
library(tidyverse)
library(reshape2)
library(limma)
library(topconfects)
library(weitrix)
# Produce consistent results
set.seed(12345)
# BiocParallel supports multiple backends.
# If the default hangs or errors, try others.
BiocParallel::register( BiocParallel::SnowParam() )
# The most reliable backed is to use serial processing
#BiocParallel::register( BiocParallel::SerialParam() )
peaks <- system.file("GSE83162", "peaks.csv.gz", package="weitrix") %>%
read_csv()
counts <- system.file("GSE83162", "peak_count.csv.gz", package="weitrix") %>%
read_csv() %>%
column_to_rownames("name") %>%
as.matrix()
genes <- system.file("GSE83162", "genes.csv.gz", package="weitrix") %>%
read_csv() %>%
column_to_rownames("name")
samples <- data.frame(sample=I(colnames(counts))) %>%
extract(sample, c("strain","time"), c("(.+)-(.+)"), remove=FALSE) %>%
mutate(
strain=factor(strain,unique(strain)),
time=factor(time,unique(time)))
rownames(samples) <- samples$sample
groups <- dplyr::select(peaks, group=gene_name, name=name)
# Note the order of this data frame is important
## sample strain time
## WT-tpre WT-tpre WT tpre
## WT-t0m WT-t0m WT t0m
## WT-t15m WT-t15m WT t15m
## WT-t30m WT-t30m WT t30m
## WT-t45m WT-t45m WT t45m
## WT-t60m WT-t60m WT t60m
## WT-t75m WT-t75m WT t75m
## WT-t90m WT-t90m WT t90m
## WT-t105m WT-t105m WT t105m
## WT-t120m WT-t120m WT t120m
## DeltaSet1-tpre DeltaSet1-tpre DeltaSet1 tpre
## DeltaSet1-t0m DeltaSet1-t0m DeltaSet1 t0m
## DeltaSet1-t15m DeltaSet1-t15m DeltaSet1 t15m
## DeltaSet1-t30m DeltaSet1-t30m DeltaSet1 t30m
## DeltaSet1-t45m DeltaSet1-t45m DeltaSet1 t45m
## DeltaSet1-t60m DeltaSet1-t60m DeltaSet1 t60m
## DeltaSet1-t75m DeltaSet1-t75m DeltaSet1 t75m
## DeltaSet1-t90m DeltaSet1-t90m DeltaSet1 t90m
## DeltaSet1-t105m DeltaSet1-t105m DeltaSet1 t105m
## DeltaSet1-t120m DeltaSet1-t120m DeltaSet1 t120m
## # A tibble: 10 x 2
## group name
## <chr> <chr>
## 1 YHL048W peak3910
## 2 YHL048W peak3911
## 3 YHL048W peak3912
## 4 YHL047C peak4139
## 5 YHL047C peak4140
## 6 YHL047C peak4141
## 7 YOL164W peak8533
## 8 YOL164W peak8534
## 9 YKL220C peak5235
## 10 YKL220C peak5236
## WT-tpre WT-t0m WT-t15m WT-t30m WT-t45m
## peak3910 58 24 46 54 41
## peak3911 25 9 24 28 18
## peak3912 7 9 10 13 18
## peak4139 30 4 7 12 13
## peak4140 8 1 1 8 2
## peak4141 55 7 13 19 20
## peak8533 37 35 42 64 47
## peak8534 38 50 44 53 36
## peak5235 16 4 2 7 4
## peak5236 14 5 2 5 4
A “shift” weitrix is constructed based on a matrix of site-level counts, plus a data frame grouping sites into genes. The order of this data frame is important, earlier sites are considered upstrand of later sites.
## Calculating shifts in 1 blocks
colData(wei) <- cbind(colData(wei), samples)
rowData(wei) <- cbind(rowData(wei), genes[match(rownames(wei), rownames(genes)),])
Having obtained a weitrix, everthing discussed for the poly(A) tail length example is applicable here as well. We will only perform a brief exploratory analysis here.
We can look for components of variation.
Pushing a point somewhat, we examine four components.
comp <- comp_seq[[4]]
matrix_long(comp$col, row_info=samples, varnames=c("sample","component")) %>%
ggplot(aes(x=time, y=value, color=strain, group=strain)) +
geom_hline(yintercept=0) +
geom_line() +
geom_point(alpha=0.75, size=3) +
facet_grid(component ~ .) +
labs(title="Sample scores for each component", y="Sample score", x="Time", color="Strain")
A weitrix created with counts_shift
has a built-in default trend formula, so we don’t need to give a formula explicitly to weitrix_calibrate_trend
.
##
## Call:
## lm(formula = log(dispersion_before) ~ log(per_read_var) + splines::ns(log(total_reads),
## 3), data = data)
##
## Coefficients:
## (Intercept) log(per_read_var) splines::ns(log(total_reads), 3)1 splines::ns(log(total_reads), 3)2 splines::ns(log(total_reads), 3)3
## 0.6608 1.2922 1.2574 3.4292 5.2988
The calibration is based on two predictors of dispersion, the total number of reads and the estimated per read variance. To illustrate the calibration, we facet by bins of the per-read variance and show total reads on the x axis.
rowData(cal_comp) %>% as.data.frame() %>%
ggplot(aes(x=total_reads, y=dispersion_before)) +
facet_wrap(~ cut(per_read_var, 9)) +
geom_point(size=0.1) +
geom_point(aes(y=dispersion_trend), color="red", size=0.1) +
scale_x_log10() + scale_y_log10() +
labs(y="Dispersion (log scale)", x="Total reads (log scale)")
Treat these results with caution. Confindence bounds take into account uncertainty in the loadings but not in the scores! What follows is best regarded as exploratory rather than a final result.
## $table
## rank index confect effect se df fdr_zero AveExpr name per_read_var total_reads symbol biotype product dispersion_before dispersion_trend dispersion_after
## 1 1081 2.623 4.095282 0.3233005 37.09804 1.453819e-12 0.0081521309 YLR058C 0.18681547 6958 SHM2 protein_coding Cytosolic serine hydroxymethyltransferase; converts serine to glycine plus 5,10 methylenetetrahydrofolate; major isoform involved in generating precursors for purine, pyrimidine, amino acid, and lipid biosynthesis [Source:SGD;Acc:S000004048] 0.4720028 0.5377023 0.8778143
## 2 1714 -2.286 -3.308600 0.2363738 37.09804 1.003059e-13 0.0034781506 YPR080W 0.20630257 72240 TEF1 protein_coding Translational elongation factor EF-1 alpha; also encoded by TEF2; functions in the binding reaction of aminoacyl-tRNA (AA-tRNA) to ribosomes; may also have a role in tRNA re-export from the nucleus; TEF1 has a paralog, TEF2, that arose from the whole genome duplication [Source:SGD;Acc:S000006284] 2.8879972 2.0845182 1.3854507
## 3 980 -2.012 -2.762481 0.1791028 37.09804 1.475130e-14 -0.0132153436 YKL096W-A 0.08003464 374176 CWP2 protein_coding Covalently linked cell wall mannoprotein; major constituent of the cell wall; plays a role in stabilizing the cell wall; involved in low pH resistance; precursor is GPI-anchored [Source:SGD;Acc:S000001956] 12.4679567 1.7402749 7.1643607
## 4 116 1.968 3.271127 0.3183936 37.09804 4.781863e-10 0.0091149690 YBR127C 0.19375593 21804 VMA2 protein_coding Subunit B of V1 peripheral membrane domain of vacuolar H+-ATPase; an electrogenic proton pump found throughout the endomembrane system; contains nucleotide binding sites; also detected in the cytoplasm; protein abundance increases in response to DNA replication stress [Source:SGD;Acc:S000000331] 1.7298560 0.9808838 1.7635688
## 5 61 1.848 2.590432 0.1847149 37.09804 1.003059e-13 0.0046972242 YBL072C 0.12048014 47370 RPS8A protein_coding Protein component of the small (40S) ribosomal subunit; homologous to mammalian ribosomal protein S8, no bacterial homolog; RPS8A has a paralog, RPS8B, that arose from the whole genome duplication [Source:SGD;Acc:S000000168] 0.7267604 0.8129659 0.8939616
## 6 1208 1.774 2.630249 0.2163273 37.09804 4.272739e-12 0.0098433471 YML026C 0.11684482 42444 RPS18B protein_coding Protein component of the small (40S) ribosomal subunit; homologous to mammalian ribosomal protein S18 and bacterial S13; RPS18B has a paralog, RPS18A, that arose from the whole genome duplication; protein abundance increases in response to DNA replication stress [Source:SGD;Acc:S000004488] 1.2202363 0.7340497 1.6623348
## 7 389 1.386 3.764979 0.6117836 37.09804 2.775519e-05 0.0005048926 YDR342C 0.16037644 2866 HXT7 protein_coding High-affinity glucose transporter; member of the major facilitator superfamily, nearly identical to Hxt6p, expressed at high basal levels relative to other HXTs, expression repressed by high glucose levels; HXT7 has a paralog, HXT4, that arose from the whole genome duplication [Source:SGD;Acc:S000002750] 0.8411184 0.3078273 2.7324361
## 8 449 1.386 1.979302 0.1538498 37.09804 1.088709e-12 0.0089272010 YDR524C-B 0.12463621 164069 <NA> protein_coding Putative protein of unknown function; YDR524C-B has a paralog, YCL048W-A, that arose from the whole genome duplication [Source:SGD;Acc:S000028739] 1.8299679 1.8017543 1.0156589
## 9 1007 -1.348 -2.295557 0.2481236 37.09804 6.438371e-09 -0.0212733375 YKL180W 0.17678312 57320 RPL17A protein_coding Ribosomal 60S subunit protein L17A; copurifies with the Dam1 complex (aka DASH complex); homologous to mammalian ribosomal protein L17 and bacterial L22; RPL17A has a paralog, RPL17B, that arose from the whole genome duplication; protein abundance increases in response to DNA replication stress [Source:SGD;Acc:S000001663] 1.3260816 1.4897439 0.8901406
## 10 157 1.253 2.284407 0.2728184 37.09804 6.227989e-08 0.0138000877 YBR249C 0.16608558 15246 ARO4 protein_coding 3-deoxy-D-arabino-heptulosonate-7-phosphate (DAHP) synthase; catalyzes the first step in aromatic amino acid biosynthesis and is feedback-inhibited by tyrosine or high concentrations of phenylalanine or tryptophan; relative distribution to the nucleus increases upon DNA replication stress [Source:SGD;Acc:S000000453] 0.3420829 0.6690268 0.5113142
## ...
## 108 of 1805 non-zero log2 fold change at FDR 0.05
## Prior df 21.1
## $table
## rank index confect effect se df fdr_zero AveExpr name per_read_var total_reads symbol biotype product dispersion_before dispersion_trend dispersion_after
## 1 1165 4.725 6.333307 0.3652031 37.09804 1.449652e-16 -0.048806064 YLR333C 0.2382249 46413 RPS25B protein_coding Protein component of the small (40S) ribosomal subunit; homologous to mammalian ribosomal protein S25, no bacterial homolog; RPS25B has a paralog, RPS25A, that arose from the whole genome duplication [Source:SGD;Acc:S000004325] 1.6642720 1.9390304 0.8583011
## 2 1058 4.725 6.300358 0.3642853 37.09804 1.449652e-16 -0.058877133 YLL024C 0.1959787 133748 SSA2 protein_coding ATP-binding protein; involved in protein folding and vacuolar import of proteins; member of heat shock protein 70 (HSP70) family; associated with the chaperonin-containing T-complex; present in the cytoplasm, vacuolar membrane and cell wall; 98% identical with paralog Ssa1p, but subtle differences between the two proteins provide functional specificity with respect to propagation of yeast [URE3] prions and vacuolar-mediated degradations of gluconeogenesis enzymes [Source:SGD;Acc:S000003947] 5.0694797 2.8426375 1.7833718
## 3 1092 4.146 4.861057 0.1706544 37.09804 1.467195e-23 -0.041241296 YLR110C 0.2070177 712909 CCW12 protein_coding Cell wall mannoprotein; plays a role in maintenance of newly synthesized areas of cell wall; localizes to periphery of small buds, septum region of larger buds, and shmoo tip; CCW12 has a paralog, YDR134C, that arose from the whole genome duplication [Source:SGD;Acc:S000004100] 3.5625640 9.2005844 0.3872106
## 4 975 -3.241 -5.050285 0.4420673 37.09804 3.752609e-11 0.033977287 YKL081W 0.2490052 104942 TEF4 protein_coding Gamma subunit of translational elongation factor eEF1B; stimulates the binding of aminoacyl-tRNA (AA-tRNA) to ribosomes by releasing eEF1A (Tef1p/Tef2p) from the ribosomal complex [Source:SGD;Acc:S000001564] 10.5608612 3.3323546 3.1691888
## 5 118 -3.144 -5.408491 0.5637299 37.09804 4.105049e-09 0.044535051 YBR135W 0.2460829 3619 CKS1 protein_coding Cyclin-dependent protein kinase regulatory subunit and adaptor; interacts with Cdc28p(Cdk1p); required for G1/S and G2/M phase transitions and budding; mediates the phosphorylation and degradation of Sic1p; modulates proteolysis of M-phase targets through interactions with the proteasome; role in transcriptional regulation, recruiting proteasomal subunits to target gene promoters [Source:SGD;Acc:S000000339] 0.3506411 0.5846784 0.5997162
## 6 163 2.226 5.572505 0.8534124 37.09804 7.702222e-06 -0.033413526 YBR269C 0.2496553 2289 SDH8 protein_coding Protein required for assembly of succinate dehydrogenase; interacts with flavinylated Sdh1p and may function as a chaperone for free Sdh1p, protecting its FAD cofactor from redox reactions before assembly of the complex; soluble protein of the mitochondrial matrix; respiratory defect of null mutant is functionally complemented by Drosophila and human orthologs [Source:SGD;Acc:S000000473] 0.8467677 0.5038549 1.6805784
## 7 398 -2.226 -4.340509 0.5418674 37.09804 1.973804e-07 0.055220190 YDR363W-A 0.1454215 21193 SEM1 protein_coding Component of lid subcomplex of 26S proteasome regulatory subunit; involved in mRNA export mediated by TREX-2 complex (Sac3p-Thp1p); assumes different conformations in different contexts, functions as molecular glue stabilizing the Rpn3p/Rpn7p regulatory heterodimer, and tethers it to lid helical bundle; ortholog of human DSS1; protein abundance increases in response to DNA replication stress [Source:SGD;Acc:S000007235] 3.0518641 0.6669735 4.5756904
## 8 1357 -2.217 -3.743948 0.3957898 37.09804 4.484556e-09 0.015982241 YNL067W 0.2809500 30030 RPL9B protein_coding Ribosomal 60S subunit protein L9B; homologous to mammalian ribosomal protein L9 and bacterial L6; RPL9B has a paralog, RPL9A, that arose from a single-locus duplication [Source:SGD;Acc:S000005011] 1.6924634 1.8816916 0.8994372
## 9 1369 2.192 4.477181 0.5988064 37.09804 7.320140e-07 0.004611877 YNL098C 0.2404676 7185 RAS2 protein_coding GTP-binding protein; regulates nitrogen starvation response, sporulation, and filamentous growth; farnesylation and palmitoylation required for activity and localization to plasma membrane; homolog of mammalian Ras proto-oncogenes; RAS2 has a paralog, RAS1, that arose from the whole genome duplication [Source:SGD;Acc:S000005042] 1.2555578 0.7558133 1.6612009
## 10 1072 2.117 3.543367 0.3773086 37.09804 4.827892e-09 -0.001872867 YLR027C 0.2498340 12186 AAT2 protein_coding Cytosolic aspartate aminotransferase involved in nitrogen metabolism; localizes to peroxisomes in oleate-grown cells [Source:SGD;Acc:S000004017] 0.6146675 1.0153725 0.6053616
## ...
## 158 of 1805 non-zero log2 fold change at FDR 0.05
## Prior df 21.1
## $table
## rank index confect effect se df fdr_zero AveExpr name per_read_var total_reads symbol biotype product dispersion_before dispersion_trend dispersion_after
## 1 561 -1.597 -4.257612 0.5904463 37.09804 8.813503e-06 0.015324847 YFR052C-A 0.14797847 4967 <NA> protein_coding Dubious open reading frame; unlikely to encode a functional protein, based on available experimental and comparative sequence data; overlaps ORF HXK1/YFR053C [Source:SGD;Acc:S000028768] 2.1054260 0.3442562 6.1158699
## 2 302 1.597 3.039468 0.3334457 37.09804 9.478111e-08 -0.027501019 YDR077W 0.07946431 66406 SED1 protein_coding Major stress-induced structural GPI-cell wall glycoprotein; associates with translating ribosomes, possible role in mitochondrial genome maintenance; ORF contains two distinct variable minisatellites; SED1 has a paralog, SPI1, that arose from the whole genome duplication [Source:SGD;Acc:S000002484] 9.1438365 0.5779659 15.8207198
## 3 122 1.368 2.664296 0.3094646 37.09804 2.031366e-07 0.025164735 YBR149W 0.19813807 9498 ARA1 protein_coding NADP+ dependent arabinose dehydrogenase; involved in carbohydrate metabolism; purified as homodimer; naturally occurs with a N-terminus degradation product [Source:SGD;Acc:S000000353] 0.4830366 0.6683890 0.7226879
## 4 1763 1.314 4.357374 0.7437264 37.09804 2.179993e-04 0.029427305 snR40 0.26487599 3100 <NA> snoRNA C/D box small nucleolar RNA (snoRNA); guides 2'-O-methylation of large subunit (LSU) rRNA at position U898 and small subunit (SSU) rRNA at position G1271 [Source:SGD;Acc:S000007304] 1.0956754 0.6060495 1.8078974
## 5 1284 1.279 3.792998 0.6258023 37.09804 1.546128e-04 0.011229528 YMR194C-B 0.23747967 2270 CMC4 protein_coding Protein that localizes to the mitochondrial intermembrane space; localizes via the Mia40p-Erv1p system; contains twin cysteine-x(9)-cysteine motifs [Source:SGD;Acc:S000028514] 0.6931035 0.4710097 1.4715269
## 6 1369 -1.136 -3.020946 0.4764942 37.09804 7.778172e-05 0.004611877 YNL098C 0.24046761 7185 RAS2 protein_coding GTP-binding protein; regulates nitrogen starvation response, sporulation, and filamentous growth; farnesylation and palmitoylation required for activity and localization to plasma membrane; homolog of mammalian Ras proto-oncogenes; RAS2 has a paralog, RAS1, that arose from the whole genome duplication [Source:SGD;Acc:S000005042] 1.2555578 0.7558133 1.6612009
## 7 1342 1.001 3.085774 0.5342159 37.09804 2.507915e-04 0.063641680 YNL036W 0.24993851 5866 NCE103 protein_coding Carbonic anhydrase; metalloenzyme that catalyzes CO2 hydration to bicarbonate, which is an important metabolic substrate, and protons; not expressed under conditions of high CO2, such as inside a growing colony, but transcription is induced in response to low CO2 levels, such as on the colony surface in ambient air; poorly transcribed under aerobic conditions and at an undetectable level under anaerobic conditions; abundance increases in response to DNA replication stress [Source:SGD;Acc:S000004981] 1.8360769 0.7270536 2.5253668
## 8 999 -0.935 -3.698403 0.7165034 37.09804 1.277495e-03 -0.043991761 YKL163W 0.13916375 808 PIR3 protein_coding O-glycosylated covalently-bound cell wall protein; required for cell wall stability; expression is cell cycle regulated, peaking in M/G1 and also subject to regulation by the cell integrity pathway; coding sequence contains length polymorphisms in different strains; PIR3 has a paralog, HSP150, that arose from the whole genome duplication [Source:SGD;Acc:S000001646] 0.3999990 0.1843304 2.1700116
## 9 1798 0.790 7.105064 1.6423028 36.09804 6.675396e-03 0.174639982 tP(UGG)N1 0.23279608 507 <NA> tRNA Proline tRNA (tRNA-Pro), predicted by tRNAscan-SE analysis; target of K. lactis zymocin [Source:SGD;Acc:S000006685] 0.9546508 0.3416163 2.7945121
## 10 716 0.761 3.650191 0.7642330 37.09804 2.967594e-03 -0.023851955 YGR243W 0.23448147 1172 MPC3 protein_coding Highly conserved subunit of mitochondrial pyruvate carrier; more highly expressed in glucose-containing minimal medium than in lactate-containing medium; expression regulated by osmotic and alkaline stresses; protein abundance increases in response to DNA replication stress; MPC3 has a paralog, MPC2, that arose from the whole genome duplication [Source:SGD;Acc:S000003475] 0.4390249 0.3857570 1.1380866
## ...
## 116 of 1805 non-zero log2 fold change at FDR 0.05
## Prior df 21.1
## $table
## rank index confect effect se df fdr_zero AveExpr name per_read_var total_reads symbol biotype product dispersion_before dispersion_trend dispersion_after
## 1 807 4.340 7.173661 0.6222420 37.09804 7.194688e-11 0.008910808 YIL015W 0.2089827 4313 BAR1 protein_coding Aspartyl protease; secreted into the periplasmic space of mating type a cell; helps cells find mating partners; cleaves and inactivates alpha factor allowing cells to recover from alpha-factor-induced cell cycle arrest [Source:SGD;Acc:S000001277] 1.5614024 0.5075171 3.0765511
## 2 6 -4.144 -6.497713 0.5442612 37.09804 5.148635e-11 0.050189640 RDN5-1 0.2394686 4029 <NA> rRNA 5S ribosomal RNA (5S rRNA); only complete sequence of 6 repeated RDN5 alleles; able to support viability when provided as sole form of 5S rRNA; component of the large (60S) ribosomal subunit; localized to the nucleolus via interaction with Rpl5p; may play a role in translational frame fidelity; transcription is mediated by PolIII and activated by TFIIIA and TFIIIE [Source:SGD;Acc:S000006479] 1.1862176 0.5888347 2.0145170
## 3 1791 1.488 4.556913 0.7326588 37.09804 8.565663e-05 -0.041137359 tK(CUU)M 0.2881947 2843 <NA> tRNA Lysine tRNA (tRNA-Lys), predicted by tRNAscan-SE analysis; a small portion is imported into mitochondria via interaction with mt lysyl-tRNA synthetase Msk1p and is necessary to decode AAG codons at high temperature, when base modification of mt-encoded tRNA-Lys is reduced [Source:SGD;Acc:S000006627] 1.3640011 0.6545605 2.0838426
## 4 1148 1.447 4.435021 0.7301372 37.09804 8.565663e-05 0.014684787 YLR286C 0.2423945 1548 CTS1 protein_coding Endochitinase; required for cell separation after mitosis; transcriptional activation during the G1 phase of the cell cycle is mediated by transcription factor Ace2p [Source:SGD;Acc:S000004276] 0.6212891 0.4304991 1.4431834
## 5 1789 -1.242 -3.252752 0.5004892 37.09804 4.741936e-05 -0.048926352 tK(CUU)J 0.0900000 1200 <NA> tRNA Lysine tRNA (tRNA-Lys), predicted by tRNAscan-SE analysis; a small portion is imported into mitochondria via interaction with mt lysyl-tRNA synthetase Msk1p and is necessary to decode AAG codons at high temperature, when base modification of mt-encoded tRNA-Lys is reduced [Source:SGD;Acc:S000006625] 0.3226725 0.1124900 2.8684541
## 6 1072 -1.129 -2.413280 0.3247461 37.09804 3.377547e-06 -0.001872867 YLR027C 0.2498340 12186 AAT2 protein_coding Cytosolic aspartate aminotransferase involved in nitrogen metabolism; localizes to peroxisomes in oleate-grown cells [Source:SGD;Acc:S000004017] 0.6146675 1.0153725 0.6053616
## 7 1800 -1.100 -4.638443 0.9065992 37.09804 6.537417e-04 0.039423397 tP(UGG)O3 0.2500000 892 <NA> tRNA Proline tRNA (tRNA-Pro), predicted by tRNAscan-SE analysis; target of K. lactis zymocin [Source:SGD;Acc:S000006689] 0.5098653 0.3987189 1.2787588
## 8 1369 -1.070 -2.863295 0.4649357 37.09804 8.565663e-05 0.004611877 YNL098C 0.2404676 7185 RAS2 protein_coding GTP-binding protein; regulates nitrogen starvation response, sporulation, and filamentous growth; farnesylation and palmitoylation required for activity and localization to plasma membrane; homolog of mammalian Ras proto-oncogenes; RAS2 has a paralog, RAS1, that arose from the whole genome duplication [Source:SGD;Acc:S000005042] 1.2555578 0.7558133 1.6612009
## 9 868 -1.059 -2.861125 0.4722974 37.09804 8.565663e-05 0.010685254 YJL054W 0.1756457 1918 TIM54 protein_coding Component of the mitochondrial TIM22 complex; involved in insertion of polytopic proteins into the inner membrane [Source:SGD;Acc:S000003590] 0.2114320 0.3021140 0.6998419
## 10 1787 -0.967 -4.593567 0.9633346 37.09804 1.325257e-03 0.044953075 tH(GUG)G2 0.2459710 898 <NA> tRNA Histidine tRNA (tRNA-His), predicted by tRNAscan-SE analysis [Source:SGD;Acc:S000006596] 0.5445439 0.3908452 1.3932470
## ...
## 178 of 1805 non-zero log2 fold change at FDR 0.05
## Prior df 21.1
Let’s examine peak-level read counts for some genes we’ve identified.
examine <- function(gene_wanted, title) {
peak_names <- filter(peaks, gene_name==gene_wanted)$name
counts[peak_names,] %>% melt(value.name="reads", varnames=c("peak","sample")) %>%
left_join(samples, by="sample") %>%
ggplot(aes(x=factor(as.integer(peak)), y=reads)) +
facet_grid(strain ~ time) + geom_col() +
labs(x="Peak",y="Reads",title=title)
}
examine("YLR058C", "SHM2, from C1")