The rhinotypeR package is designed to simplify the genotyping of rhinoviruses using the VP4/2 genomic region. Having worked on rhinoviruses for a few years, I noticed that assigning genotypes after sequencing was particularly laborious and required several manual interventions. We therefore developed this package to streamline the process: it enables users to download prototype sequences, calculate pairwise genetic distances, and compare those distances to prototype strains to assign genotypes. It also provides visualization options such as frequency plots and simple phylogenetic trees.
You can install rhinotypeR from Bioconductor using:
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("rhinotypeR")
Then load the package:
library(rhinotypeR)
The getPrototypeSeqs function downloads the prototype sequences required for genotyping. These should then be combined with the newly generated sequences, aligned using suitable alignment software, and imported into R. For example, to download the prototypes to the Desktop directory, run:
getPrototypeSeqs("~/Desktop")
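Combining the prototypes with new sequences is a plain-text operation on FASTA files. As a minimal base-R sketch, the hypothetical helper combine_fasta() below (not part of rhinotypeR) concatenates the downloaded prototype FASTA file with a FASTA file of new sequences. The file names in the usage comment are placeholders, because the exact name of the file written by getPrototypeSeqs is not given here. The combined file still has to be aligned with external software before it is read back into R.

```r
# Hypothetical helper, not part of rhinotypeR: write prototype and query
# records into one unaligned FASTA file for external alignment.
combine_fasta <- function(prototype_fasta, query_fasta, out_fasta) {
  # FASTA is plain text, so concatenating lines keeps each header/sequence pair
  combined <- c(readLines(prototype_fasta), readLines(query_fasta))
  # Drop blank lines so that records stay contiguous
  combined <- combined[nzchar(trimws(combined))]
  writeLines(combined, out_fasta)
  # Return the number of records (header lines start with ">") invisibly
  invisible(sum(startsWith(combined, ">")))
}

# Placeholder file names; check the actual name written by getPrototypeSeqs:
# combine_fasta("~/Desktop/prototypes.fasta", "new_sequences.fasta", "to_align.fasta")
```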
Use the Biostrings package to read the aligned FASTA file. readDNAStringSet() extracts the sequences and their headers; store the result in an object for downstream analysis. The example below reads the alignment bundled with rhinotypeR:
sequences <- Biostrings::readDNAStringSet(system.file("extdata", "input_aln.fasta", package="rhinotypeR"))
The SNPeek function visualizes single nucleotide polymorphisms (SNPs) in the sequences, with a selected sequence acting as the reference. To specify the reference sequence, move it to the bottom of the alignment before importing it into R. Substitutions are color-coded by nucleotide:
A = green
T = red
C = blue
G = yellow
SNPeek(sequences)
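To make the reference convention concrete, here is a small base-R sketch of what "substitutions relative to the bottom sequence" means. snp_table() is a hypothetical helper, not SNPeek's implementation: it assumes a named character vector of already-aligned sequences, treats the last one as the reference, skips gap and ambiguity characters on either side, and reuses the colour mapping listed above.

```r
# Illustrative sketch (not SNPeek's source): list substitutions in each aligned
# sequence relative to the reference, taken as the LAST sequence, and attach
# the colour used above for the substituted base.
snp_table <- function(aligned) {
  # aligned: named character vector of equal-length, already aligned sequences
  stopifnot(!is.null(names(aligned)), length(unique(nchar(aligned))) == 1)
  site_matrix <- do.call(rbind, strsplit(toupper(aligned), ""))
  rownames(site_matrix) <- names(aligned)
  reference <- site_matrix[nrow(site_matrix), ]
  base_colours <- c(A = "green", T = "red", C = "blue", G = "yellow")
  rows <- list()
  for (i in seq_len(nrow(site_matrix) - 1)) {
    query <- site_matrix[i, ]
    # Count only A/C/G/T mismatches; gaps and ambiguity codes are skipped
    diff_pos <- which(query != reference &
                        query %in% names(base_colours) &
                        reference %in% names(base_colours))
    if (length(diff_pos) > 0) {
      rows[[length(rows) + 1]] <- data.frame(
        sequence = rownames(site_matrix)[i],
        position = diff_pos,
        base = query[diff_pos],
        colour = unname(base_colours[query[diff_pos]]),
        stringsAsFactors = FALSE
      )
    }
  }
  if (length(rows) == 0) {
    return(data.frame(sequence = character(), position = integer(),
                      base = character(), colour = character(),
                      stringsAsFactors = FALSE))
  }
  do.call(rbind, rows)
}
```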
The pairwiseDistances function calculates pairwise genetic distances between sequences using a specified evolutionary model.
distances <- pairwiseDistances(sequences, model = "p-distance", gapDeletion = TRUE)
The first rows and columns of the resulting distance matrix look like this:
## AF343653.1_B26 MT177836.1 MT177837.1 AY040242.1_B97
## AF343653.1_B26 0.0000000 0.2435897 0.2435897 0.2243590
## MT177836.1 0.2435897 0.0000000 0.0000000 0.1185897
## MT177837.1 0.2435897 0.0000000 0.0000000 0.1185897
## AY040242.1_B97 0.2243590 0.1185897 0.1185897 0.0000000
## AF343654.1_B27 0.2147436 0.1698718 0.1698718 0.1794872
## AF343654.1_B27 AY040239.1_B93 AY040240.1_B84
## AF343653.1_B26 0.2147436 0.2435897 0.2115385
## MT177836.1 0.1698718 0.1506410 0.1923077
## MT177837.1 0.1698718 0.1506410 0.1923077
## AY040242.1_B97 0.1794872 0.1634615 0.2083333
## AF343654.1_B27 0.0000000 0.1185897 0.1891026
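The p-distance model is the proportion of compared sites at which two sequences differ. The sketch below computes it in base R under one stated assumption: gapDeletion = TRUE is modelled as pairwise deletion, so an alignment column is ignored for a pair whenever either sequence has a gap. rhinotypeR may handle gaps differently (for example, by dropping every column that contains a gap in any sequence), so treat the hypothetical p_distance_matrix() as an illustration only.

```r
# Sketch of an uncorrected p-distance matrix (not the rhinotypeR source).
# Assumption: gap deletion = pairwise deletion of columns containing "-".
p_distance_matrix <- function(aligned, gap_deletion = TRUE) {
  stopifnot(length(unique(nchar(aligned))) == 1)
  site_matrix <- do.call(rbind, strsplit(toupper(aligned), ""))
  n <- nrow(site_matrix)
  d <- matrix(0, n, n, dimnames = list(names(aligned), names(aligned)))
  for (i in seq_len(n - 1)) {
    for (j in seq(i + 1, n)) {
      a <- site_matrix[i, ]
      b <- site_matrix[j, ]
      # Pairwise deletion: drop columns where either sequence has a gap
      keep <- if (gap_deletion) a != "-" & b != "-" else rep(TRUE, length(a))
      # p-distance = differing sites / sites compared (matrix is symmetric)
      d[i, j] <- d[j, i] <- sum(a[keep] != b[keep]) / sum(keep)
    }
  }
  d
}
```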
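The assignTypes function assigns genotypes to the sequences by comparing their genetic distances to the prototype strains. In the output below, each query is listed with its closest prototype (reference); queries whose closest prototype lies beyond the distance threshold (here 0.105) are reported as unassigned, with distance NA.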
genotypes <- assignTypes(sequences, model = "p-distance", gapDeletion = TRUE, threshold = 0.105)
head(genotypes)
## query assignedType distance reference
## MT177836.1 MT177836.1 unassigned NA AY040242.1_B97
## MT177837.1 MT177837.1 unassigned NA AY040242.1_B97
## MT177838.1 MT177838.1 B99 0.08974359 AF343652.1_B99
## MT177793.1 MT177793.1 B42 0.08012821 AY016404.1_B42
## MT177794.1 MT177794.1 B106 0.05769231 KP736587.1_B106
## MT177795.1 MT177795.1 B106 0.05769231 KP736587.1_B106
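The assignment rule implied by this output can be sketched as follows. assign_by_threshold() is hypothetical, not the assignTypes source: it assumes a distance matrix like the one above, picks the closest prototype for each query, assigns that prototype's genotype when the distance is strictly below the threshold, and otherwise reports "unassigned" with distance NA. It also assumes prototype names end in _<genotype>, as in AF343652.1_B99. Whether the package uses < or <= at the threshold is not stated here.

```r
# Sketch of threshold-based genotype assignment (illustrative, not assignTypes).
assign_by_threshold <- function(dist_matrix, queries, prototypes, threshold = 0.105) {
  sub_d <- dist_matrix[queries, prototypes, drop = FALSE]
  # Column index of the closest prototype for each query
  nearest_idx <- apply(sub_d, 1, which.min)
  nearest <- prototypes[nearest_idx]
  nearest_dist <- sub_d[cbind(seq_along(queries), nearest_idx)]
  assigned <- nearest_dist < threshold  # assumption: strict inequality
  data.frame(
    query = queries,
    # Genotype = text after the last "_" in the prototype name
    assignedType = ifelse(assigned, sub(".*_", "", nearest), "unassigned"),
    distance = ifelse(assigned, nearest_dist, NA),
    reference = nearest,
    row.names = queries,
    stringsAsFactors = FALSE
  )
}
```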
The plotFrequency function visualizes the frequency of assigned genotypes. It takes the output of assignTypes as input.
plotFrequency(genotypes)
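The counting step behind such a plot can be approximated with base R's table(). genotype_frequencies() below is a hypothetical helper, not plotFrequency itself; whether plotFrequency includes unassigned sequences is not stated here, so dropping them is left as an option.

```r
# Hypothetical helper: count genotypes in the assignedType column, most frequent first.
genotype_frequencies <- function(genotypes, drop_unassigned = FALSE) {
  # as.character() avoids counting unused factor levels
  types <- as.character(genotypes$assignedType)
  if (drop_unassigned) types <- types[types != "unassigned"]
  sort(table(types), decreasing = TRUE)
}

# Example: barplot(genotype_frequencies(genotypes, drop_unassigned = TRUE), las = 2)
```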
The plotDistances function visualizes pairwise genetic distances as a heatmap. It takes the output of pairwiseDistances as input.
plotDistances(distances)
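For comparison, base R's heatmap() can draw a similar picture from the same matrix. This is an assumption-labelled sketch, not how plotDistances is implemented: Rowv = NA and Colv = NA keep the input order instead of reordering rows and columns by clustering, and scale = "none" plots the raw distances rather than row-standardized values.

```r
# Sketch only (not plotDistances): heatmap of a symmetric distance matrix
# in its original row/column order.
plot_distance_heatmap <- function(dist_matrix) {
  stopifnot(is.matrix(dist_matrix), nrow(dist_matrix) == ncol(dist_matrix))
  heatmap(dist_matrix, Rowv = NA, Colv = NA, symm = TRUE, scale = "none")
}

# Example: plot_distance_heatmap(distances)
```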
The plotTree function plots a simple phylogenetic tree. It takes the output of pairwiseDistances as input.
# Sub-sample the first 30 sequences so the tree stays readable
sampled_distances <- distances[1:30, 1:30]
plotTree(sampled_distances, hang = -1, cex = 0.6, main = "A simple tree", xlab = "", ylab = "Genetic distance")
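The arguments passed to plotTree above (hang, cex, main, xlab, ylab) are also accepted by base R's plot() method for hclust objects, so a comparable tree can be sketched with hclust(). The clustering method is an assumption here (average linkage, i.e. UPGMA), and simple_tree() is a hypothetical helper; plotTree may build its tree differently.

```r
# Sketch under an assumption (average linkage / UPGMA), not plotTree's source.
simple_tree <- function(dist_matrix, method = "average") {
  # as.dist() takes the lower triangle of the symmetric distance matrix
  hc <- hclust(as.dist(dist_matrix), method = method)
  plot(hc, hang = -1, cex = 0.6, main = "A simple tree", xlab = "", sub = "",
       ylab = "Genetic distance")
  invisible(hc)
}

# Example: simple_tree(sampled_distances)
```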
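The rhinotypeR package simplifies rhinovirus genotyping and the analysis of the resulting genetic data. By automating the download of prototype sequences, the calculation of pairwise distances, and genotype assignment, and by providing visualization tools, it makes rhinovirus epidemiological studies more efficient and less prone to errors from manual genotype assignment.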
sessionInfo()
## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rhinotypeR_1.1.0
##
## loaded via a namespace (and not attached):
## [1] crayon_1.5.3 httr_1.4.7 cli_3.6.3
## [4] knitr_1.48 rlang_1.1.4 xfun_0.48
## [7] highr_0.11 UCSC.utils_1.3.0 jsonlite_1.8.9
## [10] S4Vectors_0.45.0 Biostrings_2.75.0 htmltools_0.5.8.1
## [13] sass_0.4.9 stats4_4.5.0 rmarkdown_2.28
## [16] evaluate_1.0.1 jquerylib_0.1.4 fastmap_1.2.0
## [19] IRanges_2.41.0 lifecycle_1.0.4 GenomeInfoDb_1.43.0
## [22] compiler_4.5.0 XVector_0.47.0 digest_0.6.37
## [25] R6_2.5.1 GenomeInfoDbData_1.2.13 bslib_0.8.0
## [28] tools_4.5.0 zlibbioc_1.53.0 BiocGenerics_0.53.0
## [31] cachem_1.1.0