DB2Seqs {DECIPHER} | R Documentation |
Exports a database containing sequences to a FASTA or FASTQ formatted file of sequence records.
DB2Seqs(file,
        dbFile,
        tblName = "Seqs",
        identifier = "",
        type = "BStringSet",
        limit = -1,
        replaceChar = NA,
        nameBy = "description",
        orderBy = "row_names",
        removeGaps = "none",
        append = FALSE,
        width = 80,
        compress = FALSE,
        chunkSize = 1e5,
        sep = "::",
        clause = "",
        verbose = TRUE)
file |
Character string giving the location where the file should be written. |
dbFile |
A SQLite connection object or a character string specifying the path to the database file. |
tblName |
Character string specifying the table from which to extract the data. |
identifier |
Optional character string used to narrow the search results to those matching a specific identifier. If "" then all identifiers are selected. |
type |
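The type of XStringSet (sequences) to export to a FASTA formatted file, or the type of QualityScaledXStringSet to export to a FASTQ formatted file. |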
limit |
Number of results to display. The default (-1) does not limit the number of results. |
replaceChar |
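Optional character used to replace any characters of the sequence that are not present in the XStringSet's alphabet. Not applicable if type is "BStringSet". If "" then such characters are removed. The default (NA) results in an error if an unexpected character is found. |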
nameBy |
Character string giving the column name(s) for identifying each sequence record. If more than one column name is provided, the information in each column is concatenated, separated by sep. |
orderBy |
Character string giving the column name for sorting the results. Defaults to the order of entries in the database. Optionally can be followed by " ASC" or " DESC" to specify ascending (the default) or descending order. |
removeGaps |
Determines how gaps ("-" or "." characters) are removed in the sequences. This should be (an unambiguous abbreviation of) one of "none", "all" or "common". |
append |
Logical indicating whether to append the output to the existing file. |
width |
Integer specifying the maximum number of characters per line of sequence. Not applicable when exporting to a FASTQ formatted file. |
compress |
Logical specifying whether to compress the output file using gzip compression. |
chunkSize |
Number of sequences to write to the file at a time. |
sep |
Character string providing the separator between fields in each sequence's name, by default pairs of colons (“::”). |
clause |
An optional character string to append to the query as part of a “where clause”. |
verbose |
Logical indicating whether to display status. |
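As a rough illustration of how several of these arguments combine, the sketch below exports five records from the example database shipped with DECIPHER, naming each record by two columns. It assumes the database contains "identifier" and "description" columns (as the defaults above suggest); the "|" separator and the line width of 60 are arbitrary choices made for the example.

```r
library(DECIPHER)

# Example database distributed with the DECIPHER package
db <- system.file("extdata", "Bacteria_175seqs.sqlite", package = "DECIPHER")
tf <- tempfile(fileext = ".fas")

# Name each record by two columns joined with "|" (assumed column names),
# sort by identifier, and wrap sequence lines at 60 characters
n <- DB2Seqs(tf,
             db,
             nameBy = c("identifier", "description"),
             sep = "|",
             orderBy = "identifier",
             limit = 5,
             width = 60,
             verbose = FALSE)

# FASTA header lines start with ">"
headers <- grep("^>", readLines(tf), value = TRUE)
print(headers)

unlink(tf)
```

Each header should contain the identifier, the separator, and then the description of the corresponding record.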
Sequences are exported into either a FASTA or FASTQ file as determined by the type of sequences. If type is an XStringSet then sequences are exported to FASTA format. Quality information for QualityScaledXStringSets is interpreted as PhredQuality scores before export to FASTQ format.
If type is "BStringSet" (the default) then sequences are exported to a FASTA file exactly as they were when imported. If type is "DNAStringSet" then all U's are converted to T's before export, and vice versa if type is "RNAStringSet". All remaining characters not in the XStringSet's alphabet are converted to replaceChar, or removed if replaceChar is "". Note that if replaceChar is NA (the default), an error results when an unexpected character is found.
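The conversion rules described above can be mimicked in base R to make the three replaceChar cases concrete. This is only a sketch: sketch_convert is a hypothetical helper that is not part of DECIPHER, and it assumes a deliberately simplified alphabet (four bases plus the gap character) rather than the full XStringSet alphabet.

```r
# Hypothetical helper, NOT DECIPHER's implementation.
# Assumption: a simplified alphabet of four bases plus "-".
sketch_convert <- function(seq,
                           type = c("DNAStringSet", "RNAStringSet"),
                           replaceChar = NA) {
  type <- match.arg(type)

  # U -> T for DNA output, T -> U for RNA output
  if (type == "DNAStringSet") {
    seq <- chartr("U", "T", seq)
    alphabet <- "ACGT-"
  } else {
    seq <- chartr("T", "U", seq)
    alphabet <- "ACGU-"
  }

  # Matches any character outside the (simplified) alphabet
  pattern <- paste0("[^", alphabet, "]")

  if (is.na(replaceChar)) {
    # Default: an unexpected character is an error
    if (any(grepl(pattern, seq)))
      stop("unexpected character found")
    return(seq)
  }

  # Otherwise replace (or, with "", remove) unexpected characters
  gsub(pattern, replaceChar, seq)
}

dna <- sketch_convert("ACGU", "DNAStringSet")
rna <- sketch_convert("ACGT", "RNAStringSet")
replaced <- sketch_convert("AC?GU", "DNAStringSet", replaceChar = "N")
removed <- sketch_convert("AC?GU", "DNAStringSet", replaceChar = "")
failed <- inherits(try(sketch_convert("AC?GU", "DNAStringSet"), silent = TRUE),
                   "try-error")
```

In practice the real alphabets are larger (for example, they include ambiguity codes), so which characters count as unexpected will differ from this simplified version.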
Writes a FASTA or FASTQ formatted file containing the sequence records in the database.
Returns the number of sequence records written to the file.
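One way to use the returned count is as a sanity check after reading the file back in. The sketch below assumes the example database holds DNA sequences that export cleanly as a "DNAStringSet", and it uses readDNAStringSet from the Biostrings package (attached along with DECIPHER) to re-import the file.

```r
library(DECIPHER)

db <- system.file("extdata", "Bacteria_175seqs.sqlite", package = "DECIPHER")
tf <- tempfile(fileext = ".fas")

# Export the first 10 records as DNA with all gaps removed
n <- DB2Seqs(tf,
             db,
             type = "DNAStringSet",
             removeGaps = "all",
             limit = 10,
             verbose = FALSE)

# Re-import the FASTA file and compare against the returned count
dna <- readDNAStringSet(tf)
records_match <- length(dna) == n
print(records_match)

unlink(tf)
```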
Erik Wright eswright@pitt.edu
ES Wright (2016) "Using DECIPHER v2.0 to Analyze Big Biological Sequence Data in R". The R Journal, 8(1), 352-359.
library(DECIPHER)
db <- system.file("extdata", "Bacteria_175seqs.sqlite", package="DECIPHER")
tf <- tempfile()
DB2Seqs(tf, db, limit=10)
file.show(tf) # press 'q' to exit
unlink(tf)