msa {msa} | R Documentation |
The msa
function provides a unified interface to
the three multiple sequence alignment algorithms in this package:
‘ClustalW’, ‘ClustalOmega’, and ‘MUSCLE’.
msa(inputSeqs, method=c("ClustalW", "ClustalOmega", "Muscle"), cluster="default", gapOpening="default", gapExtension="default", maxiters="default", substitutionMatrix="default", type="default", order=c("aligned", "input"), verbose=FALSE, help=FALSE, ...)
inputSeqs |
input sequences; this argument can be a character vector,
an object of class |
method |
specifies the multiple sequence alignment to be used;
currently, |
cluster |
parameter related to sequence clustering; its
interpretation and default value depends on the method;
see |
gapOpening |
gap opening penalty; the defaults are
specific to the algorithm (see |
gapExtension |
gap extension penalty; the defaults are
specific to the algorithm (see |
maxiters |
maximum number of iterations; its
interpretation and default value depends on the method;
see |
substitutionMatrix |
substitution matrix for scoring matches and
mismatches; format and defaults depend on the algorithm;
see |
type |
type of the input sequences |
order |
how the sequences should be ordered in the output object;
if |
verbose |
if |
help |
if |
... |
all other parameters are passed on to the multiple
sequence algorithm, i.e. to one of the functions
|
msa
is a simple wrapper function that unifies the interfaces of
the three functions msaClustalW
,
msaClustalOmega
, and msaMuscle
. Which
function is called, is controlled by the method
argument.
Note that the input sequences may be reordered by the multiple
sequence alignment algorithms in order to group together similar
sequences (see also description of argument order
above).
So, if the input order should be preserved or if the input order
should be recovered later, we strongly recommend to always assign
unique names to the input sequences. As noted in the description
of the inputSeqs
argument above, all functions, msa()
,
msaClustalW
, msaClustalOmega
, and
msaMuscle
, also allow
for direct reading from FASTA files. This is mainly for the reason of
memory efficiency if the sequence data set is very large. Otherwise,
we want to encourage users to first read the sequences into the R
workspace. If sequences are read from a FASTA file
directly, the order of output sequences is completely under
the control of the respective
algorithm and does not allow for checking whether the sequences are
named uniquely in the FASTA file. The preservation of the input order
works also for sequence data read from a FASTA file, but only for
ClustalW and ClustalOmega; MUSCLE does not support this (see also
argument order
above and msaMuscle
).
Depending on the type of sequences for which it was called,
msa
returns a MsaAAMultipleAlignment
,
MsaDNAMultipleAlignment
, or
MsaRNAMultipleAlignment
object.
If called with help=TRUE
, msa
returns
an invisible NULL
.
Enrico Bonatesta and Christoph Horejs-Kainrath <msa@bioinf.jku.at>
http://www.bioinf.jku.at/software/msa
U. Bodenhofer, E. Bonatesta, C. Horejs-Kainrath, and S. Hochreiter (2015). msa: an R package for multiple sequence alignment. Bioinformatics 31(24):3997-3999. DOI: 10.1093/bioinformatics/btv494.
http://www.clustal.org/download/clustalw_help.txt
http://www.clustal.org/omega/README
http://www.drive5.com/muscle/muscle.html
Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22(22):4673-4680. DOI: 10.1093/nar/22.22.4673.
Sievers, F., Wilm, A., Dineen, D., Gibson, T. J., Karplus, K., Li, W., Lopez, R., McWilliam, H., Remmert, M., Soeding, J., Thompson, J. D., and Higgins, D. G. (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7:539. DOI: 10.1038/msb.2011.75.
Edgar, R. C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32(5):1792-1797. DOI: 10.1093/nar/gkh340.
Edgar, R. C. (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113. DOI: 10.1186/1471-2105-5-113.
msaClustalW
,
msaClustalOmega
, msaMuscle
,
msaPrettyPrint
, MsaAAMultipleAlignment
,
MsaDNAMultipleAlignment
,
MsaRNAMultipleAlignment
,
MsaMetaData
## read sequences filepath <- system.file("examples", "exampleAA.fasta", package="msa") mySeqs <- readAAStringSet(filepath) ## call unified interface msa() for default method (ClustalW) and ## default parameters msa(mySeqs) ## call ClustalOmega through unified interface msa(mySeqs, method="ClustalOmega") ## call MUSCLE through unified interface with some custom parameters msa(mySeqs, method="Muscle", gapOpening=12, gapExtension=3, maxiters=16, cluster="upgmamax", SUEFF=0.4, brenner=FALSE, order="input", verbose=FALSE)