annotate.protein_id {specL} | R Documentation |
This function assigns protein identifiers to a list of tandem mass spectra that have a peptide sequence assigned.
annotate.protein_id(data, file = NULL, fasta = read.fasta(file = file, as.string = TRUE, seqtype = "AA"), digestPattern = "(([RK])|(^)|(^M))")
data |
list of records containing mZ and peptide sequences. |
file |
file name of a FASTA file. |
fasta |
a fasta object as returned by the read.fasta function of the seqinr package. |
digestPattern |
a regex pattern which can be used by the |
The protein sequences are read by the read.fasta function of the seqinr package. The protein identifier is written to the proteinInformation variable of each record.
If the function is called on a multi-core architecture, it uses mclapply.
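The lookup described above can be illustrated with a small base R sketch. It is not the specL implementation. It assumes that each peptide sequence is appended to digestPattern and searched with grep in the protein sequences, so that a hit must follow R or K, start the protein, or follow an initial M. The names annotate_sketch, proteins, and records, the peptideSequence field, and the ";" separator are hypothetical choices made for this illustration.

```r
library(parallel)

# Hypothetical stand-ins for the real inputs (not taken from specL):
# a named vector of protein sequences and a list of PSM-like records.
proteins <- c(
  "zz|ZZ_DEMO1|" = "LGGNEQVTRAAAAKGAGSSEPVTGLDAK",
  "zz|ZZ_DEMO2|" = "MTPVISGGPYEYRAAAAK"
)
records <- list(
  list(peptideSequence = "GAGSSEPVTGLDAK"),  # preceded by K in DEMO1
  list(peptideSequence = "LGGNEQVTR"),       # at the start of DEMO1
  list(peptideSequence = "TPVISGGPYEYR"),    # follows the initial M in DEMO2
  list(peptideSequence = "GGNEQVTR")         # preceded by L, so no hit expected
)

# Default pattern from the usage section of this page.
digestPattern <- "(([RK])|(^)|(^M))"

# Assumed lookup: prepend the digest pattern to the peptide and grep the
# proteins; the names of all matching proteins are joined with ";".
annotate_sketch <- function(records, proteins, digestPattern, cores = 1L) {
  mclapply(records, function(x) {
    hits <- grep(paste0(digestPattern, x$peptideSequence), proteins)
    x$proteinInformation <- paste(names(proteins)[hits], collapse = ";")
    x
  }, mc.cores = cores)
}

annotated <- annotate_sketch(records, proteins, digestPattern)
result <- sapply(annotated, function(x) x$proteinInformation)
print(result)
```

With cores = 1L, mclapply runs serially, which keeps the sketch portable. On Unix-like systems a larger value spreads the records over several cores.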
It is recommended to load the FASTA file prior to running annotate.protein_id using
myFASTA <- read.fasta(file = file,
    as.string = TRUE,
    seqtype = "AA")
instead of providing the FASTA file name to the function.
Returns a list object.
Jonas Grossmann and Christian Panse, 2014
?read.fasta of the seqinr package.
http://www.uniprot.org/help/fasta-headers
# annotate.protein_id
library(seqinr)  # provides read.fasta
library(specL)   # provides annotate.protein_id and the peptideStd example data

# our FASTA sequence
# sep = "" keeps the header and the sequence chunks free of stray spaces
irtFASTAseq <- paste(">zz|ZZ_FGCZCont0260|",
    "iRT_Protein_with_AAAAK_spacers concatenated Biognosys\n",
    "LGGNEQVTRAAAAKGAGSSEPVTGLDAKAAAAKVEATFGVDESNAKAAAAKYILAGVENS",
    "KAAAAKTPVISGGPYEYRAAAAKTPVITGAPYEYRAAAAKDGLDAASYYAPVRAAAAKAD",
    "VTPADFSEWSKAAAAKGTFIIDPGGVIRAAAAKGTFIIDPAAVIRAAAAKLFLQFGAQGS",
    "PFLK\n", sep = "")

# be realistic, do it from file
Tfile <- file()
cat(irtFASTAseq, file = Tfile)

# use read.fasta from seqinr
fasta.irtFASTAseq <- read.fasta(Tfile, as.string = TRUE, seqtype = "AA")
close(Tfile)

# annotate with proteinID
# -> here we find all PSMs from the one proteinID above
peptideStd <- specL::annotate.protein_id(peptideStd, fasta = fasta.irtFASTAseq)

# show indices for all PSMs where we have a proteinInformation
which(unlist(lapply(peptideStd, function(x){nchar(x$proteinInformation) > 0})))