formatSTRINGPPI {cisPath} | R Documentation |
This method formats the protein-protein interaction (PPI) file downloaded from the STRING database.
formatSTRINGPPI(input, mappingFile, taxonId, output, minScore=700)

## S4 method for signature 'character,character,character,character'
formatSTRINGPPI(input, mappingFile, taxonId, output, minScore=700)
input | File downloaded from the STRING database (character(1)). |
mappingFile | Identifier mapping file (character(1)). |
taxonId | NCBI taxonomy species identifier (character(1)). |
output | Output file (character(1)). |
minScore | Minimum STRING score; PPI records with a score below this value are filtered out (integer(1)). |
The input file is downloaded from the STRING database (http://string-db.org/).
The URL of this file is http://string-db.org/newstring_download/protein.links.v9.1.txt.gz.
Access http://string-db.org/newstring_download/species.v9.1.txt to determine the parameter taxonId.
Access http://string-db.org/newstring_cgi/show_download_page.pl for more details.
If you make use of this file, please cite the STRING database.
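To illustrate how taxonId and minScore relate to the STRING links file, here is a minimal sketch on toy data. It assumes the file has the columns protein1, protein2 and combined_score, and that STRING identifiers carry the taxonomy ID as a prefix (for example "9606."). The identifiers below are made-up placeholders, and this page does not state whether formatSTRINGPPI filters in exactly this way.

```r
# Toy rows in the assumed layout of STRING's protein.links file.
# The protein identifiers are hypothetical placeholders, not real entries.
links <- data.frame(
  protein1 = c("9606.PROT_A", "9606.PROT_A", "10090.PROT_X"),
  protein2 = c("9606.PROT_B", "9606.PROT_C", "10090.PROT_Y"),
  combined_score = c(900L, 150L, 950L),
  stringsAsFactors = FALSE
)

taxonId  <- "9606"
minScore <- 700

# Keep rows where both identifiers start with "<taxonId>."
prefix <- paste0(taxonId, ".")
sameSpecies <- startsWith(links$protein1, prefix) &
  startsWith(links$protein2, prefix)

# Drop interactions whose score is below minScore
kept <- links[sameSpecies & links$combined_score >= minScore, ]
print(kept)
```

With these toy rows, only the first interaction remains: the second falls below minScore and the third belongs to another taxon.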
Each line of the output file contains Swiss-Prot accession numbers and gene names for two interacting proteins.
An edge value is estimated for each link between two interacting proteins.
This value is defined as max(1, log(1000 - STRING_SCORE, 100)).
This may be treated as the “cost” when determining the shortest paths between proteins.
Advanced users can edit the file and change this value for each edge.
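The edge value formula can be tried out directly in R. The sketch below assumes that log(x, 100) in the definition follows R's convention of a base-100 logarithm; the helper name edgeValue is hypothetical and is not part of cisPath.

```r
# Hypothetical helper reproducing the documented edge value formula:
# max(1, log(1000 - STRING_SCORE, 100)), vectorised with pmax().
edgeValue <- function(score) {
  pmax(1, log(1000 - score, base = 100))
}

scores <- c(700, 800, 900, 999)
print(data.frame(score = scores, edge = round(edgeValue(scores), 3)))
```

Under this reading, a score of 700 gives a cost of about 1.239, and any score of 900 or above gives the minimum cost of 1, so higher-confidence interactions are cheaper to traverse in a shortest-path search.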
Szklarczyk, D. et al. (2011) The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res, 39, D561-D568.
Franceschini, A. et al. (2013) STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res, 41, D808-D815.
UniProt Consortium (2012) Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res, 40, D71-D75.
cisPath, getMappingFile, formatPINAPPI, formatSIFfile, formatiRefIndex, combinePPI.
library(cisPath)

# Generate the identifier mapping file
input <- system.file("extdata", "uniprot_sprot_human10.dat", package="cisPath")
mappingFile <- file.path(tempdir(), "mappingFile.txt")
getMappingFile(input, output=mappingFile, taxonId="9606")

# Format the file downloaded from STRING database
output <- file.path(tempdir(), "STRINGPPI.txt")
fileFromSTRING <- system.file("extdata", "protein.links.txt", package="cisPath")
formatSTRINGPPI(fileFromSTRING, mappingFile, "9606", output, 700)

## Not run:
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("R.utils")
library(R.utils)

outputDir <- file.path(getwd(), "cisPath_test")
dir.create(outputDir, showWarnings=FALSE, recursive=TRUE)

# Generate the identifier mapping file
fileFromUniProt <- file.path(outputDir, "uniprot_sprot_human.dat")
mappingFile <- file.path(outputDir, "mappingFile.txt")
getMappingFile(fileFromUniProt, output=mappingFile)

# Download STRING PPI for Homo sapiens (compressed:~27M, decompressed:~213M)
destfile <- file.path(outputDir, "9606.protein.links.v9.1.txt.gz")
cat("Downloading...\n")
download.file("http://string-db.org/newstring_download/protein.links.v9.1/9606.protein.links.v9.1.txt.gz", destfile)
cat("Uncompressing...\n")
gunzip(destfile, overwrite=TRUE, remove=FALSE)

# Format STRING PPI
fileFromSTRING <- file.path(outputDir, "9606.protein.links.v9.1.txt")
STRINGPPI <- file.path(outputDir, "STRINGPPI.txt")
formatSTRINGPPI(fileFromSTRING, mappingFile, "9606", output=STRINGPPI, 700)
## End(Not run)