This project contains useful logic but is no longer maintained. In particular, as of 2023-05-12, the docker image larsjuhljensen/tagger does not exist and the referenced dictionary URLs do not exist either.
For the latest developments, 👉 check the official STRING tagger repository
STRING tagger dockerized as a REST API.
Features:
- Docker image directly depends on original Docker image (larsjuhljensen/tagger)
- This new image contains the STRING dictionaries
- REST API
- Fast annotation of individual or small document-texts:
  - the original image expects big batches and thus disregards the cost of time-consuming initial operations, such as loading the heavy dictionaries,
  - this new image loads the dictionaries through the REST API web server only once and keeps them in memory
- Mapping of STRING ids to UniProt ids
- Supports Unicode
- Response in JSON format
- Install Docker (for help, refer to this docker cheat sheet)
- Check the amount of runtime memory with:
docker info
Our version requires 4 GB to run but, for example, Docker for Mac is set to use 2 GB of runtime memory by default. This can be changed following these instructions.
- Clone this repository:
git clone https://github.com/juanmirocks/STRING-tagger-server.git
- cd to the created folder
- Build the image from the Dockerfile:
docker build -t tagger .
- Run the Docker container:
docker run -p 5000:5000 tagger
(uses the STRING dictionaries)
- (optionally) run with your dictionaries:
docker run -p 5000:5000 -v ${your_dics_folder}:/app/tagger/dics tagger
- Check whether the server is running:
localhost:5000/ should display a 'Welcome' message
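The same check can be scripted. The following is a minimal sketch using only the Python standard library; `fetch_root` is a hypothetical helper name, and the exact wording of the welcome message is not assumed.

```python
import urllib.request


def fetch_root(base_url="http://localhost:5000/", timeout=5.0):
    """Return the body served at base_url, or None if the request fails."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:  # covers URLError (e.g. connection refused) and socket timeouts
        return None
```

With the container running, `print(fetch_root())` should show the welcome text; a result of `None` suggests that the server is not reachable on that port.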
Supported organisms and their taxonomy ids
Proteins are tagged for these organisms:
- Homo sapiens (Human), with NCBI Taxonomy ID: 9606
- Arabidopsis thaliana (Mouse-ear cress): 3702
- Saccharomyces cerevisiae (Baker's yeast): 4932
- Mus musculus (Mouse): 10090
- Schizosaccharomyces pombe (Fission yeast): 4896
- Escherichia coli str. K-12 substr. MG1655: 511145
- Caenorhabditis elegans: 6239
- Drosophila melanogaster (Fruit fly): 7227
- Danio rerio (Zebrafish) (Brachydanio rerio): 7955
- Rattus norvegicus (Rat): 10116 (currently there is no file for converting to UniProt IDs for this organism)
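For scripting, the list above can be kept as a lookup table. The sketch below copies the taxonomy IDs from this list; `SUPPORTED_ORGANISMS` and `ids_for` are hypothetical names, and the comma-separated format with the optional -22 entry follows the ids request parameter as used elsewhere in this README.

```python
# NCBI Taxonomy IDs copied from the list above.
SUPPORTED_ORGANISMS = {
    "Homo sapiens": 9606,
    "Arabidopsis thaliana": 3702,
    "Saccharomyces cerevisiae": 4932,
    "Mus musculus": 10090,
    "Schizosaccharomyces pombe": 4896,
    "Escherichia coli str. K-12 substr. MG1655": 511145,
    "Caenorhabditis elegans": 6239,
    "Drosophila melanogaster": 7227,
    "Danio rerio": 7955,
    "Rattus norvegicus": 10116,
}


def ids_for(*organisms, include_localization=False):
    """Build a comma-separated ids string for the given organism names."""
    ids = [-22] if include_localization else []  # -22 tags subcellular localization
    ids += [SUPPORTED_ORGANISMS[name] for name in organisms]  # KeyError if unsupported
    return ",".join(str(i) for i in ids)
```

For example, `ids_for("Mus musculus", include_localization=True)` yields the string `-22,10090`.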
Parameters used when sending a POST request:
- ids:
  - -22: used to tag subcellular localization, normalized (linked) to GO ID.
  - -3: used to tag organism names, normalized to TAXONOMY ID.
  - taxonomy id (example: 9606): used to tag proteins for the specified taxonomy id, normalized to STRING ID and UNIPROT ID.
- autodetect:
  - True:
    - When text is tp53 mouse, then it returns the STRING ID and UNIPROT ID for mouse, even if the taxonomy id for mouse is not included in the ids parameter.
    - When text is tp53, then it only returns the STRING ID and UNIPROT ID for the taxonomy ids specified in the ids parameter.
  - False:
    - When text is tp53 mouse, then it returns the STRING ID and UNIPROT ID only for the taxonomy id specified in the ids parameter, even though mouse is included in the text.
    - When text is tp53, then it returns the STRING ID and UNIPROT ID only for the taxonomy id specified in the ids parameter, so it gives the same response as in the previous case.
- text:
  - text to tag (example: tp53 mouse)
- output:
  - choose between (not documented): simple (default), full, tagger-unicode, tagger-raw
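As a rough illustration of how these parameters combine into a request body, here is a small sketch. `build_annotate_payload` is a hypothetical helper, not part of the server. It assumes that ids is sent as one comma-separated string and autodetect as the string "True" or "False", as in the curl examples of this README (only "False" appears there; "True" is assumed by symmetry).

```python
OUTPUT_FORMATS = ("simple", "full", "tagger-unicode", "tagger-raw")


def build_annotate_payload(text, ids=None, autodetect=None, output=None):
    """Return the body of an /annotate request as a dict.

    Parameters left as None are omitted, so the server applies its defaults.
    """
    payload = {"text": text}
    if ids is not None:
        # Assumption: ids travel as a single comma-separated string.
        payload["ids"] = ",".join(str(i) for i in ids)
    if autodetect is not None:
        # Assumption: the flag is sent as a string, not a JSON boolean.
        payload["autodetect"] = "True" if autodetect else "False"
    if output is not None:
        if output not in OUTPUT_FORMATS:
            raise ValueError("output must be one of: " + ", ".join(OUTPUT_FORMATS))
        payload["output"] = output
    return payload
```

For instance, `build_annotate_payload("p53", ids=[-22, 10090])` gives `{"text": "p53", "ids": "-22,10090"}`.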
The default parameters annotate subcellular localization and all organisms' proteins found in the text, always tagging human proteins:
- ids=-22,9606
- autodetect=True
curl -H "Content-type: application/json" -X POST http://127.0.0.1:5000/annotate -d '{"text":"Brachydanio rerio or danio rerio have aldh9a1a and ab-cb8"}'
curl -i -H "Content-Type: application/json" -X POST http://localhost:5000/annotate -d '{"ids":"-22,10090","text":"p53"}'
curl -i -H "Content-Type: application/json" -X POST http://localhost:5000/annotate -d '{"ids":"-22,9606","autodetect":"False","text":"p53 mouse tp53"}'
Run the test script:
docker run -p 5000:5000 tagger test_server.py
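The curl calls above can also be issued from Python. This is a minimal sketch with the standard library only; `build_request` and `annotate` are hypothetical helper names, and the structure of the returned JSON is not assumed beyond the README's statement that responses are in JSON format.

```python
import json
import urllib.request


def build_request(payload, base_url="http://localhost:5000"):
    """Create a POST request for /annotate carrying payload as a JSON body."""
    return urllib.request.Request(
        base_url + "/annotate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def annotate(payload, base_url="http://localhost:5000", timeout=30.0):
    """Send payload to the server and return the decoded JSON response."""
    request = build_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `annotate({"ids": "-22,9606", "autodetect": "False", "text": "p53 mouse tp53"})` mirrors the last curl example.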