https://{{base_url}}/v1/seq-info/{{gmsc_id}}
Where {{gmsc_id}} is of the form GMSC10.100AA.xxx_xxx_xxx or GMSC10.90AA.xxx_xxx_xxx.
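A client could check an identifier against the documented forms before calling seq-info. This is a minimal Python sketch, not part of the API: the names GMSC_ID_RE and seq_info_url are hypothetical. It assumes each x is a digit (as in the example IDs in this document) and that base_url includes the scheme (e.g. http://127.0.0.1:5000).

```python
import re

# Assumption: each "x" in GMSC10.100AA.xxx_xxx_xxx / GMSC10.90AA.xxx_xxx_xxx is a digit.
GMSC_ID_RE = re.compile(r"GMSC10\.(100AA|90AA)\.\d{3}_\d{3}_\d{3}")


def seq_info_url(base_url: str, gmsc_id: str) -> str:
    """Build the seq-info URL for one identifier (base_url includes the scheme)."""
    if GMSC_ID_RE.fullmatch(gmsc_id) is None:
        raise ValueError(f"not a GMSC10 identifier: {gmsc_id!r}")
    return f"{base_url.rstrip('/')}/v1/seq-info/{gmsc_id}"
```

Rejecting malformed IDs locally avoids a round trip for an input the server cannot resolve.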
Returns
{
    "id": "GMSC10.xxAA.xxx_xxx_xxx",
    "nucleotide": "ATC...",
    "aminoacid": "MAV...",
    "taxonomy": "s__Bacteroides_vulgatus",
    "habitat": "human gut",
    "quality": {
        "antifam": true,
        "terminal": true,
        "rnacode": 0.9,
        "metat": 1,
        "metap": 1,
        "riboseq": 0.9
    }
}
Note that the quality field is only present for 90AA sequences.
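Because quality is optional, client code should not assume the key exists. A minimal Python sketch (the function summarize_seq_info is hypothetical) that reads a seq-info response and reports quality as None for 100AA entries:

```python
import json
from typing import Optional


def summarize_seq_info(payload: str) -> dict:
    """Pick out a few fields from a seq-info JSON response.

    The "quality" block only exists for 90AA entries, so it is read with
    .get() and comes back as None for 100AA entries.
    """
    entry = json.loads(payload)
    quality: Optional[dict] = entry.get("quality")
    return {
        "id": entry["id"],
        "protein_length": len(entry["aminoacid"]),
        "habitat": entry.get("habitat"),
        "quality": quality,
    }
```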
https://{{base_url}}/v1/seq-info-multi/
This is a POST-only endpoint. It expects a JSON body consisting of a dictionary with an entry seq_ids, which is a list of strings (identifiers).
For example:
{
    "seq_ids": [
        "GMSC10.90AA.123_456_789",
        "GMSC10.90AA.123_456_790",
        ...
    ]
}
Returns a list of entries, each in the same format as the output of seq-info.
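A minimal Python sketch of preparing this request with only the standard library. The helper name build_seq_info_multi_request is hypothetical, and base_url is assumed to include the scheme; the request is built but not sent.

```python
import json
import urllib.request


def build_seq_info_multi_request(base_url: str, seq_ids: list) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /v1/seq-info-multi/ with a JSON body."""
    body = json.dumps({"seq_ids": seq_ids}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/seq-info-multi/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with urllib.request.urlopen(req) and decoding the body with json.load should then yield the list of seq-info entries.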
https://{{base_url}}/v1/seq-filter/
POST endpoint, with arguments:
hq_only: boolean, optional (only active for 90AA)
habitat: str, mandatory
taxonomy: str, optional
quality_antifam: boolean, optional
quality_terminal: boolean, optional
quality_rnacode: float, optional
quality_metat: integer, optional
quality_metap: integer, optional
quality_riboseq: float, optional
habitat is treated as a comma-separated list (e.g., you can use marine,freshwater to match all the entities that are present in both marine and freshwater).
taxonomy is a substring match, so you can pass any taxonomic level (e.g., passing o__Pelagibacterales will match d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Pelagibacterales;f__Pelagibacteraceae;g__AAA240-E13).
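The habitat and taxonomy rules can be restated as a local predicate over result entries. This Python sketch only mirrors the behaviour described above; it is not the server's implementation, and the function name matches_filter is hypothetical. It assumes habitat strings have no spaces around the commas, as in the example output.

```python
from typing import Optional


def matches_filter(entry: dict, habitat: str, taxonomy: Optional[str] = None) -> bool:
    """Check one seq-filter result entry against the documented semantics.

    habitat: comma-separated; the entry must be present in every listed habitat.
    taxonomy: optional; matched as a plain substring of the entry's lineage.
    """
    wanted = set(habitat.split(","))
    present = set(entry.get("habitat", "").split(","))
    if not wanted <= present:  # every requested habitat must be present
        return False
    if taxonomy is not None and taxonomy not in entry.get("taxonomy", ""):
        return False
    return True
```

Because taxonomy is a substring test, a rank prefix such as o__ keeps the match at the intended level.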
Returns
{
    "status":"Ok",
    "results": [
        {
            "habitat":"marine,plant associated,sediment",
            "seq_id":"GMSC10.90AA.000_013_322",
            "taxonomy":"d__Bacteria"
        },
        ...
    ]
}
At most 1,001 entries are returned.
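Since the result list is capped, a client cannot tell from a full-size response whether more matches exist. A minimal Python sketch (parse_filter_response and MAX_FILTER_RESULTS are hypothetical names) that extracts the identifiers and flags a response that reached the cap. It assumes "Ok" is the success status, as in the example above.

```python
import json

# Documented upper bound on the number of entries seq-filter returns.
MAX_FILTER_RESULTS = 1001


def parse_filter_response(payload: str) -> tuple:
    """Return (seq_ids, possibly_truncated) from a seq-filter JSON response."""
    data = json.loads(payload)
    if data.get("status") != "Ok":  # assumption: "Ok" signals success
        raise RuntimeError(f"seq-filter failed: {data.get('status')!r}")
    ids = [entry["seq_id"] for entry in data["results"]]
    # Reaching the cap means the list may be incomplete; narrow the filter.
    return ids, len(ids) >= MAX_FILTER_RESULTS
```

When the flag is set, adding a taxonomy or quality argument is one way to bring the result set under the limit.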
https://{{base_url}}/v1/cluster-info/{{gmsc_90AA_id}}
Returns the membership of the given cluster. At most 20 results are "thick" (meaning that metadata is also returned); for the remaining members, only identifiers are returned. Example output:
{
    "status":"Ok",
    "cluster": [
        {
            "aminoacid":"MAAAGFLIVSFKPFEKPSRNAATTAGFSAENFEFTMIALPYSLRP",
            "habitat":"soil",
            "nucleotide":"ATGGCCGCGGCCGGATTCTTGATCGTGTCCTTCAAGCCTTTCGAGAAGCCTTCGAGAAACGCCGCGACGACGGCCGGCTTCTCGGCCGAGAATTTCGAGTTCACGATGATCGCGCTGCCGTACAGCTTGAGACCGTAA",
            "seq_id":"GMSC10.100AA.547_444_661",
            "taxonomy":"d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__VAZQ01;s__VAZQ01 sp005883115"
        },
        ...
    ]
}
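Clients typically need to separate thick members from identifier-only ones. Only the thick format is shown above, so this Python sketch assumes thin members are either bare ID strings or dicts without an aminoacid field; split_cluster_members is a hypothetical name.

```python
def split_cluster_members(cluster: list) -> tuple:
    """Separate "thick" members (with metadata) from identifier-only ones.

    Assumption: thin members are bare ID strings or dicts lacking "aminoacid".
    """
    thick, thin = [], []
    for member in cluster:
        if isinstance(member, dict) and "aminoacid" in member:
            thick.append(member)  # full metadata returned
        elif isinstance(member, dict):
            thin.append(member["seq_id"])  # identifier wrapped in a dict
        else:
            thin.append(member)  # bare identifier string
    return thick, thin
```

Metadata for the thin identifiers can then be fetched in bulk through seq-info-multi.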
NOTE. The internal endpoints below are not recommended for public use. For large-scale analyses, we recommend you use the GMSC-mapper command line tool locally. Public API endpoints will be maintained for the long term. No such commitment is made for endpoints marked internal. You have been warned.
https://{{base_url}}/internal/seq-search (POST)
Arguments:
sequence_faa: FASTA-formatted set of sequences
is_contigs: bool (when True, inputs are assumed to be DNA contigs)
Returns
{
    "status": "message (normally 'Ok')",
    "search_id": "xxxxx"
}
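A minimal Python sketch of preparing a search submission. It assumes the endpoint reads sequence_faa as an ordinary form field (the curl example below sends it with --form), so URL-encoded form data should be accepted too; this is an assumption, not documented behaviour. is_contigs is left out because its wire encoding is not documented. build_seq_search_request is a hypothetical name.

```python
import urllib.parse
import urllib.request


def build_seq_search_request(base_url: str, fasta_text: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /internal/seq-search/.

    Assumption: sequence_faa is read as a regular form field.
    """
    body = urllib.parse.urlencode({"sequence_faa": fasta_text}).encode("ascii")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/internal/seq-search/",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```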
https://{{base_url}}/internal/seq-search/{{search_id}}
Returns
{
    "search_id": "str",
    "status": "str",
    "results": [
        {
            "query_id": "query_1",
            "aminoacid": "MHEDVIQFARNEVWSLV....",
            "taxonomy": "s__Bacteroides_vulgatus",
            "habitat": "human gut",
            "hits": [
                {
                    "id": "GMSC10.xxAA.xxx_xxx_xxx",
                    "e_value": "2.1e-23",
                    "aminoacid": "MHEELIQFARNEV...",
                    "identity": "98.4"
                }, ...
            ]
        }, ...
    ]
}
status will be one of Running (if the results are not yet ready), Done, or Expired. In the case of Done, the results field will be filled in.
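These three states suggest a simple polling loop on the client side. A minimal Python sketch, where fetch is a hypothetical callable that GETs /internal/seq-search/{search_id} and returns the decoded JSON (keeping the loop independent of any HTTP library); wait_for_search and its defaults are also illustrative.

```python
import time
from typing import Callable


def wait_for_search(fetch: Callable[[], dict], interval: float = 5.0, max_polls: int = 120) -> list:
    """Poll a seq-search job until it is Done, then return its results."""
    for _ in range(max_polls):
        reply = fetch()
        status = reply["status"]
        if status == "Done":
            return reply["results"]
        if status == "Expired":
            raise RuntimeError(f"search {reply.get('search_id')} expired; resubmit it")
        if status != "Running":
            raise RuntimeError(f"unexpected status: {status!r}")
        time.sleep(interval)  # still Running: wait before asking again
    raise TimeoutError("search still running after max_polls attempts")
```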
Dependencies
flask
numpy
pandas
polars
Running this (in test mode) can be done with
python -m flask run
Testing can be done with curl:
curl http://127.0.0.1:5000/v1/seq-info/GMSC10.100AA.000_000_002
These examples assume you are running the test version on http://127.0.0.1:5000/. Adapt as necessary.
Searching requires using POST and a FASTA file. For example, if you have the file example.faa, you can use:
curl -X POST --form "sequence_faa=$(cat example.faa)" http://127.0.0.1:5000/internal/seq-search/
The output will look something like this:
{"search_id":"1-jmgi","status":"Ok"}
You can later use the given ID (in this case 1-jmgi, but it will be different every time the app runs) to retrieve the results:
curl http://127.0.0.1:5000/internal/seq-search/1-jmgi
Results will look like one of the following:
{"search_id":"1-jmgi","status":"Running"}
{"search_id":"1-jmgi","status":"Done","results":[...]}
{"search_id":"1-jmgi","status":"Expired"}
Search IDs are of the form #-xxxx, where # is an index that counts up and xxxx is a random string.
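A small Python sketch that splits a search ID into its two parts, e.g. to sanity-check an ID before polling. It assumes the random part is alphanumeric, since its alphabet and length are not documented; SEARCH_ID_RE and parse_search_id are hypothetical names.

```python
import re

# "#-xxxx": an increasing index, a dash, then a random string.
# Assumption: the random part is alphanumeric.
SEARCH_ID_RE = re.compile(r"(\d+)-([A-Za-z0-9]+)")


def parse_search_id(search_id: str) -> tuple:
    """Return (index, random_part) for a search ID, or raise ValueError."""
    m = SEARCH_ID_RE.fullmatch(search_id)
    if m is None:
        raise ValueError(f"not a search ID: {search_id!r}")
    return int(m.group(1)), m.group(2)
```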
Indexing is done by the make-indices.py Jug script. It expects FASTA and other files to be present in the gsmc-db subdirectory.