Add private_api/proteins/count and private_api/proteins/filter endpoints #65

Merged: 7 commits, Mar 18, 2025

@pverscha pverscha commented Mar 17, 2025

This PR adds two new private_api endpoints for proteins and changes the input and output of the existing private_api/proteins endpoint.

private_api/proteins

The existing implementation has been replaced. If you need all proteins that match a given peptide, use api/v2/pept2prot instead of private_api/proteins. The new implementation expects a JSON object with a single key, accessions: a list of UniProt accession IDs for which the name, db_type and taxon will be returned.

Example

Input

{
    "accessions": [
        "Q12031",
        "B2FLM6"
    ]
}

Output

[
    {
        "uniprot_accession_id": 357,
        "name": "Pimeloyl-[acyl-carrier protein] methyl ester esterase",
        "taxon_id": 522373,
        "db_type": "swissprot"
    },
    {
        "uniprot_accession_id": 49,
        "name": "Mitochondrial 2-methylisocitrate lyase ICL2",
        "taxon_id": 559292,
        "db_type": "swissprot"
    }
]
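As a minimal client-side sketch of the request and response shapes shown above: the names ProteinInfo, build_proteins_request and parse_proteins_response are hypothetical helpers introduced for illustration, not part of the Unipept code base. The HTTP call itself is left out because the PR does not state the host or HTTP method.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinInfo:
    # Hypothetical record type; field names mirror the keys in the example output.
    uniprot_accession_id: int  # an integer in the example output above
    name: str
    taxon_id: int
    db_type: str


def build_proteins_request(accessions):
    """Serialize the request body: a JSON object with the single key `accessions`."""
    return json.dumps({"accessions": list(accessions)})


def parse_proteins_response(body):
    """Turn the JSON array returned by the endpoint into ProteinInfo records."""
    return [ProteinInfo(**entry) for entry in json.loads(body)]
```

Here build_proteins_request(["Q12031", "B2FLM6"]) produces the input shown in the example, and parse_proteins_response can be fed the raw response text. An entry with a missing or unexpected key raises a TypeError, which makes schema drift visible early.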

private_api/proteins/count

A new endpoint that accepts one filter argument (string) and returns the number of UniProtKB entries in the Unipept database that match it. An entry is counted if at least one of the following holds:

  • Entry name contains the filter string (case-sensitive)
  • UniProt accession number contains the filter string
  • Database type contains the filter string
  • Taxon ID exactly matches filter string if it can be parsed as an integer

NOTE: The count is capped at 100 000. The endpoint returns the exact count when fewer than 100 000 entries match, and 100 000 otherwise.
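The matching rules and the cap can be sketched in plain Python. This is an illustration of the behaviour described above, not the actual server implementation: Entry, matches and capped_count are hypothetical names, the entries are held in memory rather than in the Unipept database, and the sketch assumes that the accession and database type comparisons are case-sensitive as well (the description only says so for the entry name).

```python
from dataclasses import dataclass

COUNT_CAP = 100_000  # cap stated in the note above


@dataclass(frozen=True)
class Entry:
    # Hypothetical in-memory record; the real data lives in the Unipept database.
    accession: str
    name: str
    db_type: str
    taxon_id: int


def matches(entry, filter_text):
    """Return True if at least one of the four listed rules holds."""
    if filter_text in entry.name:  # case-sensitive substring match
        return True
    if filter_text in entry.accession:  # assumed case-sensitive
        return True
    if filter_text in entry.db_type:  # assumed case-sensitive
        return True
    try:
        # Taxon rule: only applies when the filter parses as an integer.
        return int(filter_text) == entry.taxon_id
    except ValueError:
        return False


def capped_count(entries, filter_text, cap=COUNT_CAP):
    """Count matching entries, stopping as soon as the cap is reached."""
    count = 0
    for entry in entries:
        if matches(entry, filter_text):
            count += 1
            if count >= cap:
                return cap
    return count
```

Because of the cap, a returned value of 100 000 should be read as "at least 100 000" rather than as an exact total.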

Example

Input

{
    "filter": "methyl"
}

Output

{
    "count": 25526
}

private_api/proteins/filter

A new endpoint that accepts a filter argument (string) together with the parameters needed to paginate the results: start (required), end (required), sort_by (optional) and sort_descending (optional). It returns the UniProt accession IDs of all UniProtKB entries in the Unipept database that match the filter and fall within the given bounds. An entry matches if at least one of the following holds:

  • Entry name contains the filter string (case-sensitive)
  • UniProt accession number contains the filter string
  • Database type contains the filter string
  • Taxon ID exactly matches filter string if it can be parsed as an integer

Example

Input

{
    "filter": "methyl",
    "start": 0,
    "end": 10,
    "sort_by": "uniprot_accession_id",
    "sort_descending": true
}

Output

[
    "Q12031",
    "B2FLM6",
    "B5QYY9",
    "P54839",
    "B0R515",
    "Q39KT9",
    "B5BFM0",
    "C0PY28",
    "B6YS43",
    "C3LFJ0"
]
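A client that pages through the results could combine this endpoint with private_api/proteins/count roughly as follows. This is a sketch under stated assumptions: page_windows and build_filter_request are hypothetical helpers, and end is treated as exclusive because the example above returns ten IDs for start 0 and end 10. The PR does not spell out the bound semantics, so check them against the implementation.

```python
def page_windows(total, page_size):
    """Yield (start, end) pairs covering `total` results; `end` is assumed exclusive."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, total, page_size):
        yield start, min(start + page_size, total)


def build_filter_request(filter_text, start, end, sort_by=None, sort_descending=None):
    """Build the request body, leaving out optional keys that were not given."""
    body = {"filter": filter_text, "start": start, "end": end}
    if sort_by is not None:
        body["sort_by"] = sort_by
    if sort_descending is not None:
        body["sort_descending"] = sort_descending
    return body


# Example: request bodies for 25 matching entries in pages of 10.
requests_to_send = [
    build_filter_request("methyl", start, end, sort_by="uniprot_accession_id")
    for start, end in page_windows(25, 10)
]
```

Since the count endpoint caps its result at 100 000, windows derived from that count cannot cover more than the first 100 000 matches.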

@pverscha pverscha requested a review from tibvdm March 17, 2025 10:00
@pverscha pverscha added the enhancement New feature or request label Mar 17, 2025
@pverscha pverscha merged commit 559d019 into develop Mar 18, 2025
1 check passed
@pverscha pverscha deleted the feature/proteins-filter-endpoint branch March 18, 2025 09:57