Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add filtering of UniProtKB to API #56

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from
Open

Conversation

tibvdm
Copy link
Collaborator

@tibvdm tibvdm commented Oct 4, 2024

Add add-hoc filtering to the pept2filtered endpoint. This is a replacement for custom databases. Instead of building different indices for different databases, the API can now perform this filtering at runtime.

The endpoint allows filtering on:

  • taxa and its children
  • reference proteomes
  • proteins

@tibvdm tibvdm requested a review from pverscha October 4, 2024 09:30
Copy link
Member

@pverscha pverscha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. The major comment that I have is that we should somehow inform the user that something might have failed by returning a HTTP 422 status code, together with an error message in the body of the HTTP response.

Ideally, this response reports what exactly went wrong (e.g. "Error: Invalid proteome ID xxxx provided" or "Error: Invalid NCBI taxon ID xxxx provided").

I've also tested the changes:

  1. The newly added compact option for pept2taxa is working as accepted.
  2. Providing a filter for pept2filtered is not working at this time? Maybe I'm not providing the filter in the correct way. This is the request I'm making to /mpa/pept2filtered:
{
    "peptides": [
        "TTTVPWLELR",
        "AGPVLMEPLMK",
        "TGGLPYVGTGWYR",
        "TQDATHGNSLSHR",
        "VYFTADEAKAAHEAGER",
        "GAALSQNDNMLQMPLLTPVADETWGVK",
        "QGEGDKGTMKYEEFAK",
        "MLANAEEAGYKLLDRR"
    ],
    "filter": {
        "taxa": [2]
    }
}

And I get an error 422 with this error message as a result: Failed to deserialize the JSON body into the target type: no variant of enum Filter found in flattened data at line 15 column 1.

Comment on lines +70 to +80
let filter_proteins: Box<dyn UniprotFilter> = match filter {
Filter::Taxa(taxa) => {
Box::new(TaxaFilter::new(taxa, lineage_store))
},
Filter::Proteomes(proteomes) => {
Box::new(ProteomeFilter::new(proteomes).await.unwrap())
},
Filter::Proteins(proteins) => {
Box::new(ProteinFilter::new(proteins))
}
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should consider adding some kind of extra error handling. What happens if a user passes a reference proteome, UniProt accession or taxon ID that doesn't exist? Ideally, we should return a HTTP 422 status code in this case: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/422

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are also gonna have to update the documentation of the API on the website. I think this is a configuration option that other people might also want to use. (I mean the addition of the compact parameter).

Comment on lines +21 to +25
impl<'a> TaxaFilter<'a> {
pub fn new(taxa: HashSet<u32>, lineage_store: &'a LineageStore) -> Self {
TaxaFilter { taxa, lineage_store }
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When creating a new instance of the ProteomeFilter, an error can be thrown if something goes wrong while retrieving information about the proteome from the UniProt API. For the taxa, maybe we should consider doing the same. In this function, we should check if all given taxa IDs are known to Unipept. If not, we need to report to the user that they provided an unknown taxon ID (using an HTTP status code 422).

@pverscha pverscha changed the title Feature/filtering Add filtering of UniProtKB to API Oct 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants