-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add filtering of UniProtKB to API #56
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. The major comment that I have is that we should somehow inform the user that something might have failed by returning a HTTP 422 status code, together with an error message in the body of the HTTP response.
Ideally, this response reports what exactly went wrong (e.g. "Error: Invalid proteome ID xxxx provided" or "Error: Invalid NCBI taxon ID xxxx provided").
I've also tested the changes:
- The newly added
compact
option forpept2taxa
is working as accepted. - Providing a filter for
pept2filtered
is not working at this time? Maybe I'm not providing the filter in the correct way. This is the request I'm making to/mpa/pept2filtered
:
{
"peptides": [
"TTTVPWLELR",
"AGPVLMEPLMK",
"TGGLPYVGTGWYR",
"TQDATHGNSLSHR",
"VYFTADEAKAAHEAGER",
"GAALSQNDNMLQMPLLTPVADETWGVK",
"QGEGDKGTMKYEEFAK",
"MLANAEEAGYKLLDRR"
],
"filter": {
"taxa": [2]
}
}
And I get an error 422 with this error message as a result: Failed to deserialize the JSON body into the target type: no variant of enum Filter found in flattened data at line 15 column 1
.
let filter_proteins: Box<dyn UniprotFilter> = match filter { | ||
Filter::Taxa(taxa) => { | ||
Box::new(TaxaFilter::new(taxa, lineage_store)) | ||
}, | ||
Filter::Proteomes(proteomes) => { | ||
Box::new(ProteomeFilter::new(proteomes).await.unwrap()) | ||
}, | ||
Filter::Proteins(proteins) => { | ||
Box::new(ProteinFilter::new(proteins)) | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should consider adding some kind of extra error handling. What happens if a user passes a reference proteome, UniProt accession or taxon ID that doesn't exist? Ideally, we should return a HTTP 422 status code in this case: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/422
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are also gonna have to update the documentation of the API on the website. I think this is a configuration option that other people might also want to use. (I mean the addition of the compact
parameter).
impl<'a> TaxaFilter<'a> { | ||
pub fn new(taxa: HashSet<u32>, lineage_store: &'a LineageStore) -> Self { | ||
TaxaFilter { taxa, lineage_store } | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When creating a new instance of the ProteomeFilter
, an error can be thrown if something goes wrong while retrieving information about the proteome from the UniProt API. For the taxa, maybe we should consider doing the same. In this function, we should check if all given taxa IDs are known to Unipept. If not, we need to report to the user that they provided an unknown taxon ID (using an HTTP status code 422).
Add add-hoc filtering to the pept2filtered endpoint. This is a replacement for custom databases. Instead of building different indices for different databases, the API can now perform this filtering at runtime.
The endpoint allows filtering on: