-
-
Notifications
You must be signed in to change notification settings - Fork 406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Taxonomy suggestions API v3 for packaging shapes and materials #8008
Conversation
Note: I added some tests for getting suggestions that match synonyms and xx: entries. Those don't return good results (or no results at all) right now, it will be improved in a future PR. |
One question: why do we continue to prefer to store stats in a "sto" rather than a json file ? (which could be also reused and served as a static). I don't really see the advantage of sto vs json. Particularly for something we load once in memory. As a json it's easier to expose to other programs. |
This reverts commit e2c3d46.
Agreed, we can switch to JSON for this kind of files that we load once. We probably could do it for taxonomies too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What a big work !
I would really like the suggestion functions not to use any HTTP reference.
Also when I see this, (with all current limitations) I can't stand thinking would'nt it be better to tackle this problem using a specific mongodb collection ? (or elasticsearch ?).
docs/reference/api-v3.yml
Outdated
If a string is passed, an additional sort is done to put first suggestions that start with the string, followed by suggestions with a word that start with the string, and then suggestions that contain the string anywhere. | ||
parameters: | ||
- schema: | ||
type: string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it better to use an Enum ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll create a "tagtype" schema.
{($translations_to{$tagtype}{$a}{$search_lc} || $translations_to{$tagtype}{$a}{"xx"} || $a) | ||
cmp($translations_to{$tagtype}{$b}{$search_lc} || $translations_to{$tagtype}{$b}{"xx"} || $b)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make it a real function because it's hard to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm adding a cmp_taxonomy_tags_alphabetically($tagtype, $target_lc, $a, $b) function in Tags.pm
=head2 get_taxonomy_suggestions_matching_string ($request_ref, $tagtype, $string) | ||
|
||
Generate taxonomy suggestions matching a string. | ||
|
||
The generation uses a brute force approach to match the input string to taxonomies. | ||
|
||
By priority, the function returns: | ||
- taxonomy entries that match the input string at the beginning | ||
- taxonomy entries that contain the input string | ||
- taxonomy entries that contain words contained in the input string | ||
|
||
=head3 Parameters | ||
|
||
=head4 $request_ref (input) | ||
|
||
Reference to the request object. | ||
|
||
=head4 tagtype - the type of tag | ||
|
||
=head4 tags_ref - reference of an array of tags to match | ||
|
||
[ | ||
countries_tags => ["en:france", "en:belgium"], | ||
categories_tags => .. | ||
|
||
] | ||
|
||
=head4 string - string to search | ||
|
||
=cut | ||
|
||
sub get_popular_taxonomy_suggestions_matching_tags_and_string ($request_ref, $tagtype, $tags_ref, $string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's to be removed right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right.
{ | ||
test_case => 'categories-term-strawberry', | ||
method => 'GET', | ||
path => '/api/v3/taxonomy_suggestions?tagtype=categories&term=strawberry', | ||
expected_status_code => 200, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So do we still support term ? Then it must be documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an alias, I'm adding it to the doc.
@@ -225,6 +225,83 @@ paths: | |||
- an object sent in the packagings field will replace any pre-existing data. | |||
- an object sent in the field suffixed with _add (e.g. packagings_add) will be merged with any pre-existing data. | |||
parameters: [] | |||
/api/v3/taxonomy_suggestions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be interesting to add a param where if the parameter is present, in the response we not only send back the tag value but also it's id. (eg. with_id=1).
This might be usefull for hunger-games or other tools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it makes things a bit more complex to describe in openapi as in that case we need to return an array of objects instead of an array of strings, but I think it's possible.
Co-authored-by: Alex Garel <alex@garel.org>
Co-authored-by: Alex Garel <alex@garel.org>
Co-authored-by: Alex Garel <alex@garel.org>
@alexgarel Thanks for the PR review, I think I addressed almost all points. There are a few things I will do in a next iteration (e.g. the option to return the tag id in addition to the name). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, thank you so much !
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information |
Issue: #8002
To test the API, it's currently deployed on the .dev server:
https://world.openfoodfacts.dev/api/v3/taxonomy_suggestions?tagtype=packaging_materials&shape=box&category=snacks
Suggestions in web edit form:
Product with no category:
When there is a filter (string typed by the user), the results are ordered with entries that start with the user string, then entries that have a word starting by the user string, and then entries that contain the user string inside a word:
Selected a bottle, suggesting materials used for bottles:
Product in the yogurts category:
Yogurts + pot: