feat: Taxonomy suggestions API v3 for packaging shapes and materials #8008

Merged Jan 25, 2023 (65 commits)

@stephanegigandet stephanegigandet commented Jan 18, 2023

Issue: #8002

  • refactor existing /cgi/suggest.pl feature to also make it available as a v3 API through the /api/v3/taxonomy_suggestions path
  • compute a smaller packaging stats structure for popular country/category, country/category/shape, country/category/shape/material
  • added a data-default directory with the resulting packaging stats file, so that we can run tests even if we have no products
  • extend the v3 API to order packaging shapes and materials suggestions by popularity for similar products (in progress)
  • add tests for the new v3 API
  • add OpenAPI documentation for the new v3 API
  • use the new API for existing tags suggestions (e.g. categories)
  • use the new API for packaging shapes and materials, using the country / categories / packaging shape (for materials) as input to get better suggestions
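
One way to picture the smaller packaging stats structure from the list above is as nested counters, one per level of specificity. This is only a sketch, written in Python for brevity rather than the project's Perl; the key layout, the function names and the sample products are assumptions for illustration, not the format of the actual stats file.

```python
from collections import Counter, defaultdict


def build_packaging_stats(products):
    """Count shapes per (country, category) and materials per (country, category, shape)."""
    shapes = defaultdict(Counter)
    materials = defaultdict(Counter)
    for product in products:
        for country in product["countries"]:
            for category in product["categories"]:
                for shape, material in product["packagings"]:
                    shapes[(country, category)][shape] += 1
                    materials[(country, category, shape)][material] += 1
    return shapes, materials


def most_popular(counter, limit=3):
    """Most frequent tags first; ties are broken alphabetically for stable output."""
    ordered = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ordered][:limit]


# Hypothetical sample data, not taken from the real database
SAMPLE_PRODUCTS = [
    {"countries": ["en:france"], "categories": ["en:yogurts"],
     "packagings": [("en:pot", "en:plastic"), ("en:lid", "en:aluminium")]},
    {"countries": ["en:france"], "categories": ["en:yogurts"],
     "packagings": [("en:pot", "en:glass"), ("en:lid", "en:aluminium")]},
    {"countries": ["en:france"], "categories": ["en:yogurts"],
     "packagings": [("en:pot", "en:plastic")]},
]

shapes, materials = build_packaging_stats(SAMPLE_PRODUCTS)
```

Looking up `materials[("en:france", "en:yogurts", "en:pot")]` then gives the material counts used to order suggestions once a shape has been selected.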

To test the API, use the .dev server, where it is currently deployed:
https://world.openfoodfacts.dev/api/v3/taxonomy_suggestions?tagtype=packaging_materials&shape=box&category=snacks
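
For scripted testing, the same query can be assembled programmatically. The helper below is a hypothetical convenience, not part of the PR; it only uses the path and the parameters that appear in the URL above, and it builds the URL without sending a request, since the response format is not shown in this thread.

```python
from urllib.parse import urlencode

BASE_URL = "https://world.openfoodfacts.dev/api/v3/taxonomy_suggestions"


def build_suggestions_url(tagtype, **filters):
    """Return the GET URL for a taxonomy suggestions request."""
    params = {"tagtype": tagtype}
    # Skip empty filters so the query string only carries what was provided
    params.update({name: value for name, value in filters.items() if value})
    return f"{BASE_URL}?{urlencode(params)}"


url = build_suggestions_url("packaging_materials", shape="box", category="snacks")
```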

Suggestions in web edit form:

Product with no category:

image

When there is a filter (a string typed by the user), the results are ordered as follows: first entries that start with the user string, then entries that have a word starting with the user string, and then entries that contain the user string inside a word:

image
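
The three-tier ordering described above can be sketched as follows. This is a Python illustration, not the Perl code from the PR; the function names are hypothetical, and keeping the incoming order within each tier is an assumption (the description does not say how ties are handled).

```python
import re


def match_rank(suggestion, user_string):
    """Return 0, 1 or 2 by match quality, or None if there is no match."""
    text, needle = suggestion.lower(), user_string.lower()
    if text.startswith(needle):
        return 0  # the whole entry starts with the user string
    if re.search(r"\b" + re.escape(needle), text):
        return 1  # a later word starts with the user string
    if needle in text:
        return 2  # the user string appears inside a word
    return None


def filter_and_order(suggestions, user_string):
    """Drop non-matching entries and order the rest by tier, then original position."""
    ranked = [(match_rank(s, user_string), i, s) for i, s in enumerate(suggestions)]
    return [s for rank, _, s in sorted(r for r in ranked if r[0] is not None)]


ordered = filter_and_order(
    ["Glass bottle", "Bottle", "Plastic bottle cap", "Rebottled jar", "Can"], "bot"
)
```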

Selected a bottle, suggesting materials used for bottles:

image

Product in the yogurts category:

image

Yogurts + pot:

image

@stephanegigandet stephanegigandet requested a review from a team as a code owner January 18, 2023 16:48
@github-actions github-actions bot added the API, ✏️ Editing - Auto Suggest, Display, GitHub Actions, multilingual products, 📦 Packaging and 🧪 tests labels Jan 18, 2023
@stephanegigandet

Note: I added some tests for getting suggestions that match synonyms and xx: entries. Those don't return good results (or any results at all) right now; this will be improved in a future PR.

alexgarel commented Jan 24, 2023

One question: why do we still prefer to store stats in a "sto" file rather than a JSON file (which could also be reused and served as a static file)? I don't really see the advantage of sto over JSON, particularly for something we load once into memory.

As JSON, it's easier to expose to other programs.

@stephanegigandet

> As JSON, it's easier to expose to other programs.

Agreed, we can switch to JSON for this kind of file that we load once. We could probably do the same for taxonomies.
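
As an illustration of the "load once" pattern with a JSON file, here is a minimal Python sketch. It is not the project's Perl loader; the file name, the nesting of the sample data and the use of a temporary directory are assumptions made so the example is self-contained.

```python
import json
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_stats(path):
    """Parse a JSON stats file; repeated calls with the same path reuse the parsed result."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# A temporary file stands in for the real stats file (hypothetical content)
demo_path = Path(tempfile.mkdtemp()) / "packaging_stats.json"
demo_path.write_text(
    json.dumps({"en:france": {"en:yogurts": {"en:pot": 3}}}), encoding="utf-8"
)

stats = load_stats(str(demo_path))
```

Because the file is plain JSON, any other program can read it with its own JSON parser, which is the interoperability point raised above.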

@alexgarel alexgarel left a comment

What a big piece of work!

I would really like the suggestion functions not to use any HTTP reference.

Also, seeing this (with all its current limitations), I can't help thinking: wouldn't it be better to tackle this problem with a dedicated MongoDB collection (or Elasticsearch)?

docs/reference/api-v3.yml (outdated)
If a string is passed, an additional sort is done to put first suggestions that start with the string, followed by suggestions with a word that starts with the string, and then suggestions that contain the string anywhere.
parameters:
  - schema:
      type: string
Member

Wouldn't it be better to use an enum?

Contributor Author

Sure, I'll create a "tagtype" schema.
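
To illustrate what an enum buys over a free-form string, here is a small Python sketch of validating the tagtype parameter against a closed list. It is not the OpenAPI schema itself. Only `categories` and `packaging_materials` appear as tagtype values in this thread; `packaging_shapes` is an assumed value, and the real list is certainly longer.

```python
from enum import Enum


class TagType(str, Enum):
    # Illustrative subset; "packaging_shapes" is an assumed value
    CATEGORIES = "categories"
    PACKAGING_SHAPES = "packaging_shapes"
    PACKAGING_MATERIALS = "packaging_materials"


def parse_tagtype(value):
    """Return the matching TagType, or raise ValueError listing the allowed values."""
    try:
        return TagType(value)
    except ValueError:
        allowed = ", ".join(t.value for t in TagType)
        raise ValueError(f"unknown tagtype {value!r}; expected one of: {allowed}") from None
```

With an enum in the schema, a client gets the list of valid values from the documentation instead of discovering it by trial and error.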

lib/ProductOpener/API.pm (outdated)
Comment on lines 250 to 251
{($translations_to{$tagtype}{$a}{$search_lc} || $translations_to{$tagtype}{$a}{"xx"} || $a)
cmp($translations_to{$tagtype}{$b}{$search_lc} || $translations_to{$tagtype}{$b}{"xx"} || $b)}
Member

I would make it a real function because it's hard to read.

Contributor Author

I'm adding a cmp_taxonomy_tags_alphabetically($tagtype, $target_lc, $a, $b) function in Tags.pm
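
The idea of pulling the inline sort block out into a named comparison function can be sketched like this. It is a Python analogue, not the Perl function added to Tags.pm: the translation table and the helper names are hypothetical, and the `$tagtype` argument is left out to keep the example short. The fallback chain (target language, then "xx", then the tag id) mirrors the reviewed sort block.

```python
from functools import cmp_to_key

# Hypothetical translations: tag id -> language code -> display name
TRANSLATIONS = {
    "en:glass": {"en": "Glass", "fr": "Verre"},
    "en:plastic": {"en": "Plastic", "fr": "Plastique"},
    "en:pet": {"xx": "PET"},  # language-independent name only
    "en:unknown-tag": {},     # no translation at all
}


def display_name(tag, target_lc):
    """Name in the target language, else the "xx" name, else the tag id itself."""
    names = TRANSLATIONS.get(tag, {})
    return names.get(target_lc) or names.get("xx") or tag


def cmp_tags_alphabetically(target_lc, a, b):
    """Three-way comparison (negative, zero, positive) on the display names."""
    name_a, name_b = display_name(a, target_lc), display_name(b, target_lc)
    return (name_a > name_b) - (name_a < name_b)


def sort_tags(tags, target_lc):
    compare = lambda a, b: cmp_tags_alphabetically(target_lc, a, b)
    return sorted(tags, key=cmp_to_key(compare))
```

Naming the comparison makes the fallback order explicit and lets it be tested on its own, which is the readability gain the review asks for.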

lib/ProductOpener/APITaxonomySuggestions.pm (outdated)
Comment on lines 410 to 441
=head2 get_taxonomy_suggestions_matching_string ($request_ref, $tagtype, $string)

Generate taxonomy suggestions matching a string.

The generation uses a brute force approach to match the input string to taxonomies.

By priority, the function returns:
- taxonomy entries that match the input string at the beginning
- taxonomy entries that contain the input string
- taxonomy entries that contain words contained in the input string

=head3 Parameters

=head4 $request_ref (input)

Reference to the request object.

=head4 tagtype - the type of tag

=head4 tags_ref - reference of an array of tags to match

  [
    countries_tags => ["en:france", "en:belgium"],
    categories_tags => ..
  ]

=head4 string - string to search

=cut

sub get_popular_taxonomy_suggestions_matching_tags_and_string ($request_ref, $tagtype, $tags_ref, $string) {
Member

This is to be removed, right?

Contributor Author

Right.

Comment on lines +44 to +49
{
    test_case => 'categories-term-strawberry',
    method => 'GET',
    path => '/api/v3/taxonomy_suggestions?tagtype=categories&term=strawberry',
    expected_status_code => 200,
},
Member

So do we still support term? Then it must be documented.

Contributor Author

It's an alias, I'm adding it to the doc.
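
Handling such an alias usually comes down to reading two parameter names in a fixed order. A minimal Python sketch, assuming the canonical parameter is called `string` (the thread only states that `term` is an alias, so that name and the helper are assumptions):

```python
def get_search_string(query_params):
    """Read the search string, accepting "term" as an alias of "string"."""
    # "string" as the canonical name is an assumption, not confirmed in this thread
    for name in ("string", "term"):
        value = query_params.get(name)
        if value:
            return value
    return None


search = get_search_string({"tagtype": "categories", "term": "strawberry"})
```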

@@ -225,6 +225,83 @@ paths:
- an object sent in the packagings field will replace any pre-existing data.
- an object sent in the field suffixed with _add (e.g. packagings_add) will be merged with any pre-existing data.
parameters: []
/api/v3/taxonomy_suggestions:
Member

It could be interesting to add a parameter (e.g. with_id=1) that, when present, makes the response include not only the tag value but also its id.

This might be useful for hunger-games or other tools.

Contributor Author

Yes. It makes things a bit more complex to describe in OpenAPI, since in that case we need to return an array of objects instead of an array of strings, but I think it's possible.
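
The two response shapes under discussion could look like this. It is a Python sketch of a possible future option, not implemented behaviour: the `with_id` flag comes from the review comment, while the `id` and `name` keys and the function name are hypothetical.

```python
def format_suggestions(matches, with_id=False):
    """Format ordered (tag_id, display_name) pairs as strings or as objects."""
    if with_id:
        # Array of objects; the "id" and "name" keys are hypothetical
        return [{"id": tag_id, "name": name} for tag_id, name in matches]
    # Default shape: a plain array of strings
    return [name for _, name in matches]


matches = [("en:glass", "Glass"), ("en:plastic", "Plastic")]
```

In OpenAPI terms, the complication is that the items schema then has to allow either a string or an object, depending on the request.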

@github-actions github-actions bot added the store label Jan 24, 2023
@github-actions github-actions bot added the 📍🏭 Packager codes and Tags labels Jan 24, 2023
@stephanegigandet

@alexgarel Thanks for the PR review. I think I've addressed almost all points. There are a few things I will do in a later iteration (e.g. the option to return the tag id in addition to the name).

@alexgarel alexgarel left a comment

Perfect, thank you so much!

@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No Coverage information
0.0% Duplication

@stephanegigandet stephanegigandet merged commit 43c74d6 into main Jan 25, 2023
@stephanegigandet stephanegigandet deleted the suggest-api branch January 25, 2023 12:57
Labels
🪶 Apache, API v3, API, Display, 📚 Documentation, ✏️ Editing - Auto Suggest, 📍🏭 Packager codes, 📦 Packaging, Tags, Template::Toolkit, 🧪 tests
3 participants