feat: Initial support for specific ingredients parsing #6243

Merged
stephanegigandet merged 9 commits into main from specific-ingredients
Jan 6, 2022

Conversation

stephanegigandet
Contributor

@stephanegigandet commented Jan 4, 2022

This is intended in particular to resolve Nutri-Score computation errors that occur when we fail to parse the percentage of fruits/vegetables or the milk content indicated at the end of the ingredients list.

The resulting specific_ingredients structure is now used when computing the Nutri-Score.

It could also be useful later for parsing other types of mentions, like "Origin of something: some country", at the end of the ingredients list.
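To illustrate the kind of trailing mention described above, here is a minimal sketch in Perl. It is not the code from this PR: the function name `extract_trailing_percent`, the English-only regular expression, and the shape of the returned hash are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not the actual ProductOpener implementation):
# detect a final sentence such as "Total milk content: 73%." and split it
# off from the rest of the ingredients text.
sub extract_trailing_percent {
    my ($text) = @_;

    if ($text =~ m{^(.*?[.;])?\s*((?:total\s+)?([a-z][a-z /-]*?)\s+content\s*:?\s*(\d+(?:[.,]\d+)?)\s*%)\s*\.?\s*$}is) {
        my ($rest, $matched_text, $ingredient, $percent) = ($1 // '', $2, $3, $4);

        # Accept a decimal comma ("33,6") as well as a decimal point ("33.6")
        $percent =~ tr/,/./;

        return ($rest, {
            ingredient => lc($ingredient),
            percent    => $percent + 0,
            text       => $matched_text,
        });
    }

    # No trailing mention found: return the text unchanged
    return ($text, undef);
}

my ($rest, $specific) = extract_trailing_percent(
    "Milk, sugar, cocoa butter. Total milk content: 73%.");
print "$specific->{ingredient}: $specific->{percent}%\n" if defined $specific;
```

Returning the remaining text separately keeps the regular ingredients list free of the trailing sentence, while the captured percentage stays available for a later computation.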

Part of

@stephanegigandet added the 🥗 Ingredients, 🚦Nutri-Score (https://world.openfoodfacts.org/nutriscore), and 🥗🔍 Ingredients analysis (https://wiki.openfoodfacts.org/Ingredients_Extraction_and_Analysis) labels Jan 4, 2022
@stephanegigandet requested a review from a team as a code owner January 4, 2022 14:14
@stephanegigandet changed the title from "feat: Initial support for specific ingredients parsing (wip)" to "feat: Initial support for specific ingredients parsing" Jan 4, 2022
Member

@alexgarel left a comment

LGTM, but I still need to test.

lib/ProductOpener/Ingredients.pm
# We might have an ingredient specified multiple times (e.g. once for percent, another for origins or labels)
defined $product_ref->{specific_ingredients}{$ingredient_id} or $product_ref->{specific_ingredients}{$ingredient_id} = {};
$product_ref->{specific_ingredients}{$ingredient_id}{ingredient} = $ingredient;
$product_ref->{specific_ingredients}{$ingredient_id}{text} = $matched_text;
Member

Is it OK to smash previous values?

Contributor Author

I guess we could concatenate them. I didn't do it because the case is largely hypothetical and the smashed values ("ingredient" and "text") are kept mostly for debugging purposes. But concatenating is cleaner, so I'll change it.
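One way to concatenate instead of overwriting could look like the sketch below. The helper name `add_specific_ingredient` and the ", " separator are assumptions for illustration; this is not necessarily what the PR ended up doing.

```perl
use strict;
use warnings;

# Hypothetical helper: record a mention for an ingredient without
# overwriting what an earlier mention of the same ingredient stored.
sub add_specific_ingredient {
    my ($product_ref, $ingredient_id, $ingredient, $matched_text) = @_;

    # Create the entry on first use, reuse it afterwards
    my $specific_ref = ($product_ref->{specific_ingredients}{$ingredient_id} //= {});

    foreach my $field (["ingredient", $ingredient], ["text", $matched_text]) {
        my ($key, $value) = @$field;
        my $existing = $specific_ref->{$key};

        if (not defined $existing or $existing eq "") {
            # First value for this field
            $specific_ref->{$key} = $value;
        }
        elsif ($existing ne $value) {
            # Append unless the new value is identical to what is stored
            $specific_ref->{$key} = $existing . ", " . $value;
        }
    }

    return $specific_ref;
}

my $product_ref = {};
add_specific_ingredient($product_ref, "en:milk", "milk", "Total milk content: 73%");
add_specific_ingredient($product_ref, "en:milk", "milk", "Origin of the milk: UK");
print $product_ref->{specific_ingredients}{"en:milk"}{text}, "\n";
```

Since the "ingredient" and "text" fields are described as mostly useful for debugging, a plain string join is probably enough here; an array of mentions would be an alternative if the values ever need to be processed separately.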

lib/ProductOpener/Ingredients.pm (outdated)
@@ -168,10 +168,10 @@ my @lists =(

["fr","p\x{e2}te de cacao* de Madagascar 75%, sucre de canne*, beurre de cacao*. * issus du commerce \x{e9}quitable et de l'agriculture biologique (100% du poids total).","pâte de cacao Commerce équitable Bio de Madagascar 75%, sucre de canne Commerce équitable Bio, beurre de cacao Commerce équitable Bio."],

-["fr","Céleri - rave 21% - Eau, légumes 33,6% (carottes, céleri - rave, poivrons rouges 5,8% - haricots - petits pois bio - haricots verts - courge - radis, pommes de terre - patates - fenouil - cerfeuil tubéreux - persil plat)","Céleri-rave 21% - Eau, légumes 33,6% (carottes, céleri-rave, poivrons rouges 5,8% - haricots - petits pois bio - haricots verts - courge - radis, pommes de terre - patates - fenouil - cerfeuil tubéreux - persil plat)"],
+["fr","Céleri - rave 21% - Eau, légumes 33,6% (carottes, céleri - rave, poivrons rouges 5,8% - haricots - petits pois bio - haricots verts - courge - radis, pommes de terre - patates - fenouil - cerfeuil tubéreux - persil plat)","Céleri-rave 21% - Eau, légumes 33.6% (carottes, céleri-rave, poivrons rouges 5.8% - haricots - petits pois bio - haricots verts - courge - radis, pommes de terre - patates - fenouil - cerfeuil tubéreux - persil plat)"],
Member

Wow, that would be clearer with a line break between the entry and the expected result!
At first sight, I did not understand that you were changing the expected result.

Contributor Author

Agreed, I'll change it.

t/nutriscore.t
@alexgarel
Member

The sentences parsed by the new function do not appear in "Details of the analysis of the ingredients", so it's not easy to test manually. I also think that's a pity, since they contribute to the Nutri-Score.

@stephanegigandet
Contributor Author

Applied changes from code review, thank you!

Also added initial support to parse things like "Origin of the milk: UK".
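As a rough illustration of what parsing such an origin mention involves, here is a minimal sketch. The function name `extract_origin_mention` and the English-only pattern are assumptions for the example, not the code added in this PR.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the ingredient and its origin out of a
# trailing mention such as "Origin of the milk: UK".
sub extract_origin_mention {
    my ($text) = @_;

    if ($text =~ /\borigin\s+of\s+(?:the\s+)?([a-z][a-z -]*?)\s*:\s*([^.;]+?)\s*\.?\s*$/i) {
        return { ingredient => lc($1), origin => $2 };
    }

    # No origin mention at the end of the text
    return;
}

my $mention = extract_origin_mention("Milk, sugar. Origin of the milk: UK.");
print "$mention->{ingredient} from $mention->{origin}\n" if defined $mention;
```

The origin is kept as free text in this sketch; mapping it to a country identifier would be a separate step.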

@stephanegigandet
Contributor Author

> The sentences parsed by the new function do not appear in "Details of the analysis of the ingredients", so it's not easy to test manually. I also think that's a pity, since they contribute to the Nutri-Score.

I added a subsection to the analysis that lists the specific ingredients.

@sonarqubecloud

sonarqubecloud bot commented Jan 5, 2022

Kudos, SonarCloud Quality Gate passed!

0 Bugs (A)
0 Vulnerabilities (A)
0 Security Hotspots (A)
0 Code Smells (A)

No coverage information
0.0% duplication

Member

@alexgarel left a comment

Cool !

@stephanegigandet merged commit f69e9a9 into main Jan 6, 2022
@stephanegigandet deleted the specific-ingredients branch January 6, 2022 15:12
@teolemon added the 🚦 Nutri-Score label and removed the 🚦Nutri-Score (https://world.openfoodfacts.org/nutriscore) label May 11, 2024