feat: Support for unit name normalization #6878
$u = $6;
# Regex captures any <number>( )?<unit-identifier> group, but leaves allowances for a preceding
# token to allow for patterns like "One bag (32g)", "1 small bottle (180ml)" etc
if ($serving =~ /^(.*[ \(])?(?<quantity>(\d+)(\.|,)?(\d+)?)( )?(?<unit>\w+)\b/i) {
Hi @yuktea, we need to be careful: if we don't recognize the unit, we don't want to return a serving size (in fact, today we return the value 0).
But the proposed change would mean that something like "43 somethingwedonotunderstand" would result in a serving size of 43 (we would assume the unit is g).
One way to catch this would be to add tests like these:
is( normalize_serving_size("43 unknownthingorunit"), 0 );
is( normalize_serving_size("43 unknownthingorunit (200g)"), 200 );
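To make the concern concrete, here is a minimal, self-contained sketch of the behaviour those two tests ask for. It reuses the regex from the diff, but the function name `sketch_normalize_serving_size` and the `%unit_factor` table are assumptions made for illustration, not the actual code in `lib/ProductOpener/Food.pm`.

```perl
use strict;
use warnings;

# Hypothetical conversion table (grams or millilitres per unit).
# The real map in the PR may use different keys and factors.
my %unit_factor = (
    g  => 1,
    kg => 1000,
    mg => 0.001,
    ml => 1,
    cl => 10,
    l  => 1000,
);

sub sketch_normalize_serving_size {
    my ($serving) = @_;
    return 0 unless defined $serving;

    # Same pattern as in the diff: an optional leading token, then <number>( )?<unit>
    if ($serving =~ /^(.*[ \(])?(?<quantity>(\d+)(\.|,)?(\d+)?)( )?(?<unit>\w+)\b/i) {
        # Copy both captures before running another regex operation
        my $q = $+{quantity};
        my $u = lc $+{unit};
        $q =~ s/,/./;    # accept a decimal comma

        # Unknown unit: do not assume grams, return 0 instead
        return 0 unless exists $unit_factor{$u};
        return $q * $unit_factor{$u} + 0;
    }
    return 0;
}

print sketch_normalize_serving_size("43 unknownthingorunit"), "\n";
print sketch_normalize_serving_size("43 unknownthingorunit (200g)"), "\n";
```

Because the leading `(.*[ \(])?` group is greedy, the last `<number><unit>` pair in the string is the one that gets captured, which is why the second input resolves to the value in parentheses.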
I adjusted the logic in unit_to_g so that we always check for unit validity at that stage, instead of assuming we've already filtered out invalid units. I'll add these tests as you recommend.
Co-authored-by: Stéphane Gigandet <stephane@openfoodfacts.org>
# token to allow for patterns like "One bag (32g)", "1 small bottle (180ml)" etc
if ($serving =~ /^(.*[ \(])?(?<quantity>(\d+)(\.|,)?(\d+)?)( )?(?<unit>\w+)\b/i) {
my $q = $+{quantity};
my $u = normalize_unit($+{unit});
Do we still need a normalize_unit() function if the translations are in %unit_conversion_map ?
lib/ProductOpener/Food.pm
Outdated
# (needed when outputting json and to store in mongodb as a number)
# We return with + 0 to make sure the value is treated as number (needed when outputting json and to store in mongodb as a number)
# lets not assume that we have a valid unit
return 0;
This is dangerous, as it changes the behaviour of the function. We can change the behaviour, but that means we need to be absolutely certain that we are covering all cases.
If we do want to change it, then I think the function should return "undef" in this case, not 0.
We also need to test all units that can be passed, in particular in the product edit form on the website.
I tested some, and when the unit is "% vol" or "%", the value is converted to 0.
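As an illustration of why undef may be safer than 0 here, the sketch below shows a converter that returns undef for units it does not know, so the caller can tell "no conversion available" apart from a genuine zero. The name `sketch_unit_to_g`, the `%to_g` table and the fallback in the caller are all assumptions for the example, not the project's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical weight units only; "%" and "% vol" are deliberately absent
my %to_g = ( g => 1, kg => 1000, mg => 0.001 );

sub sketch_unit_to_g {
    my ($value, $unit) = @_;

    # Bare return yields undef in scalar context
    return unless defined $value && defined $unit;

    my $key = lc $unit;
    $key =~ s/^\s+|\s+$//g;    # trim surrounding whitespace

    # Unknown unit: signal "cannot convert" instead of returning 0
    return unless exists $to_g{$key};

    return $value * $to_g{$key} + 0;
}

# Caller side: keep the entered value when no conversion applies
my $entered   = 12;
my $converted = sketch_unit_to_g($entered, '% vol');
my $stored    = defined $converted ? $converted : $entered;
print "$stored\n";
```

With a 0 return value, the caller above could not distinguish "12 % vol" from a real quantity of zero and would overwrite the entered value.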
@stephanegigandet The updated PR has tests for these units
Kudos, SonarCloud Quality Gate passed!
Looks good, thank you
What
Enables recognition of unit names by adding unit name normalization to our serving size normalization logic. This will let OFF compute Nutri-Scores when serving sizes use unit names, etc.
Related issue(s) and discussion