Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support for unit name normalization #6878

Merged
merged 22 commits into from
Jun 21, 2022

Conversation

yuktea
Copy link
Contributor

@yuktea yuktea commented Jun 9, 2022

What

Enables recognition of unit names by incorporating unit normalization in our normalization logic. Will enable OFF to compute nutriscores
when serving sizes have unit names etc.

Screenshot

Related issue(s) and discussion

@yuktea yuktea requested a review from a team as a code owner June 9, 2022 06:10
@yuktea yuktea changed the title feat: Initial support for unit name normalization feat: Support for unit name normalization Jun 9, 2022
$u = $6;
# Regex captures any <number>( )?<unit-identifier> group, but leaves allowances for a preceding
# token to allow for patterns like "One bag (32g)", "1 small bottle (180ml)" etc
if ($serving =~ /^(.*[ \(])?(?<quantity>(\d+)(\.|,)?(\d+)?)( )?(?<unit>\w+)\b/i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @yuktea , we need to be careful: if we don't recognize the unit, we don't want to return a serving size (in fact today we return the value 0).
But the proposed change would mean that something like "43 somethingwedonotunderstand" would result in a serving size of 43 (we would assume the unit is g).

One way to catch this would be to add tests like those:

is( normalize_serving_size("43 unknownthingorunit"), 0 );
is( normalize_serving_size("43 unknownthingorunit (200g)"), 200 );

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adjusted the logic in the unit_to_g so we always check for unit validity at that stage instead of assuming we've already filtered out invalid units. I'll add these tests as you recommend.

@github-actions github-actions bot added the 💥 Merge Conflicts 💥 Merge Conflicts label Jun 10, 2022
@github-actions github-actions bot removed the 💥 Merge Conflicts 💥 Merge Conflicts label Jun 13, 2022
@yuktea yuktea marked this pull request as draft June 14, 2022 20:31
@yuktea yuktea marked this pull request as ready for review June 14, 2022 22:27
@yuktea yuktea marked this pull request as draft June 15, 2022 09:44
@yuktea yuktea marked this pull request as ready for review June 15, 2022 11:23
@yuktea yuktea requested a review from stephanegigandet June 15, 2022 12:59
# token to allow for patterns like "One bag (32g)", "1 small bottle (180ml)" etc
if ($serving =~ /^(.*[ \(])?(?<quantity>(\d+)(\.|,)?(\d+)?)( )?(?<unit>\w+)\b/i) {
my $q = $+{quantity};
my $u = normalize_unit($+{unit});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need a normalize_unit() function if the translations are in %unit_conversion_map ?

# (needed when outputting json and to store in mongodb as a number)
# We return with + 0 to make sure the value is treated as number (needed when outputting json and to store in mongodb as a number)
# lets not assume that we have a valid unit
return 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dangerous, as it changes the behaviour of the function. We can change the behaviour, but then that means we need to be absolutely certain that we are covering all cases.

If we do want to change it, then I think the function should return "undef" it this case, but not 0.

We also need to test all units that can be passed, in particular in the product edit form on the website.

I tested some and when the unit is "% vol" or "%", the value is converted to 0:

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephanegigandet The updated PR has tests for these units

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

Copy link
Contributor

@stephanegigandet stephanegigandet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you

@stephanegigandet stephanegigandet merged commit b1bc521 into main Jun 21, 2022
@stephanegigandet stephanegigandet deleted the feat/unit-name-normalization-support branch June 21, 2022 15:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: Done
3 participants