Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: solve quantity false positives issue #2037 #2038

Merged
merged 7 commits into from
May 2, 2023

Conversation

CharlesNepote
Copy link
Member

Description:

Change regexp to avoid many false positives.

Related issues and discussion: #2037

@VaiTon
Copy link
Member

VaiTon commented Jun 30, 2019

@CharlesNepote could you provide some cases that the old regex matched and this one does not? Thank you! :-)

if ((defined $product_ref->{quantity})
and ($product_ref->{quantity} =~ /(?:.*e$)|(?:[0-9]+\s*[kmc]?[gl]?\s*e)/i)
and ($product_ref->{quantity} =~ /(?:^e\s)|(?:\se[^a-z])|(?:\se$)/i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of the more complex regexp was to flag things like this : "500 ge"

@stephanegigandet
Copy link
Contributor

I think @VaiTon 's suggestion is a very good one: we should add test cases (both positives like "500 ge" and negatives like "1 litre") so that we don't break this later. There is already a test file : t/sitequality.t

@CharlesNepote
Copy link
Member Author

Yes, you're both right. I'm not very comfortable with test cases but I know it's important, so I'll do it.

@hangy hangy removed their request for review October 3, 2019 12:07
@teolemon
Copy link
Member

rebased. @CharlesNepote can you write the test so we can merge

@teolemon teolemon changed the title solve quantity false positives issue #2037 fix: solve quantity false positives issue #2037 Feb 27, 2022
@github-actions github-actions bot added the 💥 Merge Conflicts 💥 Merge Conflicts label Jul 11, 2022
@teolemon
Copy link
Member

Can we close @CharlesNepote ?

@benbenben2 benbenben2 requested a review from a team as a code owner April 22, 2023 09:12
@github-actions github-actions bot added 🧽 Data quality https://wiki.openfoodfacts.org/Quality 🧪 tests labels Apr 22, 2023
@codecov-commenter
Copy link

codecov-commenter commented Apr 22, 2023

Codecov Report

Merging #2038 (d33f3d4) into main (e0f97b6) will increase coverage by 0.08%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #2038      +/-   ##
==========================================
+ Coverage   48.16%   48.25%   +0.08%     
==========================================
  Files         109      109              
  Lines       21015    21046      +31     
  Branches     4739     4739              
==========================================
+ Hits        10122    10155      +33     
+ Misses       9618     9617       -1     
+ Partials     1275     1274       -1     
Impacted Files Coverage Δ
lib/ProductOpener/DataQualityFood.pm 59.94% <100.00%> (+0.53%) ⬆️
tests/unit/dataqualityfood.t 85.05% <100.00%> (+2.84%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

@benbenben2
Copy link
Collaborator

@CharlesNepote , @stephanegigandet , @VaiTon , @teolemon

Tests have been added.
The regex (initially /(?:.*e$)|(?:[0-9]+\s*[kmc]?[gl]?\s*e)/i) have been updated.
Because it seems to be covered already by the second part of the regex, the first part of the regex (/(?:.*e$)|) has been removed.
Because this "e" letter is always (? could not find exception) preceded by "g" or "kg" or "l" or "ml", we can make [gl] as mandatory in the regex (by removing the question mark which makes it optional). This leads to: /(?:[0-9]+\s*[kmc]?[gl]\s*e)/i.

False positives such as "litre" or "verre" (ending by letter e) are not catch anymore with this regex.

@github-actions github-actions bot removed the 💥 Merge Conflicts 💥 Merge Conflicts label Apr 25, 2023
Copy link
Contributor

@stephanegigandet stephanegigandet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thank you!

@stephanegigandet stephanegigandet merged commit 80f01b6 into main May 2, 2023
@stephanegigandet stephanegigandet deleted the CharlesNepote-quantity branch May 2, 2023 07:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

6 participants