feat: dq new facet for opposite tags #10378

Merged
15 commits merged on Aug 6, 2024

Conversation

benbenben2
Collaborator

What

Added a new facet for opposite tags in the labels and categories taxonomies.

  • unit tests
  • description of the facet*

*Remark: the facets don't contain the language code "en:", because otherwise the description does not work.

Added a new function in Tags.pm. I thought it might be useful for other use cases.

Screenshot

Screenshot_20240601_081942

Related issue(s) and discussion

@benbenben2 benbenben2 self-assigned this Jun 1, 2024
@benbenben2 benbenben2 requested a review from a team as a code owner June 1, 2024 06:31
@github-actions github-actions bot added 🧬 Taxonomies https://wiki.openfoodfacts.org/Global_taxonomies 🧪 tests Tags labels Jun 1, 2024
@benbenben2
Collaborator Author

benbenben2 commented Jun 1, 2024

The unit test attributes.t fails, more particularly on en-attributes.json. The file initially contains (expected_test_results):

{
   "attributes" : [
      {
         "description" : "Organic farming aims to protect the environment and to conserve biodiversity by prohibiting or limiting the use of synthetic fertilizers, pesticides and food additives.",
         "description_short" : "Promotes ecological sustainability and biodiversity.",
         "grade" : "a",
         "icon_url" : "https://server_domain/images/attributes/dist/organic.svg",
         "id" : "labels_organic",
         "match" : 100,
         "name" : "Organic farming",
         "status" : "known",
         "title" : "Organic product"
      },
      {
         "description" : "When you buy fair trade products, producers in developing countries are paid an higher and fairer price, which helps them improve and sustain higher social and often environmental standards.",
         "description_short" : "Helps producers in developing countries.",
         "grade" : "a",
         "icon_url" : "https://server_domain/images/attributes/dist/fair-trade.svg",
         "id" : "labels_fair_trade",
         "match" : 100,
         "name" : "Fair trade",
         "status" : "known",
         "title" : "Fair trade product"
      }
   ],
   "id" : "labels",
   "name" : "Labels"
} 

After the changes from this PR, it becomes:

{
   'attributes' => [
                     {
                       'description' => 'Organic farming aims to protect the environment and to conserve biodiversity by prohibiting or limiting the use of synthetic fertilizers, pesticides and food additives.',
                       'description_short' => 'Organic products promote ecological sustainability and biodiversity.',
                       'grade' => 'e',
                       'icon_url' => 'https://server_domain/images/attributes/dist/not-organic.svg',
                       'id' => 'labels_organic',
                       'match' => 0,
                       'name' => 'Organic farming',
                       'status' => 'known',
                       'title' => 'Not an organic product'
                     },
                     {
                       'description' => 'When you buy fair trade products, producers in developing countries are paid an higher and fairer price, which helps them improve and sustain higher social and often environmental standards.',
                       'description_short' => 'Fair trade products help producers in developing countries.',
                       'grade' => 'e',
                       'icon_url' => 'https://server_domain/images/attributes/dist/not-fair-trade.svg',
                       'id' => 'labels_fair_trade',
                       'match' => 0,
                       'name' => 'Fair trade',
                       'status' => 'known',
                       'title' => 'Not a fair trade product'
                     }
                   ],
   'id' => 'labels',
   'name' => 'Labels'
}

This is definitely related to the PR, because fair-trade and organic both contain opposites.
However, although labels.txt has been modified in this PR, neither fair-trade nor organic has been changed in that file.
It is not clear to me how to fix that.

sonarqubecloud bot commented Jun 1, 2024

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

@benbenben2
Collaborator Author

benbenben2 commented Jul 10, 2024

From data quality meeting minutes:

Errors rather than warnings.

-> done

Aleene: take a more generic approach: for some categories, the category should only be linked with its parents, e.g. white pepper

-> done

we could use the “exclusive” tag

-> used incompatible_with instead, see below

labels vs. categories/ingredients/labels, e.g. kosher and pork, halal and pork: label/halal/category/pork => https://world.openfoodfacts.org/category/pork/label/kosher
Benoît: how would this work across taxonomies, when an opposite is a category?
For example, label: vegetarian vs. category: pork.
Pierre: e.g. incompatible_with:label:en:kosher

-> done

Remark: there is a similar error for vegetarian/vegan labels and ingredients: https://hr.openfoodfacts.org/data-quality-error/en:vegan-label-but-non-vegan-ingredient

-> replacing this similar error with the present one would require adding incompatible_with to all vegan/vegetarian ingredients. Instead, keeping the similar error, which uses the vegan:en:yes tag, seems fine.

Comments

Couple of typos from previous PR #10392

Can we replace the opposite tags in the labels taxonomy with incompatible_with (and update Tags.pm accordingly)? (Maybe a question for @stephanegigandet; the opposite tag already existed before the present PR.)

Screenshot_20240710_065332

Member

@alexgarel alexgarel left a comment

Great PR, I have a few small suggestions.


For each tag of a given field ($tagtype, can be "labels" or "categories", for example),
and a given property ($prop_name, without the last colon (:). Can be "incompatible_with:en", for example),
return a hash of tagid <-> property_value
Member

@alexgarel alexgarel Jul 11, 2024

I don't know if this is what's wanted, but it does not handle property inheritance, and we have to document that!

Suggested change
- return a hash of tagid <-> property_value
+ return a hash of tagid <-> property_value
+ Note that this does not handle inheritance.

sub check_incompatible_tags ($product_ref) {

# list of tags having 'incompatible_with' properties
my @tags_to_check = ("categories", "labels");
Member

Naming it tagtypes_to_check would be less error-prone.

# list of tags having 'incompatible_with' properties
my @tags_to_check = ("categories", "labels");

foreach my $tag_to_check (@tags_to_check) {
Member

here also $tagtype_to_check.

Comment on lines 2726 to 2727
foreach my $key (keys %{$incompatible_with_hash}) {
my $value = %{$incompatible_with_hash}{$key};
Member

I would have named them differently: $key -> $tag and $value -> $incompatible_tags.

That makes the code easier to read.

my @incompatible_tags = sort ($tag_to_check . "-" . $key, $incompatible_tag);

add_tag($product_ref, "data_quality_errors",
"en:mutually-exclusive-$incompatible_tags[0]-and-$incompatible_tags[1]");
Member

Maybe use mutually-exclusive-tags- as the prefix?

Suggested change
- "en:mutually-exclusive-$incompatible_tags[0]-and-$incompatible_tags[1]");
+ "en:mutually-exclusive-tags-$incompatible_tags[0]-and-$incompatible_tags[1]");

Member

Shouldn't we also add an "en:mutually-exclusive-tags" tag, to be able to get all items with incompatibilities?

foreach my $tag_to_check (@tags_to_check) {
$log->debug("check_incompatible_tags: tag_to_check $tag_to_check") if $log->debug();

my $incompatible_with_hash = get_all_tags_having_property($product_ref, $tag_to_check, "incompatible_with:en");
Member

Suggested change
- my $incompatible_with_hash = get_all_tags_having_property($product_ref, $tag_to_check, "incompatible_with:en");
+ # we don't need to care about inherited properties
+ # as every tag parent is also in the _tags field
+ # thus incompatibilities will pop-up
+ my $incompatible_with_hash = get_all_tags_having_property($product_ref, $tag_to_check, "incompatible_with:en");
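
To make the review comments above concrete, here is a minimal, self-contained sketch of how such a check could fit together with the suggested names and prefix. It is not the code merged in this PR: the %properties table, the tag ids, the "tagtype:tagid" value format and the single value per tag are all assumptions made for illustration, and the real code calls add_tag on data_quality_errors instead of returning a list.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the taxonomies: tagtype -> tagid -> property -> value.
# The tag ids and the "tagtype:tagid" value format are assumptions for this sketch.
my %properties = (
    labels     => {"en:kosher" => {"incompatible_with:en" => "categories:en:pork"}},
    categories => {"en:pork"   => {"incompatible_with:en" => "labels:en:kosher"}},
);

# Toy version of get_all_tags_having_property: direct properties only, no inheritance
sub get_all_tags_having_property {
    my ($product_ref, $tagtype, $prop_name) = @_;
    my %tag_property_hash = ();
    foreach my $tagid (@{$product_ref->{$tagtype . "_tags"} // []}) {
        my $props_ref = ($properties{$tagtype} // {})->{$tagid} // {};
        my $value = $props_ref->{$prop_name};
        $tag_property_hash{$tagid} = $value if defined $value;
    }
    return \%tag_property_hash;
}

# Returns the sorted list of error tags instead of calling add_tag
sub check_incompatible_tags {
    my ($product_ref) = @_;
    my %errors = ();
    foreach my $tagtype_to_check ("categories", "labels") {
        my $incompatible_with_hash
            = get_all_tags_having_property($product_ref, $tagtype_to_check, "incompatible_with:en");
        foreach my $tag (sort keys %{$incompatible_with_hash}) {
            my ($other_tagtype, $other_tagid) = split(/:/, $incompatible_with_hash->{$tag}, 2);
            my @other_tags = @{$product_ref->{$other_tagtype . "_tags"} // []};
            next unless grep { $_ eq $other_tagid } @other_tags;
            # Sorting the pair makes both directions produce the same error tag
            my @incompatible_tags = sort ("$tagtype_to_check-$tag", "$other_tagtype-$other_tagid");
            $errors{"en:mutually-exclusive-tags-$incompatible_tags[0]-and-$incompatible_tags[1]"} = 1;
        }
    }
    return sort keys %errors;
}

# Only en:pork carries the property, but it is present in categories_tags
# as the parent of the (hypothetical) en:pork-sausages, so the check still fires
my $product_ref = {
    labels_tags     => ["en:kosher"],
    categories_tags => ["en:meats", "en:pork", "en:pork-sausages"],
};
my @errors = check_incompatible_tags($product_ref);
print "$_\n" foreach @errors;
```

In this toy setup the child category has no incompatible_with property of its own, which illustrates why the lack of inheritance is acceptable here: the parent tag is listed in the _tags field and is checked directly. The real tag ids would presumably be normalized by add_tag; the sketch keeps them raw.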

@github-actions github-actions bot added the 💥 Merge Conflicts label Jul 15, 2024
@benbenben2
Collaborator Author

Thanks for the review and the suggestions.

Screenshot_20240716_173428

Screenshot_20240716_173135

This is still a problem: #10378 (comment)

@github-actions github-actions bot added and removed the 💥 Merge Conflicts label Jul 16, 2024
@github-actions github-actions bot added and removed the 💥 Merge Conflicts label Jul 26, 2024

sub get_all_tags_having_property ($product_ref, $tagtype, $prop_name) {
my %tag_property_hash = ();
foreach my $tagid (@{$product_ref->{$tagtype . "_tags"}}) {
Contributor

This will create an empty labels_tags / categories_tags array if it did not exist.
In Attributes.pm, if there's no labels_tags, then we consider that we don't know whether a product has the organic label or not. But if it has other labels (e.g. fair trade), then we consider that if it were organic, it would have the organic label too.
There's a bug in Attributes.pm that results in an empty labels_tags array being considered as having labels.
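
The side effect described here is Perl autovivification: foreach aliases the elements it iterates over, so dereferencing a missing field inside the loop header creates it. The sketch below reproduces the effect on a hypothetical product hash and shows one possible guard. The has_labels helper is an assumption for illustration only; it is not the actual Attributes.pm logic nor necessarily the fix that was applied.

```perl
use strict;
use warnings;

# Hypothetical product without any labels_tags field
my $product_ref = {code => "1"};

# foreach aliases the elements, so the dereference is an lvalue context:
# Perl autovivifies labels_tags as an empty array reference
foreach my $tagid (@{$product_ref->{labels_tags}}) { }
my $created = exists $product_ref->{labels_tags} ? 1 : 0;

# Guarded loop: iterate over a throwaway empty array when the field is missing
my $other_ref = {code => "2"};
foreach my $tagid (@{$other_ref->{labels_tags} // []}) { }
my $untouched = exists $other_ref->{labels_tags} ? 0 : 1;

# Hypothetical "has labels" test that counts entries instead of checking existence,
# so an empty array is treated like a missing one
sub has_labels {
    my ($ref) = @_;
    return scalar(@{$ref->{labels_tags} // []}) > 0 ? 1 : 0;
}

print "autovivified: $created, untouched: $untouched, has labels: ", has_labels($product_ref), "\n";
```

Run as is, this prints "autovivified: 1, untouched: 1, has labels: 0". That would be consistent with the attributes.t diff earlier in the thread, where a product with no labels switched from unknown to "Not an organic product" once labels_tags existed as an empty array.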

@github-actions github-actions bot added the Attributes https://wiki.openfoodfacts.org/Product_Attributes label Aug 5, 2024
@alexgarel alexgarel enabled auto-merge (squash) August 6, 2024 09:22
@github-actions github-actions bot removed the 💥 Merge Conflicts label Aug 6, 2024
sonarqubecloud bot commented Aug 6, 2024

@alexgarel alexgarel merged commit 8d32e29 into main Aug 6, 2024
12 checks passed
@alexgarel alexgarel deleted the dq_new_facet_for_opposite_tags branch August 6, 2024 11:25
alexgarel pushed a commit that referenced this pull request Aug 7, 2024
remove opposite when not needed (categories)
put back opposite when it was used before #10378 (labels). Remember that the opposite tag is used in Import.pm.
Labels
Attributes https://wiki.openfoodfacts.org/Product_Attributes categories 🧽 Data quality - Prevention 🧽 Data quality https://wiki.openfoodfacts.org/Quality 🧽 on-the-fly quality checks Tags 🧬 Taxonomies https://wiki.openfoodfacts.org/Global_taxonomies 🧪 tests 🧪 unit tests
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Add quality facets based on the opposite: property, and on mutually-exclusive tag values
3 participants