Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: reviewing the Ecoscore.pm program and fixing the product_ecoscore.yaml schema #10875

Merged
merged 1 commit into from
Oct 11, 2024

Conversation

jforget
Copy link
Contributor

@jforget jforget commented Oct 10, 2024

docs: reviewing the Ecoscore.pm program and fixing the product_ecoscore.yaml schema

This pull request applies to:

  lib/ProductOpener/Ecoscore.pm
  docs/api/ref/schemas/product_ecoscore.yaml

Actually, nothing is modified in Ecoscore.pm.

Within the "ecoscore_data/adjustments/origins_of_ingredients", "ecoscore_data/grades" and "ecoscore_data/scores" hierarchies, we find many key-values pairs, where the key is a 2-char code, or the string "world". According to product_ecoscore.yaml, the 2-char code is a language code and the 5-char string "world" is not allowed.

On the other hand, when reading Ecoscore.pm, we find the hard-coded string "world" together with a list @ecoscore_countries_enabled_sorted, initialised not with language codes, but with actual country codes: "uk" instead of "en", "be" in addition to "nl" and "fr" for example.

So the schema is fixed to rename "language_code" to "country_code" and to allow property "world".

Another point, with the "ecoscore_data/adjustments/origin_of_ingredients" composite property. This property includes a string array "origins_from_origins_field", while the actual data records (and the Ecoscore.pm program) include both the string array "origins_from_origins_field" and the string array "origins_from_categories". The pull request fixes this.

Not included in the pull request: the "ecoscore_data/adjustments/packaging" contains a property "non_recyclable_and_non_biodegradable_materials" and a property "packagings" (plural), which is an array of objects. According to the JSON data files, sometimes (not often) these inner objects include a "non_recyclable_and_non_biodegradable" property (which is different, albeit related to "non_recyclable_and_non_biodegradable_materials" at the outer level). According to the YAML schema file, this property does not exist. Should it be added to the schema file?

Not included in the pull request, some stylistic issues. For example, the following lines

                        $agribalyse{$row_ref->[0]} = {
                                code => $row_ref->[0],    # Agribalyse code = Ciqual code
                                name_fr => $row_ref->[4],    # Nom du Produit en Français
                                name_en => $row_ref->[5],    # LCI Name
                                dqr => $row_ref->[6],    # DQR (data quality rating)
                                                                                 # warning: the AGB file has a hidden H column
                                ef_agriculture => $row_ref->[8] + 0,    # Agriculture
                                ef_processing => $row_ref->[9] + 0,    # Transformation
                                ef_packaging => $row_ref->[10] + 0,    # Emballage
                                ef_transportation => $row_ref->[11] + 0,    # Transport
                                ef_distribution => $row_ref->[12] + 0,    # Supermarché et distribution
                                ef_consumption => $row_ref->[13] + 0,    # Consommation
                                ef_total => $row_ref->[14] + 0,    # Total
                                co2_agriculture => $row_ref->[15] + 0,    # Agriculture
                                co2_processing => $row_ref->[16] + 0,    # Transformation
                                co2_packaging => $row_ref->[17] + 0,    # Emballage
                                co2_transportation => $row_ref->[18] + 0,    # Transport
                                co2_distribution => $row_ref->[19] + 0,    # Supermarché et distribution
                                co2_consumption => $row_ref->[20] + 0,    # Consommation
                                co2_total => $row_ref->[21] + 0,    # Total
                                version => $agribalyse_version
                        };

should be formatted as:

                        $agribalyse{$row_ref->[0]} = {
                                code               => $row_ref->[ 0],    # Agribalyse code = Ciqual code
                                name_fr            => $row_ref->[ 4],    # Nom du Produit en Français
                                name_en            => $row_ref->[ 5],    # LCI Name
                                dqr                => $row_ref->[ 6],    # DQR (data quality rating)
                                                                         # warning: the AGB file has a hidden H column
                                ef_agriculture     => $row_ref->[ 8] + 0,    # Agriculture
                                ef_processing      => $row_ref->[ 9] + 0,    # Transformation
                                ef_packaging       => $row_ref->[10] + 0,    # Emballage
                                ef_transportation  => $row_ref->[11] + 0,    # Transport
                                ef_distribution    => $row_ref->[12] + 0,    # Supermarché et distribution
                                ef_consumption     => $row_ref->[13] + 0,    # Consommation
                                ef_total           => $row_ref->[14] + 0,    # Total
                                co2_agriculture    => $row_ref->[15] + 0,    # Agriculture
                                co2_processing     => $row_ref->[16] + 0,    # Transformation
                                co2_packaging      => $row_ref->[17] + 0,    # Emballage
                                co2_transportation => $row_ref->[18] + 0,    # Transport
                                co2_distribution   => $row_ref->[19] + 0,    # Supermarché et distribution
                                co2_consumption    => $row_ref->[20] + 0,    # Consommation
                                co2_total          => $row_ref->[21] + 0,    # Total
                                version            => $agribalyse_version
                        };

See Perl Best Practices page 26, "Vertical Alignment".

And while I am browsing PBP, maybe you should indent with spaces instead of tabs (p 20).

While browsing test data, I have looked at the "transportation_scores" property. Most often, this property contains key-value pairs in which the value is zero. Sometimes, the values are integer numbers, like in products "0052833225082", "0078742102047", "2241447012920", "3270160503070", "3451790834080", "4063500001669", "8033049610109", "8411945200226". This is compatible with what the YAML schema file says.

But product "04083637" has float values (fractional part is either 0.3333...33 plus a random last digit or 0.6666...66 plus a random last digit). Product "2625078016210" has floating transportation scores with a 2-digit fractional part. In an old test file (not in the recent file openfoodfacts-products.jsonl.gz), product "5601009974337" had float values which were actually integer values plus a rounding error, such as 12.000000000000002 or 51.00000000000001. Is there a problem with these three products?

@jforget jforget requested a review from a team as a code owner October 10, 2024 18:25
@github-actions github-actions bot added the 📚 Documentation Documentation issues improve the project for everyone. label Oct 10, 2024
Copy link

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 49.07%. Comparing base (dc04d18) to head (fea8ce2).
Report is 696 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10875      +/-   ##
==========================================
- Coverage   49.54%   49.07%   -0.48%     
==========================================
  Files          67       77      +10     
  Lines       20650    22179    +1529     
  Branches     4980     5303     +323     
==========================================
+ Hits        10231    10884     +653     
- Misses       9131     9963     +832     
- Partials     1288     1332      +44     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@stephanegigandet
Copy link
Contributor

Thanks a lot for the fixes to the yaml schema.

The Eco-Score implementation has changed a bit over time, and it's possible some test products still have some old values. You can try recomputing the Eco-Score for them using the scripts/update_all_products.pl --compute-eco

Regarding transportation_values, looking at the code, we have no rounding, so it can be a float. transportation_score has rounding.

For the formatting, we use a linter that you can call with "make lint", but it does not always make the best choices like in the lines you mention.

The non_recyclable_and_non_biodegradable materials properties is going away in a few days (see this PR with upcoming changes to the Eco-Score: #10829 ) so probably not worth adding.

Copy link
Contributor

@stephanegigandet stephanegigandet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@stephanegigandet stephanegigandet merged commit 893705d into openfoodfacts:main Oct 11, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
📚 Documentation Documentation issues improve the project for everyone.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants