docs: reviewing the Ecoscore.pm program and fixing the product_ecoscore.yaml schema #10875

jforget · 2024-10-10T18:25:53Z

docs: reviewing the Ecoscore.pm program and fixing the product_ecoscore.yaml schema

This pull request applies to:

  lib/ProductOpener/Ecoscore.pm
  docs/api/ref/schemas/product_ecoscore.yaml

Actually, nothing is modified in Ecoscore.pm.

Within the "ecoscore_data/adjustments/origins_of_ingredients", "ecoscore_data/grades" and "ecoscore_data/scores" hierarchies, we find many key-value pairs, where the key is either a 2-char code or the string "world". According to product_ecoscore.yaml, the 2-char code is a language code and the 5-char string "world" is not allowed.

On the other hand, when reading Ecoscore.pm, we find the hard-coded string "world" together with a list @ecoscore_countries_enabled_sorted, initialised not with language codes, but with actual country codes: "uk" instead of "en", "be" in addition to "nl" and "fr" for example.

So the schema is fixed to rename "language_code" to "country_code" and to allow the property "world".
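
The key convention described above can be checked mechanically against sample data. The following is only a minimal sketch, not code from Product Opener: the helper name `unexpected_keys` and the sample dictionary are hypothetical, and it assumes the keys are lowercase 2-letter codes or the string "world". A shape check like this cannot tell a country code from a language code (both are 2 letters), which is why the fix is about naming and about allowing "world".

```python
import re

# Assumption: valid keys are lowercase 2-letter country codes (e.g. "be", "uk")
# or the literal string "world".
COUNTRY_KEY = re.compile(r"[a-z]{2}")


def unexpected_keys(mapping):
    """Return, sorted, the keys that are neither a 2-letter code nor "world"."""
    return sorted(
        key
        for key in mapping
        if key != "world" and not COUNTRY_KEY.fullmatch(key)
    )


# Made-up sample, not taken from a real product record.
sample_grades = {"be": "b", "fr": "b", "uk": "c", "world": "c"}

# An empty list means every key follows the convention.
print(unexpected_keys(sample_grades))
```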

Another point concerns the "ecoscore_data/adjustments/origins_of_ingredients" composite property. In the schema, this property includes only a string array "origins_from_origins_field", while the actual data records (and the Ecoscore.pm program) include both the string array "origins_from_origins_field" and the string array "origins_from_categories". The pull request fixes this.

Not included in the pull request: the "ecoscore_data/adjustments/packaging" hierarchy contains a property "non_recyclable_and_non_biodegradable_materials" and a property "packagings" (plural), which is an array of objects. According to the JSON data files, sometimes (not often) these inner objects include a "non_recyclable_and_non_biodegradable" property (which is different from, albeit related to, "non_recyclable_and_non_biodegradable_materials" at the outer level). According to the YAML schema file, this property does not exist. Should it be added to the schema file?

Also not included in the pull request: some stylistic issues. For example, the following lines

                        $agribalyse{$row_ref->[0]} = {
                                code => $row_ref->[0],    # Agribalyse code = Ciqual code
                                name_fr => $row_ref->[4],    # Nom du Produit en Français
                                name_en => $row_ref->[5],    # LCI Name
                                dqr => $row_ref->[6],    # DQR (data quality rating)
                                                                                 # warning: the AGB file has a hidden H column
                                ef_agriculture => $row_ref->[8] + 0,    # Agriculture
                                ef_processing => $row_ref->[9] + 0,    # Transformation
                                ef_packaging => $row_ref->[10] + 0,    # Emballage
                                ef_transportation => $row_ref->[11] + 0,    # Transport
                                ef_distribution => $row_ref->[12] + 0,    # Supermarché et distribution
                                ef_consumption => $row_ref->[13] + 0,    # Consommation
                                ef_total => $row_ref->[14] + 0,    # Total
                                co2_agriculture => $row_ref->[15] + 0,    # Agriculture
                                co2_processing => $row_ref->[16] + 0,    # Transformation
                                co2_packaging => $row_ref->[17] + 0,    # Emballage
                                co2_transportation => $row_ref->[18] + 0,    # Transport
                                co2_distribution => $row_ref->[19] + 0,    # Supermarché et distribution
                                co2_consumption => $row_ref->[20] + 0,    # Consommation
                                co2_total => $row_ref->[21] + 0,    # Total
                                version => $agribalyse_version
                        };

should be formatted as:

                        $agribalyse{$row_ref->[0]} = {
                                code               => $row_ref->[ 0],    # Agribalyse code = Ciqual code
                                name_fr            => $row_ref->[ 4],    # Nom du Produit en Français
                                name_en            => $row_ref->[ 5],    # LCI Name
                                dqr                => $row_ref->[ 6],    # DQR (data quality rating)
                                                                         # warning: the AGB file has a hidden H column
                                ef_agriculture     => $row_ref->[ 8] + 0,    # Agriculture
                                ef_processing      => $row_ref->[ 9] + 0,    # Transformation
                                ef_packaging       => $row_ref->[10] + 0,    # Emballage
                                ef_transportation  => $row_ref->[11] + 0,    # Transport
                                ef_distribution    => $row_ref->[12] + 0,    # Supermarché et distribution
                                ef_consumption     => $row_ref->[13] + 0,    # Consommation
                                ef_total           => $row_ref->[14] + 0,    # Total
                                co2_agriculture    => $row_ref->[15] + 0,    # Agriculture
                                co2_processing     => $row_ref->[16] + 0,    # Transformation
                                co2_packaging      => $row_ref->[17] + 0,    # Emballage
                                co2_transportation => $row_ref->[18] + 0,    # Transport
                                co2_distribution   => $row_ref->[19] + 0,    # Supermarché et distribution
                                co2_consumption    => $row_ref->[20] + 0,    # Consommation
                                co2_total          => $row_ref->[21] + 0,    # Total
                                version            => $agribalyse_version
                        };

See Perl Best Practices page 26, "Vertical Alignment".

And while I am browsing PBP, maybe you should indent with spaces instead of tabs (p 20).

While browsing test data, I have looked at the "transportation_scores" property. Most often, this property contains key-value pairs in which the value is zero. Sometimes, the values are integer numbers, like in products "0052833225082", "0078742102047", "2241447012920", "3270160503070", "3451790834080", "4063500001669", "8033049610109", "8411945200226". This is compatible with what the YAML schema file says.

But product "04083637" has float values (the fractional part is either 0.3333...33 or 0.6666...66, each followed by a seemingly random last digit). Product "2625078016210" has floating-point transportation scores with a 2-digit fractional part. In an old test file (not in the recent file openfoodfacts-products.jsonl.gz), product "5601009974337" had float values which were actually integer values plus a rounding error, such as 12.000000000000002 or 51.00000000000001. Is there a problem with these three products?
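
To make the three cases above easier to tell apart when scanning test data, here is a small illustrative sketch. It is not part of Product Opener: the function name `classify_score` and the tolerance of 1e-9 are assumptions chosen for the example. It separates exact integers, integers carrying a floating-point rounding error, and genuinely fractional values.

```python
import math


def classify_score(value, tolerance=1e-9):
    """Classify a numeric score as 'integer', 'near-integer' or 'fractional'.

    'near-integer' means the value differs from the nearest integer by at
    most `tolerance` (an assumed threshold), which is typical of
    accumulated floating-point rounding error.
    """
    if float(value).is_integer():
        return "integer"
    if math.isclose(value, round(value), rel_tol=0.0, abs_tol=tolerance):
        return "near-integer"
    return "fractional"


# The two rounding-error values are the ones quoted above; the others are
# made-up examples of each category.
for score in (0, 51, 12.000000000000002, 51.00000000000001, 1 / 3, 0.25):
    print(score, classify_score(score))
```

A value reported as "near-integer" is most likely an integer that went through unrounded floating-point arithmetic, whereas "fractional" values indicate that no rounding was applied at all.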

sonarqubecloud · 2024-10-10T18:28:55Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

codecov-commenter · 2024-10-10T18:59:26Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 49.07%. Comparing base (dc04d18) to head (fea8ce2).
Report is 696 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #10875      +/-   ##
==========================================
- Coverage   49.54%   49.07%   -0.48%     
==========================================
  Files          67       77      +10     
  Lines       20650    22179    +1529     
  Branches     4980     5303     +323     
==========================================
+ Hits        10231    10884     +653     
- Misses       9131     9963     +832     
- Partials     1288     1332      +44

stephanegigandet · 2024-10-11T08:08:04Z

Thanks a lot for the fixes to the yaml schema.

The Eco-Score implementation has changed a bit over time, and it's possible some test products still have old values. You can try recomputing the Eco-Score for them using scripts/update_all_products.pl --compute-eco

Regarding transportation_values, looking at the code, we have no rounding, so it can be a float. transportation_score has rounding.

For the formatting, we use a linter that you can call with "make lint", but it does not always make the best choices, as in the lines you mention.

The non_recyclable_and_non_biodegradable materials properties are going away in a few days (see this PR with upcoming changes to the Eco-Score: #10829), so they are probably not worth adding.

stephanegigandet

Thank you!

After reviewing Ecoscore.pm, fixing the corresponding schema file: pr…

fea8ce2

…oduct_ecoscore.yaml.

jforget requested a review from a team as a code owner October 10, 2024 18:25

github-actions bot assigned jforget Oct 10, 2024

github-actions bot added the 📚 Documentation Documentation issues improve the project for everyone. label Oct 10, 2024

stephanegigandet approved these changes Oct 11, 2024

stephanegigandet merged commit 893705d into openfoodfacts:main Oct 11, 2024
14 checks passed
