docs: reviewing the Ecoscore.pm program and fixing the product_ecoscore.yaml schema #10875
This pull request applies to:
Actually, nothing is modified in `Ecoscore.pm`.

Within the "ecoscore_data/adjustments/origins_of_ingredients", "ecoscore_data/grades" and "ecoscore_data/scores" hierarchies, we find many key-value pairs where the key is either a 2-char code or the string "world". According to `product_ecoscore.yaml`, the 2-char code is a language code and the 5-char string "world" is not allowed.

On the other hand, when reading `Ecoscore.pm`, we find the hard-coded string "world" together with a list `@ecoscore_countries_enabled_sorted`, initialised not with language codes but with actual country codes: "uk" instead of "en", and "be" in addition to "nl" and "fr", for example.

So the schema is fixed to rename "language_code" to "country_code" and to allow the property "world".
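To make the intended rule concrete, here is a minimal Python sketch of the key check that the corrected schema is meant to express. It assumes that a valid key is either the literal "world" or a two-letter lowercase country code; the function name `invalid_keys` and the sample data are hypothetical and are not taken from `Ecoscore.pm` or from the schema file.

```python
import re

# Assumption: a valid key is "world" or a two-letter lowercase country code,
# as suggested by the values found in @ecoscore_countries_enabled_sorted.
COUNTRY_CODE = re.compile(r"[a-z]{2}")


def invalid_keys(mapping):
    """Return, sorted, the keys that are neither "world" nor a 2-letter code."""
    return sorted(
        key
        for key in mapping
        if key != "world" and not COUNTRY_CODE.fullmatch(key)
    )


# Hypothetical "grades" mapping, for illustration only.
sample_grades = {"be": "b", "fr": "b", "uk": "c", "world": "c"}
print(invalid_keys(sample_grades))
```

Note that a pattern like this cannot tell a country code from a language code, since both are two letters; that is why the rename from "language_code" to "country_code" matters for documentation rather than for validation.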
Another point concerns the "ecoscore_data/adjustments/origins_of_ingredients" composite property. In the schema, this property includes a string array "origins_from_origins_field", while the actual data records (and the `Ecoscore.pm` program) include both the string array "origins_from_origins_field" and the string array "origins_from_categories". The pull request fixes this.

Not included in the pull request: the "ecoscore_data/adjustments/packaging" property contains a property "non_recyclable_and_non_biodegradable_materials" and a property "packagings" (plural), which is an array of objects. According to the JSON data files, these inner objects sometimes (not often) include a "non_recyclable_and_non_biodegradable" property, which is different from, albeit related to, "non_recyclable_and_non_biodegradable_materials" at the outer level. According to the YAML schema file, this property does not exist. Should it be added to the schema file?
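To judge how often the inner property occurs, a scan along the following lines could be run over a JSONL export. This is only a sketch: it assumes one JSON product per line and that the product identifier is stored under a "code" key, and the helper names `nested` and `products_with_inner_property` are hypothetical.

```python
import json

INNER_KEY = "non_recyclable_and_non_biodegradable"


def nested(obj, *keys):
    """Walk nested dicts, returning None as soon as a level is missing."""
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def products_with_inner_property(lines):
    """Yield the codes of products whose packagings carry INNER_KEY."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        product = json.loads(line)
        packagings = nested(
            product, "ecoscore_data", "adjustments", "packaging", "packagings"
        )
        if not isinstance(packagings, list):
            continue
        if any(isinstance(p, dict) and INNER_KEY in p for p in packagings):
            # Assumption: the product identifier is stored under "code".
            yield product.get("code")


# Hypothetical records, for illustration only.
sample_lines = [
    '{"code": "A", "ecoscore_data": {"adjustments": {"packaging": '
    '{"packagings": [{"non_recyclable_and_non_biodegradable": "maybe"}]}}}}',
    '{"code": "B", "ecoscore_data": {"adjustments": {"packaging": '
    '{"packagings": [{"shape": "bottle"}]}}}}',
    '{"code": "C"}',
]
print(list(products_with_inner_property(sample_lines)))
```

For a real export, the same function can be fed the file object returned by `gzip.open(path, "rt", encoding="utf-8")`, since it only needs an iterable of text lines.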
Not included in the pull request, some stylistic issues. For example, the following lines
should be formatted as:
See Perl Best Practices page 26, "Vertical Alignment".
And while I am browsing PBP: maybe you should indent with spaces instead of tabs (p. 20).
While browsing test data, I have looked at the "transportation_scores" property. Most often, this property contains key-value pairs in which every value is zero. Sometimes the values are non-zero integers, as in products "0052833225082", "0078742102047", "2241447012920", "3270160503070", "3451790834080", "4063500001669", "8033049610109" and "8411945200226". This is compatible with what the YAML schema file says.
But product "04083637" has float values (the fractional part is either 0.3333...33 plus a random last digit or 0.6666...66 plus a random last digit). Product "2625078016210" has floating transportation scores with a 2-digit fractional part. In an old test file (not in the recent file `openfoodfacts-products.jsonl.gz`), product "5601009974337" had float values which were actually integer values plus a rounding error, such as 12.000000000000002 or 51.00000000000001. Is there a problem with these three products?
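The three cases above (plain integers, integers with a rounding error, genuinely fractional values) can be told apart mechanically. The sketch below is one way to do it in Python; the function name `classify_score` and the tolerance of 1e-9 are assumptions chosen for illustration, not something defined by the schema or by `Ecoscore.pm`.

```python
import math


def classify_score(value, tolerance=1e-9):
    """Classify one transportation score value.

    The tolerance is an arbitrary assumption: anything closer than that
    to a whole number is treated as an integer with a rounding error.
    """
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return "not a number"
    if isinstance(value, float) and not math.isfinite(value):
        return "not a number"
    nearest = round(value)
    if value == nearest:
        return "integer"
    if abs(value - nearest) <= tolerance:
        return "integer plus rounding error"
    return "fractional"


# Values quoted above, plus two illustrative fractional values.
for score in (0, 12.000000000000002, 51.00000000000001, 1 / 3, 7.25):
    print(score, classify_score(score))
```

Applied to every value of a "transportation_scores" mapping, this would separate products that only need rounding from products whose scores are fractional by construction.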