From 3ff6ce95b09511ea79a6a3f603c9068f50350991 Mon Sep 17 00:00:00 2001 From: Oyenuga17 Date: Sat, 1 Apr 2023 22:13:01 +0100 Subject: [PATCH 1/3] fix:corrected spelling errors --- docs/dev/explain-packaging-data.md | 11 +- docs/dev/explain-taxonomy-build-cache.md | 5 +- docs/dev/how-to-learn-perl.md | 4 +- docs/dev/how-to-update-agribalyse-ecoscore.md | 214 +++++++++++------- 4 files changed, 142 insertions(+), 92 deletions(-) diff --git a/docs/dev/explain-packaging-data.md b/docs/dev/explain-packaging-data.md index 973997c777f6b..36a636664b8e8 100644 --- a/docs/dev/explain-packaging-data.md +++ b/docs/dev/explain-packaging-data.md @@ -50,7 +50,7 @@ In 2020, we made this field a taxonomized field. As a result, we now store the l #### Packaging information / recycling instructions text field (READ and WRITE) -In 2020, we also added a language specific field ("packaging_text_[language code]" e.g. "packaging_text_en" for English) to store free text data about the packaging. It can contain the text of the recycling instructions printed on the packaging (e.g. "bottle to recycle, cap to discard"), or can be filled in by users (e.g. "1 PET plastic bottle to recycle, 1 plastic cap"). +In 2020, we also added a language specific field ("packaging*text*[language code]" e.g. "packaging_text_en" for English) to store free text data about the packaging. It can contain the text of the recycling instructions printed on the packaging (e.g. "bottle to recycle, cap to discard"), or can be filled in by users (e.g. "1 PET plastic bottle to recycle, 1 plastic cap"). ### Current resulting packagings data structure (READ only) @@ -70,7 +70,7 @@ The "shape" and "material" fields are taxonomized using the packaging_shapes and #### Extract attributes that relate to different packaging components -The values for each input field ("packaging" tag field and "packaging_text_[language code]" packaging information text field) are analyzed[^parse_packaging_from_text_phrase] to recognize packaging components and their attributes. One product may have multiple "packaging_text_[language code]" values in different languages. Only the value for the main product of the language is currently analyzed. +The values for each input field ("packaging" tag field and "packaging*text*[language code]" packaging information text field) are analyzed[^parse_packaging_from_text_phrase] to recognize packaging components and their attributes. One product may have multiple "packaging*text*[language code]" values in different languages. Only the value for the main product of the language is currently analyzed. [^parse_packaging_from_text_phrase]: parse_packaging_from_text_phrase() function in [/lib/ProductOpener/Packagings.pm](https://github.com/openfoodfacts/openfoodfacts-server/blob/main/lib/ProductOpener/Packaging.pm) @@ -144,13 +144,13 @@ We could discard the existing "packaging" tags field, and replace it with an API For instance, if Robotoff detects that the product is in plastic bottle by analyzing a product photo, it could send {shape:"bottle", material:"en:plastic"} and it would be added / combined with the existing "packagings" data. -### Keep the "packaging_text_[language code]" field +### Keep the "packaging*text*[language code]" field It is important to keep this field, as we can display it as-is, use it as input data, and it may contain interesting data that we do not analyze yet. When filled, the values for this field can be analyzed and added to / combined with the "packagings" data structure. Similarly to ingredient text analysis, we could keep information about which parts of the text were recognized as attributes of a packaging component, and which parts were not recognized and were therefore ignored. -Changing the "packagings" value will not change the "packaging_text_[language code]" values. +Changing the "packagings" value will not change the "packaging*text*[language code]" values. ## Challenges @@ -160,9 +160,8 @@ Changing the "packagings" value will not change the "packaging_text_[language co For a single product, we might get partial packaging data from different sources that we map to similar but distinct shapes, like "bottle", "jar" and "jug". It may be difficult to determine if the data concerns a single packaging component, or different components. - ### Products with packaging changes -## Ressources +## Resources - 2020 project to start structuring packaging data: https://wiki.openfoodfacts.org/Packagings_data_structure diff --git a/docs/dev/explain-taxonomy-build-cache.md b/docs/dev/explain-taxonomy-build-cache.md index 7545e7de87fcb..1f71b2271dd0a 100644 --- a/docs/dev/explain-taxonomy-build-cache.md +++ b/docs/dev/explain-taxonomy-build-cache.md @@ -3,6 +3,7 @@ Taxonomies have a significant impact on OFF processing and automated test results so need to be rebuilt before running any tests. However, this process takes some time, so the built taxonomy files are cached in a GitHub repository so that they only need to be rebuilt when there is a genuine change. # How it works + A hash is calculated for all of the source files used to build a particular taxonomy and GitHub is then checked to see if a cache already exists for that hash. If no cached build is found then the taxonomy is rebuilt and cached locally. @@ -15,7 +16,7 @@ The GITHUB_TOKEN is a personal access token, created here: https://github.com/se # Considerations -In maintianing this code be aware of the following complications... +In maintaining this code be aware of the following complications... ## Circular Dependencies @@ -26,5 +27,3 @@ This is currently resolved by building the taxonomy on the fly if it is requeste ## Taxonomy Dependencies Some taxonomies perform lookups on others, e.g. additives_classes are referenced by additives, so the referenced taxonomy needs to be built first. The build order is determined in the Config_off.pm file. - - diff --git a/docs/dev/how-to-learn-perl.md b/docs/dev/how-to-learn-perl.md index bc804fd4af250..c4b06e490e94e 100644 --- a/docs/dev/how-to-learn-perl.md +++ b/docs/dev/how-to-learn-perl.md @@ -2,11 +2,11 @@ Here are some introductory resources to learn Perl: -### Quick start +### Quick start - [Perl Youtube Tutorial](https://www.youtube.com/watch?v=c0k9ieKky7Q) - Perl Enough to be dangerous // FULL COURSE 3 HOURS. - [Perl - Introduction](https://www.tutorialspoint.com/perl/perl_quick_guide.htm) - Introduction to perl from tutorialspoint -- [Impatient Perl](https://blob.perl.org/books/impatient-perl/iperl.pdf) - PDF document for people wintrested in learning perl. +- [Impatient Perl](https://blob.perl.org/books/impatient-perl/iperl.pdf) - PDF document for people intrested in learning perl. ### Official Documentation diff --git a/docs/dev/how-to-update-agribalyse-ecoscore.md b/docs/dev/how-to-update-agribalyse-ecoscore.md index f7579547db89f..24546f3f921cd 100644 --- a/docs/dev/how-to-update-agribalyse-ecoscore.md +++ b/docs/dev/how-to-update-agribalyse-ecoscore.md @@ -10,7 +10,7 @@ Download the AGRIBALYSE food spreadsheet from the [AGRIBALYSE](https://doc.agrib In a backend shell run the ssconvert.sh script. This will re-generate the CSV files, including the AGRIBALYSE_version and AGRIBALYSE_summary files. The AGRIBALYSE_summary file is sorted to make for easier comparison with the previous version. -The Ecoscore calculation just uses the data from the "Detail etape" tab, which is converted to AGRIBALYSE_vf.csv.2 by ssconvert. The Ecoscore.pm module skips the first three lines of this file to ignore headers. This should be checked for each update as the number of header lines has previously changed. Also check that none of the column headings have changed. +The Ecoscore calculation just uses the data from the "Detail etape" tab, which is converted to AGRIBALYSE_vf.csv.2 by ssconvert. The Ecoscore.pm module skips the first three lines of this file to ignore headers. This should be checked for each update as the number of header lines has previously changed. Also check that none of the column headings have changed. ## Review and fix any changed Categories @@ -23,86 +23,133 @@ It is also worth checking the impact the update has had on the main product data The previous values of the Ecoscore are stored in the previous_data section under ecoscore_data. Before applying an update you will need to delete this section with the following MongoDB script: ```js -db.products.update({}, {$unset: {"ecoscore_data.previous_data":0}}); +db.products.update({}, { $unset: { "ecoscore_data.previous_data": 0 } }); ``` + You can then use the following script from a backend bash shell to update products: + ``` ./update_all_products.pl --fields categories --compute-ecoscore ``` + The process will set the `en:ecoscore_grade_changed` and `en:ecoscore_changed` misc_tags, which can be queried to analyse the results. For example, the following script generates a CSV file that summaries all the categories where the grade has changed: + ```js -var results = db.products.aggregate([ +var results = db.products + .aggregate([ + { + $match: { + misc_tags: "en:ecoscore-grade-changed", + }, + }, { - $match: { - misc_tags: "en:ecoscore-grade-changed" - } - }, { $group: { - _id: {en: "$ecoscore_data.agribalyse.name_en", - fr: "$ecoscore_data.agribalyse.name_fr", - code_before: "$ecoscore_data.previous_data.agribalyse.code", - code_after: "$ecoscore_data.agribalyse.code", - before: "$ecoscore_data.previous_data.grade", - after: "$ecoscore_data.grade" }, - count: { $sum: 1 } - } } -]).toArray(); -print('en.Name,fr.Name,Code Before,Code After,Grade Before,Grade After,Count'); + $group: { + _id: { + en: "$ecoscore_data.agribalyse.name_en", + fr: "$ecoscore_data.agribalyse.name_fr", + code_before: "$ecoscore_data.previous_data.agribalyse.code", + code_after: "$ecoscore_data.agribalyse.code", + before: "$ecoscore_data.previous_data.grade", + after: "$ecoscore_data.grade", + }, + count: { $sum: 1 }, + }, + }, + ]) + .toArray(); +print("en.Name,fr.Name,Code Before,Code After,Grade Before,Grade After,Count"); results.forEach((result) => { - // eslint-disable-next-line no-underscore-dangle - var id = result._id; - print('"' + (id.en || '').replace(/"/g,'""') - + '","' + (id.fr || '').replace(/"/g,'""') - + '",' + id.code_before - + ',' + id.code_after - + ',' + id.before - + ',' + id.after - + ',' + result.count); + // eslint-disable-next-line no-underscore-dangle + var id = result._id; + print( + '"' + + (id.en || "").replace(/"/g, '""') + + '","' + + (id.fr || "").replace(/"/g, '""') + + '",' + + id.code_before + + "," + + id.code_after + + "," + + id.before + + "," + + id.after + + "," + + result.count + ); }); ``` + The following script fetches the specific products that have changed: + ```js -var products = db.products.find( +var products = db.products + .find( + { + misc_tags: "en:ecoscore-grade-changed", + }, { - misc_tags: "en:ecoscore-grade-changed" - }, { _id: 1, - "ecoscore_data.agribalyse.name_en": 1, - "ecoscore_data.agribalyse.name_fr": 1, - "ecoscore_data_main.agribalyse.code": 1, - "ecoscore_data.previous_data.agribalyse.code": 1, - "ecoscore_data.agribalyse.code" : 1, - "ecoscore_data_main.grade": 1, - "ecoscore_data.previous_data.grade" : 1, - "ecoscore_data.grade" : 1, - "ecoscore_data_main.score": 1, - "ecoscore_data.previous_data.score" : 1, - "ecoscore_data.score" : 1, - "ecoscore_data_main.agribalyse.ef_total": 1, - "ecoscore_data.previous_data.agribalyse.ef_total" : 1, - "ecoscore_data.agribalyse.ef_total" : 1, - "categories_tags": 1}).toArray(); - -print('_id,en.Name,fr.Name,Code Before Main,Code Before Change,Code After,Grade Before Main,Grade Before Change,Grade After,Score Before Main,Score Before Change,Score After,ef_total Before Main,ef_total Before Change,ef_total After,Categories Tags'); + _id: 1, + "ecoscore_data.agribalyse.name_en": 1, + "ecoscore_data.agribalyse.name_fr": 1, + "ecoscore_data_main.agribalyse.code": 1, + "ecoscore_data.previous_data.agribalyse.code": 1, + "ecoscore_data.agribalyse.code": 1, + "ecoscore_data_main.grade": 1, + "ecoscore_data.previous_data.grade": 1, + "ecoscore_data.grade": 1, + "ecoscore_data_main.score": 1, + "ecoscore_data.previous_data.score": 1, + "ecoscore_data.score": 1, + "ecoscore_data_main.agribalyse.ef_total": 1, + "ecoscore_data.previous_data.agribalyse.ef_total": 1, + "ecoscore_data.agribalyse.ef_total": 1, + categories_tags: 1, + } + ) + .toArray(); + +print( + "_id,en.Name,fr.Name,Code Before Main,Code Before Change,Code After,Grade Before Main,Grade Before Change,Grade After,Score Before Main,Score Before Change,Score After,ef_total Before Main,ef_total Before Change,ef_total After,Categories Tags" +); products.forEach((result) => { - var ecoscore_data_main = result.ecoscore_data_main || {}; - var ecoscore_data_main_agribalyse = ecoscore_data_main.agribalyse || {}; - // eslint-disable-next-line no-underscore-dangle - print( result._id - + ',"' + (result.ecoscore_data.agribalyse.name_en || '').replace(/"/g,'""') - + '","' + (result.ecoscore_data.agribalyse.name_fr || '').replace(/"/g,'""') - + '",' + ecoscore_data_main_agribalyse.code - + ',' + result.ecoscore_data.previous_data.agribalyse.code - + ',' + result.ecoscore_data.agribalyse.code - + ',' + ecoscore_data_main.grade - + ',' + result.ecoscore_data.previous_data.grade - + ',' + result.ecoscore_data.grade - + ',' + ecoscore_data_main.score - + ',' + result.ecoscore_data.previous_data.score - + ',' + result.ecoscore_data.score - + ',' + ecoscore_data_main_agribalyse.ef_total - + ',' + result.ecoscore_data.previous_data.agribalyse.ef_total - + ',' + result.ecoscore_data.agribalyse.ef_total - + ',"' + result.categories_tags.join(" ") +'"' - ); + var ecoscore_data_main = result.ecoscore_data_main || {}; + var ecoscore_data_main_agribalyse = ecoscore_data_main.agribalyse || {}; + // eslint-disable-next-line no-underscore-dangle + print( + result._id + + ',"' + + (result.ecoscore_data.agribalyse.name_en || "").replace(/"/g, '""') + + '","' + + (result.ecoscore_data.agribalyse.name_fr || "").replace(/"/g, '""') + + '",' + + ecoscore_data_main_agribalyse.code + + "," + + result.ecoscore_data.previous_data.agribalyse.code + + "," + + result.ecoscore_data.agribalyse.code + + "," + + ecoscore_data_main.grade + + "," + + result.ecoscore_data.previous_data.grade + + "," + + result.ecoscore_data.grade + + "," + + ecoscore_data_main.score + + "," + + result.ecoscore_data.previous_data.score + + "," + + result.ecoscore_data.score + + "," + + ecoscore_data_main_agribalyse.ef_total + + "," + + result.ecoscore_data.previous_data.agribalyse.ef_total + + "," + + result.ecoscore_data.agribalyse.ef_total + + ',"' + + result.categories_tags.join(" ") + + '"' + ); }); ``` @@ -114,27 +161,32 @@ Re-run the `update_all_products` script after doing this to assess how many prod ## Add new Categories for new AGRIBALYSE codes -For any new categories, review the AGRIBALYSE category descriptions to ensure they are concise and unambiguous sucgh that an OFF user is most likely to get a match on a type-ahead search. Give notice of the change on the taxonomies channel in Slack so that additional translations can be added for the new categories. +For any new categories, review the AGRIBALYSE category descriptions to ensure they are concise and unambiguous such that an OFF user is most likely to get a match on a type-ahead search. Give notice of the change on the taxonomies channel in Slack so that additional translations can be added for the new categories. It is not necessary to add a category for every single AGRIBALYSE entry. For example, AGRIBALYSE has over 80 codes for different mineral waters but these all have almost exactly the same environmental impact. In cases like this it is acceptable to pick a single representative AGRIBALYSE code as a proxy for the Category in general. It may be worth doing a final check to see how many categories cominations still do not have a match to AGRIBALYSE: + ```js -var missing = db.products.aggregate([ +var missing = db.products + .aggregate([ { - $match: { - "ecoscore_data.grade": null - } - }, { $group: { + $match: { + "ecoscore_data.grade": null, + }, + }, + { + $group: { _id: "$categories_tags", - count: { $sum: 1 } - } } -]).toArray(); -print('Category,Count'); + count: { $sum: 1 }, + }, + }, + ]) + .toArray(); +print("Category,Count"); missing.forEach((result) => { - // eslint-disable-next-line no-underscore-dangle - var id = result._id; - print('"' + (id.join(',') || '').replace(/"/g,'""') - + '",' + result.count); + // eslint-disable-next-line no-underscore-dangle + var id = result._id; + print('"' + (id.join(",") || "").replace(/"/g, '""') + '",' + result.count); }); -``` \ No newline at end of file +``` From 8b213c9d3d249cea042cc00d143b31ce1369bfa3 Mon Sep 17 00:00:00 2001 From: oyenuga17 <64274826+oyenuga17@users.noreply.github.com> Date: Mon, 3 Apr 2023 10:35:50 +0100 Subject: [PATCH 2/3] Update explain-packaging-data.md revert vs-code underscore to asterisk formatting --- docs/dev/explain-packaging-data.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/dev/explain-packaging-data.md b/docs/dev/explain-packaging-data.md index 36a636664b8e8..d9d772fcea3ee 100644 --- a/docs/dev/explain-packaging-data.md +++ b/docs/dev/explain-packaging-data.md @@ -50,7 +50,7 @@ In 2020, we made this field a taxonomized field. As a result, we now store the l #### Packaging information / recycling instructions text field (READ and WRITE) -In 2020, we also added a language specific field ("packaging*text*[language code]" e.g. "packaging_text_en" for English) to store free text data about the packaging. It can contain the text of the recycling instructions printed on the packaging (e.g. "bottle to recycle, cap to discard"), or can be filled in by users (e.g. "1 PET plastic bottle to recycle, 1 plastic cap"). +In 2020, we also added a language specific field ("packaging_text_[language code]" e.g. "packaging_text_en" for English) to store free text data about the packaging. It can contain the text of the recycling instructions printed on the packaging (e.g. "bottle to recycle, cap to discard"), or can be filled in by users (e.g. "1 PET plastic bottle to recycle, 1 plastic cap"). ### Current resulting packagings data structure (READ only) @@ -70,7 +70,7 @@ The "shape" and "material" fields are taxonomized using the packaging_shapes and #### Extract attributes that relate to different packaging components -The values for each input field ("packaging" tag field and "packaging*text*[language code]" packaging information text field) are analyzed[^parse_packaging_from_text_phrase] to recognize packaging components and their attributes. One product may have multiple "packaging*text*[language code]" values in different languages. Only the value for the main product of the language is currently analyzed. +The values for each input field ("packaging" tag field and "packaging_text_[language code]" packaging information text field) are analyzed[^parse_packaging_from_text_phrase] to recognize packaging components and their attributes. One product may have multiple "packaging_text_[language code]" values in different languages. Only the value for the main product of the language is currently analyzed. [^parse_packaging_from_text_phrase]: parse_packaging_from_text_phrase() function in [/lib/ProductOpener/Packagings.pm](https://github.com/openfoodfacts/openfoodfacts-server/blob/main/lib/ProductOpener/Packaging.pm) @@ -144,13 +144,13 @@ We could discard the existing "packaging" tags field, and replace it with an API For instance, if Robotoff detects that the product is in plastic bottle by analyzing a product photo, it could send {shape:"bottle", material:"en:plastic"} and it would be added / combined with the existing "packagings" data. -### Keep the "packaging*text*[language code]" field +### Keep the "packaging_text_[language code]" field It is important to keep this field, as we can display it as-is, use it as input data, and it may contain interesting data that we do not analyze yet. When filled, the values for this field can be analyzed and added to / combined with the "packagings" data structure. Similarly to ingredient text analysis, we could keep information about which parts of the text were recognized as attributes of a packaging component, and which parts were not recognized and were therefore ignored. -Changing the "packagings" value will not change the "packaging*text*[language code]" values. +Changing the "packagings" value will not change the "packaging_text_[language code]" values. ## Challenges From 8f1c024530b1616644da283399cb1b52cb4ef5c3 Mon Sep 17 00:00:00 2001 From: oyenuga17 <64274826+oyenuga17@users.noreply.github.com> Date: Mon, 3 Apr 2023 10:42:25 +0100 Subject: [PATCH 3/3] Update docs/dev/how-to-learn-perl.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Stéphane Gigandet --- docs/dev/how-to-learn-perl.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/dev/how-to-learn-perl.md b/docs/dev/how-to-learn-perl.md index c4b06e490e94e..4b31dde6e453d 100644 --- a/docs/dev/how-to-learn-perl.md +++ b/docs/dev/how-to-learn-perl.md @@ -6,7 +6,7 @@ Here are some introductory resources to learn Perl: - [Perl Youtube Tutorial](https://www.youtube.com/watch?v=c0k9ieKky7Q) - Perl Enough to be dangerous // FULL COURSE 3 HOURS. - [Perl - Introduction](https://www.tutorialspoint.com/perl/perl_quick_guide.htm) - Introduction to perl from tutorialspoint -- [Impatient Perl](https://blob.perl.org/books/impatient-perl/iperl.pdf) - PDF document for people intrested in learning perl. +- [Impatient Perl](https://blob.perl.org/books/impatient-perl/iperl.pdf) - PDF document for people interested in learning perl. ### Official Documentation