feat: compute packagings stats #7949

Merged
merged 14 commits into from
Jan 18, 2023

stephanegigandet
Contributor

This is a script to compute some packaging stats for categories of products. #7929

Sample output: https://world.openfoodfacts.org/data/categories_stats/categories_packagings_stats.packagings-with-weights.json
(for products that have some packaging weights)

Currently the stats are only on country / category / shape / material.

We will also add stats for packaging weight.

Member

@alexgarel alexgarel left a comment


I approve, but I would prefer to have the OpenAPI documentation shipped with it (see my comment).

scripts/gen_packaging_stats.pl

my $total = 0;

my $packagings_stats_ref = {};
Member


I think it's better to document this kind of structure where we declare it, to avoid having to read the full algorithm to understand the structure.

Or, even better, we could document it as a JSON schema (in YAML) in docs/reference (it could also be considered an API), and put the path to the OpenAPI schema here.

Contributor Author


There's a description of the structure at the top of the file:

Aggregation counts are stored in a structure of the form:
{
    countries => {
        "en:world" => ..
        "en:france" => {
            categories => {
                "all" => .. # stats for all categories
                "en:yogourts" => {
                    shapes => {
                        "en:unknown" => ..
                        "all" => .. # stats for all shapes
                        "en:bottle" => {
                            materials_parents => .. # stats for parents materials (e.g. PET will also count for plastic)
                            materials => {
                                "all" => ..
                                "en:plastic" => 12, # number of products sold in France that are yogurts and that have a plastic bottle packaging component
                            }
                        },
                        ..
                    }
                },
                ..
            }
        },
        ..
    }
}

Regarding the OpenAPI documentation: why not, but at this point this is completely experimental, and I don't know whether we will keep that structure. We can document it in OpenAPI once it has stabilized a bit.
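
As a rough illustration of how counts could be accumulated into a structure of this shape, here is a minimal sketch. It is not the code from scripts/gen_packaging_stats.pl: the add_count helper, the flat sample records, and the choice to count every entry under "en:world" and "all" are assumptions made for the example, and it counts packaging components rather than products to keep things short.

```perl
use strict;
use warnings;

# Hypothetical sample records: one packaging component per entry.
# The field names are illustrative, not the real product schema.
my @components = (
    {country => "en:france", category => "en:yogourts", shape => "en:bottle", material => "en:plastic"},
    {country => "en:france", category => "en:yogourts", shape => "en:bottle", material => "en:plastic"},
    {country => "en:france", category => "en:yogourts", shape => "en:pot", material => "en:glass"},
    {country => "en:belgium", category => "en:yogourts", shape => "en:bottle", material => "en:plastic"},
);

my $packagings_stats_ref = {};

# Hypothetical helper: count one component under its own values
# and under the "en:world" / "all" buckets of every dimension.
sub add_count {
    my ($stats_ref, $component_ref) = @_;
    foreach my $country ($component_ref->{country}, "en:world") {
        foreach my $category ($component_ref->{category}, "all") {
            foreach my $shape ($component_ref->{shape}, "all") {
                foreach my $material ($component_ref->{material}, "all") {
                    # Intermediate hashes are created on demand (autovivification)
                    $stats_ref->{countries}{$country}{categories}{$category}{shapes}{$shape}{materials}{$material}++;
                }
            }
        }
    }
    return;
}

add_count($packagings_stats_ref, $_) foreach @components;

my $materials_ref
    = $packagings_stats_ref->{countries}{"en:france"}{categories}{"en:yogourts"}{shapes}{"en:bottle"}{materials};
print "plastic bottles for yogurts in France: $materials_ref->{'en:plastic'}\n";
```

Counting each entry under both its own value and an "all" bucket keeps lookups cheap at read time, at the cost of a larger structure.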

Member


ok.

Member


Suggested change
my $packagings_stats_ref = {};
# this will contain the final result
# see structure on top of this file
my $packagings_stats_ref = {};

stephanegigandet and others added 4 commits January 6, 2023 15:41
Co-authored-by: Alex Garel <alex@garel.org>
Co-authored-by: Alex Garel <alex@garel.org>
Co-authored-by: Alex Garel <alex@garel.org>
@stephanegigandet
Contributor Author

Thanks for all the suggestions @alexgarel, I think I implemented them all in the last commit.

Please don't merge yet, there are a few details I want to change (like filtering out bogus entries in countries).

@teolemon
Member

teolemon commented Jan 9, 2023

I'm not quite sure what to make of this (I looked at the beautified JSON).
@CharlesNepote Is the structured packaging field exported? It would probably be easier to pivot.
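
One way to make the nested stats easier to pivot would be to flatten them into one row per country / category / shape / material. The sketch below assumes the leaves are plain counts, as in the structure described in the review thread; flatten_stats and the sample numbers are made up for illustration and are not part of the PR.

```perl
use strict;
use warnings;

# Made-up sample data following the documented nesting
my $stats_ref = {
    countries => {
        "en:france" => {
            categories => {
                "en:yogourts" => {
                    shapes => {
                        "en:bottle" => {materials => {"en:plastic" => 12}},
                        "en:pot" => {materials => {"en:glass" => 3, "en:plastic" => 20}},
                    },
                },
            },
        },
    },
};

# Hypothetical helper: walk the four levels and emit
# [country, category, shape, material, count] rows in sorted order.
sub flatten_stats {
    my ($stats) = @_;
    my @rows;
    my $countries_ref = $stats->{countries} // {};
    foreach my $country (sort keys %$countries_ref) {
        my $categories_ref = $countries_ref->{$country}{categories} // {};
        foreach my $category (sort keys %$categories_ref) {
            my $shapes_ref = $categories_ref->{$category}{shapes} // {};
            foreach my $shape (sort keys %$shapes_ref) {
                my $materials_ref = $shapes_ref->{$shape}{materials} // {};
                foreach my $material (sort keys %$materials_ref) {
                    push @rows, [$country, $category, $shape, $material, $materials_ref->{$material}];
                }
            }
        }
    }
    return @rows;
}

my @rows = flatten_stats($stats_ref);

# Print the rows as CSV, ready to load into a spreadsheet pivot table
print "country,category,shape,material,count\n";
print join(",", @$_), "\n" foreach @rows;
```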

@stephanegigandet
Contributor Author

I added a special export just for French yogurts, to make the data easier to explore in just a browser:

https://world.openfoodfacts.org/data/categories_stats/categories_packagings_stats.fr.fermented-dairy-desserts.packagings-with-weights.json

So, for instance, we can easily see the materials used for yogurt pots:

image

Note that there are fields for "shape" / "shape_parents", and "material" / "material_parents". This is so that we can see stats for all "pots" even if some components are listed as "pots" and others as "individual pots". The same goes for materials: "materials" holds the exact values entered by users, and "materials_parents" also includes all their parent values.
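
The difference between the exact and the "parents" counters can be sketched as follows. This is only an illustration: the %parents_of table is a hand-written stand-in for the real taxonomy, "en:pet" is a simplified identifier, and with_parents is a hypothetical helper, not a function of the code base.

```perl
use strict;
use warnings;

# Hypothetical parent table standing in for the real packaging taxonomies
my %parents_of = (
    "en:pet" => ["en:plastic"],
    "en:individual-pot" => ["en:pot"],
);

# Hypothetical helper: return an entry followed by all its ancestors, each listed once
sub with_parents {
    my ($id, $parents_ref) = @_;
    my %seen;
    my @queue = ($id);
    my @result;
    while (defined(my $current = shift @queue)) {
        next if $seen{$current}++;
        push @result, $current;
        push @queue, @{$parents_ref->{$current} // []};
    }
    return @result;
}

my (%materials, %materials_parents);
foreach my $material ("en:pet", "en:pet", "en:plastic") {
    # Exact value as entered
    $materials{$material}++;
    # The value itself plus every parent (assumption: the value is included)
    $materials_parents{$_}++ foreach with_parents($material, \%parents_of);
}

foreach my $id (sort keys %materials_parents) {
    my $exact = $materials{$id} // 0;
    print "$id: exact=$exact, with parents=$materials_parents{$id}\n";
}
```

With this split, a query for "plastic" in the parents counter also covers components entered as PET, while the exact counter still shows what users actually typed.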

@stephanegigandet
Contributor Author

I added weights (values + mean):

image
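
Storing weight values together with their mean could look roughly like the sketch below. The key names ("n", "weights", "values", "mean") and both helpers are assumptions for illustration; the actual field names in the generated JSON may differ.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical leaf of the stats structure for one shape / material combination
my $leaf_ref = {};

# Record one packaging component; the weight may be unknown (undef)
sub add_weight {
    my ($stats_ref, $weight) = @_;
    $stats_ref->{n}++;
    # Components without a known weight are counted but do not contribute to the mean
    return unless defined $weight;
    push @{$stats_ref->{weights}{values}}, $weight;
    return;
}

# Compute and store the arithmetic mean of the recorded weights, if any
sub compute_mean {
    my ($stats_ref) = @_;
    my @values = @{$stats_ref->{weights}{values} // []};
    return unless @values;
    $stats_ref->{weights}{mean} = sum(@values) / scalar(@values);
    return $stats_ref->{weights}{mean};
}

add_weight($leaf_ref, $_) foreach (4.5, 5.5, undef, 8);
my $mean = compute_mean($leaf_ref);
print "components: $leaf_ref->{n}, mean weight: $mean\n";
```

Keeping the raw values alongside the mean makes it possible to compute other statistics later (median, percentiles) without rescanning the products.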

@github-actions github-actions bot added the GitHub Actions Pull requests that update Github_actions code label Jan 14, 2023
@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

No coverage information
No duplication information

@teolemon teolemon added the 📦 Packaging https://wiki.openfoodfacts.org/Category:Recycling label Jan 14, 2023
@stephanegigandet stephanegigandet merged commit 657b4ee into main Jan 18, 2023
@stephanegigandet stephanegigandet deleted the packagings_stats branch January 18, 2023 11:08
Labels
GitHub Actions Pull requests that update Github_actions code 📦 Packaging https://wiki.openfoodfacts.org/Category:Recycling

3 participants