docs: explanation about packaging data #7517
Changes from 3 commits
852377f
95832af
f6b5ade
b69bb85
9a81b21
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,158 @@ | ||
# Packaging data | ||
|
||
This document explains how packaging data is currently added, updated and structured in the Open Food Facts database, and how it could be improved. | ||
|
||
## Introduction | ||
|
||
## Types of packaging data | ||
|
||
Food products typically have 1 or more packaging components (e.g. milk may have a bottle and a cap). | ||
|
||
For each product, we aim to have a comprehensive list of all its packaging components, with detailed information about each packaging component. | ||
|
||
### Data about packaging components | ||
|
||
For each packaging component, we want data for different attributes, like its shape (e.g. a bottle) and its material (e.g. plastic). | |
|
||
There are many different attributes that can be interesting for specific uses. For instance, researchers in epidemiology are interested in knowing which packaging component is in contact with the food itself, and which one can be put in the microwave oven, so that they can study the long term effects of some plastics on health. | ||
|
||
## Sources of packaging data | ||
|
||
We can get packaging data from different sources: | ||
|
||
### Users | ||
|
||
Users of the Open Food Facts website and app, and users of 3rd party apps, can enter packaging data. | ||
|
||
### Manufacturers | ||
|
||
Some manufacturers send product data through GS1, which currently has limited support for packaging information (but this is likely to be improved in the years to come). | ||
|
||
Some manufacturers send us more detailed packaging data (e.g. recycling instructions) through the Producers Platform. | ||
|
||
Some manufacturers send us data used to compute the Eco-Score using the Eco-Score spreadsheet template, which has fields like "Packaging 1", "Material 1", "Packaging 2", "Material 2" etc. | ||
|
||
### Product photos and machine learning | ||
|
||
We can extract logos related to packaging, or parse the text recognized from product photos to recognize packaging information or recycling instructions. | ||
|
||
## How packaging data is currently added, updated and structured in Open Food Facts | ||
|
||
In Open Food Facts, we currently have a number of input fields related to packaging. The data in those fields is parsed and analyzed to create a structured list of packaging components with specific attributes. | ||
|
||
### Current input fields | ||
|
||
#### Packaging tag field (READ and WRITE) | ||
|
||
At the start of Open Food Facts in 2012, we had a "packaging" tag field where users could enter comma separated free text entries about the packaging (e.g. "Plastic", "Bag" or "Plastic bag") in different languages. | ||
|
||
In 2020, we made this field a taxonomized field. As a result, we now store the language used to fill this field, so that we can match its value to the multilingual packaging taxonomy. So "plastique" in French will be mapped to the canonical "en:plastic" entry. | ||
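
The mapping from a language-specific entry to a canonical taxonomy entry can be illustrated with a minimal sketch. The `PACKAGING_TAXONOMY` table and the `canonicalize` function are hypothetical names used only for illustration; they are not the actual ProductOpener code, and the handling of unknown values is an assumption.

```python
# Hypothetical, heavily simplified excerpt of a multilingual taxonomy:
# canonical ID -> {language code: [synonyms in that language]}.
PACKAGING_TAXONOMY = {
    "en:plastic": {"en": ["plastic"], "fr": ["plastique"]},
    "en:bottle": {"en": ["bottle"], "fr": ["bouteille"]},
}


def canonicalize(value: str, lang: str) -> str:
    """Map a tag entered in language `lang` to its canonical taxonomy ID.

    Assumption: values that are not found in the taxonomy are kept,
    prefixed with the language code, so that no user input is lost.
    """
    normalized = value.strip().lower()
    for canonical_id, names in PACKAGING_TAXONOMY.items():
        if normalized in names.get(lang, []):
            return canonical_id
    return f"{lang}:{normalized}"
```

Storing the language of the field is what makes this lookup possible: without it, the same string could match entries in several languages.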
|
||
#### Packaging information / recycling instructions text field (READ and WRITE) | ||
|
||
In 2020, we also added a language specific field ("packaging_text_[language code]" e.g. "packaging_text_en" for English) to store free text data about the packaging. It can contain the text of the recycling instructions printed on the packaging (e.g. "bottle to recycle, cap to discard"), or can be filled in by users (e.g. "1 PET plastic bottle to recycle, 1 plastic cap"). | ||
|
||
### Current resulting packagings data structure (READ only) | ||
|
||
The input fields are analyzed and combined to create the "packagings" data structure. | ||
|
||
The structure is an array of packaging components. Each packaging component can have values for different attributes: | ||
|
||
- number: the number of units for the packaging component (e.g. a pack of beers may contain 6 bottles) | ||
- shape: the general shape of the packaging component (e.g. "bottle", "box") | ||
- material: the material of the packaging component | ||
- quantity: how much product the packaging component contains (e.g. "25 cl") | ||
- recycling: whether the packaging component should be recycled, discarded or reused | ||
|
||
The "shape" and "material" fields are taxonomized using the packaging_shapes and packaging_materials taxonomies. | ||
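
As a rough illustration, the structure described above could be typed as follows. The attribute names come from the list above; the Python types, the `PackagingComponent` class name and the sample values (including the "en:glass" ID) are assumptions made for this sketch.

```python
from typing import List, TypedDict


class PackagingComponent(TypedDict, total=False):
    """One entry of the "packagings" array; every attribute is optional."""

    number: int      # number of units of this component (type is an assumption)
    shape: str       # canonical ID from the packaging_shapes taxonomy
    material: str    # canonical ID from the packaging_materials taxonomy
    quantity: str    # how much product the component contains, e.g. "25 cl"
    recycling: str   # canonical ID from the packaging recycling taxonomy


# Hypothetical example: a pack of 6 bottles in a cardboard box.
six_pack: List[PackagingComponent] = [
    {
        "number": 6,
        "shape": "en:bottle",
        "material": "en:glass",
        "quantity": "25 cl",
        "recycling": "en:recycle",
    },
    {"number": 1, "shape": "en:box", "material": "en:cardboard"},
]
```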
|
||
### How the resulting packagings data structure is created | |
|
||
The values for each input field ("packaging" tag field and "packaging_text_[language code]" packaging information text field) are analyzed to recognize packaging components and their attributes. | ||
Review comment: You don't explain how we deal with entries in more than one language. Do we analyze them all?
Reply: Currently we only analyze the entry corresponding to the main language of the product. Added a note about that.
||
|
||
For instance, if the "packaging" field contains "Plastic bottle, box, cardboard", we will use the packaging shapes, materials and recycling taxonomies to create a list of 3 packaging components: {shape:"en:bottle", material:"en:plastic"}, {shape:"en:box"}, {material:"en:cardboard"}. | ||
|
||
And if the "packaging_text_en" field contains "PET bottle to recycle, box to reuse", we will create 2 more packaging components: {shape:"en:bottle", material:"en:pet-polyethylene-terephthalate", recycling:"en:recycle"}, {shape:"en:box", recycling:"en:reuse"}. | |
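
The recognition step for both examples can be sketched as below. This is a simplified illustration, not the actual ProductOpener parser: the three lookup tables are hypothetical miniature stand-ins for the real taxonomies, `parse_components` is a hypothetical name, and matching is done on whole words in English only.

```python
from typing import Dict, List

# Hypothetical miniature taxonomies: English synonym -> canonical ID.
SHAPES = {"bottle": "en:bottle", "box": "en:box"}
MATERIALS = {
    "plastic": "en:plastic",
    "cardboard": "en:cardboard",
    "pet": "en:pet-polyethylene-terephthalate",
}
RECYCLING = {"to recycle": "en:recycle", "to reuse": "en:reuse"}

ATTRIBUTE_TAXONOMIES = (
    ("shape", SHAPES),
    ("material", MATERIALS),
    ("recycling", RECYCLING),
)


def parse_components(text: str) -> List[Dict[str, str]]:
    """Turn comma-separated free text into a list of partial components."""
    components = []
    for entry in text.lower().split(","):
        padded = f" {entry.strip()} "  # pad so that only whole words match
        component = {}
        for attribute, taxonomy in ATTRIBUTE_TAXONOMIES:
            for synonym, canonical_id in taxonomy.items():
                if f" {synonym} " in padded:
                    component[attribute] = canonical_id
                    break
        if component:  # entries with nothing recognized are ignored
            components.append(component)
    return components
```

Each comma-separated entry yields at most one component, and only the attributes that were recognized are set.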
|
||
The 3 + 2 = 5 resulting packaging components are then added one by one to the packagings structure. When their attributes are compatible, the packaging components are merged. For instance {shape:"en:box"} and {material:"en:cardboard"} have non-conflicting attributes, so they are merged into {shape:"en:box", material:"en:cardboard"}. Note that it is possible that this is a mistake, and that the "box" and "cardboard" tags in fact concern different components. | |
|
||
Similarly, as "en:plastic" is a parent of "en:pet-polyethylene-terephthalate" in the packaging_materials taxonomy, we can merge {shape:"en:bottle", material:"en:plastic"} with {shape:"en:bottle", material:"en:pet-polyethylene-terephthalate", recycling:"en:recycle"} into {shape:"en:bottle", material:"en:pet-polyethylene-terephthalate", recycling:"en:recycle"}. | ||
|
||
The resulting structure is: | ||
|
||
``` | ||
packagings: [ | ||
{ | ||
material: "en:pet-polyethylene-terephthalate", | ||
recycling: "en:recycle", | ||
shape: "en:bottle" | ||
}, | ||
{ | |
material: "en:cardboard", | |
recycling: "en:reuse", | |
shape: "en:box" | |
} | |
] | ||
``` | ||
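
The merging rule described in this section (components are added one by one, and two components merge when no attribute conflicts, keeping the more specific value when one is a taxonomy parent of the other) can be sketched as follows. The function names and the `TAXONOMY_PARENTS` table are hypothetical and only cover the example; the real implementation may differ.

```python
from typing import Dict, List, Optional

# Hypothetical excerpt of taxonomy parent links: child ID -> parent ID.
TAXONOMY_PARENTS = {"en:pet-polyethylene-terephthalate": "en:plastic"}


def is_ancestor(ancestor: str, value: str) -> bool:
    """Return True if `ancestor` is a (possibly indirect) parent of `value`."""
    parent = TAXONOMY_PARENTS.get(value)
    while parent is not None:
        if parent == ancestor:
            return True
        parent = TAXONOMY_PARENTS.get(parent)
    return False


def most_specific(a: str, b: str) -> Optional[str]:
    """Return the more specific of two compatible values, or None on conflict."""
    if a == b or is_ancestor(b, a):
        return a
    if is_ancestor(a, b):
        return b
    return None


def try_merge(existing: Dict[str, str], new: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Merge two components, or return None if any attribute conflicts."""
    merged = dict(existing)
    for attribute, value in new.items():
        if attribute not in merged:
            merged[attribute] = value
            continue
        specific = most_specific(merged[attribute], value)
        if specific is None:
            return None
        merged[attribute] = specific
    return merged


def add_component(packagings: List[Dict[str, str]], new: Dict[str, str]) -> None:
    """Merge `new` into the first compatible component, else append it."""
    for index, existing in enumerate(packagings):
        merged = try_merge(existing, new)
        if merged is not None:
            packagings[index] = merged
            return
    packagings.append(dict(new))
```

Because this sketch merges into the first compatible component, it shares the limitation noted above: two tags that really describe different components can end up merged.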
|
||
### Taxonomies | ||
|
||
We have created a number of multilingual taxonomies related to packagings: | ||
|
||
- Packaging shapes taxonomy : https://github.com/openfoodfacts/openfoodfacts-server/blob/main/taxonomies/packaging_shapes.txt | ||
- Packaging materials taxonomy : https://github.com/openfoodfacts/openfoodfacts-server/blob/main/taxonomies/packaging_materials.txt | ||
- Packaging recycling taxonomy : https://github.com/openfoodfacts/openfoodfacts-server/blob/main/taxonomies/packaging_recycling.txt | ||
- Preservation methods taxonomy (related) : https://github.com/openfoodfacts/openfoodfacts-server/blob/main/taxonomies/preservation.txt | ||
|
||
Those taxonomies are used to structure packaging data in Open Food Facts, and to analyze unstructured input data. | ||
|
||
## How we could improve it | ||
|
||
### Extend the attributes of the packaging components in the "packagings" data structure | ||
|
||
#### Weight | ||
|
||
We need to add an attribute for the weight of the packaging component. We might need to add different fields to distinguish values entered by users who weigh the packaging, values provided by the manufacturer, average values that we have determined from other products, and values that we got from external sources. | |
|
||
### Make the "packagings" data structure READ and WRITE | ||
|
||
The "packagings" data structure is currently a READ only field. We could create an API to make it a READ and WRITE field. | ||
|
||
For new products, clients (website and apps) could ask users to enter data about all packaging components of the product. | ||
|
||
For existing products, clients could display the packaging components and let users change them (e.g. adding or removing components, entering values for new attributes, editing attributes to add more precise values (e.g. which type of plastic) etc.). | ||
|
||
#### Add a way to indicate that the "packagings" data structure contains all the packaging components of the product | ||
|
||
We currently have no way to know if the packaging data we have for a product is complete, or if we may be missing some packaging components. | ||
|
||
We could have a way (e.g. a checkbox) that users could use to indicate all components are accounted for. And we could also do the reverse, and indicate that it is very likely that we are missing some packaging components (e.g. if we have a "cap" but no other component to put the cap on). | ||
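
The reverse check could start from a simple heuristic such as the one sketched here. This is only an illustration of the "cap without anything to put it on" idea: the `CLOSURE_SHAPES` set, its IDs and the function name are assumptions, not existing Open Food Facts code.

```python
from typing import Dict, List

# Hypothetical set of shapes that close another component
# and are therefore not expected to appear on their own.
CLOSURE_SHAPES = {"en:cap", "en:lid"}


def likely_missing_components(packagings: List[Dict[str, str]]) -> bool:
    """Return True if we have a closure but no component it could close."""
    shapes = {component.get("shape") for component in packagings}
    has_closure = bool(shapes & CLOSURE_SHAPES)
    has_container = bool(shapes - CLOSURE_SHAPES - {None})
    return has_closure and not has_container
```

For example, a product with only `{"shape": "en:cap"}` would be flagged, while a product with a cap and a bottle would not.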
|
||
### Deprecate the "packaging" tags field | ||
|
||
We could discard the existing "packaging" tags field, and replace it with an API to allow clients to add partial information about packaging components. | ||
|
||
For instance, if Robotoff detects that the product is in a plastic bottle by analyzing a product photo, it could send {shape:"en:bottle", material:"en:plastic"} and it would be added to / combined with the existing "packagings" data. | |
|
||
### Keep the "packaging_text_[language code]" field | ||
|
||
It is important to keep this field, as we can display it as-is, use it as input data, and it may contain interesting data that we do not analyze yet. | ||
|
||
When filled, the values for this field can be analyzed and added to / combined with the "packagings" data structure. | ||
Review comment: Explain if there would be a round trip between this field and packaging_text_[language code]. Could we also keep track of which characters (span) of packaging_text mapped to an entry in packagings? How do you deal with parts of packaging_text that you are not able to parse? Do you reject the input in this case?
Reply: Writing packagings will not affect packaging_text. We can keep track of what part of the text was matched to what. What we don't recognize is ignored. Added notes about that in the doc.
||
|
||
## Challenges | ||
|
||
### Incomplete lists of packaging components | ||
|
||
### Slightly mismatched data from different sources | ||
|
||
For a single product, we might get partial packaging data from different sources that we map to similar but distinct shapes, like "bottle", "jar" and "jug". It may be difficult to determine if the data concerns a single packaging component, or different components. | ||
|
||
|
||
### Products with packaging changes | ||
|
||
## Resources | |
|
||
- 2020 project to start structuring packaging data: https://wiki.openfoodfacts.org/Packagings_data_structure |
Review comment: Can you add a footnote on which is the corresponding function in ProductOpener?
Reply: Good idea, I added footnotes.