CSV export revision (Adding new fields and removing some) #2325
Columns I'd definitely love to see included are “vegan?”, “vegetarian?” and “co2 footprint”, as well as from_category_averaged nutrition values for unbranded, unpackaged or fresh foods.
My suggestion for fields to be deleted:
More fields I suggest deleting: Carnitine 100G, Casein 100G, Collagen-Meat-Protein-Ratio 100G, Copper 100G, Folates 100G, Glycemic-Index 100G, Image Ingredients Small Url, Image Nutrition Small Url, Image Small Url, Ingredients From Palm Oil, Manganese 100G, Molybdenum 100G, No Nutriments, Nucleotides 100G, Nutrition-Score-Fr 100G, Omega-6-Fat 100G, Omega-9-Fat 100G, Polyols 100G, Potassium 100G, Selenium 100G, Serum-Proteins 100G, Starch 100G, Taurine 100G.
More fields I'd love to add:
While we're at it, I'd love to move Code, the unique identifier, to the very left (the first column) of each product row.
Some fields exist in several translations. To keep the CSV as compact as possible, I suggest keeping only the
Code is already at the left/beginning of each product, isn't it? See:
Can you provide a reason why? Note that it is very easy to delete fields after downloading, whereas you can't rebuild them. I think it would also be better to give an explanation for each field you want to add, delete or modify. Keep in mind that some people download the CSV every day for very different reasons (they probably need different fields than you do).
I suggested it for efficiency: the columns I listed seem mostly empty or redundant. The file is already 2 GB, and it will grow with every column and row we add. Example: Nutriscore, where the EN and FR columns are identical. In other cases a column does not contain a single value. I guess a good approach would be a poll among the people who use the CSV.
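To back such a poll with numbers, a single pass over the export can report which columns are empty and which ones repeat another column on every row. A minimal sketch, assuming a tab-separated file with a header row; the function name and the sample column names are hypothetical:

```python
import csv
import hashlib
import io
from collections import defaultdict


def profile_columns(lines, delimiter="\t"):
    """Read a CSV export once and return (fill_rate, duplicates).

    fill_rate maps each column to the share of rows with a non-blank value;
    duplicates lists groups of non-empty columns that are identical on every row.
    Only one running hash per column is kept, so memory does not grow with rows.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    names = reader.fieldnames or []
    filled = dict.fromkeys(names, 0)
    digests = {name: hashlib.sha256() for name in names}
    total = 0
    for row in reader:
        total += 1
        for name in names:
            value = row.get(name) or ""
            if value.strip():
                filled[name] += 1
            # Length-prefix each value so ("ab", "c") and ("a", "bc") hash differently.
            digests[name].update(f"{len(value)}:{value}".encode("utf-8"))
    fill_rate = {n: filled[n] / total if total else 0.0 for n in names}
    groups = defaultdict(list)
    for name in names:
        if filled[name]:  # all-empty columns would trivially "match" each other
            groups[digests[name].hexdigest()].append(name)
    duplicates = [cols for cols in groups.values() if len(cols) > 1]
    return fill_rate, duplicates


if __name__ == "__main__":
    sample = "code\tscore_en\tscore_fr\tunused\n1\ta\ta\t\n2\tb\tb\t\n"
    rates, dups = profile_columns(io.StringIO(sample))
    print(rates)
    print(dups)
```

On the real file, pass a handle opened with `encoding="utf-8", newline=""`; very long cells may also require raising `csv.field_size_limit()`.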
Some fields I would like to be added:
Empty fields to be deleted:
... and I suspect there are others.
I would not remove any field. As for additives: it would be best to fix it (generate it from additives_tags) so that it is consistent with categories etc.
Ok.
I think I prefer to keep the
There is currently no data in the CSV indicating whether the nutrient values per 100 g were computed from the values per serving or entered per 100 g. This is annoying because the per-100 g values can be wrong when the serving size is small and the values are rounded.
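A worked example shows how large the error can get. If a 5 g serving lists 0.1 g of fat, rounded to the nearest 0.1 g, the true amount lies between 0.05 g and 0.15 g, so the stored per-100 g value of 2 g could really be anywhere between 1 g and 3 g. A sketch of that calculation, assuming round-to-nearest labeling; the helper name is hypothetical:

```python
def per_100g_from_serving(per_serving, serving_size_g, rounding_step):
    """Scale a rounded per-serving value to 100 g.

    Returns (computed, low, high): the value a naive conversion stores, and the
    range the true per-100 g value can lie in, given that the per-serving value
    was rounded to the nearest `rounding_step`.
    """
    if serving_size_g <= 0:
        raise ValueError("serving size must be positive")
    factor = 100.0 / serving_size_g
    half = rounding_step / 2.0
    computed = per_serving * factor
    low = max(per_serving - half, 0.0) * factor
    high = (per_serving + half) * factor
    return computed, low, high


if __name__ == "__main__":
    # Same rounded label (0.1 g fat) on a 5 g serving and on a 100 g serving.
    print(per_100g_from_serving(0.1, 5, 0.1))
    print(per_100g_from_serving(0.1, 100, 0.1))
```

The rounding uncertainty is multiplied by 100 / serving size, so a 5 g serving amplifies it twentyfold, which is why a flag for the source basis would help users filter suspicious values.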
An update as of today: since my last check, there are more empty fields than I expected: 4 fields, plus all fields beginning with a hyphen. Some of them should be useful, such as
As for the fields beginning with a hyphen, I never used them and didn't notice they were empty until now. Since no one complained after the last update, I suggest:
What we could do:
It's probably because we added sub-sub-nutriments but did not update the export file.
should be changed to:
Indeed, I can see it in Food.pm.
We clearly need a lighter CSV in addition to the full one. 8 GB compressed? That's too much. We should aim to also offer a file under 1 GB for beginners and people with older machines.
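A lighter file can be derived from the full one by keeping a fixed list of columns while streaming, so memory stays flat even for a multi-gigabyte input. A sketch, assuming a tab-separated, gzip-compressed export; the function names, file paths and column choice are illustrative only:

```python
import csv
import gzip
import io


def write_subset(src, dst, keep, delimiter="\t"):
    """Copy only the `keep` columns from src to dst, one row at a time.

    Returns the number of data rows written. Raises KeyError if a requested
    column is absent from the header.
    """
    reader = csv.DictReader(src, delimiter=delimiter)
    missing = [c for c in keep if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not in export: {missing}")
    writer = csv.DictWriter(dst, fieldnames=keep, delimiter=delimiter,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in reader:
        writer.writerow(row)
        count += 1
    return count


def subset_gzip(src_path, dst_path, keep):
    """Stream a gzip-compressed export into a smaller gzip-compressed file."""
    with gzip.open(src_path, "rt", encoding="utf-8", newline="") as src, \
         gzip.open(dst_path, "wt", encoding="utf-8", newline="") as dst:
        return write_subset(src, dst, keep)


if __name__ == "__main__":
    demo = io.StringIO("code\tproduct_name\tbrands\textra\n123\tJam\tAcme\tx\n")
    out = io.StringIO()
    write_subset(demo, out, ["code", "product_name"])
    print(out.getvalue())
```

Producing the light file this way keeps it consistent with the full export, since both come from the same rows.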
The CSV export has not evolved in a long time, and some people need new data in it. This issue is for discussing which new fields could be exported.
Fields to add:
ingredients_analysis_tags, for example:
ingredients_analysis_tags: ["en:palm-oil","en:non-vegan","en:vegetarian-status-unknown"]
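Exporting ingredients_analysis_tags would also let users derive the “vegan?” and “vegetarian?” columns requested above. A sketch, assuming the naming in the example (en:non-vegan, en:vegetarian-status-unknown) extends to en:vegan and en:maybe-vegan; the helper names are hypothetical:

```python
def diet_status(tags, diet):
    """Classify one diet ("vegan" or "vegetarian") from ingredients_analysis_tags.

    Assumes tags of the form en:<diet>, en:non-<diet> and en:maybe-<diet>;
    anything else, including en:<diet>-status-unknown, maps to "unknown".
    """
    tag_set = set(tags)
    if f"en:{diet}" in tag_set:
        return "yes"
    if f"en:non-{diet}" in tag_set:
        return "no"
    if f"en:maybe-{diet}" in tag_set:
        return "maybe"
    return "unknown"


def diet_columns(tags):
    """Flatten the tag list into one column per diet."""
    return {diet: diet_status(tags, diet) for diet in ("vegan", "vegetarian")}


if __name__ == "__main__":
    example = ["en:palm-oil", "en:non-vegan", "en:vegetarian-status-unknown"]
    print(diet_columns(example))
```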
nutrient_levels_tags, for example:
nutrient_levels_tags: ["en:fat-in-high-quantity","en:saturated-fat-in-high-quantity","en:sugars-in-high-quantity","en:salt-in-low-quantity"]
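Each nutrient_levels_tags entry packs a nutrient and a level into one string, so a CSV consumer would likely split it into columns. A sketch, assuming every tag follows the en:<nutrient>-in-<level>-quantity pattern of the example and that levels are low, moderate or high:

```python
import re

# Pattern inferred from the example tags, e.g. "en:fat-in-high-quantity".
LEVEL_TAG = re.compile(
    r"^en:(?P<nutrient>[a-z-]+)-in-(?P<level>low|moderate|high)-quantity$"
)


def nutrient_levels(tags):
    """Map nutrient_levels_tags to {nutrient: level}, skipping unrecognized tags."""
    levels = {}
    for tag in tags:
        match = LEVEL_TAG.match(tag)
        if match:
            levels[match["nutrient"]] = match["level"]
    return levels


if __name__ == "__main__":
    print(nutrient_levels(["en:fat-in-high-quantity", "en:salt-in-low-quantity"]))
```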
product_quantity, computed from quantity, for example:
product_quantity: "1500", computed from quantity: "1,5 L";
product_quantity: "320", computed from quantity: "2 x 160 g".
serving_quantity.
owner, for example:
owner: "org-carrefour"
data_quality_errors_tags (speaks for itself), for example:
data_quality_errors_tags: ["en:nutrition-saturated-fat-greater-than-fat"]
unique_scans_n, which contains the number of unique scans of a product (filled for ~33% of products), for example:
unique_scans_n: "8"
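To show what data_quality_errors_tags encodes, here is a sketch of the single check behind the example tag, applied to one CSV row. It assumes the fat_100g and saturated-fat_100g columns of the current export; the server computes these tags itself, so exporting them would save users from reimplementing checks like this:

```python
def to_float(value):
    """Parse a CSV cell as a float; blank, missing or malformed cells become None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def quality_errors(row):
    """Return data-quality error tags for one CSV row (dict of column -> text).

    Only the check matching the example tag is implemented here.
    """
    errors = []
    fat = to_float(row.get("fat_100g"))
    saturated = to_float(row.get("saturated-fat_100g"))
    if fat is not None and saturated is not None and saturated > fat:
        errors.append("en:nutrition-saturated-fat-greater-than-fat")
    return errors


if __name__ == "__main__":
    print(quality_errors({"fat_100g": "2", "saturated-fat_100g": "3.5"}))
```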
popularity_tags: this field groups products into popularity levels per year, either worldwide or in the countries where the product is popular, for example:
popularity_tags: ["top-50000-scans-2019","top-100000-scans-2019","at-least-5-scans-2019","at-least-10-scans-2019","top-75-percent-scans-2019","top-80-percent-scans-2019","top-85-percent-scans-2019","top-90-percent-scans-2019","top-50000-fr-scans-2019","top-100000-fr-scans-2019","top-country-fr-scans-2019","at-least-5-fr-scans-2019","at-least-10-fr-scans-2019"]
This may be preferable to unique_scans_n, as the latter could suggest the number is fresh data even though it is not real-time.
completeness, computed from product_name, quantity, packaging, brands, categories, origins, emb_codes, expiration_date, ingredients_text and nutriments (or no_nutrition_data if it is on), for example:
completeness: 0.7625
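Because popularity_tags are coarse buckets labeled with a year, a consumer can reduce them to one dated rank, which avoids the freshness ambiguity noted for unique_scans_n. A sketch, assuming the top-<N>-[<country>-]scans-<year> naming seen in the popularity_tags example; percentile and at-least-<N> tags are ignored, and the function name is hypothetical:

```python
import re

# Matches e.g. "top-50000-scans-2019" (world) and "top-50000-fr-scans-2019" (country).
TOP_N = re.compile(
    r"^top-(?P<n>\d+)-(?:(?P<country>[a-z]{2})-)?scans-(?P<year>\d{4})$"
)


def best_rank(tags, year, country=None):
    """Smallest N such that the product is in the top N by scans for that year.

    country=None looks at worldwide tags; otherwise at that country's tags.
    Returns None if no matching tag exists.
    """
    ranks = []
    for tag in tags:
        match = TOP_N.match(tag)
        if match and int(match["year"]) == year and match["country"] == country:
            ranks.append(int(match["n"]))
    return min(ranks, default=None)


if __name__ == "__main__":
    tags = ["top-50000-scans-2019", "top-100000-scans-2019", "top-100000-fr-scans-2019"]
    print(best_rank(tags, 2019), best_rank(tags, 2019, "fr"))
```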
last_image_t, for example:
last_image_t: 1666661491
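last_image_t, like created_t and last_modified_t, is a Unix timestamp in seconds. Converting it to a readable UTC date takes one call, which is also the basis of the created_t / created_datetime redundancy discussed next. A sketch (the helper name is hypothetical, and its output format may differ slightly from the existing *_datetime columns):

```python
from datetime import datetime, timezone


def unix_to_iso(seconds):
    """Convert a Unix timestamp (as stored in *_t fields) to ISO 8601 in UTC."""
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc).isoformat()


if __name__ == "__main__":
    print(unix_to_iso("1666661491"))  # 2022-10-25T01:31:31+00:00
```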
Fields to delete?
created_t / last_modified_t vs. created_datetime / last_modified_datetime: the same dates are stored twice (51 MB lost).
additives, which is empty.
states* (three fields): states and states_tags are almost identical; the only difference is that states contains spaces.
brands or brands_tags: the latter is only the normalized version of the former (lowercased, unaccented, with spaces and typographic signs replaced by "-").
Fields that could evolve:
Process:
Implementation: @export_fields in https://github.com/openfoodfacts/openfoodfacts-server/blob/main/lib/ProductOpener/Config_off.pm [to be completed]