feat: Packaging import through producers platform #8207

Merged: 68 commits, Apr 4, 2023
Changes from 48 commits
Commits
226573c
feat: import packaging data (start)
stephanegigandet Mar 14, 2023
49c8455
lint
stephanegigandet Mar 14, 2023
b6000d7
test column names to fields mapping
stephanegigandet Mar 14, 2023
50fc0c5
column names for packaging components
stephanegigandet Mar 14, 2023
203e03a
update tests
stephanegigandet Mar 14, 2023
88959ae
improve column names matching for packagings
stephanegigandet Mar 15, 2023
cc1c9b9
fix for weights
stephanegigandet Mar 15, 2023
2c60d1e
Merge branch 'main' into packaging-import
stephanegigandet Mar 16, 2023
dd2359e
lint
stephanegigandet Mar 16, 2023
0cc5211
Merge branch 'main' into packaging-import
stephanegigandet Mar 16, 2023
a2d83e6
Merge branch 'main' into packaging-import
stephanegigandet Mar 17, 2023
3d1cafa
refactor
stephanegigandet Mar 17, 2023
4bc3c61
refactor and better tests
stephanegigandet Mar 20, 2023
bba0882
lint
stephanegigandet Mar 20, 2023
d265a4f
packaging components on multiple lines import
stephanegigandet Mar 21, 2023
53e0e13
taxonomy changes for Les Mousquetaires / Intermarché
stephanegigandet Mar 22, 2023
5604e21
merge
stephanegigandet Mar 23, 2023
818f62a
canonicalize_tags: match Synonym 1 / Synonym 2 and Parent / Child
stephanegigandet Mar 23, 2023
7c0649a
lint
stephanegigandet Mar 23, 2023
b96129c
canonicalize_tags: match Synonym 1 / Synonym 2 and Parent / Child
stephanegigandet Mar 23, 2023
b2f9194
update tests
stephanegigandet Mar 23, 2023
71c0001
typo
stephanegigandet Mar 23, 2023
ac3124a
update tests
stephanegigandet Mar 23, 2023
0a21954
Merge branch 'main' into packaging-import
stephanegigandet Mar 28, 2023
cafd818
new test for regexp generation
stephanegigandet Mar 28, 2023
da6ea3e
update tests
stephanegigandet Mar 28, 2023
9cd9e96
disable the simple packaging tag
stephanegigandet Mar 28, 2023
7945397
new packagings facets
stephanegigandet Mar 28, 2023
e232f81
match synonym 1 (synonym 2)
stephanegigandet Mar 28, 2023
2cd5b98
update tests
stephanegigandet Mar 28, 2023
d61c6bc
les mousquetaires entries
stephanegigandet Mar 28, 2023
37c8cb6
pvdc
stephanegigandet Mar 28, 2023
106d6a3
update tests
stephanegigandet Mar 28, 2023
04f1fd4
show weight_specified
stephanegigandet Mar 29, 2023
696ab58
small fixes
stephanegigandet Mar 29, 2023
38b3473
refactor regexps
stephanegigandet Mar 29, 2023
74517d5
fix translations
stephanegigandet Mar 29, 2023
3f73d7a
fix template
stephanegigandet Mar 29, 2023
a1743af
fix regexp variable name
stephanegigandet Mar 29, 2023
2cd4143
fix refactor
stephanegigandet Mar 30, 2023
0e24de6
packaging taxonomies
stephanegigandet Mar 30, 2023
ddef3c8
update tests
stephanegigandet Mar 30, 2023
0ec0f12
lint
stephanegigandet Mar 30, 2023
36bfced
add weight_specified
stephanegigandet Mar 30, 2023
8ad279b
Merge branch 'main' into packaging-import
stephanegigandet Mar 30, 2023
2f03201
remove unused variable
stephanegigandet Mar 30, 2023
750cd8c
Merge branch 'packaging-import' of github.com:openfoodfacts/openfoodf…
stephanegigandet Mar 30, 2023
0d9faef
fix rpet
stephanegigandet Mar 30, 2023
1469dd6
Update lib/ProductOpener/Import.pm
stephanegigandet Mar 30, 2023
6a7f91f
Update lib/ProductOpener/Producers.pm
stephanegigandet Mar 30, 2023
dd211f2
remove debug print
stephanegigandet Mar 30, 2023
00522ca
Update lib/ProductOpener/Producers.pm
stephanegigandet Mar 30, 2023
829d8f6
Update lib/ProductOpener/Tags.pm
stephanegigandet Mar 30, 2023
9cc4d5e
Update tests/unit/packaging.t
stephanegigandet Mar 30, 2023
c2a2934
small fix
stephanegigandet Mar 30, 2023
57e2e25
Update lib/ProductOpener/Producers.pm
stephanegigandet Mar 30, 2023
6a0a7eb
Apply suggestions from code review
stephanegigandet Mar 30, 2023
c0f3543
suggestions from code review
stephanegigandet Mar 30, 2023
cdb9f55
Merge branch 'packaging-import' of github.com:openfoodfacts/openfoodf…
stephanegigandet Mar 30, 2023
ff0fcdd
Merge branch 'main' into packaging-import
stephanegigandet Mar 31, 2023
f38baa3
fix constant
stephanegigandet Mar 31, 2023
7c7d259
update tests
stephanegigandet Mar 31, 2023
1dd6508
update tests
stephanegigandet Apr 3, 2023
2f5053f
update tests
stephanegigandet Apr 3, 2023
8bfd76a
fix tests
stephanegigandet Apr 3, 2023
d69e5cf
update tests
stephanegigandet Apr 3, 2023
f04ba04
fix test
stephanegigandet Apr 3, 2023
67c0951
lint
stephanegigandet Apr 3, 2023
6 changes: 4 additions & 2 deletions cgi/product_multilingual.pl
@@ -118,8 +118,10 @@ ($product_ref)

my $input_packaging_ref = {};
my $prefix = "packaging_" . $packaging_id . "_";
-foreach
-my $property ("number_of_units", "shape", "material", "recycling", "quantity_per_unit", "weight_measured")
+foreach my $property (
+"number_of_units", "shape", "material", "recycling",
+"quantity_per_unit", "weight_measured", "weight_specified"
+)
{
$input_packaging_ref->{$property} = remove_tags_and_quote(decode utf8 => single_param($prefix . $property));
}
2 changes: 1 addition & 1 deletion cpanfile
@@ -68,7 +68,7 @@ requires 'Data::DeepAccess';
requires 'XML::XML2JSON';
requires 'Redis';
requires 'Digest::SHA1';

+requires 'Data::Difference';
alexgarel marked this conversation as resolved.

# Mojolicious/Minion
requires 'Mojolicious::Lite';
9 changes: 7 additions & 2 deletions lib/ProductOpener/APIProductWrite.pm
@@ -140,8 +140,13 @@ sub update_packagings ($request_ref, $product_ref, $field, $is_addition, $value)
$input_packaging_ref, $response_ref);

if (defined $packaging_ref) {
-# Add or combine with the existing packagings components array
-add_or_combine_packaging_component_data($product_ref, $packaging_ref, $response_ref);
+if (not $is_addition) {
alexgarel marked this conversation as resolved.
+push @{$product_ref->{packagings}}, $packaging_ref;
+}
+else {
+# Add or combine with the existing packagings components array
+add_or_combine_packaging_component_data($product_ref, $packaging_ref, $response_ref);
+}
}
}
}
12 changes: 8 additions & 4 deletions lib/ProductOpener/Ecoscore.pm
@@ -497,7 +497,7 @@ sub load_ecoscore_data_packaging() {
target_shape => "bottle",
target_material => "rpet",
source_shape => "bottle",
-source_material => "transparent pet"
+source_material => "transparent rpet"
},
{
target_material => "plastic",
@@ -851,9 +851,13 @@ sub compute_ecoscore ($product_ref) {
$product_ref->{ecoscore_data}{"scores"}{$cc} = 79;
}

-$log->debug("compute_ecoscore - final score and grade",
-{score => $product_ref->{"scores"}{$cc}, grade => $product_ref->{"grades"}{$cc}})
-if $log->is_debug();
+$log->debug(
+"compute_ecoscore - final score and grade",
+{
+score => $product_ref->{ecoscore_data}{"scores"}{$cc},
+grade => $product_ref->{ecoscore_data}{"grades"}{$cc}
+}
+) if $log->is_debug();
}

# The following values correspond to the Eco-Score for France.
29 changes: 16 additions & 13 deletions lib/ProductOpener/GS1.pm
@@ -744,19 +744,22 @@ my %gs1_product_to_off = (
},
],

-[
-"packaging_information:packagingInformationModule",
-{
-fields => [
-[
-"packaging",
-{
-fields => [["packagingTypeCode", "+packaging%packagingTypeCode"],],
-},
-],
-],
-},
-],
+# 20230328: this packaging field is too imprecise, and the packaging field is deprecated,
+# as we have a new packagings components structure
+#
+# [
+# "packaging_information:packagingInformationModule",
+# {
+# fields => [
+# [
+# "packaging",
+# {
+# fields => [["packagingTypeCode", "+packaging%packagingTypeCode"],],
+# },
+# ],
+# ],
+# },
+# ],

[
"packaging_marking:packagingMarkingModule",
119 changes: 112 additions & 7 deletions lib/ProductOpener/Import.pm
@@ -89,7 +89,7 @@ use ProductOpener::Ingredients qw/:all/;
use ProductOpener::Images qw/:all/;
use ProductOpener::DataQuality qw/:all/;
use ProductOpener::Data qw/:all/;
-use ProductOpener::ImportConvert qw/clean_fields clean_weights assign_quantity_from_field/;
+use ProductOpener::ImportConvert qw/:all/;
use ProductOpener::Users qw/:all/;
use ProductOpener::Orgs qw/:all/;
use ProductOpener::Data qw/:all/;
@@ -111,6 +111,7 @@ use DateTime::Format::ISO8601;
use URI;
use Digest::MD5 qw(md5_hex);
use LWP::UserAgent;
+use Data::Difference qw(data_diff);

# private function to import images from dir
# args:
@@ -527,10 +528,8 @@ sub set_field_value (

my $tagid;

-next if $tag =~ /^(\s|,|-|\%|;|_|°)*$/;
next
-if $tag
-=~ /^\s*((n(\/|\.)?a(\.)?)|(not applicable)|unknown|inconnu|inconnue|non renseigné|non applicable|nr|n\/r)\s*$/i;
+if $tag =~ /^\s*($empty_regexp|$unknown_regexp|$not_applicable_regexp)\s*$/i;

$tag =~ s/^\s+//;
$tag =~ s/\s+$//;
@@ -1107,6 +1106,97 @@ sub set_nutrition_data_per_fields ($args_ref, $imported_product_ref, $product_re
return;
}

+sub import_packaging_components (
+$args_ref, $imported_product_ref, $product_ref, $stats_ref,
+$modified_ref, $modified_fields_ref, $differing_ref, $differing_fields_ref,
+$packagings_edited_ref, $time
+)
+{
+
+my $code = $imported_product_ref->{code};
+
+# keep a deep copy of the existing packaging components, so that we can check if the resulting components are different
+my $original_packagings_ref = dclone($product_ref->{packagings} || []);
+
+# build a list of input packaging components
+my @input_packagings = ();
+my $data_is_complete = 0;
+
+# packaging data is specified in the CSV file in columns named like packaging_1_number_of_units
stephanegigandet marked this conversation as resolved.
+for (my $i = 1; $i <= 10; $i++) {
Member

Suggested change
-# packaging data is specified in the CSV file in columns named like packagings_1_number_of_units
-for (my $i = 1; $i <= 10; $i++) {
+# packaging data is specified in the CSV file in columns named like packagings_1_number_of_units
+# we currently search up to 10 components
+$IMPORT_MAX_COMPONENTS = 10;
+for (my $i = 1; $i <= $IMPORT_MAX_COMPONENTS; $i++) {

Contributor Author

Good idea

+my $input_packaging_ref = {};
+foreach
+my $field (qw(number_of_units shape material recycling quantity_per_unit weight_specified weight_measured))
+{
+$input_packaging_ref->{$field} = $imported_product_ref->{"packaging_${i}_${field}"};
+}
+$log->debug("input_packaging_ref", {i => $i, input_packaging_ref => $input_packaging_ref}) if $log->is_debug();
+
+# Taxonomize the input packaging component data
+push @input_packagings,
+get_checked_and_taxonomized_packaging_component_data($imported_product_ref->{lc}, $input_packaging_ref, {});
+
+# Record if we have complete input data, with all key fields (for at least 1 component)
+# not considered a key field (and thus may be lost): recycling instruction, quantity per unit
+if (
+(defined $input_packaging_ref->{number_of_units})
+and (defined $input_packaging_ref->{shape})
+and (defined $input_packaging_ref->{material})
+and
+((defined $input_packaging_ref->{weight_specified}) or (defined $input_packaging_ref->{weight_measured}))
+)
+{
+$data_is_complete = 1;
+}
+}
+
+if ($data_is_complete) {
+# We seem to have complete data, replace existing data
+$product_ref->{packagings} = \@input_packagings;
Comment on lines +1159 to +1161
Member

$data_is_complete only tells you that at least one line is complete. Is this enough to consider the data complete?

Contributor Author

Merging packaging data is very tricky and very likely to generate duplicates, so if we have weights from the producer, for at least one component, I think it's better to replace the whole structure.

+}
+else {
+# We have partial data, that may be missing fields like number of units, weight etc.
+# In that case, we try to merge the input components with the existing components
+# so that we don't lose user entered data such as weights
+# This may result in some components being duplicated, if the existing component and
+# the input component have incompatible fields (e.g. if one is a "tray" and the other a "box",
+# even though they refer to the same thing)
+
+foreach my $input_packaging_ref (@input_packagings) {
+add_or_combine_packaging_component_data($product_ref, $input_packaging_ref, {});
+}
+}
+
+# Check if the packagings data has changed
+my @diffs = data_diff($original_packagings_ref, $product_ref->{packagings});
+if (scalar @diffs > 0) {
+$log->debug(
+"packagings diff",
+{
+original_packagings => $original_packagings_ref,
+input_packagings => \@input_packagings,
+new_packagings => $product_ref->{packagings},
+data_is_complete => $data_is_complete,
+diffs => \@diffs
+}
+) if $log->is_debug();
+$stats_ref->{products_packagings_updated}{$code} = 1;
+if (scalar @$original_packagings_ref == 0) {
+$stats_ref->{products_packagings_created}{$code} = 1;
+}
+else {
+$stats_ref->{products_packagings_changed}{$code} = 1;
+}
+$$modified_ref++;
+$packagings_edited_ref->{$code}++;
+# push @$modified_fields_ref, "nutrients.$field";
+}
+
+# Update the packagings_complete_field
+
+return;
+}
+
=head2 import_csv_file ( ARGUMENTS )

C<import_csv_file()> imports product data in the Open Food Facts CSV format
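
The replace-or-merge rule applied by import_packaging_components above can be illustrated on its own. This is a minimal sketch rather than the PR's code: the helper names component_is_complete and packagings_import_strategy are hypothetical, and the field values are plain strings instead of taxonomized tags. In the real function the "merge" branch goes through add_or_combine_packaging_component_data.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the PR): a component counts as complete when it
# has a number of units, a shape, a material, and at least one weight.
sub component_is_complete {
    my ($component_ref) = @_;
    my $has_weight = (defined $component_ref->{weight_specified})
        || (defined $component_ref->{weight_measured});
    my $complete
        = (defined $component_ref->{number_of_units})
        && (defined $component_ref->{shape})
        && (defined $component_ref->{material})
        && $has_weight;
    return $complete ? 1 : 0;
}

# Hypothetical helper: "replace" as soon as one input component is complete,
# "merge" otherwise, mirroring the $data_is_complete flag in the diff.
sub packagings_import_strategy {
    my (@components) = @_;
    foreach my $component_ref (@components) {
        return "replace" if component_is_complete($component_ref);
    }
    return "merge";
}

my @partial = ({shape => "bottle", material => "plastic"});
my @complete = (
    {shape => "bottle"},
    {number_of_units => 1, shape => "box", material => "cardboard", weight_specified => 25},
);

print packagings_import_strategy(@partial), "\n";     # merge
print packagings_import_strategy(@complete), "\n";    # replace
```

One complete component is enough to switch to "replace", which matches the trade-off described in the review thread: merging risks duplicates, so producer data that includes weights overwrites the whole structure.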
@@ -1315,7 +1405,11 @@ sub import_csv_file ($args_ref) {

$log->debug("importing products", {}) if $log->is_debug();

-open(my $io, '<:encoding(UTF-8)', $args_ref->{csv_file}) or die("Could not open " . $args_ref->{csv_file} . ": $!");
+my $io;
+if (not open($io, '<:encoding(UTF-8)', $args_ref->{csv_file})) {
+$stats_ref->{error} = "Could not open " . $args_ref->{csv_file} . ": $!";
+return $stats_ref;
+}

# first line contains headers
my $columns_ref = $csv->getline($io);
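
The change above swaps a fatal die for an error recorded in the stats hash, so the caller can decide how to react. A minimal sketch of the same pattern in isolation, assuming a hypothetical wrapper open_csv_or_record_error (the PR inlines this logic in import_csv_file and returns $stats_ref instead):

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns a filehandle on success,
# or undef after storing a message in $stats_ref->{error}.
sub open_csv_or_record_error {
    my ($csv_file, $stats_ref) = @_;
    my $io;
    if (not open($io, '<:encoding(UTF-8)', $csv_file)) {
        $stats_ref->{error} = "Could not open " . $csv_file . ": $!";
        return;
    }
    return $io;
}

my $stats_ref = {};
my $io = open_csv_or_record_error("/nonexistent/dir/products.csv", $stats_ref);
if (not defined $io) {
    # The caller chooses what to do: log, report to the user, skip the file...
    print "import failed: $stats_ref->{error}\n";
}
```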
@@ -1330,6 +1424,7 @@ sub import_csv_file ($args_ref) {
my @edited = ();
my %edited = ();
my %nutrients_edited = ();
+my %packagings_edited = ();
my $skip_not_existing = 0;
my $skip_no_images = 0;

@@ -1344,7 +1439,7 @@ sub import_csv_file ($args_ref) {
$i++;

# By default, use the orgid passed in the arguments
-# it may be overrode later on a per product basis
+# it may be overridden later on a per product basis
my $org_id = $args_ref->{org_id};
my $org_ref;

@@ -1573,7 +1668,7 @@ sub import_csv_file ($args_ref) {
$Owner_id = get_owner_id($User_id, $Org_id, $args_ref->{owner_id});
my $product_id = product_id_for_owner($Owner_id, $code);

-# The userid can be overrode on a per product basis
+# The userid can be overridden on a per product basis
# when we import data from the producers platform to the public platform
# we use the orgid as the userid
my $user_id = $args_ref->{user_id};
@@ -1982,6 +2077,16 @@ sub import_csv_file ($args_ref) {

set_nutrition_data_per_fields($args_ref, $imported_product_ref, $product_ref, $stats_ref, \$modified,);

+# Packaging data
+
+import_packaging_components(
+$args_ref, $imported_product_ref, $product_ref, $stats_ref,
+\$modified, \@modified_fields, \$differing, \%differing_fields,
+\%packagings_edited, $time,
+);
+
+# Compute extra stats
+
if ((defined $stats_ref->{products_info_added}{$code}) or (defined $stats_ref->{products_info_changed}{$code}))
{
$stats_ref->{products_info_updated}{$code} = 1;
20 changes: 17 additions & 3 deletions lib/ProductOpener/ImportConvert.pm
@@ -55,6 +55,12 @@ BEGIN {
use vars qw(@ISA @EXPORT_OK %EXPORT_TAGS);
@EXPORT_OK = qw(

+$empty_regexp
+$unknown_regexp
+$not_applicable_regexp
+$none_regexp
+$empty_unknown_not_applicable_or_none_regexp
+
%fields
@fields
%products
@@ -122,6 +128,15 @@ use XML::Rules;

my $mode = "append";

+# Regular expressions that can be combined to match specific inputs
+$empty_regexp = '(?:,|\%|;|_|°|-|\/|\\\\|\.|\s)*';
+$unknown_regexp = 'unknown|inconnu|inconnue|non renseigné(?:e)?(?:s)?|nr|n\/r';
+$not_applicable_regexp = 'n(?:\/|\\\\|\.|-)?a(?:\.)?|(?:not|non)(?: |-)applicable|no aplica';
+$none_regexp = 'none|aucun|aucune|aucun\(e\)';
Comment on lines +133 to +136
Member

really cool.


+$empty_unknown_not_applicable_or_none_regexp
+= join('|', ($empty_regexp, $unknown_regexp, $not_applicable_regexp, $none_regexp));

=head1 FUNCTIONS

=cut
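
How the shared fragments combine can be shown with a short standalone sketch. The fragments below are shortened, ASCII-only variants of the ones in the diff, and is_placeholder is a hypothetical helper, not a ProductOpener function. One detail to watch when writing such fragments: in a single-quoted Perl string, `\\` collapses to one backslash, so a literal backslash in the final regexp has to be written as `\\\\`.

```perl
use strict;
use warnings;

# Shortened fragments modelled on the diff (assumption: ASCII-only subset).
# '\\\\' in a single-quoted string yields \\ in the regexp: a literal backslash.
my $unknown_regexp        = 'unknown|inconnu|inconnue|nr|n\/r';
my $not_applicable_regexp = 'n(?:\/|\\\\|\.|-)?a(?:\.)?|(?:not|non)(?: |-)applicable|no aplica';
my $none_regexp           = 'none|aucun|aucune';

# Hypothetical helper: true if the whole value is a placeholder.
# The (?: ) group keeps ^ and $ applied to every alternative.
sub is_placeholder {
    my ($value, $treat_none_as_placeholder) = @_;
    my $regexp = join('|', $unknown_regexp, $not_applicable_regexp);
    $regexp .= '|' . $none_regexp if $treat_none_as_placeholder;
    return $value =~ /^\s*(?:$regexp)\s*$/i ? 1 : 0;
}

print is_placeholder("N/A"),     "\n";    # 1
print is_placeholder(" n.a. "),  "\n";    # 1
print is_placeholder("banana"),  "\n";    # 0
print is_placeholder("none"),    "\n";    # 0: "none" can be a useful value
print is_placeholder("none", 1), "\n";    # 1
```

Keeping "none" in its own fragment is what lets clean_fields skip it for allergens and traces while still removing it from other fields.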
@@ -1124,12 +1139,11 @@ sub clean_fields ($product_ref) {

# remove N, N/A, NA etc.
# but not "no", "none" that are useful values (e.g. for specific labels "organic:no", allergens : "none")
-$product_ref->{$field}
-=~ s/(^|,)\s*((n(\/|\.)?a(\.)?)|(not applicable)|unknown|inconnu|inconnue|non renseigné|non applicable|no aplica|nr|n\/r)\s*(,|$)//ig;
+$product_ref->{$field} =~ s/(^|,)\s*($unknown_regexp|$not_applicable_regexp)\s*(,|$)//ig;

# remove none except for allergens and traces
if ($field !~ /allergens|traces/) {
-$product_ref->{$field} =~ s/(^|,)\s*(none|aucun|aucune|aucun\(e\))\s*(,|$)//ig;
+$product_ref->{$field} =~ s/(^|,)\s*($none_regexp)\s*(,|$)//ig;
}

if ( ($field =~ /_fr/)