fix: japanese parenthesis #8381

Merged 9 commits on May 3, 2023
Changes from 6 commits
48 changes: 48 additions & 0 deletions docs/dev/how-to-develop-using-perl.md
@@ -0,0 +1,48 @@
# How can I learn the Perl programming language?

Here are some introductory resources to learn Perl:

## Quick start

- [Perl YouTube Tutorial](https://www.youtube.com/watch?v=c0k9ieKky7Q) - Perl Enough to be dangerous // FULL COURSE 3 HOURS.
- [Perl - Introduction](https://www.tutorialspoint.com/perl/perl_quick_guide.htm) - Introduction to Perl from Tutorialspoint.
- [Impatient Perl](https://blob.perl.org/books/impatient-perl/iperl.pdf) - PDF document for people interested in learning Perl.

## Official Documentation

- [Perl.org](https://www.perl.org/) - Official Perl website with documentation, tutorials, and community resources.
- [Learn Perl](https://learn.perl.org/) - Perl programming language tutorials for beginners.
- [Perl Maven](https://perlmaven.com/) - Perl programming tutorials, tips, and code examples.

# See the logs while running Perl locally
Review comment (Member):

Great doc.

But I would have made this a specific how-to, with a separate "how-to-develop-using-perl" page that links to how to learn Perl, how to use the logs, and how-to-write-and-run-tests.md, how-to-use-repl.md and how-to-use-vscode.md.

## Types of logs
### Logs that are always printed
These logs look like this:
```
$log->debug("extracting ingredients from text", {text => $text})
if $log->is_debug();
```
or this:
```
$log->trace("compare_nutriments", {nid => $nid}) if $log->is_trace();
```

### Logs that you have to activate
These logs are not printed by default. To activate them, edit the corresponding variable. For example, in **Ingredients.pm**, set the following variable to 1:
```my $debug_ingredients = 0;```

This type of log is found in **Ingredients.pm** and **Tags.pm**.
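
Before editing a module, it can help to locate the line that declares the flag. The snippet below is a minimal sketch of that search. It runs against a temporary stand-in file (its two lines are made up for illustration) rather than the real **Ingredients.pm**, whose path and exact contents are not assumed here.

```shell
# Hypothetical stand-in for the real module file
sample_module="$(mktemp)"
printf '%s\n' 'use strict;' 'my $debug_ingredients = 0;' > "$sample_module"

# -n prefixes each match with its line number; \$ matches a literal dollar sign
flag_line="$(grep -n 'my \$debug_ingredients' "$sample_module")"
echo "$flag_line"

rm -f "$sample_module"
```

Running the same `grep -n` against the real module should point to the line to change from 0 to 1.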

## See the logs
There is a make command to see (all) logs:

```make tail```

If you only want to see Perl-related logs, you can either temporarily edit the **Makefile** (replace ```tail -f logs/**/*``` with the command below; **do not forget to roll back the change!**) or run the following command directly in a terminal:

```tail -f logs/apache2/log4perl.log```


Sometimes you want to focus on specific log messages from the code. In that case, you can combine the tail and grep commands to find specific text in the logs. For example, this command shows all log lines containing the text "found the first separator":

```tail -f logs/apache2/log4perl.log | grep -a "found the first separator"```
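
The same filtering can be tried without a running server. The sketch below writes a small sample log to a temporary file and filters it with the same `grep -a` call. The file and its three lines are invented for illustration and are not real Product Opener output.

```shell
# Hypothetical sample log; the real file would be logs/apache2/log4perl.log
sample_log="$(mktemp)"
printf '%s\n' \
  'DEBUG analyze_ingredients_function' \
  'DEBUG found the first separator' \
  'TRACE compare_nutriments' > "$sample_log"

# -a treats the input as text even if it contains binary-looking bytes
matches="$(grep -a "found the first separator" "$sample_log")"
echo "$matches"

# -c counts the matching lines instead of printing them
match_count="$(grep -a -c "found the first separator" "$sample_log")"
echo "$match_count"

rm -f "$sample_log"
```

When the grep output of a live `tail -f` is piped into a further command, adding grep's `--line-buffered` option is a common way to keep matches appearing promptly.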
15 changes: 0 additions & 15 deletions docs/dev/how-to-learn-perl.md

This file was deleted.

23 changes: 19 additions & 4 deletions lib/ProductOpener/Ingredients.pm
@@ -138,11 +138,23 @@ my $commas = qr/(?:\N{U+002C}|\N{U+FE50}|\N{U+FF0C}|\N{U+3001}|\N{U+FE51}|\N{U+F
my $stops = qr/(?:\N{U+002E}|\N{U+FE52}|\N{U+FF0E}|\N{U+3002}|\N{U+FE61})/i;

# '(' and other opening brackets ('Punctuation, Open' without QUOTEs)
# U+201A "‚" (Single Low-9 Quotation Mark)
# U+201E "„" (Double Low-9 Quotation Mark)
# U+276E "❮" (Heavy Left-Pointing Angle Quotation Mark Ornament)
# U+2E42 "⹂" (Double Low-Reversed-9 Quotation Mark)
# U+301D "〝" (Reversed Double Prime Quotation Mark)
# U+FF08 "（" (Fullwidth Left Parenthesis) used in some countries (Japan)
my $obrackets = qr/(?![\N{U+201A}|\N{U+201E}|\N{U+276E}|\N{U+2E42}|\N{U+301D}|\N{U+FF08}])[\p{Ps}]/i;

# ')' and other closing brackets ('Punctuation, Close' without QUOTEs)
# U+276F "❯" (Heavy Right-Pointing Angle Quotation Mark Ornament)
# U+301E "〞" (Double Prime Quotation Mark)
# U+301F "〟" (Low Double Prime Quotation Mark)
# U+FF09 "）" (Fullwidth Right Parenthesis) used in some countries (Japan)
my $cbrackets = qr/(?![\N{U+276F}|\N{U+301E}|\N{U+301F}|\N{U+FF09}])[\p{Pe}]/i;

my $separators_except_comma = qr/(;|:|$middle_dot|\[|\{|\(|( $dashes ))|(\/)/i
# U+FF0F "／" (Fullwidth Solidus) used in some countries (Japan)
my $separators_except_comma = qr/(;|:|$middle_dot|\[|\{|\(|\N{U+FF08}|( $dashes ))|(\/|\N{U+FF0F})/i
; # separators include the dot . followed by a space, but we don't want to separate 1.4 etc.

my $separators = qr/($stops\s|$commas|$separators_except_comma)/i;
@@ -1268,8 +1280,7 @@ sub parse_ingredients_text ($product_ref) {
my $processing = '';

$debug_ingredients and $log->debug("analyze_ingredients_function", {string => $s}) if $log->is_debug();

# find the first separator or ( or [ or :
# find the first separator or ( or [ or : etc.
if ($s =~ $separators) {

$before = $`;
@@ -1283,7 +1294,7 @@ sub parse_ingredients_text ($product_ref) {

# If the first separator is a colon : or a start of parenthesis etc. we may have sub ingredients

if ($sep =~ /(:|\[|\{|\()/i) {
if ($sep =~ /(:|\[|\{|\(|\N{U+FF08})/i) {

# Single separators like commas and dashes
my $match = '.*?'; # non greedy match
@@ -1305,6 +1316,10 @@ sub parse_ingredients_text ($product_ref) {
elsif ($sep eq '{') {
$ending = '\}';
}
# bracket type used in some countries (Japan): "（" and "）"
elsif ($sep =~ '\N{U+FF08}') {
$ending = '\N{U+FF09}';
}

$ending = '(' . $ending . ')';

242 changes: 242 additions & 0 deletions tests/unit/expected_test_results/ingredients/jp-parenthesis.json
@@ -0,0 +1,242 @@
{
"ingredients" : [
{
"id" : "jp:しょうゆ",
"ingredients" : [
{
"id" : "jp:本醸造",
"percent_estimate" : 55,
"percent_max" : 100,
"percent_min" : 10,
"text" : "本醸造"
}
],
"percent_estimate" : 55,
"percent_max" : 100,
"percent_min" : 10,
"text" : "しょうゆ"
},
{
"id" : "jp:糖類",
"ingredients" : [
{
"id" : "jp:ぶどう糖果糖液糖",
"percent_estimate" : 11.25,
"percent_max" : 50,
"percent_min" : 0,
"text" : "ぶどう糖果糖液糖"
},
{
"id" : "jp:水あめ",
"percent_estimate" : 5.625,
"percent_max" : 25,
"percent_min" : 0,
"text" : "水あめ"
},
{
"id" : "jp:砂糖",
"percent_estimate" : 5.625,
"percent_max" : 16.6666666666667,
"percent_min" : 0,
"text" : "砂糖"
}
],
"percent_estimate" : 22.5,
"percent_max" : 50,
"percent_min" : 0,
"text" : "糖類"
},
{
"id" : "jp:みりん",
"percent_estimate" : 11.25,
"percent_max" : 33.3333333333333,
"percent_min" : 0,
"text" : "みりん"
},
{
"id" : "jp:食塩",
"percent_estimate" : 5.625,
"percent_max" : 25,
"percent_min" : 0,
"text" : "食塩"
},
{
"id" : "jp:かつお節",
"percent_estimate" : 2.8125,
"percent_max" : 20,
"percent_min" : 0,
"text" : "かつお節"
},
{
"id" : "jp:さば節",
"percent_estimate" : 1.40625,
"percent_max" : 16.6666666666667,
"percent_min" : 0,
"text" : "さば節"
},
{
"id" : "jp:たん白加水分解物混合物",
"percent_estimate" : 0.703125,
"percent_max" : 14.2857142857143,
"percent_min" : 0,
"text" : "たん白加水分解物混合物"
},
{
"id" : "jp:こんぶ",
"percent_estimate" : 0.3515625,
"percent_max" : 12.5,
"percent_min" : 0,
"text" : "こんぶ"
},
{
"id" : "jp:調味料",
"ingredients" : [
{
"id" : "jp:アミノ酸等",
"percent_estimate" : 0.17578125,
"percent_max" : 11.1111111111111,
"percent_min" : 0,
"text" : "アミノ酸等"
}
],
"percent_estimate" : 0.17578125,
"percent_max" : 11.1111111111111,
"percent_min" : 0,
"text" : "調味料"
},
{
"id" : "jp:アルコール",
"percent_estimate" : 0.17578125,
"percent_max" : 10,
"percent_min" : 0,
"text" : "アルコール"
}
],
"ingredients_analysis" : {
"en:palm-oil-content-unknown" : [
"jp:しょうゆ",
"jp:本醸造",
"jp:糖類",
"jp:ぶどう糖果糖液糖",
"jp:水あめ",
"jp:砂糖",
"jp:みりん",
"jp:食塩",
"jp:かつお節",
"jp:さば節",
"jp:たん白加水分解物混合物",
"jp:こんぶ",
"jp:調味料",
"jp:アミノ酸等",
"jp:アルコール"
],
"en:vegan-status-unknown" : [
"jp:しょうゆ",
"jp:本醸造",
"jp:糖類",
"jp:ぶどう糖果糖液糖",
"jp:水あめ",
"jp:砂糖",
"jp:みりん",
"jp:食塩",
"jp:かつお節",
"jp:さば節",
"jp:たん白加水分解物混合物",
"jp:こんぶ",
"jp:調味料",
"jp:アミノ酸等",
"jp:アルコール"
],
"en:vegetarian-status-unknown" : [
"jp:しょうゆ",
"jp:本醸造",
"jp:糖類",
"jp:ぶどう糖果糖液糖",
"jp:水あめ",
"jp:砂糖",
"jp:みりん",
"jp:食塩",
"jp:かつお節",
"jp:さば節",
"jp:たん白加水分解物混合物",
"jp:こんぶ",
"jp:調味料",
"jp:アミノ酸等",
"jp:アルコール"
]
},
"ingredients_analysis_tags" : [
"en:palm-oil-content-unknown",
"en:vegan-status-unknown",
"en:vegetarian-status-unknown"
],
"ingredients_hierarchy" : [
"jp:しょうゆ",
"jp:糖類",
"jp:みりん",
"jp:食塩",
"jp:かつお節",
"jp:さば節",
"jp:たん白加水分解物混合物",
"jp:こんぶ",
"jp:調味料",
"jp:アルコール",
"jp:本醸造",
"jp:ぶどう糖果糖液糖",
"jp:水あめ",
"jp:砂糖",
"jp:アミノ酸等"
],
"ingredients_n" : 15,
"ingredients_n_tags" : [
"15",
"11-20"
],
"ingredients_original_tags" : [
"jp:しょうゆ",
"jp:糖類",
"jp:みりん",
"jp:食塩",
"jp:かつお節",
"jp:さば節",
"jp:たん白加水分解物混合物",
"jp:こんぶ",
"jp:調味料",
"jp:アルコール",
"jp:本醸造",
"jp:ぶどう糖果糖液糖",
"jp:水あめ",
"jp:砂糖",
"jp:アミノ酸等"
],
"ingredients_percent_analysis" : 1,
"ingredients_tags" : [
"jp:しょうゆ",
"jp:糖類",
"jp:みりん",
"jp:食塩",
"jp:かつお節",
"jp:さば節",
"jp:たん白加水分解物混合物",
"jp:こんぶ",
"jp:調味料",
"jp:アルコール",
"jp:本醸造",
"jp:ぶどう糖果糖液糖",
"jp:水あめ",
"jp:砂糖",
"jp:アミノ酸等"
],
"ingredients_text" : "しょうゆ（本醸造）、糖類（ぶどう糖果糖液糖、水あめ、砂糖）、みりん、食塩、かつお節、さば節、たん白加水分解物混合物、こんぶ、調味料（アミノ酸等）、アルコール",
"ingredients_with_specified_percent_n" : 0,
"ingredients_with_specified_percent_sum" : 0,
"ingredients_with_unspecified_percent_n" : 12,
"ingredients_with_unspecified_percent_sum" : 100,
"known_ingredients_n" : 0,
"lc" : "jp",
"nutriments" : {
"fruits-vegetables-nuts-estimate-from-ingredients_100g" : 0,
"fruits-vegetables-nuts-estimate-from-ingredients_serving" : 0
},
"unknown_ingredients_n" : 15
}