taxonomy: added and removed stopwords for ingredients in HR #7987

Merged on Jan 18, 2023 (8 commits)

benbenben2
Collaborator

Updates are made after the following observations:

može sadržavati tragove
Sadrži
"Popis sastojaka"
"naziv proizvoda"
Removed "pasterizirano", because an ingredient can be "Nehomogenizirano pasterizirano mlijeko".
Removed "HR", because when the address comes after the ingredients ("sastojci blablabla HR blablabla2"), the ingredient list is taken to start after "HR", leaving only "blablabla2".
"Čuvati zatvoreno na"
"Cuvati" (variant with the accent missing)
"Čuvati pri sobnoj temperaturi."

"Izvor dijetalnih vlakana"
Proizvod je termički obrađen-pasteriziran
"Prosječne hranjive vrijednosti"
"Prosječne nutritivne vrijednosti"
"Protresti prije otvaranja"
"Uvoznik za"
"zaštićena oznaka zemljopisnog podrijetla"
"Zemlja podrijetla"
"Zemlja porekla kakao mase EU"
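
The two removals above can be illustrated with a small sketch. This is Python written only for illustration, not the project's Perl code; the helper names `cut_before_marker` and `drop_phrases` are hypothetical, and the matching rules are an assumption about how such stopwords are typically applied.

```python
import re

def cut_before_marker(text, markers):
    """Keep only the text that follows the last marker found (hypothetical helper)."""
    pattern = r"\b(?:" + "|".join(re.escape(m) for m in markers) + r")\b[\s:]*"
    last_end = 0
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        last_end = match.end()
    return text[last_end:]

def drop_phrases(text, phrases):
    """Delete whole-word phrases and collapse the leftover spaces (hypothetical helper)."""
    pattern = r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b"
    stripped = re.sub(pattern, "", text, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", stripped).strip()

text = "sastojci blablabla HR blablabla2"

# With "HR" treated as a marker, the real ingredients are cut away.
with_hr = cut_before_marker(text, ["sastojci", "HR"])

# Without "HR", the ingredient list is kept.
without_hr = cut_before_marker(text, ["sastojci"])

# Treating "pasterizirano" as a stopword changes a legitimate ingredient name.
damaged = drop_phrases("Nehomogenizirano pasterizirano mlijeko", ["pasterizirano"])

print(repr(with_hr))
print(repr(without_hr))
print(repr(damaged))
```

In this sketch, a short token such as "HR" is risky as a marker because it can also appear in an address or country code after the ingredients.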

@benbenben2 benbenben2 requested a review from a team as a code owner January 14, 2023 20:08
@benbenben2 benbenben2 self-assigned this Jan 14, 2023
@github-actions github-actions bot added 🥗 Ingredients 🥗🔍 Ingredients analysis https://wiki.openfoodfacts.org/Ingredients_Extraction_and_Analysis labels Jan 14, 2023
Contributor

@stephanegigandet stephanegigandet left a comment

Thanks

@benbenben2
Collaborator Author

Thanks

having trouble with perl tidy :-(

@alexgarel
Member

@benbenben2 just run make lint_perltidy and commit changes.

@alexgarel
Member

alexgarel commented Jan 16, 2023

@stephanegigandet this is the error with perl tidy:

Use of strings with code points over 0xFF as arguments to bitwise xor (^) operator is not allowed at /opt/perl/local/lib/perl5/Perl/Tidy.pm line 2778.

Is this linked to a character used by @benbenben2?

@benbenben2, UTF-8 is a bit strange sometimes: for example, you might have the character e followed by ` as a combining modifier, or è directly (they look the same, but they are not the same). The problem here is with one of the characters you typed.
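
As a minimal illustration of that point (Python's standard `unicodedata` module, used here only as a demonstration and not part of the project's Perl tooling), the two forms of "è" render identically but compare as different strings until they are normalized:

```python
import unicodedata

composed = "\u00e8"     # "è" as a single precomposed code point
decomposed = "e\u0300"  # "e" followed by COMBINING GRAVE ACCENT

# The two strings look the same on screen but are not equal.
same_before = composed == decomposed

# Lengths differ: one code point versus two.
lengths = (len(composed), len(decomposed))

# Normalizing to NFC composes the pair into the single code point.
same_after = unicodedata.normalize("NFC", decomposed) == composed

print(same_before, lengths, same_after)  # False (1, 2) True
```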

One thing you can try is to use re.pl:

  • add export CPANMOPTS="--with-develop --with-feature=off_server_dev_tools" to your .envrc (if you use direnv or the Makefile), or directly to .env (but do not commit it)
  • rebuild
  • then run docker-compose run --rm backend re.pl

It gives you a sandbox to test your expressions.

@alexgarel
Member

@benbenben2 or maybe try to copy the first "ž" already present instead of retyping it in the rest of the sentence, because your ž might be different and cause the problem (wild guess)
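
One way to check this guess is to scan the edited text for combining marks. The sketch below is Python for illustration only; `find_combining_marks` is a hypothetical helper, and the "retyped" string is a made-up example of a "ž" entered as "z" plus a combining caron, not text taken from the PR.

```python
import unicodedata

def find_combining_marks(text):
    """Return (index, preceding character, mark name) for each combining mark."""
    hits = []
    for i, ch in enumerate(text):
        if unicodedata.combining(ch):
            prev = text[i - 1] if i > 0 else ""
            hits.append((i, prev, unicodedata.name(ch, "UNKNOWN")))
    return hits

# "ž" as the single code point U+017E in both places.
precomposed = "mo\u017ee sadr\u017eavati tragove"

# Hypothetical retyped variant: first "ž" entered as "z" + U+030C.
retyped = "moz\u030ce sadr\u017eavati tragove"

print(find_combining_marks(precomposed))
print(find_combining_marks(retyped))
```

An empty result means every accented letter is precomposed. A non-empty result points at the position to replace with a copied character.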

@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

No Coverage information
No Duplication information

@stephanegigandet stephanegigandet merged commit 7728b21 into main Jan 18, 2023
@stephanegigandet stephanegigandet deleted the hr_add_stopwords_in_ingredients_pm_2 branch January 18, 2023 08:36