Problem
I'm always frustrated when the OCR extracts ingredients from a different language.
Proposed solution
Thanks to the language setting, the OCR already knows where to start the extraction. When it then hits the ingredients keyword in a different language, it should stop the extraction.
The OCR extracts all the text. The text is then processed: using so-called stopwords, everything before and after the ingredients list is cut.
In this case it removes everything before "Zutaten" (other words can also mark the start) and everything after a stopword expected at the end of the list (such as "keep in a dry place").
This does not work every time; it depends on which stopwords are known.
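The cutting step described above can be sketched roughly as follows. This is only an illustration in Python, not the project's actual code: the `START_KEYWORDS` and `END_STOPWORDS` tables, their entries, and the `cut_ingredients` function are all hypothetical names chosen for the example.

```python
import re

# Hypothetical stopword tables for illustration; the real lists live in the project.
START_KEYWORDS = {"de": [r"zutaten"]}
END_STOPWORDS = {"de": [r"trocken lagern"]}


def cut_ingredients(text: str, lang: str) -> str:
    """Keep only the text between the start keyword and the first end stopword."""
    start_pattern = "|".join(START_KEYWORDS.get(lang, []))
    if start_pattern:
        # Drop everything up to and including the start keyword and an optional colon.
        start = re.search(rf"(?:{start_pattern})\s*:?\s*", text, flags=re.IGNORECASE)
        if start:
            text = text[start.end():]
    end_pattern = "|".join(END_STOPWORDS.get(lang, []))
    if end_pattern:
        # Drop everything from the first known end stopword onwards.
        end = re.search(end_pattern, text, flags=re.IGNORECASE)
        if end:
            text = text[:end.start()]
    return text.strip(" .:;,\n")
```

With these assumed tables, a label that ends in a known stopword is trimmed correctly, while a label that continues in French is not, because no end stopword matches the French heading.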
In this particular example, I guess "ingredients" is not a German word, so we could add it to the stopwords for the German language.
(Maybe, and I'm just sharing some thoughts, we could add all the stopwords used before the ingredients list as stopwords after it. I don't know if it would work. That would need some investigation (for example, cases where the same word for ingredients is used in different languages would be problematic). Ping @stephane, @aleene.)
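To make that idea concrete, here is a rough Python sketch of how end stopwords could be derived from the other languages' start keywords while skipping words shared with the current language. The `START_KEYWORDS` table and the `foreign_end_stopwords` function are hypothetical, and the approach is untested against real product data.

```python
# Hypothetical start keywords per language, for illustration only.
START_KEYWORDS = {
    "de": {"zutaten"},
    "fr": {"ingrédients"},
    "en": {"ingredients"},
    "es": {"ingredientes"},
    "pt": {"ingredientes"},
}


def foreign_end_stopwords(lang: str) -> set[str]:
    """Collect other languages' start keywords to use as end stopwords for `lang`."""
    own = START_KEYWORDS.get(lang, set())
    foreign = set()
    for other_lang, keywords in START_KEYWORDS.items():
        if other_lang != lang:
            foreign |= keywords
    # A word that also starts the list in `lang` itself must not end it.
    return foreign - own
```

Here Spanish and Portuguese share "ingredientes", so that word is left out of the end stopwords for either language. This avoids cutting the list at its own heading, but it also means a Spanish list followed by a Portuguese one would still not be separated.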
At least for the particular example in your issue, @github-throwaway, you can add "Ingr(e|é)dients" to the German stopwords.
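As a quick check of what that pattern would do, here is a small Python sketch. The `GERMAN_END_STOPWORDS` list and the `trim_after_stopword` function are assumptions made for the example; only the "Ingr(e|é)dients" pattern comes from the suggestion above, and case-insensitive matching is assumed.

```python
import re

# Assumed German end-of-list stopwords, with the suggested pattern appended.
GERMAN_END_STOPWORDS = [r"trocken lagern", r"ingr(e|é)dients"]
END_RE = re.compile("|".join(GERMAN_END_STOPWORDS), flags=re.IGNORECASE)


def trim_after_stopword(ingredients: str) -> str:
    """Cut the text at the first end stopword, if any is found."""
    match = END_RE.search(ingredients)
    if match:
        ingredients = ingredients[:match.start()]
    return ingredients.strip(" .:;,\n")
```

With the pattern in place, a German list followed by a French or English "Ingrédients"/"Ingredients" heading is cut at that heading, and text without any stopword is returned unchanged apart from trailing punctuation.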
Time per product
4 seconds saved.
Video example of problem
trim.3634E555-DDBF-4A2D-B2A6-B78E57AB58E0.MOV
Part of
#9096