Restrict Text Recognition to Digits Only #1876
Hi, I am trying to run text recognition on a table that contains only digits. Sometimes the digits are recognized as English letters. Is there a way to restrict text recognition to digits only? Thanks!
Replies: 1 comment 5 replies
Hi @jidalii 👋, Unfortunately, we don’t have built-in logic for blacklisting or whitelisting characters yet. I’ve tried several approaches, including:
However, I wasn’t really satisfied with the results. In some cases, the model started predicting seemingly random characters to fill the restricted positions (e.g., when "a" was blacklisted, instead of choosing the closer "á", it predicted "R"). Maybe @frgfm has some ideas I haven’t tested yet? 🤗 In general, I think the simplest approach would be to introduce a
However, you have to "know what you are doing" with a LogitsProcessor. I think not many people would use it, or it would need extensive explanation and/or help.
Something like
ocr_predictor(.. whitelist=VOCABS["french"] + VOCABS["german"])
for example, seems more user-friendly, but it would require more robust logic on our end 😅
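To make the whitelist idea concrete, here is a minimal sketch in plain Python. It is not docTR's API: `mask_logits`, `greedy_ctc_decode`, the toy vocabulary, and the toy logits are all hypothetical, and the sketch assumes a CTC-style recognition head whose blank class sits at the last index. It masks every non-whitelisted character before best-path decoding.

```python
import math


def mask_logits(logits, vocab, whitelist):
    """Set the score of every non-whitelisted character to -inf.

    Assumption: index len(vocab) is the CTC blank, which is always kept.
    """
    allowed = set(whitelist)
    return [
        [
            score if (idx >= len(vocab) or vocab[idx] in allowed) else -math.inf
            for idx, score in enumerate(step)
        ]
        for step in logits
    ]


def greedy_ctc_decode(logits, vocab):
    """Best-path CTC decoding: argmax per step, collapse repeats, drop blanks."""
    blank = len(vocab)
    chars, prev = [], blank
    for step in logits:
        idx = max(range(len(step)), key=step.__getitem__)
        if idx != prev and idx != blank:
            chars.append(vocab[idx])
        prev = idx
    return "".join(chars)


# Toy data (made up): three timesteps over the vocab "015OlS" plus a blank.
vocab = "015OlS"
logits = [
    [2.0, 0.1, 0.0, 3.0, 0.0, 0.0, 0.5],  # "O" scores slightly above "0"
    [0.0, 2.5, 0.0, 0.0, 3.0, 0.0, 0.2],  # "l" scores above "1"
    [0.0, 0.0, 2.2, 0.0, 0.0, 2.9, 0.1],  # "S" scores above "5"
]

print(greedy_ctc_decode(logits, vocab))  # OlS
print(greedy_ctc_decode(mask_logits(logits, vocab, "0123456789"), vocab))  # 015
```

Masking only promotes whichever allowed class has the next-highest score. That works here because each digit is the runner-up, but nothing guarantees the runner-up is the visually closest character, which may be related to the "R" instead of "á" behaviour described above.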