I encountered this error while running the following code:
import pycld2 as cld2
text = """
Happy Tailors Day! Hackett We�re celebrating with a special offer
"""
isReliable, textBytesFound, details = cld2.detect(text)
Here is the error:
error: input contains invalid UTF-8 around byte 30 (of 68)
There's been some great exploration of this issue in this polyglot issue and also in [the older cld2 project](https://github.com/mikemccand/chromium-compact-language-detector/issues/22) that pycld2 is forked from (some of it comes from folks using Polyglot, which actually depends on pycld2 rather than on that older cld2 project).
I have not tried it yet, but this solution, which uses a regex to strip the two offending kinds of control characters from the input before calling `cld2.detect`, looks like the most elegant fix to me.
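For anyone who wants to try the idea, here is a minimal sketch of that kind of pre-cleaning using only the standard library. It is not the exact regex from the linked solution: the character ranges (C0/C1 control characters other than tab, newline and carriage return, plus lone surrogates) and the name `strip_bad_chars` are my own assumptions, and the sample string assumes the garbled character was a C1 control such as U+0092.

```python
import re

# Assumption: these ranges are the characters worth removing, namely
# C0/C1 control characters (keeping tab, newline, carriage return)
# and lone surrogates. The linked solution may use a different pattern.
_BAD_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f\ud800-\udfff]+")


def strip_bad_chars(text: str) -> str:
    """Return text with control characters and lone surrogates removed."""
    return _BAD_CHARS.sub("", text)


# Hypothetical input: U+0092 stands in for the garbled apostrophe.
sample = "Happy Tailors Day! We\x92re celebrating"
cleaned = strip_bad_chars(sample)
print(cleaned)  # Happy Tailors Day! Were celebrating

# The cleaned string can then be passed on as before, e.g.:
# isReliable, textBytesFound, details = cld2.detect(cleaned)
```

Note that this drops the offending character rather than repairing it, so "We're" becomes "Were". That should not matter much for language detection, but it is worth knowing if the text is reused afterwards.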