Suggestions for "UnicodeEncodeError: 'utf-8' codec can't encode character" #116

Open
boechat107 opened this issue Jun 22, 2022 · 0 comments

@boechat107

I would be glad if someone could give me a suggestion.

I want to encode a large dictionary that contains text encoded in something other than UTF-8. Does the library offer an option to handle this situation, or must I change the data before trying to serialize it?
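To locate the problematic values in a large dictionary, a minimal sketch like the following walks nested dicts, lists and tuples and reports the path of every string that Python refuses to encode as UTF-8. The function name `find_unencodable` is hypothetical (it is not part of the library), and it checks values only, not dict keys.

```python
from typing import Any, Iterator, Tuple


def find_unencodable(obj: Any, path: Tuple[Any, ...] = ()) -> Iterator[Tuple[Any, ...]]:
    """Yield the key/index path of every str value that cannot be encoded as UTF-8."""
    if isinstance(obj, str):
        try:
            obj.encode("utf-8")  # the same call that fails in the traceback
        except UnicodeEncodeError:
            yield path
    elif isinstance(obj, dict):
        # Only values are checked; dict keys are not inspected here.
        for key, value in obj.items():
            yield from find_unencodable(value, path + (key,))
    elif isinstance(obj, (list, tuple)):
        for index, item in enumerate(obj):
            yield from find_unencodable(item, path + (index,))
    # Any other type (int, float, None, bytes, ...) is ignored.


# Hypothetical data with one bad string nested inside a list.
data = {"name": "ok", "items": [{"title": "\udce1gua"}], "count": 3}
print(list(find_unencodable(data)))  # [('items', 0, 'title')]
```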

```
  File "/home/user/.local/lib/python3.7/site-packages/bson/codec.py", line 201, in encode_value
    buf.write(encode_string_element(name, value))
  File "/home/user/.local/lib/python3.7/site-packages/bson/codec.py", line 170, in encode_string_element
    return b"\x02" + encode_cstring(name) + encode_string(value)
  File "/home/user/.local/lib/python3.7/site-packages/bson/codec.py", line 125, in encode_string
    value = value.encode("utf-8")
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce1' in position 13: surrogates not allowed
```
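The character `'\udce1'` is a lone surrogate, not a real letter. One common way such strings appear (an assumption about where this data came from) is decoding non-UTF-8 bytes with Python's `surrogateescape` error handler, which maps each undecodable byte `0xXX` to `U+DCXX`; Python uses this handler for file names, environment variables and command-line arguments on POSIX systems. Byte `0xE1` is "á" in Latin-1, so the sketch below uses hypothetical Latin-1 input to reproduce the same error outside the library.

```python
# Hypothetical input: the Latin-1 bytes for "água" (0xE1 is "á" in Latin-1).
raw = b"\xe1gua"

# Decoding non-UTF-8 bytes with errors="surrogateescape" does not fail;
# each undecodable byte 0xXX becomes the lone surrogate U+DCXX.
text = raw.decode("utf-8", errors="surrogateescape")
print(repr(text))  # '\udce1gua'

try:
    text.encode("utf-8")  # the same call encode_string makes in the traceback
except UnicodeEncodeError as exc:
    print(exc)  # 'utf-8' codec can't encode character '\udce1' in position 0: surrogates not allowed

# The original bytes are still recoverable with the same error handler.
assert text.encode("utf-8", errors="surrogateescape") == raw
```

If the data was produced this way, the original bytes are not lost, which matters for choosing between repairing and discarding the text.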

I read the source code, and it does not seem to offer any quick fix (something like `encode(errors="ignore")`).
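Since the encoder calls `value.encode("utf-8")` without an `errors` argument, one workaround is to clean the data before serializing it. The sketch below copies a nested structure and applies a fix to every string; the names `map_strings`, `drop_unencodable` and `recover_latin1` are hypothetical. `drop_unencodable` emulates the `errors="ignore"` behaviour asked about here, while `recover_latin1` restores the original bytes and decodes them properly. That second approach only yields the right text if the source data really was Latin-1 (an assumption; substitute the actual encoding).

```python
from typing import Any, Callable


def map_strings(obj: Any, fix: Callable[[str], str]) -> Any:
    """Return a copy of obj with fix() applied to every str in nested dicts/lists/tuples."""
    if isinstance(obj, str):
        return fix(obj)
    if isinstance(obj, dict):
        # Keys are fixed too; if two keys clean to the same value, the last one wins.
        return {map_strings(k, fix): map_strings(v, fix) for k, v in obj.items()}
    if isinstance(obj, list):
        return [map_strings(item, fix) for item in obj]
    if isinstance(obj, tuple):
        return tuple(map_strings(item, fix) for item in obj)
    return obj  # numbers, None, bytes, etc. are left unchanged


def drop_unencodable(s: str) -> str:
    # Emulates encode(errors="ignore"): silently drops characters UTF-8 cannot encode.
    return s.encode("utf-8", errors="ignore").decode("utf-8")


def recover_latin1(s: str) -> str:
    # Assumption: bad strings came from surrogateescape-decoding Latin-1 bytes.
    try:
        s.encode("utf-8")
        return s  # already valid text, leave it untouched
    except UnicodeEncodeError:
        # Turn the surrogates back into the original bytes, then decode correctly.
        return s.encode("utf-8", errors="surrogateescape").decode("latin-1")


data = {"title": "\udce1gua", "tags": ["ok", "\udce1gua"], "count": 1}
cleaned = map_strings(data, recover_latin1)
# cleaned == {"title": "água", "tags": ["ok", "água"], "count": 1}
```

The cleaned copy can then be passed to the serializer as before. Recovering is usually preferable to dropping, since the ignore approach turns "água" into "gua". Note that `surrogateescape` only round-trips surrogates in the range U+DC80 to U+DCFF; any other lone surrogate still raises in `recover_latin1`.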

Could the text be passing this condition in the encoder?

```python
if isinstance(value, text_type)
```
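On the question above: assuming `text_type` is the usual six-style alias, it is simply `str` on Python 3, and a string containing a lone surrogate is still a perfectly valid `str`. So the value does take the string branch, and the failure only shows up later at `.encode("utf-8")`. A minimal sketch (the helper `is_utf8_encodable` is hypothetical):

```python
text_type = str  # assumption: what a six-style alias resolves to on Python 3

value = "\udce1gua"  # a lone surrogate is still a valid Python str

# The type check passes, so the value takes the string branch...
print(isinstance(value, text_type))  # True


# ...and only fails later, when it is actually encoded as UTF-8.
def is_utf8_encodable(s: str) -> bool:
    try:
        s.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


print(is_utf8_encodable(value))  # False
```

In other words, a type check alone cannot catch this; the string has to be test-encoded (or cleaned) before serialization.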