UTF-8 encoding woes #64
Comments
Hi @martint17r - thanks for the issue. I'm curious why this hasn't been a problem for any other consumers of this library. I am a little concerned about the downstream effects of such a change. I suspect there is no greater issue because our target-stitch library uses the corresponding [...] I'm not sure if it's relevant in this case, but in the past - when [...] In any case, I'd be happy to merge a PR for this change if you make the default function work as it has worked but provide a flag to use your described behavior.
I believe the reason this has not popped up anywhere else is that it is not a problem within Python, i.e. the escaped Unicode notation is a Python-specific encoding (see 7.2 Python specific encodings), which Python tolerates. I ran into the problem because I am trying to implement a target in Go. According to RFC 8259 Ch. 8.1, JSON must be encoded in UTF-8; even RFC 7493 Ch. 2.1 agrees on that. Would you accept a second pull request to [...] I agree that this mandates a major version change in semver notation.
This makes sense and both parts sound like a good change. I've done some preliminary testing and Python seems happy to decode both the escaped and not-escaped versions of the string. Making it an option seems unnecessary - if you could make a PR to add the flag, we can do a major bump to 6.0.0. Adding to the Singer Spec sounds good too. Thanks for this! (Note: this SO question helped me understand the issue, and there seems to be less confusion in Python 3 than 2 - the [...])
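The decoding behaviour mentioned above can be checked with a minimal sketch. It uses the standard-library `json` module as a stand-in for simplejson (an assumption made so the snippet runs without extra packages), and the sample value "Zürich" is made up for illustration.

```python
import json

# The same JSON document written two ways. Both strings are illustrative samples.
escaped = '{"name": "Z\\u00fcrich"}'  # ASCII-only text using a \uXXXX escape
literal = '{"name": "Zürich"}'        # the character written out directly

# A JSON parser should yield the same Python object for both forms.
assert json.loads(escaped) == json.loads(literal) == {"name": "Zürich"}
```

If this holds for simplejson as well, consumers written in Python should not notice which of the two forms a tap emits.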
Any update on this? For now I am using [...]
While using tap-pipedrive I noticed that the output produced - ultimately by `format_message` in `messages.py`, which uses simplejson with the default value of `ensure_ascii=True` - is encoded in Python's escaped Unicode (a literal \u followed by 4 hexadecimal digits). This confuses a lot of my later processing. I am not sure how to properly fix that later on.
I set PYTHONIOENCODING to utf-8 and it looks like the setting is working:
The output from tap-pipedrive is unchanged, though.
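One likely explanation for why the environment variable has no visible effect: with `ensure_ascii=True` the serialized text contains only ASCII characters, so the encoding of the output stream has nothing left to change. A small sketch, assuming the standard-library `json` module behaves like simplejson here and using a made-up record:

```python
import json

record = {"name": "Zürich"}  # illustrative sample record
line = json.dumps(record)    # ensure_ascii defaults to True

print(line)  # {"name": "Z\u00fcrich"}

# The serialized text is pure ASCII, so encoding it as UTF-8 or as ASCII
# gives identical bytes; the stream encoding cannot alter the output.
assert line.isascii()
assert line.encode("utf-8") == line.encode("ascii")
```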
A way to change the output encoding is to set `ensure_ascii=False` when calling `simplejson.dumps`. Would you accept a PR for that?
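A minimal sketch of the difference, again using the standard-library `json` module (whose `dumps` also accepts `ensure_ascii`) rather than simplejson, with a made-up record. It is not the library's actual `format_message` code, only an illustration of the parameter.

```python
import json

record = {"name": "Zürich"}  # illustrative sample record

escaped = json.dumps(record)                      # default: ensure_ascii=True
literal = json.dumps(record, ensure_ascii=False)  # proposed behaviour

print(escaped)  # {"name": "Z\u00fcrich"}
print(literal)  # {"name": "Zürich"}

# Encoded as UTF-8, the literal form carries the two-byte sequence for "ü"
# instead of a six-character escape.
data = literal.encode("utf-8")
assert b"\xc3\xbc" in data

# Both forms decode to the same object.
assert json.loads(escaped) == json.loads(literal) == record
```

Either form is accepted by a JSON parser; the unescaped one just avoids the escape sequences in the raw output that later processing stumbles over.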