
fixes #175. utf-8 errors in Python 3. #208

Merged
merged 1 commit into from
Mar 19, 2019

@mcarlsen (Contributor) commented Nov 6, 2017

Caused by block reads chopping multibyte utf-8 sequences in half

Because the data is read in 8192-byte blocks, a multibyte UTF-8 character can be split across two blocks, which leaves each block invalid UTF-8 on its own.
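A minimal sketch that reproduces the failure, assuming the reader decodes each raw block independently. The function `read_blocks_as_str` is a hypothetical stand-in for the buggy pattern, not the SDK's actual code.

```python
import io

BLOCK_SIZE = 8192  # block size mentioned in the PR description


def read_blocks_as_str(stream, block_size=BLOCK_SIZE):
    """Buggy pattern (hypothetical): decode every raw block on its own."""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        # Raises UnicodeDecodeError when a multibyte character straddles the block boundary.
        yield block.decode("utf-8")


# 8191 ASCII bytes followed by "é" (0xC3 0xA9), so the character's two bytes
# land in different 8192-byte blocks.
data = ("a" * (BLOCK_SIZE - 1) + "é").encode("utf-8")

try:
    list(read_blocks_as_str(io.BytesIO(data)))
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)  # decode failed: unexpected end of data
```

The first block ends with the lone lead byte 0xC3, so decoding it fails even though the full byte stream is valid UTF-8.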

Fixed by not decoding the block. Instead, the delimiter is encoded and the replace operation is performed on bytes instead of str.
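A minimal sketch of the bytes-level approach, assuming results are streamed block by block and a single-byte field delimiter is swapped for another. The helper name `copy_replacing_delimiter` and the `\x01` to tab delimiters are hypothetical illustrations, not the patch's actual code.

```python
import io

BLOCK_SIZE = 8192


def copy_replacing_delimiter(src, dst, old_delim, new_delim, block_size=BLOCK_SIZE):
    """Copy src to dst in blocks, swapping old_delim for new_delim (hypothetical helper).

    The delimiters are encoded once and the replace runs on bytes, so no block
    is ever decoded and a split multibyte character passes through untouched.
    Assumes old_delim encodes to a single byte, so it can never straddle blocks.
    """
    old_b = old_delim.encode("utf-8")
    new_b = new_delim.encode("utf-8")
    while True:
        block = src.read(block_size)
        if not block:
            break
        dst.write(block.replace(old_b, new_b))


# "é" straddles the 8192-byte boundary, followed by a \x01-delimited field.
data = ("a" * (BLOCK_SIZE - 1) + "é\x01b").encode("utf-8")
src, dst = io.BytesIO(data), io.BytesIO()
copy_replacing_delimiter(src, dst, "\x01", "\t")

# Decode only once the complete byte stream has been assembled.
result = dst.getvalue().decode("utf-8")
print(repr(result[-3:]))  # 'é\tb'
```

Replacing at the byte level is safe for an ASCII delimiter because every byte of a multibyte UTF-8 sequence is 0x80 or higher, so an ASCII byte can never match inside a character. A multibyte delimiter would need extra handling, since it could itself be split across two blocks.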

@dancrew32

This patch is great, thanks! Who should we assign it to in order to merge it into master?

@Kirill-Babkin commented Mar 15, 2019

Hey, is anyone still trying to get this in? This fix solved a problem I was having with the SDK, and I would love to see it in the SDK.

@chattarajoy chattarajoy changed the base branch from master to unreleased March 15, 2019 03:46
@msumit msumit merged commit 3648855 into qubole:unreleased Mar 19, 2019
@msumit (Contributor) commented Mar 19, 2019

Thanks for the patch. Merged into the unreleased branch for now; it will be picked up in the next release and merged into master as well.

chattarajoy pushed a commit that referenced this pull request May 14, 2019
Caused by block reads chopping multibyte utf-8 sequences in half.