Avoid cutting off utf8 encoding halfway when truncating text values #148

Open
karimbahgat opened this issue Jun 6, 2018 · 0 comments

Comments

@karimbahgat
Collaborator

Currently, any unicode text is encoded to bytes (using whichever encoding is specified) and only afterwards truncated to fit within the byte size of the field. I guess UTF-8, and any other encoding in which a character can span more than a single byte, is at risk of having such a character cut and invalidated if it occurs at the end of the text and spans the truncation limit. See #125.
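
One possible way to avoid the cut-off, as a rough sketch only: encode, slice to the byte limit, then drop any incomplete trailing sequence by decoding with `errors="ignore"` and re-encoding. The helper name `truncate_encoded` is hypothetical and not part of the library, and the sketch assumes UTF-8; other multi-byte encodings may need different handling.

```python
def truncate_encoded(text, size, encoding="utf-8"):
    """Encode `text` and cut it to at most `size` bytes without
    leaving a partial multi-byte character at the end.

    Hypothetical helper for illustration; assumes UTF-8.
    """
    encoded = text.encode(encoding)
    if len(encoded) <= size:
        return encoded
    # The slice may end in the middle of a character. Since the input
    # was validly encoded, the only invalid bytes are that trailing
    # fragment, which errors="ignore" silently drops.
    clipped = encoded[:size].decode(encoding, errors="ignore")
    return clipped.encode(encoding)


# "é" takes two bytes in UTF-8, so a 4-byte limit would split it.
print(truncate_encoded("abcé", 4))  # b'abc'
print(truncate_encoded("abcé", 5))  # b'abc\xc3\xa9'
```

The result can be shorter than `size`, so a fixed-width field would still need padding afterwards.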

karimbahgat added a commit that referenced this issue Sep 8, 2018
- Fixes an issue in Py3 where text that should be converted to byte strings is converted to unicode instead, because the code uses the Py2-specific str() function rather than the version-neutral b(). When the text contains non-ASCII 2-byte unicode values, this truncates by unicode length instead of byte length, which results in incorrectly padded byte lengths and data values ending up in the wrong field/column. See #157, and also #148.
- Also bump to next version.
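
The length mismatch described in the first commit note can be illustrated with a small standalone example. This is not the library's code; the value and the field width `size` are made up for illustration, and UTF-8 is assumed.

```python
value = "Málaga"
size = 5  # hypothetical field width in bytes

# Truncating the text first limits the number of characters, not bytes,
# so the encoded result can overflow the field.
by_chars = value[:size].encode("utf-8")

# Encoding first and truncating afterwards limits the number of bytes.
by_bytes = value.encode("utf-8")[:size]

print(len(by_chars))  # 6
print(len(by_bytes))  # 5
```

The one-byte overflow in the first case is the kind of discrepancy that would shift subsequent values into the wrong field.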
karimbahgat added this to the More robustness milestone Jan 27, 2022