Right now, if you want to use an encoding, you have to make sure that every field containing non-ASCII characters is already in that encoding. Maybe this is okay, but I'm working with a lot of data, and doing these checks constantly is cumbersome. For example
The question is: is this something the user should be worrying about, or is there a "safe_encode" that could be used instead, similar in idea to #1804?
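To make the "safe_encode" idea concrete, here is a rough sketch of what such a helper might look like. Everything in it is an assumption for illustration: the name `safe_encode`, its signature, and the choice of `errors="replace"` are hypothetical and not an existing pandas function, and it assumes Python 3 `str` values.

```python
def safe_encode(value, encoding="latin-1", errors="replace"):
    """Hypothetical helper: encode text, pass non-text values through."""
    if isinstance(value, str):
        # errors="replace" substitutes "?" for characters the target
        # encoding cannot represent instead of raising UnicodeEncodeError.
        return value.encode(encoding, errors=errors)
    return value


# Mixed fields: representable text, unrepresentable text, and a number.
fields = ["café", "日本", 42]
encoded = [safe_encode(f) for f in fields]
```

The trade-off is that unrepresentable characters are silently lost, so whether this is acceptable depends on the data.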
Just ran into this again. I'm starting to have the feeling that the I/O operations should do the conversion. For example, I want to read in a utf-8 file and then convert it to latin-1. It would be great if I could do
read_csv(..., 'utf-8')
to_csv(..., 'latin-1')
but currently you have to go through the intermediate apply step for each string-like column. Might work on a PR for this if I can get some time this afternoon.
After a week of digging through the unicode nether-regions of the tree, I support jseabold's
comment on strictly enforcing unicode internally and encoding/decoding at I/O
points only.
Otherwise, you end up having brittle assumptions about encodings and if clauses handling corner-cases
all over the codebase.
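As a rough illustration of "unicode internally, encode/decode at I/O points only", here is a minimal sketch using just the Python 3 standard library rather than pandas itself. The function name `transcode_csv` and the file names are made up for the example; the point is only that the encodings appear at the two `open` calls and nowhere else.

```python
import csv
import os
import tempfile


def transcode_csv(src, dst, src_encoding="utf-8", dst_encoding="latin-1"):
    """Hypothetical helper: read a CSV in one encoding, write it in another."""
    # Decode on read: from here on, rows hold ordinary str objects.
    with open(src, encoding=src_encoding, newline="") as fin:
        rows = list(csv.reader(fin))
    # Encode on write: the only place the target encoding is mentioned.
    with open(dst, "w", encoding=dst_encoding, newline="") as fout:
        csv.writer(fout).writerows(rows)
    return rows


# Build a small utf-8 input file, then convert it to latin-1.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "in.csv")
dst = os.path.join(tmpdir, "out.csv")
with open(src, "w", encoding="utf-8", newline="") as f:
    f.write("name,city\r\nJosé,Málaga\r\n")

rows = transcode_csv(src, dst)
with open(dst, "rb") as f:
    raw = f.read()
```

No per-column or per-field encoding step is needed in between, which is the behaviour being asked for from `read_csv` and `to_csv`.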
I apologize for all these useless messages. GitHub is doing unexpected things, and I can't
figure out how to purge them. Hopefully they'll go away when I tear down the PR branches.