setindex! incorrect for non-UTF-8 strings? #59

Open
nalimilan opened this issue May 7, 2019 · 2 comments

Comments

@nalimilan
Member

These two lines don't seem correct to me for non-UTF-8 AbstractString types:

resize!(buffer, l + sizeof(val))
unsafe_copyto!(pointer(buffer, l+1), pointer(val,1), sizeof(val))

Indeed, this copies the raw bytes of the string even if it uses a different encoding from the existing data in the buffer.
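To make the problem concrete, here is a minimal sketch of what those two lines amount to at the byte level. It assumes, for illustration only, a Latin-1 (ISO-8859-1) string type that would store 'é' as the single byte 0xE9. No such type is defined here; its bytes are written out by hand.

```julia
# 'é' (U+00E9) encoded as UTF-8 vs. as Latin-1 (ISO-8859-1).
utf8_bytes   = collect(codeunits("é"))   # UInt8[0xc3, 0xa9]
latin1_bytes = UInt8[0xe9]                # assumed storage of 'é' in a Latin-1 string type

# A StringArray-style buffer that already holds UTF-8 data.
buffer = collect(codeunits("café "))

# Copying the raw bytes, as resize!/unsafe_copyto! do, ignores the source encoding.
append!(buffer, latin1_bytes)

# Copy first, since String(::Vector{UInt8}) takes ownership of its argument.
s = String(copy(buffer))
println(isvalid(s))   # prints false: a lone 0xe9 is not valid UTF-8
```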

@quinnj
Member

quinnj commented May 7, 2019

What would you suggest, though? Is there a standard API for getting the encoding of a string? Should we convert it to UTF-8? Or, if it's not in the same encoding as the rest of the array, should we reject it?

@nalimilan
Member Author

AFAIK there's no API to get the encoding of a string, but that would be a logical complement to codeunit/codeunits. BTW, there's no guarantee that you can call pointer on an AbstractString and get a pointer to its data: one would need to go through codeunits anyway, even if the encoding matched.
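A minimal sketch of the codeunits route, using a hypothetical `Latin1String` type (not from any package) and a hypothetical helper `copy_codeunits`. It also shows the gap described above: `codeunit` reports only the code-unit type, which is `UInt8` for both strings, so it cannot tell a Latin-1 string from a UTF-8 one.

```julia
# Hypothetical string type storing Latin-1 (ISO-8859-1) bytes.
# Only the methods that `codeunits` relies on are defined here; a complete
# AbstractString would also need `iterate` and `isvalid`.
struct Latin1String <: AbstractString
    data::Vector{UInt8}
end
Base.ncodeunits(s::Latin1String) = length(s.data)
Base.codeunit(::Latin1String) = UInt8
Base.codeunit(s::Latin1String, i::Integer) = s.data[i]

# Read code units through the generic `codeunits` wrapper instead of `pointer`,
# which is only defined for string types backed by contiguous memory.
function copy_codeunits(val::AbstractString)
    out = UInt8[]
    for b in codeunits(val)
        push!(out, b)
    end
    return out
end

latin1 = Latin1String(UInt8[0x63, 0x61, 0x66, 0xe9])  # "café" in Latin-1
utf8   = "café"

codeunit(latin1) == codeunit(utf8) == UInt8      # true: same code-unit type...
copy_codeunits(latin1) == copy_codeunits(utf8)   # false: ...different bytes
```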

Until a better API exists, I guess the only solution is to have a fast method for String with StringArray{<:Union{Missing, String}}, and a slower method that iterates over characters for all other cases.
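A rough sketch of that two-method approach. It assumes a hypothetical `ToyStringArray` that keeps every element as UTF-8 in a single byte buffer with offset/length pairs; this is not WeakRefStrings' actual `StringArray` layout, and `Missing` handling is left out. `append_string!` is likewise a hypothetical name. The fast path dispatches on `String`, whose contents are UTF-8; any other `AbstractString` is iterated character by character and re-encoded as UTF-8.

```julia
# Hypothetical stand-in for a StringArray: one shared UTF-8 byte buffer,
# with each element described by an (offset, length) pair into it.
struct ToyStringArray
    buffer::Vector{UInt8}
    offsets::Vector{Int}
    lengths::Vector{Int}
end
ToyStringArray() = ToyStringArray(UInt8[], Int[], Int[])

# Fast path: a String holds UTF-8, so its bytes can be copied verbatim.
function append_string!(arr::ToyStringArray, val::String)
    buf = arr.buffer
    l = length(buf)
    n = sizeof(val)
    resize!(buf, l + n)
    if n > 0
        # Keep both objects rooted while their raw pointers are in use.
        GC.@preserve buf val unsafe_copyto!(pointer(buf, l + 1), pointer(val, 1), n)
    end
    push!(arr.offsets, l)
    push!(arr.lengths, n)
    return arr
end

# Slow path: iterate Chars (decoded by the string type itself) and
# write each one back out as UTF-8, whatever the source encoding was.
function append_string!(arr::ToyStringArray, val::AbstractString)
    l = length(arr.buffer)
    io = IOBuffer()
    for c in val
        write(io, c)   # write(::IO, ::Char) emits the UTF-8 encoding of c
    end
    bytes = take!(io)
    append!(arr.buffer, bytes)
    push!(arr.offsets, l)
    push!(arr.lengths, length(bytes))
    return arr
end

# Read element i back as a String (copies the bytes out of the buffer).
function Base.getindex(arr::ToyStringArray, i::Int)
    start = arr.offsets[i] + 1
    return String(arr.buffer[start:start + arr.lengths[i] - 1])
end

arr = ToyStringArray()
append_string!(arr, "café")                         # String: fast path
append_string!(arr, SubString("naïve text", 1, 6))  # SubString: slow path
```

The slow path is only as correct as the string type's `iterate`, which the `AbstractString` interface expects to decode the type's own encoding into `Char`s; that is what makes it safe for non-UTF-8 types, at the cost of speed.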
