setindex! incorrect for non-UTF-8 strings? #59

nalimilan · 2019-05-07T20:57:00Z

These two lines don't seem correct to me for non-UTF-8 AbstractString types:

Lines 369 to 370 in caf4ed4

    resize!(buffer, l + sizeof(val))
    unsafe_copyto!(pointer(buffer, l+1), pointer(val,1), sizeof(val))

Indeed this will copy the contents of the string even if it uses a different encoding from existing data.


quinnj · 2019-05-07T21:20:01Z

What would you suggest, though? Is there a standard API for getting the encoding of a string, or for converting it to UTF-8? Or, if it's not in the encoding of the rest of the array, should we reject it?

nalimilan · 2019-05-08T09:13:46Z

AFAIK there's no API to get the encoding of a string, but that would be a logical complement to codeunit/codeunits. BTW, there's no guarantee that you can call pointer on an AbstractString and get a pointer to the data: one would need to use codeunits anyway, even if the encoding matched.
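For illustration, here is a small sketch of what codeunit/codeunits expose using only Base functions. It shows that they report the code unit type and the raw code units, but say nothing about which encoding those units are in. The variable names are made up for the example.

```julia
s = "é"

# `codeunit` returns the code unit *type* of the string, not its encoding.
unit_type = codeunit(s)        # UInt8 for `String`

# `codeunits` wraps the raw code units; for a `String` these are UTF-8 bytes.
bytes = collect(codeunits(s))  # [0xc3, 0xa9]

# Both also work on other AbstractString types, such as SubString,
# without needing `pointer`.
sub = SubString("héllo", 1, 2)            # "hé"
sub_bytes = collect(codeunits(sub))       # [0x68, 0xc3, 0xa9]
```

A hypothetical non-UTF-8 string type could also report UInt8 from codeunit, which is why the code unit type alone is not enough to decide whether a raw copy is safe.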

Until a better API exists, I guess the only solution is to have a fast method for String with StringArray{<:Union{Missing, String}}, and a slower method iterating over characters for other cases.
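A minimal sketch of that two-method approach, assuming the backing store is a plain Vector{UInt8}. The function name `append_utf8!` is hypothetical and is not part of the package; the real setindex! would also need to record offsets and lengths.

```julia
# Fast path: a `String` is already UTF-8, so its code units can be appended as-is.
function append_utf8!(buffer::Vector{UInt8}, val::String)
    append!(buffer, codeunits(val))
    return buffer
end

# Generic fallback: iterate over characters and re-encode each one as UTF-8,
# whatever the internal encoding of `val` is.
function append_utf8!(buffer::Vector{UInt8}, val::AbstractString)
    for c in val
        append!(buffer, codeunits(string(c)))
    end
    return buffer
end

buffer = UInt8[]
append_utf8!(buffer, "hé")                      # dispatches to the String method
append_utf8!(buffer, SubString("héllo", 4, 5))  # dispatches to the generic method ("ll")

# Copy before converting: `String(buffer)` would take ownership of the vector.
result = String(copy(buffer))
```

The generic method is slower because it allocates a small string per character, but it relies only on character iteration, which every AbstractString has to support.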
