-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update to Julia 0.6-1.0, drop support for 0.4-0.5 #24
Conversation
Failures on 1.0 are from Zip.jl |
Sorry about the tabs. I've changed them to spaces. Can you rebase? Thanks. |
Sorry about the changes to white space, I didn't realize that some parts of the code used tabs until it was too late.
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes!
src/NPZ.jl
Outdated
@@ -58,36 +57,37 @@ end | |||
# be fixed by rehashing the Dict when the module is | |||
# loaded. | |||
|
|||
readle(ios::IO, ::Type{UInt16}) = htol(read(ios, UInt16)) | |||
readle(ios::IO, ::Type{T}) where T = ltoh(read(ios, T)) #ltoh is inverse of htol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a space after #
like all other comments
src/NPZ.jl
Outdated
writele(ios::IO, x::UInt16) = writele(ios, reinterpret(UInt8, [htol(x)])) | ||
# Endianness only pertains to multi-byte things | ||
writele(ios::IO, x::AbstractVector{UInt8}) = writecheck(ios, x) | ||
# NPY headers only have ascii strings (bytes) so endianness doesn't matter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you should be assuming we're writing NPY header. In the future, we might be writing string data. Either change it back take a String
input instead of AbstractString
so that the input is UTF-8 encoded or handle byte order for any type of string (e.g. UTF-16).
In fact, I don't understand why all the String
needs to be converted to AbstractString
in the functions below. Since the header is ASCII, String
works for us. There is no need to be more general, which has potential to slow down parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm my comment isn't quite accurate, it should work on any utf-8 string, which is the only string encoding that I've come across in Julia since Julia 0.5. My intent with using AbstractString
type bound is to catch the many AbstractString
s in Julia that use utf-8. In Julia 0.7 more functions produce SubString
etc, which in fact leads to breakage in NPZ.jl (see my comment in #23).
Excluding UTF-16, and other string types that I haven't seen in julia, there are still plenty of AbstractStrings
that would work for these functions, and these functions need to handle at least SubString
. The change regarding production of non-String
AbstractString
s by base functions that bites here is JuliaLang/julia#22496, where all strip functions return SubString
s. As you can see below, all AbstractString
subtypes in base are utf-8 encoded, but manipulate the order, extent, or content of an underlying String
.
julia> subtypes(AbstractString)
4-element Array{Any,1}:
String
SubString
SubstitutionString
Test.GenericString
In fact, utf-8 is always a sequence of single bytes, even when codepoints are not themselves single bytes, so utf-8 strings are indifferent to endianness [#1]. Ascii is a subset of utf-8, and I think can be safely treated as always being utf-8.
However, you are right that someone may want to write other utf-encodings using strings generated from some package or other source. I'll change it so the codeunits are inspected before being written. This should allow it to work for all utf-8 AbstractString
s
src/NPZ.jl
Outdated
function parsechar(s::String, c::Char) | ||
writele(ios::IO, x::UInt16) = writecheck(ios, htol(x)) | ||
|
||
function parsechar(s::AbstractString, c::Char) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function was already a bit wrong in that it assumes c
is an ASCII character. If it's not, it'll crash:
julia> s = "\u2200 x \u2203 y"
"∀ x ∃ y"
julia> typeof(s)
String
julia> s[1]
'∀': Unicode U+2200 (category Sm: Symbol, math)
julia> s[2:end]
ERROR: UnicodeError: invalid character index 2 (0x88 is a continuation byte)
Stacktrace:
[1] macro expansion at ./REPL.jl:97 [inlined]
[2] (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:73
Now that it's AbstractString
, it may be UTF-16 and become always wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it was already broken for more than just the character argument. String
can be (and is) utf8, and you just demonstrated that it breaks on utf8.
To demonstrate...
julia> foo(s::String) = s
foo (generic function with 1 method)
julia> bar(s::String) = s[2]
bar (generic function with 1 method)
julia> s = "♨★ 𝕦ᑎ𝐢ς𝐎𝐝є!! ☯✌"
"♨★ 𝕦ᑎ𝐢ς𝐎𝐝є!! ☯✌"
julia> foo(s)
"♨★ 𝕦ᑎ𝐢ς𝐎𝐝є!! ☯✌"
julia> bar(s)
ERROR: StringIndexError("♨★ 𝕦ᑎ𝐢ς𝐎𝐝є!! ☯✌", 2)
Stacktrace:
[1] string_index_err(::String, ::Int64) at ./strings/string.jl:12
[2] getindex_continued(::String, ::Int64, ::UInt32) at ./strings/string.jl:216
[3] getindex at ./strings/string.jl:209 [inlined]
[4] bar(::String) at ./REPL[31]:1
[5] top-level scope at none:0
I fixed the unicode problems. |
I can't say I'm happy with the new string indexing -- it's too verbose. But it's acceptable. |
Thanks for all the fixes! |
Seems like the price you have to pay if you want to play with unicode. Thanks again for maintaining such a useful package! |
Sorry about the changes to white space, I didn't realize that some parts of the
code used tabs until it was too late.