Make Char no longer a subtype of Integer #8816
+1 to this idea |
If it helps to argue for this idea, I'd say that codepoints are a convenience numeric encoding for what is clearly a categorical variable that doesn't possess any meaningful numerical properties. |
I agree but I also don't see why at least comparisons with integers can't work. |
I agree but we don't really have a well defined interface to support this concept. @JeffBezanson has pointed out that
I wanted to be conservative; integer comparison can be trivially added back now.
So having thought about this a bit more I think that chars should not be numbers but support these very specific operations:
That covers all the useful operations you generally want to do with characters as integers but is specific enough to avoid too much confusion. |
Oh, I see that you have exactly those in here.
I left out comparisons in the beginning as it was useful for finding what parts of Base this change touched. I'll point out that |
Why do you think it's a source of bugs? There's far less use for comparisons of pointers than for characters. |
Off the top of my head, I can't think of any two types in Base that define an equality relationship and do not share a common abstract super-type. |
They do – Any! |
The grisu code uses Char == Int comparisons. I wouldn't be surprised if other printing/showing code utilized it somehow. Also Period types in the Dates module can be equal to Real. Just a couple of data points. |
Ok fair enough :-) @quinnj are you sure? If you turn off character / integer comparison then all the grisu tests still pass. That might signal that there are parts that are not being tested at the moment, but I find that hard to believe.
Ah, perhaps they don't. They originally did to mirror the C++ implementation, but I did end up changing over to using raw Array{Uint8,1} buffers instead. I still would have thought there'd have been a few comparisons in there at some point. I guess if not, all the better, right? |
Char now supports a limited set of "integer like" behavior.
* comparisons with integers
* Char - Char = Int
* Char + Int = Char

update docs and add a NEWS.md entry
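The arithmetic listed in this commit message can be illustrated with a short sketch. It assumes a recent Julia release rather than the 0.4-era code under review, and it deliberately leaves out the integer comparisons, whose availability differs between versions. The variable names are illustrative only.

```julia
# Sketch only: assumes a recent Julia release, not the code in this PR.

d = 'c' - 'a'              # Char - Char gives an Int (distance between code points)
next_char = 'a' + 1        # Char + Int gives a Char
code = Int('a')            # explicit conversion to the code point
back = Char(code)          # explicit conversion back to a Char
is_integer_subtype = Char <: Integer   # false once Char is no longer an Integer

println((d, next_char, code, back, is_integer_subtype))
```

Converting explicitly with `Int(c)` and `Char(n)` keeps the intent visible at the call site, which is the main argument made in this thread for not treating characters as numbers.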
Travis is passing now and I've updated the docs. Should comparison / equality operations be defined for all Numbers or just Integers? |
Nice to get rid of We'll get bug reports if people think |
I agree with @ivarne's take – let's stick with just allowing comparisons to integers for now. |
That's what I have now. Good to merge then? |
I support @jakebolewski's position that comparing ints and chars is more a source of bugs than anything. Inequalities like |
Perhaps, but we can make this less disruptive change first and then consider whether to do that. I'm good with merging this now. |
Actually, no code in Base relies on |
Make Char no longer a subtype of Integer
@@ -40,14 +56,10 @@ promote_rule(::Type{Char}, ::Type{Uint128}) = Uint128
Do we want to keep these bitwise operations for Char?
Might be useful for masking.
PkgEval: This unsurprisingly broke a few things, including people doing something involving |
Regression fix from JuliaLang/julia#8816 Fixes #84
DataStructures:
(Does this mean you can't sort strings?) |
No, you just can't sort |
If you can write 'a':'z' and it produces the letters 'a','b',...,'z', it implies (to me) there is some sort of sensible ordering to apply |
if
|
julia> isless('a', 'c')
ERROR: `isless` has no method matching isless(::Char, ::Char)
julia> 'a' < 'c'
true

I forget how |
According to the documentation we should rather implement
|
Is "canonical total order" CS 101? I know what a total order is, but the canonical part is not obvious to me and Wikipedia's definition:
didn't really help as "standard order" seems to be a non-standard term and "obeying certain rules" is rather vague. |
@ivarne these errors are gaps in test coverage, so any fix should include tests. |
Definitely true that. Seems like we have a pretty glaring hole in test coverage for |
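A minimal sketch of the kind of test that could close this gap, assuming a recent Julia release and the `Test` standard library. The testset name is made up for illustration and is not taken from Base's test suite.

```julia
using Test

# Illustrative only: checks that Char has a usable ordering for sorting and ranges.
@testset "Char ordering (illustrative)" begin
    @test isless('a', 'c')
    @test 'a' < 'c'
    @test sort(['c', 'a', 'b']) == ['a', 'b', 'c']
    @test collect('a':'c') == ['a', 'b', 'c']
end
```

Testing `isless` directly as well as `<` matters here because, as the session above shows, one can be defined while the other is missing, and `sort` uses `isless` by default.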
      asci = true
      d = sbuff.data
      for idx in 1:length(d)
-         (d[idx] < 0x80) ? continue : (asci = false; break)
+         (char(d[idx]) < char(0x80)) ? continue : (asci = false; break)
I wouldn't have thought to do this, but now I'm wondering if I should -- what's the benefit?
Yeah, this doesn't make sense. This code is really comparing bytes.
I think this is because for UTF32 strings this is a Char / Integer comparison, and I removed those definitions at some intermediate point.
I see -- good to know that's what UTF32String looks like under the hood. Thanks.
Ok, I got confused because this code is kind of wrong. If it's supposed to work for arbitrary Strings, it shouldn't be accessing .data.
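One way to express the two checks discussed in this review thread without reaching into `.data`, as a hedged sketch for a recent Julia release. `bytes_all_ascii` and `chars_all_ascii` are hypothetical helper names, not functions from Base or from this PR.

```julia
# Hypothetical helpers; neither exists in Base or in this PR.

# Byte-level check: every code unit of a String is below 0x80.
bytes_all_ascii(s::String) = all(b -> b < 0x80, codeunits(s))

# Character-level check: works for any AbstractString by iterating its Chars.
chars_all_ascii(s::AbstractString) = all(isascii, s)

println(bytes_all_ascii("hello"), " ", chars_all_ascii("héllo"))
```

The byte-level version compares integers with integers and the character-level version never leaves `Char`, so neither depends on Char / Integer comparisons being defined.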
What should be the replacement for the logic for using ncurses, where we need to test c == char(-1)? It's now giving me |
Maybe I also have trouble understanding why we think of |
yeah just successfully tested that 5 secs ago. It's not a big deal. Thanks. |
Re: signedness of char, I'll bring up #7472 and #7303 in the spirit of unfun, do-not-want, things to watch out for, where char being (sometimes? depending on platform?) a signed fixed-width type caused nastiness like introducing Unicode into the parser resulting in the Windows binaries no longer working on XP...
@tkelman How are issues with the signedness of the C 8-bit datatype I'm not saying that the signedness matters; I just want there to be a reason why we behave differently from Go, which has Unicode semantics very similar to Julia's.
Just precedent for unforeseen consequences of a related, but not identical, issue. |
I'd want to know what the sign bit would mean. One option would be to serve as an "invalid" flag while allowing some payload. |
Not sure if it would come into play here, but Java rejects (valid) byte patterns expressed as hex literals because it treats its private static final byte[] sync = {0xFF, 0x12}; is not permitted. This is annoying and I'd appreciate its Julia equivalent working independent of char's signedness. |
I feel that this is surprising behavior, and this issue has come up in #5844 and #8569.
All the tests pass except for readlm and parallel (which hangs forever). I guess I'm looking for an up / down vote on this (breaking) behavior before fixing the last remaining issues.