Should `in` use `isequal` to test for containment? #9381
Comments
Wouldn't this necessitate two versions of Dicts / Sets, one using … EDIT: I see we already have this behavior.
This is another case that can be justified on the basis that the IEEE standard specifies the behavior of …
@jakebolewski,

```julia
julia> Set([1,1.0])
Set([1.0])
```

This is a pretty straightforward consequence of using Dict as the foundation for Set. If we wanted the egal behavior, we would need an …
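The asymmetry this issue is about can be reproduced directly. This is a small sketch that assumes Julia 1.x semantics, where `in` on a `Vector` compares with `==` and `in` on a `Set` goes through the underlying `Dict`, which compares keys with `isequal` and `hash`:

```julia
# Array containment compares with ==; Set containment goes through the
# underlying Dict, which compares keys with isequal (and hash).
v = [NaN, 1.0]
s = Set(v)

println(NaN in v)   # NaN == NaN is false, so this prints false
println(NaN in s)   # isequal(NaN, NaN) is true, so this prints true

# 1 and 1.0 are both == and isequal, so the two containers agree here;
# it is also why Set([1, 1.0]) ends up with a single element.
println((1 in v, 1 in s))        # prints (true, true)
println(length(Set([1, 1.0])))   # prints 1
```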
This kind of sucks, but I think we have to live with it – it's either that or introduce yet another kind of equality, which we can do over my cold, dead body.
We definitely don't need another kind of equality. I think we will opt to keep you alive instead.
I'm relieved.
The other option would be to make …
Oh, we can't make …
Is it really required that … As a data point, MATLAB's …
It's not strictly necessary, but I think anything else would be kind of a disaster, imo.
Interesting data point: …
I'm not sure how Python's …
I'm not so sure.
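One thing worth considering is making … This could be done via:

```julia
# Strict "less than" in IEEE 754 total order, computed on the bit patterns.
function totalorder(x::Float64, y::Float64)
    xi = reinterpret(Int64, x)
    yi = reinterpret(Int64, y)
    # Signed-integer order matches float order for non-negative values,
    # but is reversed when both values are negative, so swap in that case.
    return ((xi < 0) & (yi < 0)) ? (yi < xi) : (xi < yi)
end
```

(which also seems to be faster than our current …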
That sort order has always struck me as weird and useless: why would I want some NaNs to come at the beginning and some at the end? If we started doing this, I feel like we'd almost need to start printing NaNs with a sign, which would also be strange. The only advantage I can see to the IEEE ordering is that you can implement it efficiently with integer comparisons.
Yep, that's how to think like a 1980s computer scientist :)
Alright, so I would be OK with making …
I'm not inclined to have some NaNs negative while others are positive. That's just silly and useless.
Yes, that might be OK. Have to think about it. By the way, if there are any Python experts around, I would really like to know what is happening in the example I posted above. It seems …
EDIT: OK, my guess is that …
Oh, Python, you so crazy.
Python's … So, Jeff's guess is correct. (Yes, I've spent far too much time looking at the CPython codebase for someone who never programs in Python.)
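For comparison, CPython's list containment matches on object identity before falling back to `==`, which is why a NaN can be reported as contained in a list holding that same NaN object. The helper `pyin` below is hypothetical (it is not part of Base) and only sketches that rule in Julia; it assumes `===` is an acceptable stand-in for Python's `is`, keeping in mind that for bits types like `Float64` Julia's `===` compares bit patterns rather than object addresses:

```julia
# Hypothetical helper (not part of Base) mimicking CPython's list
# containment: match on identity first, then fall back to ==.
# For Float64, === compares bit patterns and stands in for Python's `is`.
pyin(x, itr) = any(y -> y === x || y == x, itr)

v = [NaN, 2.0]
println(NaN in v)      # false: Base `in` on arrays only uses ==
println(pyin(NaN, v))  # true: the stored NaN is === to the NaN constant
println(pyin(2, v))    # true: 2 == 2.0 even though 2 !== 2.0
```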
Clearly, I used up my "oh, Python, you so crazy" way too early in this thread.
This is an excerpt from Python's documentation for comparison operators: …
So even while …
@Ismael-VC, so their documentation is wrong, because it effectively uses …
Actually, the radix sort function in SortingAlgorithms.jl does use integer …
On Wednesday, December 17, 2014, Jeff Bezanson notifications@github.com
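A common way to get the IEEE total order out of plain integer comparisons, which is what a radix sort needs, is to map each float to an unsigned key whose order matches. The sketch below assumes that standard bit trick; `orderkey` is a hypothetical name, and whether SortingAlgorithms.jl uses exactly this mapping is not claimed here:

```julia
# Map a Float64 to a UInt64 key whose unsigned order matches IEEE 754
# totalOrder, so plain integer comparisons (or a radix sort) can be used.
function orderkey(x::Float64)
    u = reinterpret(UInt64, x)
    # Negative floats (sign bit set): flip every bit, so larger magnitudes
    # get smaller keys and all negatives land below all non-negatives.
    # Non-negative floats: set the sign bit to lift them above the negatives.
    return signbit(x) ? ~u : u | 0x8000000000000000
end

xs = [1.0, NaN, -0.0, -1.0, 0.0, -NaN, Inf]
sorted = sort(xs; by = orderkey)
# sorted: -NaN, -1.0, -0.0, 0.0, 1.0, Inf, NaN
```

Note how the negative NaN sorts first and the positive NaN last, which is exactly the split placement of NaNs discussed above.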
I think -0.0 is more important than NaN: code that tries to compare with NaN is always going to have problems. Finding …
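The signed-zero case can be seen in a short sketch, assuming Julia 1.x semantics. Because `-0.0 == 0.0` but `!isequal(-0.0, 0.0)`, the same query answers differently for an array and a `Set`. To search an array with `isequal` semantics, you can pass the predicate explicitly:

```julia
v = [0.0, 1.0]
s = Set(v)

println(-0.0 in v)               # true:  -0.0 == 0.0
println(-0.0 in s)               # false: isequal(-0.0, 0.0) is false
# Explicit isequal-based search over an array.
println(any(isequal(-0.0), v))   # false
println(findfirst(isequal(0.0), [-0.0, 0.0]))   # 2
```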
How would you implement that efficiently for Sets? I feel that, as containers, arrays should behave like ordered multisets, which to me implies using `isequal`.
Changes here aren't in scope for 0.6.
Triage decision: we should change this and …
I should have posted this here (or maybe somewhere else?) instead of @JackDevine's PR (sorry). I've been thinking about this as well as our current plans for …

```julia
julia> [NaN] == [NaN]
false

julia> Set(NaN) == Set(NaN)
true

julia> using Nulls

julia> [null] == [null]
null

julia> Set(null) == Set(null)
ERROR: MethodError: no method matching length(::Nulls.Null)
Closest candidates are:
  length(::SimpleVector) at essentials.jl:528
  length(::Base.MethodList) at reflection.jl:670
  length(::MethodTable) at reflection.jl:745
  ...
Stacktrace:
 [1] union!(::Set{Any}, ::Nulls.Null) at ./set.jl:126
 [2] Set(::Nulls.Null) at ./set.jl:19
```

I would make the argument that "two containers are equal when they contain (tested via … (For context, I became interested in this because I am slightly worried about the way …
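For reference, here is a sketch of how the same comparisons behave in Julia 1.x. It assumes that `missing`, which took over the role of Nulls.jl's `null`, is the relevant counterpart. Note that `Set(missing)` would still fail because `missing` is not iterable, so the sketch wraps it in a vector:

```julia
# Array == compares elements with ==; isequal compares them with isequal.
println([NaN] == [NaN])              # false
println(isequal([NaN], [NaN]))       # true
# Set equality is built on Set membership, which uses isequal.
println(Set([NaN]) == Set([NaN]))    # true
println(Set([missing]) == Set([missing]))   # true
```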
So it seems like this is potentially a good coherent position to take: …
This is a containment strategy for "unusual" scalar behaviors like those of …
It gets a little weird when you consider containers as scalars, i.e. when doing …
@StefanKarpinski's plan is reasonable and consistent, but I'm not sure it would really be a good idea. Let me stress that in practice I don't really care how …

First, having …

Second, we have chosen the policy of propagating missing values everywhere to be on the safe side. That can be quite cumbersome in practice. Personally, for my own work I'd find it more convenient for …

Third, I don't really see why …
I agree with @nalimilan here:
Quoting @andyferris from #24563
Boiling so much information down to one sentence helps clarify the situation a lot (thanks, Andy). What I don't understand about the above sentence, though, is what is meant by "equal". If #24563 goes through, then we would have:

Containers are …

But we would not have:

Containers are …

Like Milan says, if there's a missing value, you basically don't know whether the arrays are equal or not. However, if both arrays have a missing value in the same place, then they are …
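The distinction Milan describes can be illustrated with a small sketch, assuming Julia 1.x three-valued logic for `==` on arrays. An unknown entry makes `==` return `missing` unless some other entry already differs. `isequal`, by contrast, treats `missing` as equal to itself:

```julia
a = [1, missing]
b = [1, missing]
c = [2, missing]

println(a == b)          # missing: the unknown entries might or might not match
println(a == c)          # false: a definite mismatch wins over unknowns
println(isequal(a, b))   # true: missing is isequal to missing
```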
Slightly related question: should the type of the container matter? E.g., if I have a … More interesting for me, would …
The other way to go with this is to put …
The idea here is that I do think the type of container matters. For example, we have this ugly check:
It's pretty clear to me that we should have a … All of that is to say that the comparison function used determines the meaning of the object, and so …
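As a concrete instance of "the comparison function determines the meaning of the object", Julia 1.x ships two dictionaries that differ only in how keys are compared. `Dict` uses `isequal` and `hash`. `IdDict` uses `===` and `objectid`; it was called `ObjectIdDict` at the time of this thread. A minimal sketch:

```julia
a = [1, 2]
b = [1, 2]          # a distinct array with the same contents

d = Dict{Vector{Int},Symbol}()
d[a] = :x           # Dict compares keys with isequal (and hash)

i = IdDict{Vector{Int},Symbol}()
i[a] = :x           # IdDict compares keys with === (object identity)

println(haskey(d, b))   # true
println(haskey(i, b))   # false
println(haskey(i, a))   # true
```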
Makes sense. Maybe it's not so bad to say that …
Bump. Will it be possible to make a decision on this before the feature freeze? I am not completely sure what we want to do here. However, if we come up with a solid plan, then I am happy to help out.
I see the minimal effort version of a coherent story for container …
(It's also possible for us to state that the keys of …
That sounds good. Are you arguing that we make …
Sorry, I forgot to mention …

Yes, in the precise sense of my opening statement above: the minimal effort version is to do very little, and cover it over with a convenient and somewhat coherent story afterwards :) Going into more detail, the data people appear to consider …

Sorry to wax philosophical here, but to me the choice between … I think the answer is that sometimes you will be doing some software engineering, and sometimes you will be doing some data science, and in both cases you might want to test for containment. For …

The reasoning behind what I wrote in my last post is that testing whether a key exists seems more like a software concern, where in …

However, even that default semantic is tricky because sometimes …
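To make the "software question vs. data question" split concrete, here is a sketch assuming Julia 1.x behavior. Key lookup in a `Dict` uses `isequal` and always gives a definite answer, even for a `missing` key. Searching an array with `in` follows three-valued logic:

```julia
d = Dict{Any,String}(missing => "unknown", 1 => "one")

# Key lookup answers a definite yes/no question using isequal.
println(haskey(d, missing))        # true
println(missing in keys(d))        # true
# Searching an array follows three-valued logic instead.
println(missing in [missing, 1])   # missing
```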
Considering this more, I wonder if having … It would let us use …
I don't think we want both …
OK, so I think that the current plan is to have … If people think that the above is reasonable, then I could make the changes and add a little bit to the docstring that describes the intention of …
Revisiting this, I tend to think we should leave it as-is. Currently …
This is nice and simple. If we don't want to change rule (2) to use …
I have a vector that is meant to have unique values, but it has both … I can't use …

If we are going to keep the two behaviours, which I'm fine with given the somewhat reasonable rationale discussed above, then users need to be able to manage both behaviours. I don't know what the best implementation is, but something like … Or any other implementation that allows me to combine …
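Until something like that exists, one workaround is to deduplicate with `==` by hand. `Base.unique` uses `isequal`, so it keeps both `0.0` and `-0.0`. The helper `unique_eq` below is hypothetical (not part of Base) and is only a quadratic-time sketch:

```julia
# Hypothetical helper (not in Base): keep the first occurrence of each value,
# treating two values as duplicates when they are == rather than isequal.
# Each `in` on a Vector is a linear scan with ==, so this is O(n^2).
# Caveats: NaN is never == to anything, so every NaN is kept; assumes the
# input contains no `missing` values.
function unique_eq(v::AbstractVector)
    out = similar(v, 0)
    for x in v
        x in out || push!(out, x)
    end
    return out
end

xs = [0.0, -0.0, 1.0, 1.0]
println(unique(xs))      # [0.0, -0.0, 1.0]: Base.unique uses isequal
println(unique_eq(xs))   # [0.0, 1.0]
```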
This is how Dicts work, so there's a case to be made. Came up in this discussion:
https://groups.google.com/forum/#!topic/julia-users/XZDm57yHc5M
Motivating examples:
It seems like `in` for arrays is the one that's out of line here.