issetequal behavior with duplicate elements #32550
Good catch. That algorithm does work for sets but is incorrect for any collection that can have duplicates. One option would be to specialize the current implementation for |
There could be an |
Why does this even allow duplicates? Do any other languages have a standard implementation of set that behaves like this? Seems like |
@mcognetta My reading is that |
@pearlzli Line 291 in f5a50be
Line 226 in f5a50be
I think what I meant is: suppose we restricted I should specify that when I say "restrict to only work on sets", I mean the case where we only define |
I think the right thing to do here is just to fix the implementation for non-sets.
Hello, I am a first-time contributor. I read through this issue and the implementation and I have ideas on how to implement The
We should test the runtime similarly to #26198 to compare different algorithms and input sizes. The difference in speed might not be significant, since all of these approaches are O(n + m), and for small inputs the overhead of creating two sets most likely makes that approach less efficient than a simpler one.
Hello, I got around to testing the runtimes. My code is in this jupyter notebook. Essentially, Unless anyone has any other ideas on how to implement |
I've got an in-progress PR for this that I just need to finish up. It's trickier than it might seem :)
add tests that set ops fail for non-sets (#32550)
I put up a PR that checks some cases where you can avoid constructing a set: basically, when one side is a set and the other side has a length, you can check whether the set has too many unique values and return false early. It's a slight optimization, but it's the best I could come up with. The last resort is to just make sets and thereby guarantee unique element counts.
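As a rough illustration of the early-exit idea described above, here is a minimal sketch. It is not the code from the PR; the name `issetequal_sketch` is hypothetical, and the sketch assumes that "has length" can be detected through the `Base.IteratorSize` trait.

```julia
# Hypothetical helper illustrating the early exit; not the PR's implementation.
function issetequal_sketch(a::AbstractSet, b)
    # A collection that reports a length can hold at most length(b) distinct
    # values, so a set with more elements than that cannot be set-equal to it.
    if Base.IteratorSize(b) isa Union{Base.HasLength, Base.HasShape}
        length(a) > length(b) && return false
    end
    # Last resort: build a set from `b` so duplicates are collapsed.
    return a == Set(b)
end

# Same check with the arguments swapped.
issetequal_sketch(a, b::AbstractSet) = issetequal_sketch(b, a)

# Two sets have unique elements already, so plain equality is enough.
issetequal_sketch(a::AbstractSet, b::AbstractSet) = a == b

# Neither side is a set: construct both sets and compare.
issetequal_sketch(a, b) = Set(a) == Set(b)
```

With these definitions, `issetequal_sketch(Set([1, 2, 3]), [1, 1])` returns `false` without building a second set, while `issetequal_sketch([1, 1, 2], [2, 1])` falls through to the two-set comparison and returns `true`.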
fix #32550: issetequal with duplicate values
Broken by JuliaLang/julia#32550
Fixed by enforcing uniqueness of CustomSet elements when adding to them with push!
This internal representation of the Set as an Array is horrible and should be fixed at some point.
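The downstream commit above is not shown, so the following is only a hypothetical sketch of what "enforcing uniqueness when adding with push!" could look like for an array-backed set. The type layout and the field name `elements` are assumptions made for illustration.

```julia
# Hypothetical array-backed set; the layout and field name are illustrative only.
struct CustomSet{T}
    elements::Vector{T}
    CustomSet{T}() where {T} = new{T}(T[])
end

# Only append values that are not already stored, so `elements` never
# contains duplicates and its length equals the number of unique values.
function Base.push!(s::CustomSet, x)
    x in s.elements || push!(s.elements, x)
    return s
end

Base.length(s::CustomSet) = length(s.elements)
Base.in(x, s::CustomSet) = x in s.elements
```

The membership test is a linear scan, so each `push!` costs O(n) in this sketch, which is consistent with the commit's remark that an array is a poor internal representation.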
As I mentioned on Discourse, the issetequal docstring says that issetequal(a, b) is equivalent to a ⊆ b && b ⊆ a, but this isn't the case when there are duplicate elements:
This is because the implementation assumes no duplicate elements:
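The snippets from the original post are not reproduced here. As a hypothetical illustration of the mismatch, the sketch below uses a stand-in function, `naive_issetequal`, that compares lengths before checking a single subset relation. It is an assumption about the kind of shortcut being described, not a quotation of the Base source.

```julia
# Hypothetical stand-in for a length-based shortcut; not the Base source.
naive_issetequal(l, r) = length(l) == length(r) && l ⊆ r

a = [1, 1, 2]
b = [1, 2]

a ⊆ b && b ⊆ a           # true: both contain exactly the values 1 and 2
naive_issetequal(a, b)    # false: the lengths differ (3 vs 2)

c = [1, 2, 3]
naive_issetequal(a, c)    # true, even though 3 is not an element of a
```

Both failure modes come from treating `length` as the number of unique elements, which only holds for collections without duplicates.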