Implement isdisjoint
#13192
Addresses #13189. |
See https://github.com/JuliaLang/julia/blob/master/CONTRIBUTING.md#improving-documentation - docstring either in helpdb or inline, and add the signature to the rst docs so |
I feel like there has to be a much more efficient way to implement this. Checking, for each element in the first set, whether it is in the second set seems like it would be more efficient than constructing the intersection. This appears to be checking for pairwise disjointness, which is not obviously the only correct meaning for n-ary disjointness. A definition equivalent to |
@StefanKarpinski An implementation based on A definition equivalent to |
It's standard in mathematical texts to explicitly say "pairwise disjoint" or "collectively disjoint" to distinguish these two cases. I would argue that the term "disjoint" by itself is ambiguous for more than two arguments. |
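To make the two readings concrete, here is a minimal sketch in Julia. The names `pairwise_disjoint` and `collectively_disjoint` are hypothetical and are not part of this PR; they only illustrate how the two definitions can disagree for more than two arguments.

```julia
# Hypothetical helper names for illustration only; not part of this PR.

# Pairwise: no two distinct collections share an element.
function pairwise_disjoint(cs...)
    for i in 1:length(cs), j in (i + 1):length(cs)
        # Stop at the first pair that shares an element.
        any(in(cs[j]), cs[i]) && return false
    end
    return true
end

# Collective: no single element belongs to every collection.
collectively_disjoint(c, cs...) = isempty(intersect(c, cs...))

a, b, c = Set([1, 2]), Set([2, 3]), Set([3, 1])
pairwise_disjoint(a, b, c)      # false: a and b share 2
collectively_disjoint(a, b, c)  # true: no element is in all three
```

For exactly two arguments the two definitions coincide, which is why the ambiguity only shows up in the n-ary form. The pairwise check also does work proportional to the number of pairs, which is the quadratic cost raised later in the thread.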
@StefanKarpinski Is this a generic comment, or do you suggest to rename the function to |
Efficiency is not an argument; these are generic implementations that can be overwritten for specific datatypes, for example if elements are kept in a certain order. For example, the existing generic implementation of
# symdiff is associative, so a relatively clean
# way to implement this is by using setdiff and union, and
# recursing. Has the advantage of keeping order, too, but
# not as fast as other methods that make a single pass and
# store counts with a Dict.
It seems you're arguing to omit the multi-argument version of |
@tkelman I didn't understand the text in It seems that the structure of the documentation is completely decoupled from the structure of the source code. This is strange. Wouldn't it be easier to generate a reference manual by following the source code structure, and by putting all functions there to ensure no one is missing, wrong, or a ghost? I was trying to follow e.g. the definition of So here is what I did:
I don't understand how I could lose documentation with this. I assume that Did I miss something? |
70f5e76 to 2aee9ed
isdisjoint, both generically, and optimized for Set
isdisjoint
Yeah the docsystem had a major revamp recently and things are in a still-confusing state right now. Sorry.
I'm not sure I follow you here. The "docstring body" is the documentation that you add, and gets spliced into the RST under an "autogenerated from julia source" comment.
Yes, probably. There's still some awkwardness with how multiple dispatch and macro-generated methods interact with documentation, in terms of where to put docstrings and how many of them to write. There are several issues about continuing to move more of the doc system into Julia/Markdown and away from the existing sphinx manual which should eventually get rid of the awkwardness of having separate md and rst representations of the docstrings, but we're not there yet. |
Efficiency is an argument when it comes to what should or should not go in a standard library – if writing I have no position on whether the multi-argument form should check for pairwise disjointness or collective disjointness – both are valid and useful operations. I am saying that it would be wrong to have a multiargument form of
Going with Python here and punting on multiple arguments for Regarding efficiency: The standard library has a generic implementation of My |
My OCDness wants those to be are instead of is! 😀 |
Yeah, this is where the |
Another thing - I was wondering if for more than 2 arguments that iteratively checking all elements of A to see if they are in B for all pairs would be rather expensive. I can vaguely remember algorithms for these sorts of things from 6.046 with Rivest and Leiserson, but that was > 30 years ago! |
Regarding the improved algorithm: Whether this makes sense depends on the collection. The implementation here is a "reasonable" default algorithm for many kinds of collections. I expect various collections to provide specialized implementations that are more efficient. See the comment for the default implementation of |
2aee9ed to 5609ef1
Determine whether the collections `v1` and `v2` are disjoint, using
the function `in` to check.
"""
isdisjoint(v1, v2) = all(v -> v ∉ v1, v2)
Probably iterate over the smaller of the two sets? If one of the sets is much bigger than the other that could make a huge difference.
Also, if the point of putting it in the standard library is efficiency, probably an explicit loop would be better than a higher-order function.
5609ef1 to 95b4631
@stevengj I addressed your comments. |
"""
function _isdisjoint(v1, v2)
    for v in v1
        if v in v2 return false end
These kinds of one-liners are usually written `v in v2 && return false` elsewhere in Base.
I have to wonder if there isn't a better way to generically express this. It seems like we have a mixture of things here: universal or existential, pairwise versus collective, some pairwise operation on collections. |
@StefanKarpinski You mean something like |
The O(n^2) complexity indicates to me that such an API isn't quite right yet, but I'm not sure what's better. |
95b4631 to 63bcfa4
I think algorithms with better complexity must make stronger assumptions about the collections involved. That is, you want to use other operations than just |
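As one illustration of trading a stronger assumption for better complexity, the sketch below assumes both collections are already sorted in ascending order under `isless`. That assumption is not made anywhere in this PR, and `isdisjoint_sorted` is a hypothetical name. Under it, a single merge-style pass over both inputs is enough, with no `in` lookups at all.

```julia
# Hypothetical helper, not part of this PR.
# Assumption: xs and ys are each sorted ascending under `isless`.
function isdisjoint_sorted(xs, ys)
    nx, ny = iterate(xs), iterate(ys)
    while nx !== nothing && ny !== nothing
        x, sx = nx
        y, sy = ny
        if isless(x, y)
            nx = iterate(xs, sx)   # x is too small to match; advance xs
        elseif isless(y, x)
            ny = iterate(ys, sy)   # y is too small to match; advance ys
        else
            return false           # neither is less: a common element
        end
    end
    return true                    # one side is exhausted without a match
end

isdisjoint_sorted([1, 3, 5], [2, 4, 6])  # true
isdisjoint_sorted([1, 3, 5], [5, 7])     # false
```

Each step advances one of the two iterators, so the work is linear in the combined length, but the result is only meaningful when the sortedness assumption actually holds.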
This patch is ready to be merged. @StefanKarpinski's comments above refer to a proposed extension of the current API (allowing more than two sets to be passed) and do not apply to the current patch. Also, the |
Needs a rebase. Looks like it was consensus accepted for merging, then just got forgotten and neglected.
function isdisjoint(v1, v2)
    # Iterate over the smaller set
    # Use a function call to ensure type stability
    length(v1)<=length(v2) ? _isdisjoint(v1,v2) : _isdisjoint(v2,v1)
    length(v1)<=length(v2) ? _isdisjoint(v1,v2) : _isdisjoint(v2,v1)
    if IteratorSize(v1) isa HasLength && IteratorSize(v2) isa HasLength && length(v2) < length(v1)
        return _isdisjoint(v2, v1)
    end
    return _isdisjoint(v1, v2)
Also you may want to handle AbstractSets specially since it's much faster to check containment in an AbstractSet.
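A minimal sketch of what such special handling might look like follows. The names `scan_disjoint` and `my_isdisjoint` are hypothetical, and this is not the code that was eventually merged into Base. The idea is that when exactly one argument is an `AbstractSet`, the loop should run over the other argument so that every `in` call is a set lookup.

```julia
# Hypothetical names; a sketch only, not the implementation in Base.

# True if no element of `a` is found in `b`.
function scan_disjoint(a, b)
    for x in a
        x in b && return false
    end
    return true
end

function my_isdisjoint(a, b)
    # Exactly one side is a set: iterate the other side, look up in the set.
    if b isa AbstractSet && !(a isa AbstractSet)
        return scan_disjoint(a, b)
    elseif a isa AbstractSet && !(b isa AbstractSet)
        return scan_disjoint(b, a)
    end
    # Both or neither are sets: iterate the shorter one.
    # Assumes `length` is defined for both arguments.
    return length(a) <= length(b) ? scan_disjoint(a, b) : scan_disjoint(b, a)
end

my_isdisjoint([1, 2, 3], Set([4, 5]))   # true
my_isdisjoint(Set([1, 2]), [2, 9, 9])   # false
```

A branch on `isa` is used here instead of separate methods to avoid a dispatch ambiguity when both arguments are sets; whether that is the right structure for Base is a separate question.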
@eschnett Do you have time to finish this? I sometimes implement this myself, so I would like it to be in the standard library. |
@bicycle1885 Oh wow, that was four years ago... Not sure if I'll have time in the coming week. |
This was added in #13192, but the PR became stale. We've also changed a bit how we write these kinds of algorithms, so make use of the tools we have (e.g. `fastin`).
Rebased in #34427. |
This implements the functionality. Which files do I need to change manually to document it?
fixes #13189