What are the actual algorithms we should use? #70
Comments
Arguments of methods from this proposal are iterables, not sets, so we can't get their `size`.
That question is also not yet decided. I think it would be pretty weird to force them to be consumed as iterables when the common case is that you're passing a Set, since treating the argument as a Set allows the use of algorithms with better time complexity.
At least, it's useful to pass, for example, arrays to such methods, so limiting the argument to sets is a bad idea - even for the sake of performance. If you have 2 sets and want to optimize, you can manually choose the bigger one and pass it as the argument.
It's anyway `O(n)`.
There's two relevant parameters: the size of the receiver and the size of the argument. Call them `n` and `m`. Treating the argument as a Set lets the implementation iterate whichever of the two is smaller, which is `O(min(n, m))`; treating it as an iterable forces iterating all of it, which is `O(m)`. To put it another way: intersecting a one-element Set with a thousand-element Set requires one set lookup. Intersecting a one-element Set with a thousand-element iterable requires a thousand set lookups. It is not obvious to me that we should specify the option which is a thousand times worse, here.
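To make the two options concrete, here is a minimal sketch of both strategies (illustrative code only; the function names are mine, not the proposal's):

```js
// Set-based: iterate whichever side is smaller - O(min(n, m)) `has` lookups.
function intersectionUsingSize(a, b) {
  const [small, large] = a.size <= b.size ? [a, b] : [b, a];
  const result = new Set();
  for (const value of small) {
    if (large.has(value)) result.add(value);
  }
  return result;
}

// Iterable-based: must walk the whole argument - O(m) lookups regardless of sizes.
function intersectionUsingIterable(receiver, iterable) {
  const result = new Set();
  for (const value of iterable) {
    if (receiver.has(value)) result.add(value);
  }
  return result;
}

// With a one-element receiver and a thousand-element argument:
// intersectionUsingSize does 1 lookup, intersectionUsingIterable does 1000.
```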
However, in computer science, both of those cases have the same, linear, time complexity.
No, `O(min(n, m))` and `O(m)` are not the same complexity class.
Which of those cases is sublinear or requires more than linear time?
Nobody disputes that by choosing to iterate over the smaller collection you can optimize the method, but this has nothing to do with time complexity.
It doesn't really make sense to talk about whether something is sub-linear when it's a function of multiple variables.
In that case, it makes no sense to talk about time complexity here at all. And this is definitely not a case where the number of variables affects anything.
OK, let's be precise about this, I guess. There's a few different definitions people use for big-O with multiple variables. They should all work out to be about the same here, but the one I generally use is that given in (3.1) of this paper, i.e.: for f, g : ℕ² → ℝ⁺, we say f ∈ O(g) iff there are c, N > 0 such that f̂(n, m) ≤ c · ĝ(n, m) for all n, m ≥ N. (The hat operator is a technical thing which doesn't end up being relevant here; we can just deal with the first part, f and g themselves.)

I claim that O(min(n, m)) ⊊ O(m). Containment is trivial: you can just reuse the same c and N, since min(n, m) ≤ m everywhere. To show the containment is strict, consider f(n, m) = m: it is in O(m), but for any c and N you can fix n = N and let m grow, so m ≤ c · min(n, m) = c · N eventually fails, and hence m ∉ O(min(n, m)).
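Written out in LaTeX, the definition and the claim above look like this (my transcription of the argument; the hats from Howell's definition are dropped since both functions here are nondecreasing in each argument):

```latex
% Big-O with multiple variables, after Howell's definition (3.1):
\[
  f \in O(g) \iff \exists\, c, N > 0 \;\; \forall\, n, m \ge N : \;
  f(n, m) \le c \, g(n, m).
\]

% Claim: O(\min(n, m)) \subsetneq O(m).
%
% Containment: \min(n, m) \le m pointwise, so any witness (c, N) for
%   f \in O(\min(n, m)) also witnesses f \in O(m).
%
% Strictness: take f(n, m) = m. For any (c, N), fix n = N and let m grow:
%   c \cdot \min(n, m) = cN stays bounded while m \to \infty,
%   so m \notin O(\min(n, m)).
```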
I don't understand - what are you trying to show? Sure, I read this document many years ago. Iterating the smaller of the two collections is still linear.
What I am trying to show - have shown, in fact - is that `O(min(n, m))` and `O(m)` are not the same complexity class.
...have tried to show. |
We can also go the way of making the choice of algorithm implementation-defined. Anyway, it's the same time complexity.
No, it's not. I wrote out a proof. Was part of it unclear?
Yeah, I considered this. I suspect there will not be much appetite for adding more implementation-defined stuff to the spec - I don't love the idea myself - but it is technically an option.
Are you serious? You posted the first link about asymptotic notation from a search (a perfect proof, but of what?). You posted a good proof that `O(min(n, m))` is a proper subset of `O(m)` - but that was not the claim in question.
The claim was that both of those cases have the same, linear, time complexity.
Yes, which is formalized in the definition I referred to in the proof. And with that definition you can show that they are not the same complexity class. I don't think I'm capable of explaining this in a way which is going to be clear to you, so I'm going to step away from this conversation now.
Maybe there is a middle ground?

function intersection(setA, setB) {
let primary = {
has: setA.has.bind(setA),
it: setA[Symbol.iterator](),
};
let secondary = {
has: setB.has.bind(setB),
it: setB[Symbol.iterator](),
};
const retVal = new Set();
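  // Alternating which set drives each step means the loop ends as soon as
  // the smaller set's iterator is exhausted - about 2 * min(|setA|, |setB|)
  // steps - without ever reading a size property.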
for (;;) {
const {done, value} = primary.it.next();
if (done) break;
if (! primary.has(value)) throw new Error("Not set-like behavior");
if (secondary.has(value)) retVal.add(value);
[secondary, primary] = [primary, secondary]; // switch sets for next iteration of the loop
}
return retVal;
}
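For what it's worth, a quick run of that sketch (hypothetical values):

```js
const a = new Set([1, 2, 3, 4]);
const b = new Set([4, 3, 2, 1, 0]);
intersection(a, b); // Set {1, 4, 2, 3}
// The result interleaves the two iteration orders, so it doesn't depend on
// which of the two sets happens to be the smaller one.
```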
@acutmore That algorithm doubles the number of calls required, which seems bad.
Yes, it sacrifices some performance while keeping the `O(min(n, m))` time complexity, and the order of the result no longer depends on which set is smaller. Only a suggestion - the performance cost perhaps doesn't outweigh the benefits.
Oh, the point about order is a good one; it might be worth the cost of the extra calls to get a more consistent order. I'm unconvinced by the extra `has` check enforcing set-like behavior, though. Edit: though it might be possible to get consistent order without this, depending on what the implementation of Sets in engines actually looks like.
After some discussion about the actual implementation of Sets, I am now pretty sure we can specify that the order of the result should be the order of the elements in the receiver. We'd still have the problem that the choice of which methods get called, and in what order, depends on the relative sizes, and so remains observable.
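One way to picture what an engine could do here (a sketch only, under the assumption that the engine can read insertion positions from its internal hash table; the `index` Map below stands in for that internal data):

```js
function intersectionInReceiverOrder(receiver, other) {
  // Stand-in for the engine's internal insertion-order data. NOTE: building
  // this explicitly walks the whole receiver, which is exactly the O(n) step
  // an engine gets for free but a polyfill does not.
  const index = new Map();
  let i = 0;
  for (const value of receiver) index.set(value, i++);

  // O(min(n, m)) membership checks, iterating the smaller side.
  const [small, large] =
    receiver.size <= other.size ? [receiver, other] : [other, receiver];
  const members = [];
  for (const value of small) {
    if (large.has(value)) members.push(value);
  }

  // Reorder the k results by the receiver's insertion order: O(k log k).
  members.sort((x, y) => index.get(x) - index.get(y));
  return new Set(members);
}
```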
The current spec text uses the time-complexity optimal algorithm for `intersection`.
I feel I should point out that polyfills won't have access to the data that makes sorting of the intersection result efficient. Should we recommend that they just take the performance hit and always iterate over the original set?

Up to each polyfill what it wants to do, but that's what I'd do. It's pretty common for new language features to be impossible to polyfill performantly.

That's what my polyfills have done, given that there's no other choice. In practice, my experience tells me most Sets won't be prohibitively large anyways.
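For illustration, the straightforward polyfill strategy described above might look like this (my sketch, not taken from any particular polyfill):

```js
function polyfillIntersection(receiver, other) {
  const result = new Set();
  // Always iterate the receiver: the result naturally comes out in the
  // receiver's insertion order, matching the specified order, at the cost
  // of O(n) `has` calls even when the argument has only one element.
  for (const value of receiver) {
    if (other.has(value)) result.add(value);
  }
  return result;
}
```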
Consider `intersection`. The most efficient way to implement it is to choose the smaller of the two sets and then iterate it, checking membership in the other set. Are we going to specify that, with all the calls to `size` and `has` and so on that it entails? Are we going to specify something observably different from that, so that worse time complexity is observably required?

For a more complex case, here's cpython's implementation of `difference`.

This is only relevant if we decide that methods like `intersection` should make any observable method calls on their receivers or arguments (though either is enough for this to be relevant; for example, if we decided to perform `has` calls on arguments, but not receivers). If we say that all methods operate on the internal slot directly, this isn't relevant.
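For flavor, here is a loose JS sketch of the kind of size-based strategy choice the cpython code referenced above uses for `difference` (my paraphrase of the idea, not a transcription of cpython's actual implementation):

```js
function difference(a, b) {
  if (a.size <= b.size) {
    // Receiver is the smaller side: iterate it and keep elements that are
    // absent from the argument - O(|a|) membership checks.
    const result = new Set();
    for (const value of a) {
      if (!b.has(value)) result.add(value);
    }
    return result;
  }
  // Argument is the smaller side: copy the receiver wholesale, then delete
  // the argument's elements - a table copy plus O(|b|) deletes, avoiding
  // |a| separate `has` probes against `b`.
  const result = new Set(a);
  for (const value of b) result.delete(value);
  return result;
}
```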