What are the actual algorithms we should use? #70
Comments
Arguments of methods from this proposal are iterables, not sets, so we can't get their `size`.
That question is also not yet decided. I think it would be pretty weird to force them to be consumed as iterables when the common case is that you're passing a Set, since treating the argument as a Set allows the use of algorithms with better time complexity.
At least, it's useful to pass, for example, arrays to such methods, so limiting the argument to sets is a bad idea - even for the sake of performance. If you have 2 sets and want to optimize, you can manually choose the bigger one and pass it as the argument.
It's anyway `O(n)`.
There's two relevant parameters: the size of the receiver and the size of the argument. Call them `n` and `m`. Treating the argument as a Set lets the implementation iterate whichever of the two is smaller, which is `O(min(n, m))`; treating it as an iterable forces iterating all of it, which is `O(m)`. To put it another way: intersecting a one-element Set with a thousand-element Set requires one set lookup. Intersecting a one-element Set with a thousand-element iterable requires a thousand set lookups. It is not obvious to me that we should specify the option which is a thousand times worse, here.
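To make the two options concrete, here is a minimal sketch of both strategies (illustrative code only; the function names are mine, not the proposal's):

```js
// Set-based: iterate whichever side is smaller - O(min(n, m)) `has` lookups.
function intersectionUsingSize(a, b) {
  const [small, large] = a.size <= b.size ? [a, b] : [b, a];
  const result = new Set();
  for (const value of small) {
    if (large.has(value)) result.add(value);
  }
  return result;
}

// Iterable-based: must walk the whole argument - O(m) lookups regardless of sizes.
function intersectionUsingIterable(receiver, iterable) {
  const result = new Set();
  for (const value of iterable) {
    if (receiver.has(value)) result.add(value);
  }
  return result;
}

// With a one-element receiver and a thousand-element argument:
// intersectionUsingSize does 1 lookup, intersectionUsingIterable does 1000.
```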
However, in computer science, both of those cases have the same, linear, time complexity.
No, `O(min(n, m))` and `O(m)` are not the same complexity class.
Which of those cases is sublinear or requires more than linear time?
Nobody disputes that by choosing to iterate over the smaller collection you can optimize the method, but this has nothing to do with time complexity.
It doesn't really make sense to talk about whether something is sub-linear when it's a function of multiple variables.
In that case, it makes no sense to talk about time complexity here at all. And this is definitely not a case where the number of variables affects anything.
OK, let's be precise about this, I guess. There's a few different definitions people use for big-O with multiple variables. They should all work out to be about the same here, but the one I generally use is that given in (3.1) of this paper, i.e.: for f, g : ℕ² → ℝ⁺, we say f ∈ O(g) iff there are c, N > 0 such that f̂(n, m) ≤ c · ĝ(n, m) for all n, m ≥ N. (The hat operator is a technical thing which doesn't end up being relevant here; we can just deal with the first part, f and g themselves.)

I claim that O(min(n, m)) ⊊ O(m). Containment is trivial: you can just reuse the same c and N, since min(n, m) ≤ m everywhere. To show the containment is strict, consider f(n, m) = m: it is in O(m), but for any c and N you can fix n = N and let m grow, so m ≤ c · min(n, m) = c · N eventually fails, and hence m ∉ O(min(n, m)).
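Written out in LaTeX, the definition and the claim above look like this (my transcription of the argument; the hats from Howell's definition are dropped since both functions here are nondecreasing in each argument):

```latex
% Big-O with multiple variables, after Howell's definition (3.1):
\[
  f \in O(g) \iff \exists\, c, N > 0 \;\; \forall\, n, m \ge N : \;
  f(n, m) \le c \, g(n, m).
\]

% Claim: O(\min(n, m)) \subsetneq O(m).
%
% Containment: \min(n, m) \le m pointwise, so any witness (c, N) for
%   f \in O(\min(n, m)) also witnesses f \in O(m).
%
% Strictness: take f(n, m) = m. For any (c, N), fix n = N and let m grow:
%   c \cdot \min(n, m) = cN stays bounded while m \to \infty,
%   so m \notin O(\min(n, m)).
```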
I don't understand - what are you trying to show? Sure, I read this document many years ago. Iterating the smaller of the two collections is still linear.
What I am trying to show - have shown, in fact - is that `O(min(n, m))` and `O(m)` are not the same complexity class.
...have tried to show. |
We can also go the way of making the choice of algorithm implementation-defined. Anyway, it's the same time complexity.
No, it's not. I wrote out a proof. Was part of it unclear?
Yeah, I considered this. I suspect there will not be much appetite for adding more implementation-defined stuff to the spec - I don't love the idea myself - but it is technically an option.
Are you serious? You posted the first link about asymptotic notation from a search (a perfect proof, but of what?). You posted a good proof that `O(min(n, m))` is a proper subset of `O(m)` - but that was not the claim in question.
The claim was that both of those cases have the same, linear, time complexity.
Yes, which is formalized in the definition I referred to in the proof. And with that definition you can show that they are not the same complexity class. I don't think I'm capable of explaining this in a way which is going to be clear to you, so I'm going to step away from this conversation now.
Maybe there is a middle ground?

function intersection(setA, setB) {
let primary = {
has: setA.has.bind(setA),
it: setA[Symbol.iterator](),
};
let secondary = {
has: setB.has.bind(setB),
it: setB[Symbol.iterator](),
};
const retVal = new Set();
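  // Alternating which set drives each step means the loop ends as soon as
  // the smaller set's iterator is exhausted - about 2 * min(|setA|, |setB|)
  // steps - without ever reading a size property.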
for (;;) {
const {done, value} = primary.it.next();
if (done) break;
if (! primary.has(value)) throw new Error("Not set-like behavior");
if (secondary.has(value)) retVal.add(value);
[secondary, primary] = [primary, secondary]; // switch sets for next iteration of the loop
}
return retVal;
}
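For what it's worth, a quick run of that sketch (hypothetical values):

```js
const a = new Set([1, 2, 3, 4]);
const b = new Set([4, 3, 2, 1, 0]);
intersection(a, b); // Set {1, 4, 2, 3}
// The result interleaves the two iteration orders, so it doesn't depend on
// which of the two sets happens to be the smaller one.
```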
@acutmore That algorithm doubles the number of calls required, which seems bad.
Yes, it sacrifices some performance while keeping the `O(min(n, m))` time complexity, and the order of the result no longer depends on which set is smaller. Only a suggestion - the performance cost perhaps doesn't outweigh the benefits.
Oh, the point about order is a good one; it might be worth the cost of the extra calls to get a more consistent order. I'm unconvinced by the extra `has` check enforcing set-like behavior, though. Edit: though it might be possible to get consistent order without this, depending on what the implementation of Sets in engines actually looks like.
After some discussion about the actual implementation of Sets, I am now pretty sure we can specify that the order of the result should be the order of the elements in the receiver. We'd still have the problem that the choice of which methods get called, and in what order, depends on the relative sizes, and so remains observable.
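One way to picture what an engine could do here (a sketch only, under the assumption that the engine can read insertion positions from its internal hash table; the `index` Map below stands in for that internal data):

```js
function intersectionInReceiverOrder(receiver, other) {
  // Stand-in for the engine's internal insertion-order data. NOTE: building
  // this explicitly walks the whole receiver, which is exactly the O(n) step
  // an engine gets for free but a polyfill does not.
  const index = new Map();
  let i = 0;
  for (const value of receiver) index.set(value, i++);

  // O(min(n, m)) membership checks, iterating the smaller side.
  const [small, large] =
    receiver.size <= other.size ? [receiver, other] : [other, receiver];
  const members = [];
  for (const value of small) {
    if (large.has(value)) members.push(value);
  }

  // Reorder the k results by the receiver's insertion order: O(k log k).
  members.sort((x, y) => index.get(x) - index.get(y));
  return new Set(members);
}
```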
The current spec text uses the time-complexity optimal algorithm for `intersection`.
I feel I should point out that polyfills won't have access to the data that makes sorting of the intersection result efficient. Should we recommend that they just take the performance hit and always iterate over the original set?

Up to each polyfill what it wants to do, but that's what I'd do. It's pretty common for new language features to be impossible to polyfill performantly.

That's what my polyfills have done, given that there's no other choice. In practice, my experience tells me most Sets won't be prohibitively large anyways.
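For illustration, the straightforward polyfill strategy described above might look like this (my sketch, not taken from any particular polyfill):

```js
function polyfillIntersection(receiver, other) {
  const result = new Set();
  // Always iterate the receiver: the result naturally comes out in the
  // receiver's insertion order, matching the specified order, at the cost
  // of O(n) `has` calls even when the argument has only one element.
  for (const value of receiver) {
    if (other.has(value)) result.add(value);
  }
  return result;
}
```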
Consider `intersection`. The most efficient way to implement it is to choose the smaller of the two sets and then iterate it, checking membership in the other set. Are we going to specify that, with all the calls to `size` and `has` and so on that it entails? Are we going to specify something observably different from that, so that worse time complexity is observably required?

For a more complex case, here's cpython's implementation of `difference`.

This is only relevant if we decide that methods like `intersection` should make any observable method calls on their receivers or arguments (though either is enough for this to be relevant; for example, if we decided to perform `has` calls on arguments, but not receivers). If we say that all methods operate on the internal slot directly, this isn't relevant.
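For flavor, here is a loose JS sketch of the kind of size-based strategy choice the cpython code referenced above uses for `difference` (my paraphrase of the idea, not a transcription of cpython's actual implementation):

```js
function difference(a, b) {
  if (a.size <= b.size) {
    // Receiver is the smaller side: iterate it and keep elements that are
    // absent from the argument - O(|a|) membership checks.
    const result = new Set();
    for (const value of a) {
      if (!b.has(value)) result.add(value);
    }
    return result;
  }
  // Argument is the smaller side: copy the receiver wholesale, then delete
  // the argument's elements - a table copy plus O(|b|) deletes, avoiding
  // |a| separate `has` probes against `b`.
  const result = new Set(a);
  for (const value of b) result.delete(value);
  return result;
}
```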