Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LibJS: Implement the Set Methods proposal #16279

Merged
merged 8 commits into from
Dec 2, 2022

Conversation

IdanHo
Copy link
Member

@IdanHo IdanHo commented Dec 1, 2022

@IdanHo IdanHo added the 👀 pr-needs-review PR needs review from a maintainer or community member label Dec 1, 2022
Copy link
Member

@davidot davidot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Also very nice to immediately have some tests.
Just a couple of questions, with really only the [[Size]] thing being an issue for me

// 8 Set Records, https://tc39.es/proposal-set-methods/#sec-set-records
struct SetRecord {
NonnullGCPtr<Object> set; // [[Set]]
double size { 0 }; // [[Size]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a double the appropriate type here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see it might be infinity, hmm that complicates things.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think just using the raw double result of toIntegerOrInfinity is cleaner.

return vm.throw_completion<TypeError>(ErrorType::IntlNumberIsNaN, "size"sv);

// 6. Let intSize be ! ToIntegerOrInfinity(numSize).
auto integer_size = MUST(number_size.to_integer_or_infinity(vm));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe at least add a VERIFY here to ensure we're in safe integer range (below 2^53) and not negative? That will ensure that [[Size]] does have a a non-negative integer or +∞

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can know that we're in the safe integer range? to_integer_or_infinity doesn't check for that, and the spec text doesn't talk about it either AFAICT

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I guess it would work outside of safe integer range since we only compare sizes.

@davidot davidot added ⏳ pr-waiting-for-author PR is blocked by feedback / code changes from the author and removed 👀 pr-needs-review PR needs review from a maintainer or community member labels Dec 2, 2022
Copy link
Member

@linusg linusg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome :)

// 4. NOTE: If rawSize is undefined, then numSize will be NaN.
// 5. If numSize is NaN, throw a TypeError exception.
if (number_size.is_nan())
return vm.throw_completion<TypeError>(ErrorType::IntlNumberIsNaN, "size"sv);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess we should drop the Intl prefix (later)

Userland/Libraries/LibJS/Runtime/Set.cpp Outdated Show resolved Hide resolved
Userland/Libraries/LibJS/Runtime/Set.cpp Outdated Show resolved Hide resolved
Userland/Libraries/LibJS/Runtime/CommonPropertyNames.h Outdated Show resolved Hide resolved
Userland/Libraries/LibJS/Runtime/SetPrototype.cpp Outdated Show resolved Hide resolved
@IdanHo IdanHo force-pushed the proposal-set-methods branch from 34b4b13 to 8e08ef5 Compare December 2, 2022 10:22
@IdanHo IdanHo added 👀 pr-needs-review PR needs review from a maintainer or community member and removed ⏳ pr-waiting-for-author PR is blocked by feedback / code changes from the author labels Dec 2, 2022
Copy link
Member

@linusg linusg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI failure is sus, but looks unrelated. tyvm for implementing this so quickly!

image

@linusg linusg merged commit 2e806da into SerenityOS:master Dec 2, 2022
@davidot davidot removed the 👀 pr-needs-review PR needs review from a maintainer or community member label Dec 2, 2022
@linusg linusg mentioned this pull request Dec 2, 2022
11 tasks
@IdanHo IdanHo deleted the proposal-set-methods branch December 2, 2022 13:38
@davidot davidot mentioned this pull request Dec 2, 2022
@bakkot
Copy link

bakkot commented Dec 2, 2022

Nice!

I'm interested to see the comment about

// FIXME: This is not possible with the current underlying m_values implementation

since the question of whether that step is actually implementable is one for which I definitely need feedback from implementations.

I'm trying to get a sense of whether LibJS has an otherwise-performant implementation of Set which is unable to support that operation, or if it's already known to be less than optimally big-O performant and this is a consequence of that. In other words: the spec assumes that this step would be possible because I had a particular implementation in mind for Set; is it impossible in LibJS because LibJS's implementation is different but equivalently good to the one I had in mind, or does it have less than optimal performance in other ways?

Poking around a little, it looks like Map.prototype.delete (which backs Set.prototype.delete as well) is already linear in the size of the Map, instead of ~constant, so it seems like the existing implementation is already less than optimal, and so the fact that this step cannot be done efficiently falls out of that? But I'd appreciate commentary from someone more familiar with the internals here.

@IdanHo
Copy link
Member Author

IdanHo commented Dec 2, 2022

@bakkot LibJS takes an approach of correctness first, performance later, so it's very possible our implementation is not fully up-to-par w/regards to that yet.
Specifically regarding Map.prototype.delete, you are correct that it is currently O(N), but I asked around and found out that is not intended, so we're definitely looking into that 😄

Regarding the implementation you linked, it's not obvious to me how it handles iterator invalidation, or rather, the lack-there-of. Specifically, an iterator over that implementation would not allow continuing the iteration if the deletion or addition of entry caused a rehash of the map. Handling that specific issue is the reason why the initial LibJS implementation was eventually replaced with a balanced-binary-tree based one.

To be honest I did not think too hard about how easy it would be to fix said FIXME, but it doesn't sound completely impossible to do in O(smaller N). The main issue I'm thinking of at the moment, is the question of how the original ordering of the elements in the larger set could be extracted without iterating over the larger set in the first place, is that readily available somehow in that implementation?

@alimpfard
Copy link
Member

alimpfard commented Dec 2, 2022

To further comment on the iterator matter, I don't see any way to achieve O(~const) under the existing constraints:

  1. Live iterators should be capable of continuing once they reach the end
  2. Deleting the current (or next, or both) entry in the map/set while iterating it should be well-behaved
  3. Deleted entries should no longer show up in the iteration sequence

more concretely, I don't think O(const) can be achieved by using a linked-list to keep track of the order (while there are live iterators) as deletion of the current or next element will either explode, or force a re-seek from the first known-available entry - neither of which are O(const).

The best solution I can think of is to add a second hashmap to our existing rbtree/hashmap impl to make deletion O(lgn).

I'd love to know your thoughts on how live iterators should be handled (and whether any of those constraints are actually a misunderstanding on my part - or if I'm missing some obvious fact here).

@bakkot
Copy link

bakkot commented Dec 2, 2022

@IdanHo:

LibJS takes an approach of correctness first, performance later, so it's very possible our implementation is not fully up-to-par w/regards to that yet.

Towards that end, you can get the right ordering semantics (with worse performance) for .intersection by, when the time comes to do the sort, iterating over this and creating a temporary map from key->index, then doing a stable sort of the contents of the result according to lookups in the temporary map (with things missing from the temporary map mapping to infinity). That'll pass tests; the only downside is that it's O(size of receiver) instead of O(size of result).

Regarding the implementation you linked, it's not obvious to me how it handles iterator invalidation, or rather, the lack-there-of.

Yeah, V8's actual implementation, and SpiderMonkey's, both have a lot more details to get full spec compliance, some of which is for handling iterator invalidation.

I don't actually know the exact approaches they take. At a quick glance, V8's involves updating iterators to point to the new table, whereas SpiderMonkey's iterators keep track both of an index within the data table and a count of the actual elements which have been emitted so far and have the underlying map call appropriate methods to keep those values in sync when the map is rehashed or an element is removed.

The main issue I'm thinking of at the moment, is the question of how the original ordering of the elements in the larger set could be extracted without iterating over the larger set in the first place, is that readily available somehow in that implementation?

The thing you actually need is the ability to efficiently determine the relative order of any two items (so you can do a sort). And that's easy to get with the "Deterministic hash tables" implementation I linked (I believe): because the Entry objects are allocated in a linear, insertion-order array, you can do a normal lookup to get the Entry objects corresponding to your items and then compare the locations of those objects in memory.


@alimpfard:

more concretely, I don't think O(const) can be achieved by using a linked-list to keep track of the order

I don't think anyone actually uses a linked list to keep track of order, LibJS included, so I'm not sure of the relevance of this comment.

The best solution I can think of is to add a second hashmap to our existing rbtree/hashmap impl to make deletion O(lgn).

Yeah, I'm afraid I don't see a way short of an additional table given the current implementation. That said my data structure design skills are a bit rusty, so I might well be missing something.

(If you do add an extra table which lets you do key->index lookups so that you can make remove fast, you get the ability to do the .intersection sort step for free.)

@alimpfard
Copy link
Member

I don't think anyone actually uses a linked list to keep track of order, LibJS included

We used to, with Idan's original implementation that used OrderedHashMap (which gives you O(const) but has an iterator invalidation problem).

@bakkot
Copy link

bakkot commented Mar 25, 2023

Re: the sort step, that's now been dropped, so you can just remove the fixme.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants