Revisiting the edge-case semantics of wake #108

conrad-watt · 2018-10-29T15:38:13Z

Currently

The current semantics of wake is as follows:

Wake consumes two operands, an i32 address, and an i32 wake count.

The wake count operand is interpreted as a signed value, and the following behaviour occurs based on this value:

`wake count` value	Behavior
`wake count` < 0	Wake all waiters
`wake count` == 0	Wake no waiters
`wake count` > 0	Wake min(`wake count`, `num waiters`) waiters

Let num woken be the number of threads woken by this operation.

The result of the wake operation is num woken if num woken can be represented as an i32, and trap otherwise.

Proposal

I propose instead to interpret the wake count operand as an unsigned value, with the following behaviour

`wake count` value	Behavior
---	Wake min(`wake count`, `num waiters`) waiters

The result of the wake operation is num woken, which is guaranteed to be representable as an i32.

Reasoning

From discussions in TPAC and elsewhere (#72), there were concerns about the behaviour of the operation when the number of waiting threads is greater than UINT32_MAX. There was also some concern about conformity to JS, but this seems to be a red herring as JS takes a float to represent its wake count, waking all threads if passed ∞, and otherwise clamping the value to max(ToInteger wake count, 0) with no concern for the UINT32_MAX edge-case (link).

Consider that if there really are more than UINT32_MAX waiting threads, neither implementation can wake all of them in one operation (must use a loop), the former because it would trigger a trap, and the latter because the number to wake is not representable.

Polls in previous CGs appear very inconclusive, and focussed on i32 vs i64 representation. I don't have a strong opinion on the representation issue, but I think adding a trap case to the semantics isn't the right approach, and interpreting the num waker argument as signed is slightly rogue.

This all seems to be perfectly theoretical anyway. At least on linux, superficial googling suggests that there are several internal limits that restrict maximum thread numbers to the order of millions, even on 64-bit systems, with a very hard limit of 2^29 due to their implementation of PIDs. Several linux syscalls assume that the number of waiters can be represented using i32. There's also a blogpost on experimentally pushing the envelope in Windows which doesn't get anywhere near 2^32.

This hopefully means that the semantic change won't break anything, as a negative wake count argument will now be interpreted as a ginormous positive one (at least 2^31).

The text was updated successfully, but these errors were encountered:

lars-t-hansen · 2018-10-31T08:16:56Z

This all seems to be perfectly theoretical anyway.

Not so, because you will (likely) be able to create an arbitrary number of asynchronous promise-based waiters without consuming significant thread resources by using the proposed Atomics.waitAsync. Given enough memory and a sufficiently lightweight implementation of Promise, the count could "easily" reach 2^31 or 2^32. This is stated clearly in the first comment on #72. Also see discussion on #106. waitAsync is not a wasm API but that should not matter.

It is true that #72 got hijacked by the i32 vs i64 vs f64 discussion and that there don't seem to be a lot of traces of the discussion that led to the current design with the trap, but the crucial point is that the woken count can overflow in more or less realistic settings if it is i32 or even u32, for the reason stated above. Sure, nobody writes code like that but let's not leave that hole wide open.

We could have gone with i64 but that would create some mismatch with the JS semantics that no doubt could be resolved, ISTR we also discussed an arbitrary cap at 2^53, but given how extreme a corner case this is the headache did not seem worth it.

IMO you need stronger reasoning or a stronger proposal to change the semantics here. It is useful to have a distinguished argument to say "wake all waiters" instead of having to use some artificially inflated number to "be sure" we get all, I think the latter is particularly ugly (though see below).

In the case where you "wake all" with a distinguished argument and "all" turns out to be some vast number that is not representable, it is somewhat useful for this to trap, since something is going on that you did not expect. If you did expect it you would have used an explicit count to wake.

An alternative that I'm not sure if we discussed is to have an Atomics.wakeAll instruction, this does not take a count of waiters to wake. This allows Atomics.wake to take an unsigned argument and return an unsigned result, if you think that's important (I personally think it should be kept signed) and would more importantly allow it not to trap, ever. We could give Atomics.wakeAll a non-trapping semantics, but only if it does not return a number of waiters that could overflow, and I'm not sure that's helpful, it just takes us back to where we were. But it's nice in a sense that using wakeAll opts in to the trapping semantics explicitly.

(The bell-shaped poll results for what to do here probably reflect that most people don't care what we do.)

conrad-watt · 2018-10-31T09:04:54Z

This reasoning makes sense. I missed the significance of Atomics.waitAsync, which was just sloppy reading on my part, so sorry.

I was originally motivated by looking at similar linux syscall behaviours like FUTEX_WAKE, which does rely on an arbitrarily inflated number to wake all threads, but reading further this isn't uniform across different platforms anyway. For example the windows synch API only lets you wake 1, or all threads, and the syscalls don't return whether the wake was effectful/the number of threads woken. So it seems there's nothing that exactly matches any reasonable behaviour of our instruction (purely to motivate similarity of API, not necessarily about implementation difficulty).

If no one else chimes in with an opinion, I'll happily close this issue shortly.

lars-t-hansen · 2018-10-31T10:20:17Z

OK. No hard feelings, it's useful to revisit issues like these to make sure we get them right anyway... Having an Atomics.wakeAll is less clunky than the current API, but I'm not sure I care enough about it to worry about it.

rossberg · 2018-10-31T12:20:02Z

Sentinel values are bad, especially as input. And making the count artificially signed just for that is even worse. If we are keen on making the "all" case special then we should introduce a wake_all instruction instead.

If wake_all is too heavy then we should simply change to a (unsigned) i64 argument.

I don't buy the "because JS" argument in this case. First, because its API is based on floats and thus different already, and second because dubious JS design choices do not justify bad Wasm design choices. If there is a discrepancy between the two then the right side to patch over the mess is JS -- for example, have that throw when it cannot represent an i64 count (which will never possibly happen anyway, since it's more than available address space). It seems insane to drive Wasm design decisions by the fact that JS only has 2^53 integers.

conrad-watt · 2018-10-31T22:03:48Z

This seems to leave a path open for Atomics.wake to take my unsigned semantics, so long as an Atomics.wakeAll instruction is added which does not return anything. I'm happy to do the PR and CG legwork needed to bring this up for debate.

lars-t-hansen · 2018-11-01T07:30:04Z

An Atomics.wakeAll that does not return anything seems like a step backward.

I don't understand what the concern is about having a trapping case for something that is almost certainly a programming bug.

conrad-watt · 2018-11-01T07:55:04Z

What about (from your post above) an Atomics.wakeAll that does return its number woken and has the trapping semantics (with wake unsigned)? Having slept on Andreas' response, I've realised that the main thing I'm unhappy with is the interpretation of wake count as signed, and I wouldn't care so much about a trap case if that were avoided.

rossberg · 2018-11-01T08:37:19Z

The signedness / sentinel value kludge is also my biggest complaint, which is why I suggested wake_all. But in fact, I don't think that should be necessary either.

I'm confused about what assumptions we are making regarding the expected value range of the wake count. @lars-t-hansen, I cannot consistently tell from your comments whether you consider >=2^32 a realistic case that we should worry about or not.

From what I can see there are only two possibilities regarding a count >=2^32:

If we think it is a realistic case (now or in some future) then we should change all wake counts to u64. Then no sentinel "all" value or wake_all instruction would be needed, u64_max will always suffice.
If we think that it is unrealistic then u32_max is good enough as a proxy for "all" and we could just as well adopt @conrad-watt's suggestion. (Unlike a trap it at least gives programs the freedom to handle the unlikely edge case gracefully if they care, by looping wake until the returned count is less then u32_max.)

In neither case a sentinel "all" value or wake_all instruction seems necessary. And in either case the current semantics seems strictly worse:

If 2^32 is realistic then no application should ever be using wake-all under a trapping semantics, because that could kill it without any way to protect against it.
If 2^32 is not realistic then no application has a need for wake-all, because it can just as well use wake-max. But now it has to worry about potential signedness conversions.

Am I missing some scenario where a program could actually benefit from using wake-all?

lars-t-hansen · 2018-11-01T17:28:19Z

On 64-bit systems, in an implementation that implements waitAsync with promises without needing to start a thread for each of them, more than 2^32 waiters is likely possible, and we need to have some kind of meaning for that. (I should be more careful in my wording next time. The case is perhaps not "realistic" in that I doubt real programs will run into it, but I don't know that, and it seems beside the point. It can happen, or be made to happen.)

I personally care very little about what the semantics are, but I care that there are semantics to cover cases that can occur in practice, and to the extent I care about them I want them to be straightforward for common cases, which are probably "wake k" for 0 <= k < smallish integer, and "wake all", and predictable for all other cases.

Calling wake n times in a loop does not have the same semantics that "wake n" has, because there's an atomic section in wake where the waiter list is manipulated and the waiters to be woken are selected and woken; if there are m < n waiting, only m will be woken. Calling wake n times in a loop risks waking more than m. In particular, a waiter that was woken may do something and put itself back on the waiter list and thus be woken more than once in such a scenario. That may or may not be desirable, but it's a change in behavior. https://tc39.github.io/ecma262/#sec-atomics.notify

conrad-watt · 2018-11-01T18:12:37Z

The semantics from my first post with just Atomics.wait (which I'm going to call the "unsigned semantics" from now on) covers a strict superset of the non-trapping cases of the current semantics. It's currently not possible to carry out a "wake all" operation if there are more than 2^32 waiters.

If there are fewer than 2^32 waiters, both semantics are capable of waking either a small number, or all threads (although the current semantics cannot wake more than (2^31 - 1) by explicit count). If there are more than 2^32 waiters, both semantics must use a loop.

The argument against the unsigned semantics was "having to use some artificially inflated number to "be sure" we get all, I think the latter is particularly ugly".

rossberg · 2018-11-02T16:10:35Z

@lars-t-hansen, what @conrad-watt said. In particular, the current design does not cover all cases in any practical sense, because it makes wake-all unusable. When you actually expect more than 2^32 waiters then you should absolutely not use it because its trapping semantics would blow up in your face. That is strictly worse than a loop. In all other cases it is not needed. So I would argue that the current wake-all serves no useful purpose.

Why can't we simply move to i64 counts and drop the special wake-all case?

conrad-watt · 2018-11-02T16:56:00Z

I think the question of i32 vs i64 and the question of signed vs unsigned semantics can be separated out here.

For i32 vs i64, the only time we are expecting more than 2^32 waiters is through the use of a JS feature, so it makes a certain sense to be concerned about the 2^53 edge-case, since it's pretty much guaranteed that a Wasm wake in this case will be indirecting through some JS engine code. This would be an argument against an i64 wake count (if we don't believe we will indirect through a JS engine/don't care about JS in general, we should never need more than i32 capacity anyway).

For signed vs unsigned, I think it's clear that unsigned is the more general semantics if the signed version traps as currently described, and it also avoids the sentinel value aspect. I'm most interested in exploring whether the "particularly ugly" objection in this case can be overcome. If we're fine with the limitations of a trapping signed i32 argument in this case wrt Atomics.waitAsync, the limitations of the unsigned version are no worse.

lukewagner · 2018-11-05T16:52:57Z

In a long-ago CG discussion on this overflow issue, the idea came up (which I haven't seen re-discussed here) of having the implementation trap in wait if there would more than UINT32_MAX waiters as a resource-exhaustion trap. The spec could handle this like it does with, e.g., table size (where table.current can return a i32 because table.grow can't grow beyond that).

lars-t-hansen · 2018-11-06T15:14:00Z

I'm actually a tiny bit exasperated here because all we're trying to do is assign meaning to a particularly nutty corner case that just needs a meaning, yet the discussion makes it seem like this is actually of vital importance to anyone...

@conrad-watt, you can't fix the "particularly ugly" bit about the unsigned semantics, because that's how I feel about it, and I stand by my aesthetic judgement in this case. But semantically I am just fine with it, and if you can get the committee to vote in favor of it, then great. Speaking for myself, I will still be able to pass any negative i32 to wake and in almost every conceivable practical case this will have exactly the same semantics as we have currently.

@lukewagner, I expect trapping on wait would be fine too, though to my eye it's less deterministic than wake trapping if it gets an argument it can't handle. Again, semantically I'm just fine with it; it won't come up in almost any conceivable case anyway.

@rossberg, the current design does cover all cases in a practical sense because no really practical program will hit the trapping case anyway (IMO). The assigned meaning is there just so that we have a meaning where there would otherwise not be one.

conrad-watt · 2018-11-06T16:12:44Z

All other things being equal, the unsigned semantics gives rise to a smaller and more regular specification; avoiding, as @rossberg pointed out, a case split on the sign of the operand, and a trap case in the reduction rules. I will bring this proposal to the next CG on that basis.

I don't want this issue to draw more attention than it deserves. We've only just begun writing real PRs against the forked spec, so it makes sense that minor decisions like this are re-evaluated with a view towards minimising spec real estate for cases that don't matter.

Also add a note that we expect engines to trap when there are 4 billion waiters. See issue #108.

binji · 2018-11-15T02:19:15Z

We discussed this at the Nov 12th CG meeting, and decided to use the unsigned semantics for wake. We also decided to trap during wait if there would be be more than 2³² waiters. We also decided to rename atomic.wake to atomic.notify (following JavaScript).

Also add a note that we expect engines to trap when there are 4 billion waiters. See issue #108.

conrad-watt mentioned this issue Nov 6, 2018

Add agenda item for edge-case semantics of wake WebAssembly/meetings#338

Merged

binji added a commit that referenced this issue Nov 15, 2018

[Overview] Change wake to use unsigned semantics

af16173

Also add a note that we expect engines to trap when there are 4 billion waiters. See issue #108.

binji added a commit that referenced this issue Nov 15, 2018

[Overview] Change wake to use unsigned semantics

7b9c40f

Also add a note that we expect engines to trap when there are 4 billion waiters. See issue #108.

binji mentioned this issue Nov 15, 2018

[Overview] Change wake to use unsigned semantics #110

Merged

binji added a commit that referenced this issue Nov 15, 2018

[Overview] Change wake to use unsigned semantics (#110)

b9c176d

Also add a note that we expect engines to trap when there are 4 billion waiters. See issue #108.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revisiting the edge-case semantics of wake #108

Revisiting the edge-case semantics of wake #108

conrad-watt commented Oct 29, 2018 •

edited

Loading

lars-t-hansen commented Oct 31, 2018

conrad-watt commented Oct 31, 2018

lars-t-hansen commented Oct 31, 2018

rossberg commented Oct 31, 2018 •

edited

Loading

conrad-watt commented Oct 31, 2018

lars-t-hansen commented Nov 1, 2018

conrad-watt commented Nov 1, 2018 •

edited

Loading

rossberg commented Nov 1, 2018

lars-t-hansen commented Nov 1, 2018

conrad-watt commented Nov 1, 2018

rossberg commented Nov 2, 2018 •

edited

Loading

conrad-watt commented Nov 2, 2018 •

edited

Loading

lukewagner commented Nov 5, 2018

lars-t-hansen commented Nov 6, 2018

conrad-watt commented Nov 6, 2018

binji commented Nov 15, 2018

Revisiting the edge-case semantics of wake #108

Revisiting the edge-case semantics of wake #108

Comments

conrad-watt commented Oct 29, 2018 • edited Loading

lars-t-hansen commented Oct 31, 2018

conrad-watt commented Oct 31, 2018

lars-t-hansen commented Oct 31, 2018

rossberg commented Oct 31, 2018 • edited Loading

conrad-watt commented Oct 31, 2018

lars-t-hansen commented Nov 1, 2018

conrad-watt commented Nov 1, 2018 • edited Loading

rossberg commented Nov 1, 2018

lars-t-hansen commented Nov 1, 2018

conrad-watt commented Nov 1, 2018

rossberg commented Nov 2, 2018 • edited Loading

conrad-watt commented Nov 2, 2018 • edited Loading

lukewagner commented Nov 5, 2018

lars-t-hansen commented Nov 6, 2018

conrad-watt commented Nov 6, 2018

binji commented Nov 15, 2018

conrad-watt commented Oct 29, 2018 •

edited

Loading

rossberg commented Oct 31, 2018 •

edited

Loading

conrad-watt commented Nov 1, 2018 •

edited

Loading

rossberg commented Nov 2, 2018 •

edited

Loading

conrad-watt commented Nov 2, 2018 •

edited

Loading