-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revisiting the edge-case semantics of wake #108
Comments
Not so, because you will (likely) be able to create an arbitrary number of asynchronous promise-based waiters without consuming significant thread resources by using the proposed Atomics.waitAsync. Given enough memory and a sufficiently lightweight implementation of Promise, the count could "easily" reach 2^31 or 2^32. This is stated clearly in the first comment on #72. Also see discussion on #106. waitAsync is not a wasm API but that should not matter. It is true that #72 got hijacked by the i32 vs i64 vs f64 discussion and that there don't seem to be a lot of traces of the discussion that led to the current design with the trap, but the crucial point is that the woken count can overflow in more or less realistic settings if it is i32 or even u32, for the reason stated above. Sure, nobody writes code like that but let's not leave that hole wide open. We could have gone with i64 but that would create some mismatch with the JS semantics that no doubt could be resolved, ISTR we also discussed an arbitrary cap at 2^53, but given how extreme a corner case this is the headache did not seem worth it. IMO you need stronger reasoning or a stronger proposal to change the semantics here. It is useful to have a distinguished argument to say "wake all waiters" instead of having to use some artificially inflated number to "be sure" we get all, I think the latter is particularly ugly (though see below). In the case where you "wake all" with a distinguished argument and "all" turns out to be some vast number that is not representable, it is somewhat useful for this to trap, since something is going on that you did not expect. If you did expect it you would have used an explicit count to wake. An alternative that I'm not sure if we discussed is to have an (The bell-shaped poll results for what to do here probably reflect that most people don't care what we do.) |
This reasoning makes sense. I missed the significance of I was originally motivated by looking at similar linux syscall behaviours like FUTEX_WAKE, which does rely on an arbitrarily inflated number to wake all threads, but reading further this isn't uniform across different platforms anyway. For example the windows synch API only lets you wake 1, or all threads, and the syscalls don't return whether the wake was effectful/the number of threads woken. So it seems there's nothing that exactly matches any reasonable behaviour of our instruction (purely to motivate similarity of API, not necessarily about implementation difficulty). If no one else chimes in with an opinion, I'll happily close this issue shortly. |
OK. No hard feelings, it's useful to revisit issues like these to make sure we get them right anyway... Having an |
Sentinel values are bad, especially as input. And making the count artificially signed just for that is even worse. If we are keen on making the "all" case special then we should introduce a If I don't buy the "because JS" argument in this case. First, because its API is based on floats and thus different already, and second because dubious JS design choices do not justify bad Wasm design choices. If there is a discrepancy between the two then the right side to patch over the mess is JS -- for example, have that throw when it cannot represent an i64 count (which will never possibly happen anyway, since it's more than available address space). It seems insane to drive Wasm design decisions by the fact that JS only has 2^53 integers. |
This seems to leave a path open for |
An I don't understand what the concern is about having a trapping case for something that is almost certainly a programming bug. |
What about (from your post above) an |
The signedness / sentinel value kludge is also my biggest complaint, which is why I suggested I'm confused about what assumptions we are making regarding the expected value range of the wake count. @lars-t-hansen, I cannot consistently tell from your comments whether you consider >=2^32 a realistic case that we should worry about or not. From what I can see there are only two possibilities regarding a count >=2^32:
In neither case a sentinel "all" value or
Am I missing some scenario where a program could actually benefit from using wake-all? |
On 64-bit systems, in an implementation that implements waitAsync with promises without needing to start a thread for each of them, more than 2^32 waiters is likely possible, and we need to have some kind of meaning for that. (I should be more careful in my wording next time. The case is perhaps not "realistic" in that I doubt real programs will run into it, but I don't know that, and it seems beside the point. It can happen, or be made to happen.) I personally care very little about what the semantics are, but I care that there are semantics to cover cases that can occur in practice, and to the extent I care about them I want them to be straightforward for common cases, which are probably "wake k" for 0 <= k < smallish integer, and "wake all", and predictable for all other cases. Calling wake n times in a loop does not have the same semantics that "wake n" has, because there's an atomic section in wake where the waiter list is manipulated and the waiters to be woken are selected and woken; if there are m < n waiting, only m will be woken. Calling wake n times in a loop risks waking more than m. In particular, a waiter that was woken may do something and put itself back on the waiter list and thus be woken more than once in such a scenario. That may or may not be desirable, but it's a change in behavior. https://tc39.github.io/ecma262/#sec-atomics.notify |
The semantics from my first post with just If there are fewer than 2^32 waiters, both semantics are capable of waking either a small number, or all threads (although the current semantics cannot wake more than (2^31 - 1) by explicit count). If there are more than 2^32 waiters, both semantics must use a loop. The argument against the unsigned semantics was "having to use some artificially inflated number to "be sure" we get all, I think the latter is particularly ugly". |
@lars-t-hansen, what @conrad-watt said. In particular, the current design does not cover all cases in any practical sense, because it makes wake-all unusable. When you actually expect more than 2^32 waiters then you should absolutely not use it because its trapping semantics would blow up in your face. That is strictly worse than a loop. In all other cases it is not needed. So I would argue that the current wake-all serves no useful purpose. Why can't we simply move to i64 counts and drop the special wake-all case? |
I think the question of i32 vs i64 and the question of signed vs unsigned semantics can be separated out here. For i32 vs i64, the only time we are expecting more than 2^32 waiters is through the use of a JS feature, so it makes a certain sense to be concerned about the 2^53 edge-case, since it's pretty much guaranteed that a Wasm wake in this case will be indirecting through some JS engine code. This would be an argument against an i64 For signed vs unsigned, I think it's clear that unsigned is the more general semantics if the signed version traps as currently described, and it also avoids the sentinel value aspect. I'm most interested in exploring whether the "particularly ugly" objection in this case can be overcome. If we're fine with the limitations of a trapping signed i32 argument in this case wrt |
In a long-ago CG discussion on this overflow issue, the idea came up (which I haven't seen re-discussed here) of having the implementation trap in wait if there would more than UINT32_MAX waiters as a resource-exhaustion trap. The spec could handle this like it does with, e.g., table size (where |
I'm actually a tiny bit exasperated here because all we're trying to do is assign meaning to a particularly nutty corner case that just needs a meaning, yet the discussion makes it seem like this is actually of vital importance to anyone... @conrad-watt, you can't fix the "particularly ugly" bit about the unsigned semantics, because that's how I feel about it, and I stand by my aesthetic judgement in this case. But semantically I am just fine with it, and if you can get the committee to vote in favor of it, then great. Speaking for myself, I will still be able to pass any negative i32 to wake and in almost every conceivable practical case this will have exactly the same semantics as we have currently. @lukewagner, I expect trapping on wait would be fine too, though to my eye it's less deterministic than wake trapping if it gets an argument it can't handle. Again, semantically I'm just fine with it; it won't come up in almost any conceivable case anyway. @rossberg, the current design does cover all cases in a practical sense because no really practical program will hit the trapping case anyway (IMO). The assigned meaning is there just so that we have a meaning where there would otherwise not be one. |
All other things being equal, the unsigned semantics gives rise to a smaller and more regular specification; avoiding, as @rossberg pointed out, a case split on the sign of the operand, and a trap case in the reduction rules. I will bring this proposal to the next CG on that basis. I don't want this issue to draw more attention than it deserves. We've only just begun writing real PRs against the forked spec, so it makes sense that minor decisions like this are re-evaluated with a view towards minimising spec real estate for cases that don't matter. |
Also add a note that we expect engines to trap when there are 4 billion waiters. See issue #108.
Also add a note that we expect engines to trap when there are 4 billion waiters. See issue #108.
We discussed this at the Nov 12th CG meeting, and decided to use the unsigned semantics for wake. We also decided to trap during wait if there would be be more than 232 waiters. We also decided to rename |
Also add a note that we expect engines to trap when there are 4 billion waiters. See issue #108.
Currently
The current semantics of wake is as follows:
Wake consumes two operands, an
i32
address
, and ani32
wake count
.The
wake count
operand is interpreted as a signed value, and the following behaviour occurs based on this value:wake count
valuewake count
< 0wake count
== 0wake count
> 0wake count
,num waiters
) waitersLet
num woken
be the number of threads woken by this operation.The result of the wake operation is
num woken
ifnum woken
can be represented as ani32
, andtrap
otherwise.Proposal
I propose instead to interpret the
wake count
operand as an unsigned value, with the following behaviourwake count
valuewake count
,num waiters
) waitersThe result of the wake operation is
num woken
, which is guaranteed to be representable as ani32
.Reasoning
From discussions in TPAC and elsewhere (#72), there were concerns about the behaviour of the operation when the number of waiting threads is greater than UINT32_MAX. There was also some concern about conformity to JS, but this seems to be a red herring as JS takes a float to represent its
wake count
, waking all threads if passed ∞, and otherwise clamping the value to max(ToIntegerwake count
, 0) with no concern for the UINT32_MAX edge-case (link).Consider that if there really are more than UINT32_MAX waiting threads, neither implementation can wake all of them in one operation (must use a loop), the former because it would trigger a trap, and the latter because the number to wake is not representable.
Polls in previous CGs appear very inconclusive, and focussed on
i32
vsi64
representation. I don't have a strong opinion on the representation issue, but I think adding a trap case to the semantics isn't the right approach, and interpreting thenum waker
argument as signed is slightly rogue.This all seems to be perfectly theoretical anyway. At least on linux, superficial googling suggests that there are several internal limits that restrict maximum thread numbers to the order of millions, even on 64-bit systems, with a very hard limit of 2^29 due to their implementation of PIDs. Several linux syscalls assume that the number of waiters can be represented usingi32
. There's also a blogpost on experimentally pushing the envelope in Windows which doesn't get anywhere near 2^32.This hopefully means that the semantic change won't break anything, as a negative
wake count
argument will now be interpreted as a ginormous positive one (at least 2^31).The text was updated successfully, but these errors were encountered: