Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make br_if return its operands #330

Closed
wants to merge 1 commit into from
Closed

Make br_if return its operands #330

wants to merge 1 commit into from

Conversation

rossberg
Copy link
Member

@rossberg rossberg commented Sep 2, 2016

No description provided.

@ghost
Copy link

ghost commented Sep 2, 2016

I don't think this is a useful change, it seems to run counter to consuming values to get them off the stack, and counter to the expression pattern.

It is equivalent to picking each of the arguments from the stack in stack order, and if the producer really needs these live on the return from br_if then it could use pick anyway.

For the text presentation it requires the following change or some more complex live range checking.

br_if 0 ($a + $b, $c + $d) <cond>;
=>
const $l0 = $a + $b;
const $l1 = $c + $d;
const $cond = <cond>;
br_if 0 ($l0, $l1) $cond

It is also very brittle as br_if 0 ($l1, $l0) $cond would not fit the pattern and require excess pick operators anyway. For example:

const $l0 = call1();
const $l1 = call2();
const $cond = <cond>;
br_if 0 ($l1, $l0) $cond
=>
call1
call2
<cond>
pick 1
pick 3
pick 2
br_if 0
drop
drop
drop
drop
drop
drop

If there are stats showing that these values are mostly live out then that might be something? I really doubt it but surprise me. If the arguments are going to be duplicated rather than consumed then at least make them general pick operations so the operator is not so fragile.

Otherwise please stick to the expression pattern and just pop them and leave it to the producer to duplicate them only when necessary.

@kripken
Copy link
Member

kripken commented Sep 2, 2016

@rossberg-chromium: I can't seem to remember the details behind this, what's the context?

@sunfishcode
Copy link
Member

It's simpler in terms of the stack-machine semantics. This way, br_if doesn't have to do extra popping on the branch-not-taken path. It is also expected to be used by compilers to represent values live over a conditional branch.

The test changes here lgtm.

@kripken
Copy link
Member

kripken commented Sep 2, 2016

@sunfishcode: I get that. Not arguing :) What I am asking is, the last time I remember this discussion, it didn't end with a decision to go this way,

WebAssembly/design#539
WebAssembly/design#709

In fact it seemed like things were going in another direction as shown there. What changed?

@lukewagner
Copy link
Member

lukewagner commented Sep 2, 2016

@kripken What changed since those issues was switching to stack machine semantics which makes it more natural to not pop the br_if value only along one successor edge. As demonstration of this, 3 independent implementations (SM, V8, @rossberg-chromium's formalization of stack machine semantics) all unintentionally left the value on the stack.

If there's a significant size motivation (additional drops necessary when the value is only needed along one edge), that could be a reason to add a new br_if-folded-with-drop op, but we don't seem to have that yet and it could always be added later.

@kripken
Copy link
Member

kripken commented Sep 2, 2016

I see, sounds like the points brought up in WebAssembly/design#703? WebAssembly/design#709 can probably be closed.

@ghost
Copy link

ghost commented Sep 2, 2016

I believe I have made a great case for not doing this, and that there is a very poor case for making this change. We can have a stack machine but that does not in itself justify constraints on the stack machine operators that frustrate a structured presentation language (even frustrate nice stack machine code). The pick operator will address this much more cleanly, much more flexibly.

That multiple VMs happened to not pop these values seems besides the point. It is not a big matter, or inefficient, for them to pop these values is it?

I think the lack of commitment and demonstration of a structured presentation language, and how this PR change affects it, is causing disproportionate weight to be given to rather minor matters. I challenge the proponents to consider how this is consistent with an expression based presentation language?

As noted above, if the stats show that in most case the values are live out then that might be a good reason for making an exception for this operator, but do we have those results?

Even if so, please change the PR so that these operators can pick arbitrary values from the stack rather than only using an immediate linear sequence of stack values which is very fragile.

Another example. With the current design, if these values are used out of order or multiple times then they would have been saved to local variables and reloaded just for this operator and thus need dropping after it if not popped - in the general case these values need to be popped.

@sunfishcode
Copy link
Member

@JSStats It's fairly straightforward to represent this change in a structured text format. Ignoring the issues that arise with arity>1 (which are not specific to br_if), code like this:

    get_local $x
    get_local $a
    get_local $b
    i32.eq
    br_if 7
    call $foo

might be rendered something like this:

  $foo(br_if(7, $x, $a == $b));

(adjust as desired according to your syntactic preferences). Essentially, the br_if acts as a pass-through for $x in the case where the condition is false.

@ghost
Copy link

ghost commented Sep 3, 2016

@sunfishcode Examples are good, thanks. Two things make your example look a simple matter: as you note it deals with only one value, but multiple values need to be considered in the design; the values have already been saved to locals hiding the cost of spilling them to the locals and hiding the difference that the value is still live at the end of the sequence.

Part of my argument against this PR as it is, is how fragile the pattern that it matches is and that this makes it poor stack machine code and a poor fit to an expression based presentation. Let me work through some examples to illustrate some of these points:

The value might be used out of stack order, in which case the result of br_if would be dropped anyway. Also note that at the end of this sequence the value is still live in $x which would be a burden for a simple compiler. Moving a value from the values stack to a local variable might reduce the number of live values on the values stack, but it does not reduce the number of live values.

get_local $x
get_local $a
get_local $b
i32.eq
br_if 7
drop
i32.const 1
get_local $x
call $foo

br_if(7, $x, $a == $b);
$foo(1, $x);

Lets consider an example starting from a point at which values have not already been spilled to local variables, and that is a perfect match to the pattern. Note that all the values are consumed, there are no values live at the end of the sequence except the result of the last call.

call $gf64
call $gi32
br_if 7
i32.const 1
call $foo

$foo(br_if(7, $gf64(), $gi32()), 1);

Now lets change the order of the arguments to $foo, something that a producer would like to do without undue cost. The operator sequence becomes much longer, the result of the br_if is dropped, the argument to br_if had to be spilled to a local variable and is still live at the end of the sequence. This is a very significant change in the efficiency of the encoding and performance for a simple compiler as a result of a minor change in the connectivity. It's too fragile. I believe such issues are part the challenge that binaryen has hit. It is also a convoluted change for the producer, fragile and not great.

call $gf64
tee_local $x
call $gi32
br_if 7
drop
i32.const 1
get_local $x
call $foo

drop(br_if(7, tee_local($x, $gf64()), $gi32()))
$foo(1, $x);

Let's say pick is added then this becomes the following. This look better. But this only works this cleanly for a single value, if there were multiple values to pass through then their ordering also makes this fragile. Hence my call to please have br_if pick values from the stack if you want to keep them live, so the br_if operator would accept one offset for each value, and this would also allow picking an ordering of the values - it would be a small change that makes a very big reduction in how fragile and frustrating the stack machine code is to work with. Also note that values are left on the values stack, still live at the end of the sequence, and wasm would need a pick_and_void to address that, but at least they end at the end of the block scope.

call $gf64
call $gi32
br_if 7
i32.const 1
pick 1
call $foo

const $l1 = $gf64();
const $l2 = br_if(7, $l1, $gi32());
$foo(1, $l2);

If popping all the arguments to br_if then this would give the following. I expect that the common case would be for the values to not be live after the br_if so this might be the best option. It does not require the pick arguments to br_if rather uses explicit pick operators as necessary. I expect these excess pick operators would be easier to cost in optimizers, and their overhead ignored.

call $gf64
pick 0  // aka dup
call $gi32
br_if 7
i32.const 1
pick 1
call $foo

const $l1 = $gf64();
const $l2 = br_if(7, $l1, $gi32());
$foo(1, $l2);

Thus I suggest just leaving it popping the values from the stack.

It may well be possible to work around this proposed change to br_if in the text presentation, but it will need different strategies compared with all the other expression base operators that pop values and I have not worked it all out yet.

I understand that a stack machine swap operator might have been used in some of the sequences, but I have other reasons to argue against adding such an operator. swap mutates the stack and might break the efficiency of SSA decoding, and the effects are very convoluted to express in an expression based presentation (not impossible, but the values need to be consumed and re-emitted which is very verbose and hard to follow).

@sunfishcode
Copy link
Member

When arity>1, one catch-all approach for structured text formats is to use "let"-like syntax to assign names to values. This is a general solution which is not specific to br_if.

br_if is not the only place in wasm where it can be tricky to line up stack operands in the right order; if we want pick functionality, I suggest we consider adding a plain pick first and seeing how far that gets us, before adding operators with pick folded in.

As was mentioned above, if having extra drops after br_if turns out to be common and frequent in practice, we can add a new br_if-folded-with-drop opcode.

@ghost
Copy link

ghost commented Sep 5, 2016

@sunfishcode Yes, I agree that the 'let'-like syntax is one good approach. I assume this is the same as a value on the values stack referenced via pick. These values have block scope, so fit the 'let' pattern. I also agree that adding a single pick operator would be best for a start, rather than bundling it into br_if, but the proposal in this PR would be equivalent to bundling in a very restricted form of pick. For example if there are three values then this PR would be equivalent to:

pick 2
pick 2
pick 2
<cond>
br_if

If you must do this then I would much rather see general support for picking the arguments so that the code is not so brittle.

I would be delighted if the pick operator were added to the next 'stack' update so people can start exploring this, and perhaps revisit this PR later with some statistics to compare.

But lets say we use this 'let' pattern for br_if then the general pattern for br_if (the form when values are not ready to consume in stack order) becomes for example (some stack values and some values in local variables):

pick n1
pick n2
get_local l1
pick n3
get_local l2
...
br_if

This patten works well using the 'expression' pattern, where values are popped. It's the same pattern that would be used for i32.add, for example:

pick n1
get_local l1
i32.add

There might be some code sequences in which the arguments to i32.add are used again, but there is no one arguing that i32.add should keep it's argument duplicates live too.

The expression pattern, of popping arguments, seems to have the wasm encoding optimized for the high frequency expression pattern where values can be consumed in stack order. This degrades gracefully for local variables, just requiring a get_local, but without a general pick it is brittle to changes in the stack patterns - it's neither a nice stack machine code or a nice match to presentation in a structure expression based language.

Thus my suggestion to not change br_if as this PR proposes, or if you do then give it general pick arguments.

I don't see many other operators that do keep values live. There are block ends that pass multiple values, but they seem very different as they keep some values and discard the rest, and block ends can use the same general pattern as in the example above, and do not need to be followed by drop operators. The only other case seems to be the tee_local operator but that applies to only a single value, and with pick I expect tee_local would not be used so heavily because later reference might be able to pick the value from the stack too rather than having to spill it to a local variable.

@sunfishcode
Copy link
Member

@JSStats Your first example appears to be emulating a br_if that returns its operands in terms of a br_if that doesn't. However consider the reverse: if you need a br_if that doesn't return its value and all you have is one that does, you can just drop the excess values, which is straightforward. This is another argument in favor of having br_if return its value operand(s): it's less awkward when emulating the other.

It is indeed not always easy to generate stack-machine code that pushes operands in the desired order, and pick and swap are interesting to consider, but these issues aren't specific to br_if.

It's true that most instructions pop their operands. The value operands of branch instructions are special though, as their value operands aren't normal operands. They're not participating in any computation that the branch is doing; they're just values that the branch is forwarding on the stack to the destination -- to be popped by some other instruction.

@ghost
Copy link

ghost commented Sep 7, 2016

@sunfishcode After looking at the text decoder challenges I now think this PR could be handled, but it will demand that text decoders are two-pass, they will need to pre-compute the number of uses on a first pass so that they know if the arguments to br_if are used last here or should be presented as block scoped constant references.

Another way to look at this is that producers could just leave the values on the stack after br_if, probably don't care or have a need to drop them, and it will give them a more compact encoding to avoid dropping them. A two pass text decoder will know the last use so does not care either. This will hurt a simple compiler, and consume unnecessary stack space, all the issues you were trying to address with drop. I hope this does not pass, but can work around it.

There are lots of 'connectivity' operators that only copy through references, perhaps shuffling them etc, and it is even possible to define functions that do this, yet that does not make them 'special' enough to keep their arguments live after the operation - so I don't think the 'special' argument helps this PR.

Here are some examples of what the text decoder will need to do and some of the lossage:

block $out : i32
  get_local $i1
  <cond>
  br_if $out
  drop
  ...
=>
{
  const i32 $c1 = $l1
  br_if $out ($c1) <cond>
  ...

After second pass:

=>
{
  br_if $out ($l1) <cond>
  ...

If the producer has not bothered to drop the value the live range would be the same and known after a second pass, so produce the same text code for different wasm code creating a loss. This is a result of canonicalizing away drop operators in the text format.

block $out : i32
  get_local $i1
  <cond>
  br_if $out
  <just leaving the value on the stack unused>
  ...
=>
{
  br_if $out ($l1) <cond>
  ...

@sunfishcode
Copy link
Member

@JSStats The text format strategy I sketched out above doesn't require two passes.

If br_if followed by drop becomes a concern in practice, we can add a br_if-with-folded-drop as mentioned above.

Concerning "connectivity" operands, whether or not a call is a connectivity operator depends on the contents of the callee; in general, a call can and often does do computation with its arguments, and the language can't specially handle the case where it doesn't (and, we pop them into locals before starting the callee regardless). Branch operands do seem to be the only pure connectivity operands wasm has, except maybe tee_local.

@ghost
Copy link

ghost commented Sep 7, 2016

@sunfishcode Actually it does need two passes. Even the expression pattern requires stacking pending operations until the results are consumed, and in the limit it might need to delay emitting the first operator until the end of a block. In your example above the text decoder will at least need to delay emitting the br_if until the result is consumed, until the call it does not know if the value is used once in stack order and thus can be emitted in an expression pattern, or needs to be emitted as a block scoped constant. The results may well not be consumed until the end of the function, this is probably quite likely, so it does in general require two passes. Using the expression pattern there are many barriers to being able to form an expression (any uses not in stack order), at which point the decoder can emit the text of pending operators and any values on the stack as block scoped constants, importantly it does not have to wait until these are all consumed, it does not need to know their live range, it just needs to know if an expression pattern can be emitted. However this problem is reversed when the arguments are kept live, the text decoder must assume the arguments are block scoped constants and does not know otherwise until they are popped, so it creates a greater challenge. As I note, it is not a show stopped, can be worked around, but it is different and adds extra complexity, and demands two passes or more specifically at least looking ahead until the live range of definitions used by br_if arguments are known.

You make lots of valid point regards 'connectivity', but how do they support this PR? There needs to be a rationale that connects the facts to the decision, and I don't see that?

We already have br_if that pops it's arguments, this PR is proposing a change, adds complexity, has no supporting statistics but experience suggests it would not be supported by statistics. It's like suggesting we change set_local to tee_local, without keeping set_local, when we know that the common case is that the result is not used, and know that pick addresses the use case in general. We could remove tee_local too after adding pick if statistics then failed to support the utility of tee_local - we need pick then need statistics.

Your case for this PR does seem to depend on drop semantics, and sorry but the case for drop has collapsed. In a way this PR could be seen as an attempt to impose drop on the wasm design, to frustrate producers into requiring the use of drop. Also pick is a more general solution to connecting stack values, solves the use case here very nicely, and there would be no use case for this PR with pick so can we add pick first and clarify that blocks unwind first and then re-visit this issue.

@sunfishcode
Copy link
Member

@JSStats It really doesn't require two passes.

We indeed don't have a lot of data on br_if with operands, so doing the simpler thing now,
and leaving the option open for adding the other form in the future, seems the best route.

Using pick to create a second sequence of operands on the stack just so that br_if can pop one sequence and leave the other sequence in place is awkward. A simple consumer would be obliged to pattern-match that sequence if wants to generate code that doesn't uselessly copy data around.

In reverse, the adding a drop to a br_if that does return its value would lead a simple consumer to do exactly the same thing it would do on a br_if that doesn't: free the resources on the fallthrough path.

@ghost
Copy link

ghost commented Sep 8, 2016

@sunfishcode Using pick to create a second sequence of operands on the stack just so that br_if can pop one sequence and leave the other sequence in place is awkward. A simple consumer would be obliged to pattern-match that sequence if wants to generate code that doesn't uselessly copy data around.

That redundancy would only occur when the arguments were all in the correct order and only when they were live after the br_if. Even then pick is great for a simple consumer, it is easy to defer the pick leafs and the simple consumer can re-order them to reduce register pressure, just as the FF baseline compiler does for some leafs already. It would not be making duplicates, just referencing their source on the stack, and could do this in any order that suits, for example to reduce register pressure and spills for the condition expression computation. I think having a simple consumer do similar optimizations for get_local has been discussed, but perhaps that was a little more difficult, pick seems much easier. Only way to be sure it for someone to try.

When some or all of the values are not used after the br_if the simple consumer would have much less register pressure and spills along the branch-not-taken path.

I worry that it's just resistance to exploring the benefits of pick, and appeal to your again to ponder the advantages.

Re the text decoder, I think your efforts on the text format are great, just wish you would ponder where it will be heading with multiple values and pick which I think you will find helps. Could you please try your text decoder on this example:

block $out : i32
  i32.const 1
  <cond>
  br_if $out
  i32.const 2
  set_local $l1
  i32.neg
end

This is how I would like it presented, and this is what I expect is most familiar:

{
  const $c1 = 1;
  br_if $out ($c1) <cond>
  $l1 = 2;
  i32.neg($c1);
}

If the values are passed through then it could also be the following. But this is creating a new definition, it does not seem so familiar. This text code pattern would generally be used when the definition had been modified, not when it was just copied through. Would you consider it good code style to be creating multiple copies of scoped constants?

{
  const $c1 = br_if $out (1) <cond>
  $l1 = 2;
  i32.net($c1);
}

Another option using first, getting even more remotely familiar and less readable.

{
  i32.neg(first(br_if $out (1) <cond>, $l1 = 2));
}

All the above deal with a single value. Try multiple values and we end up with the following general pattern which is pointlessly verbose.

 const ($c3, $c4) = br_if $out ($c1, $c2) <cond>

Anyway it seems to be possible to decode away these copies, by considering the live range, so this is not really a show stopper for this PR, perhaps this text decoder discussion should be moved elsewhere.

@rossberg
Copy link
Member Author

rossberg commented Sep 9, 2016

Merged into binary-0xc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants