Skip to content

Conversation

@dcodeIO
Copy link
Contributor

@dcodeIO dcodeIO commented Feb 5, 2019

As a follow-up to #1889, this PR implements the proposed BinaryenForceInline C-API. It refactors the core inlining logic to src/ir/inlining.h as suggested and reuses it in the C-API to replace a specific call within a caller with an inlined block.

Some issues / notes:

  • Unfortunately, it's not possible to make a BinaryenCallInline helper that'd do this at creation time of the function. The function doesn't yet exist at that time so we can't add necessary locals to it. A Builder-like API that first creates the function and then adds to it seems like too big a change to me at this point, though.
  • There is no integrated way to detect recursion yet, i.e. when inlining a function into itself over and over again, but it should be easy to detect this on the generator's side.

You''ll probably have some comments on possible improvements :)

@dcodeIO
Copy link
Contributor Author

dcodeIO commented Feb 5, 2019

While trying this out I noticed that there is an issue with block names not being unique. That'll need a fix.

@dcodeIO
Copy link
Contributor Author

dcodeIO commented Feb 5, 2019

Overall this looks somewhat inefficient due to traversing the entire function body twice (replacing the call + making unique labels), hmm...

@kripken
Copy link
Member

kripken commented Feb 5, 2019

What do you mean by "creation time" here?

Unfortunately, it's not possible to make a BinaryenCallInline helper that'd do this at creation time of the function.

What do you think about my suggestion from earlier to make the API "force-inline function X" which would then inline X into all functions that call it? If that works for your use case, it seems like a nicer API, and it would be a lot faster.

@dcodeIO
Copy link
Contributor Author

dcodeIO commented Feb 5, 2019

What do you mean by "creation time" here?

The ideal API for my use case would be to simply call BinaryenCallInline(...) just as I would call BinaryenCall and it would inline the function in place. But that requires to have an actual Function already where we can add the additional locals to, but with the C-API you first build the body and then call BinaryenAddFunction with it, which then creates the function that we needed earlier to add locals to. Hence mentioning a builder-like API that would first create the function and then we'd add to its body, making something like this possible. Not sure how feasible this is, though, as it's quite some work for a quite specific need.

What do you think about my suggestion from earlier to make the API "force-inline function X"

Yeah, that'd most likely make the most sense here. My thinking was that doing it on a per-call basis would make it as general as possible to cover all cases, even those we haven't thought about yet where a function is inlined only sometimes for example.

Unfortunately, I just figured that there's another issue with doing inlining as an afterthought, that comes from not being able to precompute expressions that include an inlined helper (we use these like macros in C++), because precompute doesn't traverse into called functions (inlined bodies, as we did before manually, could simply be precomputed). This is pretty much a showstopper and I haven't yet come up with a solution.

@dcodeIO
Copy link
Contributor Author

dcodeIO commented Feb 6, 2019

So, I figured that we are relying on inlining so much, for example we need ad-hoc precompute on inlined code for our macros to work, that doing inlining after everything else actually raises more concerns than it solves. I made my peace with it now and improved this on our side. Hope that I didn't waste too much of your time, and thanks for helping me out, as always! :) [took a while for me to figure this out, sorry]

@kripken
Copy link
Member

kripken commented Feb 6, 2019

Thanks for the clarification about creation time, now I see.

About the precompute issue, it sounds like doing inlining early, before running the optimizer, would be best for you? Are you saying you don't need binaryen to inline in that case?

@kripken
Copy link
Member

kripken commented Feb 6, 2019

That is, it seems like using this PR (or an improved version of it) to inline first thing, and then optimize, would get what you want?

@dcodeIO
Copy link
Contributor Author

dcodeIO commented Feb 6, 2019

That is, it seems like using this PR (or an improved version of it) to inline first thing, and then optimize, would get what you want?

To me it appears that the amount of work to implement what's necessary simply isn't worth it. Let me try to give an example:

@inline function GET_SIZE<T>(): usize {
  if (isReference<T>()) return offsetof<T>();
  return sizeof<T>();
}

That's a simplified example of a macro (everything in its body is actually constant, T is always known and the inner calls are built-ins that emit constants) as I described earlier, which in our case has the (occasional) requirement to evaluate to a constant value when used as part of another expression as well. For example:

load<i32>(somePtr, GET_SIZE<f64>());

Here, the second argument to the load<T> built-in becomes the explicit offset immediate of the emited load instruction, which must be fixed at compile-time as of WebAssembly's design. This makes it necessary to inline the function right away, because the precompute pass that we use to evaluate the argument doesn't traverse into calls for good reasons but is able to precompute inlined blocks. If we'd delay inlining of the call any longer, we wouldn't be able to compile the load call. As you can see, this is mostly something useful in our standard library, but not so important for user code.

That's why it appears to me that a BinaryenCallInline-ish API that'd inline immediately is the only proper solution to the issue, but would require a lot of work (can't inline while we don't have a function to inline into yet, as described above). Maybe, having a precompute pass that can traverse into functions independently of inlining (so we can delay inlining) might be another solution, but i haven't thought that through yet.

@kripken
Copy link
Member

kripken commented Feb 6, 2019

I see, thanks.

I guess those inlined functions are special - simple enough that simple precomputation turns them into a constant? So they don't have locals, for example?

Overall, it seems like that's a special case that the AssemblyScript compiler would maybe have to handle itself - those functions being inlined are really like compiler intrinsics, that is, special functions the compiler controls and is aware of.

@dcodeIO
Copy link
Contributor Author

dcodeIO commented Feb 21, 2019

Closing this in favor of an eventual improved solution, that is if someone else or myself happens to have a fitting use case again. Thanks for all your feedback!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants