Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Asyncify/Bysyncify] Mechanism to exclude indirect calls in certain functions #2218

Closed
Keno opened this issue Jul 11, 2019 · 9 comments · Fixed by #2264
Closed

[Asyncify/Bysyncify] Mechanism to exclude indirect calls in certain functions #2218

Keno opened this issue Jul 11, 2019 · 9 comments · Fixed by #2264

Comments

@Keno
Copy link

Keno commented Jul 11, 2019

In investigating the performance of bysyncify (for JuliaLang/julia#32532), I'm noticing a significant number of functions being instrumented that could never appear on stack during an unwind. The reason that these functions are being instrumented is that they're using indirect calls. However, the call targets for these indirect calls is limited and none of the call targets can possibly unwind. It would be useful to be able to exclude these from the bysyncify transform. In the short term, a command line argument to emcc and bysyncify is probably sufficient, though longer term it would be nice to be able to annotate such functions in the C source (since that's a more natural place to keep this knowledge in sync). I don't know the wasm object file format well enough to know if there is some mechanism we could piggy back on to plumb this information through (cc @sbc100 for thoughts on that).

@kripken
Copy link
Member

kripken commented Jul 11, 2019

We have an option already to ignore all indirect calls (BYSYNCIFY_IGNORE_INDIRECT), is that too much for you - do you have some indirect calls on the stack sometimes?

If the indirect call type is enough to determine what matters and what doesn't, that would be interesting to know as well - I can add optimizations for that. But in a big enough project with similar indirect call types all around, that won't help much of course.

Overall, we may need a mechanism like this, although I've been hoping it's not necessary as the overhead is much lower in my tests compared to Asyncify/Emterpretify, and these hand-crafted lists are error-prone and add complexity. But it's possible we still need such a thing.

@kripken
Copy link
Member

kripken commented Jul 12, 2019

Oh, another question - which optimization level was the overhead measured on? Unoptimized builds with Bysyncify will be very large and slow, but optimized builds should be pretty reasonable (except for pathological cases of massive functions etc.).

@Keno
Copy link
Author

Keno commented Jul 12, 2019

We have an option already to ignore all indirect calls (BYSYNCIFY_IGNORE_INDIRECT), is that too much for you - do you have some indirect calls on the stack sometimes?

Yes, that's too coarse grained unfortunately. We do often have indirect class on the stack (it's a programming language after all, so users can do whatever they want ;)). For example, we know that gmp, mpfr, and LLVM are never on the unwind stack, but they do provide callbacks to count allocations, which are indirect calls, so end up getting transformed all the way through.

If the indirect call type is enough to determine what matters and what doesn't, that would be interesting to know as well - I can add optimizations for that. But in a big enough project with similar indirect call types all around, that won't help much of course.

Unfortunately, I don't think this is sufficient :(, there's just too many possible targets in the code base.

Overall, we may need a mechanism like this, although I've been hoping it's not necessary as the overhead is much lower in my tests compared to Asyncify/Emterpretify, and these hand-crafted lists are error-prone and add complexity. But it's possible we still need such a thing.

Yes, agreed. However, one additional point I should make is that when we generate wasm from julia proper (as opposed to the runtime), we do have this information at the language level, so can just plumb it through. At that point, it wouldn't be a hand-maintained source annotation, but just another semantic from the high level language that gets plumbed through (since the high level language can reason about any possible side effects for a given call site, even if it ends up as an indirect call).

Oh, another question - which optimization level was the overhead measured on? Unoptimized builds with Bysyncify will be very large and slow, but optimized builds should be pretty reasonable (except for pathological cases of massive functions etc.).

I tried both optimized and unoptimized builds, but the bysyncify version of even the optimized build was noticably slower than even the unoptimized build without it (the whole thing itself is fairly slow at the moment, since the JIT is off and I didn't cross compile the julia system image, so everything is interpreted - that's to be worked on, but in the meantime it makes any slow down super noticable).

@kripken
Copy link
Member

kripken commented Jul 12, 2019

Thanks @Keno, yeah, that makes sense.

First I'd like to finish investigating the perf issue in that PR - I am hoping that when optimized the overhead will decrease from 300% to just 25%. If so, then this becomes less urgent, but still interesting.

As for how to do this, there isn't a great way to annotate functions through the entire pipeline, sadly. One option might be to have a void bysyncify_ignore(void) that if a function calls it, we ignore it in bysyncify (and turn that call into a no-op). Alternatively, a list of function names is possible, but we'd need to inhibit some optimizations (like function merging). A risk in both approaches is inlining, which the user may need to manually disable in some places, again, inhibiting some optimizations.

@Keno
Copy link
Author

Keno commented Jul 12, 2019

One option might be to have a void bysyncify_ignore(void) that if a function calls it, we ignore it in bysyncify (and turn that call into a no-op).

I think the problem with this approach is that inlining is no longer semantically correct (because a call to the magic function will end up in the outer function, which may have indirect calls that unwind). I do still think plumbing something through the stack might be worthwhile. In the particular case I was looking at, there's an indirect call in the allocator that gets picked up, so basically every function in the system gets transformed, despite most not needing it. I do agree it would be a bit odd to, say, add a flag to the wasm object format specifically for bysyncify, but perhaps we could simply do something like emit the llvm.global.annotations array into a separate section of the wasm file and have an appropriate contract with bysyncify to understand that. I feel like such a mechanism could be generically useful to exploration as wasm evolves.

@kripken
Copy link
Member

kripken commented Jul 16, 2019

Talking to @dschuff and @jgravelle-google , there are some ideas about how we could pass metadata through from clang and LLVM IR all the way to wasm and binaryen. But there is no actual plan or work yet.

@kripken kripken changed the title [Bysyncify] Mechanism to exclude indirect calls in certain functions [Asyncify/Bysyncify] Mechanism to exclude indirect calls in certain functions Jul 23, 2019
@kripken
Copy link
Member

kripken commented Jul 23, 2019

Ok, after getting data from Julia, DosBox, and Doom 3, it definitely looks like we want some sort of manual list mechanism, as indirect calls are too much of a problem (I did notice Julia has a bunch of invoke_* calls, and I checked what happens if those are optimized into direct calls instead of looking like indirect ones; sadly it didn't help much.)

In the long term some source-level mechanism would be good as discussed above. That'll take work and time though. In the short term just a plain list of functions, like EMTERPRETIFY_WHITELIST/BLACKLIST, might work.

The whitelist option would mean "only modify these functions, literally nothing else"; the blacklist option would mean "don't modify these specific functions, otherwise do what you think is right". Would the whitelist be enough for everyone? It sounded on the mailing list like that might be the case for DosBox and Doom 3.

@Keno
Copy link
Author

Keno commented Jul 23, 2019

I don't think whitelist is sufficient for us, though blacklist would be, since we just have a few specific problematic functions that do indirect calls, but we know cannot task switch, but many other functions can be on the stack during task switch and we don't want to have to list them.

@kripken
Copy link
Member

kripken commented Jul 24, 2019

Makes sense, thanks. Ok, I'll work to add both types of lists.

kripken added a commit that referenced this issue Jul 26, 2019
The blacklist means "functions here are to be ignored and not instrumented, we can assume they never unwind." The whitelist means "only these functions, and no others, can unwind." I had hoped such lists would not be necessary, since Asyncify's overhead is much smaller than the old Asyncify and Emterpreter, but as projects have noticed, the overhead to size and speed is still significant. The lists give power users a way to reduce any unnecessary overhead.

A slightly tricky thing is escaping of names: we escape names from the names section (see #2261 #1646). The lists arrive in human-readable format, so we escape them before comparing to the internal escaped names. To enable that I refactored wasm-binary a little bit to provide the escaping logic, cc @yurydelendik

If both lists are specified, an error is shown (since that is meaningless). If a name appears in a list that is not in the module, we show a warning, which will hopefully help people debug typos etc. I had hoped to make this an error, but the problem is that due to inlining etc. a single list will not always work for both unoptimized and optimized builds (a function may vanish when optimizing, due to duplicate function elimination or inlining).

Fixes #2218.
kripken added a commit to emscripten-core/emscripten that referenced this issue Jul 27, 2019
A slight complication here is that we need function names for the lists. This extends the current symbol-map mechanism to keep around function names in the wasm until the very end of the optimization pipeline. (There are still risks with inlining and duplicate function elimination changing things, though, which can't be solved without adding some new special function annotation.)

Uses WebAssembly/binaryen#2218
belraquib pushed a commit to belraquib/emscripten that referenced this issue Dec 23, 2020
A slight complication here is that we need function names for the lists. This extends the current symbol-map mechanism to keep around function names in the wasm until the very end of the optimization pipeline. (There are still risks with inlining and duplicate function elimination changing things, though, which can't be solved without adding some new special function annotation.)

Uses WebAssembly/binaryen#2218
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants