
Make compilation faster #2182

Open
davidmdm opened this issue Apr 10, 2024 · 22 comments
Labels
enhancement New feature or request

Comments

@davidmdm

Is your feature request related to a problem? Please describe.
When running large wasm files like the ones generated by the Go toolchain (which embed the Go runtime), it now takes 5-6 times longer on my machine to compile a module.

For some program I have to compile and execute it takes:

  • v1.6.0 ~= 4s
  • v1.7.0 ~= 30s+

Although I expect the executables produced by v1.7.0 to be much more optimized and efficient, this tradeoff is not worth it for programs that need to run one-off wasm programs.

Describe the solution you'd like
Ideally, I should be able to choose an optimization level as part of the runtime config, picking between fast compilation with slower runtime performance and slow compilation with fast runtime performance.
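To make the request concrete, here is a minimal Go sketch of what an optimization-level knob could look like. Everything in it (`OptLevel`, the pass names, `passesFor`) is hypothetical: it does not correspond to any wazero API or to wazero's actual pass pipeline, and only shows the idea of skipping expensive optimization passes at lower levels.

```go
package main

import "fmt"

// OptLevel is a hypothetical knob; wazero does not expose such an option.
type OptLevel int

const (
	OptNone  OptLevel = iota // fastest compilation, slowest generated code
	OptSpeed                 // run every pass, slowest compilation
)

// pass is a hypothetical compiler pass over some intermediate representation.
type pass struct {
	name     string
	minLevel OptLevel // lowest level at which the pass runs
}

// pipeline lists passes in order; the names are illustrative only.
var pipeline = []pass{
	{"ssa-construction", OptNone},
	{"dead-code-elimination", OptSpeed},
	{"loop-invariant-code-motion", OptSpeed},
	{"register-allocation", OptNone},
}

// passesFor returns the names of the passes that would run at the given level.
func passesFor(level OptLevel) []string {
	var names []string
	for _, p := range pipeline {
		if level >= p.minLevel {
			names = append(names, p.name)
		}
	}
	return names
}

func main() {
	fmt.Println(passesFor(OptNone))  // [ssa-construction register-allocation]
	fmt.Println(passesFor(OptSpeed)) // all four passes
}
```

The passes that remain at `OptNone` are the ones any backend needs to produce code at all; the skipped ones are where compile time would be saved.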

@davidmdm davidmdm added the enhancement New feature or request label Apr 10, 2024
@ncruces
Collaborator

ncruces commented Apr 11, 2024

wazero is maintained by a small team, so when the new compiler was introduced, it was decided to remove the old compiler (which was a totally different codebase).

The new compiler is more modular, so it may be possible to disable certain optimization passes (I'll leave it to @mathetake to comment on that). It's also a recent codebase, and there might be some opportunity to optimize it. Having said that, it's probably unrealistic to expect it to become as fast as the previous compiler was.

You have two mitigation strategies at your disposal:

  1. cache the compilation result
  2. use the interpreter
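For option 1, wazero itself provides a compilation cache (`wazero.NewCompilationCache`, or `wazero.NewCompilationCacheWithDir` for an on-disk cache that persists across restarts, attached with `RuntimeConfig.WithCompilationCache`); option 2 is selected with `wazero.NewRuntimeConfigInterpreter()`. The stdlib-only sketch below only illustrates the caching idea: it memoizes a hypothetical `compile` function by the SHA-256 of the module bytes, so the expensive step runs once per distinct module. It is in-memory only and holds the lock while compiling (which serializes compilations), and it is not how wazero implements its cache.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// compiled stands in for whatever a compiler produces (e.g. machine code).
type compiled struct {
	code []byte
}

// cache memoizes compilation results keyed by the SHA-256 of the module bytes,
// so identical modules are compiled only once per process.
type cache struct {
	mu      sync.Mutex
	entries map[[sha256.Size]byte]*compiled
	compile func(wasm []byte) (*compiled, error) // hypothetical expensive step
}

func newCache(compile func([]byte) (*compiled, error)) *cache {
	return &cache{entries: make(map[[sha256.Size]byte]*compiled), compile: compile}
}

// get returns the cached result for wasm, compiling it on a miss.
func (c *cache) get(wasm []byte) (*compiled, error) {
	key := sha256.Sum256(wasm)
	c.mu.Lock()
	defer c.mu.Unlock()
	if m, ok := c.entries[key]; ok {
		return m, nil
	}
	m, err := c.compile(wasm)
	if err != nil {
		return nil, err
	}
	c.entries[key] = m
	return m, nil
}

func main() {
	calls := 0
	c := newCache(func(wasm []byte) (*compiled, error) {
		calls++
		return &compiled{code: append([]byte(nil), wasm...)}, nil
	})
	module := []byte("\x00asm\x01\x00\x00\x00") // Wasm magic number + version 1
	c.get(module)
	c.get(module)                       // hit: no second compilation
	fmt.Println("compilations:", calls) // compilations: 1
}
```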

Other than that, if you can find (or fix!) a bottleneck in the compiler (pprof is highly recommended), we're enthusiastic about any improvements.
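As a starting point for the pprof suggestion, the following sketch wraps an arbitrary function in a CPU profile using the standard library's `runtime/pprof` package. `compileModule` is a hypothetical placeholder for the real work (for example, a call to wazero's `Runtime.CompileModule`); write the returned bytes to a file and inspect it with `go tool pprof`.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// compileModule is a hypothetical stand-in for the code being profiled.
func compileModule(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// profileCPU runs fn while collecting a CPU profile and returns the
// gzipped profile bytes in the format read by `go tool pprof`.
func profileCPU(fn func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	fn()
	pprof.StopCPUProfile() // flushes the complete profile into buf
	return buf.Bytes(), nil
}

func main() {
	prof, err := profileCPU(func() { compileModule(10_000_000) })
	if err != nil {
		panic(err)
	}
	fmt.Println("profile bytes:", len(prof))
}
```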

@davidmdm
Author

Thanks for your response. First, I want to say how much I appreciate the wazero project. I understand the constraints it is under, and I think y'all are doing a fantastic job.

Hopefully, given the modularity of the new compiler, this feature could be feasible.

I'm opening this issue not as a bug report, but as a mark of interest in this aspect of the compiler.

Given the varied use cases of wasm, I hope in the future wazero can provide an option for use cases that prefer quick compilation over quick runtime performance.

I'll experiment with the interpreter and report back with hard numbers later, but my impression was that the interpreter took about as long as the v1.7.0 compiler.

If the interpreter can work for fast startup then this is fine for me!

@davidmdm
Author

As promised, here are my findings running against a 65 MB wasm file on my MacBook Air M2 (arm64)
(these results include both compiling and executing the wasm; I suspect execution time is negligible):

v1.6.0 compiler   : 2.49s
v1.6.0 interpreter: 1.866s

v1.7.0 compiler   : N/A
v1.7.0 interpreter: 2.087s

v1.7.1 compiler   : 26.134s
v1.7.1 interpreter: 1.967s

What we can conclude is that the v1.7.1 compiler is about one order of magnitude (10x) slower than the v1.6.0 compiler.
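The order-of-magnitude figure follows directly from the table above; this small Go snippet just redoes the arithmetic on the reported wall-clock times.

```go
package main

import "fmt"

// slowdown returns how many times longer `after` took than `before`.
func slowdown(before, after float64) float64 {
	return after / before
}

func main() {
	// Seconds from the table above (compile + execute, 65 MB module).
	fmt.Printf("compiler v1.7.1 vs v1.6.0:    %.1fx\n", slowdown(2.49, 26.134)) // 10.5x
	fmt.Printf("interpreter v1.7.1 vs v1.6.0: %.2fx\n", slowdown(1.866, 1.967)) // 1.05x
}
```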

However, I was wrong when I created the issue and must have had a misconfiguration on my end: the v1.7.x interpreter is only marginally (arguably negligibly) slower than the v1.6.0 one.

Before v1.7.x, there was little reason to use the interpreter over the compiler, except to support more architectures.
Now I think it would be reasonable to document the difference in startup time, and to market the interpreter as the solution for programs that need fast... well, interpretation times.

This advice could be revisited if and when optimization levels become a thing.

In the meantime, I am satisfied using the interpreter.

@davidmdm
Author

It turns out the interpreter does not scale very well: programs that compile and run quickly with the v1.6.0 compiler (~5s) take 30+ seconds with the interpreter, on both v1.6.0 and v1.7.x of wazero.

An option similar to Zig's build modes (Debug favors compilation speed, ReleaseFast favors runtime speed), where we could disable many of the optimizations and get closer to the compilation speed of v1.6.0, would be beneficial.

TL;DR: Contrary to what I believed before, the interpreter is not a silver bullet, as it does not scale to complex tasks.

@mathetake
Member

So basically, we have neither the resources nor the plan to add more complexity to the compiler implementation. In fact, as you can see in #2214, there is plenty of room for making the current compiler faster. You can try to find where the compilation bottleneck is, and contribute if you can. At the very least, we should be able to make the current compiler as fast as wasmtime in terms of compilation performance (not runtime perf!).

Given that, I'm changing the title of this issue to something like "optimize compilation performance".

@mathetake mathetake changed the title Support different optimization levels Make compilation faster May 23, 2024
@mathetake
Member

#2226

@davidmdm
Author

davidmdm commented Jun 8, 2024

Love the amount of PRs and energy going into this and wanted to drop my appreciation here. 🚀 🚀 🚀

@mathetake
Member

@davidmdm mind trying out the main branch and sharing the result with us when you get a chance? 🙏

@davidmdm
Author

davidmdm commented Jun 11, 2024

Absolutely! Here are the results from running `time wazero compile binary.wasm`:

v1.6.0 -> 2.656s
v1.7.2 -> 28.759s
main   -> 17.45s

If it would help to profile the application, the wasm program I am using is publicly available and can be downloaded here as a gzip.

It is essentially a program that embeds the ArgoCD Helm chart, executes it, and applies a couple of patches to some internal resources before spitting it back out again.

Some notable characteristics:

  • embeds large assets
  • uses a lot of marshalling/unmarshalling

Great work! Based on these numbers, the next wazero release will compile this module roughly 1.6x faster than v1.7.2!

EDIT: adding wasmtime as reference for the same wasm binary:

wasmtime -> 5.917s

@mathetake
Member

Let's keep this open until the performance is comparable to wasmtime. Thank you for testing, @davidmdm!

@mathetake
Copy link
Member

Oh wait, I remember that wasmtime does parallel compilation (using multiple workers to compile multiple functions simultaneously), whereas wazero compilation runs in a single thread/goroutine. I wonder if it's possible to make wasmtime compile in a single thread for a fair comparison.
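As a rough illustration of what function-level parallel compilation means, the sketch below fans independent per-function compile jobs out to a fixed pool of goroutines and collects the results in the original order. `compileFunc` and `compileAll` are hypothetical, and the sketch assumes each function can be compiled without touching shared mutable state, which a real compiler would have to guarantee (or defer to a later, sequential linking step).

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// compileFunc is a hypothetical per-function compile step; a real compiler
// would lower the function body to machine code here.
func compileFunc(body string) []byte {
	return []byte("code:" + body)
}

// compileAll compiles every function body using a fixed pool of worker
// goroutines and returns the results in the original function order.
func compileAll(bodies []string, workers int) [][]byte {
	if workers < 1 {
		workers = 1
	}
	out := make([][]byte, len(bodies))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is written by exactly one worker, so no lock is needed.
				out[i] = compileFunc(bodies[i])
			}
		}()
	}
	for i := range bodies {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	bodies := []string{"f0", "f1", "f2", "f3"}
	for i, code := range compileAll(bodies, runtime.NumCPU()) {
		fmt.Println(i, string(code))
	}
}
```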

@davidmdm
Author

Perhaps there are opportunities to make certain parts of the wazero compiler concurrent? If so, that may bridge the gap considerably!

@mathetake
Copy link
Member

Yeah, that's one thing we should consider, but for now I'd like to focus on single-threaded performance, and then we can return to parallelization (which I think shouldn't be that hard).

@ncruces
Collaborator

ncruces commented Jun 12, 2024

If we're going for that kind of sophistication (compile functions in parallel), I wonder if we could also compile functions incrementally (on first use).

@inliquid

compile functions incrementally (on first use).

Sounds like introducing a JIT, or am I missing something?

@ncruces
Collaborator

ncruces commented Jun 13, 2024

The line between JIT and AOT is blurry, IMO.

But if we can compile functions one-by-one in parallel, maybe we can compile functions one-by-one on first use?

To avoid making things much harder, I guess we'd need to know the call dependency tree (compile all the functions a given function can possibly call). But maybe that's not useful, since with indirect calls that could be every function in the Wasm module?

TBH, I don't know.

@mathetake
Member

With the current main branch, it now takes about 14-15s (previously 30s+) to compile your binary on my local machine, @davidmdm 😎

@davidmdm
Author

@mathetake Oh, I know! I haven't said much, but I promise you that I'm lurking and doing little victory dances every time I see another PR come through that shaves off another second!

I ran it myself and also got ~14 seconds! We (well, you!) have crossed the 50% improvement mark since v1.7.0 🎉

Thanks so much for this effort.

@davidmdm
Author

Dropping another ❤️.

I realized that my binary doesn't need to do anything other than import the Kubernetes client-go package to take 30+ seconds to compile on v1.7.2. It now takes approximately 13 seconds on main.

❤️ ❤️ 🚀 🚀 🎸 🎸 🥳 🥳

@evacchi
Contributor

evacchi commented Jul 10, 2024

Awesome!!! @mathetake the real mvp 😎

@davidmdm
Author

Hello wazero team!

I'm wondering what the status of this effort is, and whether there are still plans to make the wazero compiler concurrent, etc.

Totally understand the constraints the team is under, and do not assume for a second that this takes priority over all other things.

For my project, I am still using v1.6.0, since v1.8.x is still 5x slower to compile. However, I would love to upgrade. I just want to gauge the timeline, and I understand if one cannot be provided.

@mathetake
Member

Unfortunately, we don't have any cycles to dedicate to wazero at all at the moment😞


5 participants