Embed compilation cache contents via go:embed #1733

Closed
rgl opened this issue Sep 23, 2023 · 13 comments
Labels
enhancement New feature or request

Comments

@rgl

rgl commented Sep 23, 2023

Is your feature request related to a problem? Please describe.

At rgl/python-wazero-poc I'm playing with the idea of creating a single-file binary that embeds python and a python script.

Is this even possible? If so, how? :-)

Describe the solution you'd like

Having a way to embed the compiled wasm module in the go binary (e.g. via //go:embed python.wasm.bin).

@rgl rgl added the enhancement New feature or request label Sep 23, 2023
@ncruces
Collaborator

ncruces commented Sep 23, 2023

See:
https://github.com/ncruces/RethinkRAW/tree/master/pkg/dcraw

Or:
https://github.com/ncruces/go-sqlite3
https://github.com/ncruces/go-sqlite3/tree/main/embed

If you're creating a library, putting the embed file in another package that sets a var in the main package gives your users flexibility to build their own. Otherwise, make it a const in your main package.

Do cache the compilation result, but don't compile on init; it takes too long to run on import, IMO.
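As a rough illustration of "cache the result, but don't compile on init", here is a minimal sketch using only the standard library. The names `wasmBinary`, `compiled`, `expensiveCompile` and `getModule` are hypothetical stand-ins; in a real program the bytes would usually come from a `//go:embed` directive and the expensive step would be wazero's CompileModule.

```go
package main

import (
	"fmt"
	"sync"
)

// wasmBinary stands in for the embedded module bytes. In a real program this
// would typically be filled in by a //go:embed directive (assumption).
var wasmBinary = []byte("\x00asm\x01\x00\x00\x00")

// compiled is a hypothetical placeholder for the result of the expensive
// compilation step.
type compiled struct{ size int }

var (
	once     sync.Once
	module   *compiled
	compiles int // counts how often the expensive step actually ran
)

// expensiveCompile is a stand-in for the real compilation call.
func expensiveCompile(b []byte) *compiled {
	compiles++
	return &compiled{size: len(b)}
}

// getModule compiles on first use and returns the cached result afterwards,
// so importing the package costs nothing until the module is needed.
func getModule() *compiled {
	once.Do(func() { module = expensiveCompile(wasmBinary) })
	return module
}

func main() {
	a, b := getModule(), getModule()
	fmt.Println(a == b, compiles) // prints: true 1
}
```

The point of the design is that nothing runs in `init`, so programs that import the package but never use the module do not pay for compilation.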

@ncruces ncruces closed this as completed Sep 23, 2023
@rgl
Author

rgl commented Sep 23, 2023

I do not want to embed the .wasm. I want to embed the compiled version of it: the one that is compiled by CompileModule and cached by wazero.NewCompilationCacheWithDir inside, e.g., the wazero-v1.5.0-amd64-linux/d791cfcf1399be1f136502c985eac5daf2ac828add3e4953ba304cad2c892236 file. The intent is to skip the compilation step altogether at runtime (and just use the cached/compiled file), and to have my application be a single-file executable on disk.
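To make the shape of that cache path concrete, here is a small sketch of how such an entry name could be derived. It assumes (this is not confirmed in the thread) that the directory is named after the wazero version, GOARCH and GOOS, and that the 64-character file name is the hex SHA-256 of the .wasm bytes. `cacheEntryPath` is a hypothetical helper, not a wazero API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
	"runtime"
)

// cacheEntryPath is a hypothetical helper that mimics the layout seen in the
// path above: <root>/wazero-<version>-<goarch>-<goos>/<hex digest>.
// Assumption: the digest is the SHA-256 of the module bytes.
func cacheEntryPath(root, version, goarch, goos string, wasm []byte) string {
	sum := sha256.Sum256(wasm)
	dir := "wazero-" + version + "-" + goarch + "-" + goos
	return filepath.Join(root, dir, hex.EncodeToString(sum[:]))
}

func main() {
	wasm := []byte("\x00asm\x01\x00\x00\x00") // placeholder module bytes
	fmt.Println(cacheEntryPath("cache", "v1.5.0", runtime.GOARCH, runtime.GOOS, wasm))
}
```

Under these assumptions an entry is only meaningful for one wazero version, one architecture and one OS, which is part of what makes shipping it inside a binary awkward.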

@ncruces
Collaborator

ncruces commented Sep 23, 2023

I'll reopen, and leave it to the core team to address, but I'm pretty sure that won't happen any time soon.

I have, in the past, tried to adapt the compilation cache to use embed (see footnote 1). So, it is possible.

But it's not very practical, for various reasons.
You'd have to have a go generate to generate the cache.
You'd need to embed the WASM as well as the compiled code.
And the compiled code is large, like 10x larger than the WASM.

Footnotes

  1. https://github.com/ncruces/wazero/tree/cache

@ncruces ncruces reopened this Sep 23, 2023
@mathetake
Member

I don't think we have support for this considering the complexity vs the practical gain. Why not just use the temporary dir?

@mathetake mathetake closed this as not planned Sep 23, 2023
@mathetake mathetake changed the title from "Embed compiled wasm binary" to "Embed compilation cache contents via go:embed" Sep 23, 2023
@rgl
Author

rgl commented Sep 24, 2023

For me, one of the major appeals of the Go runtime/ecosystem is its focus on operational simplicity. It typically has minimal runtime dependencies, e.g., it only needs the kernel, it only generates a single-file artifact/binary, and the binary can be run from a read-only filesystem. IMHO, this should also extend to Go-based libraries/runtimes, like wazero. In wazero's case, for me, that means it should have a simpler operational side too; I see that as having a way to ship as a single-file binary that can also run from a read-only filesystem.

FWIW, I do not really know the wazero implementation. I only used it for the first time yesterday, so I have barely scratched its surface. I just wanted to say that this is one of its aspects that I find a bit unexpected, and it should have a way to load everything from the Go fs.FS abstraction (or from an even simpler one, like a []byte). For me, having a bigger binary is not really a problem; having a cache is, as it has a tendency to leak/leave files around as time/versions churn on. And "cache (in)validation is one of the hardest problems in computer science".

@mathetake
Member

cache is CPU-specific (not just the architecture, but also the generation of that architecture), and you really don't want to embed it inside the executable: how do you know the exact CPU architecture of the users of your executable?

@mathetake
Member

what I am saying is that the cache you mentioned, wazero-v1.5.0-amd64-linux/d791cfcf1399be1f136502c985eac5daf2ac828add3e4953ba304cad2c892236, is not Wasm at all but contains native amd64 executable code. So how do you know the target CPU, and how do you cache these files, before the executable is distributed and executed?

@mathetake
Member

that's why what @ncruces mentioned, and what everyone in the community does, is to embed the .wasm, and that works. E.g. https://xeiaso.net/talks/wazero-lightning-2023

@mathetake
Member

Having said that, @ncruces also said that it is possible to provide read-only caches, so what I really want to see is compelling real-world use cases and real numbers/reasons why embedding .wasm doesn't work. Hope this helps!

@rgl
Author

rgl commented Sep 24, 2023

> cache is CPU-specific (not just the architecture, but also the generation of that architecture), and you really don't want to embed it inside the executable: how do you know the exact CPU architecture of the users of your executable?

without knowing the long-term goals of wazero, one way is to redefine/simplify the problem and limit the emitted machine code like the go compiler does with GOOS/GOARCH.

> that's why what @ncruces mentioned, and what everyone in the community does, is to embed the .wasm, and that works. E.g. https://xeiaso.net/talks/wazero-lightning-2023

indeed, that part was quite straightforward and worked nicely at my first try of the wazero library. not being able to embed all the compiled binary code is what surprised me. please note that embedding the .wasm was not really part of my question. my intent was/is to ask about the embedding of the compiled/cached wasm binary.

> Having said that, @ncruces also said that it is possible to provide read-only caches, so what I really want to see is compelling real-world use cases and real numbers/reasons why embedding .wasm doesn't work. Hope this helps!

I did look at the suggestion and I'm still trying to understand it. from what I can tell from the comments at https://github.com/tetratelabs/wazero/blob/v1.5.0/cache.go#L19-L27, it seems it's not possible to provide our own cache implementation without forking wazero. and from what you've said, it seems that it currently is not (and never will be) possible to limit the emitted instructions, so this looks like it would be quite infeasible to implement/maintain outside wazero.

> ... so what I really want to see is compelling real-world use cases and real numbers/reasons why embedding .wasm doesn't work. Hope this helps!

as I've tried to describe, not having a cache eliminates a whole class of problems. having it embedded minimizes those problems, while compromising the instruction set that can be used.

@ncruces
Collaborator

ncruces commented Sep 24, 2023

> > cache is CPU-specific (not just the architecture, but also the generation of that architecture), and you really don't want to embed it inside the executable: how do you know the exact CPU architecture of the users of your executable?

> without knowing the long-term goals of wazero, one way is to redefine/simplify the problem and limit the emitted machine code like the go compiler does with GOOS/GOARCH.

The compiled code actually does not (currently) depend on GOOS, I think. But there's a subtle class of issues that you're missing, and which is another reason I gave up on this idea.

A JIT can make an important assumption: the CPU generating the code is the same CPU that runs the code. This means it can test CPU features at compile time once, and then generate code accordingly. The Go compiler, OTOH, has to gate usages of these kinds of CPU instructions, or not use them at all.

This is not theoretical; here are two cases where the generated code already depends on optional features of the amd64 architecture:

    if c.cpuFeatures.HasExtra(platform.CpuExtraFeatureABM) {
        if unsignedInt == wazeroir.UnsignedInt32 {
            c.assembler.CompileRegisterToRegister(amd64.LZCNTL, target.register, target.register)
        } else {
            c.assembler.CompileRegisterToRegister(amd64.LZCNTQ, target.register, target.register)
        }
    } else {

    if c.cpuFeatures.HasExtra(platform.CpuExtraFeatureABM) {
        if unsignedInt == wazeroir.UnsignedInt32 {
            c.assembler.CompileRegisterToRegister(amd64.TZCNTL, target.register, target.register)
        } else {
            c.assembler.CompileRegisterToRegister(amd64.TZCNTQ, target.register, target.register)
        }
    } else {

Also, these code paths have been a source of issues in the past, because they're harder to test (see: #1111, #1112).
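A stripped-down sketch of the same idea, outside wazero: the instruction sequence is chosen once, at compile time, from the features of the CPU doing the compiling. The `cpuFeatures` type and `selectClz` function are hypothetical, and the fallback mnemonics are only illustrative, not wazero's actual fallback sequence.

```go
package main

import "fmt"

// cpuFeatures is a hypothetical, simplified view of what the compiling CPU
// supports.
type cpuFeatures struct{ hasABM bool }

// selectClz returns the mnemonics a compiler might emit for a
// "count leading zeros" operation. The choice is baked into the output, so
// code generated with hasABM=true is only valid on CPUs that have ABM.
func selectClz(f cpuFeatures, is32 bool) []string {
	if f.hasABM {
		if is32 {
			return []string{"LZCNTL"}
		}
		return []string{"LZCNTQ"}
	}
	// Illustrative fallback for CPUs without ABM.
	if is32 {
		return []string{"BSRL", "XORL"}
	}
	return []string{"BSRQ", "XORQ"}
}

func main() {
	fmt.Println(selectClz(cpuFeatures{hasABM: true}, true))
	fmt.Println(selectClz(cpuFeatures{hasABM: false}, true))
}
```

Because the two outputs differ, a cache entry produced on one machine silently encodes that machine's feature set.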

> > Having said that, @ncruces also said that it is possible to provide read-only caches, so what I really want to see is compelling real-world use cases and real numbers/reasons why embedding .wasm doesn't work. Hope this helps!

> I did look at the suggestion and I'm still trying to understand it. from what I can tell from the comments at https://github.com/tetratelabs/wazero/blob/v1.5.0/cache.go#L19-L27, it seems it's not possible to provide our own cache implementation without forking wazero. and from what you've said, it seems that it currently is not (and never will be) possible to limit the emitted instructions, so this looks like it would be quite infeasible to implement/maintain outside wazero.

Yes, it's hard to maintain outside of wazero. I did that to understand if:

  1. it'd work (it would)
  2. the benefits would outweigh the costs (not sure here)
  3. I could come up with a mergeable implementation (maybe)

Maybe you can take this forward?

ISTM wazero doesn't want to shoulder the complexities of this distribution model, and I actually agree with that sentiment. So, for 3 I think you want to figure out the minimal subset of APIs necessary to make it so you can handle the complexity outside wazero.
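One possible shape for such a minimal surface, purely as a sketch: a read-only lookup that can be backed by any fs.FS (including an embed.FS). The `readOnlyCache` interface, the `fsCache` type and `demoCache` are hypothetical and do not exist in wazero; an in-memory `fstest.MapFS` stands in for the embedded files.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"testing/fstest"
)

// readOnlyCache is a hypothetical minimal interface: look up precompiled
// bytes by key, never write.
type readOnlyCache interface {
	Get(key string) (data []byte, ok bool, err error)
}

// fsCache serves cache entries from any fs.FS, e.g. an embed.FS.
type fsCache struct{ fsys fs.FS }

func (c fsCache) Get(key string) ([]byte, bool, error) {
	data, err := fs.ReadFile(c.fsys, key)
	if errors.Is(err, fs.ErrNotExist) {
		return nil, false, nil // a miss is not an error
	}
	if err != nil {
		return nil, false, err
	}
	return data, true, nil
}

// demoCache builds a cache over an in-memory file system for illustration.
func demoCache() readOnlyCache {
	return fsCache{fsys: fstest.MapFS{
		"abc123": {Data: []byte("native code")},
	}}
}

func main() {
	cache := demoCache()
	data, ok, err := cache.Get("abc123")
	fmt.Println(string(data), ok, err)
	_, ok, err = cache.Get("missing")
	fmt.Println(ok, err)
}
```

Keeping the interface read-only pushes generation, invalidation and CPU compatibility onto whoever builds the file system, which matches the "handle the complexity outside wazero" split described above.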

Then, if you still think the benefits (speed of instantiation) would outweigh the costs (much larger binaries, complex go generate pipeline to compile code for reuse, possibility of runtime crashes or undefined behaviour if compile CPU ≠ runtime CPU, etc), I think if you make your case, we'll definitely listen.

I mean, if you find the right balance, I'll probably go as far as to add this option to my own libraries, so you have my upvote. I simply gave up after trying, as "not worth the effort."

@rgl
Author

rgl commented Sep 24, 2023

I'm probably kinda mistaken about this, but JIT != AOT, which wazero advertises itself as ("Compiler compiles WebAssembly modules into machine code ahead of time (AOT)"). To me, AOT sets some expectations, like that it can cross-compile the code in a way that is not tied to the current CPU/hardware; it's only tied to whatever CPU/features/hardware I mention in the compilation options. Maybe the "compile mode" of wazero should come with some kind of disclaimer about that.

Since that is not what wazero currently does, it invalidates/answers the whole premise of trying to include the compiled wasm in the compiled go. It actually gave me another reason to be wary of the cache.

I did appreciate all the explanations! Thank You! :-)

@ncruces
Collaborator

ncruces commented Sep 24, 2023

JIT vs. AOT is blurry.

Yes, wazero does not compile only the bits that actually get executed, right before executing them.

It compiles entire WASM modules at load time. Is “load time” “just in” or “ahead of time”? That's just naming.

The important thing to consider here, is that wazero can assume the runtime CPU is the same as the compile time CPU. So it can test the compile time CPU for features and generate code for that specific CPU.

If you break that with cache, it's on you to fix it.
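What "fixing it" might involve, as a sketch under stated assumptions: record which architecture and optional CPU features the precompiled code relies on, and refuse to use it when the running CPU does not provide them. The `features` bitmask, `artifact` type and `usable` function are hypothetical; how the available features are detected at run time is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// features is a hypothetical bitmask of optional CPU features.
type features uint64

const (
	featureABM features = 1 << iota
	featureSSE41
)

// artifact is a hypothetical record: precompiled code plus the architecture
// and features it was generated for.
type artifact struct {
	arch     string
	required features
	code     []byte
}

var errIncompatible = errors.New("precompiled code is not valid for this CPU")

// usable reports whether the artifact may run on a CPU with the given
// architecture and available features.
func usable(a artifact, arch string, available features) error {
	if a.arch != arch {
		return fmt.Errorf("%w: built for %s, running on %s", errIncompatible, a.arch, arch)
	}
	if missing := a.required &^ available; missing != 0 {
		return fmt.Errorf("%w: missing feature bits %#x", errIncompatible, uint64(missing))
	}
	return nil
}

func main() {
	a := artifact{arch: "amd64", required: featureABM | featureSSE41}
	fmt.Println(usable(a, "amd64", featureABM|featureSSE41))
	fmt.Println(usable(a, "amd64", featureSSE41))
	fmt.Println(usable(a, "arm64", featureABM|featureSSE41))
}
```

When the check fails, the caller would have to fall back to compiling the .wasm at run time, which is why the .wasm has to be shipped alongside the precompiled code anyway.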

Even beyond CPU features, wazero is not built to cross compile. You'd have to change it to produce arm64 binaries on amd64 and vice versa. I love it that Go makes cross compiling easy peasy, but it's also hard to maintain. If the choice is between focusing on this feature or implementing additional WASM proposals, I guess I'm fine with the choice being to deprioritize this one.

3 participants