Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AOT Engine error: Method too large #405

Open
bhelx opened this issue Jun 24, 2024 · 10 comments
Open

AOT Engine error: Method too large #405

bhelx opened this issue Jun 24, 2024 · 10 comments

Comments

@bhelx
Copy link
Contributor

bhelx commented Jun 24, 2024

This is a known error, but wanted to create this for tracking and discussion:

Method too large: com/dylibso/chicory/$gen/CompiledModule.func_153

I believe the strategy here is to split the method up after it we finish it (assuming it breaks the limit). There is some prior art for this that works with ASM, but is fairly old. I assume there are some modern open source optimizers or tools that work with ASM we can use here.

@electrum electrum changed the title ATO Engine error: Method too large AOT Engine error: Method too large Jun 26, 2024
@electrum
Copy link
Collaborator

I'm working on a solution for this, with the goal to get Python running in AOT. The Python interpreter has a single huge method for its interpreter loop. Do you have other examples of WASM programs with large functions?

I looked at msplit, but couldn't get it to work, and I'm not sure it's the right approach anyway, as it's trying to split arbitrary bytecode that would never be generated by our AOT.

My current idea is to have a different code generation mode for large functions:

  • generate a state class (public primitive fields) for WASM locals instead of JVM locals
  • identify interior blocks to split out into separate methods
    • choosing targets of a BR_TABLE is probably good enough for real programs
  • use exceptions to handle non-local jumps out of blocks in methods
    • preallocate exception instances for performance (no stack trace overhead) when there is nothing to return

@electrum
Copy link
Collaborator

WASM locals are actually simpler than JVM locals, as they are declared up front for the function, whereas JVM locals can be reused with different types. Similarly, WASM blocks are more restrictive than JVM bytecode, as you can only jump to the start or end of a block, but in JVM bytecode there are fewer restrictions.

So while the idea of splitting bytecode seems nice from separation of concerns perspective, it seems more difficult to do, as you need to analyze it and rediscover the constraints from the original WASM. I'm exploring both approaches.

@evacchi
Copy link
Collaborator

evacchi commented Jun 26, 2024

I wonder if these old projects had a specific way to deal with this problem at codegen time

@electrum
Copy link
Collaborator

electrum commented Jun 26, 2024

@evacchi Thanks for the pointers. Cibyl doesn't seem to handle large methods, which likely didn't matter as it was a developer environment targeting J2ME phones. You could adjust the original source code if the method size was a problem.

I remember reading the NestedVM paper many years ago. Reading it again, it's very interesting to see how similar it is to WebAssembly. They represent CPU registers as class fields and structure the code execution using the original binary addresses with an explicit program counter. The method size limit is worked around by using a trampoline transformation.

NestedVM is implementing an efficient interpreter for MIPS binaries, retaining their original semantics, whereas WASM is defined in a more abstract way and is intended to be efficiently transformed to the target environment. We do an idiomatic transformation to JVM bytecode rather than trying to implement an interpreter for the WASM abstract machine. This should allow the JVM JIT to generate code that is as efficient as possible.

WASM is a stack machine with a very similar structure to JVM bytecode, which allows us to translate in simple manner, instruction at a time, without needing to build an IR and full compiler. This is why we use static methods and pass Instance and Memory explicitly, rather than using a stateful class. Calling an instance method requires the receiver object to be pushed first, before the other parameters. But because we translate directly, by the time we get to the CALL instruction, the other parameters are already on the stack. So instead of an instance method, we use static methods with the special parameters at the end (and thus pushed last). This is why I'm leaning towards using an explicit state object rather than a stateful class.

@bhelx
Copy link
Contributor Author

bhelx commented Jun 26, 2024

I'm working on a solution for this, with the goal to get Python running in AOT. The Python interpreter has a single huge method for its interpreter loop. Do you have other examples of WASM programs with large functions?

I tried this with my SNES emulator project. It maybe has a large function for the same reason as python. a large emulator loop. I'll upload the repo tomorrow if that's helpful.

My current idea is to have a different code generation mode for large functions:

That's an interesting idea. I hadn't really considered it because my assumption was you need to know the structure of the whole function before you can split it. It makes sense that msplit maybe doesn't work anymore. It's been a while since it's had an update.

@electrum
Copy link
Collaborator

I'll upload the repo tomorrow if that's helpful.

Please do!

@andreaTP
Copy link
Collaborator

Now that I read the evolution of this @enebo might be able to share his experience with Ruby here.

@bhelx
Copy link
Contributor Author

bhelx commented Jun 27, 2024

@electrum Here is the SNES repo and an AOT branch https://github.com/dylibso/chicory-snes/tree/aot

you'll need to LEGALLY acquire an SNES rom to run it!

@enebo
Copy link
Contributor

enebo commented Jun 28, 2024

JRuby has had this issue but it has been rare enough that we never had enough motivation to address it. It only happened in a single case which was a project which used the state machine ragel to generate a Ruby parser. Our fallback when we cannot compile is to just interpret the code. So that library never ran very fast in JRuby. Prism came to the rescue so now that case has left the building.

We have also spent time examining bytecode to reduce how much we emit (and invokedynamic can help reduce overall bytecode). This is more of a mitigation than an actual solution.

JRuby's IR is a register-based IR and those registers end up being locals. For us to implement something like method splitting we have to deal with this by likely backing those into a primitive array and generating synthetic methods for the splits. Our IR design had not considered the need for it so I think we would do something different if we started again. Splitting is also desirable from the standpoint that uncommon paths can be split out so that the JVM can see a smaller common path method.

@andreaTP
Copy link
Collaborator

andreaTP commented Jul 2, 2024

Thanks for sharing your experience @enebo !

Our fallback when we cannot compile is to just interpret the code.

Let see how this goes, but this is a safe net we might want to implement for multiple reasons.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants