Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AOT call stack optimizations #3773

Merged
merged 3 commits into from
Sep 10, 2024

Conversation

loganek
Copy link
Collaborator

@loganek loganek commented Sep 5, 2024

  • implemented TINY / STANDARD frame modes - tiny mode is only able to keep track on the IP and func idx, STANDARD mode provides more capabilities (parameters, stack pointer etc)
  • implemented FRAME_PER_FUNCTION / FRAME_PER_CALL modes - frame per function adds code at the beginning and at the end of each function for allocating / deallocating stack frame, whereas in per-call mode the frame is allocated before each call. The exception is call to imported function, where frame-per-function mode also allocates the stack before the call instruction (as it can't instrument the imported function)

At the moment TINY + FRAME_PER_FUNCTION is automatically enabled in case GC or perf profiling are disabled and values call stack feature is not requested. In all the other cases STANDARD + FRAME_PER_CALL is used.

STANDARD + FRAME_PER_FUNCTION and TINY + FRAME_PER_CALL are currently not implemented but possible, and might be enabled in the future.

#3758

@loganek loganek force-pushed the loganek/tiny-frame2 branch 4 times, most recently from 36c0ed8 to 81be41a Compare September 5, 2024 16:39
@loganek
Copy link
Collaborator Author

loganek commented Sep 5, 2024

@wenyongh I'm not 100% sure if I haven't missed any places where I should release the frame so I'd appreciate if you feedback especially in that area. Thanks

@loganek
Copy link
Collaborator Author

loganek commented Sep 5, 2024

spec tests are failing on nuttx, likely unrelated to this issue and I hope this can fix it: #3771

core/iwasm/compilation/aot_stack_frame.h Show resolved Hide resolved
core/iwasm/aot/aot_runtime.c Show resolved Hide resolved
core/iwasm/aot/aot_runtime.c Show resolved Hide resolved
* because frame allocation is not part of the function) or the function
* within the module (in that case the function itself is responsible for
* allocating / freeing the frame), so we need to check it at runtime.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the frame of the import function isn't allocated, there may be several issues: (1) the dumped call stacks may be not complete, (2) for GC, the native function's parameters are not committed frame and AOT runtime can not traversing them and add them to GC's root set, GC reclaim process may be invalid, (3) for performance profiling, the execution time of the import function isn't found.
Can we allocate the import function's frame before calling the function and freeing after calling the function? Like what current code does for both import and non-import functions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we allocate the import function's frame before calling the function and freeing after calling the function? Like what current code does for both import and non-import functions.

@wenyongh that's exactly what happens now; for call opcode we have:

if (comp_ctx->aux_stack_frame_type) {
if (func_idx < import_func_count
&& comp_ctx->call_stack_features.frame_per_function) {
if (!aot_alloc_frame_per_function_frame_for_aot_func(
comp_ctx, func_ctx, func_idx)) {
return false;
}
}
else if (!comp_ctx->call_stack_features.frame_per_function) {
if (comp_ctx->aux_stack_frame_type
!= AOT_STACK_FRAME_TYPE_STANDARD) {
aot_set_last_error("unsupported mode");
return false;
}
if (!alloc_frame_for_aot_func(comp_ctx, func_ctx, func_idx)) {
return false;
}
}
}

so the frame will be allocated using aot_alloc_frame_per_function_frame_for_aot_func if this is imported function. For call_indirect there's always a call to aot_alloc_frame (there's no condition that checks if it's frame-per-function mode), and in the aot_alloc_frame the frame allocation is only skipped for non-imported function; otherwise it's allocated:

bool
aot_alloc_frame(WASMExecEnv *exec_env, uint32 func_index)
{
AOTModule *module =
(AOTModule *)((AOTModuleInstance *)exec_env->module_inst)->module;
if (is_frame_per_function(exec_env)
&& func_index >= module->import_func_count) {
/* in frame per function mode the frame is allocated at
the beginning of each frame, so we only need to allocate
the frame for imported functions */
return true;
}
if (is_tiny_frame(exec_env)) {
return aot_alloc_tiny_frame(exec_env, func_index);
}
else {
return aot_alloc_standard_frame(exec_env, func_index);
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I think we can add check in the compiled LLVM IR of opcode call_indirect: compare the function index with import_func_count, and call aot_alloc_frame if func_idx < import_func_count. It doesn't necessary always calling aot_alloc_frame.

And seems no need to add the flag comp_ctx->call_stack_features.frame_per_function, my idea is that: (1) in the new code of aot compiler, for both full frame and tiny frame, (a) always allocate frame in the beginning of a non-imported function, (b) always allocate frame before calling the imported function (in both op call and op call_indirect), (c) always emit the WASM_FEATURE_FRAME_PER_FUNCTION flag in the aot file. This simplify the aot compilation and seems no need to aot_alloc_import_frame.

(2) in aot runtime, (a) always aot_alloc_frame before calling import function, (b) aot_alloc_frame before calling non-import function if WASM_FEATURE_FRAME_PER_FUNCTION flag isn't detected, so as to keep backward compatibility.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be simpler, please correct me if I am wrong or there is something I missed.

Copy link
Collaborator Author

@loganek loganek Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I think we can add check in the compiled LLVM IR of opcode call_indirect: compare the function index with import_func_count, and call aot_alloc_frame if func_idx < import_func_count. It doesn't necessary always calling aot_alloc_frame.

That's I think possible, I thought it will add slightly more code. However, I just noticed the comparison is already generated in AOT and there are two blocks for import and non-import function, so shouldn't bloat the code.

And seems no need to add the flag comp_ctx->call_stack_features.frame_per_function

Yes, it could potentially be removed; however, I didn't do it (just yet) due to multimodule / dynamic linking - if we always use frame-per-function mode, the callstack will contain duplicates. So I think we should keep both modes in the compiler for now until we figure out how to do it correctly; there are few options that come to my mind:

  • explicitly expose frame-per-function as parameter to wamrc, so users can choose the mode
  • allow users to define which functions are provided by another module and which functions are imported from native or modules without instrumentation

Either is fine for me, but also I don't think any of the option needs to be implemented right now and can be added in the future without breaking the ABI. What do you think?

Copy link
Contributor

@wenyongh wenyongh Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for me the first option (expose frame-per-function as parameter to wamrc) is better and can be implemented in the future. The latter may be a little complex for developer to choose which functions?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, I'll make it a follow-up task for the #3758

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wenyongh I also removed aot_free_import_frame function and moved the allocation of the frame for imported function to the AOT code. Let me know if you have any questions.

@loganek loganek force-pushed the loganek/tiny-frame2 branch 2 times, most recently from eb4d5bb to 32f09a1 Compare September 6, 2024 09:04
 - implemented TINY / STANDARD frame modes - tiny mode is only able to keep track on the IP and func idx, STANDARD mode provides more capabilities (parameters, stack pointer etc)
 - implemented FRAME_PER_FUNCTION / FRAME_PER_CALL modes - frame per function adds code at the beginning and at the end of each function for allocating / deallocating stack frame, whereas in per-call mode the frame is allocated before each call. The exception is call to imported function, where frame-per-function mode also allocates the stack before the `call` instruction (as it can't instrument the imported function)

At the moment TINY + FRAME_PER_FUNCTION is automatically enabled in case GC or perf profiling are disabled and `values` call stack feature is not requested. In all the other cases STANDARD + FRAME_PER_CALL is used.

STANDARD + FRAME_PER_FUNCTION and TINY + FRAME_PER_CALL are currently not implemented but possible, and might be enabled in the future.
core/iwasm/aot/aot_runtime.c Show resolved Hide resolved
core/iwasm/compilation/aot_stack_frame_comp.c Outdated Show resolved Hide resolved
core/iwasm/compilation/aot_compiler.c Outdated Show resolved Hide resolved
core/iwasm/compilation/aot_emit_control.c Outdated Show resolved Hide resolved
core/iwasm/compilation/aot_emit_control.c Show resolved Hide resolved
core/iwasm/compilation/aot_emit_function.c Outdated Show resolved Hide resolved
core/iwasm/compilation/aot_emit_function.c Outdated Show resolved Hide resolved
core/iwasm/compilation/aot_stack_frame_comp.c Outdated Show resolved Hide resolved
core/iwasm/compilation/aot_stack_frame_comp.c Show resolved Hide resolved
core/iwasm/compilation/aot_stack_frame_comp.c Show resolved Hide resolved
core/iwasm/compilation/aot_emit_control.c Outdated Show resolved Hide resolved
core/iwasm/compilation/aot_emit_control.c Outdated Show resolved Hide resolved
Copy link
Contributor

@wenyongh wenyongh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with minor issues.

core/iwasm/compilation/aot_emit_function.c Outdated Show resolved Hide resolved
core/iwasm/compilation/aot_emit_function.c Outdated Show resolved Hide resolved
@wenyongh wenyongh merged commit cbc2078 into bytecodealliance:main Sep 10, 2024
387 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants