AOT call stack optimizations #3773

loganek · 2024-09-05T15:48:59Z

- implemented TINY / STANDARD frame modes: TINY mode can only keep track of the IP and func idx, whereas STANDARD mode provides more capabilities (parameters, stack pointer, etc.)
- implemented FRAME_PER_FUNCTION / FRAME_PER_CALL modes: frame-per-function adds code at the beginning and at the end of each function for allocating / deallocating the stack frame, whereas in per-call mode the frame is allocated before each call. The exception is a call to an imported function, where frame-per-function mode also allocates the stack frame before the call instruction (as it can't instrument the imported function)
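To make the TINY mode concrete, here is a minimal sketch of a frame stack that records only the function index and the IP, as described above. Every name in it (`SketchTinyFrame`, `sketch_tiny_push`, and so on) is hypothetical and chosen for illustration; it does not reproduce the actual WAMR structures or layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_FRAMES 64

/* Hypothetical tiny frame: only the two fields TINY mode is said to track. */
typedef struct {
    uint32_t func_idx; /* index of the function that owns the frame */
    uint32_t ip;       /* last recorded instruction pointer offset */
} SketchTinyFrame;

typedef struct {
    SketchTinyFrame frames[SKETCH_MAX_FRAMES];
    size_t depth;
} SketchTinyStack;

/* Push a frame for func_idx; fails when the stack is full. */
bool
sketch_tiny_push(SketchTinyStack *stack, uint32_t func_idx)
{
    if (stack->depth >= SKETCH_MAX_FRAMES)
        return false;
    stack->frames[stack->depth].func_idx = func_idx;
    stack->frames[stack->depth].ip = 0;
    stack->depth++;
    return true;
}

/* Record the current IP in the top frame, e.g. right before a call. */
void
sketch_tiny_set_ip(SketchTinyStack *stack, uint32_t ip)
{
    if (stack->depth > 0)
        stack->frames[stack->depth - 1].ip = ip;
}

/* Pop the top frame; a no-op on an empty stack. */
void
sketch_tiny_pop(SketchTinyStack *stack)
{
    if (stack->depth > 0)
        stack->depth--;
}
```

In frame-per-function mode the push and pop would sit in the function prologue and epilogue; in per-call mode they would surround each call site instead. A STANDARD frame would additionally carry parameters and the stack pointer, which this sketch leaves out.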

At the moment, TINY + FRAME_PER_FUNCTION is automatically enabled when both GC and perf profiling are disabled and the values call stack feature is not requested. In all other cases STANDARD + FRAME_PER_CALL is used.

STANDARD + FRAME_PER_FUNCTION and TINY + FRAME_PER_CALL are currently not implemented but possible, and might be enabled in the future.
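The selection rule described above can be sketched as a small decision function. The types and field names below (`SketchBuildOptions`, `sketch_select_frame_config`) are assumptions made for illustration and are not the compiler's actual option structures.

```c
#include <stdbool.h>

typedef enum { SKETCH_FRAME_TINY, SKETCH_FRAME_STANDARD } SketchFrameType;
typedef enum { SKETCH_PER_FUNCTION, SKETCH_PER_CALL } SketchAllocMode;

/* Hypothetical subset of the build options that influence the choice. */
typedef struct {
    bool enable_gc;
    bool enable_perf_profiling;
    bool call_stack_values; /* the "values" call stack feature */
} SketchBuildOptions;

typedef struct {
    SketchFrameType type;
    SketchAllocMode alloc;
} SketchFrameConfig;

/* TINY + per-function only when nothing needs the richer frame;
   every other combination falls back to STANDARD + per-call. */
SketchFrameConfig
sketch_select_frame_config(const SketchBuildOptions *opts)
{
    SketchFrameConfig config;

    if (!opts->enable_gc && !opts->enable_perf_profiling
        && !opts->call_stack_values) {
        config.type = SKETCH_FRAME_TINY;
        config.alloc = SKETCH_PER_FUNCTION;
    }
    else {
        config.type = SKETCH_FRAME_STANDARD;
        config.alloc = SKETCH_PER_CALL;
    }
    return config;
}
```

Only two of the four combinations are ever returned, which matches the statement that STANDARD + FRAME_PER_FUNCTION and TINY + FRAME_PER_CALL are not implemented yet.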

#3758

loganek · 2024-09-05T16:41:31Z

@wenyongh I'm not 100% sure I haven't missed any places where I should release the frame, so I'd appreciate your feedback, especially in that area. Thanks

loganek · 2024-09-05T16:56:57Z

Spec tests are failing on nuttx; this is likely unrelated to this PR, and I hope #3771 fixes it.

core/iwasm/compilation/aot_stack_frame.h

core/iwasm/aot/aot_runtime.c

wenyongh · 2024-09-06T06:20:19Z

core/iwasm/aot/aot_runtime.c

+ * because frame allocation is not part of the function) or the function
+ * within the module (in that case the function itself is responsible for
+ * allocating / freeing the frame), so we need to check it at runtime.
+ */


If the frame of the import function isn't allocated, there may be several issues: (1) the dumped call stacks may be incomplete, (2) for GC, the native function's parameters are not committed to the frame, so the AOT runtime cannot traverse them and add them to the GC's root set, and the GC reclaim process may be invalid, (3) for performance profiling, the execution time of the import function isn't recorded.
Can we allocate the import function's frame before calling the function and freeing after calling the function? Like what current code does for both import and non-import functions.

> Can we allocate the import function's frame before calling the function and freeing after calling the function? Like what current code does for both import and non-import functions.

@wenyongh that's exactly what happens now; for call opcode we have:

wasm-micro-runtime/core/iwasm/compilation/aot_emit_function.c

Lines 1485 to 1503 in eb4d5bb

    if (comp_ctx->aux_stack_frame_type) {
        if (func_idx < import_func_count
            && comp_ctx->call_stack_features.frame_per_function) {
            if (!aot_alloc_frame_per_function_frame_for_aot_func(
                    comp_ctx, func_ctx, func_idx)) {
                return false;
            }
        }
        else if (!comp_ctx->call_stack_features.frame_per_function) {
            if (comp_ctx->aux_stack_frame_type
                != AOT_STACK_FRAME_TYPE_STANDARD) {
                aot_set_last_error("unsupported mode");
                return false;
            }
            if (!alloc_frame_for_aot_func(comp_ctx, func_ctx, func_idx)) {
                return false;
            }
        }
    }

So the frame will be allocated using aot_alloc_frame_per_function_frame_for_aot_func if this is an imported function. For call_indirect there's always a call to aot_alloc_frame (there's no condition that checks whether it's frame-per-function mode), and in aot_alloc_frame the frame allocation is skipped only for non-imported functions; otherwise the frame is allocated:

wasm-micro-runtime/core/iwasm/aot/aot_runtime.c

Lines 3803 to 3822 in eb4d5bb

    bool
    aot_alloc_frame(WASMExecEnv *exec_env, uint32 func_index)
    {
        AOTModule *module =
            (AOTModule *)((AOTModuleInstance *)exec_env->module_inst)->module;

        if (is_frame_per_function(exec_env)
            && func_index >= module->import_func_count) {
            /* in frame per function mode the frame is allocated at
               the beginning of each frame, so we only need to allocate
               the frame for imported functions */
            return true;
        }
        if (is_tiny_frame(exec_env)) {
            return aot_alloc_tiny_frame(exec_env, func_index);
        }
        else {
            return aot_alloc_standard_frame(exec_env, func_index);
        }
    }

Yes, but I think we can add a check in the compiled LLVM IR of opcode call_indirect: compare the function index with import_func_count, and call aot_alloc_frame only if func_idx < import_func_count. It isn't necessary to always call aot_alloc_frame.

And it seems there is no need to add the flag comp_ctx->call_stack_features.frame_per_function. My idea is: (1) in the new code of the AOT compiler, for both full frame and tiny frame, (a) always allocate the frame at the beginning of a non-imported function, (b) always allocate the frame before calling an imported function (in both op call and op call_indirect), (c) always emit the WASM_FEATURE_FRAME_PER_FUNCTION flag in the AOT file. This simplifies the AOT compilation, and there seems to be no need for aot_alloc_import_frame.

(2) in the AOT runtime, (a) always call aot_alloc_frame before calling an import function, (b) call aot_alloc_frame before calling a non-import function only if the WASM_FEATURE_FRAME_PER_FUNCTION flag isn't detected, so as to keep backward compatibility.

I think it should be simpler, please correct me if I am wrong or there is something I missed.
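The runtime half of this proposal boils down to a single predicate. The sketch below illustrates the proposal as stated in this comment, not the code that was merged; the function name and the boolean standing in for the WASM_FEATURE_FRAME_PER_FUNCTION check are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the runtime itself has to allocate a frame before
   calling func_idx. has_frame_per_function_flag stands in for "the AOT
   file carries the frame-per-function feature flag". */
bool
sketch_runtime_must_alloc_frame(uint32_t func_idx, uint32_t import_func_count,
                                bool has_frame_per_function_flag)
{
    /* (a) imports are never instrumented, so always allocate for them */
    if (func_idx < import_func_count)
        return true;

    /* (b) non-imports allocate their own frame when the flag is present;
       older AOT files without the flag still rely on the runtime */
    return !has_frame_per_function_flag;
}
```

The backward-compatibility case is the last line: an AOT file produced before the flag existed has no prologue allocation, so the runtime keeps allocating on its behalf.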

> Yes, but I think we can add a check in the compiled LLVM IR of opcode call_indirect: compare the function index with import_func_count, and call aot_alloc_frame only if func_idx < import_func_count. It isn't necessary to always call aot_alloc_frame.

I think that's possible; I thought it would add slightly more code. However, I just noticed the comparison is already generated in AOT and there are two blocks for import and non-import functions, so it shouldn't bloat the code.
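Written as plain C instead of LLVM IR, the behaviour under discussion for call_indirect in frame-per-function mode looks roughly like the following. This is a sketch with hypothetical names (`SketchEnv`, `sketch_call_indirect`); the counters exist only to make the two branches observable.

```c
#include <stdint.h>

typedef struct {
    uint32_t import_func_count;
    uint32_t frames_allocated; /* frames allocated at the call site */
    uint32_t frames_freed;     /* frames freed at the call site */
    uint32_t calls_made;
} SketchEnv;

typedef void (*SketchTarget)(SketchEnv *env);

/* Stand-in callee that only records that it ran. */
void
sketch_target(SketchEnv *env)
{
    env->calls_made++;
}

/* The index comparison selects between the import and non-import
   blocks; only the import block allocates a frame around the call. */
void
sketch_call_indirect(SketchEnv *env, uint32_t func_idx, SketchTarget target)
{
    if (func_idx < env->import_func_count) {
        env->frames_allocated++; /* caller allocates for the import */
        target(env);
        env->frames_freed++;
    }
    else {
        target(env); /* callee prologue/epilogue handles its own frame */
    }
}
```

Because the comparison already exists to pick the right call sequence, placing the allocation inside the import branch adds no extra branching.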

> And seems no need to add the flag comp_ctx->call_stack_features.frame_per_function

Yes, it could potentially be removed; however, I didn't do it (just yet) because of multi-module / dynamic linking: if we always use frame-per-function mode, the call stack will contain duplicates. So I think we should keep both modes in the compiler for now, until we figure out how to do it correctly. There are a few options that come to mind:

- explicitly expose frame-per-function as a parameter to wamrc, so users can choose the mode
- allow users to define which functions are provided by another module and which functions are imported from native or from modules without instrumentation

Either is fine with me, but I also don't think either option needs to be implemented right now; they can be added in the future without breaking the ABI. What do you think?
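One way to read the duplicate-frame concern is sketched below: in frame-per-function mode the caller allocates a frame for every import, and if that import is actually an instrumented function from another AOT module, its own prologue allocates a second frame for the same function. The model and all of its names are hypothetical and simplified to a single call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_DEPTH 16

typedef struct {
    uint32_t entries[SKETCH_MAX_DEPTH];
    size_t depth;
} SketchCallStack;

static void
sketch_push(SketchCallStack *stack, uint32_t func_id)
{
    if (stack->depth < SKETCH_MAX_DEPTH)
        stack->entries[stack->depth++] = func_id;
}

/* Counts how many frames for callee_id are live while the callee runs. */
size_t
sketch_frames_seen_for_import(uint32_t callee_id, bool callee_is_instrumented)
{
    SketchCallStack stack = { .depth = 0 };
    size_t count = 0;
    size_t i;

    /* caller side: frame allocated before the call to the import */
    sketch_push(&stack, callee_id);

    /* callee side: an instrumented module pushes again in its prologue */
    if (callee_is_instrumented)
        sketch_push(&stack, callee_id);

    for (i = 0; i < stack.depth; i++) {
        if (stack.entries[i] == callee_id)
            count++;
    }
    return count;
}
```

A native import yields one frame, while an import backed by an instrumented module yields two. Either option listed above would give the compiler the information it needs to skip one of the two allocations.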

Yes, for me the first option (expose frame-per-function as a parameter to wamrc) is better and can be implemented in the future. With the latter, it may be a little complex for developers to choose which functions to list.

Agreed, I'll make it a follow-up task for #3758.

@wenyongh I also removed the aot_free_import_frame function and moved the allocation of the frame for imported functions to the AOT code. Let me know if you have any questions.

wenyongh

LGTM with minor issues.

loganek force-pushed the loganek/tiny-frame2 branch 4 times, most recently from 36c0ed8 to 81be41a Compare September 5, 2024 16:39

loganek force-pushed the loganek/tiny-frame2 branch from 81be41a to 39fc6de Compare September 5, 2024 19:45

wenyongh reviewed Sep 6, 2024

loganek force-pushed the loganek/tiny-frame2 branch 2 times, most recently from eb4d5bb to 32f09a1 Compare September 6, 2024 09:04

loganek added 2 commits September 6, 2024 14:20

Generate code for deciding whether indirect call should allocate frame, instead of doing it in runtime

e5ae586

loganek force-pushed the loganek/tiny-frame2 branch from 32f09a1 to e5ae586 Compare September 6, 2024 13:27

wenyongh reviewed Sep 9, 2024

loganek force-pushed the loganek/tiny-frame2 branch from d168044 to 94c5df0 Compare September 9, 2024 07:35

wenyongh reviewed Sep 9, 2024

loganek force-pushed the loganek/tiny-frame2 branch from 94c5df0 to 4dae5a7 Compare September 9, 2024 09:31

wenyongh reviewed Sep 9, 2024

Address code review comments

b0dc668

loganek force-pushed the loganek/tiny-frame2 branch from 4dae5a7 to b0dc668 Compare September 9, 2024 10:48

wenyongh merged commit cbc2078 into bytecodealliance:main Sep 10, 2024
387 checks passed

loganek mentioned this pull request Sep 11, 2024

[RFC] Option to reduce code size of the generated AOT file when stack trace is enabled #3758

Open


