Refine AOT/JIT code call wasm-c-api import process #2982

wenyongh · 2024-01-05T09:12:41Z

Allow AOT/JIT code to invoke the quick call entry
wasm_runtime_quick_invoke_c_api_import to call wasm-c-api import functions.
This speeds up the calling process by reducing data copying.

Use wamrc --invoke-c-api-import to generate the optimized AOT code, and set
jit_options->quick_invoke_c_api_import to true in wasm_engine_new when LLVM JIT
is enabled.

yamt · 2024-01-05T12:03:02Z

core/iwasm/include/wasm_c_api.h

@@ -417,6 +417,7 @@ struct wasm_ref_t;

 typedef struct wasm_val_t {
  wasm_valkind_t kind;
+  uint8_t __paddings[7];


I guess this manual padding warrants a comment.
Is it for platforms with a loose alignment for 64-bit types?

wenyongh · 2024-01-09

Yes, the layout on 32-bit platforms may differ from the layout on 64-bit platforms, while the AOT compiler uses a fixed layout and expects them to be the same.
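To illustrate the layout concern, here is a minimal sketch in C. The demo_* types are hypothetical stand-ins modeled on the diff above, not the real wasm_val_t, and the field list of the union is an assumption. With the explicit 7-byte padding after the one-byte kind tag, the payload union starts at byte 8 whether the platform aligns 64-bit types to 4 or to 8 bytes; without it, the offset depends on the platform ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for wasm_val_t; demo_* names are not WAMR identifiers. */
typedef struct demo_val_t {
    uint8_t kind;        /* one-byte kind tag */
    uint8_t paddings[7]; /* explicit padding: payload always starts at byte 8 */
    union {
        int32_t i32;
        int64_t i64;
        float f32;
        double f64;
        void *ref;
    } of;
} demo_val_t;

/* The same struct without explicit padding, for comparison only. */
typedef struct demo_val_unpadded_t {
    uint8_t kind;
    union {
        int32_t i32;
        int64_t i64;
        float f32;
        double f64;
        void *ref;
    } of;
} demo_val_unpadded_t;

/* Offset of the payload when the padding is spelled out. */
size_t demo_padded_payload_offset(void)
{
    return offsetof(demo_val_t, of);
}

/* Offset of the payload when the compiler chooses the padding:
 * 4 where 64-bit types are 4-byte aligned in structs, 8 elsewhere. */
size_t demo_unpadded_payload_offset(void)
{
    return offsetof(demo_val_unpadded_t, of);
}
```

A code generator that hard-codes "payload at offset 8" only agrees with the C compiler on every target in the padded case.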

yamt · 2024-01-05T12:15:49Z

Allow to invoke the quick call entry wasm_runtime_quick_invoke_c_api_import to call the wasm-c-api import functions to speed up the calling process, which reduces the data copying.

I feel we already have too many function-calling mechanisms.
Do you have any benchmark numbers to show it's worth adding another?

wenyongh · 2024-01-08T03:15:04Z

Allow to invoke the quick call entry wasm_runtime_quick_invoke_c_api_import to call the wasm-c-api import functions to speed up the calling process, which reduces the data copying.

I feel we already have too many function-calling mechanisms. Do you have any benchmark numbers to show it's worth adding another?

Yes, there are several calling mechanisms now, such as AOT calling AOT, AOT calling the host, the host calling AOT, and so on. For calls between AOT and the host, there are mainly three calling conventions: (1) the host calls AOT; (2) AOT calls host native APIs, whose convention is the same as that of an AOT function and which can be registered with wasm_runtime_register_natives; (3) AOT calls host wasm-c-api APIs, whose convention is defined by wasm-c-api and which can be registered with wasm_instance_new(.., const wasm_extern_vec_t *imports).
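To make conventions (2) and (3) concrete, here is a minimal sketch in C with stand-in types. All demo_* names are hypothetical and are not WAMR or wasm-c-api identifiers; the real types and registration calls live in the WAMR and wasm-c-api headers. The sketch assumes that a native API receives an execution environment followed by plain C arguments, while a wasm-c-api callback receives vectors of tagged values and returns a trap pointer (NULL when there is no trap).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the runtime's types. */
typedef struct demo_exec_env demo_exec_env_t; /* opaque execution environment */

enum { DEMO_I32 = 0 };

typedef struct demo_val {
    uint8_t kind;
    uint8_t paddings[7];
    union {
        int32_t i32;
        int64_t i64;
    } of;
} demo_val_t;

typedef struct demo_val_vec {
    size_t size;
    demo_val_t *data;
} demo_val_vec_t;

/* Convention (2), native API style: the wasm arguments arrive as plain C
 * arguments after the exec env, and the result is the C return value. */
static int32_t demo_native_add(demo_exec_env_t *exec_env, int32_t a, int32_t b)
{
    (void)exec_env;
    return a + b;
}

/* Convention (3), wasm-c-api style: arguments and results travel in
 * vectors of tagged values; returning NULL means "no trap". */
static void *demo_c_api_add(const demo_val_vec_t *args, demo_val_vec_t *results)
{
    results->data[0].kind = DEMO_I32;
    results->data[0].of.i32 = args->data[0].of.i32 + args->data[1].of.i32;
    return NULL;
}

/* Calling the native-style function needs no marshalling. */
int32_t demo_call_native(int32_t a, int32_t b)
{
    return demo_native_add(NULL, a, b);
}

/* Calling the c-api-style function requires building tagged values first. */
int32_t demo_call_c_api(int32_t a, int32_t b)
{
    demo_val_t in[2] = { { .kind = DEMO_I32, .of.i32 = a },
                         { .kind = DEMO_I32, .of.i32 = b } };
    demo_val_t out[1] = { { .kind = DEMO_I32 } };
    demo_val_vec_t args = { 2, in };
    demo_val_vec_t results = { 1, out };

    if (demo_c_api_add(&args, &results) != NULL)
        return 0; /* a trap occurred; this sketch just returns 0 */
    return out[0].of.i32;
}
```

The extra step in demo_call_c_api, packing arguments into tagged values, is the part of convention (3) that someone has to pay for on every call.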

Since there are scenarios with frequent calls between the host and AOT/JIT code, e.g. Envoy, refining these calling processes becomes important, as it affects performance a lot in such scenarios.

Currently for (2), developers can use wamrc --native-lib=xxx.so to register empty native APIs with the same signatures as those registered when iwasm runs, so as to speed up the calling process from AOT to the host.

For (1), I found a way to refine it and submitted PR #2978 to register some built-in quick entries. I think we can extend the mechanism soon to allow developers to register their own quick entries, for example through wasm_runtime_register_quick_aot_entries.

For (3), I submitted this PR. In the original implementation, the calling process is AOT code -> aot_invoke_native -> wasm_runtime_invoke_c_api_native -> c-api import, and there may be lots of memory copies. In the new implementation, the calling process is AOT code -> wasm_runtime_quick_invoke_c_api_native -> c-api import; the wasm_val_t *params are prepared in the AOT code and consumed directly by the c-api import function, which improves performance.
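The difference between the two paths for (3) can be sketched with a simplified model in C. This is not WAMR's implementation: all demo_* names are hypothetical, the copy counter exists only for illustration, and placing the result slot right after the parameters in the same array is an assumption of this model. The indirect path receives packed 32-bit cells and converts them to tagged values before the call; the quick path receives tagged values that the caller has already prepared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { DEMO_I32 = 0, DEMO_MAX_ARGS = 8 };

typedef struct demo_val {
    uint8_t kind;
    uint8_t paddings[7];
    union {
        int32_t i32;
        int64_t i64;
    } of;
} demo_val_t;

typedef void (*demo_import_fn)(const demo_val_t *params, uint32_t param_count,
                               demo_val_t *results, uint32_t result_count);

/* Counts value copies made by the trampoline (illustration only). */
unsigned demo_copy_count;

/* A toy import: sums its i32 parameters into the first result slot. */
void demo_import_add(const demo_val_t *params, uint32_t param_count,
                     demo_val_t *results, uint32_t result_count)
{
    int32_t sum = 0;
    for (uint32_t i = 0; i < param_count; i++)
        sum += params[i].of.i32;
    if (result_count > 0) {
        results[0].kind = DEMO_I32;
        results[0].of.i32 = sum;
    }
}

/* Indirect path: convert packed cells to tagged values, call, copy back. */
int32_t demo_invoke_indirect(demo_import_fn fn, const uint32_t *argv,
                             uint32_t argc)
{
    demo_val_t params[DEMO_MAX_ARGS] = { 0 };
    demo_val_t result = { 0 };

    if (argc > DEMO_MAX_ARGS)
        return 0;
    for (uint32_t i = 0; i < argc; i++) {
        params[i].kind = DEMO_I32;
        params[i].of.i32 = (int32_t)argv[i];
        demo_copy_count++; /* one copy per parameter */
    }
    fn(params, argc, &result, 1);
    demo_copy_count++; /* result copied back to the caller */
    return result.of.i32;
}

/* Quick path: the caller already holds tagged values; no conversion. */
int32_t demo_invoke_quick(demo_import_fn fn, demo_val_t *vals,
                          uint32_t param_count)
{
    fn(vals, param_count, vals + param_count, 1);
    return vals[param_count].of.i32;
}

int32_t demo_run_indirect(void)
{
    uint32_t argv[3] = { 1, 2, 3 };
    demo_copy_count = 0;
    return demo_invoke_indirect(demo_import_add, argv, 3);
}

int32_t demo_run_quick(void)
{
    demo_val_t vals[4] = { 0 }; /* three parameters plus one result slot */
    for (uint32_t i = 0; i < 3; i++) {
        vals[i].kind = DEMO_I32;
        vals[i].of.i32 = (int32_t)(i + 1);
    }
    demo_copy_count = 0;
    return demo_invoke_quick(demo_import_add, vals, 3);
}
```

In this model the indirect path performs one copy per parameter plus one for the result, while the quick path performs none, because the values are written once in the layout the import consumes.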

I tested calls from AOT code to four empty c-api import functions with 1, 2, 3, and 4 arguments respectively, so the execution time is mostly the cost of the calling process. The sample is uploaded:
c-api-imports-test.zip
And here is the test result:

We can see that the execution time after optimization is about 23% to 24% of that without optimization, which is a large improvement.

My suggestion is that we disable it by default, add a new document describing these optimization opportunities (registering quick AOT entries, wamrc --native-lib=.., and wamrc --invoke-c-api-import), and add it as a new item in perf_tune.doc. What do you think? Thanks.

wenyongh · 2024-01-10T10:36:45Z

Merging this PR, as it may speed up the calling process a lot and may benefit some scenarios, e.g. Envoy.

…2982) Allow to invoke the quick call entry wasm_runtime_quick_invoke_c_api_import to call the wasm-c-api import functions to speedup the calling process, which reduces the data copying. Use `wamrc --invoke-c-api-import` to generate the optimized AOT code, and set `jit_options->quick_invoke_c_api_import` true in wasm_engine_new when LLVM JIT is enabled.

wenyongh added 3 commits January 5, 2024 11:48

Refine JIT code call wasm-c-api import

e97ec98

quick call c-api-import from aot/jit code

790cc75

add argument result_count and refine code

6910a0e

yamt reviewed Jan 5, 2024

wenyongh marked this pull request as draft January 9, 2024 09:03

Merge pull request #868 from bytecodealliance/main

da9d79a

Merge bytecodealliance:main into wenyongh:dev/quick_invoke_c_api_import

wenyongh marked this pull request as ready for review January 10, 2024 08:46

minor change

0aa3d97

wenyongh merged commit b21f17d into bytecodealliance:main Jan 10, 2024
393 checks passed

wenyongh deleted the dev/quick_invoke_c_api_import branch January 11, 2024 06:31
