Refine AOT/JIT code call wasm-c-api import process #2982

Merged

@wenyongh wenyongh commented Jan 5, 2024

Allow invoking the quick call entry wasm_runtime_quick_invoke_c_api_import to
call the wasm-c-api import functions, which speeds up the calling process by
reducing data copying.

Use wamrc --invoke-c-api-import to generate the optimized AOT code, and set
jit_options->quick_invoke_c_api_import to true in wasm_engine_new when LLVM JIT
is enabled.

@@ -417,6 +417,7 @@ struct wasm_ref_t;

typedef struct wasm_val_t {
wasm_valkind_t kind;
uint8_t __paddings[7];

Collaborator:

I guess this manual padding warrants a comment.
Is it for platforms with looser alignment for 64-bit types?

Contributor, Author:

Yes, the layout on 32-bit targets may differ from the layout on 64-bit targets, while the AOT compiler uses a fixed layout and expects the two to be the same.
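To illustrate the layout concern, here is a minimal sketch (not code from this PR). `demo_val_t` and `demo_val_unpadded_t` are hypothetical mirrors of `wasm_val_t` with a reduced set of union members; the real definition lives in the wasm-c-api header. The assumption is that the explicit 7-byte padding pins the union at offset 8 regardless of how the target ABI aligns 64-bit types.

```c
#include <assert.h> /* static_assert (C11) */
#include <stddef.h> /* offsetof */
#include <stdint.h>

/* Hypothetical mirror of wasm_val_t, for illustration only. */
typedef struct demo_val_t {
    uint8_t kind;        /* stands in for wasm_valkind_t */
    uint8_t paddings[7]; /* explicit padding, as in the diff above */
    union {
        int32_t i32;
        int64_t i64;
        float f32;
        double f64;
        void *ref;
    } of;
} demo_val_t;

/* Same fields without the explicit padding: the union offset now
 * depends on the ABI's alignment of 64-bit types (for example 4 on
 * i386 System V, 8 on x86-64). */
typedef struct demo_val_unpadded_t {
    uint8_t kind;
    union {
        int32_t i32;
        int64_t i64;
        float f32;
        double f64;
        void *ref;
    } of;
} demo_val_unpadded_t;

/* With explicit padding, the layout can be checked at compile time. */
static_assert(offsetof(demo_val_t, of) == 8, "union must start at offset 8");
static_assert(sizeof(demo_val_t) == 16, "value must occupy 16 bytes");

static size_t
demo_padded_offset(void)
{
    return offsetof(demo_val_t, of);
}

static size_t
demo_unpadded_offset(void)
{
    return offsetof(demo_val_unpadded_t, of);
}
```

A code generator that hard-codes "kind at byte 0, value at byte 8" can rely on `demo_padded_offset()`, whereas `demo_unpadded_offset()` may differ between 32-bit and 64-bit builds.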

yamt commented Jan 5, 2024

> Allow invoking the quick call entry wasm_runtime_quick_invoke_c_api_import to call the wasm-c-api import functions, which speeds up the calling process by reducing data copying.

i feel we already have too many function calling mechanisms.
do you have some benchmark numbers to show it's worth to add another?

wenyongh commented Jan 8, 2024

> > Allow invoking the quick call entry wasm_runtime_quick_invoke_c_api_import to call the wasm-c-api import functions, which speeds up the calling process by reducing data copying.
>
> i feel we already have too many function calling mechanisms. do you have some benchmark numbers to show it's worth to add another?

Yes, there are several calling mechanisms now, such as AOT calling AOT, AOT calling host, host calling AOT, and so on. For calls between AOT and host, there are mainly three calling conventions: (1) host calls AOT; (2) AOT calls host native APIs, whose convention is the same as that of an AOT function and which can be registered with wasm_runtime_register_natives; (3) AOT calls host wasm-c-api APIs, whose convention is defined by wasm-c-api and which can be registered with wasm_instance_new(.., const wasm_extern_vec_t *imports).

In some scenarios, e.g. Envoy, calls between the host and AOT/JIT code are very frequent, so refining these calling processes matters because it has a large impact on performance there.

For (2), developers can currently use wamrc --native-lib=xxx.so to register empty native APIs with the same signatures as those registered when iwasm runs, which speeds up calls from AOT to host. For (1), I found a way to refine it and submitted PR #2978 to register some built-in quick entries. I think we can extend that mechanism soon to let developers register their own quick entries, for example wasm_runtime_register_quick_aot_entries.

For (3), I submitted this PR. In the original implementation, the calling process is AOT code -> aot_invoke_native -> wasm_runtime_invoke_c_api_native -> c-api import, and there may be a lot of memory copying. In the new implementation, the calling process is AOT code -> wasm_runtime_quick_invoke_c_api_native -> c-api import: the wasm_val_t *params are prepared in the AOT code and consumed directly by the c-api import function, which improves performance.
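The difference between the two paths for (3) can be sketched roughly as follows. This is a simplified, self-contained illustration and not the WAMR implementation: every `sketch_*` name is hypothetical, the value type is a reduced stand-in for `wasm_val_t`, and only i32 values are handled. It assumes the original path passes raw 32-bit cells that the runtime converts into values, while the quick path receives values already built by the caller.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_PARAMS 4

/* Hypothetical stand-in for wasm_val_t (kind 0 means i32 here). */
typedef struct sketch_val_t {
    uint8_t kind;
    uint8_t paddings[7];
    union {
        int32_t i32;
        int64_t i64;
    } of;
} sketch_val_t;

/* Hypothetical stand-in for a wasm-c-api style import callback. */
typedef void (*sketch_import_t)(const sketch_val_t *params,
                                uint32_t param_count, sketch_val_t *results,
                                uint32_t result_count);

/* Example import: adds two i32 parameters. */
static void
sketch_add(const sketch_val_t *params, uint32_t param_count,
           sketch_val_t *results, uint32_t result_count)
{
    assert(param_count == 2 && result_count == 1);
    results[0].kind = 0;
    results[0].of.i32 = params[0].of.i32 + params[1].of.i32;
}

/* Original-style path: arguments arrive as raw cells and are copied
 * into values before the call, then the result is copied back. */
static void
sketch_invoke_via_cells(sketch_import_t fn, uint32_t *argv, uint32_t argc)
{
    sketch_val_t params[SKETCH_MAX_PARAMS];
    sketch_val_t result;
    uint32_t i;

    assert(argc <= SKETCH_MAX_PARAMS);
    for (i = 0; i < argc; i++) { /* copy: cells -> values */
        params[i].kind = 0;
        params[i].of.i32 = (int32_t)argv[i];
    }
    fn(params, argc, &result, 1);
    argv[0] = (uint32_t)result.of.i32; /* copy: value -> cell */
}

/* Quick-style path: the caller already built the values, so the
 * helper only forwards them to the import. */
static void
sketch_quick_invoke(sketch_import_t fn, sketch_val_t *params,
                    uint32_t param_count, sketch_val_t *results,
                    uint32_t result_count)
{
    fn(params, param_count, results, result_count);
}

/* What generated code would do on each path, written by hand. */
static int32_t
sketch_call_slow(int32_t a, int32_t b)
{
    uint32_t argv[2] = { (uint32_t)a, (uint32_t)b };
    sketch_invoke_via_cells(sketch_add, argv, 2);
    return (int32_t)argv[0];
}

static int32_t
sketch_call_quick(int32_t a, int32_t b)
{
    sketch_val_t params[2] = { { 0 }, { 0 } };
    sketch_val_t result = { 0 };

    params[0].of.i32 = a;
    params[1].of.i32 = b;
    sketch_quick_invoke(sketch_add, params, 2, &result, 1);
    return result.of.i32;
}
```

Both `sketch_call_slow(2, 3)` and `sketch_call_quick(2, 3)` return 5; the quick variant simply skips the intermediate conversion loop.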

I tested calls from AOT code to four empty c-api import functions taking 1, 2, 3, and 4 arguments respectively, so the execution time is dominated by the calling process itself. The sample is attached:
c-api-imports-test.zip
And here is the test result:
image

The execution time after optimization is about 23% to 24% of the time without it, i.e. the calling process is roughly 4x faster in this test.
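For readers who want to reproduce this kind of measurement, a call-overhead micro-benchmark can be sketched as below. This is not the content of c-api-imports-test.zip; all `bench_*` names are hypothetical, the value type is a stand-in for `wasm_val_t`, and the near-empty callee only bumps a counter so the calls stay observable.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for wasm_val_t. */
typedef struct bench_val_t {
    uint8_t kind;
    uint8_t paddings[7];
    union {
        int32_t i32;
        int64_t i64;
    } of;
} bench_val_t;

typedef void (*bench_import_t)(const bench_val_t *params,
                               uint32_t param_count);

/* Counts calls so the test can confirm the loop really ran. */
static uint64_t bench_calls_seen;

/* Near-empty import: the measured time is mostly call overhead. */
static void
bench_empty_import(const bench_val_t *params, uint32_t param_count)
{
    (void)params;
    (void)param_count;
    bench_calls_seen++;
}

/* Calls fn `iterations` times and returns the CPU time in seconds. */
static double
bench_time_calls(bench_import_t fn, uint32_t param_count,
                 uint64_t iterations)
{
    bench_val_t params[4] = { { 0 } };
    /* volatile pointer: keeps the compiler from inlining the callee */
    volatile bench_import_t target = fn;
    clock_t start, end;
    uint64_t i;

    assert(param_count <= 4);
    start = clock();
    for (i = 0; i < iterations; i++)
        target(params, param_count);
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Ratio of optimized to baseline time, e.g. 0.23 means 23%. */
static double
bench_ratio(double optimized_seconds, double baseline_seconds)
{
    return optimized_seconds / baseline_seconds;
}
```

Timing the same loop once per calling path and passing both results to `bench_ratio` gives a percentage comparable to the one quoted above.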

My suggestion is to disable it by default, add a new document describing these optimization opportunities (registering quick AOT entries, wamrc --native-lib=.., and wamrc --invoke-c-api-import), and add it as a new item in perf_tune.doc. What do you think? Thanks.

@wenyongh wenyongh marked this pull request as draft January 9, 2024 09:03
Merge bytecodealliance:main into wenyongh:dev/quick_invoke_c_api_import
@wenyongh wenyongh marked this pull request as ready for review January 10, 2024 08:46
@wenyongh

Merging this PR, as it may significantly improve the calling process and benefit some scenarios, e.g. Envoy.

@wenyongh wenyongh merged commit b21f17d into bytecodealliance:main Jan 10, 2024
393 checks passed
@wenyongh wenyongh deleted the dev/quick_invoke_c_api_import branch January 11, 2024 06:31
victoryang00 pushed a commit to victoryang00/wamr-aot-gc-checkpoint-restore that referenced this pull request May 27, 2024
…2982)

Allow to invoke the quick call entry wasm_runtime_quick_invoke_c_api_import to
call the wasm-c-api import functions to speedup the calling process, which reduces
the data copying.

Use `wamrc --invoke-c-api-import` to generate the optimized AOT code, and set
`jit_options->quick_invoke_c_api_import` true in wasm_engine_new when LLVM JIT
is enabled.