Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protect access to exec env #1991

Merged
merged 2 commits into from
Feb 27, 2023

Conversation

eloparco
Copy link
Contributor

@eloparco eloparco commented Feb 26, 2023

Fix concurrent access to exec env: it is accessed by the thread setting the suspend flags (to stop other threads) and threads (in the same cluster) reading it to check for termination.

Tested it by using the thread sanitizer and running WASI tests in the proposal https://github.com/WebAssembly/wasi-threads/tree/main/test/testsuite.

@eloparco eloparco force-pushed the eloparco/exec-env-lock branch from 9df7a5b to de86504 Compare February 27, 2023 02:17
core/iwasm/interpreter/wasm_interp_classic.c Outdated Show resolved Hide resolved
core/iwasm/libraries/libc-wasi/libc_wasi_wrapper.c Outdated Show resolved Hide resolved
core/iwasm/common/wasm_exec_env.c Outdated Show resolved Hide resolved
@eloparco eloparco force-pushed the eloparco/exec-env-lock branch 2 times, most recently from d44f851 to 5550b56 Compare February 27, 2023 09:42
ret = wasm_cluster_create_thread(exec_env, new_module_inst, false,
thread_start, thread_start_arg);
if (ret != 0) {
LOG_ERROR("Failed to spawn a new thread");
goto thread_spawn_fail;
}
os_mutex_unlock(&exec_env->wait_lock);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there data race here?
Had better not remove the lock, instead, add a cond_wait like pthread_create_wrapper in lib_pthread_wrapper.c:

/* Wait for the thread routine to assign the exec_env to
thread_info_node, otherwise the exec_env in the thread
info node may be NULL in the next pthread API call */
os_cond_wait(&exec_env->wait_cond, &exec_env->wait_lock);

And in the thread_start, add cond_signal like pthread_start_routine:

os_cond_signal(&parent_exec_env->wait_cond);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I removed it is that, after getting the exec_env lock there, the cluster lock gets acquired


but then an attempt to get the exec_env lock is made again at https://github.com/eloparco/wasm-micro-runtime/blob/5550b5647037a7cc9674dda03413a95e3c06cba3/core/iwasm/libraries/thread-mgr/thread_manager.c#L946 generating a warning in the sanitizer for potential deadlock.

Let me see how it can be fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the thread isn't actually started, but it is already added to the cluster's exec_env list, and another thread (seems main thread) tries to terminate it? Maybe after adding the similar logic like pthread_create_wrapper, the issue can be resolved since the main thread will wait unit the child thread actually started.
Here try adding cond_wait:

    os_mutex_lock(&exec_env->wait_lock);
    ret = wasm_cluster_create_thread(exec_env, new_module_inst, false,
                                     thread_start, thread_start_arg);
    ...
    os_cond_wait(&exec_env->wait_cond, &exec_env->wait_lock);
    os_mutex_unlock(&exec_env->wait_lock);

And in thread_start:

    os_mutex_lock(&parent_exec_env->wait_lock);
    ...
    os_cond_signal(&parent_exec_env->wait_cond, &parent_exec_env->wait_lock);
    os_mutex_unlock(&parent_exec_env->wait_lock);

Copy link
Contributor Author

@eloparco eloparco Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually only happens when it's the spawned thread that calls proc_exit.

I see that for pthreads that lock is used to wait for the update of ThreadInfoNode.

/* Wait for the thread routine to assign the exec_env to
thread_info_node, otherwise the exec_env in the thread
info node may be NULL in the next pthread API call */
os_cond_wait(&exec_env->wait_cond, &exec_env->wait_lock);

In lib_wasi_threads_wrapper.c we don't use ThreadInfoNode so I don't think the lock there is even needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, got it, thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, once this one is merged, we can merge #1985 too

@eloparco eloparco force-pushed the eloparco/exec-env-lock branch from 5550b56 to 15a2826 Compare February 27, 2023 10:15
@eloparco eloparco force-pushed the eloparco/exec-env-lock branch from 15a2826 to 3ff2609 Compare February 27, 2023 11:12
Copy link
Contributor

@wenyongh wenyongh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wenyongh wenyongh merged commit 52e26e5 into bytecodealliance:main Feb 27, 2023
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 14, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 14, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 14, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 14, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 14, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 14, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 14, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 17, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 17, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 17, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 17, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 17, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 17, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 17, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 17, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 17, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 17, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 17, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 17, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 17, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 17, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 17, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 17, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 17, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 17, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
loganek added a commit to loganek/wasm-micro-runtime that referenced this pull request Jul 20, 2023
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
wenyongh pushed a commit that referenced this pull request Jul 21, 2023
We have observed a significant performance degradation after merging
#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
victoryang00 pushed a commit to victoryang00/wamr-aot-gc-checkpoint-restore that referenced this pull request May 27, 2024
…iance#1991)

Data race may occur when accessing exec_env's fields, e.g. suspend_flags
and handle. Add lock `exec_env->wait_lock` for them to resolve the issue.
victoryang00 pushed a commit to victoryang00/wamr-aot-gc-checkpoint-restore that referenced this pull request May 27, 2024
We have observed a significant performance degradation after merging
bytecodealliance#1991
Instead of protecting suspend flags with a mutex, we implement the flags
as atomic variable and only use mutex when atomics are not available
on a given platform.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants