Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sc-executor: Increase maximum instance count #1856

Merged
merged 6 commits into from
Oct 22, 2023

Conversation

bkchr
Copy link
Member

@bkchr bkchr commented Oct 11, 2023

Changes the maximum instances count for wasmtime to 64. It also allows to only pass in maximum 32 for --max-runtime-instances as 256 was way too big. With 64 instances in total and 32 that can be configured in maximum, there should be enough space to accommodate for extra instances that are may required to be allocated adhoc.

It was reported that the maximum instance count of `32` isn't enough. By
default `wasmtime` uses an instance count of `1000`. So, it should be
safe to increase the instance count to `1000`.
@bkchr bkchr added the T0-node This PR/Issue is related to the topic “node”. label Oct 11, 2023
@bkchr bkchr requested a review from a team October 11, 2023 22:12
@bkchr bkchr requested a review from koute as a code owner October 11, 2023 22:12
@bkchr
Copy link
Member Author

bkchr commented Oct 11, 2023

Should fix: moonbeam-foundation/moonbeam#2509

@crystalin
Copy link

Thank you Basti.
That seems quite a huge jump. Should we warn about possible memory usage/limit? (I don't know how wastime handle this internally)

@bkchr
Copy link
Member Author

bkchr commented Oct 11, 2023

Hmm yeah, actually this is already failing :P I had thought that this is only about reserved memory.

@bkchr
Copy link
Member Author

bkchr commented Oct 11, 2023

Changes the value. I probably should improve this even more and just let it block when we try to allocate more than 64 instances.

Copy link
Contributor

@koute koute left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, ultimately it would be best to have a low limit (we most likely don't need to support more parallel executor instances than the machine has hardware threads; otherwise that'll just lower the overall throughput and increase the risk of running out of memory) and have it block if we have too many instances running in parallel.

@bkchr bkchr requested review from a team and koute October 15, 2023 08:12
@@ -22,7 +22,7 @@ use std::str::FromStr;
/// Parameters used to config runtime.
#[derive(Debug, Clone, Args)]
pub struct RuntimeParams {
/// The size of the instances cache for each runtime. The values higher than 256 are illegal.
/// The size of the instances cache for each runtime. The values higher than 32 are illegal.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. what about a public const for 32?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was thinking about this as well. But then we would need to put this to some random place again or import another crate. So, I decided against this for now.

Comment on lines 126 to 131
if *counter + 1 < MAX_INSTANCE_COUNT {
*counter += 1;
} else {
self.wait_for_instance.wait(&mut counter);
*counter += 1;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there a possibility for a race condition here?

Consider the following scenario:

  • Thread A acquires the lock. Finds counter == MAX_INSTANCE_COUNT. Waits in the counter condvar.
  • Some other thread (in the drop above) decrements the counter value and calls notify_one (note that we release the lock before sending the notification)
  • Before thread A is woken up another thread B grabs the lock and the counter goes back to MAX_INSTANCE_COUNT.
  • Thread A now receives the notification and acquires the lock. Increments the counter to be equal to MAX_INSTANCE_COUNT + 1

Maye this is safer?

		while *counter >= MAX_INSTANCE_COUNT {
			self.wait_for_instance.wait(&mut counter);
		}
		*counter += 1;

Or maybe release the lock in the drop after the notification is sent. But I don't know if is guaranteed that no other thread (not one in the wait queue) can grab the lock (alter the counter and release the lock) before the notified one is started.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, actually, I think you are right here. Yeah, 👍 to just have a loop here.

Copy link
Contributor

@koute koute left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test would be nice. (:

bkchr and others added 3 commits October 22, 2023 00:42
Co-authored-by: Koute <koute@users.noreply.github.com>
Co-authored-by: Koute <koute@users.noreply.github.com>
@bkchr bkchr requested a review from davxy October 22, 2023 07:48
@bkchr bkchr merged commit e2b21d0 into master Oct 22, 2023
@bkchr bkchr deleted the bkchr-wasmtime-increase-instance-count branch October 22, 2023 08:44
bgallois pushed a commit to duniter/duniter-polkadot-sdk that referenced this pull request Mar 25, 2024
Changes the maximum instances count for `wasmtime` to `64`. It also
allows to only pass in maximum `32` for `--max-runtime-instances` as
`256` was way too big. With `64` instances in total and `32` that can be
configured in maximum, there should be enough space to accommodate for
extra instances that are may required to be allocated adhoc.

---------

Co-authored-by: Koute <koute@users.noreply.github.com>
bkchr pushed a commit that referenced this pull request Apr 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T0-node This PR/Issue is related to the topic “node”.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants