This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

wasm trap: out of bounds memory access #10095

Closed
AurevoirXavier opened this issue Oct 25, 2021 · 18 comments · Fixed by #10291

@AurevoirXavier (Contributor) commented Oct 25, 2021

Proposing failed: ClientImport(
	RuntimeApiError(
		Application(
			Execution(
				Other(
					"Wasm execution trapped:
					wasm trap: out of bounds memory access
					wasm backtrace:    
						0: 0x12ab38 - <unknown>!pallet_session::historical::ProvingTrie<T>::generate_for::h3b8e607c714d13ce
						1: 0x11734b - <unknown>!<(TupleElement0,TupleElement1) as frame_support::traits::hooks::OnInitialize<BlockNumber>>::on_initialize::h8592203255e3a184
						2: 0x10d0e0 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::initialize_block::h06d8223ffc8d1d3b
						3: 0x13901c - <unknown>!Core_initialize_block"
				)
			)
		)
	)
)   

This happened on linux-gnu nightly-2021-09-01; downgrading to nightly-2021-04-22 fixed it.
But it still happened on windows-msvc nightly-2021-04-22, and was fixed there by upgrading to nightly-2021-05-01.
I'm not sure whether this is a Rust issue or a Parity WASM issue.

Other projects have also encountered this. cc @h4x3rotab

darwinia-network/darwinia-common#879
Phala-Network/phala-blockchain#542

@h4x3rotab (Contributor)

Maybe related: Phala-Network/phala-blockchain#542

@bkchr (Member) commented Oct 25, 2021

@AurevoirXavier how can we reproduce this?

@AurevoirXavier (Contributor, Author)

@AurevoirXavier how can we reproduce this?

I'm not sure what causes this, so I can't provide a minimal environment to reproduce it.
Sorry.


git clone https://github.com/darwinia-network/darwinia-common
cd darwinia-common
git checkout v2.6.7
# modify the session time to MINUTE / 2, otherwise you need to wait 10 minutes
# https://github.com/darwinia-network/darwinia-common/blob/77dde36f41/node/runtime/pangolin/src/constants/src/lib.rs#L27
cargo build --release
cargo run --release -- --chain pangolin-dev --alice --tmp

At the end of the session this error pops up, and the chain stalls.

@koute (Contributor) commented Nov 12, 2021

Found the problem. Our WASM instance fast reuse code is not clearing statics between calls.

Here's exactly what happens:

  1. TrieDBMut::new from trie-db is called.
  2. That calls HashSet::new() from hashbrown.
  3. That calls RandomState::new() from ahash.
  4. That initializes an OnceBox from once-cell with a Box<dyn Trait>.
  5. On the first invocation the OnceBox gets initialized and everything works fine.
  6. On the second invocation the WASM instance is reused so the OnceBox is still initialized, but now it points to memory that is technically uninitialized.
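
The hazard in step 6 can be sketched with a std-only Rust analogy (a minimal sketch, not the actual trie-db/ahash code; std's OnceLock stands in for once-cell's OnceBox, and the u64 "seed" is purely illustrative). In native Rust the stale value is merely surprising; in a reused WASM instance the cell holds a pointer into memory that was wiped, hence the out-of-bounds trap:

```rust
use std::sync::OnceLock;

// Stand-in for the state that ahash's RandomState caches in a static.
// (The real cell holds a Box<dyn Trait>, i.e. a pointer into linear memory.)
static SEED: OnceLock<u64> = OnceLock::new();

fn runtime_call(fresh_seed: u64) -> u64 {
    // A genuinely fresh instance would run the init closure again; with fast
    // instance reuse the static survives, so the first value sticks forever.
    *SEED.get_or_init(|| fresh_seed)
}

fn main() {
    let first = runtime_call(1);
    let second = runtime_call(2); // init closure never runs: still 1
    assert_eq!((first, second), (1, 1));
}
```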

A workaround for this issue would be to either switch to an interpreted WASM executor, or to go into substrate/client/executor/src/wasm_runtime.rs and change fast_instance_reuse: true to fast_instance_reuse: false (do not actually do the latter; that path is broken, see my later comment).

@bkchr I know that the fast reuse codepath deliberately retains some things between calls (e.g. the linear memory itself); not sure if this behavior is intended, but if it is then in my opinion this is way too dangerous to leave unfixed.

@bkchr (Member) commented Nov 12, 2021

Okay. That doesn't sound good. Fast instance reuse should explicitly reset static variables.

Cc @pepyakin

@bkchr (Member) commented Nov 12, 2021

fn returns_mutable_static(wasm_method: WasmExecutionMethod) {

This should test it

@pepyakin (Contributor)

Or returns_mutable_static_bss; that one tests the zeroed case.

@koute (Contributor) commented Nov 12, 2021

Interesting; I'll check why the tests don't fail here.

@pepyakin (Contributor)

FWIW, I filed an issue that I'd been meaning to file for some time: #10244

TL;DR: since wasmtime has evolved quite a bit, we may not need fast_instance_reuse anymore.

@koute (Contributor) commented Nov 16, 2021

Okay, it looks like the static clearing actually works, but it needs the (data $.bss ...) WASM section to be present. For the test runtime that section is properly generated; for the production runtime it isn't.

To get rustc to generate it, the -C link-arg=--import-memory argument is necessary (otherwise the linker implicitly assumes the WASM memory is always zeroed), and substrate-wasm-builder does actually pass it. The build script for the production runtime explicitly calls import_memory:

WasmBuilder::new()
    .with_current_project()
    .export_heap_base()
    .import_memory()
    .build()

and the substrate-wasm-builder does set RUSTFLAGS with that argument:

pub fn import_memory(mut self) -> Self {
    self.rust_flags.push("-C link-arg=--import-memory".into());
    self
}
// ...
    build_cmd.args(&["rustc", "--target=wasm32-unknown-unknown"])
// ...
        .env("RUSTFLAGS", rustflags.clone())

But it doesn't work. Why? We're calling substrate-wasm-builder from inside a build script, which means the host cargo exports an empty CARGO_ENCODED_RUSTFLAGS environment variable while the build script is running, and that overrides the RUSTFLAGS we set inside the build script. The cargo subprocess that builds the WASM runtime then ignores our RUSTFLAGS in favor of the empty CARGO_ENCODED_RUSTFLAGS, so -C link-arg=--import-memory is effectively not applied, so there's no BSS data segment, so the statics are not cleared when fast instance reuse is enabled.
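
The interaction can be sketched with std::process::Command (a simplified, hypothetical stand-in for substrate-wasm-builder's actual spawning code; the function name is illustrative). Cargo prefers CARGO_ENCODED_RUSTFLAGS over RUSTFLAGS, so the nested cargo must have the inherited variable removed or our RUSTFLAGS are silently ignored:

```rust
use std::ffi::OsStr;
use std::process::Command;

// Build the nested cargo invocation that compiles the WASM runtime.
fn wasm_build_command(rustflags: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(["rustc", "--target=wasm32-unknown-unknown"])
        .env("RUSTFLAGS", rustflags)
        // Without this, the empty CARGO_ENCODED_RUSTFLAGS inherited from the
        // outer cargo (set for every build script) overrides the RUSTFLAGS
        // set just above.
        .env_remove("CARGO_ENCODED_RUSTFLAGS");
    cmd
}

fn main() {
    let cmd = wasm_build_command("-C link-arg=--import-memory");
    // `get_envs` reports variables scheduled for removal as (key, None).
    let removed = cmd
        .get_envs()
        .any(|(k, v)| k == OsStr::new("CARGO_ENCODED_RUSTFLAGS") && v.is_none());
    assert!(removed); // the override can no longer mask our RUSTFLAGS
}
```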

The good news is that Basti has already inadvertently fixed the issue with the CARGO_ENCODED_RUSTFLAGS in #9637:

     // exclusive). The runner project is created in `CARGO_TARGET_DIR` and executing it will
     // create a sub target directory inside of `CARGO_TARGET_DIR`.
     .env_remove("CARGO_TARGET_DIR")
+    // As we are being called inside a build-script, this env variable is set. However, we set
+    // our own `RUSTFLAGS` and thus, we need to remove this. Otherwise cargo favors this
+    // env variable.
+    .env_remove("CARGO_ENCODED_RUSTFLAGS")
     // We don't want to call ourselves recursively
     .env(crate::SKIP_BUILD_ENV, "");

So with that PR merged the runtime is once again built with the necessary flag and is compatible with our current fast instance reuse code. (@AurevoirXavier @h4x3rotab FYI you'll probably want to at least backport this patch into your codebases and rebuild your runtimes.)

Anyway, since using the fast instance reuse codepath on any runtime built without the --import-memory flag is fundamentally broken and unsafe, we should at least modify our WASM executor so that it checks whether the runtime was properly built, and forcibly disables fast instance reuse if it wasn't. @bkchr @pepyakin Sound good? If so, I'll whip up a PR. (Yes, we might remove the fast instance reuse codepath anyway, but evaluating/benchmarking that will probably take a little longer, while checking whether the memory is imported and disabling instance reuse should be relatively simple.)
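
A hypothetical sketch of that check (the names are illustrative, not Substrate's actual executor API): fast instance reuse is only sound when the module imports its memory, because only then is there a (data $.bss ...) segment available to re-zero the statics on reset.

```rust
/// Hypothetical strategy picker; in a real executor `module_imports_memory`
/// would come from inspecting the runtime blob's import section.
fn instantiation_strategy(module_imports_memory: bool, fast_reuse_requested: bool) -> &'static str {
    if fast_reuse_requested && module_imports_memory {
        "fast-instance-reuse"
    } else {
        // Either reuse wasn't requested, or the runtime was built without
        // `--import-memory` and reusing the instance would leave stale
        // statics: fall back to a fresh instance per call.
        "recreate-instance"
    }
}

fn main() {
    assert_eq!(instantiation_strategy(true, true), "fast-instance-reuse");
    // Built without --import-memory: reuse is forcibly disabled.
    assert_eq!(instantiation_strategy(false, true), "recreate-instance");
}
```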

@bkchr (Member) commented Nov 16, 2021

The good news is that Basti has already inadvertently fixed the issue with the CARGO_ENCODED_RUSTFLAGS in #9637:

Now that you say it :D I did not make that change there inadvertently. If I remember correctly, the fast instance reuse test failed and I fixed it...

Anyway, since using the fast instance reuse codepath on any runtime built without the --import-memory flag is fundamentally broken and unsafe we should at least modify our WASM executor so that it checks whether the runtime was properly built and forcefully disable fast instance reuse if it is

Sounds good to me. We should also make import-memory the default in wasm-builder and not configurable.

@AurevoirXavier (Contributor, Author) commented Nov 16, 2021

@koute Is there any other way to pass import-memory to rustc?
Our dependencies are still behind by lots of commits.

@bkchr (Member) commented Nov 16, 2021

@AurevoirXavier you need this patch: 73acb2a

@koute (Contributor) commented Nov 16, 2021

@koute Is there any other way to pass import-memory to rustc? Our dependencies are still behind by lots of commits.

You don't have to upgrade the whole of Substrate; just apply Basti's patch (which I highlighted in my previous post: the single extra .env_remove("CARGO_ENCODED_RUSTFLAGS") in utils/wasm-builder/src/wasm_project.rs) and you should be good to go. (You can then rebuild your runtime and run wasm-dis on the .wasm runtime blob to make sure the (data $.bss ...) section is present.)

[edit]Ninja'd by Basti (:[/edit]

@AurevoirXavier (Contributor, Author)

@AurevoirXavier you need this patch: 73acb2a

Aha, yes! I have just thought about that. Thanks!

@koute (Contributor) commented Nov 18, 2021

Just a PSA: I said earlier that disabling fast instance reuse is one possible workaround for this; do not disable it. There's a bug in wasmtime (or possibly in our code; I'm still investigating) that will crash the node after 10k runtime invocations if instance reuse is disabled. Until I fix that, instance reuse is absolutely necessary when using a compiled runtime.

@pepyakin (Contributor)

10k instances? That sounds as if the pooling strategy (#10244) was already enabled... At least I vaguely remember this limit from there. Likely I am wrong

@koute (Contributor) commented Nov 18, 2021

10k instances? That sounds as if the pooling strategy (#10244) was already enabled... At least I vaguely remember this limit from there. Likely I am wrong

It's because we use the same Store when creating each new instance, and according to the wasmtime docs:

A Store is intended to be a short-lived object in a program. No form of GC is implemented at this time so once an instance is created within a Store it will not be deallocated until the Store itself is dropped. This makes Store unsuitable for creating an unbounded number of instances in it because Store will never release this memory.

So this not only results in a memory leak, but also essentially breaks the executor after 10k invocations. I'm currently in the process of fixing that; afterwards we can turn on pooling (which requires us to actually drop the Store anyway).
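
The leak can be illustrated with a std-only simulation (this is not the wasmtime API, just a toy model of the no-GC Store behavior quoted above; all names are illustrative):

```rust
/// Toy model of a wasmtime `Store`: it owns every instance's memory and never
/// frees any of it until the whole store is dropped.
struct ToyStore {
    instances: Vec<Vec<u8>>,
}

impl ToyStore {
    fn new() -> Self {
        ToyStore { instances: Vec::new() }
    }

    /// Each instantiation permanently parks one "linear memory" in the store.
    /// (Tiny allocation here; a real instance would hold far more.)
    fn instantiate(&mut self) {
        self.instances.push(vec![0u8; 16]);
    }
}

fn main() {
    // Problematic pattern: one long-lived store shared by every runtime call.
    let mut shared = ToyStore::new();
    for _ in 0..10_000 {
        shared.instantiate();
    }
    // Nothing was released along the way; all 10_000 memories are still alive.
    assert_eq!(shared.instances.len(), 10_000);

    // The fix: a short-lived store per call, dropped (and freed) afterwards.
    for _ in 0..3 {
        let mut per_call = ToyStore::new();
        per_call.instantiate();
        assert_eq!(per_call.instances.len(), 1);
    } // `per_call` is dropped here, releasing its instance
}
```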
