feat(runtime): upgrade to Wasmer 0.17 and nightly 2020-05-15 #2668

Merged: olonho merged 1 commit into master from wasmer-0.17 on May 21, 2020

@olonho (Contributor) commented May 18, 2020

Used https://crates.io/crates/rust-latest to find the latest suitable nightly.

Fixes #2055

Test plan

cargo test --all

@lexfrl (Contributor) left a comment

LGTM

@bowenwang1996 (Collaborator) left a comment

This potentially changes the protocol. @olonho If you are certain that it doesn't change anything at the protocol level, i.e., a contract compiled with either of the two Wasmer versions will produce exactly the same output, then there is no need to do anything. Otherwise, please update the protocol version.

@lexfrl (Contributor) commented May 18, 2020

Yes, it actually changes the protocol, since this is not implemented yet.

The protocol version is defined here: https://github.com/nearprotocol/nearcore/blob/0456008492e57612e44b994aa75519a03606874c/core/chain-configs/src/lib.rs#L10

@olonho (Contributor Author) commented May 18, 2020

@bowenwang1996 @frol For me, `cargo test -p neard --test rpc_nodes` fails with a strange problem.

@olonho (Contributor Author) commented May 18, 2020

And before that

```
May 18 18:07:40.616  INFO actix_server::builder: Starting "actix-web-service-0.0.0.0:63411" service on 0.0.0.0:63411
May 18 18:07:40.617  INFO stats: Server listening at ed25519:82M8LNM7AzJHhHKn6hymVW1jBzSwFukHp1dycVcU7MD@0.0.0.0:63404
May 18 18:07:40.618  INFO stats: Server listening at ed25519:CTVkQMjLyr4QzoXrTDVzfCUp95sCJPwLJZ34JTiekxMV@0.0.0.0:63406
May 18 18:07:40.619 ERROR actix_server::signals: Can not initialize stream handler for Hup err: Too many open files (os error 24)
May 18 18:07:40.619 ERROR actix_server::signals: Can not initialize stream handler for Term err: Too many open files (os error 24)
May 18 18:07:40.619 ERROR actix_server::signals: Can not initialize stream handler for Quit err: Too many open files (os error 24)
May 18 18:07:40.620 ERROR actix_server::signals: Can not initialize stream handler for Int err: Too many open files (os error 24)
May 18 18:07:40.620 ERROR actix_server::signals: Can not initialize stream handler for Hup err: Too many open files (os error 24)
May 18 18:07:40.620 ERROR actix_server::signals: Can not initialize stream handler for Term err: Too many open files (os error 24)
May 18 18:07:40.620 ERROR actix_server::signals: Can not initialize stream handler for Quit err: Too many open files (os error 24)
test test_rpc_routing ... FAILED
```

@olonho (Contributor Author) commented May 18, 2020

```
(lldb) bt
error: need to add support for DW_TAG_base_type '()' encoded with DW_ATE = 0x7, bit_size = 0
* thread #2, name = 'test_rpc_routing', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
  * frame #0: 0x000000010271aab7 rpc_nodes-6c83cfbda1bcd90b`std::panicking::rust_panic_with_hook::hf8b9378dd2e7986a at panicking.rs:0 [opt]
    frame #1: 0x000000010271a732 rpc_nodes-6c83cfbda1bcd90b`rust_begin_unwind at panicking.rs:385:5 [opt]
    frame #2: 0x000000010276e8bf rpc_nodes-6c83cfbda1bcd90b`core::panicking::panic_fmt::hab6ef1464e9720aa at panicking.rs:89:14 [opt]
    frame #3: 0x000000010276e7c5 rpc_nodes-6c83cfbda1bcd90b`core::option::expect_none_failed::h4732b42c308e77b2 at option.rs:1272:5 [opt]
    frame #4: 0x000000010270dd70 rpc_nodes-6c83cfbda1bcd90b`std::io::stdio::set_panic::h4b7f8e824a63b1ff [inlined] core::result::Result$LT$T$C$E$GT$::expect::h8122e5d5a4c91422 at result.rs:963:23 [opt]
    frame #5: 0x000000010270dd4d rpc_nodes-6c83cfbda1bcd90b`std::io::stdio::set_panic::h4b7f8e824a63b1ff [inlined] std::thread::local::LocalKey$LT$T$GT$::with::h9ce19569fd942870 at local.rs:239 [opt]
    frame #6: 0x000000010270dd08 rpc_nodes-6c83cfbda1bcd90b`std::io::stdio::set_panic::h4b7f8e824a63b1ff at stdio.rs:815 [opt]
    frame #7: 0x000000010271a24c rpc_nodes-6c83cfbda1bcd90b`std::panicking::default_hook::hc8a20a2d2e3a3021 at panicking.rs:212:30 [opt]
    frame #8: 0x00000001008b2b77 rpc_nodes-6c83cfbda1bcd90b`_$LT$alloc..boxed..Box$LT$F$GT$$u20$as$u20$core..ops..function..Fn$LT$A$GT$$GT$::call::hd169e4acfddafc58(self=0x00000001077047d0, args=(&core::panic::PanicInfo) @ 0x000070000ec95190) at boxed.rs:1048:9
    frame #9: 0x00000001008b2f50 rpc_nodes-6c83cfbda1bcd90b`near_actix_utils::init_stop_on_panic::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h4ef86e3f8402fc99(info=0x000070000ec951f8) at lib.rs:10:13
    frame #10: 0x000000010271ab86 rpc_nodes-6c83cfbda1bcd90b`std::panicking::rust_panic_with_hook::hf8b9378dd2e7986a at panicking.rs:481:17 [opt]
    frame #11: 0x000000010271a732 rpc_nodes-6c83cfbda1bcd90b`rust_begin_unwind at panicking.rs:385:5 [opt]
    frame #12: 0x000000010276e8bf rpc_nodes-6c83cfbda1bcd90b`core::panicking::panic_fmt::hab6ef1464e9720aa at panicking.rs:89:14 [opt]
    frame #13: 0x000000010276e7c5 rpc_nodes-6c83cfbda1bcd90b`core::option::expect_none_failed::h4732b42c308e77b2 at option.rs:1272:5 [opt]
    frame #14: 0x000000010270dd70 rpc_nodes-6c83cfbda1bcd90b`std::io::stdio::set_panic::h4b7f8e824a63b1ff [inlined] core::result::Result$LT$T$C$E$GT$::expect::h8122e5d5a4c91422 at result.rs:963:23 [opt]
    frame #15: 0x000000010270dd4d rpc_nodes-6c83cfbda1bcd90b`std::io::stdio::set_panic::h4b7f8e824a63b1ff [inlined] std::thread::local::LocalKey$LT$T$GT$::with::h9ce19569fd942870 at local.rs:239 [opt]
    frame #16: 0x000000010270dd08 rpc_nodes-6c83cfbda1bcd90b`std::io::stdio::set_panic::h4b7f8e824a63b1ff at stdio.rs:815 [opt]
    frame #17: 0x000000010271a24c rpc_nodes-6c83cfbda1bcd90b`std::panicking::default_hook::hc8a20a2d2e3a3021 at panicking.rs:212:30 [opt]
    frame #18: 0x00000001008b2b77 rpc_nodes-6c83cfbda1bcd90b`_$LT$alloc..boxed..Box$LT$F$GT$$u20$as$u20$core..ops..function..Fn$LT$A$GT$$GT$::call::hd169e4acfddafc58(self=0x00000001077047d0, args=(&core::panic::PanicInfo) @ 0x000070000ec95470) at boxed.rs:1048:9
    frame #19: 0x00000001008b2f50 rpc_nodes-6c83cfbda1bcd90b`near_actix_utils::init_stop_on_panic::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h4ef86e3f8402fc99(info=0x000070000ec954d8) at lib.rs:10:13
    frame #20: 0x000000010271ab86 rpc_nodes-6c83cfbda1bcd90b`std::panicking::rust_panic_with_hook::hf8b9378dd2e7986a at panicking.rs:481:17 [opt]
    frame #21: 0x000000010271a732 rpc_nodes-6c83cfbda1bcd90b`rust_begin_unwind at panicking.rs:385:5 [opt]
    frame #22: 0x000000010276e8bf rpc_nodes-6c83cfbda1bcd90b`core::panicking::panic_fmt::hab6ef1464e9720aa at panicking.rs:89:14 [opt]
    frame #23: 0x000000010276e74a rpc_nodes-6c83cfbda1bcd90b`core::option::expect_failed::h68bd601d867bb8d1 at option.rs:1264:5 [opt]
    frame #24: 0x000000010268eeb9 rpc_nodes-6c83cfbda1bcd90b`core::option::Option$LT$T$GT$::expect::hb23010c09b1d3c1a(self=Option<tokio::time::driver::handle::Handle> @ 0x000070000ec95630, msg=(data_ptr = "there is no timer running, must be called from the context of Tokio runtime/Users/igotti/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.18/src/time/driver/handle.rs", length = 75)) at option.rs:349:21
    frame #25: 0x000000010264ded0 rpc_nodes-6c83cfbda1bcd90b`tokio::time::driver::handle::Handle::current::hab789195a5356fa4 at handle.rs:24:9
    frame #26: 0x000000010263ceb3 rpc_nodes-6c83cfbda1bcd90b`tokio::time::driver::registration::Registration::new::h661a732d876660f1(deadline=Instant @ 0x000070000ec956c8, duration=(secs = 0, nanos = 0)) at registration.rs:18:22
    frame #27: 0x0000000102678a73 rpc_nodes-6c83cfbda1bcd90b`tokio::time::delay::delay_until::h4828abc8fd1372ea(deadline=Instant @ 0x000070000ec95730) at delay.rs:19:24
    frame #28: 0x0000000102678acc rpc_nodes-6c83cfbda1bcd90b`tokio::time::delay::delay_for::h32c7aada38a30074(duration=(secs = 0, nanos = 10000000)) at delay.rs:37:5
    frame #29: 0x0000000100c3246a rpc_nodes-6c83cfbda1bcd90b`actix::utils::TimerFunc$LT$A$GT$::new::h15963d8af16733a9(timeout=(secs = 0, nanos = 10000000), f=closure-0 @ 0x000070000ec957f8) at utils.rs:102:22
    frame #30: 0x0000000100c3d2a3 rpc_nodes-6c83cfbda1bcd90b`actix::actor::AsyncContext::run_later::h584b57e1699290b0(self=0x0000000109191400, dur=(secs = 0, nanos = 10000000), f=closure-0 @ 0x000070000ec95868) at actor.rs:453:20
    frame #31: 0x0000000100c28719 rpc_nodes-6c83cfbda1bcd90b`near_client::client_actor::ClientActor::start_sync::hfe47ca401ecceee7(self=0x00000001091914c0, ctx=0x0000000109191400) at client_actor.rs:898:13
    frame #32: 0x0000000100c1f32d rpc_nodes-6c83cfbda1bcd90b`_$LT$near_client..client_actor..ClientActor$u20$as$u20$actix..actor..Actor$GT$::started::h803a71af14b15ced(self=0x00000001091914c0, ctx=0x0000000109191400) at client_actor.rs:153:9
    frame #33: 0x0000000100c43fcc rpc_nodes-6c83cfbda1bcd90b`_$LT$actix..contextimpl..ContextFut$LT$A$C$C$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h75c76049db077a67(self=Pin<&mut actix::contextimpl::ContextFut<near_client::client_actor::ClientActor, actix::context::Context<near_client::client_actor::ClientActor>>> @ 0x000070000ec95b08, cx=0x000070000ec95bc8) at contextimpl.rs:348:13
    frame #34: 0x0000000100bcbc18 rpc_nodes-6c83cfbda1bcd90b`_$LT$actix..contextimpl..ContextFut$LT$A$C$C$GT$$u20$as$u20$core..ops..drop..Drop$GT$::drop::h89da126a54a20b16(self=0x0000000109191400) at contextimpl.rs:228:21
    frame #35: 0x0000000100bc1a55 rpc_nodes-6c83cfbda1bcd90b`core::ptr::drop_in_place::h49aa5f04a01c996d((null)=0x0000000109191400) at mod.rs:178:1
    frame #36: 0x00000001026ff41f rpc_nodes-6c83cfbda1bcd90b`core::ptr::drop_in_place::h3adeb01366ba9429((null)=0x000000010a5f7600) at mod.rs:178:1
    frame #37: 0x00000001026ff3f1 rpc_nodes-6c83cfbda1bcd90b`core::ptr::drop_in_place::h21831658ee5eb025((null)=0x000000010a5f7600) at mod.rs:178:1
    frame #38: 0x00000001025d2a6c rpc_nodes-6c83cfbda1bcd90b`core::ptr::drop_in_place::h11862f49669379bb((null)=*mut [core::pin::Pin<alloc::boxed::Box<Future>>] @ 0x000070000ec95ce0) at mod.rs:178:1
    frame #39: 0x00000001025af29f rpc_nodes-6c83cfbda1bcd90b`_$LT$alloc..vec..Vec$LT$T$GT$$u20$as$u20$core..ops..drop..Drop$GT$::drop::h13aeed1748fa459d(self=0x000070000ec95e50) at vec.rs:2384:13
    frame #40: 0x00000001025d2c85 rpc_nodes-6c83cfbda1bcd90b`core::ptr::drop_in_place::h1c08e5d943ce898b((null)=0x000070000ec95e50) at mod.rs:178:1
    frame #41: 0x00000001025d2b91 rpc_nodes-6c83cfbda1bcd90b`core::ptr::drop_in_place::h165f5c26bfe337a7((null)=0x000070000ec95e50) at mod.rs:178:1
    frame #42: 0x00000001025d36a8 rpc_nodes-6c83cfbda1bcd90b`core::ptr::drop_in_place::h60c6ee0c12777cf7((null)=0x000070000ec95e48) at mod.rs:178:1
    frame #43: 0x00000001025d37ae rpc_nodes-6c83cfbda1bcd90b`core::ptr::drop_in_place::h6aadef4af554f39e((null)=0x000070000ec95e40) at mod.rs:178:1
    frame #44: 0x00000001025d16c9 rpc_nodes-6c83cfbda1bcd90b`core::mem::drop::h78895fbd6fc750b4(_x=<unavailable>) at mod.rs:873:24
    frame #45: 0x00000001025d84dd rpc_nodes-6c83cfbda1bcd90b`std::thread::local::fast::destroy_value::h099c05168a2dbf8c(ptr="") at local.rs:460:9
    frame #46: 0x000000010271cfcc rpc_nodes-6c83cfbda1bcd90b`std::sys::unix::fast_thread_local::register_dtor::run_dtors::h5113f68dc046930e at fast_thread_local.rs:86:17 [opt]
    frame #47: 0x00007fff6deb6871 libdyld.dylib`tlv_finalize_list + 51
    frame #48: 0x00007fff6e0ba009 libsystem_pthread.dylib`_pthread_tsd_cleanup + 476
    frame #49: 0x00007fff6e0bc512 libsystem_pthread.dylib`_pthread_exit + 70
    frame #50: 0x00007fff6e0bc114 libsystem_pthread.dylib`_pthread_start + 159
    frame #51: 0x00007fff6e0b7b8b libsystem_pthread.dylib`thread_start + 15
```

@frol (Collaborator) commented May 18, 2020

Well, the root cause of the explosion is:

```
ERROR actix_server::signals: ... Too many open files (os error 24)
```

It seems that the machine ran out of file descriptors. The questions here are:

  • Is it related to the Wasmer upgrade?
  • Is it a leak, or just a "normal" number of file descriptors?

Postmortem questions:

  • Why did our panic handler panic? /cc @ailisp
  • Can we do better? /cc @ailisp
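One way to approach the leak-versus-normal question is to sample how many descriptors the test process holds over time: a count that keeps climbing from one test case to the next points to a leak, while a plateau points to ordinary parallel usage. A minimal Rust sketch, assuming a Unix system that lists the process's open descriptors under `/dev/fd` (Linux and macOS both do); `open_fd_count` is a hypothetical helper, not part of nearcore:

```rust
use std::fs;
use std::io;

/// Hypothetical helper: counts the file descriptors currently open in this
/// process by listing `/dev/fd`.
fn open_fd_count() -> io::Result<usize> {
    // `read_dir` holds a descriptor on the directory while the listing is
    // produced, and that descriptor shows up in the listing itself, so it
    // is subtracted from the total.
    let listed = fs::read_dir("/dev/fd")?.count();
    Ok(listed.saturating_sub(1))
}
```

Logging this value before and after each test case (or periodically from a background thread) is enough to tell a steady climb from a plateau.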

@olonho (Contributor Author) commented May 18, 2020

The same problem happens on another machine, so it doesn't look like a fluke.

@maxzaver (Contributor) commented May 18, 2020

> This potentially changes the protocol. @olonho If you are certain that it doesn't change anything at the protocol level, i.e., a contract compiled with either of the two Wasmer versions will produce exactly the same output, then there is no need to do anything. Otherwise, please update the protocol version.

This changes the protocol, because it changes the error semantics.

```rust
RuntimeError::Trap { msg: _ } => VMError::FunctionCallError(WasmUnknownError),
RuntimeError::Error { data } => {
RuntimeError::InvokeError(invoke_error) => match invoke_error {
InvokeError::FailedWithNoError => VMError::FunctionCallError(WasmUnknownError),
```
Contributor

It would be super helpful if we could add a comment to each of these arms explaining when it occurs, so that it is clear from reading the code that we handle all possible deterministic and non-deterministic cases exhaustively and correctly.

Contributor Author

I will try to document them based on my understanding, but a documented contract on the Wasmer side would be great.

@bowenwang1996 (Collaborator)

@olonho It is a known issue that we use a lot of file descriptors because tests are run in parallel. Try something like `ulimit -n 1024` to get around it.

@olonho (Contributor Author) commented May 18, 2020

Indeed, raising `ulimit -n` to 1024 fixes the issue.
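Since the failure depends on the per-process open-file limit, a test suite could check that limit up front and print a hint instead of crashing with a confusing backtrace. A minimal sketch, assuming a Unix `sh` is available; `soft_fd_limit` and `fd_limit_at_least` are hypothetical helpers, not existing nearcore functions. It shells out to `ulimit -n` because the Rust standard library has no API for resource limits, and a child process inherits its parent's limits:

```rust
use std::io;
use std::process::Command;

/// Hypothetical helper: returns the soft limit on open file descriptors,
/// or `None` if the shell reports it as "unlimited". The child shell
/// inherits this process's limits, so `ulimit -n` reports our own limit.
fn soft_fd_limit() -> io::Result<Option<u64>> {
    let output = Command::new("sh").arg("-c").arg("ulimit -n").output()?;
    let stdout = String::from_utf8_lossy(&output.stdout);
    let text = stdout.trim();
    if text == "unlimited" {
        return Ok(None);
    }
    text.parse::<u64>()
        .map(Some)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Hypothetical pre-flight check: prints a hint and returns `false` when the
/// limit is below `wanted` (1024 is the value that worked in this thread).
fn fd_limit_at_least(wanted: u64) -> io::Result<bool> {
    match soft_fd_limit()? {
        Some(limit) if limit < wanted => {
            eprintln!(
                "open-file limit is {}, below {}; try `ulimit -n {}` before running the tests",
                limit, wanted, wanted
            );
            Ok(false)
        }
        // Either unlimited or already high enough.
        _ => Ok(true),
    }
}
```

Printing a hint rather than raising the limit programmatically keeps the sketch within the standard library; actually changing the limit from Rust would need a `setrlimit` binding.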

@frol (Collaborator) commented May 20, 2020

@olonho Is this waiting for more comments on the error branches, or is anything else needed?

nearprotocol-bulldozer bot pushed a commit that referenced this pull request May 20, 2020
In preparation for publishing nearcore crates, I discovered some unnecessary dependencies.

[cargo-udeps](https://github.com/est31/cargo-udeps) helped to identify all the unused dependencies:

```
$ cargo +nightly udeps --workspace --all-targets --all-features --bins --tests --benches --examples
```

P.S. I had to use a newer rustc (a74d1862d 2020-05-14 worked for me)

UPD: I would like to integrate `cargo udeps` into CI, but that will require a rustc version bump (potentially happening in #2668) /cc @ailisp

@olonho (Contributor Author) commented May 20, 2020

I will add comments and raise the protocol version, and then we should be all set.

Comment on lines 17 to 18
```toml
wasmer-runtime = { version = "0.17", features = ["default-backend-singlepass"], default-features = false }
wasmer-runtime-core = { version = "0.17" }
```
Collaborator

Pin the version to an exact number: `=0.17.0`.

@olonho olonho requested a review from SkidanovAlex as a code owner May 20, 2020 20:19
@bowenwang1996 (Collaborator) left a comment

@olonho You need to add an empty migration script to update the genesis.

@maxzaver (Contributor) left a comment

Looks great, thank you!

```rust
// we weren't expecting or that we do not handle.
// As of 0.17.0, thrown only from Cranelift BE.
InvokeError::UnknownTrapCode { trap_code: _, srcloc: _ } => {
    panic!("Impossible UnknownTrapCode error");
```
Contributor

It would be helpful to add `trap_code` and `srcloc` to the panic message, just in case we mess up and end up running the Cranelift or LLVM backend. Same for the other panics above.
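The two review requests in this thread and the earlier one (comment each match arm so exhaustive handling is visible, and put `trap_code` and `srcloc` into the panic message) can be illustrated with a small, self-contained sketch. The enums below are hypothetical, heavily simplified stand-ins for Wasmer 0.17's `RuntimeError`/`InvokeError` and nearcore's `VMError`, keeping only the variants quoted in this thread; they are not the real definitions, and `map_runtime_error` is an illustrative name:

```rust
// Hypothetical, simplified stand-ins; the real enums have more variants
// and their fields may have different types.
#[derive(Debug)]
enum InvokeError {
    FailedWithNoError,
    UnknownTrapCode { trap_code: String, srcloc: u32 },
}

#[derive(Debug)]
enum RuntimeError {
    InvokeError(InvokeError),
}

#[derive(Debug, PartialEq)]
enum FunctionCallError {
    WasmUnknownError,
}

#[derive(Debug, PartialEq)]
enum VMError {
    FunctionCallError(FunctionCallError),
}

fn map_runtime_error(err: RuntimeError) -> VMError {
    match err {
        RuntimeError::InvokeError(invoke_error) => match invoke_error {
            // The invocation failed but no more specific error was reported;
            // surface it to the caller as an unknown Wasm error.
            InvokeError::FailedWithNoError => {
                VMError::FunctionCallError(FunctionCallError::WasmUnknownError)
            }
            // Per the diff comment above, only the Cranelift backend throws
            // this as of 0.17.0, so it should be unreachable with singlepass.
            // The fields are included so a misconfigured backend is easy to spot.
            InvokeError::UnknownTrapCode { trap_code, srcloc } => panic!(
                "Impossible UnknownTrapCode error: trap_code={:?}, srcloc={}",
                trap_code, srcloc
            ),
        },
    }
}
```

Listing every variant instead of using a wildcard `_` arm means the compiler flags any variant added by a future Wasmer release, as long as the upstream enum is not marked `#[non_exhaustive]`.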

```rust
RuntimeError::Trap { msg: _ } => VMError::FunctionCallError(WasmUnknownError),
RuntimeError::Error { data } => {
RuntimeError::InvokeError(invoke_error) => match invoke_error {
// Indicates an exceptional circumstance such as a bug in Wasmer
```
Contributor

The explanations are super useful! Thank you so much!

@olonho olonho force-pushed the wasmer-0.17 branch 2 times, most recently from d03209d to af1b4d9 May 21, 2020 07:25
@olonho olonho force-pushed the wasmer-0.17 branch 3 times, most recently from 93bca69 to 2c2879f May 21, 2020 09:25
@olonho olonho merged commit 75a4422 into master May 21, 2020
@olonho olonho deleted the wasmer-0.17 branch May 21, 2020 09:49
Linked issue: Upgrade Wasmer to 0.17.0 and upgrade downcasting of errors
5 participants