Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two async ops resolving exactly at the same time = assert_eq failure when doing a sync op #9903

Closed
fasterthanlime opened this issue Mar 26, 2021 · 7 comments
Labels
bug Something isn't working correctly deno_core Changes in "deno_core" crate are needed

Comments

@fasterthanlime
Copy link

fasterthanlime commented Mar 26, 2021

This is a fun one!

I kept hitting this assertion:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`', vendor/deno/core/bindings.rs:416:3
stack backtrace:
   0: rust_begin_unwind
             at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/panicking.rs:493:5
   1: core::panicking::panic_fmt
             at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/core/src/panicking.rs:92:14
   2: deno_core::bindings::send
             at ./vendor/deno/core/bindings.rs:416:3
   3: core::ops::function::Fn::call
             at /home/vscode/.rustup/toolchains/1.50.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:70:5
   4: rusty_v8::function::<impl rusty_v8::support::MapFnFrom<F> for extern "C" fn(*const rusty_v8::function::FunctionCallbackInfo)>::mapping::{{closure}}
             at /home/vscode/.cargo/registry/src/github.com-1ecc6299db9ec823/rusty_v8-0.21.0/src/function.rs:280:7
   5: <extern "C" fn(A0) .> R as rusty_v8::support::CFnFrom<F>>::mapping::c_fn
             at /home/vscode/.cargo/registry/src/github.com-1ecc6299db9ec823/rusty_v8-0.21.0/src/support.rs:722:11
   6: _ZN2v88internal25FunctionCallbackArguments4CallENS0_15CallHandlerInfoE
   7: _ZN2v88internal12_GLOBAL__N_119HandleApiCallHelperILb0EEENS0_11MaybeHandleINS0_6ObjectEEEPNS0_7IsolateENS0_6HandleINS0_10HeapObjectEEESA_NS8_INS0_20FunctionTemplateInfoEEENS8_IS4_EENS0_16BuiltinArgumentsE
   8: _ZN2v88internalL26Builtin_Impl_HandleApiCallENS0_16BuiltinArgumentsEPNS0_7IsolateE
   9: _ZN2v88internal21Builtin_HandleApiCallEiPmPNS0_7IsolateE
  10: Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fatal runtime error: failed to initiate panic, error 5
Aborted

And eventually investigated a bit.

This only seems to happen if:

  • Two async Ops resolve at the exact same time (between two polls of the JsRuntime)
  • A sync op is dispatched in the async callback for the first async op

By the time the sync op is dispatched, deno_core::bindings::send expect "everything to have been shifted off the shared queue" by core.js, but it hasn't been yet, because it's doing so in a loop:

// in `core.js`

  function handleAsyncMsgFromRust() {
    while (true) {
      const opIdBuf = shift();
      if (opIdBuf == null) {
        break;
      }
      assert(asyncHandlers[opIdBuf[0]] != null);
      // 👇 that handler (in my code) ends up dispatching a sync op
      // in the middle of this loop, while there a still `opIdBuf`s left
      // in the shared queue.
      asyncHandlers[opIdBuf[0]](opIdBuf[1], true);
    }

    for (let i = 0; i < arguments.length; i += 2) {
      asyncHandlers[arguments[i]](arguments[i + 1], false);
    }
  }

I was able to work around this issue by not calling any async handler until we're done collecting shift() (ie. until it returns null):

// in `core.js` (patched)

  function handleAsyncMsgFromRust() {
    let opIdBufs = [];

    // `shift()` as many messages as we can, collecting them into `opIdBufs`
    while (true) {
      const opIdBuf = shift();
      if (opIdBuf == null) {
        break;
      }
      assert(asyncHandlers[opIdBuf[0]] != null);
      opIdBufs.push(opIdBuf);
    }

    // _only then_ call them one by one.
    for (const opIdBuf of opIdBufs) {
      asyncHandlers[opIdBuf[0]](opIdBuf[1], true);
    }

    // (this is fine, it's only used if we run out of shared queue space)
    for (let i = 0; i < arguments.length; i += 2) {
      asyncHandlers[arguments[i]](arguments[i + 1], false);
    }
  }

Ideally this issue would be a PR with a clean reproduction (outside of my large codebase), but that's going to take me some time, in the meantime I'd be happy to know if I identified the right problem / if this seems like a good solution.


This is the full diff of my fix/workaround btw:

diff --git a/core/core.js b/core/core.js
index 2de8e1ff..a9af077a 100644
--- a/core/core.js
+++ b/core/core.js
@@ -198,12 +198,18 @@ SharedQueue Binary Layout
   }
 
   function handleAsyncMsgFromRust() {
+    let opIdBufs = [];
+
     while (true) {
       const opIdBuf = shift();
       if (opIdBuf == null) {
         break;
       }
       assert(asyncHandlers[opIdBuf[0]] != null);
+      opIdBufs.push(opIdBuf);
+    }
+
+    for (const opIdBuf of opIdBufs) {
       asyncHandlers[opIdBuf[0]](opIdBuf[1], true);
     }
@bartlomieju
Copy link
Member

Thanks for detailed problem description. You're fix looks alright, but it occurs to me that something was inadvertently broken most likely in #9457. Let's see if we need to revert that PR first before applying the fix. @inteon please take a look.

@bartlomieju bartlomieju added bug Something isn't working correctly deno_core Changes in "deno_core" crate are needed labels Mar 26, 2021
@AaronO
Copy link
Contributor

AaronO commented Mar 26, 2021

@bartlomieju @inteon The shared queue will very soon be dropped, which will fix this bug/assert check.

I remember thinking that assert might have been misplaced when reading the code, we can also just move it into the Op::Async and Op::AsyncUnref match branches

@inteon
Copy link
Contributor

inteon commented Mar 26, 2021

Thank you for the bug report.
I think

assert_eq!(state.shared.size(), 0);
should be removed. Not enforcing this assert will not cause any problems.

@AaronO
Copy link
Contributor

AaronO commented Mar 26, 2021

@bartlomieju @inteon Pushed a quickfix #9904

I don't think the quickfix is necessary, since I plan to submit a PR dropping the shared queue today or tomorrow.

But I'll let you guys decide on that.

@inteon
Copy link
Contributor

inteon commented Apr 5, 2021

@bartlomieju @fasterthanlime
Thanks to @AaronO's new serde_v8 changes (that also removed the shared queue) this issue should be fixed.
I think this issue can be closed.

@fasterthanlime
Copy link
Author

Can someone comment on this thread when a new version of deno_core that fixes this issue is available, so that anyone subscribed to this issue is notified? Thanks! 💙

@lucacasonato
Copy link
Member

A new version will be released on Tuesday 13th of April.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working correctly deno_core Changes in "deno_core" crate are needed
Projects
None yet
Development

No branches or pull requests

5 participants