Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to resolve some flaky tests #23716

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mighty1231
Copy link

@mighty1231 mighty1231 commented Feb 21, 2025

test browser64.test_offset_converter

original output for the test

[unresponsive tests: 1]

Why unresponsive?

I don't know. Furthermore, I don't know what wasmOffsetConverter exactly is. But forcing temporal error in JS makes unresponsive.

void magic_test_function(void) {
  int result = EM_ASM_INT({
    function report(x) {
      out(x);
      fetch('http://localhost:8888?stdout=' + encodeURIComponent(x));
    }
    report('magic_test_function: input=' + $0);
-    var converted = wasmOffsetConverter.getName($0);
+    var converted = wasmOffsetConverter.__method_should_not_exist($0);
    report('magic_test_function: converted=' + converted);
    return converted == 'magic_test_function';
  }, get_pc());
  assert(result);
}

-> unresponsive. so, just set try catch on it

void magic_test_function(void) {
  int result = EM_ASM_INT({
    function report(x) {
      out(x);
      fetch('http://localhost:8888?stdout=' + encodeURIComponent(x));
    }
-    report('magic_test_function: input=' + $0);
-    var converted = wasmOffsetConverter.getName($0);
-    report('magic_test_function: converted=' + converted);
-    return converted == 'magic_test_function';
+    try {
+      report('magic_test_function: input=' + $0);
+      var converted = wasmOffsetConverter.getName($0);
+      report('magic_test_function: converted=' + converted);
+      return converted == 'magic_test_function';
+    } catch (e) {
+      report((e?.message ?? '(unknown message)') + ' ' + (e?.stack ?? '(unknown stack)'));
+      return false;
+    }
  }, get_pc());
  assert(result);
}

It gives log like this.

[client logging: /?stdout=magic_test_function: input=2240 ]
[client logging: /?stdout=wasmOffsetConverter.__method_should_not_exist is not a function TypeError: wasmOffsetConverter.__method_should_not_exist is not a function
    at 70112 (http://localhost:8888/test.js:1424:201)
    at runEmAsmFunction (http://localhost:8888/test.js:2415:30)
    at _emscripten_asm_const_int (http://localhost:8888/test.js:2424:14)
    at test.wasm.magic_test_function (http://localhost:8888/test.wasm:wasm-function[25]:0x8d5)
    at test.wasm.__original_main (http://localhost:8888/test.wasm:wasm-function[26]:0x96f)
    at test.wasm._main_thread (http://localhost:8888/test.wasm:wasm-function[30]:0xaff)
    at http://localhost:8888/test.js:2005:52
    at invokeEntryPoint (http://localhost:8888/test.js:2005:75)
    at handleMessage (http://localhost:8888/test.js:765:11) ]

If some error was on JS, this change should give more hint to resolve flakiness.

test wasm64.test_pthread_wait64_notify

file libeventloop.js

  emscripten_set_timeout_loop: (cb, msecs, userData) => {
    function tick() {
      var t = _emscripten_get_now();
      var n = t + msecs;
      {{{ runtimeKeepalivePop() }}}
      callUserCallback(() => {
        if ({{{ makeDynCall('idp', 'cb') }}}(t, userData)) {
          // Save a little bit of code space: modern browsers should treat
          // negative setTimeout as timeout of 0
          // (https://stackoverflow.com/questions/8430966/is-calling-settimeout-with-a-negative-delay-ok) 😇
          {{{ runtimeKeepalivePush() }}}
          setTimeout(tick, n - _emscripten_get_now()); // 😅 Math.max(n - _emscripten_get_now(), 0)
        }
      });
    }
    {{{ runtimeKeepalivePush() }}}
    return setTimeout(tick, 0);
  },

node+wasm64 generated following error, output from Circle CI here

@@ -1,5 +1,8 @@
 main: creating worker
 main: entering timeout loop to wait for wasm worker to run
+(node:27876) TimeoutNegativeWarning: -43.357421875 is a negative number.
+Timeout duration was set to 1.
+(Use `node --trace-warnings ...` to show where the warning was created)
 worker_main
 Waiting on address with unexpected value should return 'not-equal'
 Waiting on address with unexpected value should return 'not-equal' also if timeout==0
  1. emscripten_set_timeout_loop() sometimes call setTimeout with negative value.
  2. wasm64 tests require node version >= 24, instead node 20.
  3. node 23 introduces TimeoutNegativeWarning, which warns that case.
  4. node serves --disable-warning option on 21.3.0 fortunately.

To fix the problem, I changed common.py for every node>=23 calls in this PR.

Does it really resolve flakiness?

For this kind of flakiness(throwing TimeoutNegativeWarning), yes.

  1. change libeventloop.js before merge this PR.

              {{{ runtimeKeepalivePush() }}}
    -          setTimeout(tick, n - _emscripten_get_now());
    +          setTimeout(tick, -1);
            }
  2. run wasm64.test_pthread_wait64_notify

  3. I see the test fails, with following log

    @@ -1,5 +1,8 @@
     main: creating worker
     main: entering timeout loop to wait for wasm worker to run
    +(node:177219) TimeoutNegativeWarning: -1 is a negative number.
    +Timeout duration was set to 1.
    +(Use `node --trace-warnings ...` to show where the warning was created)
     worker_main
     Waiting on address with unexpected value should return 'not-equal'
  4. Apply this patch

  5. Now, test works, with setTimeout(tick, -1);

Possible alternative change

  1. change only test_pthread_wait64_notify
  @node_pthreads
  @no_wasm2js('https://github.com/WebAssembly/binaryen/issues/5991')
  def test_pthread_wait64_notify(self):
+    self.node_args += ['--disable-warning=TimeoutNegativeWarning']
    self.do_run_in_out_file_test('atomic/test_wait64_notify.c')
  1. change libeventloop.js -> It make bundle size bigger, even it has environment with browser, not only Node.
-          setTimeout(tick, n - _emscripten_get_now());
+          setTimeout(tick, Math.max(n - _emscripten_get_now(), 0));

Summary

  • Does it fix flakiness of browser64.test_offset_converter? No.
  • Does it help to debug flakiness of browser64.test_offset_converter? Maybe?
  • Does it fix flakiness of wasm64.test_pthread_wait64_notify? Maybe? But, if every failed test was due to TimeoutNegativeWarning, Yes.

test list

- browser64.test_offset_converter
- wasm64.test_pthread_wait64_notify

1. Sometimes test_offset_converter give
  unresponsiveness and the possibility for it is the error from JS section
2. TimeoutNegativeWarning emerges node >= 23 and
  wasm64 requires node >= 24. `emscripten_set_timeout_loop`
  may throw warning. Node serves [`--disable-warning`
  option on 21.3.0](https://nodejs.org/en/blog/release/v21.3.0)
  fortunately.
@mighty1231
Copy link
Author

test-other.test_codesize_hello_dylink failed. I think I did not introduce any related change.

log from CI

  seen wasm size: 8174 (expected: 9712) (delta: -1538), ratio to expected: 0.158361

======================================================================
FAIL [0.002s]: test_codesize_hello_dylink (test_other.other)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.8/unittest/case.py", line 60, in testPartExecutor
    yield
  File "/usr/lib/python3.8/unittest/case.py", line 676, in run
    self._callTestMethod(testMethod)
  File "/usr/lib/python3.8/unittest/case.py", line 633, in _callTestMethod
    method()
  File "/root/project/test/common.py", line 897, in resulting_test
    return func(self, *args)
  File "/root/project/test/test_other.py", line 9127, in test_codesize_hello
    self.run_codesize_test('hello_world.c', *args)
  File "/root/project/test/test_other.py", line 9017, in run_codesize_test
    self.check_expected_size_in_file('wasm', size_file, wasm_size)
  File "/root/project/test/common.py", line 1685, in check_expected_size_in_file
    self.assertLess(ratio, size_slack)
  File "/usr/lib/python3.8/unittest/case.py", line 1298, in assertLess
    self.fail(self._formatMessage(msg, standardMsg))
  File "/usr/lib/python3.8/unittest/case.py", line 753, in fail
    raise self.failureException(msg)
AssertionError: 0.15836079077429985 not less than 0.05

sbc100 added a commit to sbc100/emscripten that referenced this pull request Feb 21, 2025
sbc100 added a commit to sbc100/emscripten that referenced this pull request Feb 21, 2025
This restores the two different paths for calculating the remaining
milliseconds which was removed in emscripten-core#12122.

Inspired by emscripten-core#23716
@sbc100
Copy link
Collaborator

sbc100 commented Feb 21, 2025

Thanks for looking into these. I summited a PR to address the node warning: #23722

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants