Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pummel/test-crypto-timing-safe-equal-benchmarks failing #34073

Closed
richardlau opened this issue Jun 26, 2020 · 15 comments
Closed

pummel/test-crypto-timing-safe-equal-benchmarks failing #34073

richardlau opened this issue Jun 26, 2020 · 15 comments
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI.

Comments

@richardlau
Copy link
Member

  • Test: pummel/test-crypto-timing-safe-equal-benchmarks
  • Platform: Linux
  • Console Output:
08:05:52 not ok 66 pummel/test-crypto-timing-safe-equal-benchmarks
08:05:52   ---
08:05:52   duration_ms: 6.629
08:05:52   severity: fail
08:05:52   exitcode: 1
08:05:52   stack: |-
08:05:52     assert.js:385
08:05:52         throw err;
08:05:52         ^
08:05:52     
08:05:52     AssertionError [ERR_ASSERTION]: timingSafeEqual should not leak information from its execution time (t=7.814360810767208)
08:05:52         at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-custom-suites-freestyle/test/pummel/test-crypto-timing-safe-equal-benchmarks.js:109:1)
08:05:52         at Module._compile (internal/modules/cjs/loader.js:1217:30)
08:05:52         at Object.Module._extensions..js (internal/modules/cjs/loader.js:1238:10)
08:05:52         at Module.load (internal/modules/cjs/loader.js:1066:32)
08:05:52         at Function.Module._load (internal/modules/cjs/loader.js:954:14)
08:05:52         at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12)
08:05:52         at internal/main/run_main_module.js:17:47 {
08:05:52       generatedMessage: false,
08:05:52       code: 'ERR_ASSERTION',
08:05:52       actual: false,
08:05:52       expected: true,
08:05:52       operator: '=='
08:05:52     }
08:05:52   ...

Daily master builds appear to have been consistently broken (for at least as long as we have history (27 May, we only keep a month's worth of results)) due to this test.

Previous issues with this test:

@richardlau richardlau added the flaky-test Issues and PRs related to the tests with unstable failures on the CI. label Jun 26, 2020
@addaleax
Copy link
Member

Fwiw, this has been discussed in the private repo for a bit since this is potentially security-relevant.

@Trott
Copy link
Member

Trott commented Jun 26, 2020

I bisected and it seems to be a result of a V8 update. Since we can't compile/test without most of the cherry-picks required for V8 updates, I was not able to identify a single commit. The first bad commit could be any of:

a489288
c3866a1
db0ed11
308900f
6b5ea2e
37abad4
55f7ae6
1d6adf7
9dfaf49
32be677

@Trott
Copy link
Member

Trott commented Jun 26, 2020

Some work done by @sam-github revealed that adding the patch below makes the test pass even though there's no reason it should change execution, in theory. Sam commented:

I suspect v8 of doing some kind of cacheing of strings passed to eval. It might even be a bug.

diff --git a/test/pummel/test-crypto-timing-safe-equal-benchmarks.js b/test/pummel/test-crypto-timing-safe-equal-benchmarks.js 
index b649b071e1e..bbe93d238b9 100644                                                                                                                                                          --- a/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
+++ b/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
@@ -23,7 +23,7 @@ function runOneBenchmark(compareFunc, firstBufFill, secondBufFill, bufSize)
-    `);
+    `.replace(';', '#').replace('#', ';'));
}

@richardlau
Copy link
Member Author

Fwiw, this has been discussed in the private repo for a bit since this is potentially security-relevant.

Ah okay, I have no visibility of that. Given the state of daily master CI was discussed at the collaborator summit yesterday I was surprised no issue was open for it.

@addaleax
Copy link
Member

Yeah, I think it’s fine to discuss it openly since the CI failures are available publicly as well.

@Trott
Copy link
Member

Trott commented Jun 26, 2020

Fwiw, this has been discussed in the private repo for a bit since this is potentially security-relevant.

Ah okay, I have no visibility of that. Given the state of daily master CI was discussed at the collaborator summit yesterday I was surprised no issue was open for it.

We'd been talking about moving it to a public repo anyway because it does not seem to be security-relevant once people dug into it a bit. So, this is probably as good a way for that to happen as any. Plus, I think the results are public anyway. (I'm not sure if node-daily-master results is viewable by anyone with a GitHub account. I think it is.)

@richardlau
Copy link
Member Author

Some work done by @sam-github revealed that adding the patch below makes the test pass even though there's no reason it should change execution, in theory. Sam commented:

I suspect v8 of doing some kind of cacheing of strings passed to eval. It might even be a bug.

diff --git a/test/pummel/test-crypto-timing-safe-equal-benchmarks.js b/test/pummel/test-crypto-timing-safe-equal-benchmarks.js 
index b649b071e1e..bbe93d238b9 100644                                                                                                                                                          --- a/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
+++ b/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
@@ -23,7 +23,7 @@ function runOneBenchmark(compareFunc, firstBufFill, secondBufFill, bufSize)
-    `);
+    `.replace(';', '#').replace('#', ';'));
}

FWIW I can reproduce this on a Linux system inside IBM, where it's flaky:

-bash-4.2$ tools/test.py --repeat 10 test/pummel/test-crypto-timing-safe-equal-benchmarks.js
=== release test-crypto-timing-safe-equal-benchmarks ===
Path: pummel/test-crypto-timing-safe-equal-benchmarks
assert.js:385
    throw err;
    ^

AssertionError [ERR_ASSERTION]: timingSafeEqual should not leak information from its execution time (t=-14.106192455658816)
    at Object.<anonymous> (/home/users/riclau/sandbox/github/nodejs/test/pummel/test-crypto-timing-safe-equal-benchmarks.js:109:1)
    at Module._compile (internal/modules/cjs/loader.js:1217:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1238:10)
    at Module.load (internal/modules/cjs/loader.js:1066:32)
    at Function.Module._load (internal/modules/cjs/loader.js:954:14)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12)
    at internal/main/run_main_module.js:17:47 {
  generatedMessage: false,
  code: 'ERR_ASSERTION',
  actual: false,
  expected: true,
  operator: '=='
}
Command: out/Release/node /home/users/riclau/sandbox/github/nodejs/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
...
[00:40|% 100|+   2|-   8]: Done
-bash-4.2$

Applying the patch from @sam-github doesn't appear to help. Running --jitless does, but maybe that's just because it makes everything slower? Or maybe there is a JIT/optimization bug?

-bash-4.2$ git diff
diff --git a/test/pummel/test-crypto-timing-safe-equal-benchmarks.js b/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
index b649b07..7930213 100644
--- a/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
+++ b/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
@@ -1,3 +1,4 @@
+// Flags: --jitless
 'use strict';
 const common = require('../common');
 if (!common.hasCrypto)
-bash-4.2$ tools/test.py --repeat 10 test/pummel/test-crypto-timing-safe-equal-benchmarks.js
[01:40|% 100|+  10|-   0]: Done
-bash-4.2$

@richardlau
Copy link
Member Author

Or this:

-bash-4.2$ git diff
diff --git a/test/pummel/test-crypto-timing-safe-equal-benchmarks.js b/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
index b649b07..bc831db 100644
--- a/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
+++ b/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
@@ -1,3 +1,4 @@
+// Flags: --always_opt
 'use strict';
 const common = require('../common');
 if (!common.hasCrypto)
-bash-4.2$ tools/test.py --repeat 10 test/pummel/test-crypto-timing-safe-equal-benchmarks.js
[01:20|% 100|+  10|-   0]: Done
-bash-4.2$

(there's lots of --v8-options and I'll readily admit to not knowing what most of these do).

@psmarshall
Copy link
Contributor

We can try and rule in/out v8 changes. As far as I can tell, this was bisected to the v8 8.3 upgrade, which was upgraded from 8.1? Using the version numbers there, here's the git log between the two versions: https://chromium.googlesource.com/v8/v8/+log/8.1.307.31..8.3.110.9/?pretty=fuller&n=10000

One thing that would be helpful to know from someone who can reproduce this: why does the T-test fail? Is it that the first execution is slow, and then the rest are very fast? That would support the 'something gets cached incorrectly' theory.

@addaleax
Copy link
Member

I dug in a bit using V8 canary builds, since they provide a good way to bisect the combination of Node.js and V8 changes.

Fake git bisect log
canary20200222115d5495c2 [8.2.152-node.8]: good
canary2020032013d7e86e8c [8.3.35-node.8]: bad
canary20200303535dc9b07d [8.2.237-node.8]: bad
canary202002261c95df47d3 [8.2.184-node.8]: good
canary2020030169bd6793e5 [8.2.222-node.8]: bad
canary20200228c250e83364 [8.2.211-node.8]: good
canary20200229bc7ca2b453 [8.2.220-node.8]: good

That means the causing commit is one of https://chromium.googlesource.com/v8/v8/+log/8.2.220..8.2.222/?pretty=fuller&n=10000. v8/v8@a4c1408 is the only one that sticks out to me. Unfortunately, reverting that patch does not apply cleanly on top of our 8.3 upgrade PR, so I’m not able to confirm.

However! This appears something that has been fixed in one of the later V8 versions:

Fake git bisect log
canary2020032013d7e86e8c [8.3.35-node.8]: bad
canary202004229479c2ddd5 [8.4.157-node.8]: good
canary20200405a0ba79f8ed [8.4.0-node.8 (candidate)]: bad
canary20200413e68a631861 [8.4.49-node.8]: bad
canary20200418c06b589ced [8.4.131-node.8]: bad
canary202004192943fd49c0 [8.4.133-node.8]: bad
canary20200420e3ff8005c4 [8.4.134-node.8]: bad

Iiuc, that means that the fix is one of https://chromium.googlesource.com/v8/v8/+log/8.4.134..8.4.157/?pretty=fuller&n=10000. Unfortunately, that’s a lot of commits, and none of them particularly stand out to me as possibly related to this test or possibly related to one of the commits that have potentially caused this.

(This is all based on repeated runs of the test with up to 30 iterations. That’s not a lot, so there’s always a chance that I went wrong here somewhere.)

One thing that would be helpful to know from someone who can reproduce this: why does the T-test fail? Is it that the first execution is slow, and then the rest are very fast? That would support the 'something gets cached incorrectly' theory.

@psmarshall My understanding is that the code in the test is supposed to remove outliers, so that’s probably not it.

@psmarshall
Copy link
Contributor

The fix might have been v8/v8@8c68536 which is in the fix range @addaleax provided. That would suggest the issue is sensitive to inlining. You could try configuring v8 with --no-turbo-inlining to turn it off and see if that fixes it even in the broken range?

@addaleax
Copy link
Member

@psmarshall Yes, --no-turbo-inlining makes the test pass consistently.

My understanding is that v8/v8@8c68536 is an optimization, so even if it might do the right thing for us here, it still means that something about the test isn’t quite right and should be fixed on our side?

@addaleax
Copy link
Member

So … I’m still trying to understand this. The test is very carefully written to avoid the influence of JS engine modifications, so I still don’t fully understand why they might influence this test one way or another.

One thing I’ve noticed is that this flakiness goes away when testing the binding function directly instead of the JS function, so one thing I will try is to move the typechecking that we do into C++ and see if that helps. It’s probably a good idea anyway for this particular case (and imo other cases as well but that’s way out of scope here).

addaleax added a commit to addaleax/node that referenced this issue Jul 1, 2020
This makes the function more robust against V8 inlining.

Fixes: nodejs#34073
@psmarshall
Copy link
Contributor

Running with --trace-opt --trace-deopt --trace-turbo-inlining should give a full picture of what got optimized when and how the inlining happened too. Maybe that gives more precise clues about what's going wrong.

The crypto.timingSafeEqual function is a bound C++ function so inlining shouldn't matter.... but it seems something with inlining is going on.

@psmarshall
Copy link
Contributor

Oh just saw your PR @addaleax - that JS wrapper before the C++ function would do it for sure

@jasnell jasnell closed this as completed in 1d7be32 Jul 3, 2020
MylesBorins pushed a commit that referenced this issue Jul 14, 2020
This makes the function more robust against V8 inlining.

Fixes: #34073

PR-URL: #34141
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Ujjwal Sharma <ryzokuken@disroot.org>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Zeyu Yang <himself65@outlook.com>
MylesBorins pushed a commit that referenced this issue Jul 16, 2020
This makes the function more robust against V8 inlining.

Fixes: #34073

PR-URL: #34141
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Ujjwal Sharma <ryzokuken@disroot.org>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Zeyu Yang <himself65@outlook.com>
addaleax added a commit that referenced this issue Sep 22, 2020
This makes the function more robust against V8 inlining.

Fixes: #34073

PR-URL: #34141
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Ujjwal Sharma <ryzokuken@disroot.org>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Zeyu Yang <himself65@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants