Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: fix flaky async-hooks/test-zlib.zlib-binding.deflate #21077

Closed
wants to merge 1 commit into from

Conversation

addaleax
Copy link
Member

@addaleax addaleax commented Jun 1, 2018

Previously, the typed arrays used in this test would not
automatically be kept alive by the native handle when
it’s using them, so the V8 garbage collector could collect
them while they are still in use by the zlib module,
leading to memory corruption.

Fixes: #20907

Fyi @apapirovski @joyeecheung @Trott @Fishrock123 @BridgeAR @mcollina ;)

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added async_hooks Issues and PRs related to the async hooks subsystem. test Issues and PRs related to the tests. labels Jun 1, 2018
@addaleax addaleax added the zlib Issues and PRs related to the zlib subsystem. label Jun 1, 2018
@addaleax
Copy link
Member Author

addaleax commented Jun 1, 2018

Copy link
Member

@apapirovski apapirovski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😅 I love that this is the fix.

Copy link
Contributor

@Fishrock123 Fishrock123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, unbelievable. What a debug story this was.

https://twitter.com/addaleax/status/1002279690445484033

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@addaleax
Copy link
Member Author

addaleax commented Jun 1, 2018

Might want to fast-track this? Feel free to 👍 this comment if you think so

@addaleax addaleax added the fast-track PRs that do not need to wait for 48 hours to land. label Jun 1, 2018
@hashseed
Copy link
Member

hashseed commented Jun 1, 2018

For posterity, do you mind adding a more detailed explanation to this? Including what you used to debug?

Previously, the typed arrays used in this test would not
automatically be kept alive by the native handle when
it’s using them, so the V8 garbage collector could collect
them while they are still in use by the zlib module,
leading to memory corruption.

Fixes: nodejs#20907
@addaleax
Copy link
Member Author

addaleax commented Jun 1, 2018

@hashseed Thanks for the reminder :) I’ve updated the PR description/commit message a bit, let me know how that sounds to you

@apapirovski
Copy link
Member

@addaleax I'm super happy with the commit message on my end for sure but if you don't mind explaining how you finally found it / got it to fail with valgrind, I think that would be really great? Just for the closure for all of us who observed... 😄

@hashseed
Copy link
Member

hashseed commented Jun 1, 2018

Tbh I don't actually see why this test fails. I can only guess that the zlib binding somehow doesn't keep the backing store of the incoming buffer around, but does the correct thing if the argument is a TypedArray? That sounds like a bug in zlib to me though.

@addaleax
Copy link
Member Author

addaleax commented Jun 1, 2018

@hashseed It doesn’t keep the backing store around either way, and relies on the JS side of things to do that (but this test didn’t do that before). It doesn’t really treat Buffers different from other kinds of TypedArrays…

I assume the intention for that original behaviour here was not having to track more Persistents in C++ land. I have no idea whether that’s reasonable or not, tbh.

Copy link
Member

@joyeecheung joyeecheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the work that went into this!

(We have something similar to this fix in fs.write, this pattern just looks like a footgun to me...can we do something about this? Refactor how the bindings write stuff?)

@addaleax
Copy link
Member Author

addaleax commented Jun 1, 2018

@apapirovski I don’t really know either! It just Worked™ once when running it with valgrind and redirecting output to a file, it contained a pretty good indication of this use-after-free behaviour…

@hashseed
Copy link
Member

hashseed commented Jun 1, 2018

Oh I see.

So the issue is that Init takes a Uint32Array, but only keeps a raw uint32_t pointer to the backing store. Even when the backing store doesn't move, it may still be collected if the Uint32Array dies. That then may or may not cause a segfault, depending on whether the OS has unmapped the backing memory area.

Can we expect users to know about this detail or can we expect actual uses of this API to always use the write result, so we would not run into this issue in the wild?

@apapirovski
Copy link
Member

apapirovski commented Jun 1, 2018

Can we expect users to know about this detail or can we expect actual uses of this API to always use the write result, so we would not run into this issue in the wild?

This tests internals, our zlib code actually doesn't behave in this way so it was just a flawed implementation of the C++ binding.

(This test is specifically for the async hooks in relation to the binding hence using it directly.)

@Trott
Copy link
Member

Trott commented Jun 1, 2018

Sole CI failure is a known-flaky being worked on.

Re-run: https://ci.nodejs.org/job/node-test-commit-linux-containered/4864/

@nodejs nodejs deleted a comment from refack Jun 1, 2018
@Trott
Copy link
Member

Trott commented Jun 1, 2018

@addaleax
Copy link
Member Author

addaleax commented Jun 2, 2018

Landed in 2efe4c2

@addaleax addaleax closed this Jun 2, 2018
@addaleax addaleax deleted the zlib-binding-deflate branch June 2, 2018 14:08
addaleax added a commit that referenced this pull request Jun 2, 2018
Previously, the typed arrays used in this test would not
automatically be kept alive by the native handle when
it’s using them, so the V8 garbage collector could collect
them while they are still in use by the zlib module,
leading to memory corruption.

Fixes: #20907

PR-URL: #21077
Reviewed-By: Anatoli Papirovski <apapirovski@mac.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Yang Guo <yangguo@chromium.org>
Reviewed-By: James M Snell <jasnell@gmail.com>
@apapirovski apapirovski mentioned this pull request Jun 3, 2018
2 tasks
MylesBorins pushed a commit that referenced this pull request Jun 6, 2018
Previously, the typed arrays used in this test would not
automatically be kept alive by the native handle when
it’s using them, so the V8 garbage collector could collect
them while they are still in use by the zlib module,
leading to memory corruption.

Fixes: #20907

PR-URL: #21077
Reviewed-By: Anatoli Papirovski <apapirovski@mac.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Yang Guo <yangguo@chromium.org>
Reviewed-By: James M Snell <jasnell@gmail.com>
@MylesBorins MylesBorins mentioned this pull request Jun 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
async_hooks Issues and PRs related to the async hooks subsystem. fast-track PRs that do not need to wait for 48 hours to land. test Issues and PRs related to the tests. zlib Issues and PRs related to the zlib subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky or bugged async-hooks/test-zlib.zlib-binding.deflate
10 participants