-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix flaky linking_interrupted test. #8162
Conversation
r? @Eh2406 (rust_highfive has picked a reviewer for you, use r? to override) |
I ran a bunch of tests on azure, and locally, and it seems to be fairly stable. I'm a little unhappy with the call to sleep, but it is difficult to know when rustc has completely exited. I tried a rustc wrapper, but sometimes it was killed on Windows. I can put the rustc wrapper in, but I felt like the test was getting overly complicated, where a sleep solves the problem almost always. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice find!
tests/testsuite/freshness.rs
Outdated
// killed. This write will tell them to exit, pretending that they died | ||
// before finishing. Ignore the error, because (sometimes?) on Windows | ||
// everything is already killed. | ||
let _ = conn.write(b"X"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could just be drop(conn)
to indicate to anyone blocked on it that the other side is gone?
tests/testsuite/freshness.rs
Outdated
// Note: rustc and the linker may still be running because we didn't kill | ||
// the entire process group. Normally, when a user hits Ctrl-C, everything | ||
// is killed. However, setting up process groups in a cross-platform test | ||
// is a pain, and there's no easy way to know when everything has been |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW we've already got test code that does this, perhaps it wouldn't be too bad to refactor that so it could be used here too?
tests/testsuite/freshness.rs
Outdated
// file...t1-HASH.rcgu.o`. I think what is happening is that the old rustc | ||
// process is still running, and deletes the `*.o` files while the command | ||
// below is trying to write them. Not sure if that is macOS-specific. | ||
std::thread::sleep(std::time::Duration::new(2, 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you're mentioning above sounds right, since child.kill()
only kills Cargo itself we've still got rustc running, and rustc, realizing the linker failed, will eventually clean up after itself. If that races with the next rustc clobbering the same files that can cause issues.
I think though it'd be best (if not too hard) to use process groups here to ensure that we kill the entire process tree, including rustc.
The problem with the death tests that use job objects is that there's no way to know when the process actually exits. As you can see in the code with the long comment about Windows, it loops 10 times trying to delete the target directory to know when everything is done, which this test can't do (because it needs to preserve the target directory). I'll try to put together something with a rustc wrapper (I have it written already, just wasn't working great on windows). It would be nice to use file locks, but I don't think that is particularly easy to do cross-platform. |
That's true yeah, I'm just generally pretty loathe to add more and more sleeps into the test suite. I'd personally prefer to go relatively far out of our way to avoid adding sleeps and such, since sleeps are guaranteed to be flaky "eventually", just not as often. |
According to https://docs.microsoft.com/en-us/windows/win32/procthread/nested-jobs nested jobs have been supported since Windows 8. This code isn't particularly safe with multiple test threads running at once because it modifies the job object of the test itself.
5a8c926
to
01648b9
Compare
Yea, I agree, I hate having sleeps as well, I was just being a little lazy. I pushed an update that adds a wrapper around rustc so that the test can detect when it exits. I also switched to job objects/sessions so that everything gets killed to better approximate what actually happens. Did a bunch of testing on azure and locally, too. |
@bors: r+ Thanks! |
📌 Commit 01648b9 has been approved by |
☀️ Test successful - checks-azure |
Update cargo 11 commits in 8751eb3010d4cdb5329b5a6bd2b6d765c95b0dca..90931d9b31e8b854522fed00916504a3ac6d8619 2020-04-21 18:04:35 +0000 to 2020-04-28 01:56:59 +0000 - Use associated constants directly on primitive types instead of modules (rust-lang/cargo#8077) - Clear `RUSTDOCFLAGS` before running tests (rust-lang/cargo#8168) - Fix warning for `resolve` mismatch in workspace. (rust-lang/cargo#8169) - Fix flaky linking_interrupted test. (rust-lang/cargo#8162) - Fixed some unnecessary borrows and clones. (rust-lang/cargo#8146) - Added warning when using restricted names in Windows. (rust-lang/cargo#8136) - Add changelog about dylib uplift. (rust-lang/cargo#8161) - Mention that cargo_metadata can parse json messages (rust-lang/cargo#8158) - Re-enable rustc-info-cache test again (rust-lang/cargo#8155) - Updates to path source walking. (rust-lang/cargo#8095) - Bump to 0.46.0, update changelog (rust-lang/cargo#8153)
The
linking_interrupted
test is a little flaky, and can randomly fail.On windows-gnu, the last call to
cargo test
was failing because it was hitting a file lock. Presumably this is because thekill
had not finished? Added a wait to ensure that cargo has exited.Also, sometimes on Windows, the rustc process gets killed. Not sure why that doesn't happen all the time, maybe related to the race mentioned above.
Finally, on macOS, I was noticing some errors where I think the rustc process had not yet exited, and deleted the
*.o
files while the next cargo build was running.