Fix flaky linking_interrupted test. #8162

ehuss · 2020-04-26T18:23:57Z

The linking_interrupted test is a little flaky, and can randomly fail.

On windows-gnu, the last call to cargo test was failing because it was hitting a file lock. Presumably this is because the kill had not finished? Added a wait to ensure that cargo has exited.

Also, sometimes on Windows, the rustc process gets killed. Not sure why that doesn't happen all the time, maybe related to the race mentioned above.

Finally, on macOS, I was noticing some errors where I think the rustc process had not yet exited, and deleted the *.o files while the next cargo build was running.

rust-highfive · 2020-04-26T18:23:59Z

r? @Eh2406

(rust_highfive has picked a reviewer for you, use r? to override)

ehuss · 2020-04-26T18:25:56Z

I ran a bunch of tests on azure, and locally, and it seems to be fairly stable. I'm a little unhappy with the call to sleep, but it is difficult to know when rustc has completely exited. I tried a rustc wrapper, but sometimes it was killed on Windows. I can put the rustc wrapper in, but I felt like the test was getting overly complicated, where a sleep solves the problem almost always.

alexcrichton

Nice find!

alexcrichton · 2020-04-27T14:47:01Z

tests/testsuite/freshness.rs

+    // killed. This write will tell them to exit, pretending that they died
+    // before finishing. Ignore the error, because (sometimes?) on Windows
+    // everything is already killed.
+    let _ = conn.write(b"X");


I think this could just be drop(conn) to indicate to anyone blocked on it that the other side is gone?

alexcrichton · 2020-04-27T14:49:33Z

tests/testsuite/freshness.rs

+    // Note: rustc and the linker may still be running because we didn't kill
+    // the entire process group. Normally, when a user hits Ctrl-C, everything
+    // is killed. However, setting up process groups in a cross-platform test
+    // is a pain, and there's no easy way to know when everything has been


FWIW we've already got test code that does this, perhaps it wouldn't be too bad to refactor that so it could be used here too?

alexcrichton · 2020-04-27T14:50:52Z

tests/testsuite/freshness.rs

+    // file...t1-HASH.rcgu.o`. I think what is happening is that the old rustc
+    // process is still running, and deletes the `*.o` files while the command
+    // below is trying to write them. Not sure if that is macOS-specific.
+    std::thread::sleep(std::time::Duration::new(2, 0));


I think what you're mentioning above sounds right, since child.kill() only kills Cargo itself we've still got rustc running, and rustc, realizing the linker failed, will eventually clean up after itself. If that races with the next rustc clobbering the same files that can cause issues.

I think though it'd be best (if not too hard) to use process groups here to ensure that we kill the entire process tree, including rustc.

ehuss · 2020-04-27T15:03:31Z

The problem with the death tests that use job objects is that there's no way to know when the process actually exits. As you can see in the code with the long comment about Windows, it loops 10 times trying to delete the target directory to know when everything is done, which this test can't do (because it needs to preserve the target directory).

I'll try to put together something with a rustc wrapper (I have it written already, just wasn't working great on windows). It would be nice to use file locks, but I don't think that is particularly easy to do cross-platform.

alexcrichton · 2020-04-27T15:08:33Z

That's true yeah, I'm just generally pretty loathe to add more and more sleeps into the test suite. I'd personally prefer to go relatively far out of our way to avoid adding sleeps and such, since sleeps are guaranteed to be flaky "eventually", just not as often.

According to https://docs.microsoft.com/en-us/windows/win32/procthread/nested-jobs nested jobs have been supported since Windows 8. This code isn't particularly safe with multiple test threads running at once because it modifies the job object of the test itself.

ehuss · 2020-04-27T19:27:10Z

Yea, I agree, I hate having sleeps as well, I was just being a little lazy.

I pushed an update that adds a wrapper around rustc so that the test can detect when it exits. I also switched to job objects/sessions so that everything gets killed to better approximate what actually happens. Did a bunch of testing on azure and locally, too.

alexcrichton · 2020-04-27T19:58:17Z

@bors: r+

Thanks!

bors · 2020-04-27T19:58:19Z

📌 Commit 01648b9 has been approved by alexcrichton

bors · 2020-04-27T19:58:26Z

⌛ Testing commit 01648b9 with merge 7e1011d...

bors · 2020-04-27T20:27:04Z

☀️ Test successful - checks-azure
Approved by: alexcrichton
Pushing 7e1011d to master...

Update cargo 11 commits in 8751eb3010d4cdb5329b5a6bd2b6d765c95b0dca..90931d9b31e8b854522fed00916504a3ac6d8619 2020-04-21 18:04:35 +0000 to 2020-04-28 01:56:59 +0000 - Use associated constants directly on primitive types instead of modules (rust-lang/cargo#8077) - Clear `RUSTDOCFLAGS` before running tests (rust-lang/cargo#8168) - Fix warning for `resolve` mismatch in workspace. (rust-lang/cargo#8169) - Fix flaky linking_interrupted test. (rust-lang/cargo#8162) - Fixed some unnecessary borrows and clones. (rust-lang/cargo#8146) - Added warning when using restricted names in Windows. (rust-lang/cargo#8136) - Add changelog about dylib uplift. (rust-lang/cargo#8161) - Mention that cargo_metadata can parse json messages (rust-lang/cargo#8158) - Re-enable rustc-info-cache test again (rust-lang/cargo#8155) - Updates to path source walking. (rust-lang/cargo#8095) - Bump to 0.46.0, update changelog (rust-lang/cargo#8153)

Fix flaky linking_interrupted test.

aca8bae

rust-highfive assigned Eh2406 Apr 26, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 26, 2020

ehuss mentioned this pull request Apr 26, 2020

Added warning when using restricted names in Windows. #8136

Merged

alexcrichton reviewed Apr 27, 2020

View reviewed changes

ehuss added 2 commits April 27, 2020 12:02

Add a wrapper to wait for rustc to exit.

01648b9

ehuss force-pushed the linking-interrupted-flaky branch from 5a8c926 to 01648b9 Compare April 27, 2020 19:23

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 27, 2020

bors merged commit 7e1011d into rust-lang:master Apr 27, 2020

ehuss mentioned this pull request Apr 28, 2020

Update cargo rust-lang/rust#71642

Merged

ehuss added this to the 1.45.0 milestone Feb 6, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix flaky linking_interrupted test. #8162

Fix flaky linking_interrupted test. #8162

ehuss commented Apr 26, 2020

rust-highfive commented Apr 26, 2020

ehuss commented Apr 26, 2020

alexcrichton left a comment

alexcrichton Apr 27, 2020

alexcrichton Apr 27, 2020

alexcrichton Apr 27, 2020

ehuss commented Apr 27, 2020

alexcrichton commented Apr 27, 2020

ehuss commented Apr 27, 2020

alexcrichton commented Apr 27, 2020

bors commented Apr 27, 2020

bors commented Apr 27, 2020

bors commented Apr 27, 2020

Fix flaky linking_interrupted test. #8162

Fix flaky linking_interrupted test. #8162

Conversation

ehuss commented Apr 26, 2020

rust-highfive commented Apr 26, 2020

ehuss commented Apr 26, 2020

alexcrichton left a comment

Choose a reason for hiding this comment

alexcrichton Apr 27, 2020

Choose a reason for hiding this comment

alexcrichton Apr 27, 2020

Choose a reason for hiding this comment

alexcrichton Apr 27, 2020

Choose a reason for hiding this comment

ehuss commented Apr 27, 2020

alexcrichton commented Apr 27, 2020

ehuss commented Apr 27, 2020

alexcrichton commented Apr 27, 2020

bors commented Apr 27, 2020

bors commented Apr 27, 2020

bors commented Apr 27, 2020