Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flaky linking_interrupted test. #8162

Merged
merged 3 commits into from
Apr 27, 2020

Conversation

ehuss
Copy link
Contributor

@ehuss ehuss commented Apr 26, 2020

The linking_interrupted test is a little flaky, and can randomly fail.

On windows-gnu, the last call to cargo test was failing because it was hitting a file lock. Presumably this is because the kill had not finished? Added a wait to ensure that cargo has exited.

Also, sometimes on Windows, the rustc process gets killed. Not sure why that doesn't happen all the time, maybe related to the race mentioned above.

Finally, on macOS, I was noticing some errors where I think the rustc process had not yet exited, and deleted the *.o files while the next cargo build was running.

@rust-highfive
Copy link

r? @Eh2406

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 26, 2020
@ehuss
Copy link
Contributor Author

ehuss commented Apr 26, 2020

I ran a bunch of tests on azure, and locally, and it seems to be fairly stable. I'm a little unhappy with the call to sleep, but it is difficult to know when rustc has completely exited. I tried a rustc wrapper, but sometimes it was killed on Windows. I can put the rustc wrapper in, but I felt like the test was getting overly complicated, where a sleep solves the problem almost always.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find!

// killed. This write will tell them to exit, pretending that they died
// before finishing. Ignore the error, because (sometimes?) on Windows
// everything is already killed.
let _ = conn.write(b"X");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could just be drop(conn) to indicate to anyone blocked on it that the other side is gone?

// Note: rustc and the linker may still be running because we didn't kill
// the entire process group. Normally, when a user hits Ctrl-C, everything
// is killed. However, setting up process groups in a cross-platform test
// is a pain, and there's no easy way to know when everything has been
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW we've already got test code that does this, perhaps it wouldn't be too bad to refactor that so it could be used here too?

// file...t1-HASH.rcgu.o`. I think what is happening is that the old rustc
// process is still running, and deletes the `*.o` files while the command
// below is trying to write them. Not sure if that is macOS-specific.
std::thread::sleep(std::time::Duration::new(2, 0));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you're mentioning above sounds right, since child.kill() only kills Cargo itself we've still got rustc running, and rustc, realizing the linker failed, will eventually clean up after itself. If that races with the next rustc clobbering the same files that can cause issues.

I think though it'd be best (if not too hard) to use process groups here to ensure that we kill the entire process tree, including rustc.

@ehuss
Copy link
Contributor Author

ehuss commented Apr 27, 2020

The problem with the death tests that use job objects is that there's no way to know when the process actually exits. As you can see in the code with the long comment about Windows, it loops 10 times trying to delete the target directory to know when everything is done, which this test can't do (because it needs to preserve the target directory).

I'll try to put together something with a rustc wrapper (I have it written already, just wasn't working great on windows). It would be nice to use file locks, but I don't think that is particularly easy to do cross-platform.

@alexcrichton
Copy link
Member

That's true yeah, I'm just generally pretty loathe to add more and more sleeps into the test suite. I'd personally prefer to go relatively far out of our way to avoid adding sleeps and such, since sleeps are guaranteed to be flaky "eventually", just not as often.

ehuss added 2 commits April 27, 2020 12:02
According to https://docs.microsoft.com/en-us/windows/win32/procthread/nested-jobs
nested jobs have been supported since Windows 8. This code isn't
particularly safe with multiple test threads running at once because
it modifies the job object of the test itself.
@ehuss ehuss force-pushed the linking-interrupted-flaky branch from 5a8c926 to 01648b9 Compare April 27, 2020 19:23
@ehuss
Copy link
Contributor Author

ehuss commented Apr 27, 2020

Yea, I agree, I hate having sleeps as well, I was just being a little lazy.

I pushed an update that adds a wrapper around rustc so that the test can detect when it exits. I also switched to job objects/sessions so that everything gets killed to better approximate what actually happens. Did a bunch of testing on azure and locally, too.

@alexcrichton
Copy link
Member

@bors: r+

Thanks!

@bors
Copy link
Contributor

bors commented Apr 27, 2020

📌 Commit 01648b9 has been approved by alexcrichton

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 27, 2020
@bors
Copy link
Contributor

bors commented Apr 27, 2020

⌛ Testing commit 01648b9 with merge 7e1011d...

@bors
Copy link
Contributor

bors commented Apr 27, 2020

☀️ Test successful - checks-azure
Approved by: alexcrichton
Pushing 7e1011d to master...

@bors bors merged commit 7e1011d into rust-lang:master Apr 27, 2020
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 28, 2020
Update cargo

11 commits in 8751eb3010d4cdb5329b5a6bd2b6d765c95b0dca..90931d9b31e8b854522fed00916504a3ac6d8619
2020-04-21 18:04:35 +0000 to 2020-04-28 01:56:59 +0000
- Use associated constants directly on primitive types instead of modules (rust-lang/cargo#8077)
- Clear `RUSTDOCFLAGS` before running tests (rust-lang/cargo#8168)
- Fix warning for `resolve` mismatch in workspace. (rust-lang/cargo#8169)
- Fix flaky linking_interrupted test. (rust-lang/cargo#8162)
- Fixed some unnecessary borrows and clones. (rust-lang/cargo#8146)
- Added warning when using restricted names in Windows. (rust-lang/cargo#8136)
- Add changelog about dylib uplift. (rust-lang/cargo#8161)
- Mention that cargo_metadata can parse json messages (rust-lang/cargo#8158)
- Re-enable rustc-info-cache test again (rust-lang/cargo#8155)
- Updates to path source walking. (rust-lang/cargo#8095)
- Bump to 0.46.0, update changelog (rust-lang/cargo#8153)
@ehuss ehuss added this to the 1.45.0 milestone Feb 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants