Spurious Windows errors on CI #5481
Comments
This is a little worrisome! I think this means that something in the system has a handle open which we're not accounting for, and I agree that we need to track that down to see where that stray handle is going.
Just an update on my investigation. I wrote a little library that uses the Restart Manager API to detect which processes have handles on the executable. Unfortunately, I was unable to catch anything on my local system (perhaps it slows things down too much, or the problem is unrelated to open handles). However, on AppVeyor it very quickly caught some processes. Unfortunately they are the Service processes, and you can't tell which service had the open handle. Here's the process list that had the binary open just before attempting to unlink or rename it:
I don't feel like this is very helpful. It might just be a limitation on Windows that you can't unlink or move a binary immediately after executing it. We could maybe put some Windows-specific retry logic into these two tests that will retry a few times with a short delay. Otherwise I'm running out of ideas.
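For illustration, the retry idea could look roughly like the sketch below. This is not what Cargo's test suite does; `retry_io` is a hypothetical helper, and the choice to retry only on `PermissionDenied` (which is how "access denied" normally surfaces through `std::io`) is an assumption.

```rust
use std::io;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: run `op` until it succeeds, retrying "access denied"
/// failures up to `attempts` total tries with `delay` between them.
/// `op` always runs at least once, even if `attempts` is 0.
fn retry_io<T>(
    attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> io::Result<T>,
) -> io::Result<T> {
    let mut tries = 0;
    loop {
        tries += 1;
        match op() {
            Ok(value) => return Ok(value),
            // Assumption: only "access denied" is treated as transient.
            Err(err) if err.kind() == io::ErrorKind::PermissionDenied && tries < attempts => {
                thread::sleep(delay);
            }
            // Any other error, or running out of tries, is reported as-is.
            Err(err) => return Err(err),
        }
    }
}
```

A test would then wrap the flaky call, e.g. `retry_io(10, Duration::from_millis(100), || std::fs::remove_file(&exe))`, where `exe` is whatever path the test just executed. The attempt count and delay are placeholders, not tuned values.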
👻 spooky... I think historically I've ended up just trying to avoid this sort of situation in most tests, but if these tests fundamentally need to execute this pattern there's not much we can do, unfortunately :( It may be possible though to at least rewrite some tests to avoid this pattern (e.g. copy the executable somewhere else and execute the copy, rather than executing it where Cargo put it).
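A rough sketch of the "copy it somewhere else first" idea, using only the standard library. `copy_to_scratch` and the scratch directory are hypothetical names for illustration, not part of Cargo's test support code.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: copy `exe` into `scratch_dir` (created if missing)
/// and return the path of the copy. The test then executes the copy, so the
/// file Cargo produced is never run and can be renamed or removed freely.
fn copy_to_scratch(exe: &Path, scratch_dir: &Path) -> io::Result<PathBuf> {
    let name = exe.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "path has no file name")
    })?;
    fs::create_dir_all(scratch_dir)?;
    let copy = scratch_dir.join(name);
    // fs::copy also copies the permission bits of the original file.
    fs::copy(exe, &copy)?;
    Ok(copy)
}
```

The test would run `std::process::Command::new(&copy)` instead of the original path. Whether this avoids the problem on AppVeyor is untested here; it only keeps the executed file separate from the one being replaced.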
I've attempted to mitigate |
… r=matklad Fix random Windows CI error for changing_bin_features_caches_targets Fixes #5481.
changing_bin_features_caches_targets and rename_with_link_search_path fail frequently on AppVeyor.
I have been doing some investigation and I have narrowed down some small, reliable reproductions. In general, it appears that attempting to rename or unlink a binary immediately after executing it causes problems. It's as if there is a ghost entry left behind that causes further attempts to replace it with a new file, or to delete its parent directory, to fail.
I have a test that does this in a loop, and it fails after some number of iterations (tends to happen very fast on AppVeyor):
The link calls will sometimes fail with "access denied". I have even noticed that A.exists() is true immediately after the call to unlink!
It is unrelated to hard links; copying the file fails, too. It's also not specific to Rust, since I've been able to repro with Python.
I have Defender disabled, indexing disabled, and it's not related to mspdbsrv (although all 3 of those can make it substantially worse).
A workaround is to add a 1 second delay after executing a file before deleting it. However, I'm going to continue investigating to figure out why this happens.