Spurious Windows errors on CI #5481

Closed
ehuss opened this issue May 4, 2018 · 4 comments · Fixed by #5493

Comments

@ehuss
Contributor

ehuss commented May 4, 2018

changing_bin_features_caches_targets and rename_with_link_search_path fail frequently on AppVeyor.

I have been investigating and have narrowed this down to some small, reliable reproductions. In general, it appears that attempting to rename or unlink a binary immediately after executing it causes problems. It's as if a ghost entry is left behind that causes further attempts to replace it with a new file, or to delete its parent directory, to fail.

I have a test that does this in a loop, and it fails after some number of iterations (tends to happen very fast on AppVeyor):

  1. Link B to A
  2. Run A
  3. Delete A
  4. Link C to A
  5. Run A
  6. Delete A

The link calls will sometimes fail with "access denied". I have even noticed that A.exists() is true immediately after the call to unlink!

It is unrelated to hard links; copying the file fails, too. It's also not specific to Rust, since I've been able to reproduce it with Python.
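The loop described above can be sketched roughly as follows. This is only an illustration in Python, not the actual test; the function name `repro_loop` and its parameters are hypothetical, and `b` and `c` are assumed to be two existing executables on the same volume as `a`.

```python
import os
import subprocess
from pathlib import Path
from typing import Optional


def repro_loop(b: Path, c: Path, a: Path, iterations: int = 100) -> Optional[str]:
    """Return a description of the first failure, or None if every round passed.

    Hypothetical helper: b and c are existing executables, a is the scratch
    path that is repeatedly linked, run, and deleted.
    """
    for i in range(iterations):
        for source in (b, c):
            try:
                os.link(source, a)                     # steps 1 and 4: link B (or C) to A
                subprocess.run([str(a)], check=False)  # steps 2 and 5: run A
                os.unlink(a)                           # steps 3 and 6: delete A
            except OSError as err:
                # "Access denied" surfaces here as a PermissionError (an OSError).
                return f"iteration {i}, source {source.name}: {err}"
            if a.exists():
                # The surprising case: A is still visible right after unlink.
                return f"iteration {i}: {a.name} still exists after unlink"
    return None
```

On an affected machine one would expect a call such as `repro_loop(Path("b.exe"), Path("c.exe"), Path("a.exe"))` (file names made up here) to return a failure description after some number of iterations. On a system without the problem it returns `None`.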

I have Defender disabled, indexing disabled, and it's not related to mspdbsrv (although all 3 of those can make it substantially worse).

A workaround is to add a 1 second delay after executing a file before deleting it. However, I'm going to continue investigating to figure out why this happens.

@alexcrichton
Member

This is a little worrisome! I think this means that something in the system has a handle open which we're not accounting for, and I agree that we need to track that down to see where that stray handle is going.

@ehuss
Contributor Author

ehuss commented May 5, 2018

Just an update on my investigation. I wrote a little library that uses the Restart Manager API to detect which processes have handles on the executable. Unfortunately, I was unable to catch anything on my local system (perhaps it slows things down too much, or the problem is unrelated to open handles). On AppVeyor, however, it very quickly caught some processes. Unfortunately, they are service processes, and you can't tell which of the services in a process had the open handle.
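For readers unfamiliar with the Restart Manager, a query of this kind might look roughly like the sketch below. It is not the library mentioned above; it is a Python/`ctypes` illustration that assumes the documented `RmStartSession`, `RmRegisterResources`, `RmGetList`, and `RmEndSession` functions in `rstrtmgr.dll`. The function name `processes_using` is hypothetical, and it only works on Windows.

```python
import ctypes
import sys
from ctypes import wintypes

ERROR_MORE_DATA = 234
CCH_RM_SESSION_KEY = 32
CCH_RM_MAX_APP_NAME = 255
CCH_RM_MAX_SVC_NAME = 63


class RM_UNIQUE_PROCESS(ctypes.Structure):
    _fields_ = [("dwProcessId", wintypes.DWORD),
                ("ProcessStartTime", wintypes.FILETIME)]


class RM_PROCESS_INFO(ctypes.Structure):
    _fields_ = [("Process", RM_UNIQUE_PROCESS),
                ("strAppName", wintypes.WCHAR * (CCH_RM_MAX_APP_NAME + 1)),
                ("strServiceShortName", wintypes.WCHAR * (CCH_RM_MAX_SVC_NAME + 1)),
                ("ApplicationType", ctypes.c_int),
                ("AppStatus", wintypes.ULONG),
                ("TSSessionId", wintypes.DWORD),
                ("bRestartable", wintypes.BOOL)]


def processes_using(path):
    """Return [(pid, app_name, service_short_name)] for processes holding `path`."""
    if sys.platform != "win32":
        raise OSError("the Restart Manager API is only available on Windows")
    rm = ctypes.WinDLL("rstrtmgr")
    session = wintypes.DWORD()
    key = ctypes.create_unicode_buffer(CCH_RM_SESSION_KEY + 1)
    err = rm.RmStartSession(ctypes.byref(session), 0, key)
    if err:
        raise ctypes.WinError(err)
    try:
        files = (wintypes.LPCWSTR * 1)(path)
        err = rm.RmRegisterResources(session, 1, files, 0, None, 0, None)
        if err:
            raise ctypes.WinError(err)
        needed = wintypes.UINT(0)
        count = wintypes.UINT(0)
        reasons = wintypes.DWORD(0)
        infos = None
        while True:
            # The first call passes no buffer, just to learn how many entries exist.
            err = rm.RmGetList(session, ctypes.byref(needed), ctypes.byref(count),
                               infos, ctypes.byref(reasons))
            if err != ERROR_MORE_DATA:
                break
            count = wintypes.UINT(needed.value)
            infos = (RM_PROCESS_INFO * needed.value)()
        if err:
            raise ctypes.WinError(err)
        return [(infos[i].Process.dwProcessId,
                 infos[i].strAppName,
                 infos[i].strServiceShortName)
                for i in range(count.value)]
    finally:
        rm.RmEndSession(session)
```

Calling `processes_using(r"C:\path\to\binary.exe")` (a made-up path) just before the unlink or rename would give the PIDs and names of whatever the Restart Manager reports as using the file.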

Here's the process list that had the binary open just before attempting to unlink or rename it:

PID 748:
    "Application Experience" (AeLookupSvc)
    "Background Intelligent Transfer Service" (BITS)
    "Certificate Propagation" (CertPropSvc)
    "Group Policy Client" (gpsvc)
    "IKE and AuthIP IPsec Keying Modules" (IKEEXT)
    "IP Helper" (iphlpsvc)
    "Server" (LanmanServer)
    "User Profile Service" (ProfSvc)
    "Task Scheduler" (Schedule)
    "System Event Notification Service" (SENS)
    "Remote Desktop Configuration" (SessionEnv)
    "Shell Hardware Detection" (ShellHWDetection)
    "Themes" (Themes)
    "Windows Management Instrumentation" (Winmgmt)
    "Microsoft Account Sign-in Assistant" (wlidsvc)
PID 1068:
    "User Access Logging Service" (UALSVC)
PID 4:
    "System"

I don't feel like this is very helpful. It might just be a limitation of Windows that you can't unlink or move a binary immediately after executing it. We could maybe put some Windows-specific retry logic into these two tests that retries a few times with a short delay. Otherwise, I'm running out of ideas.
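A minimal sketch of what such retry logic could look like, written in Python for illustration rather than as the code Cargo's tests would use. The name `retry_on_windows`, the default of 5 attempts, and the 0.1 second delay are all assumptions. The `is_windows` parameter exists only so the behaviour can be exercised on other platforms.

```python
import sys
import time


def retry_on_windows(operation, attempts=5, delay=0.1,
                     is_windows=sys.platform == "win32"):
    """Call operation(); on Windows, retry PermissionError a few times.

    Hypothetical helper: "access denied" from unlink/rename/link shows up in
    Python as PermissionError, so that is the only error retried here.
    """
    if not is_windows:
        return operation()  # other platforms: no retry, fail immediately
    attempts = max(1, attempts)
    for attempt in range(attempts):
        try:
            return operation()
        except PermissionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the real error
            time.sleep(delay)  # give the stray handle a moment to go away
```

It would wrap only the fragile step, for example `retry_on_windows(lambda: os.unlink(path))`, so that a persistent failure still raises once the attempts are exhausted.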

@alexcrichton
Member

👻

spooky...

I think historically I've ended up just trying to avoid this sort of situation in most tests, but if these tests fundamentally need to execute this pattern there's not much we can do, unfortunately :(

It may be possible, though, to at least rewrite some tests to avoid this pattern (e.g. copy the executable somewhere else and execute it there, rather than executing it where Cargo put it).
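The copy-then-execute idea could be sketched like this. It is a Python illustration only; `run_copy` is a hypothetical helper, not something in Cargo's test suite. The point is that the file that gets executed is a throwaway copy, so the original can be renamed or deleted right away.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_copy(exe, args=()):
    """Copy exe into a fresh temporary directory, run the copy, return the result.

    Hypothetical helper: the original file is never executed, so nothing
    should be holding it open when the caller renames or deletes it.
    """
    scratch = Path(tempfile.mkdtemp(prefix="run-copy-"))
    try:
        copy = scratch / Path(exe).name
        shutil.copy2(exe, copy)  # copy2 keeps the permission bits
        return subprocess.run([str(copy), *args], check=False)
    finally:
        # Cleanup may hit the same problem on the copy, so ignore errors here.
        shutil.rmtree(scratch, ignore_errors=True)
```

The trade-off is that the scratch directory may occasionally be left behind on an affected machine, which is usually more tolerable in a test than a spurious failure.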

@alexcrichton
Member

I've attempted to mitigate rename_with_link_search_path specifically in #5488
