std::fs::remove_dir_all() doesn't delete all files on Windows 7 #99934


Closed
crusader-mike opened this issue Jul 30, 2022 · 7 comments · Fixed by #99937
Labels
C-bug Category: This is a bug.

Comments

@crusader-mike

crusader-mike commented Jul 30, 2022

On Windows 7 my incremental builds produce the following warning:

warning: Failed to garbage collect finalized incremental compilation session directory `\\?\D:\Work\<skip>\target\debug\incre
mental\<skip>-h0ku3qc8pde\s-gc3wk9dfjy-13pdd9o-11auxo9w4xmty`: The directory is not empty. (os error 145)

I tracked this issue down to a bug in std::fs::remove_dir_all() (here). This function tries a "POSIX delete" on each file and, if that fails, falls back to a "Windows delete". The problem is that the fallback does not restart the enumeration -- so the first batch of filenames ends up being skipped (and the subsequent "delete directory" fails because the dir is not empty). See this (note that false is always passed as the restart parameter). Enumeration state is stored within the underlying kernel object, not the handle -- duplicating the handle won't reset it.
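The skipped-batch behaviour can be modeled in plain Rust. This is a toy model, not the real Windows API: `KernelDirObject`, `DirHandle`, and `query` are illustrative names standing in for the kernel's directory object, a duplicated HANDLE, and NtQueryDirectoryFile's RestartScan parameter.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Toy model: the enumeration cursor lives in the shared "kernel object",
// not in the handle -- mirroring how directory enumeration state behaves.
struct KernelDirObject {
    entries: Vec<&'static str>,
    cursor: usize, // enumeration position, shared by all handles
}

#[derive(Clone)]
struct DirHandle(Rc<RefCell<KernelDirObject>>);

impl DirHandle {
    fn open(entries: Vec<&'static str>) -> Self {
        DirHandle(Rc::new(RefCell::new(KernelDirObject { entries, cursor: 0 })))
    }
    // Duplicating a handle shares the same kernel object -- and its cursor.
    fn duplicate(&self) -> Self {
        self.clone()
    }
    // Returns the next batch of up to `batch` entries; `restart` rewinds first.
    fn query(&self, batch: usize, restart: bool) -> Vec<&'static str> {
        let mut obj = self.0.borrow_mut();
        if restart {
            obj.cursor = 0;
        }
        let start = obj.cursor;
        let end = (start + batch).min(obj.entries.len());
        obj.cursor = end;
        obj.entries[start..end].to_vec()
    }
}

fn main() {
    let dir = DirHandle::open(vec!["a", "b", "c", "d"]);
    let first = dir.query(2, false);     // POSIX-delete pass reads ["a", "b"]
    let dup = dir.duplicate();           // fallback duplicates the handle...
    let fallback = dup.query(10, false); // ...but restart=false resumes at "c"
    assert_eq!(first, vec!["a", "b"]);
    assert_eq!(fallback, vec!["c", "d"]); // "a" and "b" are never revisited
    // Passing restart=true would rewind and see everything:
    assert_eq!(dup.query(10, true), vec!["a", "b", "c", "d"]);
}
```

The model shows why the duplicated handle doesn't help: both handles advance the same cursor, so only an explicit restart (or keeping the first batch around) recovers the skipped entries.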

I suggest calling ReOpenFile() before the fallback -- or better yet, reusing the DirBuf returned on the first iteration.

$ cargo --version
cargo 1.64.0-nightly (8827baaa7 2022-07-14)

General notes:

  • The 1 KiB DirBuf used by std::fs is way too small (it holds only about 5-10 entries) -- enumerating a directory with 1 mil files will take ages. I suggest 64 KiB (or better yet -- make it a parameter)
  • If you need an object model that "fits" both the POSIX and Win32 FS APIs, I suggest something like this:
    • have a Dir object that represents an "open" directory and allows access to other objects using relative paths. It should also have:
      fn enumerate(&self, local_path, buf_sz) -> DirEnum;  <-- will call ReOpenFile() (or its equivalent from the NT API)
      fn enumerate_self(self, buf_sz) -> DirEnum;   <-- note this consumes self (no need for ReOpenFile, just move the underlying HANDLE)
    
    • have a DirEnum object that represents an open directory with an active enumeration going on; it has the same interface plus:
      fn done(&self) -> bool;
      fn next_batch(&self, restart: bool) -> DirEntryBatch;  <-- should use syscall(SYS_getdents) on Linux; the returned batch can be empty (all entries marked as deleted) without done() being true
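The sketched Dir/DirEnum split can be mocked up with a toy in-memory backend. Everything here is illustrative (the names and the `Vec<String>` store are mine, not from std); a real implementation would wrap OS handles, and I've made `next_batch` take `&mut self` instead of `&self` for idiomatic Rust.

```rust
// Toy in-memory sketch of the proposed Dir/DirEnum object model.
struct Dir {
    entries: Vec<String>,
}

struct DirEnum {
    entries: Vec<String>,
    pos: usize,
    batch: usize,
}

type DirEntryBatch = Vec<String>;

impl Dir {
    // Mirrors `fn enumerate(&self, ...)`: starts a fresh enumeration while
    // leaving `self` usable (a real version would ReOpenFile here).
    fn enumerate(&self, batch: usize) -> DirEnum {
        DirEnum { entries: self.entries.clone(), pos: 0, batch }
    }
    // Mirrors `fn enumerate_self(self, ...)`: consumes the Dir, so the
    // underlying handle could simply be moved instead of reopened.
    fn enumerate_self(self, batch: usize) -> DirEnum {
        DirEnum { entries: self.entries, pos: 0, batch }
    }
}

impl DirEnum {
    fn done(&self) -> bool {
        self.pos >= self.entries.len()
    }
    // `restart` rewinds explicitly -- enumeration state is under the
    // caller's control rather than hidden in a kernel object.
    fn next_batch(&mut self, restart: bool) -> DirEntryBatch {
        if restart {
            self.pos = 0;
        }
        let end = (self.pos + self.batch).min(self.entries.len());
        let out = self.entries[self.pos..end].to_vec();
        self.pos = end;
        out
    }
}

fn main() {
    let dir = Dir { entries: vec!["a".into(), "b".into(), "c".into()] };
    let mut en = dir.enumerate(2);
    assert_eq!(en.next_batch(false), vec!["a".to_string(), "b".to_string()]);
    assert_eq!(en.next_batch(false), vec!["c".to_string()]);
    assert!(en.done());
}
```

The key design point is that restarting is an explicit, cheap operation on the enumeration object, so a fallback path can never accidentally skip entries.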
    
@ChrisDenton
Member

ChrisDenton commented Jul 30, 2022

Hi, thanks for the report! I have submitted #99937 to address the specific problem in this issue. It's intended to be the most minimal change possible so that it can (hopefully) be quickly reviewed and merged.

@crusader-mike
Author

@ChrisDenton Thank you. You (or whoever is in charge of std::fs) should consider the notes in my first post. The current design is inefficient -- it unnecessarily penalises Windows 7 (or any Win32 filesystem that doesn't support POSIX semantics) by reading the first batch twice, and it is suboptimal in other ways (as described).

@ChrisDenton
Member

While I do agree it should definitely be implemented more efficiently, I would doubt reading the first batch twice has a measurable impact on performance. At that point the directory information will be in the OS cache (because we only just loaded it) so it's just copying the memory from the OS to the application again. This is far cheaper than all the other operations we're doing (e.g. deleting files and loading potentially uncached directory information) and it's only a one-time cost, so it's diluted by the sheer number of other operations. That said, reusing it would be better from a code perspective, if nothing else.

You're right that increasing the buffer size to 64k would have a measurably positive impact on performance. However, it comes with a cost. Currently we open and store directory handles as soon as we find them. This means that increasing the buffer size potentially increases the number of open handles. For very very deeply nested directory structures this risks exhausting the process handle limit. This risk is further increased if the process is doing similarly deep deletion on multiple threads. The trade-off may be worth it but that's a design decision that needs to be made. Or else a mitigation needs to be implemented so we switch strategies in the pathological case of extremely deeply nested directories.

@crusader-mike
Author

At that point the directory information will be in the OS cache

There is absolutely zero guarantee that will happen, especially if the OS is under load. You really can't rely on this kind of logic.

Currently we open and store directory handles as soon as we find them. This means that increasing the buffer size potentially increases the number of open handles.

Yeah... this is another argument for a redesign.

In addition to everything said above, note that the current implementation leaves state in the underlying kernel object -- i.e. if directory enumeration stops early, the current position is remembered, and the next time the user touches this handle it may lead to surprises.

Also, the design I proposed allows overlapping the "retrieve next batch" and "process previous batch" operations -- another avenue for performance, if the user needs it.

@ChrisDenton, what is the reason for this whole "use POSIX semantics in delete" approach? Why would you want it here? And if there is a reason -- why is silently falling back to "Windows semantics" OK?

@ChrisDenton
Member

The reason is to actually succeed in deleting more often. A problem that plagues Windows 7, 8 and early versions of Windows 10 is that a virus checker or other software may hold an open handle to a file, preventing actual deletion until that handle is closed. From the point of view of remove_dir_all this means deleting the file was "successful", but when it comes time to delete the parent directory, the operation fails because the file is still there. POSIX delete avoids this problem by immediately removing the file from the directory without waiting for all handles to be closed.

@crusader-mike
Author

Yeah... the typical suggested approach on old Windows is "if it failed -- wait a bit and try again". Maybe it should be adopted in the std::fs fallback execution path...
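The "wait a bit and try again" idea can be sketched as a small, portable retry helper. This is a hypothetical sketch, not what std::fs does: the function name `retry_delete`, the attempt count, and the backoff delays are all illustrative choices.

```rust
use std::io;
use std::thread;
use std::time::Duration;

// Retries `op` up to `attempts` times with a growing delay -- the classic
// workaround for transient sharing violations on older Windows.
// The initial delay and doubling factor are arbitrary illustrative values.
fn retry_delete<F>(mut op: F, attempts: u32) -> io::Result<()>
where
    F: FnMut() -> io::Result<()>,
{
    let mut delay = Duration::from_millis(10);
    let mut last_err = None;
    for _ in 0..attempts {
        match op() {
            Ok(()) => return Ok(()),
            Err(e) => {
                last_err = Some(e);
                thread::sleep(delay);
                delay *= 2; // simple exponential backoff
            }
        }
    }
    Err(last_err.unwrap_or_else(|| io::Error::new(io::ErrorKind::Other, "no attempts made")))
}

fn main() {
    // Simulate a delete that fails twice (e.g. a scanner holds the handle)
    // before succeeding on the third try.
    let mut failures = 2;
    let result = retry_delete(
        || {
            if failures > 0 {
                failures -= 1;
                Err(io::Error::new(io::ErrorKind::Other, "sharing violation"))
            } else {
                Ok(())
            }
        },
        5,
    );
    assert!(result.is_ok());
}
```

In a real fallback path the closure would wrap the actual delete call, and the total retry budget would need tuning so that a genuinely stuck file still fails in reasonable time.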

@bors bors closed this as completed in 656cd3a Jul 30, 2022
@crusader-mike
Author

@ChrisDenton I noticed an anomaly: if I delete the directory in question and run cargo build again, it fails with exactly the same error (even though the directory doesn't exist anymore). Looks like some sort of caching mechanism is in play -- do you know anything about it?
