std::fs::remove_dir_all() doesn't delete all files on Windows 7 #99934
Comments
Hi, thanks for the report! I have submitted #99937 to address the specific problem in this issue. It's intended to be the most minimal change possible so that it can (hopefully) be quickly reviewed and merged.
@ChrisDenton Thank you. You (or whoever is in charge of …
While I do agree it should definitely be implemented more efficiently, I doubt reading the first batch twice has a measurable impact on performance. At that point the directory information is in the OS cache (because we only just loaded it), so it's just copying the memory from the OS to the application again. This is far cheaper than all the other operations we're doing (e.g. deleting files and loading potentially uncached directory information), and it's a one-time cost, so it's diluted by the sheer number of other operations. That said, reusing the buffer would be better from a code perspective, if nothing else.

You're right that increasing the buffer size to 64k would have a measurably positive impact on performance. However, it comes with a cost. Currently we open and store directory handles as soon as we find them, which means that increasing the buffer size potentially increases the number of open handles. For very deeply nested directory structures this risks exhausting the process handle limit, and the risk is further increased if the process is doing similarly deep deletion on multiple threads. The trade-off may be worth it, but that's a design decision that needs to be made. Or else a mitigation needs to be implemented so we switch strategies in the pathological case of extremely deeply nested directories.
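The handle-count trade-off described above can be illustrated with a toy model (an illustrative sketch only, not the actual std implementation): a traversal that reads entries in batches and eagerly opens a handle for every subdirectory in the current batch before recursing holds more handles at once as the batch size grows.

```rust
// Toy model of the trade-off described above (not the real std code):
// the traversal reads `batch`-sized chunks of entries per level and
// opens a handle for every subdirectory it sees before recursing, so a
// larger enumeration buffer means more simultaneously open handles.
fn walk(depth: usize, width: usize, batch: usize, open: &mut usize, peak: &mut usize) {
    if depth == 0 {
        return;
    }
    let mut remaining = width;
    while remaining > 0 {
        let n = remaining.min(batch);
        remaining -= n;
        *open += n; // handles opened eagerly for this batch
        *peak = (*peak).max(*open);
        for _ in 0..n {
            walk(depth - 1, width, batch, open, peak);
        }
        *open -= n; // this batch's handles closed after processing
    }
}

fn peak_handles(depth: usize, width: usize, batch: usize) -> usize {
    let (mut open, mut peak) = (0, 0);
    walk(depth, width, batch, &mut open, &mut peak);
    peak
}

fn main() {
    // Same tree (3 subdirectories per level, 4 levels deep), different
    // batch sizes: the peak handle count grows with the batch size.
    println!("batch=1 -> peak {}", peak_handles(4, 3, 1)); // 4
    println!("batch=3 -> peak {}", peak_handles(4, 3, 3)); // 12
}
```

In this model a batch of 1 holds one handle per level of depth, while a batch covering the whole directory holds width × depth handles, which is the risk the comment above is weighing against throughput.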
There is absolutely zero guarantee this will happen, especially if the OS is under load. You really can't rely on this kind of logic.
Yeah... this is another call for a re-design. In addition to everything said, also note that the current implementation leaves state in the underlying kernel object -- i.e. if directory enumeration stopped early, the current position will be remembered, and the next time the user uses this handle it may lead to surprises. Also, the design I proposed allows overlapping the "retrieve next batch" and "process previous batch" operations -- another avenue for performance, if the user needs it. @ChrisDenton, what is the reason for this whole "use posix semantics in delete"? Why would you want to do it here? And if there is a reason -- why is silently falling back to "windows semantics" OK?
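The "overlap retrieve-next-batch with process-previous-batch" idea can be sketched with a bounded channel. This is a simulation with illustrative names -- the producer here hands out prepared batches, whereas a real implementation would issue the next directory-enumeration call:

```rust
use std::sync::mpsc;
use std::thread;

// Sketch of the overlapping idea above: a producer thread fetches the
// next batch while the consumer processes the previous one. Batches are
// simulated; names are illustrative only.
fn pipelined_sum(batches: Vec<Vec<u64>>) -> u64 {
    // Capacity 1: at most one prefetched batch waits while we process.
    let (tx, rx) = mpsc::sync_channel::<Vec<u64>>(1);
    let producer = thread::spawn(move || {
        for batch in batches {
            if tx.send(batch).is_err() {
                break; // consumer hung up
            }
        }
    });
    let mut total = 0;
    for batch in rx {
        total += batch.iter().sum::<u64>(); // "process previous batch"
    }
    producer.join().unwrap();
    total
}

fn main() {
    let total = pipelined_sum(vec![vec![1, 2, 3], vec![4, 5], vec![6]]);
    println!("total = {total}"); // 21
}
```

The bounded channel is what gives the overlap: while the consumer works on one batch, the producer can already be filling the next one, without ever buffering more than one batch ahead.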
The reason is to actually successfully delete more often. A problem that plagues Windows 7, 8, and early versions of Windows 10 is that a virus checker or other software may hold an open handle to a file, preventing actual deletion until they close the handle. From the point of view of …
Yeah... the typical suggested approach on old Windows is "if it failed -- wait a bit and try again". Maybe it should be adopted in …
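The "wait a bit and try again" idiom mentioned above can be sketched as a generic retry helper. This is a sketch, not a std API; `op` stands in for any fallible operation, such as a delete that fails while an antivirus scanner still holds a handle:

```rust
use std::thread::sleep;
use std::time::Duration;

// Generic "if it failed -- wait a bit and try again" helper in the
// spirit of the suggestion above. Retries `op` up to `attempts` times,
// doubling the delay between tries. Illustrative only, not a std API.
fn retry_with_backoff<T, E>(
    mut op: impl FnMut() -> Result<T, E>,
    attempts: u32,
    mut delay: Duration,
) -> Result<T, E> {
    let mut last = op();
    for _ in 1..attempts {
        if last.is_ok() {
            break;
        }
        sleep(delay);
        delay *= 2; // exponential backoff between attempts
        last = op();
    }
    last
}

fn main() {
    let mut calls = 0;
    // Simulated flaky operation: fails twice, then succeeds.
    let result = retry_with_backoff(
        || {
            calls += 1;
            if calls < 3 { Err("file still in use") } else { Ok(()) }
        },
        5,
        Duration::from_millis(1),
    );
    assert!(result.is_ok());
    println!("succeeded after {calls} attempts"); // 3
}
```

The backoff matters on loaded systems: a fixed tiny delay can still lose the race against a scanner that holds the handle for tens of milliseconds.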
@ChrisDenton I noticed an anomaly: if I delete the directory in question and run cargo build again, it fails with exactly the same error (even though the directory doesn't exist anymore). Looks like some sort of caching mechanism is in play -- do you know anything about it?
On Windows 7 my incremental builds produce the following warnings:
I tracked this issue down to a bug in std::fs::remove_dir_all() (here). This function tries to "posix delete" each file and, if that fails, falls back to "windows delete". The problem is that it does not restart enumeration in the fallback -- thus the first batch of filenames ends up being skipped (and the subsequent "delete directory" fails because the dir is not empty). See this (note that false is always passed as the restart parameter). Enumeration state is stored within the underlying kernel object, not the handle -- duplicating the handle won't reset it. I suggest calling ReOpenFile() before the fallback, or better yet, reusing the DirBuf returned on the first iteration.

General notes:
- The enumeration buffer in std::fs is way too small (it holds about 5-10 files) -- enumerating a directory with 1 million files will take ages. I suggest 64kb (or better yet, make it a parameter).
- A Dir object that represents an "open" directory and allows access to other objects using relative paths. It should also have: …
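The skipped-first-batch bug described above can be reproduced with a toy model of a stateful enumerator (a simulation with illustrative names, not the actual NT API): the read cursor lives with the underlying object, so a fallback that re-reads without restarting begins where the first pass stopped.

```rust
// Toy model of the bug described above. The read cursor lives with the
// underlying "kernel object", so re-reading without a restart continues
// from where the first pass stopped. Names are illustrative only.
struct DirEnum {
    entries: Vec<&'static str>,
    pos: usize, // cursor; in the real API it survives handle duplication
}

impl DirEnum {
    fn read_batch(&mut self, restart: bool, batch: usize) -> Vec<&'static str> {
        if restart {
            self.pos = 0;
        }
        let end = (self.pos + batch).min(self.entries.len());
        let out = self.entries[self.pos..end].to_vec();
        self.pos = end;
        out
    }
}

// Simulate the fallback path: the "posix delete" pass has already
// consumed the first batch; the fallback then re-enumerates, either
// restarting first (the fix) or not (the bug).
fn fallback_sees(restart_first: bool) -> Vec<&'static str> {
    let mut e = DirEnum { entries: vec!["a", "b", "c", "d"], pos: 0 };
    let _first = e.read_batch(true, 2); // consumed by the posix-delete pass
    let mut seen = Vec::new();
    let mut restart = restart_first;
    loop {
        let b = e.read_batch(restart, 2);
        restart = false; // only the first fallback read may restart
        if b.is_empty() {
            break;
        }
        seen.extend(b);
    }
    seen
}

fn main() {
    println!("bug: {:?}", fallback_sees(false)); // ["c", "d"] -- "a", "b" skipped
    println!("fix: {:?}", fallback_sees(true)); // ["a", "b", "c", "d"]
}
```

With no restart, "a" and "b" are never revisited by the fallback, so the final "remove directory" fails with "directory not empty" -- exactly the symptom reported in this issue.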