Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd: rd /s for recursive directory removal fails intermittently and is asynchronous #309

Closed
mklement0 opened this issue Nov 20, 2018 · 2 comments
Labels
Area-Interop Communication between processes Issue-Bug It either shouldn't be doing this or needs an investigation. Product-Cmd.exe The issue is related to the legacy command interpreter, CMD.exe. Resolution-Won't-Fix We're just really obstinate about this. There's probably a good reason.

Comments

@mklement0
Copy link

mklement0 commented Nov 20, 2018

Note: .NET Core's System.IO.Directory.Delete() and PowerShell's Remove-Item are equally affected: see here and here.

The Windows API functions DeleteFile() and RemoveDirectory() functions are inherently asynchronous (emphasis added):

The DeleteFile function marks a file for deletion on close. Therefore, the file deletion does not occur until the last handle to the file is closed. Subsequent calls to CreateFile to open the file fail with ERROR_ACCESS_DENIED."

The RemoveDirectory function marks a directory for deletion on close. Therefore, the directory is not removed until the last handle to the directory is closed."

rd /s fails to account for this asynchronous behavior, which has two implications:

  • Problem (a): Trying to delete a nonempty directory (which invariably requires recursive deletion of its content first) can fail - infrequently, but it does happen, and the failure is not reflected in the exit code (although a message is issued to stderr).

  • Problem (b): Trying to recreate a successfully deleted directory or a file immediately afterwards may fail - intermittently (easier to provoke than the other problem, but a more exotic use case overall).

    • Note: You could argue that it's not rd's responsibility to create a synchronous experience in this case, given that the underlying system API is asynchronous; however, it certainly makes for more robust, predictable programs if synchronicity is ensured - as is the case on Unix-like platforms, where the system APIs for file removal are synchronous.

Problem (a) is due to using depth-first recursion without accounting for the asynchronous deletion behavior; the problem, along with the solution, is described in this YouTube video (starts at 7:35).

Important:

  • For convenience, Windows PowerShell is used to reproduce the problem below.
  • The tests create temp. dir. $HOME/tmpDir - remove it manually afterwards, if still present.

Steps to reproduce

Setup:

  • Open a PowerShell console.
  • Paste and submit the code with the two function definitions below at the prompt to define the functions used in the tests.

Problem (a):

Assert-ReliableDirRemoval cmd

Problem (b):

Assert-SyncDirRemoval cmd

Expected behavior

The functions should loop indefinitely (terminate them with Ctrl+C), emitting a . in each iteration.

Actual behavior

Eventually - and that there is no predictable time frame is indicative of the problem - an error will occur, on the order of minutes or even longer.

Problem (a):

Reliable dir-tree-removal test: Repeatedly creating and removing a directory subtree, using cmd for removal.
.................................tmpDir\sub - The directory is not empty.
Deletion failed quietly or with exit code 0, tmpDir still has content: sub

That is, recursive removal of the target dir's content failed due to async timing issues; while a stderr message was issued, the exit code did not reflect failure (was still 0), and both the target directory and remnants inside it lingered.

Problem (b):

Synchronous dir-removal test: Repeatedly creating and removing a directory, using cmd for removal.
...................New-Item : Access is denied
At line:25 char:15
+       $null = New-Item -Type Directory tmpDir
+               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : PermissionDenied: (C:\Users\jdoe\tmpDir:String) [New-Item], UnauthorizedAccessException
    + FullyQualifiedErrorId : ItemExistsUnauthorizedAccessError,Microsoft.PowerShell.Commands.NewItemCommand

That is, recreating the target dir failed, because the prior removal hadn't yet completed.

Aux. function definitions:

Functions Assert-ReliableDirRemoval and Assert-SyncDirRemoval

You can paste the entire code at the prompt in order to define these functions.

function Assert-ReliableDirRemoval {
  param([parameter(Mandatory)] [ValidateSet('cmd', '.net', 'ps')] [string] $method)
  Write-Host "Reliable dir-tree-removal test: Repeatedly creating and removing a directory subtree, using $method for removal."
  # Treat all errors as fatal.
  $ErrorActionPreference = 'Stop'
  Set-Location $HOME  # !! Seemingly, a dir. in the user's home dir. tree reproduces the symptom fastest - $env:TEMP does not.
  # Remove a preexisting directory first, if any.
  Remove-Item -EA Ignore -Recurse tmpDir
  While ($true) {
    Write-Host -NoNewline .
    # Create a subdir. tree.
    $null = New-Item -Type Directory tmpDir, tmpDir/sub, tmpDir/sub/sub, tmpDir/sub/sub/sub
    # Fill each subdir. with 1000 empty test files.
    "tmpDir", "tmpDir/sub", "tmpDir/sub/sub", "tmpDir/sub/sub/sub"| ForEach-Object {
      $dir = $_
      1..1e3 | ForEach-Object { $null > "$dir/$_" }
    }
    # Now remove the entire dir., which invariably involves deleting its contents
    # recursively first.
    switch ($method) {
      'ps'  { Remove-Item -Recurse tmpDir }
      '.net' { [System.IO.Directory]::Delete((Convert-Path tmpDir), $true) }
      'cmd'  { cmd /c rd /s /q tmpDir; if ($LASTEXITCODE) { Throw } }
      # !! If rd /s fails during recursive removal due to async issues, it emits a stderr line, but 
      # !! does NOT report a nonzero exit code. We detect this case below.
    }
    # Does the dir. unexpectedly still exist?
    # This can happen for two reasons:
    #  - The removal of the top-level directory itself has been requested, but isn't complete yet.
    #  - Removing the content of the top-level directory failed quietly.
    while (Test-Path tmpDir) {
      if ($output = Get-ChildItem -EA Ignore -Force -Name tmpDir) { # Still has content?
        Throw "Deletion failed quietly or with exit code 0, tmpDir still has content: $output"
      } else {
        # Top-level directory removal isn't complete yet.
        # Wait for removal to complete, so we can safely recreate the directory in the next iteration.
        # This loop should exit fairly quickly.
        Write-Host -NoNewline !; Start-Sleep -Milliseconds 100
      }      
    }    
  }
}

function Assert-SyncDirRemoval { 
  param([parameter(Mandatory)] [ValidateSet('cmd', '.net', 'ps')] [string] $method)
  Write-Host "Synchronous dir-removal test: Repeatedly creating and removing a directory, using $method for removal."
  # Treat all errors as fatal.
  $ErrorActionPreference = 'Stop'
  Set-Location $HOME  # !! Seemingly, a dir. in the user's home dir. tree reproduces the symptom fastest - $env:TEMP does not.
  While ($true) {
    Write-Host -NoNewline .
    # Remove the test dir., which invariably involves deleting its contents recursively first.
    # Note: This could itself fail intermittently, but with just 10 files and no subdirs. is unlikely to.
    if (Test-Path tmpDir) {
      switch ($method) {
        'ps' { Remove-Item -Recurse tmpDir }
        '.net' { [System.IO.Directory]::Delete((Convert-Path tmpDir), $true) }
         'cmd'  { cmd /c rd /s /q tmpDir; if ($LASTEXITCODE) { Throw } }
         # !! If rd /s fails during recursive removal due to async issues, it emits a stderr line, but 
         # !! does NOT report a nonzero exit code. We detect this case below.
      }
    }
    # (Re-)create the dir. with 10 empty test files.
    # If the previous removal hasn't fully completed yet, this will fail.
    # Note: [System.IO.Directory]::Delete() could have quietly failed to remove the contents of the dir.
    #       due to async timing accidents, but, again, this is unlikely with 1- files and 
    try {
      $null = New-Item -Type Directory tmpDir
    } catch {
      # Handle the case where removal failed
      # quietly due to async vagaries while removing the dir's *content*, which
      # can happen with [System.IO.Directory]::Delete().
      # Note that if removal succeeded but is pending - the scenario we're trying to
      # get to occur - Get-ChildItem reports an access-denied error, which we ignore
      # in favor of reporting the original error from the dir. re-creation attempt.
      if ($output = Get-ChildItem -EA Ignore -Force -Name tmpDir) { # Still has content?
        Write-Warning "Ignoring failed content removal, retrying..."
        # Simply try again
        Start-Sleep -Milliseconds 500; Remove-item tmpDir -Recurse
        continue
      }
      # Re-creation failed due to async removal.
      Throw $_
    }
    1..10 | ForEach-Object { $null > tmpDir/$_ }
  }
}

Environment data

cmd /c ver output:

Microsoft Windows [Version 10.0.17134.407]
@miniksa miniksa added the Product-Cmd.exe The issue is related to the legacy command interpreter, CMD.exe. label Jan 18, 2019
@ghost ghost added the Needs-Tag-Fix Doesn't match tag requirements label May 17, 2019
@miniksa miniksa added Area-Interop Communication between processes Issue-Bug It either shouldn't be doing this or needs an investigation. and removed Mass-Chaos labels May 17, 2019
@ghost ghost removed the Needs-Tag-Fix Doesn't match tag requirements label May 18, 2019
@WSLUser WSLUser mentioned this issue Jun 2, 2020
2 tasks
@mklement0
Copy link
Author

mklement0 commented Nov 6, 2020

Note:

  • The following update is based on running the Assert-ReliableDirRemoval .net and Assert-ReliableDirRemoval ps tests above successfully in Windows 10 version 20H2 and the information gleaned from this Stack Overflow post.

  • As of this writing, the documentation of the underlying DeleteFile Windows API makes no mention of the change in behavior.


Starting with Windows 10 version 1909, (at least) build 18363.657 (I don't know that Windows Server version and build that corresponds to; run winver.exe to check your version and build), the DeleteFile Windows API function now exhibits synchronous behavior at the Windows API level, which implicitly solves the problems with PowerShell's Remove-Item and .NET's System.IO.File.Delete / System.IO.Directory.Delete.

However, even on such Windows 10 version rd /s still fails intermittently (that is, Assert-ReliableDirRemoval cmd still fails after a while), suggesting that something other than DeleteFile is used (perhaps it is SetFileInformationByHandle, to which FILE_DISPOSITION_POSIX_SEMANTICS would have to be passed).

This means that rd /s now could be made synchronous on recent enough Windows 10 versions, but - unlike in PowerShell and .NET - that would require additional work.

@zadjii-msft
Copy link
Member

As mentioned in dotnet/runtime#27958 and PowerShell/PowerShell#8211

Starting with Windows 10 version 1909, (at least) build 18363.657 (I don't know that Windows Server version and build that corresponds to; run winver.exe to check your version and build), the DeleteFile Windows API function now exhibits synchronous behavior at the Windows API level, which implicitly solves the problems with PowerShell's Remove-Item and .NET's System.IO.File.Delete / System.IO.Directory.Delete (but, curiously, not with cmd.exe's rd /s).

Unfortunately, we won't be updating cmd.exe anytime soon. If you want a modern, updated shell on Windows, you should try out PowerShell 7.

Sorry to let this one linger on the backlog so long. It was filed long before we really got our github muscles flexed, and got a little lost.

@ghost ghost added the Needs-Tag-Fix Doesn't match tag requirements label Dec 7, 2021
@zadjii-msft zadjii-msft added the Resolution-Won't-Fix We're just really obstinate about this. There's probably a good reason. label Dec 7, 2021
@ghost ghost removed the Needs-Tag-Fix Doesn't match tag requirements label Dec 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Interop Communication between processes Issue-Bug It either shouldn't be doing this or needs an investigation. Product-Cmd.exe The issue is related to the legacy command interpreter, CMD.exe. Resolution-Won't-Fix We're just really obstinate about this. There's probably a good reason.
Projects
None yet
Development

No branches or pull requests

3 participants