Wait for the overlapped operation to be signaled as canceled #326

razvan-pricope · 2023-05-12T13:18:20Z

let the OS set the internal flag to STATUS_CANCELED). If the wil::unique_folder_change_reader_nothrow handle is destroyed before doing that, a random memory corruption or crash might occur.

… OS set the `internal` flag to STATUS_CANCELED). If the wil::unique_folder_change_reader_nothrow handle is destroyed before that, a random memory corruption or crash might occur.

jonwis · 2023-05-22T15:55:03Z

Neat, thanks - MSDN says

The cancel operation for all pending I/O operations issued by the calling process for the specified file handle was successfully requested. The application must not free or reuse the OVERLAPPED structure associated with the canceled I/O operations until they have completed. The thread can use the GetOverlappedResult function to determine when the I/O operations themselves have been completed.

And passing TRUE there means "wait for callbacks to complete."

Good catch!

One common problem with these callback types is if they are destroyed on the same thread as the callback. So like if the watcher is invoked on a thread, then as part of its operation it initiates teardown, that can cause these "block for completion" parts of destructors to hang. A common workaround is to track the thread ID in the "handle callback" and if the destroying thread is the same as the callback thread, don't block.

I haven't done enough overlapped IO to know if GetOverlappedResult has that problem.

include/wil/filesystem.h

razvan-pricope · 2023-05-23T08:23:11Z

One common problem with these callback types is if they are destroyed on the same thread as the callback. So like if the watcher is invoked on a thread, then as part of its operation it initiates teardown, that can cause these "block for completion" parts of destructors to hang. A common workaround is to track the thread ID in the "handle callback" and if the destroying thread is the same as the callback thread, don't block.

I haven't done enough overlapped IO to know if GetOverlappedResult has that problem.

I don't think that GetOverlappedResult is affected by this issue because you are not creating any threads when calling ReadDirectoryChangesW. You are esentially "stealing" whatever thread you want to wait for the operation completion and are responsible for the lifetime of the thread.
However, WaitForThreadpoolIoCallbacks might deadlock in the above scenario and there is a comment just before calling this function that describes the problem.

DHowett · 2023-11-01T17:11:36Z

This might seem wild, but I think we (Windows Terminal) are hitting deadlocks on shutdown after upgrading to a version of WIL that has this change.

The callstack points to...

03 00000055`5c4ff900 00007ffe`906cadc6     KERNELBASE!GetOverlappedResult+0xcf [minkernel\kernelbase\error.c @ 678] 
04 00000055`5c4ff940 00007ffe`906cb3a6     TerminalApp!wil::details::folder_change_reader_state::~folder_change_reader_state+0x8e [C:\__w\1\s\packages\Microsoft.Windows.ImplementationLibrary.1.0.230824.2\include\wil\filesystem.h @ 636] 
05 00000055`5c4ff980 00007ffe`906cab18     TerminalApp!wil::details::folder_change_reader_state::`scalar deleting destructor'+0xe

How are we holding it wrong? I'd love to hold it right :)

razvan-pricope · 2023-11-02T13:39:49Z

We are seeing the same behavior during our internal tests and I analyzed a memory dump. What I found is that GetLastError() is ERROR_NOT_FOUND. I don't know if GetOverlappedResults (or the WaitForSingleObject inside it) sets the last error value, but peeking through the assembly seems to suggest that it doesn't. So, most probably, CancelIoEx returned FALSE && GetLastError() == ERROR_NOT_FOUND
The documentation for CancelIoEx suggests that this error code can be triggered when there is no pending operation to be canceled.

Based on the above, I reproduced synthetically a scenario when the above hang can take place. There is a race condition where if you receive a monitor notification and you delete the directory before issuing a new call to StartIo, the ReadDirectoryChanges will return ACCESS_DENIED. In this case, no pending asynchronous operation is in progress, so GetOverlappedResult hangs.
If seems that the fix would be to check the return value of CancelIoEx and only in case of success to call GetOverlappedResult.
I will create a pull request with a fix.

Wait for the overlapped operation to be signaled as canceled (let the…

868e653

… OS set the `internal` flag to STATUS_CANCELED). If the wil::unique_folder_change_reader_nothrow handle is destroyed before that, a random memory corruption or crash might occur.

jonwis requested changes May 22, 2023

View reviewed changes

include/wil/filesystem.h Outdated Show resolved Hide resolved

Changed variable name to bytesTransferredIgnored

eebfd6d

jonwis approved these changes May 23, 2023

View reviewed changes

jonwis merged commit 56b2de5 into microsoft:master May 31, 2023

razvan-pricope mentioned this pull request Nov 2, 2023

Fix folder_change_reader_state destructor hang. #386

Merged

DHowett mentioned this pull request Dec 12, 2023

WindowsTerminal.exe leak (never exit) after the wtd.exe new-tab ... command line process exits. microsoft/terminal#16456

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wait for the overlapped operation to be signaled as canceled #326

Wait for the overlapped operation to be signaled as canceled #326

razvan-pricope commented May 12, 2023

jonwis commented May 22, 2023

razvan-pricope commented May 23, 2023

DHowett commented Nov 1, 2023

razvan-pricope commented Nov 2, 2023 •

edited

Loading

Wait for the overlapped operation to be signaled as canceled #326

Wait for the overlapped operation to be signaled as canceled #326

Conversation

razvan-pricope commented May 12, 2023

jonwis commented May 22, 2023

razvan-pricope commented May 23, 2023

DHowett commented Nov 1, 2023

razvan-pricope commented Nov 2, 2023 • edited Loading

razvan-pricope commented Nov 2, 2023 •

edited

Loading