Skip to content
This repository has been archived by the owner on May 4, 2018. It is now read-only.

win: cause uv_read_stop to immediately stop reading the stream #1377

Closed
wants to merge 5 commits into from

Conversation

vtjnash
Copy link
Contributor

@vtjnash vtjnash commented Jul 21, 2014

this implements locking around the blocking call to ReadFile to get
around a Windows kernel bug where a blocking ReadFile operation on a
stream can deadlock the thread. this allows uv_read_stop to immediately
cancel a pending IO operation, and allows uv_pipe_getsockname to
"pause" any pending read (from libuv) while it retrieves the
sockname information

if unsupported by the OS (pre-Vista), this reverts to the old
(e.g. deadlock-prone) behavior

ref. issue #1313

@saghul
Copy link
Contributor

saghul commented Jul 26, 2014

@vtjnash thanks for this. I'll review it as soon as I can. Others are also welcome to do so :-)

@indutny
Copy link
Contributor

indutny commented Jul 28, 2014

@vtjnash is there any way to add a regression test for this? It'd simplify understanding of the problem...

@vtjnash
Copy link
Contributor Author

vtjnash commented Jul 28, 2014

Here's the start of a simple test:

uv_loop_t* loop = uv_default_loop();
uv_pipe_t read, write;
uv_pipe_init(loop, &read, UV_PIPE_READABLE | UV_PIPE_SPAWN_SAFE); /* synchronous readable pipe end */
uv_pipe_init(loop, &write, UV_PIPE_WRITABLE | UV_PIPE_SPAWN_SAFE); /* synchronous writable pipe end */
uv_pipe_link(&read, &write); /* libuv equivalent of posix function `pipe` */
uv_read_start(&read, alloc_fn, read_fn);
Sleep(1000); /* wait for uv_read_start to start it's ReadFile thread */
//optional: uv_read_stop(read); /* necessary if some other process will be using this pipe */
uv_pipe_getsockname(&read); /* Windows kernel hangs here without patch */

Note: I'm using libuv fixes introduced in pending pull-request #451, so that I don't need to reimplement them when creating the read & write ends of the pipe. (it would be nice if someone could comment on that one too -- it's been over two years since we proposed that fix)

@saghul
Copy link
Contributor

saghul commented Jul 28, 2014

Note: I'm using libuv fixes introduced in pending pull-request #451, so that I don't need to reimplement them when creating the read & write ends of the pipe. (it would be nice if someone could comment on that one too -- it's been over two years since we proposed that fix)

I don't think we can have #451 merged before v0.12, alas. I know it has been quite some time, somehow nobody got around reviewing it, sorry :-/

Is there a way to test this with the current APIs? Note that since this is a Windows only test we could play with internals a bit if necessary ;-)

@vtjnash
Copy link
Contributor Author

vtjnash commented Jul 28, 2014

Is there a way to test this with the current APIs? Note that since this is a Windows only test we could play with internals a bit if necessary ;-)

Sure, you just need to call CreatePipe and then turn that into a uv_pipe_t* pair. I just don't know off-hand the exact incantation for doing that.

I don't think we can have #451 merged before v0.12, alas. I know it has been quite some time, somehow nobody got around reviewing it, sorry :-/

No, I wouldn't have expected that anyways. I would just like to see it making progress (non-stylistic comments) towards gettings it merged.

this implements locking around the blocking call to ReadFile to get
around a Windows kernel bug where a blocking ReadFile operation on a
stream can deadlock the thread. this allows uv_read_stop to immediately
cancel a pending IO operation, and allows uv_pipe_getsockname to
"pause" any pending read (from libuv) while it retrieves the
sockname information

if unsupported by the OS (pre-Vista), this reverts to the old
(e.g. deadlock-prone) behavior

ref. issue #1313
@vtjnash
Copy link
Contributor Author

vtjnash commented Aug 1, 2014

Is this expected to merge for v0.12?

I added a test. The expectation is that this test will hang the process without this patch or Windows == XP (because of the call to uv_read_start before uv_pipe_getsockname), and that it will succeed with the patch (on Windows >= Vista).

@saghul
Copy link
Contributor

saghul commented Aug 1, 2014

@vtjnash Thanks for adding the test! I'll take a look today or during the weekend, but we should be able to land it for 0.12, I think.

@saghul
Copy link
Contributor

saghul commented Aug 1, 2014

Quick comment: the test is a Windows only test, needs some ifdefs.

@saghul saghul added this to the v0.12 milestone Aug 1, 2014
@saghul
Copy link
Contributor

saghul commented Aug 4, 2014

I didn't get much time over the weekend, sorry. Will review tonight.

@@ -443,7 +443,8 @@ RB_HEAD(uv_timer_tree_s, uv_timer_s);
int queue_len; \
} pending_ipc_info; \
uv_write_t* non_overlapped_writes_tail; \
void* reserved;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please leave this one as the last field in the structure, IIRC we reserved it for fixing sending handles over threads during 0.12.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would have been good to have a comment to that effect when it was named "reserved"

@saghul
Copy link
Contributor

saghul commented Aug 4, 2014

I'd like to see some changes, in order to have this better integrated and avoid a lot of checking for the mutex:

  • uv__pipe_lock_read: check the handle flag which is set if we support canceling and it's a non-verlapped pipe, and get the mutex, do nothing otherwise.
  • uv__pipe_unlock_read: reverse of the above.
  • uv__pipe_pause_read: do what uv_pipe_readfile_pause does right now (if the handle flag is set).

Thanks! And sorry it took me so long to review this.

@@ -181,6 +182,8 @@ int uv_pipe_write(uv_loop_t* loop, uv_write_t* req, uv_pipe_t* handle,
int uv_pipe_write2(uv_loop_t* loop, uv_write_t* req, uv_pipe_t* handle,
const uv_buf_t bufs[], unsigned int nbufs, uv_stream_t* send_handle,
uv_write_cb cb);
void uv__pipe_unlock_read(const uv_pipe_t* handle);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are internal, you can get rid of the const to vaoid casting in several places down below.

@@ -1846,7 +1934,8 @@ int uv_pipe_getsockname(const uv_pipe_t* handle, char* buf, size_t* len) {
name_info = malloc(name_size);
if (!name_info) {
*len = 0;
return UV_ENOMEM;
err = UV_ENOMEM;
goto error1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need, free(NULL) is guaranteed to be fine.

@saghul
Copy link
Contributor

saghul commented Aug 5, 2014

Had another round at it. This is just a couple of things away from getting merged: take out the cancling logic from uv__pipe_read_lock, and some style fixes.

@vtjnash
Copy link
Contributor Author

vtjnash commented Aug 5, 2014

It isn't canceling the read request, it is obtaining a lock on the kernel file lock via temporarily stopping the ReadFile call. When unlock is called, the ReadFile call is resumed (unless it detects that the READING flag is no longer set, then it just returns quietly)

@saghul
Copy link
Contributor

saghul commented Aug 5, 2014

I see. Maybe we can find a better name: uv__pipe_pause_read, or uv__pipe_interrupt_read ?

@vtjnash
Copy link
Contributor Author

vtjnash commented Aug 6, 2014

that's your call. I've considered also adding the following function, just so the effect of these three consecutive operations is more clear:

void uv__pipe_stop_read(uv_pipe_t* handle) {
  handle->flags &= ~UV_HANDLE_READING;
  uv__pipe_lock_read((uv_pipe_t*)handle);
  uv__pipe_unlock_read((uv_pipe_t*)handle);
}

@saghul
Copy link
Contributor

saghul commented Aug 6, 2014

that's your call. I've considered also adding the following function, just so the effect of these three consecutive operations is more clear:

Looks good 👍 Lets call the other one uv__pipe_pause_read then.

@vtjnash
Copy link
Contributor Author

vtjnash commented Aug 6, 2014

would uv__pipe_unlock_read be called uv__pipe_unpause_read then?

@saghul
Copy link
Contributor

saghul commented Aug 6, 2014

My initial idea (based on your code) was to have 3 functions: #1377 (comment)

@piscisaureus
Copy link

I object to this patch.

The CancelSynchronousIo operation is not thread safe.
It is possible that CancelSynchronousIo is called after ReadFile has already returned, causing another unrelated operation to be canceled. Or even something internal to windows.

@saghul
Copy link
Contributor

saghul commented Aug 6, 2014

I object to this patch.

The CancelSynchronousIo operation is not thread safe.
It is possible that CancelSynchronousIo is called after ReadFile has already returned, causing another unrelated operation to be canceled. Or even something internal to windows.

Ouch. Windows internals are not my thing, do you have any suggestion on how this can be addressed?

@vtjnash
Copy link
Contributor Author

vtjnash commented Aug 6, 2014

the function containing ReadFile is not permitted to return until it has reacquired the mutex and cleared the volatile HANDLE readfile_thread variable

@vtjnash
Copy link
Contributor Author

vtjnash commented Aug 6, 2014

@piscisaureus to elaborate, the situation you describe is expected to be handled by the mutex object, such that it is not possible for an unrelated IO operations to be canceled. that is why this pull request is somewhat more complicated than simply calling CancelSynchronousIo(hThread).

there are 4 checkpoints in the code, to ensure that we never call CancelSynchronousIo except when we know exactly the state of the ReadFile thread (and conversely, that we are always certain of the exact state of the ReadFile thread, with respect to all of the checkpoints):

  1. on entry to the ReadFile function: acquires the mutex, sets the hThread variable, then releases the mutex (ensures synchronous setting of the hThread variable and that operation is not paused)
  2. on exit from the ReadFile function: clears the hThread variable, then acquires the mutex, deletes hThread, then releases the mutex (ensures that no other thread is still using the hThread variable -- pointer size memory access is atomic http://msdn.microsoft.com/en-us/library/windows/desktop/ms684122(v=vs.85).aspx, saving an extra mutex here)
  3. on entry to the code needing a CancelSynchronousIo function call: acquires the mutex (ensures that no other thread is actively setting the hThread variable) and waits until the hThread variable is cleared (ensures that the ReadFile has been confirmed canceled. note that if hThread was set, we are holding the mutex, so ReadFile cannot exit until after we release the mutex.)
  4. on exit from the code that needed a CancelSynchronousIo function call: releases the mutex (allows the ReadFile function to acquire the mutex, and exit or restart finally)

@saghul
Copy link
Contributor

saghul commented Aug 7, 2014

@piscisaureus ping

@piscisaureus
Copy link

@vtjnash I see now, sorry for not reading your code thourougly enough.

I will not veto this patch or anything - it's really up to @saghul and @indutny. But I will make the case that this patch is not the way to go forward.

  • The patch adds a lot of complexity.
  • The patch also adds a lot of synchronization and syscalls. There's two mutex lock/unlock pairs per ReadFile operation added, as well as two syscalls (DuplicateHandle and CloseHandle). This affects everyone, even projects that won't ever care about retrieving the pipe name.
  • The only thing this patch enables is to retrieve the pipe name for (blocking) pipes. It's not a terribly useful function. Besides, most of the time blocking pipes are created by a CreatePipe call done by the parent process, and they won't have a name at all.
  • A far easier solution would be to just retrieve the pipe name when it is opened. I can't change anyway so this would be completely safe. This costs some memory (a few bytes, a malloc call) but it saves a ton of complexity.
  • In general, I think that the code in uv-win is too complicated, especially the IPC channel and the emulate-iocp mode event notification mode makes it messy. In the long term I would like to see the pipe implementation cleaned up and simplified. This patch is a move in the opposite direction.

@saghul
Copy link
Contributor

saghul commented Aug 7, 2014

Thanks for your feedback, @piscisaureus. I think @vtjnash proposed that in the beginning and I rejected it, because sometimes I'm dumb like that :-S I'm really sorry to have wasted your time @vtjnash.

In the end, caching the pipe name seems to be a good solution. We could store a pointer which we malloc when the pipe is bound/created, only if we are emulating IOCP. Then in uv_pipe_getsockname we would return that value for emulated-IOCP pipes and do the real thing for normal ones.

@piscisaureus
Copy link

I'm really sorry to have wasted your time @vtjnash

When code that has been written ends up being thrown away, it usually hurts a little.

However I would not say the time was wasted; we learned that a solution that uses CancelSynchronousIo can be implemented but it requires some serious hoop-jumping.
You should be proud of yourself and realize that you can now make the best decision in an informed way, because you actually tried the other thing.

Most of libuv code has been (re)written many times; the git commit log speaks of it, and in fact if you look in my private node fork you'll find even more history. This is normal and healthy. Throwing away questionable code, making things simpler, re-evaluating approaches all help make things better in the future. You learn more and more about the problem space while avoiding carrying your mistakes forward forever.

I'm glad to see this happen. Forward!

@piscisaureus
Copy link

Thanks, @vtjnash, for the time you invested in making libuv better.

@vtjnash
Copy link
Contributor Author

vtjnash commented Aug 7, 2014

Thanks for your feedback, @piscisaureus. I think @vtjnash proposed that in the beginning and I rejected it, because sometimes I'm dumb like that :-S I'm really sorry to have wasted your time @vtjnash.

You didn't reject it, I did, after testing that approach and finding that it only addresses one cause of the deadlock. If a pipe is only used internal to libuv, this patch would not be needed, since we could ensure that once we have started a read on the pipe we aren't interested in reading any of the properties of the pipe anymore (via any call to NtQueryInformationFile). But for pipes that are internal to libuv, the pipe would be opened for async operation, and none of this new code would be used.

However, without this patch, spawning child processes with a stdin handle inherited from cygwin hangs if it tries to inspect the stdin handle (which was when I realized that having uv_read_stop to actually stop the read operation was the only viable solution).

The patch also adds a lot of synchronization and syscalls. There's two mutex lock/unlock pairs per ReadFile operation added, as well as two syscalls (DuplicateHandle and CloseHandle). This affects everyone, even projects that won't ever care about retrieving the pipe name.

This bug affects everyone, even projects that don't care about retrieving the pipe name, since NtQueryInformationFile is used frequently (in libuv and in child processes) for other tasks. Furthermore, this mutex is only used when the pipe is opened in synchronous access mode. In this mode, uv_read already requires 2 thread switches (possibly starting a new thread) and a completion event post (and an extra loop through uv_run?). Adding two mutex synchronization events and syscalls seemed like a pretty minor amount of additional overhead.

The patch adds a lot of complexity.

I really wish that Window's didn't force this complexity on us, but I can't find a way around it (e.g. a select function or a way to call ReadFile that doesn't hold the kernel lock). It's not even an entirely complete solution, since it depends on the user calling uv_pipe_stop before passing this pipe anywhere, and it imposes additional arbitrary limits on having a pipe read end in use in multiple processes (although thats not a great idea anyways, since it's essentially a race condition).

But without an alternative solution proposed, I can't accept that doing nothing is better than having a (complex) solution.

@piscisaureus
Copy link

However, without this patch, spawning child processes with a stdin handle inherited from cygwin hangs if it tries to inspect the stdin handle

But this patch doesn't fix that, as it can only cancel ReadFile operations that were started by libuv - not those that are started by other processes. If we would retrieve the information we need when opening the pipe, that would solve the same problem.

@vtjnash
Copy link
Contributor Author

vtjnash commented Aug 7, 2014

But this patch doesn't fix that, as it can only cancel ReadFile operations that were started by libuv - not those that are started by other processes. If we would retrieve the information we need when opening the pipe, that would solve the same problem.

true, but we can't retrieve that information as soon as the pipe is opened (because we may not have opened the pipe, it might just have been handed to us), and that's only part of the concern. while we can't control what other processes do, we can make libuv behave better. like you said, we can't control the actions of other processes – instead it requires diligence from every process (including libuv) to try to avoid this situation.

@saghul
Copy link
Contributor

saghul commented Aug 9, 2014

@vtjnash I happened to meet Bert a couple of days ago and we discussed this issue. While it ultimately doesn't solve all problems, it does fix some, and even if the code looks quite complex, it doesn't penalize the usual execution path, since those functions are basically noops in that case.

So, I'm landing it. Thanks a lot for your patches!

@saghul
Copy link
Contributor

saghul commented Aug 9, 2014

Landed in ✨ 837c62c ✨ Thanks @vtjnash and @piscisaureus!

@saghul saghul closed this Aug 9, 2014
@vtjnash
Copy link
Contributor Author

vtjnash commented Aug 9, 2014

Thank you

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants