Implement cancelable WaitForSingleObject for Windows #575
Conversation
BTW: about that question on stackoverflow: I added tests to verify that it stops waiting when the handle is closed. I am not sure if closing sets the handle, it can also be that
Codecov Report

@@            Coverage Diff             @@
##           master     #575      +/-  ##
==========================================
+ Coverage   99.27%   99.31%   +0.04%
==========================================
  Files          89       91       +2
  Lines       10628    10823     +195
  Branches      747      753       +6
==========================================
+ Hits        10551    10749     +198
+ Misses         59       56       -3
  Partials       18       18

Continue to review full report at Codecov.
The "polling loop with short sleep" trick can be a really useful hack, but unfortunately doesn't play nicely with modern computers. The problem is that each time we poll, we have to wake up the CPU, usually to do nothing. Imagine a process that has 10 tasks blocked in
Linking to the more specific open issue: #233
Thanks for the explanation. In that case I will do the thread-thing.
This is looking good! There's a bunch of somewhat-fiddly comments below, but the basic structure is solid.
trio/_core/_io_windows.py

handle_arr = ffi.new("HANDLE[{}]".format(n))
for i in range(n):
    handle_arr[i] = handles[i]
timeout = 1000 * 60 * 60 * 24  # todo: use INF here, whatever that is, and ditch the while
Windows defines the special value INFINITE to be 0xFFFFFFFF. We should probably add this as a constant in _windows_cffi.py, next to the error codes.
trio/_core/_io_windows.py

def release_on_behalf_of(self, x):
    pass

async def acquire_on_behalf_of(self, x):
Since nothing here depends on access to the internals of the WindowsIOManager, I think we can move it out into some place like trio/_wait_for_single_object.py.
Done. This also gets rid of that circular import.
trio/_core/_io_windows.py

    n, handle_arr, False, timeout
)
if retcode != ErrorCodes.WAIT_TIMEOUT:
    break
The docs say that WaitForMultipleObjects can return WAIT_FAILED, and that in this case we should check GetLastError. I can't think of why this would ever fail, but probably best to check anyway to be safe. Fortunately _windows_cffi.py already has the machinery you need; just do something like:

if retcode == WAIT_FAILED:
    _windows_cffi.raise_winerror()
trio/_core/tests/test_windows.py

@@ -1,13 +1,16 @@
import os
from threading import Thread
I've learned the hard way that it's usually a bad idea to use threading.Thread directly in tests – it can do things like swallow exceptions :-). If you need threads, then Trio's primitives are actually easier to use and more reliable, so ... that's what I use now.
This tests a synchronous function, so afaik I cannot use a Trio primitive here, right? (I am not that familiar with Trio yet though :)).
I now changed the test, so that the main thread waits for the handle, and a thread is spawned only to set the handle after a short timeout (there is thus very little that can go wrong in that thread).
You can use an async test function to test a synchronous function :-).
async def test_...():
    async with trio.open_nursery() as nursery:
        # This runs 'sync_fn' in a background thread
        nursery.start_soon(trio.run_sync_in_worker_thread, sync_fn)
        # ... we can do other stuff here while it's running ...
    # When we de-dent to close the nursery, it automatically joins the background thread
So this automatically makes sure that you join the background thread (it can't accidentally keep running while other tests go, even if this test crashes), it automatically detects exceptions in the background thread and makes sure that if they happen it causes the test to fail, and it's actually simpler to use (IMO) than the raw threading.Thread API :-).
Of course. I don't understand why I did not think to use run_sync_in_worker_thread 🤔
Hey, don't feel bad, this concurrency/IO stuff is complicated. I didn't think of it either when I was writing the trio.ssl testsuite, even though I wrote run_sync_in_worker_thread like the month before... so you get to learn from my fail :-)
trio/_core/tests/test_windows.py

t.start()
kernel32.SetEvent(handle1)
t.join()  # the test succeeds if we do not block here :)
kernel32.CloseHandle(handle1)
I don't think this test needs a thread at all – there's no particular reason to think that the thread is actually running before we call SetEvent, so you can capture the important part of the test by doing:

handle1 = kernel32.CreateEventA(ffi.NULL, True, False, ffi.NULL)
kernel32.SetEvent(handle1)
WaitForMultipleObjects_sync(handle1)
kernel32.CloseHandle(handle1)
"there's no particular reason to think that the thread is actually running before we call SetEvent"

Oh dear, I overlooked that :)

Signaling before waiting is one option (and I added it for the fast tests), but I think it would be good to also test actually "acquiring the wait" and then setting the handle to "release the wait". Put that in a @slow test.
Makes sense :-)
trio/_core/tests/test_windows.py

kernel32.SetEvent(handle2)
t.join()  # the test succeeds if we do not block here :)
kernel32.CloseHandle(handle1)
kernel32.CloseHandle(handle2)
Same comment applies.
trio/_core/tests/test_windows.py

pass

# Test 3, cancelation
spelling: should be cancellation
(In American spelling, there's some inconsistency about whether you use one or two l's in words like "cancelled". But in trio we always use two, because that's acceptable everywhere, and it's easier to remember than trying to keep track of which words have two and which have one.)
👍
trio/_core/tests/test_windows.py

t1 = _core.current_time()
assert (t1 - t0) < 0.5 * TIMEOUT
print('test_WaitForSingleObject test 5 OK')
Does this one actually require a timeout? I would have thought that WaitForSingleObject would fail immediately, before even starting a thread...
Not really, was mostly there as a fail-safe. Removed it because it makes the test much shorter.
trio/_core/tests/test_windows.py

# Set the timeout used in the tests. The resolution of WaitForSingleObject
# is 0.01 so anything more than a magnitude larger should probably do.
# If too large, the test become slow and we might need to mark it as @slow.
TIMEOUT = 0.5
The comment about the resolution is out of date, right?

If this is a test that has to sleep – and I think it probably is – then we're probably going to mark it @slow :-). I try to keep the quick test run you do over and over while hacking as fast as possible, so right now the slowest single test is <0.3s. We always run the full test suite during CI runs though, so all the @slow tests still get run on every commit.

Maybe we could have one test that's not marked @slow and checks the trivial cases of (1) WaitForSingleObject on an already-ready object, (2) WaitForSingleObject that gets preemptively cancelled, and then a second test that is marked @slow and has the tests for (3) we wait a bit and then signal the object, (4) we wait a bit and then cancel the call?
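To make the fast variant concrete, here is a portable sketch of case (1). threading.Event stands in for a Windows event handle (an assumption for illustration; the real test would use kernel32.CreateEventA and the trio wrapper):

```python
import threading

# Case (1): waiting on an already-signaled object returns immediately,
# so the test needs no sleeps and can stay out of the @slow group.
event = threading.Event()  # stand-in for a kernel32 event handle
event.set()                # signal it *before* waiting
assert event.wait(timeout=0) is True  # returns at once, no blocking
```

Cases (3) and (4) genuinely have to sleep, which is exactly why they belong in the @slow test.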
Yes, that comment was outdated.
trio/_core/tests/test_windows.py

kernel32.CloseHandle(handle)
t1 = _core.current_time()
assert (t1 - t0) < 0.5 * TIMEOUT
print('test_WaitForSingleObject test 4 OK')
This is already signaled, right, not already cancelled?
yes
@njsmith I addressed your comments, and the tests are now split into slow and fast tests.
Getting closer! Left a few more comments below. Also, some bookkeeping:

- We need to export our new function, so people can use it! This is low-level, so it should be part of the trio.hazmat namespace. Unfortunately this is a little more complicated than it should be, because previous to this trio.hazmat only re-exported stuff from trio._core, so the code will need a bit of tweaking. Feel free to ask here or in gitter if you're not sure what to do.
- We want to tell people about this feature in our release notes! We do that by making a file in the newsfragments/ directory, named <issue number>.feature.rst. Here we have an issue – Provide an API that does WaitForSingleObject on Windows #233 – so it'd be 233.feature.rst. (If we don't have an issue, we use the PR number instead.) The contents of this file get rolled into our release notes. See newsfragments/README.rst for more details.
- We need to document the new feature so people can find it! You'll want to add some docs to the docs/source/reference-hazmat.rst file, in the "Windows-specific API" section.
trio/_wait_for_single_object.py

# Quick check; we might not even need to spawn a thread. The zero
# means a zero timeout; this call never blocks. We also exit here
# if the handle is already closed for some reason.
retcode = kernel32.WaitForSingleObject(handle, 0)
This should have an if retcode == ErrorCodes.WAIT_FAILED: raise_winerror() branch too, I think.
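The quick-check logic with that extra branch might look like this sketch. The kernel32 call is simulated by an injected function so the shape is testable anywhere; the constant values are the documented Win32 ones:

```python
# Documented Win32 wait return codes
WAIT_OBJECT_0 = 0x000
WAIT_TIMEOUT = 0x102
WAIT_FAILED = 0xFFFFFFFF

def already_signaled(wait_for_single_object, handle):
    # Zero timeout means the call never blocks: it just reports the
    # current state of the handle.
    retcode = wait_for_single_object(handle, 0)
    if retcode == WAIT_FAILED:
        # Invalid or stale handles surface as an error here instead of
        # being silently treated as "already done".
        raise OSError("WaitForSingleObject failed")
    return retcode != WAIT_TIMEOUT

# Simulated kernel32 behaviors:
assert already_signaled(lambda h, t: WAIT_OBJECT_0, "h") is True
assert already_signaled(lambda h, t: WAIT_TIMEOUT, "h") is False
```

With the branch in place, a bogus handle fails loudly on the fast path before any thread is spawned.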
Good point. I initially thought that just returning on a closed handle would be practical, but if someone somehow passes a wrong/invalid handle, they'd want to know.
Yeah... I was also confused about this until I thought about it more, and remembered that there isn't really such a thing as a "closed handle" – once you close a handle, the next new handle you open has a chance of being assigned the same handle id. So doing anything with a handle after you close it is really error prone – you might end up doing it on some other random object without realizing it! Better to get an error early in testing and fix the bug.
trio/_wait_for_single_object.py

for i in range(n):
    handle_arr[i] = handles[i]
timeout = 0xffffffff  # INFINITE
kernel32.WaitForMultipleObjects(n, handle_arr, False, timeout)  # blocking
And this should check for WAIT_FAILED too.
handle2 = kernel32.CreateEventA(ffi.NULL, True, False, ffi.NULL)
kernel32.CloseHandle(handle2)
WaitForMultipleObjects_sync(handle1, handle2)
kernel32.CloseHandle(handle1)
I expect that if we do check for WAIT_FAILED, then these will start raising errors, which is good.
WaitForMultipleObjects_sync(handle1)
t1 = _core.current_time()
assert TIMEOUT <= (t1 - t0) < 1.2 * TIMEOUT
kernel32.CloseHandle(handle1)
So... this is weird. If you look at the Appveyor results, the sync_slow test failed on a bunch of trials, though the other tests passed, including the other _slow test below. And the reason they failed is that t1 - t0 ended up being like 0.297, which is less than 0.3. Which seems like it should be impossible.

I think what's going on is that on Windows, time.sleep actually uses a slightly different clock than trio does. The Windows clock APIs are really complicated, but trio currently uses time.monotonic, which is not quite the same as other clocks. If you use trio.sleep, that's always fully in sync with trio.current_time, so the assertion should pass – and that's what we see in the other slow test below. But if you use time.sleep, then apparently there can be a bit of skew.

So, I think you should rewrite this test to use await trio.sleep, like the other one below. Fortunately this fits in nicely with my other comment about not using the raw threading API :-).

Also, I'd switch all the timeout checks to use < 2 * TIMEOUT, based on what we've ended up doing in test_run.py and test_ki.py. (We used to use smaller values and then would get false failures when some CI server was running slow, and eventually ended up bumping the values up to that.)
Time's weird.

Sorry, I don't understand how I can use trio.sleep() inside a synchronous function (running in a thread). The WaitForMultipleObjects_sync() is synchronous, so in my understanding we cannot use async to set the handle here, right?
(I think I fixed it (yet unpushed) by checking time.monotonic. Am happy to change to make it more trio-ish, though not sure how that would work)
Right, you can't use trio.sleep() inside a sync function. But you can move trio.sleep() into the main thread, and put the WaitForMultipleObjects_sync into a thread :-).
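A portable sketch of that shape: the blocking wait runs in a worker thread while the main flow sleeps and then signals. threading.Event stands in for the Windows handle and time.sleep for trio.sleep (both assumptions for illustration; the real test would use a trio nursery with run_sync_in_worker_thread):

```python
import threading
import time

def wait_sync(event):
    # stand-in for WaitForMultipleObjects_sync: blocks until signaled
    event.wait()

event = threading.Event()
t0 = time.monotonic()
waiter = threading.Thread(target=wait_sync, args=(event,))
waiter.start()     # the blocking wait runs in the thread...
time.sleep(0.05)   # ...while the main flow sleeps (trio.sleep's role)
event.set()        # then signals, releasing the waiter
waiter.join()
assert time.monotonic() - t0 >= 0.04  # the wait lasted roughly the sleep
```

Because both the sleep and the elapsed-time check happen in the main flow, they use the same clock, which is the point of the fix.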
Right :)
trio/_wait_for_single_object.py

pass


async def WaitForSingleObject(handle):
Okay, now that this is going to be a public API, we should think about what kind of object people will pass in as this handle object. Internally we use our ffi object's HANDLE objects, but the ffi object isn't part of our public API, so we can't expect users to do that. We should also let users pass in plain old integers, and then cast them to our HANDLE objects internally to use them.

The _io_windows.py has a _handle helper function that does this. We should move that helper into _windows_cffi.py (since it's part of our general "accessing the Windows API" code, not part of WindowsIOManager specifically), and then in this function add something like handle = _handle(handle).
Done
Just wondering if this might be a hint that it should be in
That's a great question! I thought about that too. But it turns out these are slightly different distinctions.
Ok, I've made the requested changes. A few points:

- Just to be sure, these don't count as deep internals?
- I added module docstrings to
- I renamed the new module to
- This new feature is Windows only. I've added this to the function docstring. What to do wrt exposing? I now wrote it such that the function is only in
- Related, in
- I understand the reason for the trio/_core/hazmat namespaces. But the loops that the code has to jump through is not the prettiest part of Trio :) This is outside of the scope of this PR, but has it been considered to do this differently? For instance with a script that generates
Nope! The raw Win32 bindings aren't connected to Trio's internals at all. In a perfect world they'd be part of the stdlib (similar to how the
Mostly I just don't use them myself and forget to write them :-). I guess another reason is that my manuals tend to have a lot more narrative text around each module than I would want to put in a module docstring, and keeping module docstrings up to date is a hassle. But I don't have any objection to adding docstrings to public module that are like, a few sentences long and give a brief overview of what the module's purpose is.
There are arguments either way, but for now Trio follows the stdlib's lead here and makes it so that system-specific functions only exist where they're supported. It's kind of convenient because you can do
They are all platform specific. The
It's, umm, like you said, not the prettiest part of Trio. They're defined in
Yeah, this dates back to the very beginning of trio, and has turned out to be pretty overengineered. We really should simplify it, which is #542.
Thanks for your explanations and patience, Nathaniel! This PR looks good from my end.
OK! A few more comments below, but at this point we've got all the high-level stuff dialed in, and have addressed all the medium-level issues, so now it's just final nitpicks and polish. Thanks again for your patience :-).
await WaitForSingleObject("not a handle")  # Wrong type
# with pytest.raises(OSError):
#     await WaitForSingleObject(99)  # If you're unlucky, it actually IS a handle :(
print('test_WaitForSingleObject not a handle OK')
It looks like this was the only test that checked passing an integer to WaitForSingleObject. Since "pass an integer" is the official way for external users to call this, we should have a test that does that :-).

To avoid the "unlucky" issue, how about: create a handle with CreateEvent, set it, and then extract the integer value from the cffi HANDLE object, and test that when we pass this integer value into WaitForSingleObject it returns immediately without error.

To convert a HANDLE to a Python integer, I think the incantation is: int(ffi.cast("intptr_t", handle_object))

(This will also make codecov stop complaining, since the "convert integer → HANDLE" code in _handle is the thing that's not covered currently.)
docs/source/reference-hazmat.rst

@@ -268,6 +268,8 @@ anything real. See `#26

.. function:: monitor_completion_key()
   :with: queue

.. function:: WaitForSingleObject()
   :async:
Unfortunately, we can't use sphinx's autodoc feature here, because for that sphinx has to import trio and then look at the function's docstring... and the function doesn't exist on Linux, which is what readthedocs.org uses to build our docs. So, we need to type some actual words into this file :-). (monitor_completion_key isn't documented because it's an unfinished stub... see the TODO at the top of the section.)

I'd move this up to the top of the Windows-specific API section (so above the "TODO", since this function is no longer a TODO!), and then if you scroll up a bit in the file you can see the docs for wait_writable as an example of how to write the docs directly in reference-hazmat.rst. (Unfortunately it works a bit differently than for docstrings, because the sphinx-napoleon extension that knows how to interpret the friendly Google-docstring format only works on actual docstrings; when typing into the .rst file you have to use the lower-level ReST markup directly.)

If you want to look at the docs locally, you can do:
If you want to look at the docs locally, you can do:
pip install -r ci/rtd-requirements.txt
cd docs
sphinx-build -nW -b html source build
and then they're in docs/build/
newsfragments/233.feature.rst

@@ -0,0 +1 @@
Add ``trio.hazmat.WaitForSingleObject()`` async function to await Windows handles.
This file will eventually be incorporated into the docs, so you can use Sphinx markup, like:
:func:`trio.hazmat.WaitForSingleObject`
trio/hazmat.py

This namespace represents low-level functionality not intended for daily use,
but useful for extending Trio's functionality. It is the union of
a subset of trio/_core/ and some things from trio/*.py.
"""
I'd drop the second sentence here, because the intended audience for a public module docstring is the users, and it's none of their business which files we put things in internally :-)
Done!

Noticed two more little doc nits, and just pushed changes myself – do they look good to you? (The cffi thing is because our

Yes! (was not sure if you'd seen the thumbsup :))

Oh thanks, I probably would have noticed the thumbsup eventually but it would have taken a bit :-). Okay, let's do this!

And as per our contributing documentation, I also sent you an invitation to join the project :-). Up to you whether you accept or not, but I hope you do! Are you planning to keep working on this, e.g. by moving on to

Not now; I should really focus on other things for the moment (and holidays). I got excited about Trio by @touilleMan's talk at EuroPython and decided to join the sprint to familiarize myself with it more. I should congratulate you; the codebase is really nice, and the people even more so! So it's likely that when I have some free cycles I'll contribute more :)

Makes sense! Have a great holiday, and come back any time :-).
This is a stab at step 1 in #4 (comment).

Right now, it is implemented by calling kernel32.WaitForSingleObject() with a zero timeout. This avoids the need for a second event object + thread etc. as mentioned in the referenced comment. But perhaps there was a good reason for that suggested approach -- maybe avoiding the arbitrary sleep resolution? If there is, I'm happy to dive a bit deeper and implement that. If not, I can remove some of the kernel32 stuff.

The tests make use of trio.sleep(). The docs mention something about an autojump_clock fixture, but I could not find what that does. Could that be of use here to make the sleeping in this test not affect the test duration?

The tests also do some timing measurements and assert based on the result. There is a pretty good margin, but it feels a bit flaky (e.g. what if the CI it runs on is busy with other things?) Any ideas to improve that would be good.

PS: running the auto style formatter made a few changes in unrelated files; the latest yapf formats differently in a handful of places than the one used by CI.