Add timeout option on Client #46
Conversation
Thank you! Perhaps there could be a more descriptive error message. Otherwise, it's fine by me.
Codecov Report
```diff
@@            Coverage Diff            @@
##           master     #46     +/-   ##
=========================================
+ Coverage   76.44%   77.04%   +0.6%
=========================================
  Files           6        6
  Lines         467      488     +21
=========================================
+ Hits          357      376     +19
- Misses        110      112      +2
```
Continue to review the full report at Codecov.
@cydanil Error message is in, plus a small unit test.
```diff
@@ -66,10 +69,15 @@ def __init__(self, endpoint, sock='REQ', ser='msgpack'):
         else:
             raise NotImplementedError('socket is not supported:', str(sock))

+        if timeout is not None:
+            self._socket.setsockopt(zmq.RCVTIMEO, timeout)
+        self._recv_ready = False
```
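For context, the behaviour this diff adds can be sketched on its own: once `zmq.RCVTIMEO` is set, `recv()` raises `zmq.Again` when the timeout expires instead of blocking forever. Here is a minimal, hypothetical client helper illustrating that (it is not this package's actual API; the endpoint and the `b'next'` request are placeholders):

```python
import zmq

def recv_with_timeout(endpoint, timeout_ms=1000):
    """Ask a REP server for data; raise TimeoutError if it doesn't answer
    within timeout_ms. Hypothetical helper, not the package's real API."""
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.REQ)
    sock.setsockopt(zmq.RCVTIMEO, timeout_ms)  # recv() raises zmq.Again on timeout
    sock.setsockopt(zmq.SNDTIMEO, timeout_ms)  # don't block forever on send either
    sock.connect(endpoint)
    try:
        sock.send(b'next')
        return sock.recv()
    except zmq.Again:
        raise TimeoutError('no reply within %d ms' % timeout_ms)
    finally:
        sock.close(linger=0)
```

Setting `SNDTIMEO` as well keeps the REQ socket from blocking indefinitely on `send()` when no server is reachable at all.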
I'm a bit wary of this extra state variable. I haven't thought through in detail, but I think it might end up with both server and client thinking it's their turn to listen for a message, and it won't be obvious why nothing is getting sent.
I don't have a better idea in the short term, but maybe this is the real reason to use another socket type like PUSH/PULL - it avoids having client & server state which can get out of step.
I also don't like it much... but it has proven reliable, at least in use with karaboFAI.
I should spend more time playing with PUSH/PULL and other patterns, but PUSH/PULL also seems to bring some complications:
- message queues must be handled properly; since messages are very large, they can easily crash client applications
- PUSH/PULL doesn't allow load balancing
I think we can constrain the memory usage easily enough in our client code by setting a HWM. Not sure what you mean about the load balancing. I'm thinking about how the bridge can work most efficiently; maybe this is something we can work on when I'm there next week.
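To make the HWM idea concrete, constraining queue memory on a socket pair can be sketched like this. The socket types, endpoint name, and `hwm` value are illustrative assumptions, not code from this PR:

```python
import zmq

def make_bounded_pair(endpoint='inproc://bounded', hwm=1):
    """PUSH/PULL pair with small high-water marks on both ends, so only a
    handful of messages can queue before send() would block (or raise
    zmq.Again when called with zmq.NOBLOCK). Names here are illustrative."""
    ctx = zmq.Context.instance()
    pull = ctx.socket(zmq.PULL)
    pull.setsockopt(zmq.RCVHWM, hwm)   # cap the receive-side queue
    pull.bind(endpoint)
    push = ctx.socket(zmq.PUSH)
    push.setsockopt(zmq.SNDHWM, hwm)   # cap the send-side queue
    push.connect(endpoint)
    return push, pull
```

With large detector trains, a small HWM bounds memory at the cost of the sender blocking (or dropping, depending on socket type) once the consumer falls behind.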
The PUSH socket does not allow load balancing if you have several clients (PULL sockets): messages are distributed in a round-robin manner, so a client that is slow or not ready will still receive data, which just gets queued. That is generally bad if one client is only used for occasional monitoring of what is in the data; it will consume 1/nth of the trains even when it isn't using anything. I'm also wondering what happens if one client stops consuming data and its input queue fills up: does this block the whole interface? Also, for tools like OnDA, I believe the master process waits for processed trains/pulses to arrive in order, so if a slow worker still queues data at its input it can increase the delay quite a bit.
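The round-robin distribution described above is easy to demonstrate in isolation. This is a standalone sketch (endpoint name and message contents are arbitrary), not code from the bridge:

```python
import zmq

def round_robin_demo(n_msgs=6, endpoint='inproc://rr-demo'):
    """One PUSH distributing to two PULL peers: libzmq hands outgoing
    messages to connected peers round-robin, regardless of whether a
    given peer actually wants them."""
    ctx = zmq.Context.instance()
    push = ctx.socket(zmq.PUSH)
    push.bind(endpoint)
    pulls = []
    for _ in range(2):
        p = ctx.socket(zmq.PULL)
        p.setsockopt(zmq.RCVTIMEO, 500)  # so draining below can't hang
        p.connect(endpoint)
        pulls.append(p)
    for i in range(n_msgs):
        push.send(str(i).encode())
    # drain each PULL socket until it has nothing queued
    received = []
    for p in pulls:
        msgs = []
        while True:
            try:
                msgs.append(p.recv().decode())
            except zmq.Again:
                break
        received.append(msgs)
    push.close(linger=0)
    for p in pulls:
        p.close(linger=0)
    return received
```

Each connected PULL gets a share of the stream whether it consumes it or not, which is exactly the monitoring-client problem described above.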
And I never managed to make use of the HWM; if you have an example of it working, I'm highly interested!
Thanks,
I tried it quickly, and it does seem to work! However, I hit something weird: exiting a PULL socket while receiving data could crash the PUSH socket:
```
sent 177
Assertion failed: !more (src/lb.cpp:110)
zsh: abort (core dumped)  python pushpull.py push
```
The bug is known and fixed in libzmq 4.2.x:
zeromq/libzmq#1588
I had to update to pyzmq>=17.1.0 to get a build against libzmq 4.2. But then the weird behaviour is: when closing the PULL socket while receiving data, the PUSH no longer crashes, but the next data received (after restarting a PULL) starts from the last part not sent:
```
% python pushpull.py pull
RCVHWM 1
RCVBUF -1
# parts: 2
# parts: 2
# parts: 2
^C%
tmichela@machine ~/projects/tmp/pushpull
% python pushpull.py pull
RCVHWM 1
RCVBUF -1
# parts: 1
# parts: 2
# parts: 2
# parts: 2
```
You can find the code for the example here.
It also seems that closing a PULL socket affects other PULL sockets:
```
% python pushpull.py pull
RCVHWM 1
RCVBUF -1
9 - # parts: 2
10 - # parts: 3
11 - # parts: 3
12 - # parts: 3
13 - # parts: 3
14 - # parts: 3
16 - # parts: 3
19 - # parts: 3
20 - # parts: 3
23 - # parts: 3
25 - # parts: 3
26 - # parts: 2
27 - # parts: 3
29 - # parts: 3
31 - # parts: 2
32 - # parts: 3
33 - # parts: 3
34 - # parts: 3
36 - # parts: 3
38 - # parts: 2
39 - # parts: 3
```
Here I receive partial messages when another PULL socket is terminated.
Shall we move this discussion to a separate issue? I think it's possible that's a bug with ZeroMQ or pyzmq.
Opened an issue on pyzmq for it: zeromq/pyzmq#1244
Thanks!
Do we have an idea how long it takes to transfer a train of stacked detector data? It would be good to document this, because you probably don't want to set the timeout lower than the time to transfer one message.
Yes,
👍 let's mention a couple of those times somewhere (docstring?)
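One way to get a rough number for this is to time a single large message across a socket pair and keep the timeout well above it. The sketch below is illustrative, not a definitive benchmark: the inproc endpoint is a best case (a real TCP link to the bridge will be slower), and the payload size is an arbitrary stand-in for a stacked detector train:

```python
import time
import zmq

def time_one_transfer(n_bytes=16 * 1024 * 1024, endpoint='inproc://bench'):
    """Rough wall-clock time for one large message to cross a socket pair.
    Payload size and endpoint are illustrative; results are machine-dependent."""
    ctx = zmq.Context.instance()
    pull = ctx.socket(zmq.PULL)
    pull.bind(endpoint)
    push = ctx.socket(zmq.PUSH)
    push.connect(endpoint)
    payload = b'\x00' * n_bytes
    t0 = time.perf_counter()
    push.send(payload)
    pull.recv()
    elapsed = time.perf_counter() - t0
    push.close(linger=0)
    pull.close(linger=0)
    return elapsed
```

A measurement like this, repeated over the actual network path, would give a concrete lower bound to document next to the timeout option.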
@takluyver Can I merge this? It will be required for one instrument.
Yes, go for it! Sorry, I forgot about this PR.
Close/reopen to rerun appveyor tests
Sorry about that - too many tabs open with pull requests, I was thinking of a different one. We don't have or need Appveyor tests.
No worries, thanks!
No description provided.