Add timeout option on Client #46

Merged: 9 commits merged into master on Mar 1, 2019
Conversation

tmichela (Member)

No description provided.

@cydanil (Member) commented Nov 28, 2018

Thank you! Perhaps there could be a more descriptive error message. Otherwise, it's fine by me.

@codecov-io commented Nov 29, 2018

Codecov Report

Merging #46 into master will increase coverage by 0.6%.
The diff coverage is 100%.


@@            Coverage Diff            @@
##           master      #46     +/-   ##
=========================================
+ Coverage   76.44%   77.04%   +0.6%     
=========================================
  Files           6        6             
  Lines         467      488     +21     
=========================================
+ Hits          357      376     +19     
- Misses        110      112      +2
Impacted Files                 Coverage Δ
karabo_bridge/client.py        89.33% <100%> (+1.27%) ⬆️
karabo_bridge/cli/monitor.py   81.25% <0%> (+2.3%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 80a2361...f978e7d.

@tmichela (Member Author)

@cydanil The error message is in, plus a small unit test.

setup.py (review comments resolved, outdated)
karabo_bridge/client.py (review comments resolved, outdated)
@@ -66,10 +69,15 @@ def __init__(self, endpoint, sock='REQ', ser='msgpack'):
else:
raise NotImplementedError('socket is not supported:', str(sock))

if timeout is not None:
self._socket.setsockopt(zmq.RCVTIMEO, timeout)
self._recv_ready = False
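
For context, a minimal standalone sketch (not the karabo_bridge code itself) of what setting zmq.RCVTIMEO does: recv() raises zmq.error.Again when nothing arrives within the timeout, which zmq interprets in milliseconds. The endpoint and timeout value below are arbitrary placeholders.

import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)
sock.setsockopt(zmq.RCVTIMEO, 500)    # RCVTIMEO is in milliseconds
sock.connect('tcp://localhost:4545')  # nothing needs to be listening here

sock.send(b'next')
try:
    reply = sock.recv_multipart()
except zmq.error.Again:
    # This is the condition the new timeout option surfaces to the caller.
    # Presumably the _recv_ready flag in the diff records that a request is
    # still pending, so the client does not send again on the same REQ socket
    # (which would violate the REQ send/recv alternation).
    print('no data received within 500 ms')

sock.close(linger=0)
ctx.term()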
Member

I'm a bit wary of this extra state variable. I haven't thought it through in detail, but I think it might end up with both the server and the client thinking it's their turn to listen for a message, and it won't be obvious why nothing is getting sent.

I don't have a better idea in the short term, but maybe this is the real reason to use another socket type like PUSH/PULL - it avoids having client & server state which can get out of step.

tmichela (Member Author)

I also don't like it so much... but it has proven reliable, at least in use with karaboFAI.
I should spend more time playing with PUSH/PULL and other patterns, but PUSH/PULL also seems to bring some complications of its own:

  • properly handling message queues: since the messages are very large, they can easily crash client applications
  • PUSH/PULL doesn't allow load balancing

Member

I think we can constrain the memory usage easily enough in our client code by setting an HWM (high-water mark). I'm not sure what you mean about the load balancing. I'm thinking about how the bridge can work most efficiently; maybe this is something we can work on when I'm there next week.
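
For reference, a rough sketch of the HWM idea (not code from this PR or the bridge): cap how many messages a PULL socket will buffer locally. Note that RCVHWM counts messages, not bytes, and the sender can still queue messages on its side up to its own SNDHWM. The endpoint is a placeholder.

import zmq

ctx = zmq.Context()
pull = ctx.socket(zmq.PULL)
pull.setsockopt(zmq.RCVHWM, 1)       # set before connect: at most ~1 queued message
pull.connect('tcp://localhost:4545')

while True:
    parts = pull.recv_multipart()    # one large multipart message per train
    print('# parts:', len(parts))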

tmichela (Member Author)

The PUSH socket does not allow load balancing if you have several clients (PULL sockets): data is distributed in a round-robin manner, so a client that is slow or not ready will still receive data, which then queues up. That is generally bad if a client is only used once in a while to monitor what is in the data; in that case it still consumes one train in n even if it doesn't use anything. I'm also wondering what happens if one client is not consuming data and its input queue is full: does that block the whole interface? Also, for tools like onda, I believe the master process waits for processed trains/pulses to arrive in order, so if a slow worker still queues data at its input it can increase the delay quite a bit. A toy illustration of the round-robin behaviour is sketched below.
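
A self-contained toy sketch of that round-robin point (not code from this PR): one PUSH socket with two PULL sockets connected, where each PULL receives every other message whether or not it is actually processing them. The port and sleeps are arbitrary.

import time
import zmq

ctx = zmq.Context()
push = ctx.socket(zmq.PUSH)
push.bind('tcp://127.0.0.1:5555')

pull_a = ctx.socket(zmq.PULL)
pull_a.connect('tcp://127.0.0.1:5555')
pull_b = ctx.socket(zmq.PULL)
pull_b.connect('tcp://127.0.0.1:5555')
time.sleep(0.2)                       # let both connections establish

for i in range(4):
    push.send_string('train %d' % i)
time.sleep(0.2)

# Each PULL gets every other train (which one gets the even or odd trains
# depends on connection order), even if one of them were slow or idle.
print([pull_a.recv_string() for _ in range(2)])
print([pull_b.recv_string() for _ in range(2)])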

tmichela (Member Author)

Also, I never managed to make use of the HWM; if you have an example of it working, I'm highly interested!

tmichela (Member Author), Dec 7, 2018

Thanks,

I tried it quickly and it does seem to work! However, I ran into something weird: exiting a PULL socket while it was receiving data could crash the PUSH socket:

sent 177
Assertion failed: !more (src/lb.cpp:110)
zsh: abort (core dumped)  python pushpull.py push

The bug is known and fixed in libzmq 4.2.x: zeromq/libzmq#1588

I had to update to pyzmq>=17.1.0 to get a build against libzmq 4.2, but then the weird behaviour is: when closing the PULL socket while receiving data, the PUSH no longer crashes, but the next data received (after restarting a PULL) starts from the last part that was not sent:

 % python pushpull.py pull
RCVHWM 1
RCVBUF -1
# parts: 2
# parts: 2
# parts: 2
^C%
tmichela@machine ~/projects/tmp/pushpull
 % python pushpull.py pull
RCVHWM 1
RCVBUF -1
# parts: 1
# parts: 2
# parts: 2
# parts: 2

You can find the code for the example here.
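
The linked example itself is not included in this thread; below is a hypothetical minimal approximation of such a pushpull.py, matching the command lines and output shown above (a push|pull argument, RCVHWM 1, "# parts:" lines). The port and message size are made up.

import sys
import zmq

ctx = zmq.Context()

if sys.argv[1] == 'push':
    sock = ctx.socket(zmq.PUSH)
    sock.bind('tcp://*:5555')
    i = 0
    while True:
        # one large multipart message, loosely standing in for a train
        sock.send_multipart([str(i).encode(), b'x' * 10 * 1024 * 1024])
        print('sent', i)
        i += 1
else:
    sock = ctx.socket(zmq.PULL)
    sock.setsockopt(zmq.RCVHWM, 1)
    sock.connect('tcp://localhost:5555')
    print('RCVHWM', sock.getsockopt(zmq.RCVHWM))
    print('RCVBUF', sock.getsockopt(zmq.RCVBUF))
    while True:
        parts = sock.recv_multipart()
        print('# parts:', len(parts))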

tmichela (Member Author), Dec 7, 2018

It also seems that closing a PULL socket affects other PULL sockets:

 % python pushpull.py pull
RCVHWM 1
RCVBUF -1
9 - # parts: 2
10 - # parts: 3
11 - # parts: 3
12 - # parts: 3
13 - # parts: 3
14 - # parts: 3
16 - # parts: 3
19 - # parts: 3
20 - # parts: 3
23 - # parts: 3
25 - # parts: 3
26 - # parts: 2
27 - # parts: 3
29 - # parts: 3
31 - # parts: 2
32 - # parts: 3
33 - # parts: 3
34 - # parts: 3
36 - # parts: 3
38 - # parts: 2
39 - # parts: 3

Here I receive partial messages when another PULL socket is terminated.

Member

Shall we move this discussion to a separate issue? I think it's possible that's a bug with ZeroMQ or pyzmq.

Member

Opened an issue on pyzmq for it: zeromq/pyzmq#1244

tmichela (Member Author)

Thanks!

@takluyver (Member)

Do we have an idea how long it takes to transfer a train of stacked detector data? It would be good to document this, because you probably don't want to set the timeout lower than the time to transfer one message.

@tmichela (Member Author) commented Dec 6, 2018

> Do we have an idea how long it takes to transfer a train of stacked detector data? It would be good to document this, because you probably don't want to set the timeout lower than the time to transfer one message.

Yes. This depends on the kind of data; on the 10G network it should take roughly the times below (see the sketch after this list):

  • ~0.1s for 32 pulses
  • ~0.4s for 128 pulses
  • ~1s for 350 pulses
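
A rough usage sketch of how those numbers could inform the timeout. The exact unit and the exception raised on timeout depend on the final API; here the value is passed straight through to zmq.RCVTIMEO as in the diff above, i.e. milliseconds, and the endpoint is a placeholder.

from karabo_bridge import Client

# ~1 s to transfer a 350-pulse train on the 10G network, so leave some headroom
client = Client('tcp://bridge-host:4545', timeout=2000)

try:
    data = client.next()            # one train (return shape depends on the library version)
except Exception as exc:            # the PR adds a more descriptive error on timeout
    print('no train received within the timeout:', exc)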

@takluyver (Member)

👍 let's mention a couple of those times somewhere (docstring?)

@tmichela (Member Author) commented Mar 1, 2019

@takluyver Can I merge this? It will be required for one instrument.

@takluyver (Member)

Yes, go for it! Sorry, I forgot about this PR.

@takluyver (Member)

Closing/reopening to rerun the AppVeyor tests.

@takluyver closed this Mar 1, 2019
@takluyver reopened this Mar 1, 2019
@takluyver (Member)

Sorry about that - too many tabs open with pull requests, I was thinking of a different one. We don't have or need Appveyor tests.

@tmichela (Member Author) commented Mar 1, 2019

No worries, thanks!
I'll probably make a new release soon.

@tmichela merged commit 6889339 into master on Mar 1, 2019