Consider high water mark for sending messages #161

Closed

mschubert opened this issue Jul 11, 2019 · 3 comments

mschubert (Owner) commented Jul 11, 2019

From ropensci/drake#933 (comment):

The problem is that all of these transfers are being started concurrently. If Qsys$private$send() blocked execution until the transfer was complete, then the memory requirements would be limited to the dependencies for one target. However, the current situation is that Qsys$private$send() returns as soon as the transfer is scheduled (because rzmq::send.socket() returns as soon as the transfer is scheduled), and so drake schedules the next target, which means filling the buffer for the next transfer before the first has finished. The result is that dependencies for many targets (whether they are the same data or not) are buffered on the master at the same time.

This could be addressed using ZMQ_HWM.
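For reference, in current libzmq the old ZMQ_HWM option is split into ZMQ_SNDHWM and ZMQ_RCVHWM, set per socket via zmq_setsockopt() before bind/connect. A minimal C sketch of capping the send queue (plain libzmq, not clustermq code; the endpoint and limit are made up):

```c
#include <zmq.h>

int main(void) {
    void *ctx  = zmq_ctx_new();
    void *sock = zmq_socket(ctx, ZMQ_PUSH);

    /* Keep at most 2 outstanding messages in this socket's send queue
     * (the limit counts messages, not bytes). */
    int hwm = 2;
    zmq_setsockopt(sock, ZMQ_SNDHWM, &hwm, sizeof(hwm));
    zmq_bind(sock, "tcp://*:5555");          /* made-up endpoint */

    /* With the default (blocking) flags, zmq_send() waits here once the
     * queue is full instead of buffering more data on the sender. */
    const char msg[] = "work chunk";
    zmq_send(sock, msg, sizeof(msg), 0);

    zmq_close(sock);
    zmq_ctx_term(ctx);
    return 0;
}
```

That would bound how many serialized targets the master holds in memory at any one time.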

An alternative is using blocking sends: ropensci/drake#933 (comment)

It looks like pbdZMQ uses blocking connections by default. This is also the behavior of the rzmq-compatibility wrapper function, so blocking will come along for the ride by default if clustermq switches to using pbdZMQ.
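At the libzmq level, the difference between the two options comes down to the flags passed to zmq_send(): the default is blocking, while ZMQ_DONTWAIT returns immediately. A rough C sketch (assuming an already-connected socket; this is not how rzmq or pbdZMQ wrap the calls, just the underlying behaviour):

```c
#include <zmq.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Sketch only: 'sock' is an already-connected ZeroMQ socket. */
void send_blocking(void *sock, const char *buf) {
    /* Default flags: blocks until the message can be queued, i.e. until
     * the socket is below its send high water mark again. */
    zmq_send(sock, buf, strlen(buf), 0);
}

void send_nonblocking(void *sock, const char *buf) {
    /* ZMQ_DONTWAIT: fails with EAGAIN if the message cannot be queued
     * right now, leaving it to the caller to retry later. */
    if (zmq_send(sock, buf, strlen(buf), ZMQ_DONTWAIT) == -1 &&
        zmq_errno() == EAGAIN)
        fprintf(stderr, "send queue full, retry later\n");
}
```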

wlandau (Contributor) commented Jul 12, 2019

Do you plan to expose the high water mark? Some workflows are not memory intensive and could still benefit from the existing non-blocking behavior.

mschubert (Owner, Author) commented

I wasn't planning to, but I will have to investigate exactly how ZeroMQ handles this with its I/O threads.

Generally, I wouldn't expect this to be an issue because we're blocking on receiving anyway, and the workers will be saturated in either case.

If there's a problem with that approach that needs user or package intervention, I will expose it; otherwise not.
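To illustrate the point about blocking on receiving: if the main loop only hands out the next chunk of work after a worker has reported in, every send is preceded by a blocking receive, so making the send itself blocking should add little extra waiting. A generic sketch (plain C over a REP-style socket; this is not the actual clustermq event loop):

```c
#include <zmq.h>
#include <string.h>

/* Generic request/reply loop: block until a worker asks for work,
 * then reply with the next chunk. */
void serve(void *rep_sock, const char *const *chunks, int n) {
    char ready[64];
    for (int i = 0; i < n; i++) {
        zmq_recv(rep_sock, ready, sizeof(ready), 0);          /* wait for a worker */
        zmq_send(rep_sock, chunks[i], strlen(chunks[i]), 0);  /* blocking send */
    }
}
```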

mschubert (Owner, Author) commented

@brendanf This may already be fixed in the v0.9 branch; please test it if you have the time.
