Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Socket monitor on TCP sockets uses internal Pair socket from multiple threads #2158

Closed
jens-auer opened this issue Oct 11, 2016 · 10 comments
Closed

Comments

@jens-auer
Copy link
Contributor

When running a zmq_socket_monitor on TCP connections events are sent from multiple IO threads concurrently on the ZMQ_PAIR socket. This violates the single-thread usage assumption of ZeroMQ sockets and can crash the application (or worse continue with corrupted monitor data or corrupted socket states).

Each stream_engine instance calls socket_base_t::monitor_event to send monitor events on the ZMQ_PAIR socket. This happens in the thread context of the stream_engine and there is no protection around the Pair socket, so it can be accessed from multiple threads.

@somdoron
Copy link
Member

Actually, by default zeromq has one IO Thread.

The problem is that monitor events comes from Reaper thread (thread responsible for killing threads), IO thread and user thread.

@jens-auer Do you want to make a PR that create an option to use radio-dish instead of pair?

@jens-auer
Copy link
Contributor Author

@somdoron I probably have time next week to do a PR, but not this week.

I would also like to make zmq_socket_monitor incompatible in this case such that old code would not compile anymore as it would pass the wrong socket type. I would propose to change it to remove the events parameter and use the msg_group instead as it is used by the radio-dish. This would change signature and thus break compilation.

@jens-auer
Copy link
Contributor Author

@somdoron I also have a patch for 4.1.5 which adds a mutex around the monitor socket. Do you want a PR for this?

@somdoron
Copy link
Member

Send the PR for master, we can backport it later.

On Oct 11, 2016 15:20, "Jens Auer" notifications@github.com wrote:

@somdoron https://github.com/somdoron I also have a patch for 4.1.5
which adds a mutex around the monitor socket. Do you want a PR for this?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#2158 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AClv9n1x4PkcozEuiRMsrTM_WbTXUcmaks5qy38CgaJpZM4KTXRI
.

@bluca
Copy link
Member

bluca commented Oct 11, 2016

Please don't break existing API. We have a process for this, and first of all it must be declared as deprecated, and then much later we can remove it.

Instead of changing the existing function, add a new one with a different name. This way users have time to migrate their code.

Also for the mutex, is that around any socket operation? I'm concerned about the performance implications if that's the case

@somdoron
Copy link
Member

@bluca The performance penalty is only when using monitor. It shouldnt be big one as monitored events do not happen alot. Anyway I prefer to use new socket type option, however we can only use it for 4.2

@jens-auer
Copy link
Contributor Author

@bluca I'm ok with a new function. Any idea for a name? As Doron said the mutex is only around access to the monitor socket and the integer storing the monitored event mask.

An alternative design for 4.1 could use PUB/SUB sockets. Each engine creates a PUB socket which sends events to a SUB. The SUB could be in the socket_base and then it can forward the events to the PAIR which would be fully compatible with the existing API. It is also possible to change this more aggressively and let the SUB socket be created by the application code and just pass the address, similar to the existing mechanism with the pair. This would work on 4.1 and 4.2, but then use PUB/SUB on 4.2 which if I understand correctly should be replaced with Radio/Dish in the long run.

@bluca
Copy link
Member

bluca commented Oct 11, 2016

Ah if the mutex is only for the monitor events then no worries :-)

For the function name, perhaps "zmq_socket_monitor_group" ?

For backporting, we are preparing to release 4.2, so if it becomes too complicated I would say just don't bother with 4.1 and 4.0. If someone faces the issue there they can upgrade. Given we are not breaking API (and ABI still not certain) it will be at most a rebuild.
If on the other hand it's simple they by all means let's backport it.

@jens-auer
Copy link
Contributor Author

@bluca I have a PR (#2159) for 4.2, and I can submit one for 4.1 if wanted because I am going to patch 4.1 locally. Just take a look.

Changing the socket type to Radio/Dish or Pub/Sub will make this change obsolete. I probably have time to do it next week and it shouldn't be a big change. I am a bit unsure which socket to use. Radio/Dish seems to be the long-term solution, but Pub/Sub can be backported to 4.1. I think I would only implement this for 4.2 since 4.1 is fixed with the mutex and the new socket type only adds new behavior which can replace the old implementation. And then Radio/Dish because it is the long-term solution?

bluca added a commit that referenced this issue Oct 11, 2016
#2158: Add mutex for monitor socket
bluca pushed a commit that referenced this issue Oct 12, 2016
* - Fixed windows build errors
- Extended monitor lock scope to prevent race-condition between
  process_stop and monitor

* - Fixed windows build errors
- Extended monitor lock scope to prevent race-condition between
  process_stop and monitor
@sigiesec
Copy link
Member

This was fixed by #2159, so I am closing this.

If there are still issues, feel free to reopen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants