Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem: In zmq::socket_base_t::send method with or without ZMQ_ROUTER_MANDATORY option set #2348

Closed
reza-ebrahimi opened this issue Feb 22, 2017 · 6 comments

Comments

@reza-ebrahimi
Copy link
Contributor

reza-ebrahimi commented Feb 22, 2017

This is the send method in socket_base_t class, and there is a bug here, in the case of ROUTER socket, we have two processes, one is ROUTER process and another is DEALER process, code below is the ROUTER process:

//  Send a message to an unknown peer with mandatory routing
//  This will fail
int mandatory = 1;
int rc = zmq_setsockopt (router, ZMQ_ROUTER_MANDATORY, &mandatory, sizeof (mandatory));
assert (rc == 0);

while (true) {
	//  Get message from dealer to know when connection is ready
	char buffer[255];
	rc = zmq_recv(router, buffer, 255, 0); // for identity
	rc = zmq_recv(router, buffer, 255, 0); // for another part of message

	Sleep(6000); // in this time gap we close DEALER process

	//  Send a message to disconnected dealer now
        //  and it will block here in zmq_send method
	rc = zmq_send(router, "X", 1, ZMQ_SNDMORE);
	assert(rc == 1);
	rc = zmq_send(router, "Hello", 5, 0);
	assert(rc == 5);
}

and there problem is in below send method, due to ZMQ_ROUTER_MANDATORY option, it might return EHOSTUNREACH or -1, but it goes to last while loop to process remained commands and blocks in select system call untill another DEALER process coming up!

int zmq::socket_base_t::send (msg_t *msg_, int flags_)
{

scoped_optional_lock_t sync_lock(thread_safe ? &sync : NULL);

//  Check whether the library haven't been shut down yet.
if (unlikely (ctx_terminated)) {
    errno = ETERM;
    return -1;
}

//  Check whether message passed to the function is valid.
if (unlikely (!msg_ || !msg_->check ())) {
    errno = EFAULT;
    return -1;
}

//  Process pending commands, if any.
int rc = process_commands (0, true);
if (unlikely (rc != 0)) {
    return -1;
}

//  Clear any user-visible flags that are set on the message.
msg_->reset_flags (msg_t::more);

//  At this point we impose the flags on the message.
if (flags_ & ZMQ_SNDMORE)
    msg_->set_flags (msg_t::more);

msg_->reset_metadata ();

//  Try to send the message using method in each socket class
rc = xsend (msg_);
if (rc == 0) {
    return 0;
}
if (unlikely (errno != EAGAIN)) {
    return -1;
}

//  In case of non-blocking send we'll simply propagate
//  the error - including EAGAIN - up the stack.
if (flags_ & ZMQ_DONTWAIT || options.sndtimeo == 0) {
    return -1;
}

//  Compute the time when the timeout should occur.
//  If the timeout is infinite, don't care.
int timeout = options.sndtimeo;
uint64_t end = timeout < 0 ? 0 : (clock.now_ms () + timeout);

//  Oops, we couldn't send the message. Wait for the next
//  command, process it and try to send the message again.
//  If timeout is reached in the meantime, return EAGAIN.
while (true) {
    if (unlikely (process_commands (timeout, false) != 0)) {
        return -1;
    }
    rc = xsend (msg_);
    if (rc == 0)
        break;
    if (unlikely (errno != EAGAIN)) {
        return -1;
    }
    if (timeout > 0) {
        timeout = (int) (end - clock.now_ms ());
        if (timeout <= 0) {
            errno = EAGAIN;
            return -1;
        }
    }
}

return 0;

}
  1. if ZMQ_ROUTER_MANDATORY option is set, it should return before reach to last while loop, is still processing command for a disconnected socket necessary? or if ZMQ_ROUTER_MANDATORY option is set?

  2. blocking in send method is unacceptable by any reason, in this scenario it should return actual error code with its string to the caller.

@reza-ebrahimi reza-ebrahimi changed the title Problem: In zmq::socket_base_t::send method with or without ZMQ_ROUTER_MANDATORY option Problem: In zmq::socket_base_t::send method with or without ZMQ_ROUTER_MANDATORY option set Feb 22, 2017
@somdoron
Copy link
Member

@reza-ebrahimi I don't think there is a bug, or at least not the one you describe. In case of EHOSTUNREACH the method will exit without going to the while loop. Only when EAGAIN is returned this while loop will be reached.

However there is an issue here, if the peer is not active but do exist the method will block.

Two options here, first just specify DONT_WAIT flag, it will never block.
The second option is to port a fix from NetMQ, as this bug was already fixed there.

You can find the commit here:
zeromq/netmq@30095a7

Do you want to port the fix and make a pull request?

@reza-ebrahimi
Copy link
Contributor Author

@somdoron Yes, only with EAGAIN will reach to while loop and blocks.

I don't understand it, why that could be an EAGAIN error, while DEALER process completely terminated? what does that means?

About DONT_WAIT option, I think that is suitable for recv method not send, a blocking send could cause problems in application specific logic,

I think select is in main thread that zeromq created on and it was wrong, those mechanism should move to IO thread or another appropriate thread, and communicate with socket object via signalers, this mechanism have some troubles.

I'll review your mentioned commit.

@somdoron
Copy link
Member

DONT_WAIT works on sending as well.

Regarding dealer being terminate, it can take some time for the dealer to be entirely deleted from the list of pipes, that is why it is blocking. To fix you can just use DONT_WAIT or port the fix

@hoditohod
Copy link
Contributor

I bumped into this behavior a few days ago too: https://lists.zeromq.org/pipermail/zeromq-dev/2017-February/031413.html
I ended up using DONTWAIT as @somdoron suggested which works fine in my case (check both EHOSTUNREACH and EAGAIN errno codes). The mentioned netmq commit is the same as the one I had in mind, I can send a PR later this week if that's ok.

@bluca
Copy link
Member

bluca commented Feb 22, 2017

Yes please, if you can send a PR it would be great

somdoron added a commit that referenced this issue Feb 24, 2017
@bluca
Copy link
Member

bluca commented Feb 24, 2017

Fixed by #2352 and #2353

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants