Bug: sometimes the library gets stuck on select(). #323

Closed
chanilino opened this issue Aug 25, 2017 · 5 comments
@chanilino

I was running a Ruby script as a daemon, and it got stuck. I attached to the process with gdb.

Here are the results of my investigation:

$ backtrace
#0  0x00007f82cc8198e3 in select () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f82cb1ae65b in curb_select (args=<optimized out>) at curb_multi.c:553
#2  0x00007f82cccbe602 in call_without_gvl (fail_if_interrupted=0, data2=<optimized out>, ubf=<optimized out>, data1=0x7fff96adb6c0, func=0x7f82cb1ae640 <curb_select>) at thread.c:1294
#3  rb_thread_call_without_gvl (func=func@entry=0x7f82cb1ae640 <curb_select>, data1=data1@entry=0x7fff96adb6c0, ubf=ubf@entry=0xffffffffffffffff, data2=data2@entry=0x0) at thread.c:1404
#4  0x00007f82cb1af43f in ruby_curl_multi_perform (argc=<optimized out>, argv=<optimized out>, self=67108600) at curb_multi.c:650
#5  0x00007f82cccef411 in vm_call_cfunc_with_frame (ci=0x1b715a0, cc=<optimized out>, calling=<optimized out>, reg_cfp=0x7f82cd205c50, th=0x17d95d0) at vm_insnhelper.c:1752
#6  vm_call_cfunc (th=0x17d95d0, reg_cfp=0x7f82cd205c50, calling=<optimized out>, ci=0x1b715a0, cc=<optimized out>) at vm_insnhelper.c:1847
#7  0x00007f82cccf9409 in vm_exec_core (th=th@entry=0x17d95d0, initial=initial@entry=0) at insns.def:1066

I see that it was stuck in a select. Then I tried to get the parameters passed to select:

$ f 4
#4  0x00007f82cb1af43f in ruby_curl_multi_perform (argc=<optimized out>, argv=<optimized out>, self=67108600) at curb_multi.c:650
650           rc = (int)(VALUE) rb_thread_call_without_gvl((void *(*)(void *))curb_select, &fdset_args, RUBY_UBF_IO, 0);
(gdb) p maxfd
$13 = -1

So I see that we are calling select with maxfd = -1, which is wrong. Looking at the curb code, I see this:

https://github.com/taf2/curb/blob/master/ext/curb_multi.c#L630-L636

We check the return code, but not the maxfd value. If we go to the libcurl documentation:

https://curl.haxx.se/libcurl/c/curl_multi_fdset.html

We see this:

If no file descriptors are set by libcurl, max_fd will contain -1 when this function returns. Otherwise it will contain the highest descriptor number libcurl set. When libcurl returns -1 in max_fd, it is because libcurl currently does something that isn't possible for your application to monitor with a socket and unfortunately you can then not know exactly when the current action is completed using select(). You then need to wait a while before you proceed and call curl_multi_perform anyway. How long to wait? We suggest 100 milliseconds at least, but you may want to test it out in your own particular conditions to find a suitable value.
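
To illustrate the point in the quoted documentation: with max_fd at -1 there is nothing for select() to watch, so the call can only come back when its timeout expires. Below is a minimal sketch in plain POSIX C, assuming the caller passes maxfd + 1 as nfds. The helper name select_without_fds is made up for this example and is not part of curb or libcurl.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper (not from curb or libcurl): call select() with empty
 * fd sets and maxfd == -1, then report how long the call blocked.
 * Returns select()'s return value; *elapsed_ms receives the wall-clock wait. */
int select_without_fds(long timeout_ms, long *elapsed_ms)
{
    fd_set fdread, fdwrite, fdexcep;
    struct timeval tv, start, end;
    int maxfd = -1; /* the value curl_multi_fdset reports when it sets no fds */
    long usec;
    int rc;

    assert(elapsed_ms != NULL);

    FD_ZERO(&fdread);
    FD_ZERO(&fdwrite);
    FD_ZERO(&fdexcep);

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    gettimeofday(&start, NULL);
    /* nfds is 0 here, so no descriptor can ever become ready */
    rc = select(maxfd + 1, &fdread, &fdwrite, &fdexcep, &tv);
    gettimeofday(&end, NULL);

    usec = (end.tv_sec - start.tv_sec) * 1000000L + (end.tv_usec - start.tv_usec);
    *elapsed_ms = usec / 1000;
    return rc;
}
```

Calling select_without_fds(50, &ms) should simply block for roughly the timeout and report that no descriptor became ready, so select() tells the caller nothing about the transfer in this state.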

So if maxfd = -1 we need to wait for some time and expect that the function eventually returns a valid file descriptor. A patch could be something like:

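      /* load the fd sets from the multi handle */
      while (1) {
        mcode = curl_multi_fdset(rbcm->handle, &fdread, &fdwrite, &fdexcep, &maxfd);
        if (mcode != CURLM_OK) {
          raise_curl_multi_error_exception(mcode);
        }
        if (maxfd != -1) {
          break;
        }
        /* maxfd == -1: nothing to select() on yet, so wait and retry */
        yield(); /* placeholder for a short wait, e.g. the 100 ms libcurl suggests */
      }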

Another solution could be to raise an exception, like this:

      /* load the fd sets from the multi handle */
      mcode = curl_multi_fdset(rbcm->handle, &fdread, &fdwrite, &fdexcep, &maxfd);
      if (mcode != CURLM_OK) {
        raise_curl_multi_error_exception(mcode);
      }
      if (maxfd == -1) {
        raise_curl_multi_error_exception(whatever_code);
      }

@taf2
Owner

taf2 commented Aug 25, 2017

Thanks, this makes a lot of sense. I'll do some testing today.

@taf2
Owner

taf2 commented Aug 25, 2017

I think the solution here might be something like this:

diff --git a/ext/curb_multi.c b/ext/curb_multi.c
index 6d6cb2f..a85a8f7 100644
--- a/ext/curb_multi.c
+++ b/ext/curb_multi.c
@@ -634,6 +634,15 @@ VALUE ruby_curl_multi_perform(int argc, VALUE *argv, VALUE self) {
         raise_curl_multi_error_exception(mcode);
       }

+      if (maxfd == -1) {
+        /* libcurl recommends sleeping for 100ms */
+        rb_thread_wait_for(rb_time_timeval(DBL2NUM(0.1)));
+        rb_curl_multi_run( self, rbcm->handle, &(rbcm->running) );
+        rb_curl_multi_read_info( self, rbcm->handle );
+        if (block != Qnil) { rb_funcall(block, rb_intern("call"), 1, self);  }
+        continue;
+      }
+
 #ifdef _WIN32
       create_crt_fd(&fdread, &crt_fdread);
       create_crt_fd(&fdwrite, &crt_fdwrite);

The above is how we should handle it, based on the description provided here:

If no file descriptors are set by libcurl, max_fd will contain -1 when this function returns. Otherwise it will contain the highest descriptor number libcurl set. When libcurl returns -1 in max_fd, it is because libcurl currently does something that isn't possible for your application to monitor with a socket and unfortunately you can then not know exactly when the current action is completed using select(). You then need to wait a while before you proceed and call curl_multi_perform anyway. How long to wait? We suggest 100 milliseconds at least, but you may want to test it out in your own particular conditions to find a suitable value.
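
Independent of curb, the rule in that description can be sketched as a small POSIX C helper: sleep for a fixed interval when maxfd is -1, otherwise select() on the sets as usual. This is only an illustration under stated assumptions. The names wait_for_transfers, wait_result and NO_FD_WAIT_MS are hypothetical and do not exist in curb or libcurl, and the caller is assumed to call curl_multi_perform again after every return.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

#define NO_FD_WAIT_MS 100 /* libcurl suggests waiting at least 100 ms */

/* Hypothetical result codes for this sketch only. */
enum wait_result {
    WAIT_ERROR = -1,  /* select() failed */
    WAIT_TIMEOUT = 0, /* timeout expired, no descriptor ready */
    WAIT_READY = 1,   /* at least one descriptor is ready */
    WAIT_NO_FDS = 2   /* maxfd was -1, we only slept */
};

/* Wait for activity on the given fd sets. When maxfd is -1 there is nothing
 * to monitor, so sleep for NO_FD_WAIT_MS instead of passing the sets on. */
enum wait_result wait_for_transfers(int maxfd, fd_set *fdread, fd_set *fdwrite,
                                    fd_set *fdexcep, long timeout_ms)
{
    struct timeval tv;
    int rc;

    assert(timeout_ms >= 0);

    if (maxfd == -1) {
        tv.tv_sec = NO_FD_WAIT_MS / 1000;
        tv.tv_usec = (NO_FD_WAIT_MS % 1000) * 1000;
        /* select() with no descriptors acts as a sub-second sleep on POSIX */
        select(0, NULL, NULL, NULL, &tv);
        return WAIT_NO_FDS;
    }

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    rc = select(maxfd + 1, fdread, fdwrite, fdexcep, &tv);
    if (rc < 0) {
        return WAIT_ERROR;
    }
    return rc == 0 ? WAIT_TIMEOUT : WAIT_READY;
}
```

Returning a distinct WAIT_NO_FDS value keeps the "we only slept" case visible to the caller, which mirrors the explicit maxfd == -1 branch in the diff. The select()-based sleep is a POSIX assumption and would need a different call on Windows.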

@taf2 taf2 closed this as completed in b3d265b Aug 25, 2017
@taf2
Owner

taf2 commented Aug 25, 2017

@chanilino, I've put b3d265b into master. If you are able to try again, let me know whether this resolves your issue. If not, we can reopen and keep trying.

@chanilino
Author

It seems to me that this solves the issue. On Monday I will test it harder! Thanks!

@taf2
Owner

taf2 commented Aug 26, 2017 via email
