FastRouter segfaults #146

Closed
prymitive opened this issue Feb 3, 2013 · 43 comments

@prymitive
Contributor

Current version from the git master branch.

Feb  3 23:24:29 b99 uwsgi: *** glibc detected *** uwsgi: free(): invalid next size (normal): 0x0000000000ea6e50 ***
Feb  3 23:24:29 b99 uwsgi: ======= Backtrace: =========
Feb  3 23:24:29 b99 uwsgi: /lib/libc.so.6(+0x78bb6)[0x7f2e27c81bb6]
Feb  3 23:24:29 b99 uwsgi: /lib/libc.so.6(cfree+0x73)[0x7f2e27c88483]
Feb  3 23:24:29 b99 uwsgi: uwsgi(uwsgi_buffer_destroy+0x11)[0x438b71]
Feb  3 23:24:29 b99 uwsgi: uwsgi(uwsgi_cr_peer_del+0x54)[0x45a064]
Feb  3 23:24:29 b99 uwsgi: uwsgi(corerouter_close_peer+0x119)[0x45a359]
Feb  3 23:24:29 b99 uwsgi: uwsgi(uwsgi_corerouter_loop+0x4b5)[0x45aaf5]
Feb  3 23:24:29 b99 uwsgi: uwsgi(gateway_respawn+0x214)[0x443204]
Feb  3 23:24:29 b99 uwsgi: uwsgi(master_loop+0x3dd)[0x4273dd]
Feb  3 23:24:29 b99 uwsgi: uwsgi(uwsgi_start+0x139e)[0x44754e]
Feb  3 23:24:29 b99 uwsgi: uwsgi(main+0xf90)[0x44aa30]
Feb  3 23:24:29 b99 uwsgi: /lib/libc.so.6(__libc_start_main+0xfd)[0x7f2e27c27c4d]
Feb  3 23:24:29 b99 uwsgi: uwsgi[0x416819]
@prymitive
Contributor Author

I've recompiled the whole of uWSGI to make sure it's not a build issue. I'll report if it still occurs.

@prymitive
Contributor Author

After a clean install it still happens; 100% sure it's not a build issue:

Feb  3 23:42:35 b99 uwsgi: *** glibc detected *** uwsgi: free(): invalid next size (normal): 0x000000000262b1a0 ***
Feb  3 23:42:35 b99 uwsgi: ======= Backtrace: =========
Feb  3 23:42:35 b99 uwsgi: /lib/libc.so.6(+0x78bb6)[0x7f15ee273bb6]
Feb  3 23:42:35 b99 uwsgi: /lib/libc.so.6(cfree+0x73)[0x7f15ee27a483]
Feb  3 23:42:35 b99 uwsgi: uwsgi(uwsgi_buffer_destroy+0x11)[0x438b71]
Feb  3 23:42:35 b99 uwsgi: uwsgi(uwsgi_cr_peer_del+0x54)[0x45a064]
Feb  3 23:42:35 b99 uwsgi: uwsgi(corerouter_close_peer+0x119)[0x45a359]
Feb  3 23:42:35 b99 uwsgi: uwsgi(uwsgi_corerouter_loop+0x4b5)[0x45aaf5]
Feb  3 23:42:35 b99 uwsgi: uwsgi(gateway_respawn+0x214)[0x443204]
Feb  3 23:42:35 b99 uwsgi: uwsgi(master_loop+0x3dd)[0x4273dd]
Feb  3 23:42:35 b99 uwsgi: uwsgi(uwsgi_start+0x139e)[0x44754e]
Feb  3 23:42:35 b99 uwsgi: uwsgi(main+0xf90)[0x44aa30]
Feb  3 23:42:35 b99 uwsgi: /lib/libc.so.6(__libc_start_main+0xfd)[0x7f15ee219c4d]
Feb  3 23:42:35 b99 uwsgi: uwsgi[0x416819]

@unbit
Owner

unbit commented Feb 4, 2013

Do you have a minimal command line to reproduce it? Thanks

@prymitive
Contributor Author

This is the config I'm using:

[uwsgi]
logger = syslog
threaded-logger = true
fastrouter = /var/run/uwsgi.socket
chmod-socket = 666
fastrouter-subscription-server = :2626
fastrouter-stats = :2580
fastrouter-processes = 8
subscription-tolerance = 60
master = true
listen = 1000
fastrouter-events = 256
fastrouter-timeout = 600
subscription-dotsplit = true
subscription-algo = lrc

I'm also trying to reproduce it on my dev cluster, but so far I can't trigger it with synthetic benchmarks; maybe it happens when a client disconnects in the middle of a request or something.

@unbit
Owner

unbit commented Feb 4, 2013

I will investigate that tomorrow; sorry, today was a busy day (I skipped 4 releases :P)

@xrmx
Collaborator

xrmx commented Feb 22, 2013

There are two uwsgi_buffer_destroy() calls in uwsgi_cr_peer_del(); since it looks like out_need_free is never set, it should be peer->in. Or we are corrupting memory. Any chance you can compile with debug = True and run it under valgrind --tool=memcheck?
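
To make the suspicion concrete, here is a small self-contained C illustration (not uWSGI source; the struct and field names are invented, loosely modeled on the peer->in / out_need_free discussion above) of why a teardown routine with two destroy calls needs either an ownership flag or NULLed pointers, otherwise a second pass produces exactly the invalid free glibc reports:

#include <stdio.h>
#include <stdlib.h>

/* Invented example; none of these names exist in uWSGI. */
struct peer_example {
    char *in;
    char *out;
    int out_need_free;   /* only free ->out when we actually own it */
};

static void peer_del_example(struct peer_example *p) {
    if (p->in) {
        free(p->in);
        p->in = NULL;    /* without this, a second call double-frees ->in */
    }
    if (p->out && p->out_need_free) {
        free(p->out);
        p->out = NULL;
    }
}

int main(void) {
    struct peer_example p = { malloc(16), malloc(16), 1 };
    peer_del_example(&p);
    peer_del_example(&p);   /* safe only because the pointers were NULLed */
    printf("cleanup ran twice without an invalid free\n");
    return 0;
}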

@prymitive
Contributor Author

I would feel bad doing this on my production routers, but I couldn't trigger it on my dev cluster, so I fear it's the only way. I'll see what I can do.

@xrmx
Collaborator

xrmx commented Feb 22, 2013

I'd try with debug + valgrind on your dev cluster, maybe we get more luck :) Otherwise you can wait for Roberto, since he is way more clueful than me on this issue.

@prymitive
Contributor Author

So far only those two errors are repeating; I'm waiting for a segfault:

==19834== Invalid read of size 2
==19834==    at 0x45D312: fr_recv_uwsgi_vars (fastrouter.c:183)
==19834==    by 0x45CD51: uwsgi_corerouter_loop (corerouter.c:855)
==19834==    by 0x447F75: gateway_respawn (gateway.c:78)
==19834==    by 0x4276DA: master_loop (master.c:450)
==19834==    by 0x44C4B8: uwsgi_start (uwsgi.c:2643)
==19834==    by 0x44F394: main (uwsgi.c:1981)
==19834==  Address 0x9470611 is 1 bytes inside a block of size 4,096 free'd
==19834==    at 0x4C275A2: realloc (vg_replace_malloc.c:525)
==19834==    by 0x43BFC2: uwsgi_buffer_fix (buffer.c:20)
==19834==    by 0x45D2D4: fr_recv_uwsgi_vars (fastrouter.c:177)
==19834==    by 0x45CD51: uwsgi_corerouter_loop (corerouter.c:855)
==19834==    by 0x447F75: gateway_respawn (gateway.c:78)
==19834==    by 0x4276DA: master_loop (master.c:450)
==19834==    by 0x44C4B8: uwsgi_start (uwsgi.c:2643)
==19834==    by 0x44F394: main (uwsgi.c:1981)
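
The combination of uwsgi_buffer_fix()/realloc() in the freeing stack and a read through an address "1 bytes inside a block of size 4,096 free'd" matches a classic dangling-pointer-after-realloc pattern. Here is a self-contained illustration in plain C (generic code, not the fastrouter source) of that pattern:

#include <stdlib.h>
#include <string.h>

/* Self-contained illustration (not uWSGI code) of the pattern the valgrind
 * report points at: a raw pointer taken into a buffer becomes dangling as
 * soon as the buffer is grown with realloc(), because realloc() is allowed
 * to move the allocation. Reading through the old pointer afterwards is
 * exactly an "invalid read ... inside a block of size 4,096 free'd". */
int main(void) {
    size_t len = 4096;
    char *buf = malloc(len);
    if (!buf) return 1;
    memset(buf, 'x', len);

    char *cursor = buf + 1;               /* pointer into the old allocation */

    char *grown = realloc(buf, len * 2);  /* may move the whole block */
    if (!grown) { free(buf); return 1; }
    buf = grown;

    /* The bug being illustrated: 'cursor' still points into the freed block.
     * The fix is to recompute it from the new base (buf + 1) after realloc.
     * The bad dereference is left commented out so the example runs clean. */
    /* char c = *cursor; */
    (void) cursor;

    free(buf);
    return 0;
}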

@prymitive
Contributor Author

I'm running it on my less busy production cluster, so I might need to move it to a more loaded one (or just wait a while).

@xrmx
Collaborator

xrmx commented Feb 22, 2013


This is interesting because it confirms there's something around main_peer->in going bad.

@prymitive
Contributor Author

So far no segfault, only those errors above are showing from time to time. I'll leave it running; maybe I need more traffic (it's 10 AM, so not much is going on right now). If I can't reproduce it I will disable debug and retry. If I still can't get any segfault then it might have been fixed already (or I'm out of luck).
If Roberto merges my --alert-segfault I will get an email once anything segfaults in the future, so it's not gonna take me by surprise.

@prymitive
Contributor Author

Side note:

[uwsgi-fastrouter client_addr: 0.0.0.0 client_port: 0] fr_write(): Broken pipe [plugins/fastrouter/fastrouter.c line 111]

"0.0.0.0" doesn't seem valid; it looks like it's currently hard-coded to this value. Same story with the port.

@prymitive
Contributor Author

So far no issues, so it might be gone.

@prymitive
Contributor Author

No issues after 6 hours; I need to revert to 1.4.5. I don't want to leave it running unattended for the whole weekend.

@prymitive
Contributor Author

valgrind log, in case it's useful:

https://gist.github.com/prymitive/92d3bdc4f472b11bd8e6

@prymitive
Contributor Author

Doesn't happen with the latest 1.9, closing.

@prymitive
Contributor Author

Apr 16 14:29:00 a115 uwsgi: *** glibc detected *** uwsgi: free(): invalid next size (normal): 0x0000000000c4c3f0 ***
Apr 16 14:29:00 a115 uwsgi: ======= Backtrace: =========
Apr 16 14:29:00 a115 uwsgi: /lib/libc.so.6(+0x78bb6)[0x7f7e1e079bb6]
Apr 16 14:29:00 a115 uwsgi: /lib/libc.so.6(cfree+0x73)[0x7f7e1e080483]
Apr 16 14:29:00 a115 uwsgi: uwsgi(uwsgi_buffer_destroy+0x11)[0x43c9b1]
Apr 16 14:29:00 a115 uwsgi: uwsgi(uwsgi_cr_peer_del+0x54)[0x4602f4]
Apr 16 14:29:00 a115 uwsgi: uwsgi(corerouter_close_peer+0x119)[0x460739]
Apr 16 14:29:00 a115 uwsgi: uwsgi(uwsgi_corerouter_loop+0x4b5)[0x460ed5]
Apr 16 14:29:00 a115 uwsgi: uwsgi(gateway_respawn+0x212)[0x44a162]
Apr 16 14:29:00 a115 uwsgi: uwsgi(master_loop+0x33d)[0x4276ed]
Apr 16 14:29:00 a115 uwsgi: uwsgi(uwsgi_start+0x14e7)[0x44eae7]
Apr 16 14:29:00 a115 uwsgi: uwsgi(main+0x1026)[0x451a76]
Apr 16 14:29:00 a115 uwsgi: /lib/libc.so.6(__libc_start_main+0xfd)[0x7f7e1e01fc4d]
Apr 16 14:29:00 a115 uwsgi: uwsgi[0x417099]

Issue is still present, reopening.

@unbit
Owner

unbit commented Apr 19, 2013

Can you confirm the backtrace is always the same?

@prymitive
Contributor Author

A few examples:

uwsgi_master_check_gateways_death() does not always show up in the backtrace, but besides that I don't see any differences. It always ends with:

uwsgi(uwsgi_buffer_destroy+0x11)[0x43d391]
uwsgi(uwsgi_cr_peer_del+0x54)[0x45d474]
uwsgi(corerouter_close_peer+0x119)[0x45db69]

@prymitive
Contributor Author

I've got another segfault in FastRouter:

Apr 24 20:34:51 a226 uwsgi: !!! uWSGI process 6581 got Segmentation Fault !!!
Apr 24 20:34:51 a226 uwsgi: *** backtrace of 6581 ***
Apr 24 20:34:51 a226 uwsgi: uwsgi(uwsgi_backtrace+0x25) [0x44ec15]
Apr 24 20:34:51 a226 uwsgi: uwsgi(uwsgi_segfault+0x21) [0x44ecf1]
Apr 24 20:34:51 a226 uwsgi: /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f64efc9d4a0]
Apr 24 20:34:51 a226 uwsgi: uwsgi(uwsgi_hooked_parse+0x52) [0x41fd92]
Apr 24 20:34:51 a226 uwsgi: uwsgi() [0x461914]
Apr 24 20:34:51 a226 uwsgi: uwsgi(uwsgi_corerouter_loop+0x52e) [0x4612ae]
Apr 24 20:34:51 a226 uwsgi: uwsgi(gateway_respawn+0x1cb) [0x44a9db]
Apr 24 20:34:51 a226 uwsgi: uwsgi(master_loop+0x337) [0x427f97]
Apr 24 20:34:51 a226 uwsgi: uwsgi(uwsgi_start+0x130c) [0x450c0c]
Apr 24 20:34:51 a226 uwsgi: uwsgi(main+0x15b3) [0x416b23]
Apr 24 20:34:51 a226 uwsgi: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f64efc8876d]
Apr 24 20:34:51 a226 uwsgi: uwsgi() [0x416ba1]
Apr 24 20:34:51 a226 uwsgi: *** end of backtrace ***

This might be related to malloc() issues observed before (?)

@prymitive
Contributor Author

Another crash (spotted thanks to airbrake); this time it's different, but it might be related:

May 14 15:57:59 a215 uwsgi: !!! uWSGI process 8743 got Segmentation Fault !!!
May 14 15:57:59 a215 uwsgi: *** backtrace of 8743 ***
uWSGI fastrouter 2(uwsgi_backtrace+0x29) [0x44fc79]
uWSGI fastrouter 2(uwsgi_segfault+0x21) [0x44fdf1]
/lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f38c57ef4a0]
/lib/x86_64-linux-gnu/libc.so.6(realloc+0xb5) [0x7f38c583c715]
uWSGI fastrouter 2(uwsgi_buffer_fix+0x5b) [0x43d38b]
uWSGI fastrouter 2() [0x4632ec]
uWSGI fastrouter 2(uwsgi_corerouter_loop+0x52e) [0x462d4e]
uWSGI fastrouter 2(gateway_respawn+0x1cb) [0x44b82b]
uWSGI fastrouter 2(master_loop+0x337) [0x4285f7]
uWSGI fastrouter 2(uwsgi_start+0x10d6) [0x451e46]
uWSGI fastrouter 2(main+0x15b3) [0x416ff3]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f38c57da76d]
uWSGI fastrouter 2() [0x417071]
*** end of backtrace ***
May 14 15:58:00 a215 uwsgi: respawned uWSGI fastrouter 2 (pid: 30620)

@prymitive
Contributor Author

After looking at the code it seems more and more likely to be what @xrmx already suggested: peer->in is somehow invalid; maybe it gets NULLed in this case?

@unbit
Owner

unbit commented May 14, 2013

I have just changed the retry system to be usable in a thinner window: peer->can_retry is set before the backend connection and set to 0 after connection. It looks like the connection happens when the buffer is not initialized, if an error came in during the first fastrouter bytes.
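
Roughly, the described change narrows the window like this (a sketch with invented names, not the actual commit): can_retry is raised only around the connect attempt, so errors arriving later, when the peer buffers may not be in a retryable state, no longer take the retry path.

#include <stdio.h>

/* Invented sketch of the narrowed retry window described above; the names
 * are illustrative, not the real uWSGI code. */
struct retry_example {
    int can_retry;
    int fd;
};

static int fake_connect(void) { return 3; }   /* stand-in for a real connect() */

static int connect_to_backend(struct retry_example *peer) {
    peer->can_retry = 1;          /* retry allowed only while connecting */
    peer->fd = fake_connect();
    peer->can_retry = 0;          /* cleared as soon as the attempt is done */
    return peer->fd < 0 ? -1 : 0;
}

int main(void) {
    struct retry_example p = { 0, -1 };
    if (connect_to_backend(&p) == 0)
        printf("connected, can_retry=%d\n", p.can_retry);
    return 0;
}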

@prymitive
Contributor Author

I'll push patched FastRouters to my production nodes tomorrow to verify whether it helps.

@prymitive
Contributor Author

Another crash in FastRouter. I had another hang (#239) during high traffic, restarted the FastRouter, and soon after it was restarted I got this segfault:

May 17 21:43:08 a226 uwsgi: !!! uWSGI process 20515 got Segmentation Fault !!!
May 17 21:43:08 a226 uwsgi: *** backtrace of 20515 ***
uWSGI fastrouter 4(uwsgi_backtrace+0x29) [0x44fc79]
uWSGI fastrouter 4(uwsgi_segfault+0x21) [0x44fdf1]
/lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7fd0eae474a0]
uWSGI fastrouter 4(uwsgi_get_subscribe_slot+0x65) [0x430c75]
uWSGI fastrouter 4(uwsgi_add_subscribe_node+0x20) [0x431450]
uWSGI fastrouter 4(uwsgi_corerouter_manage_internal_subscription+0x89) [0x460649]
uWSGI fastrouter 4(uwsgi_corerouter_loop+0x572) [0x462d82]
uWSGI fastrouter 4(gateway_respawn+0x1cb) [0x44b82b]
uWSGI fastrouter 4(master_loop+0x337) [0x4285f7]
uWSGI fastrouter 4(uwsgi_start+0x10d6) [0x451e46]
uWSGI fastrouter 4(main+0x15b3) [0x416ff3]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fd0eae3276d]
uWSGI fastrouter 4() [0x417071]
*** end of backtrace ***
May 17 21:43:08 a226 uwsgi: respawned uWSGI fastrouter 4 (pid: 20569)

Is it safe to use a low fastrouter-timeout (like 15 seconds)? Won't it affect client connections to the FastRouter?

@unbit
Owner

unbit commented May 18, 2013

Is the frequency of crashes reduced after my patch? Is it possible we have found another bug while the previous one has been solved?

@prymitive
Contributor Author

I believe so; the previous crash has not occurred since the last patch, and this one is clearly in a different place.
We are progressing.

@prymitive
Contributor Author

Another segfault:

May 24 22:00:26 a215 uwsgi: !!! uWSGI process 31317 got Segmentation Fault !!!
May 24 22:00:26 a215 uwsgi: *** backtrace of 31317 ***
uWSGI fastrouter 1(uwsgi_backtrace+0x29) [0x44fc79]
uWSGI fastrouter 1(uwsgi_segfault+0x21) [0x44fdf1]
/lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f38fbcf14a0]
uWSGI fastrouter 1(uwsgi_hooked_parse+0x30) [0x420310]
uWSGI fastrouter 1() [0x4633a4]
uWSGI fastrouter 1(uwsgi_corerouter_loop+0x52e) [0x462d3e]
uWSGI fastrouter 1(gateway_respawn+0x1cb) [0x44b82b]
uWSGI fastrouter 1(master_loop+0x337) [0x4285f7]
uWSGI fastrouter 1(uwsgi_start+0x10d6) [0x451e46]
uWSGI fastrouter 1(main+0x15b3) [0x416ff3]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f38fbcdc76d]
uWSGI fastrouter 1() [0x417071]
*** end of backtrace ***
May 24 22:00:27 a215 uwsgi: respawned uWSGI fastrouter 1 (pid: 24915)

uwsgi_hooked_parse() is called during subscription packet parsing, so it may be related to #239 (?)

@unbit
Owner

unbit commented May 25, 2013

I suspect #146 and #239 are the same problem. What about

https://github.com/unbit/uwsgi/blob/master/plugins/corerouter/cr_common.c#L120

https://github.com/unbit/uwsgi/blob/master/plugins/corerouter/cr_common.c#L193

uwsgi_hooked_parse is not checked for errors (it could return -1 if the packet is malformed).

Maybe there is the possibility of a corrupted uwsgi packet; can you retry after adding an uwsgi_log if uwsgi_hooked_parse returns -1?

@prymitive
Contributor Author

I've added uwsgi_error() in case of a <0 return value; once it hangs again I'll check for such errors.

@unbit
Owner

unbit commented Jun 2, 2013

Any news on this problem? I was still not able to reproduce it :(

@prymitive
Contributor Author

I was (un)lucky and this issue hasn't occurred since my last post here; still waiting for more data.

@prymitive
Contributor Author

Turns out I did have those issues, both segfaults with *** glibc detected *** uwsgi: free(): invalid next size and the hang; airbrake doesn't report that.

But there is no trace of the uwsgi_error() calls I added for failing uwsgi_hooked_parse() [1], so it's something else.

1:

diff --git a/plugins/corerouter/cr_common.c b/plugins/corerouter/cr_common.c
index 0971695..31fd984 100644
--- a/plugins/corerouter/cr_common.c
+++ b/plugins/corerouter/cr_common.c
@@ -117,7 +117,9 @@ void uwsgi_corerouter_manage_subscription(struct uwsgi_corerouter *ucr, int id,
        ssize_t len = recv(ugs->fd, bbuf, 4096, 0);
        if (len > 0) {
                memset(&usr, 0, sizeof(struct uwsgi_subscribe_req));
-               uwsgi_hooked_parse(bbuf + 4, len - 4, corerouter_manage_subscription, &usr);
+               if (uwsgi_hooked_parse(bbuf + 4, len - 4, corerouter_manage_subscription, &usr) < 0) {
+                       uwsgi_error("uwsgi_corerouter_manage_subscription()/uwsgi_hooked_parse()");
+               }
                if (usr.sign_len > 0) {
                        // calc the base size
                        usr.base = bbuf + 4;
@@ -190,7 +192,9 @@ void uwsgi_corerouter_manage_internal_subscription(struct uwsgi_corerouter *ucr,
        ssize_t len = recv(fd, bbuf, 4096, 0);
        if (len > 0) {
                memset(&usr, 0, sizeof(struct uwsgi_subscribe_req));
-               uwsgi_hooked_parse(bbuf + 4, len - 4, corerouter_manage_subscription, &usr);
+               if (uwsgi_hooked_parse(bbuf + 4, len - 4, corerouter_manage_subscription, &usr) < 0) {
+                       uwsgi_error("uwsgi_corerouter_manage_internal_subscription()/uwsgi_hooked_parse()");
+               }

                // subscribe request ?
                if (bbuf[3] == 0) {

@prymitive
Contributor Author

This time I've got some new errors:

Jun  8 23:18:57 a215 kernel: [4364304.295484] uwsgi[32667] trap stack segment ip:432007 sp:7fff5d7d94f0 error:0

Jun  8 23:29:01 a215 uwsgi: *** glibc detected *** uWSGI fastrouter 1: corrupted double-linked list: 0x00000000022b21d0 ***

but they probably orbit around the same core issue; the hang from #239 occurs in between these crashes, so both issues are certainly connected.

Can you think of any more debug logging I could add?
The only pattern I see is that it's happening on my node with 2 fastrouter processes; the other nodes (4 or 8 processes) are hitting this issue much more rarely (requests are distributed between them in round-robin fashion).

@unbit
Owner

unbit commented Jun 9, 2013

I think the best approach at this point would be generating a coredump file you can inspect with gdb.

Feel free to send it to me if you want
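
One generic way to make sure a core file is actually written (a sketch of standard POSIX usage, not something uWSGI does for you automatically) is to raise RLIMIT_CORE in the process, in addition to ulimit -c unlimited in the shell and a sane kernel.core_pattern:

#include <stdio.h>
#include <sys/resource.h>

/* Generic illustration: lift the core-file size limit for the current
 * process so a crash can actually produce a dump that gdb can load.
 * Equivalent to running "ulimit -c unlimited" in the shell that starts
 * the daemon; kernel.core_pattern still decides where the file goes. */
int enable_core_dumps(void) {
    struct rlimit rl = { RLIM_INFINITY, RLIM_INFINITY };

    if (setrlimit(RLIMIT_CORE, &rl) < 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }
    return 0;
}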

@unbit
Owner

unbit commented Jun 9, 2013

...oh and remember to add -g to the CFLAGS when you build, like

CFLAGS=-g make

This avoids fully enabling the UWSGI_DEBUG loglines.

@prymitive
Contributor Author

I've set up everything so I should now get core dumps. I'll get back once there is some more info

@unbit
Owner

unbit commented Jun 10, 2013

If it could be useful, I have added --use-abort. I have noted that on some systems setting SIG_DFL for SEGV does not restore coredump generation, while abort() reliably works (at least on Linux).
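
A minimal self-contained sketch of the idea behind --use-abort (not the actual uWSGI handler): the SIGSEGV handler calls abort(), which reliably makes the kernel write a core dump on Linux even where re-raising with SIG_DFL does not.

#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Illustration of the --use-abort idea described above, not the actual
 * uWSGI handler: on SIGSEGV, write a note and abort() so the kernel
 * writes a core dump. Only async-signal-safe calls are used here. */
static void segv_handler(int signum) {
    const char msg[] = "got SIGSEGV, aborting to force a core dump\n";
    (void) signum;
    (void) write(STDERR_FILENO, msg, sizeof(msg) - 1);
    signal(SIGABRT, SIG_DFL);   /* make sure abort() is not caught */
    abort();
}

int main(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = segv_handler;
    sigaction(SIGSEGV, &sa, NULL);

    /* Deliberately crash to exercise the handler path. */
    volatile int *p = NULL;
    *p = 1;
    return 0;
}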

@prymitive
Contributor Author

I haven't had any crash since I pushed the -O0 binaries, but I still had:

Jun 13 22:59:33 a226 uwsgi: *** HARAKIRI ON GATEWAY uWSGI fastrouter 1 (pid: 2255) ***
Jun 13 22:59:34 a226 uwsgi: respawned uWSGI fastrouter 1 (pid: 3632)
Jun 13 22:59:34 a226 uwsgi: *** fastrouter stats server enabled on :2580 fd: 23 ***

No other errors recorded, no malloc error; it just gets respawned (I do have fastrouter-timeout = 30).
Once it crashes I will send the dump and binary.

@unbit
Owner

unbit commented Jun 14, 2013

ok, thanks

@prymitive
Contributor Author

I haven't had any malloc() error since I recompiled with -O0; could this somehow be triggered by @unbit vs gcc optimizations trying to outsmart each other? It seems unlikely, but I always had corrupted memory when the hang occurred, and now I only get the hang.
But that would also mean that the malloc issue and the fastrouter hangs are two separate issues - unless HARAKIRI on my FastRouter is triggered by something else now - could a slow client connection trigger it? What kind of requests/packets does fastrouter-timeout = 30 apply to?
Or maybe memory still gets corrupted, but now in a silent way.
Would I get a core dump on FastRouter HARAKIRI with --use-abort?

@prymitive
Contributor Author

AFAIR this was fixed in #415, closing
