This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

"Too many open files" messages #411

Closed
bboreham opened this issue Feb 23, 2015 · 2 comments

Comments

@bboreham
Contributor

A user writes:

We are getting weave routers stuck in a loop printing "weave 2015/02/23 06:51:31.081132 accept tcp4 0.0.0.0:6783: too many open files" in the logs. It's filling up the disk because it logs that line over and over without stopping.

In this case, it’s using all 1024 available FDs:

    $ lsof -a -p 9821

    COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
    weaver  9821 root  cwd    DIR   0,40     4096     12 /home/weave
    weaver  9821 root  rtd    DIR   0,40     4096      2 /
    weaver  9821 root  txt    REG   0,40 11348166     20 /home/weave/weaver
    weaver  9821 root  mem    REG    0,7           18729 socket:[18729] (stat: No such file or directory)
    weaver  9821 root  mem    REG    0,7           18727 socket:[18727] (stat: No such file or directory)
    weaver  9821 root    0r   CHR    1,3      0t0  17540 /dev/null
    weaver  9821 root    1w  FIFO    0,8      0t0  17291 pipe
    weaver  9821 root    2w  FIFO    0,8      0t0  17292 pipe
    weaver  9821 root    3r   CHR    1,9      0t0  17671 /dev/urandom
    weaver  9821 root    4u  sock    0,7      0t0  18727 can't identify protocol
    weaver  9821 root    5u  sock    0,7      0t0  18729 can't identify protocol
    weaver  9821 root    6u  sock    0,7      0t0  18730 can't identify protocol
    weaver  9821 root    7u  0000    0,9        0   5258 anon_inode
    weaver  9821 root    8u  sock    0,7      0t0  18731 can't identify protocol
    weaver  9821 root    9u  sock    0,7      0t0  18733 can't identify protocol
    weaver  9821 root   10u  sock    0,7      0t0  20147 can't identify protocol
    [...]
    weaver  9821 root   50u  sock    0,7      0t0 901856 can't identify protocol
    weaver  9821 root   51u  sock    0,7      0t0 901884 can't identify protocol
    weaver  9821 root   52u  sock    0,7      0t0 902019 can't identify protocol
    weaver  9821 root   53u  sock    0,7      0t0 902049 can't identify protocol
    weaver  9821 root   54u  sock    0,7      0t0 902100 can't identify protocol
    […]
@bboreham
Contributor Author

Logs from the weave process:

    {"log":"weave 2015/02/22 21:14:50.606176 ->[169.54.194.116:34228] connection accepted\n","stream":"stderr","time":"2015-02-22T21:14:50.606841817Z"}
    {"log":"weave 2015/02/22 21:15:12.244215 ->[54.171.215.239:52754] connection accepted\n","stream":"stderr","time":"2015-02-22T21:15:12.244847838Z"}
    {"log":"weave 2015/02/22 21:15:20.610539 ->[169.54.194.116:34234] connection accepted\n","stream":"stderr","time":"2015-02-22T21:15:20.611197317Z"}
    {"log":"weave 2015/02/22 21:15:42.489248 ->[54.171.215.239:52760] connection accepted\n","stream":"stderr","time":"2015-02-22T21:15:42.489899216Z"}
    {"log":"weave 2015/02/22 21:15:50.613010 ->[169.54.194.116:34238] connection accepted\n","stream":"stderr","time":"2015-02-22T21:15:50.613633129Z"}

Every 30 seconds, each of two different peers opens a new TCP connection; weave accepts it and then does nothing else with it.

Because the disk filled up with excess error messages, we are unable to get a stack-trace dump to see what is going on.
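A Go program prints every goroutine's stack to stderr when it receives SIGQUIT, but the process then exits. A hedged sketch of an on-demand alternative that keeps the router running: a SIGUSR1 handler (an arbitrary, Unix-only choice) writes a full dump built with runtime.Stack. The function names are hypothetical and not part of weave. Note that the JSON log format shown above suggests stderr was being captured to the same log file that filled the disk, so such a dump would only be readable once the flood stopped or space was freed.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"runtime"
	"syscall"
)

// allGoroutineStacks returns a text dump of every goroutine's stack,
// doubling the buffer until the whole dump fits.
func allGoroutineStacks() []byte {
	buf := make([]byte, 64<<10)
	for {
		n := runtime.Stack(buf, true) // true = include all goroutines
		if n < len(buf) {
			return buf[:n]
		}
		buf = make([]byte, 2*len(buf))
	}
}

// dumpStacksOnSignal writes a stack dump to stderr each time the process
// receives SIGUSR1, without terminating it.
func dumpStacksOnSignal() {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGUSR1)
	go func() {
		for range ch {
			os.Stderr.Write(allGoroutineStacks())
		}
	}()
}

func main() {
	dumpStacksOnSignal()
	// Print one dump immediately to show the format.
	fmt.Printf("%s", allGoroutineStacks())
}
```

With the handler installed, running kill -USR1 on the router's PID would print the dump without stopping the process.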

Inspecting the code that runs after the "connection accepted" message suggests that it is hanging in handshake(), possibly because the Peers data structure is locked. Another theory is that sending the initial handshake data has blocked; attempts to force this latter condition in tests have failed.

@rade
Member

rade commented Mar 6, 2015

I do wonder whether the deadlock issues we fixed recently, referenced above, have resolved this. How can we convince ourselves of that? Or, failing that, what else can we do here?

rade closed this as completed Apr 11, 2015
rade added this to the 0.10.0 milestone Apr 18, 2015