This repository has been archived by the owner on Oct 6, 2018. It is now read-only.

It congests the web traffic on the node after running for a while #30

Open
siyuanh opened this issue Aug 7, 2014 · 12 comments

@siyuanh

siyuanh commented Aug 7, 2014

After running this web-console for a while, the connection table is full of connections to the brokers.

@siyuanh
Author

siyuanh commented Aug 7, 2014

As I understand it, nothing the UI currently does requires connecting to the brokers. You can get everything from ZooKeeper, correct?

@unclebilly

I have seen this as well. After running the application for several days, the output of netstat showed many hundreds of open connections to Kafka brokers. Eventually the application ceased to function because it could no longer open any new sockets: the process had reached its open-file limit.

@cjmamo cjmamo added the bug label Aug 7, 2014
@cjmamo
Owner

cjmamo commented Aug 9, 2014

The console regularly connects to brokers to retrieve partition log sizes so that you can view partition size over time. I've checked the code and the client is closed after retrieving the log size. This will require further investigation.

@cjmamo
Owner

cjmamo commented Aug 9, 2014

@siyuanh & @unclebilly, as a temporary workaround, you can increase the Offset Fetch Interval in Settings to reduce the rate at which connections are created.

@PAStheLoD

root@queue3:/opt/kafka-web-console# ps aux | grep kafka-web | grep -v grep | awk '{ print $2 }'
23907
root@queue3:/opt/kafka-web-console# lsof -n | grep kafka-console | grep IPv6 | wc -l
4601013
root@queue3:/opt/kafka-web-console# lsof -n | grep kafka-console | grep IPv6 | head
java      23907       kafka-console  142u     IPv6           38815657       0t0        TCP *:9000 (LISTEN)
java      23907       kafka-console  147u     IPv6           38927934       0t0        TCP 192.168.115.43:50926->192.168.115.41:9092 (ESTABLISHED)
java      23907       kafka-console  148u     IPv6           38949202       0t0        TCP 192.168.115.43:50319->192.168.115.41:9092 (ESTABLISHED)
java      23907       kafka-console  149u     IPv6           38829209       0t0        TCP 192.168.115.43:50320->192.168.115.41:9092 (ESTABLISHED)
....

Also have a stacktrace, if you are interested.

I'm using systemd to limit file descriptors to 65000, but interestingly the highest descriptor number I've been able to find in the lsof output was 9999.

Oh, sorry for the edit after edit, but this is just lsof's royal stupidity: it prints an asterisk and 3 digits (*123) when the descriptor number is above 9999.

java      23907       kafka-console 9996u     IPv6           38962963       0t0        TCP 192.168.115.43:59860->192.168.115.41:9092 (ESTABLISHED)
java      23907       kafka-console 9997u     IPv6           38962964       0t0        TCP 192.168.115.43:59861->192.168.115.41:9092 (ESTABLISHED)
java      23907       kafka-console 9998u     IPv6           38962965       0t0        TCP 192.168.115.43:38020->192.168.115.42:9092 (ESTABLISHED)
java      23907       kafka-console 9999u     IPv6           38962966       0t0        TCP 192.168.115.43:59863->192.168.115.41:9092 (ESTABLISHED)
java      23907       kafka-console *000u     IPv6           38965025       0t0        TCP 192.168.115.43:59864->192.168.115.41:9092 (ESTABLISHED)
java      23907       kafka-console *001u     IPv6           38965026       0t0        TCP 192.168.115.43:38023->192.168.115.42:9092 (ESTABLISHED)
java      23907       kafka-console *002u     IPv6           38965742       0t0        TCP 192.168.115.43:37670->192.168.115.43:9092 (ESTABLISHED)
java      23907       kafka-console *003u     IPv6           38965027       0t0        TCP 192.168.115.43:38025->192.168.115.42:9092 (ESTABLISHED)
java      23907       kafka-console *004u     IPv6           38965028       0t0        TCP 192.168.115.43:59868->192.168.115.41:9092 (ESTABLISHED)
java      23907       kafka-console *005u     IPv6           38966541       0t0        TCP 192.168.115.43:59869->192.168.115.41:9092 (ESTABLISHED)
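
Because the FD column is abbreviated like this, it can be easier to summarize the listing by remote endpoint than to look for the highest descriptor number. A minimal sketch, assuming lsof output in the layout shown above arrives on stdin; the function name `count_by_broker` is made up for illustration:

```shell
# Hypothetical helper: reads lsof-style lines on stdin and prints
# "<count> <remote host:port>" for every ESTABLISHED TCP connection.
count_by_broker() {
  awk '
    /\(ESTABLISHED\)/ {
      # The NAME field looks like local:port->remote:port. It is the
      # second-to-last field because "(ESTABLISHED)" comes last.
      n = split($(NF - 1), ends, "->")
      if (n == 2) count[ends[2]]++
    }
    END { for (r in count) print count[r], r }
  ' | sort -rn
}

# Usage (same PID as in the listing above):
#   lsof -n -p 23907 | count_by_broker
```

A steadily growing count for one broker endpoint would point at connections that are opened and never closed.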

@ibanner56

I think I fixed the issue. I'm going to let it sit for a little while longer, but I'm not seeing any unbounded increase in the number of open files. If there aren't any additional problems, I'll push my changes to my fork tomorrow.

Update: When I run lsof -n | grep <PID> | wc -l the number isn't changing by more than ~10, and it goes back down after a bit. When I run netstat -an | grep ESTA | grep 9092 | wc -l the number continues to rise...

Update 2: Ah, wait, the netstat return value just dropped by 1000...

@ibanner56

Alright, after >12 hours the netstat count is still below 4000 and the lsof count is below 200. I think this works.

ibanner56 pushed a commit to ibanner56/kafka-web-console that referenced this issue Dec 9, 2014
@ibanner56

Well, this only fixed part of the issue, apparently. I didn't notice because I was only checking for connections on port 9092, but it's still leaving open sockets on 9091 and 9090. The number of connections on port 9092 no longer increases without bound, however.

The open file count is still less than 200, so that exception is gone, but instead of crashing it just continuously fails to open a socket.
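
Grouping connections by remote port, rather than grepping for a single port, would surface leaks on ports such as 9091 and 9090 as well. A rough sketch, assuming the Linux net-tools column layout of `netstat -an` (proto, recv-q, send-q, local address, foreign address, state); the function name `established_by_port` is hypothetical:

```shell
# Hypothetical helper: reads `netstat -an` output on stdin and prints
# "<count> <remote port>" for ESTABLISHED TCP connections.
# Assumes the Linux net-tools layout, where field 5 is the foreign
# address and field 6 is the connection state.
established_by_port() {
  awk '
    $1 ~ /^tcp/ && $6 == "ESTABLISHED" {
      port = $5
      sub(/.*:/, "", port)   # keep only what follows the last colon
      count[port]++
    }
    END { for (p in count) print count[p], p }
  ' | sort -rn
}

# Usage:
#   netstat -an | established_by_port
```

Matching on the foreign-address field also avoids counting lines where 9092 merely appears as a substring of some other number.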

@ibanner56

Alright, I replaced the finagle-kafka library with a different kafka connection system and the number of open connections is consistently remaining below 20. I can only assume that okapies' library wasn't properly closing connections.

https://github.com/ibanner56/kafka-web-console

@PAStheLoD

Thanks for the fix, works wonderfully for us. Let's hope it gets merged.

@foovungle

Has this been merged? Or should we keep using @ibanner56's fork?

ches added a commit to playbasis/kafka-web-console that referenced this issue Apr 12, 2015
* ibanner56/master:
  Sped up LowLevelConsumer, since we don't need to worry about writes.
  Uncomment a logger statement
  Removed unnecessary line.
  future is deprecated, use Future
  Removed unused imports
  Added author attributions
  Actually fixes cjmamo#30.
  Typo.
  Closed on the wrong side of the braces.
  Fixes cjmamo#30
  Forgot to remove a debug line
  Forgot a failure case
  Modified delete workflow to prevent incomplete deletes.
@marcinszymaniuk

I'm running your fork, and initially it looked much better: I could use the app for more than a minute, which wasn't the case before. But after being up and running over a weekend, it hangs again with a lot of open files. I haven't had time to investigate, but let me know if you need any additional info.
$ lsof | grep 31434 |wc -l
327756
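
A count from `lsof | grep <PID> | wc -l` can overstate the number of open descriptors, since grep matches the number anywhere in a line and lsof also lists entries that are not descriptors (cwd, txt, mem); some lsof builds may additionally repeat rows per thread. On Linux, reading /proc directly gives a number that can be compared with the limit. A minimal sketch; the function name `fd_usage` and its optional second argument are made up for illustration:

```shell
# Hypothetical helper (Linux only): prints "<open fds> <soft limit>"
# for a PID. The optional second argument overrides the /proc root and
# exists only so the function can be tried against a copied tree.
fd_usage() {
  pid="$1"
  root="${2:-/proc}"
  # Each entry in <root>/<pid>/fd is one open descriptor.
  # tr strips the padding that some wc implementations add.
  open=$(ls "$root/$pid/fd" | wc -l | tr -d ' ')
  # In <root>/<pid>/limits the soft limit is the 4th whitespace-separated
  # field of the "Max open files" row.
  limit=$(awk '/^Max open files/ { print $4 }' "$root/$pid/limits")
  echo "$open $limit"
}

# Usage (PID from the comment above; needs permission to read that
# process's /proc entries):
#   fd_usage 31434
```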

7 participants