Track client connection information reported from zookeeper cons/stat command #10475

Closed
matschaffer opened this issue Feb 1, 2019 · 7 comments
Labels: Metricbeat, module, Team:Integrations

Comments

@matschaffer
Contributor

#8938 was originally opened to address the need to track zkid from the stat command. The srvr command also exposes this data, and tracking is implemented in #10341.

To get a more complete view of ZooKeeper activity, we'd also like to track the output of cons (or possibly the first part of stat; still need to confirm), which displays information about all connected clients.

@matschaffer
Contributor Author

matschaffer commented Feb 4, 2019

Just confirming that cons does indeed look the same as the top half of stat.

Since we haven't had historical tracking of this data yet, it's hard to say for sure how we'd use it. But it sure looks useful.

@pmoust may have some thoughts here as well

$ echo cons | nc -q -1 localhost $(get_zk_client_port)
 /172.17.42.1:45386[1](queued=0,recved=623,sent=623,sid=0xa000003f177000a,lop=PING,est=1549248011213,to=40000,lcxid=0x5,lzxid=0x10000046c,lresp=8531400,llat=0,minlat=0,avglat=0,maxlat=2)
 /127.0.0.1:39716[1](queued=0,recved=2427,sent=2464,sid=0xa000003f177000d,lop=PING,est=1549248038923,to=15000,lcxid=0x31f,lzxid=0x10000046c,lresp=8537386,llat=0,minlat=0,avglat=1,maxlat=313)
 /127.0.0.1:40410[1](queued=0,recved=1113,sent=1395,sid=0xa000003f1770012,lop=PING,est=1549248258602,to=40000,lcxid=0x245,lzxid=0x10000046c,lresp=8536117,llat=0,minlat=0,avglat=0,maxlat=11)
 /127.0.0.1:39638[1](queued=0,recved=1765,sent=1788,sid=0xa000003f1770004,lop=PING,est=1549247983643,to=15000,lcxid=0x72,lzxid=0x10000046c,lresp=8535730,llat=0,minlat=0,avglat=0,maxlat=135)
 /0:0:0:0:0:0:0:1:38220[1](queued=0,recved=1661,sent=1664,sid=0xa000003f1770007,lop=PING,est=1549248006538,to=15000,lcxid=0xe,lzxid=0x10000046c,lresp=8536614,llat=0,minlat=0,avglat=0,maxlat=10)
 /127.0.0.1:39708[1](queued=0,recved=1656,sent=1657,sid=0xa000003f177000c,lop=PING,est=1549248036898,to=15000,lcxid=0xf,lzxid=0x10000046c,lresp=8536669,llat=0,minlat=0,avglat=0,maxlat=54)
 /0:0:0:0:0:0:0:1:38236[1](queued=0,recved=1677,sent=1690,sid=0xa000003f1770009,lop=PING,est=1549248008677,to=15000,lcxid=0x22,lzxid=0x10000046c,lresp=8536359,llat=0,minlat=0,avglat=0,maxlat=3)
 /127.0.0.1:39620[1](queued=0,recved=1974,sent=2022,sid=0xa000003f1770003,lop=PING,est=1549247974385,to=15000,lcxid=0x126,lzxid=0x10000046c,lresp=8534462,llat=0,minlat=0,avglat=0,maxlat=127)
 /127.0.0.1:39904[1](queued=0,recved=3031,sent=3614,sid=0xa000003f177000f,lop=PING,est=1549248086472,to=15000,lcxid=0x58d,lzxid=0x10000046c,lresp=8537784,llat=0,minlat=0,avglat=0,maxlat=86)
 /127.0.0.1:39990[1](queued=0,recved=1628,sent=1628,sid=0xa000003f1770010,lop=PING,est=1549248115384,to=15000,lcxid=0x2,lzxid=0x10000046c,lresp=8536879,llat=1,minlat=0,avglat=0,maxlat=3)
 /127.0.0.1:40148[1](queued=0,recved=663,sent=688,sid=0xa000003f1770011,lop=PING,est=1549248189998,to=40000,lcxid=0x40,lzxid=0x10000046c,lresp=8535390,llat=0,minlat=0,avglat=0,maxlat=23)
 /127.0.0.1:39840[1](queued=0,recved=1640,sent=1640,sid=0xa000003f177000e,lop=PING,est=1549248062896,to=15000,lcxid=0x4,lzxid=0x10000046c,lresp=8536052,llat=0,minlat=0,avglat=0,maxlat=16)
 /0:0:0:0:0:0:0:1:38210[1](queued=0,recved=1703,sent=1707,sid=0xa000003f1770006,lop=PING,est=1549248002096,to=15000,lcxid=0x33,lzxid=0x10000046c,lresp=8538453,llat=0,minlat=0,avglat=0,maxlat=23)
 /0:0:0:0:0:0:0:1:38228[1](queued=0,recved=2328,sent=2391,sid=0xa000003f1770008,lop=PING,est=1549248007279,to=15000,lcxid=0x2ba,lzxid=0x10000046c,lresp=8538434,llat=0,minlat=0,avglat=0,maxlat=309)
 /127.0.0.1:39594[1](queued=0,recved=1668,sent=1671,sid=0xa000003f1770000,lop=PING,est=1549247961372,to=15000,lcxid=0xe,lzxid=0x10000046c,lresp=8537725,llat=0,minlat=0,avglat=0,maxlat=76)
 /0:0:0:0:0:0:0:1:38204[1](queued=0,recved=1657,sent=1657,sid=0xa000003f1770005,lop=PING,est=1549248001999,to=15000,lcxid=0x5,lzxid=0x10000046c,lresp=8537508,llat=0,minlat=0,avglat=0,maxlat=3)
 /127.0.0.1:39698[1](queued=0,recved=1052,sent=1190,sid=0xa000003f177000b,lop=PING,est=1549248012150,to=30000,lcxid=0xf3,lzxid=0x10000046c,lresp=8531925,llat=0,minlat=0,avglat=0,maxlat=35)
 /172.17.42.1:41364[0](queued=0,recved=1,sent=0)
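
For illustration only, here is a minimal Python sketch of how one line of the cons output above could be split into fields. It is not the Metricbeat implementation (which is written in Go); the names `CONS_LINE`, `parse_cons_line`, and `interest_ops` are hypothetical, and the line layout is assumed from the sample output in this comment rather than from a documented format.

```python
import re

# Assumed layout of one `cons` line, based only on the sample output above:
#   /<host>:<port>[<number>](<key>=<value>,<key>=<value>,...)
# The host may itself contain colons (IPv6), so the port is taken as the
# last colon-separated component before the opening bracket.
CONS_LINE = re.compile(
    r"^\s*/(?P<host>[^\[]+):(?P<port>\d+)\[(?P<ops>\d+)\]\((?P<fields>.*)\)\s*$"
)


def parse_cons_line(line):
    """Return a dict describing one client connection, or None if no match."""
    match = CONS_LINE.match(line)
    if match is None:
        return None
    conn = {
        "host": match.group("host"),
        "port": int(match.group("port")),
        # Hypothetical field name for the bracketed number.
        "interest_ops": int(match.group("ops")),
    }
    for pair in match.group("fields").split(","):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # Plain decimal counters become ints; hex ids (sid, lcxid, lzxid)
        # and operation names (lop) are kept as strings.
        conn[key] = int(value) if value.isdigit() else value
    return conn
```

Lines that do not match the assumed layout return None, so a caller can skip blank lines or unexpected output instead of failing.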

@sayden
Contributor

sayden commented Mar 13, 2019

@matschaffer I'm finishing the PR to add cons too, but I really can't figure out what the number between brackets is.

I mean the [1] in 172.17.42.1:45386[1](queued=0... The ZooKeeper docs only say this about cons: "New in 3.3.0: List full connection/session details for all clients connected to this server. Includes information on numbers of packets received/sent, session id, operation latencies, last operation performed, etc."

@matschaffer
Contributor Author

Wonder if @stejacks or @mattfield might know offhand. If not we can always go code-diving 🏊

@pmoust
Member

pmoust commented Mar 14, 2019

This comes from Netty's Channel InterestOps, as seen in ZooKeeper's dumpConnectionInfo().

That said, please note exactly how getInterestOps() is implemented in the ZooKeeper codebase. It is a bit weird.

I think you can name the field interestOps, unless Elastic Common Schema (or Netty itself) already has a different name for ChannelState/Operations.

@pmoust
Member

pmoust commented Mar 14, 2019

/cc @phunt @ivmaykov @anmolnar
Does the above make sense? What name and type would you suggest for this cons output field in Metricbeat?

@phunt

phunt commented Mar 14, 2019

Yes, that makes sense. That field is the interest information on either NIO or Netty, which is why the code is a bit obscure: it reports interest on either the SelectionKey or the Channel, respectively. I've found it most useful when debugging overload conditions (the ZK server will disable recv in order to put backpressure on clients). "interestOps" seems fine. It might be helpful to break the values out into human-readable form, but unfortunately ZK itself is not doing a good job here.
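
As a rough sketch of what "human readable form" could look like, the Python snippet below decodes the bracketed number using the operation bits defined by Java NIO's SelectionKey (OP_READ = 1, OP_WRITE = 4, OP_CONNECT = 8, OP_ACCEPT = 16). This assumes the server is using the NIO connection factory; as noted above, the Netty path computes the value differently, so the mapping may not hold there. The names `NIO_OPS` and `describe_interest_ops` are hypothetical.

```python
# Operation bits from java.nio.channels.SelectionKey.
# Assumption: the value in brackets is an NIO interest set; the Netty
# implementation in ZooKeeper may encode it differently.
NIO_OPS = {1: "read", 4: "write", 8: "connect", 16: "accept"}


def describe_interest_ops(value):
    """Return the names of the NIO operations set in an interest-ops value."""
    names = [name for bit, name in NIO_OPS.items() if value & bit]
    return names or ["none"]
```

Under that assumption, the [1] on most connections in the sample output would mean the server is interested in reading from the client, and the [0] on the last line would mean no interest is currently registered.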

@matschaffer
Contributor Author

This should be fixed now that #11070 is merged.


7 participants