Question: Any plans to add TCP server support? #276
Comments
There are no immediate plans. The current timeline is:
At that point, it will probably make sense to re-assess the input layer so we can have streams as an input, not just an output, which will hopefully make a TCP input better. Is there a specific thing you need TCP for? If it's reliability, could you use the HTTP forwarding with on-host collection?
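For illustration only, here is a rough sketch of what "on-host collection with HTTP forwarding" means at a high level. This is not gostatsd's actual forwarder (which, as explained further down, ships a protobuf format); the listen address, endpoint URL, batch format, and retry policy here are all hypothetical:

```go
package main

import (
	"bytes"
	"log"
	"net"
	"net/http"
	"time"
)

func main() {
	// Accept statsd datagrams from local processes only, so packet loss to a
	// remote server over UDP is no longer a concern.
	conn, err := net.ListenPacket("udp", "127.0.0.1:8125")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	batch := &bytes.Buffer{}
	buf := make([]byte, 64*1024)
	flush := time.NewTicker(1 * time.Second) // forwarders flush frequently
	defer flush.Stop()

	for {
		// Short read deadline so the flush ticker is checked regularly.
		conn.SetReadDeadline(time.Now().Add(100 * time.Millisecond))
		if n, _, err := conn.ReadFrom(buf); err == nil {
			batch.Write(buf[:n])
			batch.WriteByte('\n')
		}

		select {
		case <-flush.C:
			if batch.Len() > 0 {
				forward(batch.Bytes())
				batch.Reset()
			}
		default:
		}
	}
}

// forward sends the batch to the central server over HTTP, retrying on failure.
// The URL is hypothetical; the point is that HTTP gives delivery feedback and
// retries, which raw UDP to a remote host does not.
func forward(body []byte) {
	for attempt := 1; attempt <= 3; attempt++ {
		resp, err := http.Post("http://aggregator.internal:8080/metrics", "text/plain", bytes.NewReader(body))
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode < 300 {
				return
			}
		}
		time.Sleep(time.Duration(attempt) * 500 * time.Millisecond)
	}
	log.Print("dropping batch after retries")
}
```

The real forwarder also consolidates metrics before shipping them (described in the next comment), which is a large part of why this topology holds up better than raw UDP repeating.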
So we have local statsd installed on every machine already, and we use the repeater backend to send metrics to the aggregator. We'll try using local gostatsd and forwarding via HTTP. Question: if we use gostatsd forwarder > server (aggregation), does it aggregate locally and forward to the server, or will it send all the metrics without any aggregation, just like the statsd repeater backend?
If you use a local forwarder on each host, and a central aggregation server, then the forwarder tier does what I call "consolidation" (as opposed to aggregation, mainly because they are slightly different). Consolidation roughly means if you send statsd something like ..
.. then the forwarder tier would send it to the aggregation server as ..
(It's a bit more complicated than that, because a) it doesn't use statsd format, it has a protobuf format, and b) it needs to handle sample rates for both counters and timers.) So the counters are combined into a single value (just like aggregation), and the timers have all the values sent as an array (which is not aggregation, but is much more efficient to process). Gauges are sent as a single value, even if there are multiple, which is also just like aggregation. The aggregation tier can then do proper timer aggregation, because it sees data from all the hosts.

You do need the flush interval on the forwarder tier to be more frequent - I suggest an order of magnitude faster than the flush interval on the aggregator: 1s vs 10s, or 5s vs 1m, for example.

Hopefully that answers your question, and helps solve the bigger picture question. Let me know if it doesn't, and I'll get you sorted. Note that in forwarder mode, there's no backend on the forwarder layer.
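To make the consolidation concrete, here is a hypothetical sketch (plain Go, not gostatsd's actual code or its protobuf wire format) of what the forwarder conceptually does with one flush interval's worth of metrics: counters are summed, timer values are kept as an array for the aggregator, and gauges keep only the last value:

```go
package main

import "fmt"

func main() {
	// Hypothetical metrics received by the forwarder in one flush interval,
	// shown as (name, value, type) tuples instead of raw statsd lines.
	type sample struct {
		name  string
		value float64
		typ   string // "c" = counter, "ms" = timer, "g" = gauge
	}
	received := []sample{
		{"requests", 1, "c"},
		{"requests", 1, "c"},
		{"requests", 1, "c"},
		{"latency", 320, "ms"},
		{"latency", 410, "ms"},
		{"queue_depth", 7, "g"},
		{"queue_depth", 9, "g"},
	}

	counters := map[string]float64{} // summed, like aggregation
	timers := map[string][]float64{} // all values kept, so the aggregator can do percentiles
	gauges := map[string]float64{}   // last value wins, like aggregation

	for _, s := range received {
		switch s.typ {
		case "c":
			counters[s.name] += s.value
		case "ms":
			timers[s.name] = append(timers[s.name], s.value)
		case "g":
			gauges[s.name] = s.value
		}
	}

	// What the forwarder would conceptually send to the aggregation server.
	fmt.Println(counters) // map[requests:3]
	fmt.Println(timers)   // map[latency:[320 410]]
	fmt.Println(gauges)   // map[queue_depth:9]
}
```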
Thanks for the detailed explanation. We started replacing statsd with gostatsd on Tier 3 apps [planning to use the HTTP forwarder next quarter], and for old apps we will be using the local statsd repeater backend > central gostatsd. We had Heka before for central statsd aggregation, and when testing gostatsd and comparing it with Heka, we noticed it starts dropping UDP packets and had to increase …
Another thing I noticed, even though stats… @tiedotguy
If it's dropping packets, that's because something isn't processing fast enough, and back-pressure is propagating all the way through the system. I need to do a proper documentation write-up on where the pressure points are and how to monitor them. I'll try to do a rough outline here (a bunch of it you may already know, since you're setting them already, but I'll probably turn this comment into proper tuning documentation at some point), working backwards from the backend through to the network buffers in the kernel. I believe they are:
So having said all of that, there are generally two sources of packet loss:
When it's a hot aggregator, and you increase readers or parsers (or the aggregator channel size), what you're really doing is putting a little bit more buffer space in the system. Each reader and parser might be holding on to 10 datagrams, so by increasing the number of them, you're not actually improving throughput, you're just adding a bit of space to absorb the spike I mentioned previously. If you're that close to the edge that it helps, then it probably won't help for much longer.

As a general rule, gostatsd will scale linearly on the number of cores it has until it hits a hot aggregator, at which point CPU will flatline. This is best measured by watching the CPU: spare CPU with packet loss means a hot aggregator; no spare CPU with packet loss means you need more cores.

This equation changes somewhat with HTTP forwarding. The collection will scale linearly, because the aggregators are purely for the backend. Because of the consolidation done on the collection hosts, the aggregator host has less work. It still has the same bottlenecks, but there's less actual work done. There's also a more efficient data structure used. For future work, #210 should remove most (all?) of the required aggregator affinity, allowing the system to scale linearly until the CPU or network saturates.

As for …

Hope that's useful - as I mentioned, I'll probably reformat this comment into proper documentation at some point, so I apologize for redundancies.
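To make the back-pressure and buffering point concrete, here is a heavily simplified, hypothetical model of a reader → parser → aggregator pipeline (not gostatsd's actual internals). The buffered channels are the "little bit more buffer space" described above: enlarging them, or adding readers and parsers, absorbs a short spike but does nothing for sustained throughput if the aggregator stage is the bottleneck; once the channels fill, reads stall and the kernel starts dropping datagrams.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Channel capacities play the role of the reader/parser/aggregator buffers:
	// they absorb short spikes, but do not raise sustained throughput.
	datagrams := make(chan []byte, 100) // reader -> parser
	metrics := make(chan string, 100)   // parser -> aggregator

	// Reader: pulls datagrams off the socket. If `datagrams` is full, this
	// blocks, the socket buffer fills, and the kernel drops packets.
	go func() {
		for i := 0; ; i++ {
			datagrams <- []byte(fmt.Sprintf("requests:%d|c", i))
		}
	}()

	// Parser: turns datagrams into metrics. Adding parsers adds a little more
	// in-flight buffer space, but cannot unblock a hot aggregator.
	go func() {
		for d := range datagrams {
			metrics <- string(d)
		}
	}()

	// Aggregator: one goroutine per shard. If one metric name hashes all the
	// traffic to the same shard, this is the "hot aggregator" and CPU flatlines.
	go func() {
		for m := range metrics {
			_ = m
			time.Sleep(time.Millisecond) // simulate per-metric work
		}
	}()

	time.Sleep(2 * time.Second)
	fmt.Println("backlog:", len(datagrams)+len(metrics), "items buffered; anything beyond this backs up to the socket and is dropped")
}
```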
Closing this off for now, but feel free to re-open or open a new issue if you have further questions.