
Error when trying to run tester #244

Closed
TomMizrachi opened this issue Jun 18, 2019 · 7 comments

Comments

@TomMizrachi

```
~/gostatsd/cmd/tester$ make setup

go get -u github.com/githubnemo/CompileDaemon
go get -u github.com/alecthomas/gometalinter
go: finding github.com/nicksnyder/go-i18n/i18n latest
go: finding github.com/alecthomas/units latest
build github.com/alecthomas/gometalinter: cannot load github.com/nicksnyder/go-i18n/i18n: cannot find module providing package github.com/nicksnyder/go-i18n/i18n
make: *** [setup] Error 1
```

It seems the author of go-i18n moved the i18n package into a new v2 module, so the old import path no longer resolves.

@TomMizrachi
Author

I would have opened a PR to fix this myself, but the repo is archived:
https://github.com/alecthomas/gometalinter/blob/b242b54b75005af59cb3a06620085146709b598a/vendor/manifest

@tiedotguy
Collaborator

Hi

Honestly, the entire tester section hasn't been maintained, and I'm not sure it will work even if it gets past linting. We switched to golangci-lint a couple of months back (#234), and this portion was missed.

make check in the project root will install the binary, but that's more a side effect of the build/test process than anything else.

@TomMizrachi
Author

I see. Is there any other way I can test the throughput of gostatsd on Linux?
Or could you add some more information about this to the README?

Anyway, thanks for the quick response :)

@tiedotguy
Collaborator

I used to use an internal load tester, but I don't have access to it anymore. @aelse, would you mind opening the repo? I had some local changes which I didn't save, but it's probably a better start than the tester binary.

I can tell you a bit about my experience scaling and running it in production. Generally it scales linearly with the number of cores you give it, at about 15-25k metrics/second/core. At the high end of that range you'll likely hit packet loss, so it's important to watch for that at the host layer. It's also much more packets-per-second intensive than bandwidth intensive, so you may find your PPS plateaus even when you have CPU to spare. You'll also get better throughput if clients send multiple metrics per packet.

The big killer for performance is hot metrics. Metrics are distributed deterministically to aggregators on a hash of name+host (not tags, and if --ignore-host is used, it might be only name). If a single aggregator is overloaded, that causes back pressure through the system, and eventual packet loss. I have a plan to fix it (#210), but haven't had the time to get to it.

As the cardinality sent to the backend increases, so too does the time to flush; if it exceeds your flush interval, that flush is skipped. This can lead to unexpected behaviour, such as higher apparent incoming throughput because you're flushing half as often and not spending that CPU. It can also be confusing when querying the actual backend.

I always struggled to find a good load-generator profile, because we had such a wide variety: some clients sent 1 metric per packet, some sent 80 (jumbo frames ftw). Some clients had everything on one metric (leading to hot aggregators), and some didn't use tags, so their metrics spread out very evenly.

In the end, we moved to the distributed model, with forwarder nodes that can do the majority of the work, and forward over HTTP for final aggregation. I'm not even sure what the raw metrics/second is now. It's still not horizontally scalable yet, but at least one bottleneck has been removed :)

Pretty much all of this is covered by internal metrics, but only the Datadog backend has good metrics for its behaviour.

@tiedotguy
Collaborator

Also on the note of updating docs - I really want to get the system horizontally scalable, and then rewrite them, documenting the different deployment models.

I want to remove limitations, not document them :)

@TomMizrachi
Author

Thanks for the detailed answer! :)
I'll close the issue.

@tiedotguy
Collaborator

Hi @TomMizrachi, quick FYI - If you're still interested in the topic, I've just pushed a branch with a new load tester on it (#332). Minimal deps, and should be much simpler to build, with simple command line options.
