Conversation
Have you done any testing of this? What kind? With what Kafka version?
Yeah, tested with Kafka master: https://snapshot.raintank.io/dashboard/snapshot/gXmIVHiTTDQSsJVXWddyboijdl2jdufg
Can we strip out the 13 MiB of testdata from github.com/pierrec/lz4 (rewrite the commit that introduced it)?
scripts/build.sh
Outdated
@@ -1,4 +1,8 @@
 #!/bin/bash
 
+set -ex
I don't think scripts should print out every single thing they do; that would only be useful during development of the script.
@@ -285,7 +285,7 @@ partition-scheme = bySeries
 # offset to start consuming from. Can be one of newest, oldest, last or a time duration
 # When using a duration but the offset request fails (e.g. Kafka doesn't have data so far back), metrictank falls back to `oldest`.
 # Should match your kafka-mdm-in setting
-offset = last
+offset = oldest
This results in a minute-long stream of
metrictank_1 | 2018/05/08 09:17:42 [W] stats dialing localhost:2003 failed: dial tcp 127.0.0.1:2003: connect: connection refused. will retry
metrictank_1 | 2018/05/08 09:17:43 [W] stats dialing localhost:2003 failed: dial tcp 127.0.0.1:2003: connect: connection refused. will retry
statsdaemon_1 | 2018/05/08 09:17:43 WARN: dialing metrictank:2003 failed: dial tcp 172.19.0.11:2003: getsockopt: connection refused. will retry
followed by:
metrictank_1 | 2018/05/08 09:18:43 [W] kafka-cluster: Processing metricPersist backlog has taken too long, giving up lock after 1m0s.
Which is strange. For benchmarking the consumption of backfilled data this is of course a useful change, but not necessarily something we should commit, or at least not with the commit message 'update sarama'.
So in your run it was about 880 kHz to 830 kHz (a ~6% difference).
Note: furthermore, when I change:
Backfilling metricdata on my system: 350 kHz.
Seems to have a negligible / non-existent benefit -> looks like Sarama is the bottleneck, not this.
We don't really seem to need them; this gives us about a 5% ingest throughput improvement. Perhaps we can reinstate them when we do batched operations.
@replay please approve my changes.
mdata/aggmetrics.go
Outdated
schema := Schemas.Get(schemaId)
m = NewAggMetric(ms.store, ms.cachePusher, k, schema.Retentions, schema.ReorderWindow, &agg, ms.dropFirstChunk)
ms.Metrics[key] = m
metricsActive.Set(len(ms.Metrics))
ms.Unlock()
Couldn't the `.Unlock()` happen before `metricsActive.Set()`? IIRC that's thread safe, just `len()` probably needs to be in the locked block.
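
A minimal sketch of that reordering, using the same names as the snippet above and assuming (as the comment says) that `metricsActive` is thread safe, so only the `len()` read has to stay inside the critical section:

ms.Metrics[key] = m
// read the map size while the lock is still held
active := len(ms.Metrics)
ms.Unlock()
// the gauge update is thread safe, so it can happen after unlocking
metricsActive.Set(active)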
mdata/aggmetrics.go
Outdated
}
agg := Aggregations.Get(aggId)
schema := Schemas.Get(schemaId)
Instantiation of `schema.AMKey`, `Aggregations.Get()` and `Schemas.Get()` could all happen a little further up, after we know that we need to create an entry but before we acquire the write lock, to keep the write lock short.
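
A sketch of that restructuring, combined with the gauge suggestion above. The names `k`, `key`, `aggId` and `schemaId` come from surrounding code that isn't shown, and the re-check of the map after taking the write lock is an assumption about the usual read-lock-then-write-lock pattern, not something visible in this diff:

// read-only lookups happen before the write lock
agg := Aggregations.Get(aggId)
schema := Schemas.Get(schemaId)
// the schema.AMKey for the entry could likewise be built here

ms.Lock()
// re-check under the write lock: another goroutine may have created the
// entry since the (assumed) read-locked existence check
m, ok := ms.Metrics[key]
if !ok {
	m = NewAggMetric(ms.store, ms.cachePusher, k, schema.Retentions, schema.ReorderWindow, &agg, ms.dropFirstChunk)
	ms.Metrics[key] = m
}
active := len(ms.Metrics)
ms.Unlock()
metricsActive.Set(active)

This way the write lock only covers the map check and insert, not the schema/aggregation lookups.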
Updates the Sarama library from 1.10.1 to 1.16.0.