This repository was archived by the owner on Aug 23, 2023. It is now read-only.

partial chunks too eagerly ignored, maybe enough to cover certain data requests #78

Closed
Dieterbe opened this issue Dec 16, 2015 · 2 comments · Fixed by #994
Comments

@Dieterbe
Contributor

see #74 (comment) and #74 (comment)

@woodsaj
Member

woodsaj commented Dec 18, 2015

This is low priority. It is a very minor performance issue that only affects queries covering data in the first chunk written. If we keep 6 hours of data in memory, the problem is only present for the first 6 hours after a metric_tank process starts.

Given the limited impact, I am not even sure the benefit of correcting this would ever be worth the cost of optimizing the code.

@Dieterbe
Contributor Author

Agreed on low priority, and I'm not sure how useful a fix would be. I suggest keeping this open until we get familiar with the impact of upgrades and secondary->primary promotions; then we'll see whether Cassandra takes a hit or not.

Dieterbe added a commit that referenced this issue Aug 14, 2018
this is an old optimization (?) that has been with us
for a long time: #74
2029113

here's how it caused data loss at read time:
- when only 1 chunk of data had been filled:
  the "update" of the field was a no-op
  because len(chunks) == 1, so oldPos stayed at 0
  (not sure whether that was intentional or a bug), so reading
  the first chunk worked.
- once you have more than 1 chunk: update of oldPos works.
  we start hitting cassandra.
  depending on how long the chunk takes to get saved
  to cassandra, we will miss data at read time.
  also, our chunk cache does not cache absence of data,
  hitting cassandra harder during this period.
- once the chunk is saved to cassandra the problem disappears
- once the circular buffer recycles the first time (effectively
  removing the first chunk) this optimization no longer applies,
  but at that point we still hit cassandra just as before.
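The circular-buffer behavior described above can be sketched as follows. This is a minimal illustration, not metrictank's actual types: `ChunkBuf`, `add`, and `readable` are hypothetical names, and each chunk is reduced to its first timestamp. It shows how the first chunk stays readable while the buffer holds only one chunk, and how advancing oldPos skips it afterwards, forcing reads for that range to fall through to Cassandra:

```go
package main

import "fmt"

// ChunkBuf is a simplified stand-in for a circular buffer of chunks.
// oldPos marks the oldest chunk that reads will consult in memory.
type ChunkBuf struct {
	chunks []int // each int stands in for a chunk's first timestamp
	oldPos int   // index of the oldest chunk considered valid
}

// add appends a chunk. While there is only one chunk, the "update"
// of oldPos is a no-op (it stays 0), so that chunk is readable.
// Once a second chunk exists, oldPos advances and the first chunk
// is skipped at read time, even if it was never saved to cassandra.
func (b *ChunkBuf) add(t0 int) {
	b.chunks = append(b.chunks, t0)
	if len(b.chunks) > 1 {
		b.oldPos = 1 // first chunk no longer served from memory
	}
}

// readable returns the chunks a read would consult from memory.
func (b *ChunkBuf) readable() []int {
	return b.chunks[b.oldPos:]
}

func main() {
	b := &ChunkBuf{}
	b.add(0)
	fmt.Println(b.readable()) // single chunk: still readable from memory
	b.add(600)
	fmt.Println(b.readable()) // first chunk skipped: its data must come from cassandra
}
```

If the first chunk has not yet been persisted when oldPos advances, the window between the two is exactly the read-time data loss described above.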

This problem is now solved. However, removing that code
enables another avenue for data loss at read time:
- when a read node starts (without data backfill)
- or a read node starts with data backfill, but the backfill
  doesn't have old data for the particular metric, IOW
  when the data only covers 1 chunk's worth
- a read node starts with data backfill, but since backfilling starts
  at arbitrary positions, the first chunk will miss some data in the
  beginning.

In all these cases, the first chunk is a partial chunk, whereas
a full version of the chunk is most likely already in cassandra.

To make sure this is not a problem, if the first chunk we used was
partial, we set oldest to the first timestamp, so that the rest
can be retrieved from cassandra.
Typically, this will cause the "same" chunk (but a full version)
to be retrieved from cassandra, which is then cached and seamlessly
merged via Fix()
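The fix described above amounts to distrusting a partial first chunk beyond its first datapoint. A minimal sketch, assuming a hypothetical `Chunk` type with `T0`, `First`, and `Partial` fields (these names are illustrative, not metrictank's real API):

```go
package main

import "fmt"

// Chunk is an illustrative stand-in for an in-memory chunk.
type Chunk struct {
	T0      int  // chunk start timestamp
	First   int  // first timestamp actually present in the chunk
	Partial bool // true if the chunk is missing data at its start (e.g. after backfill)
}

// oldestFromMemory sketches the fix: normally in-memory data is
// trusted back to the first chunk's T0, but if that chunk is partial
// we only trust it back to its first datapoint. Everything older is
// then fetched from cassandra and merged (via Fix() in metrictank).
func oldestFromMemory(chunks []Chunk) int {
	if len(chunks) == 0 {
		return 0
	}
	first := chunks[0]
	if first.Partial {
		return first.First // the rest must come from cassandra
	}
	return first.T0
}

func main() {
	full := []Chunk{{T0: 0, First: 0}, {T0: 600, First: 600}}
	fmt.Println(oldestFromMemory(full)) // 0: memory covers the whole range

	backfilled := []Chunk{{T0: 0, First: 240, Partial: true}, {T0: 600, First: 600}}
	fmt.Println(oldestFromMemory(backfilled)) // 240: data before 240 comes from cassandra
}
```

Because the full version of the partial chunk is usually already in Cassandra, the fetched chunk and the in-memory one overlap, and deduplicating/merging them yields a seamless series.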

fix #78
fix #988
Dieterbe pushed a commit that referenced this issue Mar 22, 2020