This repository was archived by the owner on Aug 23, 2023. It is now read-only.
This is an old optimization (?) that has been with us
for a long time: #742029113
Here's how it caused data loss at read time:
- when only 1 chunk of data has been filled:
  the "update" of the field is a no-op,
  because len(chunks) == 1, so oldPos goes back to 0
  (not sure if that's intentional or a bug), so reading the
  first chunk worked.
- once you have more than 1 chunk: update of oldPos works.
we start hitting cassandra.
depending on how long the chunk takes to get saved
to cassandra, we will miss data at read time.
  also, our chunk cache does not cache the absence of data,
  so we hit cassandra even harder during this period.
- once the chunk is saved to cassandra the problem disappears
- once the circular buffer recycles the first time (effectively
removing the first chunk) this optimization no longer applies,
but at that point we still hit cassandra just as before.
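The oldPos behavior described above can be sketched like this (a minimal, hypothetical model of the pattern around mdata/aggmetric.go; the real struct and field names differ):

```go
package main

import "fmt"

// chunk is a simplified stand-in for metrictank's in-memory chunk.
type chunk struct {
	t0 uint32 // start timestamp of the chunk's span
}

// aggMetric models the circular buffer of chunks. oldPos points at the
// oldest chunk still held in memory.
type aggMetric struct {
	chunks []chunk
	oldPos int
}

// advanceOldPos mimics the "update of the field" described above: with a
// single chunk, the increment immediately wraps back to 0 (a no-op), so
// reads of the first chunk kept working. With more than one chunk, oldPos
// really advances, and data older than chunks[oldPos] must be fetched
// from cassandra -- even if the chunk hasn't been saved there yet, which
// is the read-time data loss window.
func (a *aggMetric) advanceOldPos() {
	a.oldPos++
	if a.oldPos >= len(a.chunks) {
		a.oldPos = 0
	}
}

func main() {
	one := aggMetric{chunks: []chunk{{t0: 0}}}
	one.advanceOldPos()
	fmt.Println(one.oldPos) // wraps straight back: effectively a no-op

	two := aggMetric{chunks: []chunk{{t0: 0}, {t0: 600}}}
	two.advanceOldPos()
	fmt.Println(two.oldPos) // the update "works": oldest in-memory data moves forward
}
```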
This problem is now solved. However, removing that code
enables another avenue for data loss at read time:
- a read node starts without data backfill
- a read node starts with data backfill, but the backfill
  doesn't have old data for the particular metric, IOW
  the data only covers 1 chunk's worth
- a read node starts with data backfill, but since backfilling starts
  at arbitrary positions, the first chunk will miss some data at the
  beginning.
In all these cases, the first chunk is a partial chunk, whereas
a full version of the chunk is most likely already in cassandra.
To make sure this is not a problem, if the first chunk we used was
partial, we set oldest to the first timestamp, so that the rest
can be retrieved from cassandra.
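A hedged sketch of that check (the names `oldestAvailable`, `partial`, and the point/chunk types here are hypothetical illustrations, not metrictank's actual API):

```go
package main

import "fmt"

// point and chunk are simplified stand-ins for metrictank's types.
type point struct{ ts, val uint32 }

type chunk struct {
	t0      uint32  // nominal start of the chunk's span
	points  []point // points actually held in memory, sorted by ts
	partial bool    // true if the chunk started mid-span (e.g. no backfill)
}

// oldestAvailable returns the timestamp from which in-memory data can be
// trusted. For a partial first chunk, that is the first stored point, not
// the chunk's nominal t0 -- so the caller knows to fetch everything
// earlier (likely including a full version of this same chunk) from
// cassandra.
func oldestAvailable(first chunk) uint32 {
	if first.partial && len(first.points) > 0 {
		return first.points[0].ts
	}
	return first.t0
}

func main() {
	full := chunk{t0: 0, points: []point{{0, 1}, {10, 2}}}
	part := chunk{t0: 0, points: []point{{40, 5}, {50, 6}}, partial: true}
	fmt.Println(oldestAvailable(full)) // full chunk: trusted from its t0
	fmt.Println(oldestAvailable(part)) // partial chunk: trusted from first point only
}
```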
Typically, this will cause the "same" chunk (but a full version)
to be retrieved from cassandra, which is then cached and seamlessly
merged via Fix().
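The seamless merge can be pictured as combining two sorted point slices and collapsing duplicate timestamps, so overlapping data from cassandra and memory yields one clean series. This is a hypothetical illustration of the idea, not metrictank's actual Fix():

```go
package main

import "fmt"

// point is a simplified stand-in for a timestamped value.
type point struct{ ts, val uint32 }

// mergeSeries merges two slices of points sorted by ts. When both sides
// carry the same timestamp (the overlap between the full cassandra chunk
// and the partial in-memory chunk), the point is emitted once.
func mergeSeries(a, b []point) []point {
	out := make([]point, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i].ts < b[j].ts:
			out = append(out, a[i])
			i++
		case a[i].ts > b[j].ts:
			out = append(out, b[j])
			j++
		default: // same timestamp: both stores agree, keep one copy
			out = append(out, a[i])
			i++
			j++
		}
	}
	out = append(out, a[i:]...)
	out = append(out, b[j:]...)
	return out
}

func main() {
	cassandra := []point{{0, 1}, {10, 2}, {20, 3}} // full chunk
	memory := []point{{10, 2}, {20, 3}}            // partial chunk
	fmt.Println(mergeSeries(cassandra, memory))    // one series, no duplicates
}
```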
fix #78, fix #988
likely due to https://github.com/grafana/metrictank/blob/master/mdata/aggmetric.go#L274
introduced in #74
seems like we can simply remove that code
(related: #78 )