This repository was archived by the owner on Aug 23, 2023. It is now read-only.

kafka-cluster partitions don't necessarily match kafka-mdm-in partitions #950

Closed
shanson7 opened this issue Jun 25, 2018 · 2 comments

@shanson7
Collaborator

Data comes into Metrictank on the data topic (let's call that mdm), and when chunks are persisted to Cassandra, summaries of those chunks are sent to the persist topic (let's call that persist). On startup, MT uses persist to avoid overwriting chunks in Cassandra that were already persisted. This means it is important that the summaries in persist line up with the data in mdm for the instance handling a given partition.

  1. MT doesn't really enforce how the data in mdm is partitioned, just that partitioning is consistent.
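  2. Partitioning in persist is either "ByOrg" or "BySeries" (see here and here). If this isn't how data in mdm is partitioned, the summaries for a series can end up on a different partition than its data, so the instance handling that data may not see them.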
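
To make the mismatch concrete, here is a minimal sketch. The two hash functions are hypothetical stand-ins for a by-org and a by-series scheme, not Metrictank's actual partitioners. It assumes mdm was partitioned by series while persist is partitioned by org, and counts how many series would have their data and their summaries on different partitions.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// byOrg is a hypothetical by-org scheme: every series belonging to
// the same org hashes to the same partition.
func byOrg(orgID uint32, numPartitions int32) int32 {
	var buf [4]byte
	binary.BigEndian.PutUint32(buf[:], orgID)
	h := fnv.New32a()
	h.Write(buf[:])
	return int32(h.Sum32() % uint32(numPartitions))
}

// bySeries is a hypothetical by-series scheme: the partition is derived
// from the series name, so one org's series spread across partitions.
func bySeries(name string, numPartitions int32) int32 {
	h := fnv.New32a()
	h.Write([]byte(name))
	return int32(h.Sum32() % uint32(numPartitions))
}

// countMismatches reports how many series would have their data (by series)
// and their persist summaries (by org) on different partitions.
func countMismatches(orgID uint32, names []string, numPartitions int32) int {
	mismatches := 0
	for _, name := range names {
		if byOrg(orgID, numPartitions) != bySeries(name, numPartitions) {
			mismatches++
		}
	}
	return mismatches
}

func main() {
	names := []string{"cpu.user", "cpu.system", "mem.free", "disk.used"}
	fmt.Printf("%d of %d series have summaries on a different partition than their data\n",
		countMismatches(1, names, 8), len(names))
}
```

Any series counted here is one whose summaries would be consumed by a different partition than the one carrying its data.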

It seems to me that the simpler solution is to simply use def.Partition to put the summaries into persist.
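
A rough sketch of what I mean, with simplified, hypothetical types (only the Partition field name comes from the real definition; everything else here is made up for illustration). It assumes def.Partition holds the mdm partition the series was consumed from:

```go
package main

import "fmt"

// MetricDefinition is a simplified, hypothetical stand-in for the real
// definition type. Partition is assumed to be the mdm partition on which
// the series' data arrived.
type MetricDefinition struct {
	ID        string
	Partition int32
}

// PersistMessage is a hypothetical chunk summary destined for the persist topic.
type PersistMessage struct {
	Key       string
	T0        uint32 // start timestamp of the persisted chunk
	Partition int32  // explicit target partition in persist
}

// newPersistMessage reuses the partition recorded on the definition instead
// of recomputing one with a separately configured scheme, so the summary
// always lands on the same partition number as the series' data.
func newPersistMessage(def MetricDefinition, t0 uint32) PersistMessage {
	return PersistMessage{Key: def.ID, T0: t0, Partition: def.Partition}
}

func main() {
	def := MetricDefinition{ID: "1.abcdef", Partition: 5}
	msg := newPersistMessage(def, 1529884800)
	fmt.Printf("summary for %s goes to persist partition %d\n", msg.Key, msg.Partition)
}
```

With this, the producer would need to send to an explicitly chosen partition rather than hash the key, and the persist partitioning setting would no longer be consulted.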

I am rolling out this change on our side and can submit a cleanup PR. I'm not sure if removing the parameter entirely is backwards compatible or not, however.

@Dieterbe
Contributor

> It seems to me that the simpler solution is to simply use def.Partition to put the summaries into persist

+1. more robust and more efficient probably

> I'm not sure if removing the parameter entirely is backwards compatible or not, however.

the parameter's purpose is to try to make sure partitioning is consistent. your approach does that more accurately, so it is better. the parameter can be dropped.

@woodsaj
Member

woodsaj commented Jun 26, 2018

+1 this seems like a much better approach.

I think the current approach was used as we were not storing the partition info at the time the feature was added.
