Skip to content
This repository was archived by the owner on Aug 23, 2023. It is now read-only.

Use new partition methods #1427

Merged
merged 4 commits into from
Aug 15, 2019
Merged

Use new partition methods #1427

merged 4 commits into from
Aug 15, 2019

Conversation

robert-milan
Copy link
Contributor

@robert-milan robert-milan commented Aug 13, 2019

This PR replaces and supersedes #1282.

Now that raintank/schema#26 is merged we can switch over to using PartionID in Metrictank and tsdb-gw.

Remove the Partitioner interface in this PR or a later PR.

This PR adds a keyBySeriesWithTags option to partitioning.

After this is merged I think the following steps should be followed:

  1. Update tsdb-gw to use the new partitioning code.
  2. Move raintank/schema into Metrictank.
  3. Update tsdb-gw to use the vendored code in Metrictank.
  4. Implement EatDots in the validation methods.

See also: #1123, raintank/schema#26, raintank/tsdb-gw#138

@robert-milan robert-milan changed the title [WIP] Use new partition methods Use new partition methods Aug 13, 2019
@fkaleo
Copy link
Contributor

fkaleo commented Aug 14, 2019

This PR replaces and supersedes #1282.

Now that raintank/schema#26 is merged we can switch over to using PartionID in Metrictank and tsdb-gw.

Remove the Partitioner interface in this PR or a later PR.

This PR adds a keyBySeriesWithTags option to partitioning.

After this is merged I think the following steps should be followed:

1. Update `tsdb-gw` to use the new partitioning code.

2. Move `raintank/schema` into `Metrictank`.

From my grep the following projects rely on raintank/schema: fakemetrics, hosted-metrics-api, tsdb-gw, carbon-relay-ng

3. Update `tsdb-gw` to use the vendored code in `Metrictank`.

4. Implement `EatDots` in the validation methods.

See also: #1123, raintank/schema#26, raintank/tsdb-gw#138

@robert-milan robert-milan requested a review from woodsaj August 14, 2019 13:51
@robert-milan
Copy link
Contributor Author

Thanks, I'll look at updating those in the future as well.

- instead of comparing the partitionScheme string on every call, just
  keep a record of the schema.PartitionByMethod to use. This allows the
  Partition() method to just return m.PartitionID() directly.
- in fakemetrics/kafkamdm use a Partitioner Interface instead of using
  partition.Kafka directly.
Copy link
Member

@woodsaj woodsaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Dieterbe
Copy link
Contributor

Did we forget about PartitionBySeriesWithTagsFnv?

@robert-milan
Copy link
Contributor Author

Did we forget about PartitionBySeriesWithTagsFnv?

I left it out because according to the benchmarks in raintank/schema#26 it doesn't look like it performs well. Is there a reason we need both? Does one perform better in certain situations?

I can add an option to use the method, or just remove it from schema.

@Dieterbe
Copy link
Contributor

Dieterbe commented Aug 15, 2019

We deliberately included it for cases of existing datasets to not require migrations . The code should have comment somewhere explaining it. Both should be available through MT flags. Don't forget to update all config files using one of the dev scripts

@robert-milan
Copy link
Contributor Author

I guess I'm confused on that. Since all of these methods are new and have never been used how would someone already have existing datasets which use a method that hashes based on NameWithTaqs? Or is that assumption incorrect?

@Dieterbe
Copy link
Contributor

It allows to introduce tagging (with mediocre distribution ) without remapping non tagged series

@robert-milan
Copy link
Contributor Author

Thanks Dieter, that was the missing piece for me. I will implement in the next few PRs.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants