Use new partition methods #1427

robert-milan · 2019-08-13T12:27:06Z

This PR replaces and supersedes #1282.

Now that raintank/schema#26 is merged, we can switch over to using PartitionID in Metrictank and tsdb-gw.

Remove the Partitioner interface in this PR or a later PR.

This PR adds a keyBySeriesWithTags option to partitioning.

After this is merged, I think the following steps should be followed:

1. Update `tsdb-gw` to use the new partitioning code.
2. Move `raintank/schema` into Metrictank.
3. Update `tsdb-gw` to use the vendored code in Metrictank.
4. Implement `EatDots` in the validation methods.

See also: #1123, raintank/schema#26, raintank/tsdb-gw#138

fkaleo · 2019-08-14T13:49:45Z

> This PR replaces and supersedes #1282.
>
> Now that raintank/schema#26 is merged we can switch over to using PartitionID in Metrictank and tsdb-gw.
>
> Remove the Partitioner interface in this PR or a later PR.
>
> This PR adds a keyBySeriesWithTags option to partitioning.
>
> After this is merged I think the following steps should be followed:
> 1. Update `tsdb-gw` to use the new partitioning code.
> 2. Move `raintank/schema` into `Metrictank`.

From my grep, the following projects rely on raintank/schema: fakemetrics, hosted-metrics-api, tsdb-gw, and carbon-relay-ng.

> 3. Update `tsdb-gw` to use the vendored code in `Metrictank`.
> 4. Implement `EatDots` in the validation methods.
>
> See also: #1123, raintank/schema#26, raintank/tsdb-gw#138

robert-milan · 2019-08-14T14:02:22Z

Thanks, I'll look at updating those in the future as well.

cluster/partitioner/partitioner.go

stacktest/fakemetrics/out/kafkamdm/kafkamdm.go

- Instead of comparing the partitionScheme string on every call, keep a record of the schema.PartitionByMethod to use. This allows the Partition() method to return m.PartitionID() directly.
- In fakemetrics/kafkamdm, use a Partitioner interface instead of using partition.Kafka directly.
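The first point of the commit message can be illustrated with a minimal sketch. It is not the code from this PR: `MetricData`, `PartitionByMethod`, `Kafka`, and `NewKafka` are simplified, hypothetical stand-ins for the real types in raintank/schema and Metrictank, and the hashing details and the scheme names `byOrg` and `bySeries` are assumptions made for the example. The idea shown is that the scheme string is resolved to a method once, at construction time, so `Partition()` only delegates to `PartitionID()`.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// PartitionByMethod is a hypothetical stand-in for schema.PartitionByMethod.
type PartitionByMethod uint8

const (
	PartitionByOrg PartitionByMethod = iota
	PartitionBySeries
)

// MetricData is a trimmed-down, hypothetical stand-in for schema.MetricData.
type MetricData struct {
	OrgId int
	Name  string
}

// PartitionID hashes the field selected by method and maps the hash onto
// one of `partitions` partitions. The hashing here is an assumption.
func (m *MetricData) PartitionID(method PartitionByMethod, partitions int32) (int32, error) {
	if partitions <= 0 {
		return 0, fmt.Errorf("invalid partition count %d", partitions)
	}
	h := fnv.New32a()
	switch method {
	case PartitionByOrg:
		fmt.Fprintf(h, "%d", m.OrgId)
	case PartitionBySeries:
		h.Write([]byte(m.Name))
	default:
		return 0, fmt.Errorf("unknown partition method %d", method)
	}
	return int32(h.Sum32() % uint32(partitions)), nil
}

// Kafka stores the resolved method instead of the scheme string.
type Kafka struct {
	method PartitionByMethod
}

// NewKafka parses the scheme string once, so no string comparison is
// needed on the per-metric path.
func NewKafka(scheme string) (*Kafka, error) {
	switch scheme {
	case "byOrg":
		return &Kafka{method: PartitionByOrg}, nil
	case "bySeries":
		return &Kafka{method: PartitionBySeries}, nil
	}
	return nil, fmt.Errorf("unknown partition scheme %q", scheme)
}

// Partition simply delegates to PartitionID with the stored method.
func (k *Kafka) Partition(m *MetricData, numPartitions int32) (int32, error) {
	return m.PartitionID(k.method, numPartitions)
}

func main() {
	p, err := NewKafka("bySeries")
	if err != nil {
		panic(err)
	}
	id, err := p.Partition(&MetricData{OrgId: 1, Name: "some.metric"}, 8)
	if err != nil {
		panic(err)
	}
	fmt.Println("partition:", id)
}
```

An invalid scheme is rejected when the partitioner is built rather than on every message, which is a side benefit of resolving the method up front.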

woodsaj

LGTM

Dieterbe · 2019-08-15T15:22:06Z

Did we forget about PartitionBySeriesWithTagsFnv?

robert-milan · 2019-08-15T15:29:02Z

> Did we forget about PartitionBySeriesWithTagsFnv?

I left it out because according to the benchmarks in raintank/schema#26 it doesn't look like it performs well. Is there a reason we need both? Does one perform better in certain situations?

I can add an option to use the method, or just remove it from schema.

Dieterbe · 2019-08-15T15:46:33Z

We deliberately included it so that existing datasets do not require migrations. The code should have a comment somewhere explaining it. Both should be available through MT flags. Don't forget to update all config files using one of the dev scripts.

robert-milan · 2019-08-15T16:06:11Z

I guess I'm confused about that. Since all of these methods are new and have never been used, how would someone already have existing datasets that use a method that hashes based on NameWithTags? Or is that assumption incorrect?

Dieterbe · 2019-08-15T19:03:03Z

It allows introducing tagging (with mediocre distribution) without remapping non-tagged series.

robert-milan · 2019-08-16T08:21:39Z

Thanks Dieter, that was the missing piece for me. I will implement it in the next few PRs.
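The point settled in this exchange can be sketched in a few lines of Go. This is an illustration under stated assumptions, not the code in raintank/schema: it assumes the existing by-series method hashes the bare metric name with FNV-1a, and that the FNV tag-aware variant applies the same hash to the name with its sorted tags appended. The helpers `nameWithTags`, `bySeries`, and `bySeriesWithTagsFnv` are hypothetical. Under those assumptions an untagged series produces the same key, and therefore the same partition, under both methods, so switching methods does not remap it.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strings"
)

// nameWithTags returns "name;tag1=v1;tag2=v2" with tags sorted, or just
// the name when there are no tags. The exact format is an assumption.
func nameWithTags(name string, tags []string) string {
	if len(tags) == 0 {
		return name
	}
	sorted := append([]string(nil), tags...) // copy so the caller's slice is untouched
	sort.Strings(sorted)
	return name + ";" + strings.Join(sorted, ";")
}

// fnvPartition maps a key onto a partition with 32-bit FNV-1a.
func fnvPartition(key string, partitions uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % partitions
}

// bySeries hashes only the metric name (assumed legacy behaviour).
func bySeries(name string, partitions uint32) uint32 {
	return fnvPartition(name, partitions)
}

// bySeriesWithTagsFnv hashes the name plus tags with the same hash function.
func bySeriesWithTagsFnv(name string, tags []string, partitions uint32) uint32 {
	return fnvPartition(nameWithTags(name, tags), partitions)
}

func main() {
	const partitions = 16

	// Untagged series: both methods hash the identical key.
	untagged := "some.metric"
	fmt.Println(bySeries(untagged, partitions) == bySeriesWithTagsFnv(untagged, nil, partitions))

	// Tagged series: the tags now take part in the key.
	tags := []string{"dc=us-east", "host=a"}
	fmt.Println(bySeries(untagged, partitions), bySeriesWithTagsFnv(untagged, tags, partitions))
}
```

A method built on a different hash function would generally place even untagged series on different partitions, which is why a deployment with existing data would need the FNV variant to avoid a migration.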

Use new partition methods

8d8900b

robert-milan mentioned this pull request Aug 13, 2019

partitionBy bySeriesWithTags (aka "shard by tag") #1282

Closed

robert-milan changed the title ~~[WIP] Use new partition methods~~ Use new partition methods Aug 13, 2019

robert-milan requested review from Dieterbe, replay and fkaleo August 13, 2019 13:34

robert-milan requested a review from woodsaj August 14, 2019 13:51