Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add partitioning of MetricData/MetricDefinitions #26

Merged
merged 7 commits into from
Jul 10, 2019
Merged

Conversation

woodsaj
Copy link
Contributor

@woodsaj woodsaj commented Apr 23, 2019

No description provided.

@woodsaj
Copy link
Contributor Author

woodsaj commented Apr 23, 2019

awoods@awoods-ThinkPad:~/go/src/github.com/raintank/schema$ go test -v -run none -bench BenchmarkPartitionBy -benchmem
goos: linux
goarch: amd64
pkg: github.com/raintank/schema
BenchmarkPartitionByOrg-8                       20000000                78.1 ns/op            16 B/op          3 allocs/op
BenchmarkPartitionBySeries-8                    20000000                60.8 ns/op            20 B/op          2 allocs/op
BenchmarkPartitionBySeriesWithTags-8             3000000               500 ns/op              32 B/op          1 allocs/op
BenchmarkPartitionBySeriesWithTagsFnv-8          1000000              1197 ns/op             488 B/op         23 allocs/op
PASS
ok      github.com/raintank/schema      6.511s

partition.go Outdated
case PartitionBySeriesWithTags:
h := xxhash.New()
h.WriteString(m.Name)
sort.Strings(m.Tags)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we perform a check to see if any tags exist? Even if tags don't exist I suppose we would still want to continue with the jump.Hash call on m.Name. I know in the end it doesn't make much of a difference computationally.

Copy link
Contributor

@replay replay Apr 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this procedure where the tags get sorted and then concatenated into a string (excluding name) is repeated multiple times, that could also just be in a function like getSortedTagString([]string) string

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we perform a check to see if any tags exist?

we probably should not spend any effort on this or on seeing whether we should, as this is unlikely to be an issue

Even if tags don't exist I suppose we would still want to continue with the jump.Hash call on m.Name

yes, in that case it's equivalent to PartitionBySeries, which is what we want

this procedure where the tags get sorted and then concatenated into a string (excluding name) is repeated multiple times, that could also just be in a function like getSortedTagString([]string) string

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could also just be in a function like getSortedTagString([]string) string

We are deliberately optimising here to reduce allocations. We dont want to allocate memory for a new string only to then write that string to something else.

But we could use a func like writeSortedTagString(w io.StringWriter, []string) (n int, err error)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, that wont work, as we are using WriteString() when it is available and Write() otherwise.
so we would need to just use writeSortedTagString(w io.Writer, []string) (n int, err error) in which we can call io.WriteString(w, s) which will use the optimised WriteString() call if available and fall back to Write()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented that and pushed into this branch

Copy link
Contributor

@Dieterbe Dieterbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks pretty good. but needs some more tweaks.
please update your commit message to mention why you're removing the Key methods (we ditch them and restrict ourselves to never setting keys anymore, in exchange for a simpler partitioning system)

@@ -290,3 +247,30 @@ func ValidateTags(tags []string) bool {

return true
}

func writeSortedTagString(w io.Writer, name string, tags []string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we be performing tag validation here? Or leave that up to tsdb-gw ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a separate function to validate the tags, this is only supposed to write the name with tags into a writer

KeyBySeries([]byte) []byte
// using the provided partitionByMethod, and number of partitions being used
// generate and return the partition id that should be used with this metric.
PartitionID(method PartitionByMethod, partitions int32) (int32, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any other projects aside from Metrictank and tsdb-gw that use this interface? I suppose we will need to update them all at the same time. I know there are some PRs open to deal with it, but I don't think the MT PR is ready yet #grafana/metrictank/pull/1282

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm only aware of TSDB and MT (in multiple locations). Once this is merged, MT and TSDB can update their vendored schema and start using the new partitioning methods, but there's nothing stopping us from merging this PR already before MT & TSDB integrated it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's nothing stopping us from merging this PR already before MT & TSDB integrated it.

in fact that's why the mentioned MT pr is stale, because it's waiting for this one to be done so it can pull this code in cleanly.

metric.go Outdated
@@ -136,27 +113,26 @@ func (m *MetricDefinition) NameWithTags() string {
return m.nameWithTags
}

sort.Strings(m.Tags)
nameWithTagsBuffer := &bytes.Buffer{}
writeSortedTagString(nameWithTagsBuffer, m.Name, m.Tags)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to check for the returned error here or not?

Copy link
Contributor

@replay replay Jun 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, this is a bit funny. the only way how writeSortedTagString() would return an error would be if the first argument is an io.Writer that returned an error either from Write() or from WriteString(). Now, according to their signature, they can return an error. But in our code we only use 3 types of io.Writer, which are:

  1. bytes.Buffer
  2. xxhash.Digest
  3. fnv.Hash32

None of these 3 can return an error when you look at their code, they are all hard-coded to only return nil as error. So I'm not sure if we really have to check for an error here. I wouldn't like to modify NameWithTags() so it returns an error, because then every location that uses it would also have to check for that error again etc...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

writeSortedTagString() can't make assumptions about whether or not the passed in io.Writer actually will return an error or not, so it must check explicitly. (the current implementation does this right), and also return an error via its signature (regardless or whether it truly will at runtime or not)

I think the right solution here is that callers of writeSortedTagString() can choose to ignore the error if they know it'll never happen, which enables NameWithTags to not have to return an error in its signature. However:

  1. we should be explicit / a bit more clear in the code. this seems to compile fine:
-       writeSortedTagString(nameWithTagsBuffer, m.Name, m.Tags)
+       _ = writeSortedTagString(nameWithTagsBuffer, m.Name, m.Tags)
  1. we can obviously only do this at callsites where we know the io.Writer won't actually return an error

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mauro points out, we only pass those 3 types of values into writeSortedTagsString and none of them returns errors.
this enables to change the signature of PartitionID to not return an error, at least if the caller promises to always provide a valid PartitionByMethod,.. i guess for now we can leave as is and revise later

Copy link
Contributor

@replay replay Jul 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me. We get the best of both versions, on one hand NameWithTags() does not return an error, on the other hand we still make sure that errors don't get swallowed.

@Dieterbe
Copy link
Contributor

Dieterbe commented Jul 9, 2019

I think this is ready for merging now.

partition.go Outdated
return partition, nil
}

func (m *MetricDefinition) Partition(method PartitionByMethod, partitions int32) (int32, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the MetricDefinition struct now has a property .Partition and a method .Partition()

Copy link
Contributor

@replay replay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM apart from that it doesn't work because of the name conflict
if that's fixed then it should be good to go

@Dieterbe
Copy link
Contributor

LGTM apart from that it doesn't work because of the name conflict

oops i got a bit quixotic again

fixed and rebased on master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants