-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add partitioning of MetricData/MetricDefinitions #26
Conversation
|
partition.go
Outdated
case PartitionBySeriesWithTags: | ||
h := xxhash.New() | ||
h.WriteString(m.Name) | ||
sort.Strings(m.Tags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we perform a check to see if any tags exist? Even if tags don't exist I suppose we would still want to continue with the jump.Hash
call on m.Name
. I know in the end it doesn't make much of a difference computationally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this procedure where the tags get sorted and then concatenated into a string (excluding name
) is repeated multiple times, that could also just be in a function like getSortedTagString([]string) string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we perform a check to see if any tags exist?
we probably should not spend any effort on this or on seeing whether we should, as this is unlikely to be an issue
Even if tags don't exist I suppose we would still want to continue with the jump.Hash call on m.Name
yes, in that case it's equivalent to PartitionBySeries, which is what we want
this procedure where the tags get sorted and then concatenated into a string (excluding name) is repeated multiple times, that could also just be in a function like getSortedTagString([]string) string
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could also just be in a function like getSortedTagString([]string) string
We are deliberately optimising here to reduce allocations. We dont want to allocate memory for a new string only to then write that string to something else.
But we could use a func like writeSortedTagString(w io.StringWriter, []string) (n int, err error)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, that wont work, as we are using WriteString() when it is available and Write() otherwise.
so we would need to just use writeSortedTagString(w io.Writer, []string) (n int, err error)
in which we can call io.WriteString(w, s)
which will use the optimised WriteString() call if available and fall back to Write()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented that and pushed into this branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good. but needs some more tweaks.
please update your commit message to mention why you're removing the Key methods (we ditch them and restrict ourselves to never setting keys anymore, in exchange for a simpler partitioning system)
@@ -290,3 +247,30 @@ func ValidateTags(tags []string) bool { | |||
|
|||
return true | |||
} | |||
|
|||
func writeSortedTagString(w io.Writer, name string, tags []string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we be performing tag validation here? Or leave that up to tsdb-gw ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a separate function to validate the tags, this is only supposed to write the name with tags into a writer
KeyBySeries([]byte) []byte | ||
// using the provided partitionByMethod, and number of partitions being used | ||
// generate and return the partition id that should be used with this metric. | ||
PartitionID(method PartitionByMethod, partitions int32) (int32, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any other projects aside from Metrictank and tsdb-gw that use this interface? I suppose we will need to update them all at the same time. I know there are some PRs open to deal with it, but I don't think the MT PR is ready yet #grafana/metrictank/pull/1282
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm only aware of TSDB and MT (in multiple locations). Once this is merged, MT and TSDB can update their vendored schema and start using the new partitioning methods, but there's nothing stopping us from merging this PR already before MT & TSDB integrated it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's nothing stopping us from merging this PR already before MT & TSDB integrated it.
in fact that's why the mentioned MT pr is stale, because it's waiting for this one to be done so it can pull this code in cleanly.
metric.go
Outdated
@@ -136,27 +113,26 @@ func (m *MetricDefinition) NameWithTags() string { | |||
return m.nameWithTags | |||
} | |||
|
|||
sort.Strings(m.Tags) | |||
nameWithTagsBuffer := &bytes.Buffer{} | |||
writeSortedTagString(nameWithTagsBuffer, m.Name, m.Tags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we going to check for the returned error here or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, this is a bit funny. the only way how writeSortedTagString()
would return an error would be if the first argument is an io.Writer
that returned an error either from Write()
or from WriteString()
. Now, according to their signature, they can return an error. But in our code we only use 3 types of io.Writer
, which are:
bytes.Buffer
xxhash.Digest
fnv.Hash32
None of these 3 can return an error when you look at their code, they are all hard-coded to only return nil
as error. So I'm not sure if we really have to check for an error here. I wouldn't like to modify NameWithTags()
so it returns an error, because then every location that uses it would also have to check for that error again etc...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
writeSortedTagString() can't make assumptions about whether or not the passed in io.Writer actually will return an error or not, so it must check explicitly. (the current implementation does this right), and also return an error via its signature (regardless or whether it truly will at runtime or not)
I think the right solution here is that callers of writeSortedTagString() can choose to ignore the error if they know it'll never happen, which enables NameWithTags to not have to return an error in its signature. However:
- we should be explicit / a bit more clear in the code. this seems to compile fine:
- writeSortedTagString(nameWithTagsBuffer, m.Name, m.Tags)
+ _ = writeSortedTagString(nameWithTagsBuffer, m.Name, m.Tags)
- we can obviously only do this at callsites where we know the io.Writer won't actually return an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as mauro points out, we only pass those 3 types of values into writeSortedTagsString and none of them returns errors.
this enables to change the signature of PartitionID to not return an error, at least if the caller promises to always provide a valid PartitionByMethod,.. i guess for now we can leave as is and revise later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me. We get the best of both versions, on one hand NameWithTags()
does not return an error, on the other hand we still make sure that errors don't get swallowed.
I think this is ready for merging now. |
partition.go
Outdated
return partition, nil | ||
} | ||
|
||
func (m *MetricDefinition) Partition(method PartitionByMethod, partitions int32) (int32, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the MetricDefinition struct now has a property .Partition
and a method .Partition()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM apart from that it doesn't work because of the name conflict
if that's fixed then it should be good to go
Co-Authored-By: Robert Milan <42070645+robert-milan@users.noreply.github.com>
oops i got a bit quixotic again fixed and rebased on master. |
No description provided.