-
Notifications
You must be signed in to change notification settings - Fork 453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support tagfiltertree for fast matching metricIDs to queries #4310
base: master
Are you sure you want to change the base?
Conversation
} | ||
|
||
// IsVarTagValue returns true if the value is a variable tag value. | ||
func IsVarTagValue(value string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add some explanation? e.g. I am not sure why len<4 means false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a Var value type is of the form "{{...}}" so the minimum is 4 chars. Anything lesser and we know it's not a Var value type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add comments to explain what Var Tag means. No reviewer can correlate "VarTagValue" with "...{{...}}..." pattern
type Tag struct { | ||
Name string | ||
Val string | ||
Var string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is the usage of Var? I don't find any
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Var will be used in Namespace Attribution to support capture-groups like functionality in Regex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it will not be used in query matching? nit: add some comments to make the code more readable
return false | ||
} | ||
|
||
// IsMatchNoneTag returns true if the tag is a match none tag. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsNegation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Done.
} | ||
tagValue, tagNameFound := tags[name] | ||
absVal, absValFound := node.AbsoluteValues[tagValue] | ||
if tagNameFound && absValFound { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if isMatchAny==false, then within this if condition, you will never return true? is that correct logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha! good catch! yeah this is a bug. We cannot rely on the returned "matched" bool flag. I'll fix it.
@@ -186,7 +186,6 @@ linters: | |||
- gci | |||
- goconst | |||
- gocritic | |||
- golint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
golint is deprecated now and mainly not detecting the type correctly when using Generics.
src/metrics/tagfiltertree/README.md
Outdated
a set of tag filters. One such use-case is metric attribution to namespaces. | ||
Iterating through each filter individually and matching them is extremely expensive | ||
since it has to be done on each incoming metricID. Therefore, this data structure | ||
pre-compiles a set of tag filters in order to optimize matches against an input metricID. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand this paragraph.
- "attribution to namespaces" - what does this mean? What is a namespace in this context?
- How does this pre-compiled data structure prevent you from having to do matching on each incoming metricID?
Perhaps a diagram or example would help here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the Readme
pre-compiles a set of tag filters in order to optimize matches against an input metricID. | ||
|
||
## Usage | ||
First create a trie using New() and then add tagFilters using AddTagFilter(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And then I guess you use Match
somehow? A code example here would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
This PR introduces a new data structure which we call "tagfiltertree".
A query consists of several tag filters. A tag filter is nothing but a tag and value pair where the value is a pattern much like a regex. For instance, "service:foo" is a valid tag filter. A query is nothing but a list of such tag filters that need to be matched with Conjunction within an incoming metricID.
The key observation behind creating a tree structure is that the presence of a tag in a metricID and a query has the potential to drastically prune the search space of the possible queries going forward. So much so that within a couple comparisons we expect an incoming metricID to output the entire list of matched queries.
This package optimizes CPU and memory for match time as opposed to creation of the tree itself. It is not thread-safe so make sure to protect the tree appropriately if it is going to be used in a multithreaded context.
Refer to the README.md for more details.
What this PR does / why we need it:
Use this data structure when you want to quickly match an input metricID to a list of queries.
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: