Move tag query evaluation logic into tag expressions #1373

replay · 2019-06-29T01:42:18Z

Moves the tag query expression evaluation logic into the expression types. This provides us more flexibility to combine expressions from meta records and tag queries when we have to take both indexes into account.

One notable change to the logic is that MetricDefinition filters now assume that the tag index they're currently looking at isn't necessarily the only tag index.
For example, if an expression says tag1!=abc and metric1 does not have tag1 defined via its metric (intrinsic) tags, we can't yet be sure whether another index (e.g. the meta tag index) assigns tag1=abc to metric1. In this case the filter returns a None decision, which means that the other index(es) need to be checked to determine whether the expression is satisfied.
On the other hand, if the metric (intrinsic) tags assign tag1=abc to metric1, the filter directly returns a conclusive Fail and subsequent index(es) can be ignored. If the metric (intrinsic) tags assign tag1=otherValue to metric1, it directly returns a conclusive Pass, because the indexes are checked in order of their conflict-resolution priority.
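
To make the three outcomes concrete, here is a minimal sketch of a != filter that only sees the intrinsic tags. The decision names Pass, Fail and None come from this PR; their numeric values, the notEqualFilter helper and the "key=value" string representation of tags are assumptions made for illustration and are not the actual tagquery code.

```go
package main

import (
	"fmt"
	"strings"
)

// FilterDecision is the three-valued result of a filter.
// The numeric values are an assumption of this sketch.
type FilterDecision uint8

const (
	None FilterDecision = iota // inconclusive: another index has to decide
	Fail                       // conclusively rejected
	Pass                       // conclusively accepted
)

// notEqualFilter (hypothetical helper) builds a filter for key!=value that
// only looks at a metric's intrinsic tags, given as "key=value" strings.
func notEqualFilter(key, value string) func(tags []string) FilterDecision {
	prefix := key + "="
	return func(tags []string) FilterDecision {
		for _, tag := range tags {
			if !strings.HasPrefix(tag, prefix) {
				continue
			}
			if tag[len(prefix):] == value {
				return Fail // the excluded value is set intrinsically
			}
			return Pass // another value is set and takes precedence
		}
		return None // key not set here; a later index may still set it
	}
}

func main() {
	filter := notEqualFilter("tag1", "abc")
	fmt.Println(filter([]string{"tag1=abc"}) == Fail)
	fmt.Println(filter([]string{"tag1=otherValue"}) == Pass)
	fmt.Println(filter([]string{"dc=us"}) == None)
}
```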

Lookups in the meta tag index are not implemented yet; they will come in the next PR.

The current status of this PR is that it works fine in my manual tests and the unit tests pass. I'm going to add more benchmarks and potentially add some performance improvements, but I'm not expecting any major changes.

This implements step 2) of this list: #660 (comment)

UPDATE, trying to explain the above stuff a bit better:

Filter Functions (Matchers):
The given query expressions, once parsed into the types implementing tagquery.Expression, now provide a method called GetMetricDefinitionFilter(). This method returns a filter function (in other contexts this is also called a "matcher") which can be passed a MetricDefinition and it will return a decision which is either Pass, Fail or None (inconclusive). This allows us to build a chain of filter functions, first from the given expressions, but also from the query expressions associated with meta records which match the given expressions.
So let's say we have an expression tag=~val.* and want to apply it as a filter (not to build the initial result set). Once it's parsed into the corresponding tagquery.Expression, we call GetMetricDefinitionFilter() in TagQueryContext.prepareExpressions() to obtain a filter function, which we'll use to filter down the result set.
Here comes the important part ->
Additionally, once we want to use the meta tag index, we can use the same tagquery.Expression instance to look up meta tag records (tagquery.MetaTagRecord) which match the expression tag=~val.* from the meta tag index (UnpartitionedMemoryIdx.metaTags). When we do that lookup from the meta tag index we use the tagquery.Expression instance not as a filter, but to build a set of meta tag records which match the expression tag=~val.*. We do that via its methods GetKey() and ValuePasses() (because the =~ operator matches against the tag value of a key).
The resulting meta tag records come with one or multiple sets of query expressions (MetaTagRecord.Expressions). If any of these sets of query expressions matches a MetricDefinition, the meta tags associated with the meta tag record get assigned to that metric. We already know that one of those meta tags matches the expression tag=~val.*, so the expression tag=~val.* matches this metric (via the meta tag that gets assigned to it).
Concretely, to implement this, prepareExpressions() will look up those meta tag records as described and build a set of filter chains from them (because there are multiple sets of expressions associated with each meta tag record, and one expression can result in multiple meta tag records). These filter chains can then be used by testByAllExpressions(). If a MetricDefinition passes any one of the meta tag record based filter chains then it should be part of the result set (so there's a logical OR there).
There will probably be some tuning regarding which filter chain is cheaper to evaluate and which is more expensive. We should try to filter the result set down using the cheap filters first before applying the expensive ones; that's why tagquery.Expression has a GetCostMultiplier() method to estimate the cost of evaluation.
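
The logical OR across meta tag record based filter chains could look roughly like the following. This is a simplified sketch, not the code of this PR: the types metric, filter and chain and the helpers matchesAny and equals are hypothetical, filters return plain booleans instead of filter decisions, and the two example chains stand in for the expression sets of made-up meta tag records.

```go
package main

import "fmt"

// metric is a simplified stand-in for a MetricDefinition: intrinsic tags only.
type metric map[string]string

// filter reports whether a metric satisfies one query expression.
type filter func(m metric) bool

// chain is one set of query expressions of a meta tag record.
// Every filter in the chain has to match (logical AND).
type chain []filter

func (c chain) matches(m metric) bool {
	for _, f := range c {
		if !f(m) {
			return false
		}
	}
	return true
}

// matchesAny is the logical OR across chains: passing one chain is enough.
func matchesAny(chains []chain, m metric) bool {
	for _, c := range chains {
		if c.matches(m) {
			return true
		}
	}
	return false
}

// equals builds a filter for key=value on the intrinsic tags.
func equals(key, value string) filter {
	return func(m metric) bool { return m[key] == value }
}

func main() {
	// Two hypothetical expression sets whose meta tag records both assign
	// a meta tag that matches tag=~val.*
	chains := []chain{
		{equals("dc", "us"), equals("env", "prod")},
		{equals("host", "db1")},
	}
	fmt.Println(matchesAny(chains, metric{"host": "db1"}))             // true
	fmt.Println(matchesAny(chains, metric{"dc": "us", "env": "dev"})) // false
}
```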

Filter Decisions:
The metric definition filters (tagquery.MetricDefinitionFilter) return a filter decision (tagquery.FilterDecision) which can be one of Fail, Pass or None. The Fail and Pass values simply mean that a filter has been able to conclusively decide that a given MetricDefinition has either passed the filter or has been disqualified by it. The None case is necessary because in certain cases a filter cannot make this decision without also looking at the other indexes that need to be taken into account. Let's take the example tag!=value:
Once we use the meta tag index, the function testByAllExpressions() will first apply a filter function that's been generated from the expression tag!=value to each MetricDefinition of the result set. If a MetricDefinition comes with a tag tag=value, the filter can directly return Fail. If the MetricDefinition comes with a tag tag=otherValue, the filter can directly return Pass. But if the MetricDefinition does not have the tag tag defined at all, then this filter can't make a final decision, because it is possible that one of the meta tag records assigns the tag tag=value to the given MetricDefinition, so it returns None.
In the case of None, testByAllExpressions() will have to run the given MetricDefinition through the filter chains that have been generated from the meta tag records assigning the tag tag=value (as described above). If one of them matches the given MetricDefinition, then it can be disqualified at that point and we can interrupt the filter chains. If none of them match the given MetricDefinition, the tagquery.Expression object's GetDefaultDecision() is used to make the final decision whether the MetricDefinition matches this expression or not.
In the case of the type tagquery.expressionNotEqual (representing !=) the return value of GetDefaultDecision() would be Pass. In the case of other expression types, for example tagquery.expressionEqual (=), the return value of GetDefaultDecision() would be Fail. The reasoning is that if a tag never gets assigned to a metric, a = expression on that tag should fail, while a != expression should pass.
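
The evaluation order described above (intrinsic filter first, then the meta tag record chains, then the default decision) can be sketched as follows. Everything here is illustrative: the expression struct, its field names, evaluate and notEqual are hypothetical, and the meta record that assigns tag=value to metrics with dc=us is made up. Only the decision names and the role of the default decision are taken from the description.

```go
package main

import "fmt"

type FilterDecision uint8

const (
	None FilterDecision = iota
	Fail
	Pass
)

type metric map[string]string

// expression bundles what the evaluation needs from one query expression.
type expression struct {
	intrinsic       func(m metric) FilterDecision // filter on the metric's own tags
	metaChains      []func(m metric) bool         // chains built from matching meta tag records
	onMetaMatch     FilterDecision                // decision if any chain matches
	defaultDecision FilterDecision                // stand-in for GetDefaultDecision()
}

// evaluate resolves one expression for one metric.
func evaluate(e expression, m metric) bool {
	switch e.intrinsic(m) {
	case Pass:
		return true
	case Fail:
		return false
	}
	// None: the intrinsic tags were inconclusive, consult the meta chains.
	for _, chain := range e.metaChains {
		if chain(m) {
			return e.onMetaMatch == Pass // stop at the first matching chain
		}
	}
	return e.defaultDecision == Pass
}

// notEqual builds the expression key!=value.
func notEqual(key, value string, metaChains ...func(metric) bool) expression {
	return expression{
		intrinsic: func(m metric) FilterDecision {
			v, ok := m[key]
			switch {
			case !ok:
				return None
			case v == value:
				return Fail
			default:
				return Pass
			}
		},
		metaChains:      metaChains,
		onMetaMatch:     Fail, // a meta record assigns key=value, so != fails
		defaultDecision: Pass, // nobody assigns the tag, so != passes
	}
}

func main() {
	// Hypothetical meta record: metrics with dc=us get the meta tag tag=value.
	assignsTagValue := func(m metric) bool { return m["dc"] == "us" }
	e := notEqual("tag", "value", assignsTagValue)

	fmt.Println(evaluate(e, metric{"tag": "value"}))                  // false
	fmt.Println(evaluate(e, metric{"tag": "otherValue", "dc": "us"})) // true
	fmt.Println(evaluate(e, metric{"dc": "us"}))                      // false
	fmt.Println(evaluate(e, metric{"dc": "eu"}))                      // true
}
```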

replay · 2019-07-16T17:40:20Z

Just FYI, those are the current benchmark results compared between master and this branch:

replay@mst-nb:~$  benchcmp /tmp/bench_master  /tmp/bench_branch
benchmark                                                        old ns/op     new ns/op     delta
BenchmarkTagQueryFilterAndIntersect/partitioned-8                11478558      40059410      +248.99%
BenchmarkTagQueryFilterAndIntersect/unPartitioned-8              47054050      179546200     +281.57%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/partitioned-8       31461560      44589916      +41.73%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/unPartitioned-8     156438270     200434630     +28.12%
BenchmarkTagQueryKeysByPrefixSimple/partitioned-8                15807         15468         -2.14%
BenchmarkTagQueryKeysByPrefixSimple/unPartitioned-8              2261          2217          -1.95%
BenchmarkTagQueryKeysByPrefixExpressions/partitioned-8           19747003      9970316       -49.51%
BenchmarkTagQueryKeysByPrefixExpressions/unPartitioned-8         66596885      10764884      -83.84%
BenchmarkTagQueryFilterByEqualExpression/partitioned-8           9365065       9606763       +2.58%
BenchmarkTagQueryFilterByEqualExpression/unPartitioned-8         28489946      34461254      +20.96%
BenchmarkTagQueryFilterByMatchExpression/partitioned-8           49737850      42749714      -14.05%
BenchmarkTagQueryFilterByMatchExpression/unPartitioned-8         297312400     244258900     -17.84%
BenchmarkTagQueryFilterByHasTagExpression/partitioned-8          62335045      57052060      -8.48%
BenchmarkTagQueryFilterByHasTagExpression/unPartitioned-8        359227266     335862066     -6.50%

benchmark                                                        old allocs     new allocs     delta
BenchmarkTagQueryFilterAndIntersect/partitioned-8                7690           5233           -31.95%
BenchmarkTagQueryFilterAndIntersect/unPartitioned-8              3338           883            -73.55%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/partitioned-8       235127         34635          -85.27%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/unPartitioned-8     205174         4759           -97.68%
BenchmarkTagQueryKeysByPrefixSimple/partitioned-8                32             32             +0.00%
BenchmarkTagQueryKeysByPrefixSimple/unPartitioned-8              4              4              +0.00%
BenchmarkTagQueryKeysByPrefixExpressions/partitioned-8           26992          983            -96.36%
BenchmarkTagQueryKeysByPrefixExpressions/unPartitioned-8         3460           206            -94.05%
BenchmarkTagQueryFilterByEqualExpression/partitioned-8           17050          17009          -0.24%
BenchmarkTagQueryFilterByEqualExpression/unPartitioned-8         16615          16607          -0.05%
BenchmarkTagQueryFilterByMatchExpression/partitioned-8           370968         34930          -90.58%
BenchmarkTagQueryFilterByMatchExpression/unPartitioned-8         369369         33379          -90.96%
BenchmarkTagQueryFilterByHasTagExpression/partitioned-8          501437         164938         -67.11%
BenchmarkTagQueryFilterByHasTagExpression/unPartitioned-8        500794         164792         -67.09%

benchmark                                                        old bytes     new bytes     delta
BenchmarkTagQueryFilterAndIntersect/partitioned-8                709176        511772        -27.84%
BenchmarkTagQueryFilterAndIntersect/unPartitioned-8              310965        131580        -57.69%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/partitioned-8       18413345      2381103       -87.07%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/unPartitioned-8     16453477      419914        -97.45%
BenchmarkTagQueryKeysByPrefixSimple/partitioned-8                2624          2624          +0.00%
BenchmarkTagQueryKeysByPrefixSimple/unPartitioned-8              352           352           +0.00%
BenchmarkTagQueryKeysByPrefixExpressions/partitioned-8           1704966       211470        -87.60%
BenchmarkTagQueryKeysByPrefixExpressions/unPartitioned-8         240589        36368         -84.88%
BenchmarkTagQueryFilterByEqualExpression/partitioned-8           4691539       4687239       -0.09%
BenchmarkTagQueryFilterByEqualExpression/unPartitioned-8         4068719       4067700       -0.03%
BenchmarkTagQueryFilterByMatchExpression/partitioned-8           36166214      9278016       -74.35%
BenchmarkTagQueryFilterByMatchExpression/unPartitioned-8         35042364      8164630       -76.70%
BenchmarkTagQueryFilterByHasTagExpression/partitioned-8          68448358      41549945      -39.30%
BenchmarkTagQueryFilterByHasTagExpression/unPartitioned-8        63688786      36831418      -42.17%

Most of them are looking good, but two show a significant slowdown. I'll investigate where that slowdown comes from and try to fix it.

replay · 2019-07-16T23:48:57Z

The good news is that I now know why some benchmarks are performing worse; the bad news is that I have no good idea how to improve that.

In the current master we only have one tag index. This means that to evaluate the condition tag=value we only need to check whether a certain id exists in the tag index under the keys [tag][value]. If it does, the condition is satisfied; otherwise it is not.
We now need to work based on the assumption that there are multiple indexes that can assign tags to a metric, with a defined order of precedence. This means that when we see that a metric does not exist in the primary tag index in [tag][value], we need to iterate over its tags and check whether any of them assigns the tag tag with a different value to it (e.g. tag=anothervalue). If that's the case, we can directly return saying that the condition is not satisfied, because we check the different tag indexes in decreasing order of precedence, so tag=anothervalue would have precedence even if a subsequent tag index assigned tag=value to this metric. Only if iterating over a metric's .Tags property does not turn up the tag we're looking for with a different value do we go on to check the next index (meta tag index).

The difference in these benchmarks comes from the additional looping over the .Tags property if there was no exact match for tag=value, to check whether there's another entry beginning with tag=.
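
A rough sketch of where the extra work comes from, under simplifying assumptions: idsWithTagValue stands in for the id set stored under [tag][value] in the primary tag index, tags are "key=value" strings, and equalFilter is a hypothetical helper rather than the function in this branch.

```go
package main

import (
	"fmt"
	"strings"
)

type FilterDecision uint8

const (
	None FilterDecision = iota
	Fail
	Pass
)

// equalFilter builds a filter for key=value.
// idsWithTagValue is a stand-in for the id set under [key][value].
func equalFilter(key string, idsWithTagValue map[string]struct{}) func(id string, tags []string) FilterDecision {
	prefix := key + "="
	return func(id string, tags []string) FilterDecision {
		// This lookup is all that a single-index world needs.
		if _, ok := idsWithTagValue[id]; ok {
			return Pass
		}
		// Additional loop: an intrinsic tag with the same key but another
		// value takes precedence over anything a later index could assign.
		for _, tag := range tags {
			if strings.HasPrefix(tag, prefix) {
				return Fail
			}
		}
		// Key not set intrinsically: the meta tag index has to decide.
		return None
	}
}

func main() {
	ids := map[string]struct{}{"id1": {}}
	filter := equalFilter("tag", ids)
	fmt.Println(filter("id1", []string{"tag=value"}) == Pass)
	fmt.Println(filter("id2", []string{"tag=anothervalue"}) == Fail)
	fmt.Println(filter("id3", []string{"dc=us"}) == None)
}
```

The loop only runs for ids without an exact match, but within one query that can still be a very large number of ids.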

replay · 2019-07-17T15:13:54Z

To solve the performance regression described above, I think I'll add another option similar to tag-support = true/false, called meta-tag-support = true/false. That way it should be possible to avoid performance regressions at least when meta tag support is turned off.

replay · 2019-07-18T00:09:32Z

With the latest 3 commits the performance has gotten better, but not as good as master yet:

benchmark                                                        old ns/op     new ns/op     delta
BenchmarkTagQueryFilterAndIntersect/partitioned-8                40323084      31360866      -22.23%
BenchmarkTagQueryFilterAndIntersect/unPartitioned-8              180773119     149006891     -17.57%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/partitioned-8       45430404      45239445      -0.42%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/unPartitioned-8     199925766     195143375     -2.39%
BenchmarkTagQueryKeysByPrefixSimple/partitioned-8                17038         17804         +4.50%
BenchmarkTagQueryKeysByPrefixSimple/unPartitioned-8              2291          2328          +1.62%
BenchmarkTagQueryKeysByPrefixExpressions/partitioned-8           9962729       9969944       +0.07%
BenchmarkTagQueryKeysByPrefixExpressions/unPartitioned-8         10184006      10596547      +4.05%
BenchmarkTagQueryFilterByEqualExpression/partitioned-8           11629725      11567934      -0.53%
BenchmarkTagQueryFilterByEqualExpression/unPartitioned-8         34105115      29431685      -13.70%
BenchmarkTagQueryFilterByMatchExpression/partitioned-8           47443844      47445716      +0.00%
BenchmarkTagQueryFilterByMatchExpression/unPartitioned-8         238119214     239374546     +0.53%
BenchmarkTagQueryFilterByHasTagExpression/partitioned-8          72438808      74312474      +2.59%
BenchmarkTagQueryFilterByHasTagExpression/unPartitioned-8        317500930     323824766     +1.99%

benchmark                                                        old allocs     new allocs     delta
BenchmarkTagQueryFilterAndIntersect/partitioned-8                5159           5123           -0.70%
BenchmarkTagQueryFilterAndIntersect/unPartitioned-8              825            822            -0.36%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/partitioned-8       34615          34623          +0.02%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/unPartitioned-8     4758           4759           +0.02%
BenchmarkTagQueryKeysByPrefixSimple/partitioned-8                32             32             +0.00%
BenchmarkTagQueryKeysByPrefixSimple/unPartitioned-8              4              4              +0.00%
BenchmarkTagQueryKeysByPrefixExpressions/partitioned-8           984            991            +0.71%
BenchmarkTagQueryKeysByPrefixExpressions/unPartitioned-8         206            207            +0.49%
BenchmarkTagQueryFilterByEqualExpression/partitioned-8           17009          17000          -0.05%
BenchmarkTagQueryFilterByEqualExpression/unPartitioned-8         16610          16611          +0.01%
BenchmarkTagQueryFilterByMatchExpression/partitioned-8           34920          34927          +0.02%
BenchmarkTagQueryFilterByMatchExpression/unPartitioned-8         33355          33357          +0.01%
BenchmarkTagQueryFilterByHasTagExpression/partitioned-8          164903         164917         +0.01%
BenchmarkTagQueryFilterByHasTagExpression/unPartitioned-8        164780         164776         -0.00%

benchmark                                                        old bytes     new bytes     delta
BenchmarkTagQueryFilterAndIntersect/partitioned-8                496563        496810        +0.05%
BenchmarkTagQueryFilterAndIntersect/unPartitioned-8              104305        101565        -2.63%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/partitioned-8       2357742       2358837       +0.05%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/unPartitioned-8     395076        388860        -1.57%
BenchmarkTagQueryKeysByPrefixSimple/partitioned-8                2624          2624          +0.00%
BenchmarkTagQueryKeysByPrefixSimple/unPartitioned-8              352           352           +0.00%
BenchmarkTagQueryKeysByPrefixExpressions/partitioned-8           210184        210344        +0.08%
BenchmarkTagQueryKeysByPrefixExpressions/unPartitioned-8         33915         33963         +0.14%
BenchmarkTagQueryFilterByEqualExpression/partitioned-8           4687368       4687182       -0.00%
BenchmarkTagQueryFilterByEqualExpression/unPartitioned-8         4068257       4068524       +0.01%
BenchmarkTagQueryFilterByMatchExpression/partitioned-8           9274441       9274677       +0.00%
BenchmarkTagQueryFilterByMatchExpression/unPartitioned-8         8120561       8120910       +0.00%
BenchmarkTagQueryFilterByHasTagExpression/partitioned-8          41543443      41544661      +0.00%
BenchmarkTagQueryFilterByHasTagExpression/unPartitioned-8        36818300      36818546      +0.00%

Dieterbe · 2019-07-18T06:47:44Z

iterating over a metric's .Tags property

This should be really fast. Do we do more than merely iterate and check for equality? Is there any parsing or other logic going on?

replay · 2019-07-18T11:35:53Z

@Dieterbe no, it used to just compare the strings for equality, while now it does a lookup by id from a map: 39e35e2#diff-cb68716cfded2a2336a3778f548d7a17R41
Keep in mind that within one query this can happen many millions of times, so a seemingly insignificant difference can have an impact.

this allows us to minimize the number of ids that need to get filtered by the filters, while at the same time still taking the execution cost of each filter into account. f.e. if we have the choice between first calling a filter which uses regex or first calling a filter that doesn't use regex, we'd first want to call the one that's not using regex and let it reduce the size of the potential result set, that way the regex-using filter would later only get applied on a smaller set of potential results.
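
As an illustration of this ordering, the sketch below sorts expressions by an operator cost first and uses cardinality only as a tie-breaker, which is how the "performance tuning" commit message in this PR describes the sortByCost logic. The expr struct, the concrete cost numbers and the choice to put the lower cardinality first are assumptions of the sketch, not values from the code.

```go
package main

import (
	"fmt"
	"sort"
)

// expr is a hypothetical, flattened view of a query expression.
type expr struct {
	query        string
	operatorCost uint32 // assumed relative cost of the operator (regex is expensive)
	cardinality  uint32 // assumed number of ids the expression would touch
}

// sortByCost orders expressions so that cheap operators run first.
// Cardinality only decides between expressions of equal operator cost.
func sortByCost(exprs []expr) {
	sort.SliceStable(exprs, func(i, j int) bool {
		if exprs[i].operatorCost != exprs[j].operatorCost {
			return exprs[i].operatorCost < exprs[j].operatorCost
		}
		return exprs[i].cardinality < exprs[j].cardinality
	})
}

func main() {
	exprs := []expr{
		{"host=~web.*", 10, 50},
		{"dc=us", 1, 5000},
		{"env=prod", 1, 200},
	}
	sortByCost(exprs)
	for _, e := range exprs {
		fmt.Println(e.query)
	}
	// Prints env=prod, dc=us, host=~web.* (one per line): the regex comes
	// last even though its cardinality is the lowest.
}
```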

api/cluster.go

api/graphite.go

robert-milan · 2019-08-07T14:56:28Z

idx/memory/memory.go

@@ -84,6 +82,8 @@ func ConfigSetup() {
 	memoryIdx.DurationVar(&findCacheBackoffTime, "find-cache-backoff-time", time.Minute, "amount of time to disable the findCache when the invalidate queue fills up.")
 	memoryIdx.StringVar(&indexRulesFile, "rules-file", "/etc/metrictank/index-rules.conf", "path to index-rules.conf file")
 	memoryIdx.StringVar(&maxPruneLockTimeStr, "max-prune-lock-time", "100ms", "Maximum duration each second a prune job can lock the index.")
+	memoryIdx.IntVar(&tagquery.MatchCacheSize, "match-cache-size", 1000, "size of regular expression cache in tag query evaluation")


I don't think we should be loading variables into another package from here. Can we move the variable initialization code into tagquery? Same would apply to tagquery.MetaTagSupport.

Actually that's what I did at first, then @Dieterbe asked me to keep the settings in the memory-idx (which I think makes sense). It's just unfortunate that we now have to set this setting across packages.
#1373 (comment)

Is there something else we can do here then? Like keep them completely contained within the index and then after they are loaded initialize them into tagquery with its own variables to hold them? Although then if we ever want to change them at runtime it would be a bit trickier.

It just feels like we are creating a lot of codependency here.

Unfortunately the package tagquery can't import the package memory, this would result in a cycle. We could initialize a setting in memory, and then in ConfigProcess() just copy that value into tagquery.MetaTagSupport / tagquery.MatchCacheSize. Then both of these two packages would hold a copy of the setting.

Yeah something like that.

done: aefbc3c
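
For readers following the thread, the agreed approach (parse the setting into a variable owned by the memory index, then copy it into tagquery during config processing) might look like the single-file sketch below. Since one file cannot span two packages, tagqueryMatchCacheSize and tagqueryMetaTagSupport stand in for tagquery.MatchCacheSize and tagquery.MetaTagSupport. The plain flag.FlagSet, the function signatures, and the default and help text of meta-tag-support are assumptions; this is not the content of aefbc3c.

```go
package main

import (
	"flag"
	"fmt"
)

// Stand-ins for tagquery.MatchCacheSize and tagquery.MetaTagSupport.
var (
	tagqueryMatchCacheSize int
	tagqueryMetaTagSupport bool
)

// Settings owned by the memory index package.
var (
	matchCacheSize int
	metaTagSupport bool
)

// configSetup registers the flags against index-local variables only.
func configSetup(args []string) error {
	fs := flag.NewFlagSet("memory-idx", flag.ContinueOnError)
	fs.IntVar(&matchCacheSize, "match-cache-size", 1000, "size of regular expression cache in tag query evaluation")
	// Default and description of this flag are assumptions of the sketch.
	fs.BoolVar(&metaTagSupport, "meta-tag-support", false, "enables meta tag support")
	return fs.Parse(args)
}

// configProcess copies the parsed values into the tagquery stand-ins,
// so tagquery never has to import the memory index package.
func configProcess() {
	tagqueryMatchCacheSize = matchCacheSize
	tagqueryMetaTagSupport = metaTagSupport
}

func main() {
	if err := configSetup([]string{"-match-cache-size=500", "-meta-tag-support=true"}); err != nil {
		panic(err)
	}
	configProcess()
	fmt.Println(tagqueryMatchCacheSize, tagqueryMetaTagSupport) // 500 true
}
```

The trade-off raised in the thread still applies: both sides hold a copy, so changing the setting at runtime would require updating both.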

expr/tagquery/expression.go

Co-Authored-By: Robert Milan <42070645+robert-milan@users.noreply.github.com>

robert-milan

LGTM

replay changed the title ~~[WIP] Move evaluation logic into tag expressions~~ [WIP] Move tag query evaluation logic into tag expressions Jul 1, 2019

replay force-pushed the move_evaluation_logic_into_tag_expressions branch 12 times, most recently from 3e06b88 to ea77afc Compare July 6, 2019 19:52

replay force-pushed the move_evaluation_logic_into_tag_expressions branch 6 times, most recently from 1ccbefd to cda3515 Compare July 16, 2019 01:20

replay changed the title ~~[WIP] Move tag query evaluation logic into tag expressions~~ Move tag query evaluation logic into tag expressions Jul 16, 2019

replay force-pushed the move_evaluation_logic_into_tag_expressions branch 3 times, most recently from 2eacdba to 13609e1 Compare July 16, 2019 17:26

replay force-pushed the move_evaluation_logic_into_tag_expressions branch from 13609e1 to 3738209 Compare July 16, 2019 21:24

replay and others added 20 commits August 7, 2019 10:02

use direct key lookup if possible

4a5892b

fix tests

dee5a34

cleaner way to build error

480706f

better comments

d5da8f1

remove dead code 'HasRe'

e3be86e

move tag query options into memory index

e748c64

add comparison methods to meta tag records and expression types

667e2f6

also fix some confusing terminology in variable names

add HashExpression method to meta tag record type

f19b46d

add comment

954172b

adding comments

6a53c6f

update comment

5935500

faster meta tag record comparison

13059a0

update docs to reflect config parameter changes

b1df23b

more comments to explain type Expression

bafc023

fix json response format bug

830f42f

fix bug when initial expression has type HAS_TAG

590aaa5

this makes HAS_TAG correctly indicate that its value matches exactly, and it also uses the optimized lookup when doing the lookup of the initial ID set in getInitialByTag()

better naming and additional comment

9d516c8

this renames some methods to make it clearer what they do. it also adds another explanatory comment.

performance tuning

bc6c8b7

this changes the sortByCost logic so it puts more weight on the expression type, and only takes cardinality into account when it has to sort two expressions of the same operator cost. in the benchmarks this seems to lead to better results.

fix benchmark TagQueryKeysByPrefixSimple

142a893

replay force-pushed the move_evaluation_logic_into_tag_expressions branch from fe39ea0 to 142a893 Compare August 7, 2019 14:07

robert-milan reviewed Aug 8, 2019

Apply suggestions from code review

e565d55

Co-Authored-By: Robert Milan <42070645+robert-milan@users.noreply.github.com>

replay force-pushed the move_evaluation_logic_into_tag_expressions branch from 345ffae to e565d55 Compare August 8, 2019 13:58

initialize index settings into local variables

aefbc3c

robert-milan approved these changes Aug 8, 2019

bugfix in expression_not_has_tag

3103854

replay mentioned this pull request Aug 9, 2019

Implement series lookup and filtering by meta tag #1423

Merged

replay merged commit 494c5f8 into master Aug 9, 2019

replay deleted the move_evaluation_logic_into_tag_expressions branch August 9, 2019 14:22
