This repository was archived by the owner on Aug 23, 2023. It is now read-only.

Move tag query evaluation logic into tag expressions #1373

Merged: replay merged 40 commits into master on Aug 9, 2019

replay commented Jun 29, 2019

Moves the tag query expression evaluation logic into the expression types. This gives us more flexibility to combine expressions from meta records and tag queries when both indexes have to be taken into account.

One notable change to the logic is that MetricDefinition filters no longer assume that the tag index they are currently looking at is the only tag index.
For example, if an expression says tag1!=abc and we see that metric1 does not have tag1 among its metric (intrinsic) tags, we can't be sure yet whether another index (e.g. the meta tag index) assigns tag1=abc to metric1. In this case the filter returns the decision None, which means that the other index(es) need to be checked to decide whether the expression is satisfied.
On the other hand, if the metric (intrinsic) tags assign tag1=abc to metric1, the filter directly returns the conclusive decision Fail and subsequent indexes can be ignored. If the metric (intrinsic) tags assign tag1=otherValue to metric1, the filter directly returns the conclusive decision Pass, because the indexes are checked in their order of precedence for conflict resolution.
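To make the three-way decision concrete, here is a minimal Go sketch of a tag1!=abc filter that only consults a metric's intrinsic tags. It assumes tags are stored as "key=value" strings, as in MetricDefinition.Tags; FilterDecision is a local stand-in and notEqualOnIntrinsicTags is a hypothetical helper, not code from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// FilterDecision is a local stand-in for the three outcomes described above.
type FilterDecision int

const (
	None FilterDecision = iota // inconclusive: consult the next index
	Fail                       // conclusively rejected
	Pass                       // conclusively accepted
)

func (d FilterDecision) String() string {
	switch d {
	case Fail:
		return "Fail"
	case Pass:
		return "Pass"
	default:
		return "None"
	}
}

// notEqualOnIntrinsicTags (hypothetical) evaluates key!=value against a
// metric's intrinsic tags only.
func notEqualOnIntrinsicTags(tags []string, key, value string) FilterDecision {
	prefix := key + "="
	for _, tag := range tags {
		if !strings.HasPrefix(tag, prefix) {
			continue
		}
		if tag[len(prefix):] == value {
			return Fail // the metric itself assigns key=value
		}
		return Pass // the metric assigns another value, which takes precedence
	}
	return None // key not set intrinsically; a meta tag might still set it
}

func main() {
	fmt.Println(notEqualOnIntrinsicTags([]string{"tag1=abc"}, "tag1", "abc"))        // Fail
	fmt.Println(notEqualOnIntrinsicTags([]string{"tag1=otherValue"}, "tag1", "abc")) // Pass
	fmt.Println(notEqualOnIntrinsicTags([]string{"tag2=abc"}, "tag1", "abc"))        // None
}
```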

Lookups in the meta tag index are not implemented yet; they will come in the next PR.

The current status of this PR is that it works in my manual tests and the unit tests pass. I'm going to add more benchmarks and possibly some performance improvements, but I'm not expecting any major changes.

This implements step 2) of this list: #660 (comment)

UPDATE: trying to explain the above a bit better:

Filter Functions (Matchers):
The given query expressions, once parsed into the types implementing tagquery.Expression, now provide a method called GetMetricDefinitionFilter(). This method returns a filter function (in other contexts also called a "matcher") which takes a MetricDefinition and returns a decision: Pass, Fail or None (inconclusive). This allows us to build a chain of filter functions, not only from the given expressions, but also from the query expressions associated with the meta records that match the given expressions.
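A rough sketch of what such filter functions and a chain of them could look like in Go. The types are pared-down stand-ins; nameEquals, tagEquals and chain are hypothetical, and the AND-combination rule (any Fail is final, any None leaves the result open) is one plausible choice rather than the PR's exact logic.

```go
package main

import (
	"fmt"
	"strings"
)

// FilterDecision is a local stand-in for tagquery.FilterDecision.
type FilterDecision int

const (
	None FilterDecision = iota // inconclusive
	Fail
	Pass
)

// MetricDefinition is a pared-down stand-in for the real type.
type MetricDefinition struct {
	Name string
	Tags []string
}

// MetricDefinitionFilter maps a metric to a decision, like the functions
// returned by GetMetricDefinitionFilter().
type MetricDefinitionFilter func(md *MetricDefinition) FilterDecision

// nameEquals is conclusive: the name is only ever set by the metric itself.
func nameEquals(name string) MetricDefinitionFilter {
	return func(md *MetricDefinition) FilterDecision {
		if md.Name == name {
			return Pass
		}
		return Fail
	}
}

// tagEquals only sees intrinsic tags, so a missing key is inconclusive.
func tagEquals(key, value string) MetricDefinitionFilter {
	want := key + "=" + value
	prefix := key + "="
	return func(md *MetricDefinition) FilterDecision {
		for _, t := range md.Tags {
			if t == want {
				return Pass
			}
			if strings.HasPrefix(t, prefix) {
				return Fail // another value for key takes precedence
			}
		}
		return None // key not set here; another index may still set it
	}
}

// chain ANDs filters: any Fail is final, any None leaves the result open.
func chain(filters ...MetricDefinitionFilter) MetricDefinitionFilter {
	return func(md *MetricDefinition) FilterDecision {
		result := Pass
		for _, f := range filters {
			switch f(md) {
			case Fail:
				return Fail
			case None:
				result = None
			}
		}
		return result
	}
}

func main() {
	md := &MetricDefinition{Name: "cpu.user", Tags: []string{"dc=east"}}
	fmt.Println(chain(nameEquals("cpu.user"), tagEquals("dc", "east"))(md) == Pass) // true
	fmt.Println(chain(nameEquals("cpu.user"), tagEquals("env", "prod"))(md) == None) // true
	fmt.Println(chain(nameEquals("cpu.user"), tagEquals("dc", "west"))(md) == Fail)  // true
}
```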
So let's say we have an expression tag=~val.* and want to apply it as a filter (not to build the initial result set). Once it's parsed into the corresponding tagquery.Expression, TagQueryContext.prepareExpressions() calls its GetMetricDefinitionFilter() to obtain a filter function, which we then use to filter down the result set.
Here comes the important part:
Additionally, once we use the meta tag index, we can use the same tagquery.Expression instance to look up the meta tag records (tagquery.MetaTagRecord) in the meta tag index (UnpartitionedMemoryIdx.metaTags) that match the expression tag=~val.*. In that lookup the tagquery.Expression instance is not used as a filter, but to build a set of meta tag records which match the expression tag=~val.*. We do that via its methods GetKey() and ValuePasses() (because the =~ operator matches against the value of a tag key).
The resulting meta tag records each come with one or multiple sets of query expressions (MetaTagRecord.Expressions). If any of these sets of query expressions matches a MetricDefinition, the meta tags associated with that meta tag record get assigned to that metric. Since we already know that one of those meta tags matches the expression tag=~val.*, the expression tag=~val.* matches this metric (via the meta tag that gets assigned to it).
Concretely, to implement this, prepareExpressions() will look up those meta tag records as described and build a set of filter chains from them (a set, because multiple sets of expressions are associated with each meta tag record, and one expression can match multiple meta tag records). These filter chains can then be used by testByAllExpressions(). If a MetricDefinition passes any one of the meta-tag-record-based filter chains, it should be part of the result set (so there's a logical OR there).
There will probably be some tuning regarding which filter chains are cheap to evaluate and which are expensive: we should try to filter the result set down with the cheap filters first, before applying the expensive ones. That's why tagquery.Expression has a GetCostMultiplier() method, which estimates the cost of evaluating it.
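Why the order matters can be shown with a small Go sketch that applies filters cheapest-first, so the regex filter only sees the ids left over by the cheap one. The cost field is a hypothetical stand-in for GetCostMultiplier(), and costedFilter, applyByCost and the numbers are illustrative only.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

type costedFilter struct {
	cost uint32            // hypothetical stand-in for GetCostMultiplier()
	pass func(id int) bool // true if the id survives the filter
}

// applyByCost runs filters cheapest-first so expensive ones see fewer ids.
func applyByCost(ids []int, filters []costedFilter) []int {
	sorted := append([]costedFilter(nil), filters...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].cost < sorted[j].cost })
	for _, f := range sorted {
		var kept []int
		for _, id := range ids {
			if f.pass(id) {
				kept = append(kept, id)
			}
		}
		ids = kept
	}
	return ids
}

func main() {
	ids := make([]int, 1000)
	for i := range ids {
		ids[i] = i
	}
	regexCalls := 0
	re := regexp.MustCompile(`^[0-9]*0$`)
	expensive := costedFilter{cost: 10, pass: func(id int) bool {
		regexCalls++
		return re.MatchString(strconv.Itoa(id))
	}}
	cheap := costedFilter{cost: 1, pass: func(id int) bool { return id%100 == 0 }}
	result := applyByCost(ids, []costedFilter{expensive, cheap})
	fmt.Println(len(result), regexCalls) // 10 10
}
```

Even though the filters are passed in expensive-first order, the regex runs on only the 10 ids that survive the cheap filter instead of all 1000.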

Filter Decisions:
The metric definition filters (tagquery.MetricDefinitionFilter) return a filter decision (tagquery.FilterDecision), which is one of Fail, Pass or None. Fail and Pass simply mean that the filter was able to conclusively decide that a given MetricDefinition has either passed the filter or been disqualified by it. None is necessary because in certain cases a filter cannot make this decision without also looking at the other indexes that need to be taken into account. Let's take the example tag!=value:
Once we use the meta tag index, testByAllExpressions() will first apply the filter function generated from the expression tag!=value to each MetricDefinition of the result set. If a MetricDefinition has the tag tag=value, the filter can directly return Fail. If it has the tag tag=otherValue, the filter can directly return Pass. But if it does not have the tag key tag at all, the filter can't make a final decision, because one of the meta tag records might assign tag=value to the given MetricDefinition, so it returns None.

In the case of None, testByAllExpressions() has to run the MetricDefinition through the filter chains generated from the meta tag records that assign tag=value (as described above). If one of them matches, the MetricDefinition is disqualified at that point and we can stop evaluating the filter chains. If none of them match, the tagquery.Expression's GetDefaultDecision() makes the final decision on whether the MetricDefinition matches the expression.

For tagquery.expressionNotEqual (representing !=), GetDefaultDecision() returns Pass; for other expression types, such as tagquery.expressionEqual (=), it returns Fail. The reasoning: if no index assigns the tag to a metric, a = expression cannot be satisfied, so the decision is Fail, while a != expression is satisfied, so the decision is Pass.
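The full decision flow for = and != can be summarized in a short Go sketch: intrinsic tags first, then the meta tag record chains on None, then GetDefaultDecision(). The expression type, evaluate and the chain predicates are hypothetical, and the sketch leaves out meta tag records that assign a different value to the key.

```go
package main

import (
	"fmt"
	"strings"
)

type FilterDecision int

const (
	None FilterDecision = iota
	Fail
	Pass
)

type MetricDefinition struct{ Tags []string }

// expression is a hypothetical, minimal stand-in for an = or != expression.
type expression struct {
	key, value string
	notEqual   bool // true for "!=", false for "="
}

// intrinsicDecision looks only at the metric's own tags.
func (e expression) intrinsicDecision(md *MetricDefinition) FilterDecision {
	prefix := e.key + "="
	for _, t := range md.Tags {
		if strings.HasPrefix(t, prefix) {
			if (t[len(prefix):] == e.value) != e.notEqual {
				return Pass
			}
			return Fail
		}
	}
	return None
}

// GetDefaultDecision: if no index assigns the key, "!=" passes and "=" fails.
func (e expression) GetDefaultDecision() FilterDecision {
	if e.notEqual {
		return Pass
	}
	return Fail
}

// evaluate: intrinsic tags first; on None, try the chains of meta tag records
// assigning key=value (a logical OR); otherwise use the default decision.
func evaluate(e expression, md *MetricDefinition, metaChains []func(*MetricDefinition) bool) FilterDecision {
	if d := e.intrinsicDecision(md); d != None {
		return d
	}
	for _, matches := range metaChains {
		if matches(md) {
			// a meta tag assigns key=value to this metric
			if e.notEqual {
				return Fail
			}
			return Pass
		}
	}
	return e.GetDefaultDecision()
}

func main() {
	inEast := func(md *MetricDefinition) bool {
		for _, t := range md.Tags {
			if t == "dc=east" {
				return true
			}
		}
		return false
	}
	chains := []func(*MetricDefinition) bool{inEast}
	ne := expression{key: "tag", value: "value", notEqual: true}
	fmt.Println(evaluate(ne, &MetricDefinition{Tags: []string{"tag=value"}}, chains) == Fail)      // true
	fmt.Println(evaluate(ne, &MetricDefinition{Tags: []string{"tag=otherValue"}}, chains) == Pass) // true
	fmt.Println(evaluate(ne, &MetricDefinition{Tags: []string{"dc=east"}}, chains) == Fail)        // true
	fmt.Println(evaluate(ne, &MetricDefinition{Tags: []string{"dc=west"}}, chains) == Pass)        // true
}
```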

replay changed the title from "[WIP] Move evaluation logic into tag expressions" to "[WIP] Move tag query evaluation logic into tag expressions" on Jul 1, 2019
replay force-pushed the move_evaluation_logic_into_tag_expressions branch 12 times, most recently from 3e06b88 to ea77afc on July 6, 2019 19:52
replay force-pushed the move_evaluation_logic_into_tag_expressions branch 6 times, most recently from 1ccbefd to cda3515 on July 16, 2019 01:20
replay changed the title from "[WIP] Move tag query evaluation logic into tag expressions" to "Move tag query evaluation logic into tag expressions" on Jul 16, 2019
replay force-pushed the move_evaluation_logic_into_tag_expressions branch 3 times, most recently from 2eacdba to 13609e1 on July 16, 2019 17:26
replay commented Jul 16, 2019

Just FYI, those are the current benchmark results compared between master and this branch:

replay@mst-nb:~$  benchcmp /tmp/bench_master  /tmp/bench_branch
benchmark                                                        old ns/op     new ns/op     delta
BenchmarkTagQueryFilterAndIntersect/partitioned-8                11478558      40059410      +248.99%
BenchmarkTagQueryFilterAndIntersect/unPartitioned-8              47054050      179546200     +281.57%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/partitioned-8       31461560      44589916      +41.73%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/unPartitioned-8     156438270     200434630     +28.12%
BenchmarkTagQueryKeysByPrefixSimple/partitioned-8                15807         15468         -2.14%
BenchmarkTagQueryKeysByPrefixSimple/unPartitioned-8              2261          2217          -1.95%
BenchmarkTagQueryKeysByPrefixExpressions/partitioned-8           19747003      9970316       -49.51%
BenchmarkTagQueryKeysByPrefixExpressions/unPartitioned-8         66596885      10764884      -83.84%
BenchmarkTagQueryFilterByEqualExpression/partitioned-8           9365065       9606763       +2.58%
BenchmarkTagQueryFilterByEqualExpression/unPartitioned-8         28489946      34461254      +20.96%
BenchmarkTagQueryFilterByMatchExpression/partitioned-8           49737850      42749714      -14.05%
BenchmarkTagQueryFilterByMatchExpression/unPartitioned-8         297312400     244258900     -17.84%
BenchmarkTagQueryFilterByHasTagExpression/partitioned-8          62335045      57052060      -8.48%
BenchmarkTagQueryFilterByHasTagExpression/unPartitioned-8        359227266     335862066     -6.50%

benchmark                                                        old allocs     new allocs     delta
BenchmarkTagQueryFilterAndIntersect/partitioned-8                7690           5233           -31.95%
BenchmarkTagQueryFilterAndIntersect/unPartitioned-8              3338           883            -73.55%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/partitioned-8       235127         34635          -85.27%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/unPartitioned-8     205174         4759           -97.68%
BenchmarkTagQueryKeysByPrefixSimple/partitioned-8                32             32             +0.00%
BenchmarkTagQueryKeysByPrefixSimple/unPartitioned-8              4              4              +0.00%
BenchmarkTagQueryKeysByPrefixExpressions/partitioned-8           26992          983            -96.36%
BenchmarkTagQueryKeysByPrefixExpressions/unPartitioned-8         3460           206            -94.05%
BenchmarkTagQueryFilterByEqualExpression/partitioned-8           17050          17009          -0.24%
BenchmarkTagQueryFilterByEqualExpression/unPartitioned-8         16615          16607          -0.05%
BenchmarkTagQueryFilterByMatchExpression/partitioned-8           370968         34930          -90.58%
BenchmarkTagQueryFilterByMatchExpression/unPartitioned-8         369369         33379          -90.96%
BenchmarkTagQueryFilterByHasTagExpression/partitioned-8          501437         164938         -67.11%
BenchmarkTagQueryFilterByHasTagExpression/unPartitioned-8        500794         164792         -67.09%

benchmark                                                        old bytes     new bytes     delta
BenchmarkTagQueryFilterAndIntersect/partitioned-8                709176        511772        -27.84%
BenchmarkTagQueryFilterAndIntersect/unPartitioned-8              310965        131580        -57.69%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/partitioned-8       18413345      2381103       -87.07%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/unPartitioned-8     16453477      419914        -97.45%
BenchmarkTagQueryKeysByPrefixSimple/partitioned-8                2624          2624          +0.00%
BenchmarkTagQueryKeysByPrefixSimple/unPartitioned-8              352           352           +0.00%
BenchmarkTagQueryKeysByPrefixExpressions/partitioned-8           1704966       211470        -87.60%
BenchmarkTagQueryKeysByPrefixExpressions/unPartitioned-8         240589        36368         -84.88%
BenchmarkTagQueryFilterByEqualExpression/partitioned-8           4691539       4687239       -0.09%
BenchmarkTagQueryFilterByEqualExpression/unPartitioned-8         4068719       4067700       -0.03%
BenchmarkTagQueryFilterByMatchExpression/partitioned-8           36166214      9278016       -74.35%
BenchmarkTagQueryFilterByMatchExpression/unPartitioned-8         35042364      8164630       -76.70%
BenchmarkTagQueryFilterByHasTagExpression/partitioned-8          68448358      41549945      -39.30%
BenchmarkTagQueryFilterByHasTagExpression/unPartitioned-8        63688786      36831418      -42.17%

Most of them look good, but two show a significant slowdown. I'll investigate where that slowdown comes from and try to fix it.

replay force-pushed the move_evaluation_logic_into_tag_expressions branch from 13609e1 to 3738209 on July 16, 2019 21:24
replay commented Jul 16, 2019

The good news is that I now know why some benchmarks are performing worse; the bad news is that I have no good idea how to improve that.

  • In the current master we only have one tag index. This means that to evaluate the condition tag=value we only need to check whether a certain id exists in the tag index under the keys [tag][value]. If it does, the condition is satisfied; otherwise it is not.
  • We now need to work on the assumption that multiple indexes can assign tags to a metric, with a defined order of precedence. This means that when a metric does not exist in the primary tag index under [tag][value], we need to iterate over its tags and check whether any of them assigns the tag key tag with a different value (e.g. tag=anothervalue). If so, we can directly return that the condition is not satisfied: we check the tag indexes in decreasing order of precedence, so tag=anothervalue takes precedence even if a subsequent tag index would assign tag=value to this metric. Only if iterating over the metric's .Tags property turns up no different value for the tag we're looking for do we go on to check the next index (the meta tag index).

The difference in these benchmarks comes from the additional looping over the .Tags property if there was no exact match for tag=value, to check whether there's another entry beginning with tag=.
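A minimal Go sketch of where that extra work comes from, assuming a [key][value] -> id-set layout for the tag index (tagIndex and equalDecision are hypothetical names, not the PR's code). A hit is a single map lookup, as on master; only a miss triggers the scan over the metric's tags.

```go
package main

import (
	"fmt"
	"strings"
)

type FilterDecision int

const (
	None FilterDecision = iota
	Fail
	Pass
)

// tagIndex is a hypothetical [key][value] -> set of ids layout.
type tagIndex map[string]map[string]map[uint32]struct{}

// equalDecision evaluates key=value for one id. With a single index the map
// lookup alone is enough; with several indexes a miss triggers the extra scan.
func equalDecision(idx tagIndex, tagsByID map[uint32][]string, id uint32, key, value string) FilterDecision {
	if _, ok := idx[key][value][id]; ok {
		return Pass
	}
	// the additional loop that shows up in the benchmarks
	prefix := key + "="
	for _, t := range tagsByID[id] {
		if strings.HasPrefix(t, prefix) {
			return Fail // a different value for key takes precedence
		}
	}
	return None // let the next index (meta tags) decide
}

func main() {
	idx := tagIndex{"dc": {"east": {1: {}}}}
	tags := map[uint32][]string{1: {"dc=east"}, 2: {"dc=west"}, 3: {"host=a"}}
	fmt.Println(equalDecision(idx, tags, 1, "dc", "east") == Pass) // true
	fmt.Println(equalDecision(idx, tags, 2, "dc", "east") == Fail) // true
	fmt.Println(equalDecision(idx, tags, 3, "dc", "east") == None) // true
}
```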

replay commented Jul 17, 2019

To solve the performance regression described above, I think I'll add another option similar to tag-support = true/false, called meta-tag-support = true/false. That way it should be possible to avoid the performance regressions, at least when meta tag support is turned off.
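One way such an option could remove the cost, sketched in Go: the filter constructor checks the setting once, and when meta tag support is off it returns a filter that treats a tag index miss as a conclusive Fail without scanning. The metaTagSupport variable, the flag registration (including its help text) and newEqualFilter are assumptions for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

type FilterDecision int

const (
	None FilterDecision = iota
	Fail
	Pass
)

// metaTagSupport would be set by the proposed meta-tag-support option.
var metaTagSupport bool

// newEqualFilter returns a key=value filter. inIndex is the result of the
// [key][value][id] tag index lookup. The extra scan over the metric's tags
// is only part of the returned filter when meta tag support is on.
func newEqualFilter(key string) func(tags []string, inIndex bool) FilterDecision {
	if !metaTagSupport {
		// single tag index: a miss is conclusive, no scan needed
		return func(_ []string, inIndex bool) FilterDecision {
			if inIndex {
				return Pass
			}
			return Fail
		}
	}
	prefix := key + "="
	return func(tags []string, inIndex bool) FilterDecision {
		if inIndex {
			return Pass
		}
		for _, t := range tags {
			if strings.HasPrefix(t, prefix) {
				return Fail
			}
		}
		return None
	}
}

func main() {
	fs := flag.NewFlagSet("memory-idx", flag.ContinueOnError)
	fs.BoolVar(&metaTagSupport, "meta-tag-support", false, "enable meta tag support (hypothetical help text)")
	if err := fs.Parse([]string{"-meta-tag-support=true"}); err != nil {
		panic(err)
	}
	f := newEqualFilter("dc")
	fmt.Println(f([]string{"host=a"}, false) == None) // true: meta tags may still decide
}
```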

replay commented Jul 18, 2019

With the latest 3 commits the performance has improved, but it's not as good as master yet. (Here "old" is the previous state of this branch and "new" includes the latest commits.)

benchmark                                                        old ns/op     new ns/op     delta
BenchmarkTagQueryFilterAndIntersect/partitioned-8                40323084      31360866      -22.23%
BenchmarkTagQueryFilterAndIntersect/unPartitioned-8              180773119     149006891     -17.57%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/partitioned-8       45430404      45239445      -0.42%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/unPartitioned-8     199925766     195143375     -2.39%
BenchmarkTagQueryKeysByPrefixSimple/partitioned-8                17038         17804         +4.50%
BenchmarkTagQueryKeysByPrefixSimple/unPartitioned-8              2291          2328          +1.62%
BenchmarkTagQueryKeysByPrefixExpressions/partitioned-8           9962729       9969944       +0.07%
BenchmarkTagQueryKeysByPrefixExpressions/unPartitioned-8         10184006      10596547      +4.05%
BenchmarkTagQueryFilterByEqualExpression/partitioned-8           11629725      11567934      -0.53%
BenchmarkTagQueryFilterByEqualExpression/unPartitioned-8         34105115      29431685      -13.70%
BenchmarkTagQueryFilterByMatchExpression/partitioned-8           47443844      47445716      +0.00%
BenchmarkTagQueryFilterByMatchExpression/unPartitioned-8         238119214     239374546     +0.53%
BenchmarkTagQueryFilterByHasTagExpression/partitioned-8          72438808      74312474      +2.59%
BenchmarkTagQueryFilterByHasTagExpression/unPartitioned-8        317500930     323824766     +1.99%

benchmark                                                        old allocs     new allocs     delta
BenchmarkTagQueryFilterAndIntersect/partitioned-8                5159           5123           -0.70%
BenchmarkTagQueryFilterAndIntersect/unPartitioned-8              825            822            -0.36%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/partitioned-8       34615          34623          +0.02%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/unPartitioned-8     4758           4759           +0.02%
BenchmarkTagQueryKeysByPrefixSimple/partitioned-8                32             32             +0.00%
BenchmarkTagQueryKeysByPrefixSimple/unPartitioned-8              4              4              +0.00%
BenchmarkTagQueryKeysByPrefixExpressions/partitioned-8           984            991            +0.71%
BenchmarkTagQueryKeysByPrefixExpressions/unPartitioned-8         206            207            +0.49%
BenchmarkTagQueryFilterByEqualExpression/partitioned-8           17009          17000          -0.05%
BenchmarkTagQueryFilterByEqualExpression/unPartitioned-8         16610          16611          +0.01%
BenchmarkTagQueryFilterByMatchExpression/partitioned-8           34920          34927          +0.02%
BenchmarkTagQueryFilterByMatchExpression/unPartitioned-8         33355          33357          +0.01%
BenchmarkTagQueryFilterByHasTagExpression/partitioned-8          164903         164917         +0.01%
BenchmarkTagQueryFilterByHasTagExpression/unPartitioned-8        164780         164776         -0.00%

benchmark                                                        old bytes     new bytes     delta
BenchmarkTagQueryFilterAndIntersect/partitioned-8                496563        496810        +0.05%
BenchmarkTagQueryFilterAndIntersect/unPartitioned-8              104305        101565        -2.63%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/partitioned-8       2357742       2358837       +0.05%
BenchmarkTagQueryFilterAndIntersectOnlyRegex/unPartitioned-8     395076        388860        -1.57%
BenchmarkTagQueryKeysByPrefixSimple/partitioned-8                2624          2624          +0.00%
BenchmarkTagQueryKeysByPrefixSimple/unPartitioned-8              352           352           +0.00%
BenchmarkTagQueryKeysByPrefixExpressions/partitioned-8           210184        210344        +0.08%
BenchmarkTagQueryKeysByPrefixExpressions/unPartitioned-8         33915         33963         +0.14%
BenchmarkTagQueryFilterByEqualExpression/partitioned-8           4687368       4687182       -0.00%
BenchmarkTagQueryFilterByEqualExpression/unPartitioned-8         4068257       4068524       +0.01%
BenchmarkTagQueryFilterByMatchExpression/partitioned-8           9274441       9274677       +0.00%
BenchmarkTagQueryFilterByMatchExpression/unPartitioned-8         8120561       8120910       +0.00%
BenchmarkTagQueryFilterByHasTagExpression/partitioned-8          41543443      41544661      +0.00%
BenchmarkTagQueryFilterByHasTagExpression/unPartitioned-8        36818300      36818546      +0.00%

Dieterbe commented:

iterating over a metric's .Tags property

This should be really fast. Do we do more than merely iterate and check for equality? Any parsing or other logic going on?

replay commented Jul 18, 2019

@Dieterbe No, it used to just compare the strings for equality, while now it does a lookup by id in a map: 39e35e2#diff-cb68716cfded2a2336a3778f548d7a17R41
Keep in mind that within one query this can happen many millions of times, so a seemingly insignificant difference can have an impact.

replay and others added 20 commits August 7, 2019 10:02
this allows us to minimize the number of ids that need to get filtered
by the filters, while at the same time still taking the execution cost
of each filter into account.
e.g. if we have the choice between first calling a filter which uses
regex or first calling a filter that doesn't use regex, we'd first want
to call the one that's not using regex and let it reduce the size of the
potential result set, that way the regex-using filter would later only
get applied on a smaller set of potential results.
also fix some confusing terminology in variable names
this makes HAS_TAG correctly indicate that its value matches exactly,
and it also uses the optimized lookup when doing the lookup of the
initial ID set in getInitialByTag()
this renames some methods to make it clearer what they do.
it also adds another explanatory comment.
this changes the sortByCost logic so it puts more weight on the
expression type, and only takes cardinality into account when it has to
sort two expressions of the same operator cost.
in the benchmarks this seems to lead to better results.
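The two-level ordering described in the last commit message could look like this in Go. The operator costs are made-up values for illustration (regex operators assumed to be more expensive), and expr and sortByCost here are simplified stand-ins, not the actual implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// expr is a hypothetical summary of a parsed expression for cost sorting.
type expr struct {
	op          string
	cardinality int // estimated number of ids the expression matches
}

// operatorCost holds illustrative costs only: regex operators are assumed
// to be much more expensive than plain comparisons.
var operatorCost = map[string]int{"=": 1, "!=": 1, "=~": 10, "!=~": 10}

// sortByCost orders primarily by operator cost and uses cardinality only to
// break ties between expressions with the same operator cost.
func sortByCost(exprs []expr) {
	sort.SliceStable(exprs, func(i, j int) bool {
		ci, cj := operatorCost[exprs[i].op], operatorCost[exprs[j].op]
		if ci != cj {
			return ci < cj
		}
		return exprs[i].cardinality < exprs[j].cardinality
	})
}

func main() {
	exprs := []expr{{"=~", 5}, {"=", 1000}, {"=", 10}}
	sortByCost(exprs)
	fmt.Println(exprs) // [{= 10} {= 1000} {=~ 5}]
}
```

Note that the regex expression ends up last even though its estimated cardinality is the smallest: operator cost dominates, as the commit message describes.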
replay force-pushed the move_evaluation_logic_into_tag_expressions branch from fe39ea0 to 142a893 on August 7, 2019 14:07
@@ -84,6 +82,8 @@ func ConfigSetup() {
memoryIdx.DurationVar(&findCacheBackoffTime, "find-cache-backoff-time", time.Minute, "amount of time to disable the findCache when the invalidate queue fills up.")
memoryIdx.StringVar(&indexRulesFile, "rules-file", "/etc/metrictank/index-rules.conf", "path to index-rules.conf file")
memoryIdx.StringVar(&maxPruneLockTimeStr, "max-prune-lock-time", "100ms", "Maximum duration each second a prune job can lock the index.")
memoryIdx.IntVar(&tagquery.MatchCacheSize, "match-cache-size", 1000, "size of regular expression cache in tag query evaluation")
Contributor

I don't think we should be loading variables into another package from here. Can we move the variable initialization code into tagquery? Same would apply to tagquery.MetaTagSupport.

Contributor Author

replay Aug 8, 2019

Actually, that's what I did at first, then @Dieterbe asked me to keep the settings in memory-idx (which I think makes sense). It's just unfortunate that now we have to set this setting across packages.
#1373 (comment)

Contributor

Is there something else we can do here then? Like keep them completely contained within the index and then after they are loaded initialize them into tagquery with its own variables to hold them? Although then if we ever want to change them at runtime it would be a bit trickier.

It just feels like we are creating a lot of codependency here.

Contributor Author

replay Aug 8, 2019

Unfortunately the package tagquery can't import the package memory, this would result in a cycle. We could initialize a setting in memory, and then in ConfigProcess() just copy that value into tagquery.MetaTagSupport / tagquery.MatchCacheSize. Then both of these two packages would hold a copy of the setting.

Contributor

Yeah something like that.

Contributor Author

done: aefbc3c

Co-Authored-By: Robert Milan <42070645+robert-milan@users.noreply.github.com>
replay force-pushed the move_evaluation_logic_into_tag_expressions branch from 345ffae to e565d55 on August 8, 2019 13:58

robert-milan left a comment

LGTM

replay merged commit 494c5f8 into master on Aug 9, 2019
replay deleted the move_evaluation_logic_into_tag_expressions branch on August 9, 2019 14:22