This repository has been archived by the owner on Aug 13, 2019. It is now read-only.

Optimize queries using regex matchers for set lookups #602

Merged
merged 19 commits into from
May 27, 2019

Conversation

naivewong
Contributor

@naivewong naivewong commented May 14, 2019

naivewong added 4 commits May 13, 2019 21:53
Signed-off-by: naivewong <867245430@qq.com>
Signed-off-by: naivewong <867245430@qq.com>
Signed-off-by: naivewong <867245430@qq.com>
Signed-off-by: naivewong <867245430@qq.com>
@brian-brazil
Contributor

You need to check for and exclude the wrapper we put around the regex the user specifies.

@naivewong
Contributor Author

Oh yes, I forgot to exclude the wrapper. Sorry, but what did you mean by "check for"?

@naivewong
Contributor Author

Do you mean that we exclude the wrapper only if we find it?

@brian-brazil
Contributor

If the wrapper is missing, treat it as a regex.
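
A minimal sketch of that check, assuming the wrapper is the `^(?:` prefix and `)$` suffix visible in the benchmark patterns elsewhere in this thread. The name `stripWrapper` is hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed wrapper: the user regex is anchored as ^(?:<user regex>)$.
const (
	wrapperPrefix = "^(?:"
	wrapperSuffix = ")$"
)

// stripWrapper (hypothetical name) returns the user-supplied part of pattern
// and true when the wrapper is present. When the wrapper is missing it
// returns false, and the caller should treat pattern as an ordinary regex.
func stripWrapper(pattern string) (string, bool) {
	if len(pattern) < len(wrapperPrefix)+len(wrapperSuffix) ||
		!strings.HasPrefix(pattern, wrapperPrefix) ||
		!strings.HasSuffix(pattern, wrapperSuffix) {
		return "", false
	}
	return pattern[len(wrapperPrefix) : len(pattern)-len(wrapperSuffix)], true
}

func main() {
	for _, p := range []string{"^(?:1|2|3)$", "1|2|3"} {
		inner, ok := stripWrapper(p)
		fmt.Printf("%q -> inner=%q wrapped=%v\n", p, inner, ok)
	}
}
```

The length check keeps the prefix and suffix from overlapping on very short inputs.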

Signed-off-by: naivewong <867245430@qq.com>
@brian-brazil
Contributor

That looks about right. Can you add it to the postings benchmark too?

I'd be interested to know if there's a point where it's better to use the regex rather than the set.

@gouthamve
Collaborator

> I'd be interested to know if there's a point where it's better to use the regex rather than the set.

I don't think that'll ever be the case: with a regex, we first get all the matching label values (the set) and then do what the set matcher does anyway. With the set, we avoid a step.

@brian-brazil
Contributor

At some point, our parsing of the strings might be slower than the regex. It would be good to check one way or the other.

@bboreham
Contributor

What's the motivation for only allowing escaping of special characters? What if I want to match \t or \n in a label value?

@codesome
Contributor

codesome commented May 14, 2019

Adding to @bboreham's comment, isn't it enough to just include any character that is escaped, rather than checking for a special character or '\\'?

naivewong added 3 commits May 15, 2019 11:09
Signed-off-by: naivewong <867245430@qq.com>
Signed-off-by: naivewong <867245430@qq.com>
Signed-off-by: naivewong <867245430@qq.com>
@naivewong
Contributor Author

Benchmark

BenchmarkSetMatcher/SetMatch,nSeries=15,pattern="^(?:1|2|3)$"-8                  300000   4579 ns/op   3425 B/op   59 allocs/op
BenchmarkSetMatcher/RegexMatch,nSeries=15,pattern="1|2|3"-8                      200000   6349 ns/op   2836 B/op   64 allocs/op
BenchmarkSetMatcher/SetMatch,nSeries=15,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8   200000   8249 ns/op   5443 B/op   96 allocs/op
BenchmarkSetMatcher/RegexMatch,nSeries=15,pattern="1|2|3|4|5|6|7|8|9|10"-8       200000   8844 ns/op   4208 B/op   81 allocs/op
BenchmarkSetMatcher/SetMatch,nSeries=200,pattern="^(?:1|2|3)$"-8                 300000   5011 ns/op   3425 B/op   59 allocs/op
BenchmarkSetMatcher/RegexMatch,nSeries=200,pattern="1|2|3"-8                      20000  70441 ns/op  31928 B/op  545 allocs/op
BenchmarkSetMatcher/SetMatch,nSeries=200,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8  200000   9755 ns/op   5546 B/op   96 allocs/op
BenchmarkSetMatcher/RegexMatch,nSeries=200,pattern="1|2|3|4|5|6|7|8|9|10"-8       20000  83923 ns/op  35351 B/op  644 allocs/op

Contributor

@codesome codesome left a comment

Can you also add a test case with more than one block and post the benchmark results? Your benchmark already allows this, and one such test case should be enough.

Signed-off-by: naivewong <867245430@qq.com>
@naivewong
Contributor Author

benchmark                                                                   old ns/op     new ns/op     delta
BenchmarkSetMatcher/nSeries=15,pattern="^(?:1|2|3)$"-8                      6512          4461          -31.50%
BenchmarkSetMatcher/nSeries=15,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8       10420         8136          -21.92%
BenchmarkSetMatcher/nSeries=1000,pattern="^(?:1|2|3)$"-8                    346047        234610        -32.20%
BenchmarkSetMatcher/nSeries=1000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8     429347        273327        -36.34%

benchmark                                                                   old allocs     new allocs     delta
BenchmarkSetMatcher/nSeries=15,pattern="^(?:1|2|3)$"-8                      72             59             -18.06%
BenchmarkSetMatcher/nSeries=15,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8       95             96             +1.05%
BenchmarkSetMatcher/nSeries=1000,pattern="^(?:1|2|3)$"-8                    2290           1330           -41.92%
BenchmarkSetMatcher/nSeries=1000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8     2655           1975           -25.61%

benchmark                                                                   old bytes     new bytes     delta
BenchmarkSetMatcher/nSeries=15,pattern="^(?:1|2|3)$"-8                      3657          3425          -6.34%
BenchmarkSetMatcher/nSeries=15,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8       5478          5445          -0.60%
BenchmarkSetMatcher/nSeries=1000,pattern="^(?:1|2|3)$"-8                    105927        90080         -14.96%
BenchmarkSetMatcher/nSeries=1000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8     130824        118979        -9.05%

Signed-off-by: naivewong <867245430@qq.com>
@brian-brazil
Contributor

What about 100k to 1M series? The wins are at the bigger data sizes; what you've tested is already fast.

@gouthamve
Collaborator

Also add a case where there is only one value for the label but 15 options in the set matcher. That may tell us whether the set matcher is always faster.

Signed-off-by: naivewong <867245430@qq.com>
@naivewong
Contributor Author

benchmark                                                                               old ns/op     new ns/op     delta
BenchmarkSetMatcher/nSeries=1,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8          5937          8446          +42.26%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8         12308         8428          -31.52%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,pattern="^(?:1|2|3)$"-8                        8016          4653          -41.95%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,pattern="^(?:1|2|3)$"-8                     522961        207214        -60.38%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8      651767        303631        -53.41%
BenchmarkSetMatcher/nSeries=100000,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8     27351         13267         -51.49%
BenchmarkSetMatcher/nSeries=500000,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8     42813         31066         -27.44%

benchmark                                                                               old allocs     new allocs     delta
BenchmarkSetMatcher/nSeries=1,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8          50             96             +92.00%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8         95             96             +1.05%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,pattern="^(?:1|2|3)$"-8                        72             59             -18.06%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,pattern="^(?:1|2|3)$"-8                     3290           1330           -59.57%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8      3655           1975           -45.96%
BenchmarkSetMatcher/nSeries=100000,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8     180            96             -46.67%
BenchmarkSetMatcher/nSeries=500000,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8     180            96             -46.67%

benchmark                                                                               old bytes     new bytes     delta
BenchmarkSetMatcher/nSeries=1,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8          3527          5209          +47.69%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8         5476          5436          -0.73%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,pattern="^(?:1|2|3)$"-8                        3657          3425          -6.34%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,pattern="^(?:1|2|3)$"-8                     121933        90080         -26.12%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8      146805        118994        -18.94%
BenchmarkSetMatcher/nSeries=100000,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8     6744          5353          -20.63%
BenchmarkSetMatcher/nSeries=500000,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8     6745          5352          -20.65%

@gouthamve
Collaborator

This looks good so far!

Can you also add an nBlocks=10 version of BenchmarkSetMatcher/nSeries=500000,nBlocks=1,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-8?

Signed-off-by: naivewong <867245430@qq.com>
@naivewong
Contributor Author

I'm unable to show you the results because the previous run was already at the limit of my MBP.

@krasi-georgiev
Contributor

Pinged you on IRC to give you access to a big machine where you can run these tests quickly.

naivewong added 2 commits May 16, 2019 11:40
Signed-off-by: naivewong <867245430@qq.com>
Signed-off-by: naivewong <867245430@qq.com>
@naivewong
Contributor Author

benchmark                                                                                                     old ns/op      new ns/op     delta
BenchmarkSetMatcher/nSeries=1,nBlocks=1,cardinality=100,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48               7499           11422         +52.31%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,cardinality=100,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48              15715          12522         -20.32%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,cardinality=100,pattern="^(?:1|2|3)$"-48                             9715           6789          -30.12%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,cardinality=100,pattern="^(?:1|2|3)$"-48                          731441         369395        -49.50%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,cardinality=100,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48           951066         466389        -50.96%
BenchmarkSetMatcher/nSeries=100000,nBlocks=1,cardinality=100000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48       24648679       16168         -99.93%
BenchmarkSetMatcher/nSeries=500000,nBlocks=1,cardinality=500000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48       127604669      12485         -99.99%
BenchmarkSetMatcher/nSeries=500000,nBlocks=10,cardinality=500000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48      1222647955     206051        -99.98%
BenchmarkSetMatcher/nSeries=1000000,nBlocks=1,cardinality=1000000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48     260974613      15219         -99.99%

benchmark                                                                                                     old allocs     new allocs     delta
BenchmarkSetMatcher/nSeries=1,nBlocks=1,cardinality=100,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48               50             96             +92.00%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,cardinality=100,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48              95             96             +1.05%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,cardinality=100,pattern="^(?:1|2|3)$"-48                             72             59             -18.06%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,cardinality=100,pattern="^(?:1|2|3)$"-48                          3290           1330           -59.57%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,cardinality=100,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48           3655           1975           -45.96%
BenchmarkSetMatcher/nSeries=100000,nBlocks=1,cardinality=100000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48       100080         96             -99.90%
BenchmarkSetMatcher/nSeries=500000,nBlocks=1,cardinality=500000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48       500080         96             -99.98%
BenchmarkSetMatcher/nSeries=500000,nBlocks=10,cardinality=500000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48      5000765        919            -99.98%
BenchmarkSetMatcher/nSeries=1000000,nBlocks=1,cardinality=1000000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48     1000080        96             -99.99%

benchmark                                                                                                     old bytes     new bytes     delta
BenchmarkSetMatcher/nSeries=1,nBlocks=1,cardinality=100,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48               3542          5227          +47.57%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,cardinality=100,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48              5486          5438          -0.87%
BenchmarkSetMatcher/nSeries=15,nBlocks=1,cardinality=100,pattern="^(?:1|2|3)$"-48                             3663          3430          -6.36%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,cardinality=100,pattern="^(?:1|2|3)$"-48                          122103        90211         -26.12%
BenchmarkSetMatcher/nSeries=1000,nBlocks=20,cardinality=100,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48           146976        119136        -18.94%
BenchmarkSetMatcher/nSeries=100000,nBlocks=1,cardinality=100000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48       1605359       5353          -99.67%
BenchmarkSetMatcher/nSeries=500000,nBlocks=1,cardinality=500000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48       8006583       5352          -99.93%
BenchmarkSetMatcher/nSeries=500000,nBlocks=10,cardinality=500000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48      80087445      54573         -99.93%
BenchmarkSetMatcher/nSeries=1000000,nBlocks=1,cardinality=1000000,pattern="^(?:1|2|3|4|5|6|7|8|9|10)$"-48     16007303      5352          -99.97%

@brian-brazil
Contributor

Those results look more like it.

@codesome
Contributor

Gains at higher cardinality are impressive!

Signed-off-by: naivewong <867245430@qq.com>
Signed-off-by: naivewong <867245430@qq.com>
@brian-brazil
Contributor

I just had a thought: we should have some unit tests covering the case where one of the values in the regex is the empty string.

Signed-off-by: naivewong <867245430@qq.com>
for i := 4; i < len(pattern)-2; i++ {
	if escaped {
		switch {
		case isRegexMetaCharacter(pattern[i]):
Contributor

Why is this check necessary?
Why not just allow every escaped character? (Or perhaps disallow \0... since that’s complicated).

Contributor Author

@brian-brazil Should I include the cases above?

Contributor

Sure, though check what Go does for escaping characters that don't need to be escaped.

Contributor

I'm sorry, I was thinking about the wrong language.
The set of escapes we care about here is documented at https://github.com/google/re2/wiki/Syntax and is painfully complicated.
There are lots of escapes we don't want to accept, like \g, \p, \x, so you do need something like what you have.

Contributor Author

I have rechecked: the special characters you meant, such as \n and \a, are already handled in the else {} branch of findSetMatches. What I detect here are escaped regexp metacharacters such as \\. and \\+; that is, after I find \\, I check whether the next character is special.
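
The escape handling discussed in this thread can be sketched roughly as follows. This is a simplified reconstruction from the snippet and comments above, not the merged code: `parseSet` and `isMeta` are hypothetical names, and the metacharacter list is an assumption (it is the set of bytes that Go's `regexp.QuoteMeta` escapes).

```go
package main

import (
	"fmt"
	"strings"
)

// isMeta reports whether b is a regexp metacharacter (assumed list).
func isMeta(b byte) bool {
	return strings.IndexByte(`\.+*?()|[]{}^$`, b) >= 0
}

// parseSet (hypothetical) turns a wrapped pattern such as `^(?:a|b\.c)$`
// into its literal alternatives. It returns nil when the pattern is not a
// plain set, so the caller can fall back to regex matching.
func parseSet(pattern string) []string {
	if len(pattern) < 6 || !strings.HasPrefix(pattern, "^(?:") || !strings.HasSuffix(pattern, ")$") {
		return nil // wrapper missing: treat as a regex
	}
	var cur strings.Builder
	var out []string
	escaped := false
	for i := 4; i < len(pattern)-2; i++ {
		c := pattern[i]
		switch {
		case escaped:
			// Only escaped metacharacters (including the backslash itself)
			// are literals. Escapes such as \d, \p or \x keep their regex
			// meaning, so give up on the set optimization.
			if !isMeta(c) {
				return nil
			}
			cur.WriteByte(c)
			escaped = false
		case c == '\\':
			escaped = true
		case c == '|':
			out = append(out, cur.String())
			cur.Reset()
		case isMeta(c):
			return nil // unescaped metacharacter: a real regex
		default:
			// Ordinary bytes, including a literal newline, are kept as-is.
			cur.WriteByte(c)
		}
	}
	if escaped {
		return nil // dangling backslash
	}
	return append(out, cur.String())
}

func main() {
	for _, p := range []string{`^(?:1|2|3)$`, `^(?:a\.b|c)$`, `^(?:a.*)$`, `^(?:\d+)$`, `1|2|3`} {
		fmt.Printf("%-16s -> %q\n", p, parseSet(p))
	}
}
```

Returning nil for anything ambiguous keeps the fallback conservative: a pattern is only treated as a set when every byte is either a literal, an escaped metacharacter, or a `|` separator.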

naivewong added 2 commits May 17, 2019 22:25
Signed-off-by: naivewong <867245430@qq.com>
Signed-off-by: naivewong <867245430@qq.com>
Contributor

@codesome codesome left a comment

LGTM

@brian-brazil
Contributor

👍

Collaborator

@gouthamve gouthamve left a comment

Looks good! Good work!

Development

Successfully merging this pull request may close these issues.

Optimize queries using regex matchers for set lookups
7 participants