chore(storage/bloom): support simplifiable regexp matchers #14622

Merged
rfratto merged 9 commits into main from bloom-support-simplifiable-regexp on Nov 4, 2024

Conversation

rfratto
Member

@rfratto rfratto commented Oct 25, 2024

What this PR does / why we need it:
This adds support for basic regexps which can be simplified into a sequence of OR matchers, such as:

  • key=~"value" becomes key="value"
  • key=~"value1|value2" becomes key="value1" or key="value2".
  • key=~".+" checks for the presence of key.

key=~".+" is currently the only way to check if a key exists.

Only the cases above are "officially" supported. However, we technically support basic concatenations and character classes due to how regexp/syntax parses and simplifies expressions such as value1|value2 into value[12].

To prevent unbounded cardinality, we limit regexp expansion to 25 matchers; otherwise a regexp like value[0-9][0-9][0-9][0-9] would expand into 10,000 matchers (too many!).
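
A minimal sketch of bounded expansion, assuming the pattern must match the whole value (as the rewrites above imply). It is not the implementation in pkg/storage/bloom/v1/ast_extractor.go: `expand`, `literalValues`, `maxExpansions`, and `errNotSimplifiable` are hypothetical names, and the `key=~".+"` presence check is deliberately left out, so `.+` is simply rejected here.

```go
package main

import (
	"errors"
	"fmt"
	"regexp/syntax"
)

// maxExpansions mirrors the limit of 25 matchers described above.
const maxExpansions = 25

// errNotSimplifiable is a hypothetical error used only by this sketch.
var errNotSimplifiable = errors.New("regexp does not expand into a small set of literals")

// expand lists every string matched by re. It fails if re contains an
// unsupported operation or would produce more than maxExpansions strings.
func expand(re *syntax.Regexp) ([]string, error) {
	if re.Flags&syntax.FoldCase != 0 {
		return nil, errNotSimplifiable // case-insensitive matching is out of scope
	}
	switch re.Op {
	case syntax.OpEmptyMatch:
		return []string{""}, nil
	case syntax.OpLiteral:
		return []string{string(re.Rune)}, nil
	case syntax.OpCharClass:
		// re.Rune holds inclusive [lo, hi] pairs of runes.
		var out []string
		for i := 0; i+1 < len(re.Rune); i += 2 {
			for r := re.Rune[i]; r <= re.Rune[i+1]; r++ {
				if len(out) == maxExpansions {
					return nil, errNotSimplifiable
				}
				out = append(out, string(r))
			}
		}
		return out, nil
	case syntax.OpAlternate:
		// The result is the union of all branches.
		var out []string
		for _, sub := range re.Sub {
			vals, err := expand(sub)
			if err != nil {
				return nil, err
			}
			if len(out)+len(vals) > maxExpansions {
				return nil, errNotSimplifiable
			}
			out = append(out, vals...)
		}
		return out, nil
	case syntax.OpConcat:
		// The result is the cartesian product of all parts.
		out := []string{""}
		for _, sub := range re.Sub {
			vals, err := expand(sub)
			if err != nil {
				return nil, err
			}
			if len(out)*len(vals) > maxExpansions {
				return nil, errNotSimplifiable
			}
			next := make([]string, 0, len(out)*len(vals))
			for _, prefix := range out {
				for _, v := range vals {
					next = append(next, prefix+v)
				}
			}
			out = next
		}
		return out, nil
	default:
		return nil, errNotSimplifiable
	}
}

// literalValues parses pattern and expands it into literal values.
func literalValues(pattern string) ([]string, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return nil, err
	}
	return expand(re.Simplify())
}

func main() {
	for _, p := range []string{"value", "value1|value2", "value[0-9][0-9][0-9][0-9]"} {
		vals, err := literalValues(p)
		fmt.Println(p, vals, err)
	}
}
```

Checking the bound before each product or union is built means a pattern like `value[0-9][0-9][0-9][0-9]` is rejected at the second character class, well before 10,000 strings are allocated.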

Which issue(s) this PR fixes:
Closes grafana/loki-private#1106.

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

This adds support for basic regexps which can be simplified into a
sequence of OR matchers, such as:

* `key=~"value"` becomes `key="value"`
* `key=~"value1|value2"` becomes `key="value1" or key="value2"`.

Matchers like `key=~".+"` continue to not be supported because the lack
of a key doesn't mean that it doesn't exist as a label.

Only the cases above are "officially" supported. However, we technically
support basic concatenations and character classes due to how
regexp/syntax parses and simplifies expressions such as `value1|value2`
into `value[12]`.

To prevent unbounded cardinality, we limit regexp expansion to 25
matchers; otherwise a regexp like `value[0-9][0-9][0-9][0-9]` would
expand into 10,000 matchers (too many!).
@rfratto rfratto requested a review from a team as a code owner October 25, 2024 16:36
@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Oct 25, 2024
Contributor

@JStickler JStickler left a comment

[docs team] Doc part of this PR LGTM (one typo)

docs/sources/query/query_accceleration.md (outdated, resolved)
Contributor

@salvacorts salvacorts left a comment

This is great. Left two nits but LGTM 👏

docs/sources/query/query_accceleration.md (resolved)
pkg/storage/bloom/v1/ast_extractor.go (outdated, resolved)
Contributor

@chaudum chaudum left a comment

lgtm!

docs/sources/query/query_accceleration.md (resolved)
pkg/storage/bloom/v1/ast_extractor.go (outdated, resolved)
@rfratto rfratto force-pushed the bloom-support-simplifiable-regexp branch from 2fb1a06 to f7df523 Compare November 4, 2024 15:04
@rfratto rfratto requested a review from salvacorts November 4, 2024 15:10
Member Author

rfratto commented Nov 4, 2024

@salvacorts Going to merge since you gave a soft LGTM in your last review, but please let me know if there are any other changes you'd like me to make :)

@rfratto rfratto merged commit 8eca826 into grafana:main Nov 4, 2024
60 checks passed
@rfratto rfratto deleted the bloom-support-simplifiable-regexp branch November 4, 2024 20:09
chaudum pushed a commit that referenced this pull request Nov 6, 2024
This adds support for basic regexps which can be simplified into a sequence of
OR matchers, such as:

* `key=~"value"` becomes `key="value"`
* `key=~"value1|value2"` becomes `key="value1" or key="value2"`.
* `key=~".+"` checks for the presence of `key`. This is currently the only way to
   check if a key exists.

Only the cases above are "officially" supported. However, we technically
support basic concatenations and character classes due to how regexp/syntax
parses and simplifies expressions such as `value1|value2` into `value[12]`.

To prevent unbounded cardinality, we limit regexp expansion to 25 matchers;
otherwise a regexp like `value[0-9][0-9][0-9][0-9]` would expand into 10,000
matchers (too many!).

Closes grafana/loki-private#1106.

Co-authored-by: J Stickler <julie.stickler@grafana.com>
Labels
size/L type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories
4 participants