Skip to content

ES Agg pushdown v2 #2

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jan 21, 2022
Merged

ES Agg pushdown v2 #2

merged 8 commits into from
Jan 21, 2022

Conversation

gruuya
Copy link

@gruuya gruuya commented Jan 17, 2022

Building on the changes from splitgraph/Multicorn#2, extend the ES FDW so that we can push down

  • doc filtering stemming from SQL qual (WHERE) clauses
  • COUNT(*), which does not have a simple equivalent that is suitable for all query translations

CU-1z461e4

Since ES lacks analogous operation primitive we need to handle it differently depending when
- only aggregations present we must specify track_total_hits to true, so as to get an accurate
count above the default 10000
- when grouping clauses are present, the doc_count field suffices, which is present by default.
@gruuya gruuya requested a review from mildbyte January 17, 2022 12:50
@gruuya gruuya self-assigned this Jan 17, 2022
Copy link

@mildbyte mildbyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one was simple enough, good work covering it by tests in the main repo! Moving on to the Multicorn part.

@@ -43,7 +44,7 @@ def _base_qual_to_es(col, op, value, column_map=None):
return {"bool": {"must_not": {"term": {col: value}}}}

if op == "~~":
return {"match": {col: value.replace("%", "*")}}
return {"wildcard": {col: value.replace("%", "*")}}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this would handle underscores (single-character match) and escapes (e.g. \% matches %), so something like:

  • % (when not preceded by a single \) becomes *
  • _ (when not preceded by a single \) becomes ?
  • \% becomes % (since it has no special meaning in ES wildcards)
  • \_ becomes _ (same)

It does become complex though. I'd be happy with us returning false positives (since PG will recheck and filter them out) but I don't know how to process the input LIKE pattern to get to a pattern that definitely over-matches instead of possibly under-matching (e.g. right now the x LIKE 'a_ple' won't match apple AFAIU)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I think this should do it, let me add a couple of tests in the engine repo.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think of it, I need to handle case like \\%(becomes \*), and \\\% (becomes \%?).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • \\% in PG patterns means "backslash + any characters", so it'd become \\*
  • \\\% means "backslash + %", so it'd become \\% (since % has no special meaning)

Copy link
Author

@gruuya gruuya Jan 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, this should now be covered as well. Here are the corresponding tests.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also test / check that this escapes special ES pattern characters, e.g.:

  • * becomes \*
  • ? becomes \?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, added a fix and a test for these cases too.

@gruuya gruuya merged commit 7f9760d into master Jan 21, 2022
@gruuya gruuya deleted the es-agg-pushdown-v2-cu-1z461e4 branch January 21, 2022 11:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants