-
Notifications
You must be signed in to change notification settings - Fork 1
ES Agg pushdown v2 #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Since ES lacks analogous operation primitive we need to handle it differently depending when - only aggregations present we must specify track_total_hits to true, so as to get an accurate count above the default 10000 - when grouping clauses are present, the doc_count field suffices, which is present by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one was simple enough, good work covering it by tests in the main repo! Moving on to the Multicorn part.
pg_es_fdw/_es_query.py
Outdated
@@ -43,7 +44,7 @@ def _base_qual_to_es(col, op, value, column_map=None): | |||
return {"bool": {"must_not": {"term": {col: value}}}} | |||
|
|||
if op == "~~": | |||
return {"match": {col: value.replace("%", "*")}} | |||
return {"wildcard": {col: value.replace("%", "*")}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this would handle underscores (single-character match) and escapes (e.g. \%
matches %
), so something like:
%
(when not preceded by a single\
) becomes*
_
(when not preceded by a single\
) becomes?
\%
becomes%
(since it has no special meaning in ES wildcards)\_
becomes_
(same)
It does become complex though. I'd be happy with us returning false positives (since PG will recheck and filter them out) but I don't know how to process the input LIKE pattern to get to a pattern that definitely over-matches instead of possibly under-matching (e.g. right now the x LIKE 'a_ple'
won't match apple
AFAIU)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I think this should do it, let me add a couple of tests in the engine repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I think of it, I need to handle case like \\%
(becomes \*
), and \\\%
(becomes \%
?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
\\%
in PG patterns means "backslash + any characters", so it'd become\\*
\\\%
means "backslash +%
", so it'd become\\%
(since%
has no special meaning)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, this should now be covered as well. Here are the corresponding tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also test / check that this escapes special ES pattern characters, e.g.:
*
becomes\*
?
becomes\?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Building on the changes from splitgraph/Multicorn#2, extend the ES FDW so that we can push down
WHERE
) clausesCOUNT(*)
, which does not have a simple equivalent that is suitable for all query translationsCU-1z461e4