-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EQL: Fix cidrMatch function fails to match when used in scripts #56246
Conversation
Pinging @elastic/es-ql (:Query Languages/EQL) |
...l/src/main/java/org/elasticsearch/xpack/eql/expression/function/scalar/string/CIDRMatch.java
Outdated
Show resolved
Hide resolved
Object o = field.fold(); | ||
ArrayList<Object> arr = new ArrayList<>(addresses.size()); | ||
for (Expression address : addresses) { | ||
final Equals eq = new Equals(source(), field, address); | ||
func = (func == null) ? eq : new Or(source(), func, eq); | ||
arr.add(address.fold()); | ||
} | ||
return doProcess(o, arr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Object o = field.fold(); | |
ArrayList<Object> arr = new ArrayList<>(addresses.size()); | |
for (Expression address : addresses) { | |
final Equals eq = new Equals(source(), field, address); | |
func = (func == null) ? eq : new Or(source(), func, eq); | |
arr.add(address.fold()); | |
} | |
return doProcess(o, arr); | |
return doProcess(field.fold(), Expressions.fold(addresses)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
public ScriptTemplate asScript() { | ||
ScriptTemplate leftScript = asScript(field); | ||
|
||
List<Object> values = new ArrayList<>(new LinkedHashSet<>(foldListOfValues(addresses, field.dataType()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
List<Object> values = new ArrayList<>(new LinkedHashSet<>(foldListOfValues(addresses, field.dataType()))); | |
List<Object> values = Expressions.fold(addresses) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will not eliminate duplicated values as far as I see
@@ -156,22 +156,29 @@ InternalEqlScriptUtils.concat([InternalQlScriptUtils.docValue(doc,params.v0),par | |||
cidrMatchFunctionOne | |||
process where cidrMatch(source_address, "10.0.0.0/8") | |||
; | |||
"term":{"source_address":{"value":"10.0.0.0/8" | |||
"script":{"source":"InternalQlScriptUtils.nullSafeFilter(InternalEqlScriptUtils.cidrMatch( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add an optimizer rule to handle this so that we don't lose the more performant term
query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will probably have to do another pass for that after @astefan merges his PR that also will affect the query planning optimization for simple equals. Can adjust this PR if that change merges sooner, sometimes this week.
...l/src/main/java/org/elasticsearch/xpack/eql/expression/function/scalar/string/CIDRMatch.java
Outdated
Show resolved
Hide resolved
...l/src/main/java/org/elasticsearch/xpack/eql/expression/function/scalar/string/CIDRUtils.java
Show resolved
Hide resolved
Updated PR to use term query for cidrMatch when possible otherwise fallback to script. |
if (cm.field() instanceof FieldAttribute && Expressions.foldable(cm.addresses())) { | ||
String targetFieldName = handler.nameOf(((FieldAttribute) cm.field()).exactAttribute()); | ||
|
||
Set<Object> set = new LinkedHashSet<>(CollectionUtils.mapSize(cm.addresses().size())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this worthwhile to add as a method? It's also called within CidrMatch.asScript
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not exactly, needed Set here for TermsQuery constructor and List for the scripting func
new ExpressionTranslators.StringQueries(), | ||
new ExpressionTranslators.Matches(), | ||
new ExpressionTranslators.MultiMatches(), | ||
new Scalars() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
preferred without static importing everything here, more visible QL translators vs EQL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A relevant aspect here: you might want to update your branch from master, to incorporate the StartsWith
changes from #56404. In that PR, the Scalars
from QL was updated to have startsWith
use a PrefixQuery whenever possible. And for EQL and its QueryTranslator that you added, it will probably need the Scalars
from QL, as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, and adjusted Scalar translation in EQ to be able to reuse it for startsWith function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in general, but I have some comments as well.
|
||
public static boolean isInRange(String address, String... cidrAddresses) { | ||
// Check if address is parsable first | ||
byte[] addr = InetAddresses.forString(address).getAddress(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this code is in EQL, maybe we should handle the IllegalArgumentException here and wrap it in EqlIllegalArgumentException?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
new ExpressionTranslators.StringQueries(), | ||
new ExpressionTranslators.Matches(), | ||
new ExpressionTranslators.MultiMatches(), | ||
new Scalars() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A relevant aspect here: you might want to update your branch from master, to incorporate the StartsWith
changes from #56404. In that PR, the Scalars
from QL was updated to have startsWith
use a PrefixQuery whenever possible. And for EQL and its QueryTranslator that you added, it will probably need the Scalars
from QL, as well.
import org.elasticsearch.xpack.ql.type.DataType; | ||
import org.elasticsearch.xpack.ql.type.DataTypeConverter; | ||
|
||
public class EqlTranslatorHandler implements TranslatorHandler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is the same as QlTranslatorHandler
except asQuery
method. Maybe you could reuse most of the code from QlTranslatorHandler
and only change the relevant bit in EqlTranslatorHandler
. For example, why not extending QlTranslatorHandler
and override asQuery
only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
process where cidrMatch(source_address, "10.0.0.0/8") == true | ||
; | ||
{"bool":{"must":[{"term":{"event.category":{"value":"process" | ||
{"terms":{"source_address":["10.0.0.0/8"] | ||
; | ||
|
||
cidrMatchFunctionTwo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, add one more test where there are two cidrMatch
function calls in the same query: where cidrMatch(whatever) or cidrMatch(another_whatever)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
process where cidrMatch(source_address, "10.0.0.0/8") or cidrMatch(source_address, "192.168.0.0/16") | ||
; | ||
{"bool":{"must":[{"term":{"event.category":{"value":"process" | ||
{"terms":{"source_address":["10.0.0.0/8"],"boost":1.0}},{"terms":{"source_address":["192.168.0.0/16"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it is relevant to have both terms
queries in there and the must
for event.category
, but there has to be a bool
should
as well.
The query should translate as a bool
with two must
statements:
- one is
term
onevent.category
- the other is a
bool
with twoshould
statements whereterms
on thesource_address
is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Replaced the previous implementation with script function.
Addresses #55709
@astefan I think this will conflict with your changes on query folding with equal operator optimization, it generates a different script with function now.