-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scripting: enable regular expressions by default #63029
Scripting: enable regular expressions by default #63029
Conversation
* Setting `script.painless.regex.enabled` has a new option, `use-factor`, the default. This defaults to using regular expressions but limiting the complexity of the regular expressions. In addition to `use-factor`, the setting can be `true`, as before, which enables regular expressions without limiting them. `false` totally disables regular expressions, which was the old default. * New setting `script.painless.regex.limit-factor`. This limits regular expression complexity by limiting the number characters a regular expression can consider based on input length. The default is `6`, so a regular expression can consider `6` * input length number of characters. With input `foobarbaz` (length `9`), for example, the regular expression can consider `54` (`6 * 9`) characters. This reduces the impact of exponential backtracking in Java's regular expression engine. * add `@inject_constant` annotation to whitelist. This annotation signals that a compiler settings will be injected at the beginning of a whitelisted method. The format is `argnum=settingname`: `1=foo_setting 2=bar_setting`. Argument numbers must start at one and must be sequential. * Augment `Pattern.split(CharSequence)` `Pattern.split(CharSequence, int)`, `Pattern.splitAsStream(CharSequence)` `Pattern.matcher(CharSequence)` to take the value of `script.painless.regex.limit-factor` as a an injected parameter, limiting as explained above when this setting is in use. Fixes: elastic#49873
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just have a few high level comments mostly. First is on the naming. "use-factor" is pretty cryptic. Can we use something like "limited"? This flows nicely to me. If I ask the question "are regexes enabled?" the answer true/false/limited. Second, it seems there are quite a few TODOs leftover that can probably be removed? If they are to be left in then please expand on them and/or create an issue so someone else on the team could understand what needs to be done.
@@ -58,6 +58,7 @@ class java.util.regex.Matcher { | |||
String replaceFirst(String) | |||
boolean requireEnd() | |||
Matcher reset() | |||
# Whitelisting Matcher.reset(String) works around the regex limiting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we don't actually whitelist this method, I think this is just a comment to note if we did allow that method it would allow escaping the regex limiting? Could you clarify the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated comment.
@@ -66,6 +66,8 @@ public ScriptScope(PainlessLookup painlessLookup, CompilerSettings compilerSetti | |||
staticConstants.put("$SOURCE", scriptSource); | |||
staticConstants.put("$DEFINITION", painlessLookup); | |||
staticConstants.put("$FUNCTIONS", functionTable); | |||
// TODO(stu): inject compiler settings here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftover todo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, these and all the left over todos were why this was initially a draft. They are gone now.
@@ -211,7 +211,7 @@ protected void injectStaticFieldsAndGetters() { | |||
irLoadFieldMemberNode.setLocation(internalLocation); | |||
irLoadFieldMemberNode.setExpressionType(String.class); | |||
irLoadFieldMemberNode.setName("$NAME"); | |||
irLoadFieldMemberNode.setStatic(true); | |||
irLoadFieldMemberNode.setStatic(true); // TODO(stu): add $COMPILER_INJECTS, add hash map and set it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this todo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
if (arguments.containsKey(argNum) == false) { | ||
throw new IllegalArgumentException("[@inject_constant] missing argument number [" + argNum + "]"); | ||
} | ||
// TODO(stu): Jack, how do I verify against CompilerSettings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Todos like this are very cryptic. Can we just have a normal comment if some explanation is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed when draft.
import java.util.Collections; | ||
import java.util.List; | ||
|
||
public class InjectConstantAnnotation { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have some basic java docs on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Pinging @elastic/es-core-infra (:Core/Infra/Scripting) |
@stu-elastic One note is we need to ensure we support the operators ==~ and =~ in the BinaryMathNode.write method. Apologies as I dropped the ball on this one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* Setting `script.painless.regex.enabled` has a new option, `use-factor`, the default. This defaults to using regular expressions but limiting the complexity of the regular expressions. In addition to `use-factor`, the setting can be `true`, as before, which enables regular expressions without limiting them. `false` totally disables regular expressions, which was the old default. * New setting `script.painless.regex.limit-factor`. This limits regular expression complexity by limiting the number characters a regular expression can consider based on input length. The default is `6`, so a regular expression can consider `6` * input length number of characters. With input `foobarbaz` (length `9`), for example, the regular expression can consider `54` (`6 * 9`) characters. This reduces the impact of exponential backtracking in Java's regular expression engine. * add `@inject_constant` annotation to whitelist. This annotation signals that a compiler settings will be injected at the beginning of a whitelisted method. The format is `argnum=settingname`: `1=foo_setting 2=bar_setting`. Argument numbers must start at one and must be sequential. * Augment `Pattern.split(CharSequence)` `Pattern.split(CharSequence, int)`, `Pattern.splitAsStream(CharSequence)` `Pattern.matcher(CharSequence)` to take the value of `script.painless.regex.limit-factor` as a an injected parameter, limiting as explained above when this setting is in use. Fixes: elastic#49873
* Setting `script.painless.regex.enabled` has a new option, `use-factor`, the default. This defaults to using regular expressions but limiting the complexity of the regular expressions. In addition to `use-factor`, the setting can be `true`, as before, which enables regular expressions without limiting them. `false` totally disables regular expressions, which was the old default. * New setting `script.painless.regex.limit-factor`. This limits regular expression complexity by limiting the number characters a regular expression can consider based on input length. The default is `6`, so a regular expression can consider `6` * input length number of characters. With input `foobarbaz` (length `9`), for example, the regular expression can consider `54` (`6 * 9`) characters. This reduces the impact of exponential backtracking in Java's regular expression engine. * add `@inject_constant` annotation to whitelist. This annotation signals that a compiler settings will be injected at the beginning of a whitelisted method. The format is `argnum=settingname`: `1=foo_setting 2=bar_setting`. Argument numbers must start at one and must be sequential. * Augment `Pattern.split(CharSequence)` `Pattern.split(CharSequence, int)`, `Pattern.splitAsStream(CharSequence)` `Pattern.matcher(CharSequence)` to take the value of `script.painless.regex.limit-factor` as a an injected parameter, limiting as explained above when this setting is in use. Fixes: #49873 Backport of: 93f29a4
Now that we've got regexes enabled by default (elastic#63029) this adds a test to runtime fields just to make sure that it works with regexes. It does, but this adds a test to make sure it continues to work.
Now that we've got regexes enabled by default (#63029) this adds a test to runtime fields just to make sure that it works with regexes. It does, but this adds a test to make sure it continues to work.
Now that we've got regexes enabled by default (elastic#63029) this adds a test to runtime fields just to make sure that it works with regexes. It does, but this adds a test to make sure it continues to work.
Now that we've got regexes enabled by default (elastic#63029) this adds a test to runtime fields just to make sure that it works with regexes. It does, but this adds a test to make sure it continues to work.
Documents the `script.painless.regex.enabled` and `script.painless.regex.limit-factor` cluster settings. Relates to elastic#63029. Closes elastic#75199.
Documents the `script.painless.regex.enabled` and `script.painless.regex.limit-factor` cluster settings. Relates to elastic#63029. Closes elastic#75199.
Documents the `script.painless.regex.enabled` and `script.painless.regex.limit-factor` cluster settings. Relates to elastic#63029. Closes elastic#75199.
Documents the `script.painless.regex.enabled` and `script.painless.regex.limit-factor` cluster settings. Relates to elastic#63029. Closes elastic#75199.
Documents the `script.painless.regex.enabled` and `script.painless.regex.limit-factor` cluster settings. Relates to elastic#63029. Closes elastic#75199.
Setting
script.painless.regex.enabled
has a new option,limited
, the default. This defaults to using regularexpressions but limiting the complexity of the regular
expressions.
In addition to
limited
, the setting can betrue
, asbefore, which enables regular expressions without limiting them.
false
totally disables regular expressions, which was theold default.
New setting
script.painless.regex.limit-factor
. This limitsregular expression complexity by limiting the number characters
a regular expression can consider based on input length.
The default is
6
, so a regular expression can consider6
* input length number of characters. With inputfoobarbaz
(length9
), for example, the regular expressioncan consider
54
(6 * 9
) characters.This reduces the impact of exponential backtracking in Java's
regular expression engine.
add
@inject_constant
annotation to whitelist.This annotation signals that a compiler settings will
be injected at the beginning of a whitelisted method.
The format is
argnum=settingname
:1=foo_setting 2=bar_setting
.Argument numbers must start at one and must be sequential.
Augment
Pattern.split(CharSequence)
Pattern.split(CharSequence, int)
,Pattern.splitAsStream(CharSequence)
Pattern.matcher(CharSequence)
to take the value of
script.painless.regex.limit-factor
as aan injected parameter, limiting as explained above when this
setting is in use.
Fixes: #49873