-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle missing and multiple values in script #29611
Handle missing and multiple values in script #29611
Conversation
fda0be2
to
cec0419
Compare
Previously in script for numeric fields, there was no way to check if a document is missing a value. Also certain operations on multiple- values fields were missing. This PR adds the following: - return null for doc['field'] if a document is missing a 'field1': Now we can do this: if (doc['field'] == null) {return -1;} return doc['field'].value; or doc['field']?.value ?: -1 - add the following functions for multiple-valued numeric fields: doc['field'].min returns the minumum amoung values doc['field'].max returns the maximum amoung values doc['field'].sum returns the sum of amoung values doc['field'].avg returns the average of values Closes elastic#29286
cec0419
to
5aeab57
Compare
Pinging @elastic/es-core-infra |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Mayya, thanks for tackling this. The changes seem good. My one concern is this is a backwards incompatible change. Users currently may not be checking the fields for existence and will suddenly get NPE's if they are using the default values for docs with missing fields. One approach to solve this problem would be to have a setting that allows users to turn this missing field return null on in 6.x. If the flag is not set, we give a deprecation warning. Then remove the flag altogether in 7.
@jdconrad Thanks Jack for your review. Your concern makes sense. Do you think we should add an index-level setting for 6.x:
The default will be @rjernst what do you think? |
I think doing a deprecation either through a setting or a sysprop is a good idea. Deprecating If doing through a sysprop, we have the advantage of the code change being very simple. We would just need one place somewhere during node startup that checked what value we statically read, and emitting a deprecation log message if it is set to the old behavior. For a setting, I think we would want this as a cluster setting, since scripts can be run across many indices, and it would not make sense (or be possible, since scripts are compiled once for the node) to vary this per index/shard. However, passing the cluster setting down and updating may be tricky. At least, it will require a few levels of signature changes to IndicesService, QueryShardContext and SearchLookup classes. I think we would want to get a BooleanSupplier down into LeafDocLookup, for which the return value is tied to this cluster setting. This is tricky because we don't currently have ClusterSettings (which is needed to hook up a settings listener) in IndicesService. As far as when we should emit the deprecation warning, I think in either case (sysprop or cluster setting) it should be during startup. |
I agree on the one time deprecation warning on start up. Otherwise the logs get filled with a huge amount of spam whenever a script is used. |
Previously in script for numeric fields, there was no way to check if a document is missing a value. Also certain operations on multiple- values fields were missing. This PR adds the following: - add the following functions for multiple-valued numeric fields: doc['field'].min returns the minumum amoung values doc['field'].max returns the maximum amoung values doc['field'].sum returns the sum of amoung values doc['field'].avg returns the average of values - return null for doc['field'] if a document is missing a 'field1': Now we can do this: if (doc['field'] == null) {return -1;} return doc['field'].value; or doc['field']?.value ?: -1 This new behaviour will only work if the following system property is set: `export ES_JAVA_OPTS="-Des.script.null_for_missing_value=true"' Closes elastic#29286
…ple-values-in-script
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks again for doing this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good. I have some requests.
docs/CHANGELOG.asciidoc
Outdated
@@ -19,6 +19,7 @@ | |||
=== Breaking Java Changes | |||
|
|||
=== Deprecations | |||
Returning 0 for missing numeric fields in script is deprecated. PR: 29611 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove this, since the changelog is being taken another direction.
about this forthcoming change. | ||
|
||
|
||
===== Multiple values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that the getting started guide is the place for either of these things to be documented. Maybe @debadair has a better suggestion for where they could fit in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the "examples" section is really the "getting started" guide, and the other pieces should probably be split out. Not sure we want to do that as part of this change, though. For now, I'd rename this Multi-value field operations , put it after the regular expressions section, and bump it up to a level 3 heading so it's a separate page.
will be changed in the next major version of elasticsearch: | ||
if a document is missing a field `field`, `doc['field']` for this document | ||
will return `null`. You can set a system property | ||
`export ES_JAVA_OPTS="-Des.script.null_for_missing_value=true"' on a node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should link to setting jvm.options, since that is OS agnostic.
[float] | ||
===== Missing values | ||
|
||
Currently by default, if a document is missing a numeric field `field`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would describe this a bit differently. I suggest using 2 different paragraphs. The first should describe the current behavior (but it is not necessary to say "currently by default"). This should also not state "returns 0" as it is field type dependent what is returned (and all field types behavior should be described).
The next paragraph can then describe the new option. The last sentence can mention the default behavior will change in a future release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to keep the missing values info in the Accessing doc values section. I'd probably pare it down to something like:
If you request the value from a numeric field that isn’t in the document, Painless returns zero.
IMPORTANT: Starting in 7.0, Painless returns a value of null
instead of zero if the field does not exist. To enable this behavior now, set the following system property:
+
export ES_JAVA_OPTS="-Des.script.null_for_missing_value=true”
+
If you do not enable this behavior, a deprecation warning is logged on start up.
modules/lang-painless/build.gradle
Outdated
@@ -25,6 +25,7 @@ esplugin { | |||
} | |||
|
|||
integTestCluster { | |||
systemProperty 'es.script.null_for_missing_value', 'true' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have another integ test where this value is left out, to ensure the old behavior still works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rjernst Thanks for the feedback. What is a proper way to create another integ test?
I have got the following code, but I am not sure how to add module mapper-extras
to additionalClusterTestCluster
as well.
integTestCluster {
module project.project(':modules:mapper-extras')
}
Task additionalClusterTest = tasks.create(
name: "additionalClusterTest", type: RestIntegTestTask)
additionalClusterTestCluster {
systemProperty 'es.script.null_for_missing_value', 'true'
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure how to add module mapper-extras to additionalClusterTestCluster
There is not need to add it, since rest integ tests run against the entire distribution (ie it will have all modules).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, it would be better to do it. So, use this for the cluster config:
systemProperty 'es.script.null_for_missing_value', 'true'
distribution = 'integ-test-zip'
module project // add the lang-painless module itself
module project.project(':modules:mapper-extras')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rjernst thanks a lot for the suggestion
@@ -718,6 +718,10 @@ public void onTimeout(TimeValue timeout) { | |||
writePortsFile("transport", transport.boundAddress()); | |||
} | |||
|
|||
if (!ScriptModule.NULL_FOR_MISSING_VALUE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
== false
please
@@ -718,6 +718,10 @@ public void onTimeout(TimeValue timeout) { | |||
writePortsFile("transport", transport.boundAddress()); | |||
} | |||
|
|||
if (!ScriptModule.NULL_FOR_MISSING_VALUE) | |||
logger.warn("Script: returning 0 for missing numeric values is deprecated. " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be using a DeprecationLogger, so the message goes to the deprecation log.
@@ -91,7 +92,8 @@ public void setDocument(int docId) { | |||
localCacheFieldData.put(fieldName, scriptValues); | |||
} | |||
try { | |||
scriptValues.setNextDocId(docId); | |||
boolean docHasValues = scriptValues.setNextDocId(docId); | |||
if (!docHasValues && ScriptModule.NULL_FOR_MISSING_VALUE) return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
== false
please
@@ -83,7 +83,7 @@ | |||
scripts.put("doc['random_score']", vars -> { | |||
Map<?, ?> doc = (Map) vars.get("doc"); | |||
ScriptDocValues.Doubles randomScore = (ScriptDocValues.Doubles) doc.get("random_score"); | |||
return randomScore.getValue(); | |||
if (randomScore != null) {return randomScore.getValue();} else return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is going to be on a single line, please use a ternary statement. The same applies to the other tests here that were changed in a similar way.
[float] | ||
===== Missing values | ||
|
||
Currently by default, if a document is missing a numeric field `field`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to keep the missing values info in the Accessing doc values section. I'd probably pare it down to something like:
If you request the value from a numeric field that isn’t in the document, Painless returns zero.
IMPORTANT: Starting in 7.0, Painless returns a value of null
instead of zero if the field does not exist. To enable this behavior now, set the following system property:
+
export ES_JAVA_OPTS="-Des.script.null_for_missing_value=true”
+
If you do not enable this behavior, a deprecation warning is logged on start up.
about this forthcoming change. | ||
|
||
|
||
===== Multiple values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the "examples" section is really the "getting started" guide, and the other pieces should probably be split out. Not sure we want to do that as part of this change, though. For now, I'd rename this Multi-value field operations , put it after the regular expressions section, and bump it up to a level 3 heading so it's a separate page.
|
||
There is a number of operations designed for numeric fields, | ||
if a document has multiple values in such a field: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd edit the intro to:
Painless supports the following operations for multi-value numeric fields:
In the list, I'd drop "among values" from min & max. For sum and average, I'd probably say "of the values" rather than "of all values".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@debadair thanks for the suggestion, noted
Previously in script for numeric fields, there was no way to check if a document is missing a value. Also certain operations on multiple- values fields were missing. This PR adds the following: - add the following functions for multiple-valued numeric fields: doc['field'].min returns the minumum amoung values doc['field'].max returns the maximum amoung values doc['field'].sum returns the sum of amoung values doc['field'].avg returns the average of values - return null for doc['field'] if a document is missing a 'field1': Now we can do this: if (doc['field'] == null) {return -1;} return doc['field'].value; or doc['field']?.value ?: -1 This new behaviour will only work if the following system property is set: `export ES_JAVA_OPTS="-Des.script.null_for_missing_value=true"'
…ple-values-in-script
…ple-values-in-script
3b58d22
to
89a3bd6
Compare
89a3bd6
to
3b5f421
Compare
…ple-values-in-script
@rjernst Ryan, I have addressed all your and Deb's comments. Can you please continue the review when you have available time. Thanks. |
Closed in favour of #30975 Multiple values still need to be addressed |
Previously in script for numeric fields, there was no way to check if
a document is missing a value. Also certain operations on multiple-
values fields were missing.
This PR adds the following:
return
null
for doc['field'] if a document is missing a 'field1':Now we can do this:
if (doc['field'] == null) {return -1;} return doc['field'].value; or
doc['field']?.value ?: -1
add the following functions for multiple-valued numeric fields:
doc['field'].min
returns the minumum amoung valuesdoc['field'].max
returns the maximum amoung valuesdoc['field'].sum
returns the sum of amoung valuesdoc['field'].avg
returns the average of valuesCloses #29286