Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing fields values unexpectedly return 0 in painless #29286

Closed
mayya-sharipova opened this issue Mar 28, 2018 · 5 comments
Closed

Missing fields values unexpectedly return 0 in painless #29286

mayya-sharipova opened this issue Mar 28, 2018 · 5 comments
Assignees
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement

Comments

@mayya-sharipova
Copy link
Contributor

Describe the feature: Missing fields values are not processed as expected in painless
Elasticsearch version (bin/elasticsearch --version): 6.2.3

Steps to reproduce:

PUT /indexscore1
{
  "mappings": {
			"doc": {
				"properties": {
					"f1": {
						"type":     "text"
					},
					"f2" : {
						"type" : "integer"
					}
				}
			}
		}
}

POST/_bulk
{ "index" : { "_index" : "indexscore1", "_type" : "doc", "_id" : "4" } }
{ "type" : "doc", "f1" : "beautiful horse"}

GET /indexscore1/_search
 {
	 "query" : {
		 "function_score": {
				"script_score" : {
					"script" : {
							"source" : "if (doc.containsKey('f2')) {return doc['f2'].value;} return 1.0;"
					}    
				}	
		 }
	 }
 }

I expect that "if (doc.containsKey('f2')) {return doc['f2'].value;} return 1.0;" should return 1 for my doc, but this returns 0:

"hits": [
                     {
				"_index": "indexscore1",
				"_type": "doc",
				"_id": "4",
				"_score": 0.0,
				"_source": {
					"type": "doc",
					"f1": "beautiful horse"
				}
			}
		]
@mayya-sharipova mayya-sharipova added the :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache label Mar 28, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@mayya-sharipova
Copy link
Contributor Author

mayya-sharipova commented Mar 28, 2018

I have clarified the following information from @rjernst and @jpountz :

  • We use SortedNumericDocValuesto access document's integer values with the following code:
public long getValue() {
            if (count == 0) {
                return 0L;
            }
            return values[0];
        }

which does return 0.

  • Lucene used to have a contract that NumericDocValuesshould return 0 when the document doesn't have a value. But later a bitset of documents that had a value was added and now we even have iterators of docs with a value, so this must be something that can be fixed

If we want painless not to return 0 for missing values, we should correct this.
This may be something we should change in LeafDocLookup. we can make it return null in this case, by checking scriptValues.isEmpty() before returning from get

@mayya-sharipova
Copy link
Contributor Author

@rjernst Ryan, what do you think if we add these two functions to painless:

missing(doc['field1'], <floatValue>) - returns floatValue if a current document is missing field1, otherwise returns doc's value for this field. Only applicable for numeric fields.

multi_value(doc['field1'], <stringValue>). Options for <stringValue> are avg, min, max, sum. For multi valued fields, depending on the chosen option, returns their averages, minimum, maximum, or sum. Only applicable for numeric fields.

@rjernst
Copy link
Member

rjernst commented Apr 16, 2018

missing(doc['field1'], <floatValue>)

I think this would be unintuitive to have to know about an additional method to call (a non member method on the return of doc['field'] too). Returning null from doc['field'] is more intuitive IMO. The field missing from doc means it is not in the mappings, and returning null means the current doc has no values.

multi_value(doc['field1'], <stringValue>)

I think this would be better as member methods on ScriptDocValues.Longs (and ScriptDocValues.Doubles too if we want them for double fields). This would be similar to methods we have for geo fields.

@jdconrad
Copy link
Contributor

@rjernst I'm concerned about one thing - if we do return null for missing field values in this case, once we make scripting index-aware, we do not want to return objects from fields that represent numerical types so we get the performance benefit of not boxing/unboxing. We can probably just document the differences between index-aware versus non-index aware, but will this be confusing later on?

mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Apr 20, 2018
Previously in script for numeric fields, there was no way to check if
a document is missing a value. Also certain operations on multiple-
values fields were missing.

This PR adds the following:

- return null for doc['field'] if a document is missing a 'field1':
    Now we can do this:
    if (doc['field'] == null) {return -1;} return doc['field'].value;  or
    doc['field']?.value ?: -1

- add the following functions for multiple-valued numeric fields:
    doc['field'].min returns the minumum amoung values
    doc['field'].max returns the maximum amoung values
    doc['field'].sum returns the sum of amoung values
    doc['field'].avg returns the average of values

Closes elastic#29286
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Apr 30, 2018
Previously in script for numeric fields, there was no way to check if
a document is missing a value. Also certain operations on multiple-
values fields were missing.

This PR adds the following:

- add the following functions for multiple-valued numeric fields:
    doc['field'].min returns the minumum amoung values
    doc['field'].max returns the maximum amoung values
    doc['field'].sum returns the sum of amoung values
    doc['field'].avg returns the average of values

- return null for doc['field'] if a document is missing a 'field1':
    Now we can do this:
    if (doc['field'] == null) {return -1;} return doc['field'].value;  or
    doc['field']?.value ?: -1
    This new behaviour will only work is the following system property is set:
    `export ES_JAVA_OPTS="-Des.script.null_for_missing_value=true"'

Closes elastic#29286
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue May 1, 2018
Previously in script for numeric fields, there was no way to check if
a document is missing a value. Also certain operations on multiple-
values fields were missing.

This PR adds the following:

- add the following functions for multiple-valued numeric fields:
    doc['field'].min returns the minumum amoung values
    doc['field'].max returns the maximum amoung values
    doc['field'].sum returns the sum of amoung values
    doc['field'].avg returns the average of values

- return null for doc['field'] if a document is missing a 'field1':
    Now we can do this:
    if (doc['field'] == null) {return -1;} return doc['field'].value; or
    doc['field']?.value ?: -1
    This new behaviour will only work if the following system property is set:
    `export ES_JAVA_OPTS="-Des.script.null_for_missing_value=true"'

Closes elastic#29286
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Jun 4, 2018
Throw an exception for `doc['field'].value`
if this document is missing a value for the `field`.

For 7.0:
This is the default behaviour from 7.0

For 6.x:
To enable this behavior from 6.x, a user can set a jvm.option:
 `-Des.script.exception_for_missing_value=true` on a node.
If a user does not enable this behavior, a deprecation warning is logged on start up.

Closes elastic#29286
mayya-sharipova added a commit that referenced this issue Jul 9, 2018
* Handle missing values in painless

Throw an exception for `doc['field'].value`
if this document is missing a value for the `field`.

For 7.0:
This is the default behaviour from 7.0

For 6.x:
To enable this behavior from 6.x, a user can set a jvm.option:
 `-Des.script.exception_for_missing_value=true` on a node.
If a user does not enable this behavior, a deprecation warning is logged on start up.

Closes #29286
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Jul 10, 2018
Throw an exception for `doc['field'].value`
if this document is missing a value for the `field`.

For 7.0:
This is the default behaviour from 7.0

For 6.x:
To enable this behavior from 6.x, a user can set a jvm.option:
 `-Des.script.exception_for_missing_value=true` on a node.
If a user does not enable this behavior, a deprecation warning is logged on start up.

Closes elastic#29286
mayya-sharipova added a commit that referenced this issue Jul 17, 2018
Throw an exception for `doc['field'].value`
if this document is missing a value for the `field`.

For 7.0:
This is the default behaviour from 7.0

For 6.x:
To enable this behavior from 6.x, a user can set a jvm.option:
 `-Des.script.exception_for_missing_value=true` on a node.
If a user does not enable this behavior, a deprecation warning is logged on start up.

Closes #29286
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Jul 19, 2018
Throw an exception for doc['field'].value
if this document is missing a value for the field.

After deprecation changes have been backported to 6.x,
make this a default behaviour in 7.0

Closes elastic#29286
mayya-sharipova added a commit that referenced this issue Jul 19, 2018
Throw an exception for doc['field'].value
if this document is missing a value for the field.

After deprecation changes have been backported to 6.x,
make this a default behaviour in 7.0

Closes #29286
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Aug 13, 2018
For main code:
- ScriptModule will NOT produce a deprecation warning anymore,
when `es.scripting.exception_for_missing_value` is not set.
This warning is produced only when missing values in script are used and
is produced only once.

For tests:
- make `es.scripting.exception_for_missing_value` equal to `false`
by default in tests. This behaviour is similar with default
node settings, and with tests running from IDE.

- modify `ScriptDocValuesMissingV6BehaviourTests` to test the default
behaviour in v6, where there are default values for missing values.

- add `ScriptDocValuesMissingV7BehaviourTests` to test the default
behavior in v7 where `es.scripting.exception_for_missing_value`
is set to `true`, and where exceptions are produced for missing
values.

Relates to elastic#29286
mayya-sharipova added a commit that referenced this issue Sep 10, 2018
For main code:
- ScriptModule will NOT produce a deprecation warning anymore,
when `es.scripting.exception_for_missing_value` is not set.
This warning is produced only when missing values in script are used and
is produced only once.

For tests:
- make `es.scripting.exception_for_missing_value` equal to `false`
by default in tests. This behaviour is similar with default
node settings, and with tests running from IDE.

- modify `ScriptDocValuesMissingV6BehaviourTests` to test the default
behaviour in v6, where there are default values for missing values.

- add `ScriptDocValuesMissingV7BehaviourTests` to test the default
behavior in v7 where `es.scripting.exception_for_missing_value`
is set to `true`, and where exceptions are produced for missing
values.

Relates to #29286

* Address feedback

* Address feedback 2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement
Projects
None yet
Development

No branches or pull requests

5 participants