Bulk insert with script behaves incorrectly #48670

AlexeyRaga · 2019-10-30T03:49:45Z

I am using the official Elasticsearch Docker image: docker.elastic.co/elasticsearch/elasticsearch:7.4.1

Elasticsearch version (bin/elasticsearch --version):
Version: 7.4.1, Build: default/docker/fc0eeb6e2c25915d63d871d344e3d0b45ea0ea1e/2019-10-22T17:16:35.176724Z, JVM: 13

I also tried 7.3.2 and 7.2.1; all of them show this issue.

Plugins installed: []

JVM version (java -version):
openjdk version "13" 2019-09-17
OpenJDK Runtime Environment AdoptOpenJDK (build 13+33)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 13+33, mixed mode, sharing)

OS version (uname -a if on a Unix-like system):
Linux fa18f15fe8f0 4.9.184-linuxkit #1 SMP Tue Jul 2 22:58:16 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

When using a bulk upsert with a Painless script, the first document in a batch appears to be handled incorrectly.

Steps to reproduce:

Perform this bulk request against an index that doesn't yet contain documents with these IDs:

curl -X POST "localhost:9200/_bulk?pretty" -H 'Content-Type: application/json' -d'
{ "update" : { "_id" : "1", "_index" : "index3"} }
{ "script" : { "source": "ctx._source.counter += params.param1", "lang" : "painless", "params" : {"param1" : 2}}, "scripted_upsert": true, "upsert" : {"counter" : 1}}
{ "update" : { "_id" : "2", "_index" : "index3"} }
{ "script" : { "source": "ctx._source.counter += params.param1", "lang" : "painless", "params" : {"param1" : 2}}, "scripted_upsert": true, "upsert" : {"counter" : 1}}
'

In this example I insert the same document twice and only the _id value differs, so I expect two identical documents to be indexed, each with counter = 1 + 2 = 3 (the upsert value plus param1).

Query the index:

$ curl -X GET localhost:9200/index3/_search

{
  "took": 547,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "index3",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "counter": 7
        }
      },
      {
        "_index": "index3",
        "_type": "_doc",
        "_id": "2",
        "_score": 1,
        "_source": {
          "counter": 3
        }
      }
    ]
  }
}

Note that the counter value of the first document is wrong and differs from the counter value of the second document.

By my expectations, 3 is the correct value and 7 is not.


elasticmachine · 2019-10-30T08:10:11Z

Pinging @elastic/es-core-infra (:Core/Infra/Scripting)

rjernst · 2019-11-25T17:29:17Z

Thanks for the simple reproduction instructions. I was able to reproduce this and track down the problem. A scripted upsert for the first doc in a bulk request may run many times, depending on how long the underlying mapping update takes when one is needed (for me it was re-run only once, and thus I got 5 for the counter). We have a loop in TransportShardBulkAction.performOnPrimary that tries executing the bulk action but returns early if an async mapping update was kicked off (as it was in my case, since I did not create the index or mappings ahead of time).

I'm moving this to the distrib team to determine the best fix. It seems we need to either make the operation idempotent, so that any subsequent runs start with a fresh doc again, or start and wait on mapping updates before even attempting to index the doc on the primary.
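The arithmetic behind the 3, 5 and 7 seen in this thread can be checked with a toy model. This is a minimal sketch, not Elasticsearch code: it assumes that each retry runs the script against the request's upsert document after earlier runs have already changed it, so the stored counter becomes 1 + 2 × (number of executions). The function name `simulate_scripted_upsert` and its arguments are hypothetical.

```shell
#!/bin/sh
# Toy model of the retry described above; NOT Elasticsearch code.
# simulate_scripted_upsert RUNS MUTATE   (hypothetical helper)
#   RUNS:   how many times the primary executes the scripted upsert
#   MUTATE: "yes" = the script result is written back into the request's upsert doc (the bug)
#           "no"  = every run starts again from the original upsert doc (idempotent)
simulate_scripted_upsert() {
  local runs="$1" mutate="$2"
  local upsert_counter=1   # "upsert" : {"counter" : 1}
  local param1=2           # "params" : {"param1" : 2}
  local result=0 i=0
  while [ "$i" -lt "$runs" ]; do
    # ctx._source.counter += params.param1, applied to the upsert doc
    result=$((upsert_counter + param1))
    if [ "$mutate" = yes ]; then
      upsert_counter=$result   # the request itself now carries the updated counter
    fi
    i=$((i + 1))
  done
  echo "$result"
}

simulate_scripted_upsert 1 yes   # prints 3: no retry, the expected value
simulate_scripted_upsert 2 yes   # prints 5: one retry
simulate_scripted_upsert 3 yes   # prints 7: two retries, matching the report
simulate_scripted_upsert 3 no    # prints 3: retries start from a fresh doc
```

The `no` case corresponds to the "make the operation idempotent" option: however many times the item is retried, the result stays 3.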

elasticmachine · 2019-11-25T17:29:33Z

Pinging @elastic/es-distributed (:Distributed/CRUD)

ywelsch · 2019-11-26T08:50:54Z

I've created a fix here: #49578

Fixes a bug where a scripted upsert that causes a dynamic mapping update is retried (because the mapping update is still in flight), and the request is mutated multiple times. Closes #48670

jeejeeone · 2021-03-12T08:42:58Z

This problem seems to exist in 6.5 as well. Is there a reliable workaround? Would storing the Painless script in Elasticsearch help? Anything else? Any help appreciated! @ywelsch @rjernst

henningandersen · 2021-03-15T13:21:02Z

One possible workaround is to ensure the mappings are in place on the index before doing the update, avoiding the dynamic mapping update.
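Applied to the reproduction above, a minimal sketch of this workaround is to create index3 with the counter field already mapped before sending the _bulk request, so the scripted upsert introduces no new field. This assumes the 7.x create index API with typeless mappings, and assumes `long` is the type you want (it is what dynamic mapping picks for an integer JSON value). On 6.x the properties have to be nested under a mapping type name instead.

```shell
# Create the index with "counter" mapped up front (7.x, typeless mappings),
# so the scripted upsert in the bulk request does not trigger a dynamic mapping update.
curl -X PUT "localhost:9200/index3?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "counter": { "type": "long" }
    }
  }
}
'
```

If the index already exists, the same properties block can be added through the put mapping API (`PUT /index3/_mapping`) instead. With the mapping in place before the bulk request, both documents should end up with counter 3.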

jeejeeone · 2021-03-24T13:06:54Z

@henningandersen Thanks! This was very helpful

jimczi added the :Core/Infra/Scripting label Oct 30, 2019

jimczi added the :Core/Infra/Core and >bug labels Oct 30, 2019

AlexeyRaga mentioned this issue Oct 31, 2019 in Support es 7 bitemyapp/bloodhound#266 (closed)

rjernst added the :Distributed Indexing/CRUD label and removed the :Core/Infra/Core and :Core/Infra/Scripting labels Nov 25, 2019

ywelsch mentioned this issue Nov 26, 2019 in Do not mutate request on scripted upsert #49578 (merged)

ywelsch closed this as completed in #49578 Nov 27, 2019

This was referenced Feb 3, 2020 in [meta] 7.6 release elastic/elasticsearch-net#4340 (closed) and [meta] 7.6 release elastic/elasticsearch-net#4341 (closed)
