Reindex conflict: "proceed" throws error when casting malformed data to geo_point #17617

Closed
g00fy- opened this issue Apr 8, 2016 · 4 comments
Labels: :Distributed Indexing/CRUD, >docs, help wanted, adoptme

g00fy- commented Apr 8, 2016

When reindexing a malformed object to geo_point, the conflicts parameter is ignored and the task stops.

example doc

{"geo":{"lat":null,"lon":null}}

with mapping:

{"properties":{"geo":{"properties":{"lat":{"type":"long"},"lon":{"type":"long"}}}}}

reindexing to

{"properties":{"geo":{"type":"geo_point", "ignore_malformed": true}}}

exception:

{
   "took": 24483,
   "timed_out": false,
   "total": 5211,
   "updated": 699,
   "created": 2100,
   "batches": 28,
   "version_conflicts": 0,
   "noops": 0,
   "retries": 0,
   "failures": [
      {
         "index": "search.locations",
         "type": "location",
         "id": "86d20ff9-ea24-448c-95a4-f060a30a80ad",
         "cause": {
            "type": "mapper_parsing_exception",
            "reason": "failed to parse",
            "caused_by": {
               "type": "parse_exception",
               "reason": "latitude must be a number"
            }
         },
         "status": 400
      }
   ]
}

Elasticsearch version:
docker:latest (2.3)

example query

POST /_reindex
{
   "conflicts": "proceed",
   "source": {
      "index": "raw.locations"
   },
   "dest": {
      "index": "search.locations"
   }
}

nik9000 commented Apr 8, 2016

That doesn't look like a bug to me. Maybe a documentation bug. conflicts=proceed only works on version conflicts.

If you want to skip the ones with invalid geo points, maybe you can craft the query to do so? You could probably also use a script to try to make them valid. But reindex doesn't have support for skipping arbitrary errors.
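
A minimal sketch of the "craft the query" suggestion, written as a Python helper that builds the `_reindex` request body. It assumes an `exists` query does not match a field whose value is null, which holds when the source mapping sets no `null_value`; that was not verified against 2.3 in this thread. The helper name `build_reindex_body` and the `geo_field` parameter are illustrative, not part of any Elasticsearch client.

```python
import json


def build_reindex_body(source_index, dest_index, geo_field="geo"):
    """Build a _reindex body that copies only documents with both coordinates set."""
    return {
        "source": {
            "index": source_index,
            # Assumption: null lat/lon values are not indexed, so `exists`
            # filters out documents like {"geo": {"lat": null, "lon": null}}.
            "query": {
                "bool": {
                    "filter": [
                        {"exists": {"field": f"{geo_field}.lat"}},
                        {"exists": {"field": f"{geo_field}.lon"}},
                    ]
                }
            },
        },
        "dest": {"index": dest_index},
    }


body = build_reindex_body("raw.locations", "search.locations")
print(json.dumps(body, indent=2))  # send this as the body of POST /_reindex
```

Note that this skips the affected documents entirely rather than indexing them without the geo field. If they should still be copied, a script that removes or repairs the field would be the other route mentioned above.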

@eskibars eskibars added the >docs General docs changes label Apr 8, 2016

eskibars commented Apr 8, 2016

We should probably clarify further in the documentation what "conflicts" means. It's somewhat implicit in the fact that the response says "version_conflicts", but I could see how somebody could miss that and assume this will skip more types of problems.
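
To make the distinction concrete, here is a small hedged sketch that reads a `_reindex` response like the one in the original report and separates the `version_conflicts` counter from the entries in `failures`. The helper name `summarize_reindex` is hypothetical, and the sample response is trimmed from the report above.

```python
def summarize_reindex(response):
    """Split a _reindex response into the version-conflict count and other failures."""
    failures = []
    for failure in response.get("failures", []):
        cause = failure.get("cause", {})
        # Prefer the nested reason when the cause wraps another exception.
        reason = cause.get("caused_by", {}).get("reason") or cause.get("reason")
        failures.append((failure.get("id"), cause.get("type"), reason))
    return {
        "version_conflicts": response.get("version_conflicts", 0),
        "failures": failures,
    }


# Trimmed copy of the response posted in this issue.
response = {
    "version_conflicts": 0,
    "failures": [
        {
            "index": "search.locations",
            "type": "location",
            "id": "86d20ff9-ea24-448c-95a4-f060a30a80ad",
            "cause": {
                "type": "mapper_parsing_exception",
                "reason": "failed to parse",
                "caused_by": {
                    "type": "parse_exception",
                    "reason": "latitude must be a number",
                },
            },
            "status": 400,
        }
    ],
}

print(summarize_reindex(response))
```

In the reported response the counter is 0 while `failures` holds a mapping error, which is the case `conflicts: proceed` does not cover.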

@lcawl lcawl added :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. and removed :Reindex API labels Feb 13, 2018
@henningandersen henningandersen self-assigned this Mar 26, 2019

henningandersen commented Mar 26, 2019

I tend to think there is a bug with geo_point here. Since the mapping was marked ignore_malformed : true, the index request should have succeeded. Nothing to do with the "conflicts": "proceed" _reindex setting though.

I tried reproducing this on master by indexing a doc with null lat/lon. It gave me a different error from above:

put localhost:9200/x/_doc/3?pretty
{"geo":{"lat":null,"lon":null}}
{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "failed to parse"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "failed to parse",
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "Malformed content, found extra data after parsing: END_OBJECT"
    }
  },
  "status" : 400
}

It seems the ignore_malformed handling has been put in place, but the parsing then mismatches the curly braces. This PR: #16833 seems to have added the ignore_malformed handling.
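
One way to picture the brace mismatch is a toy token stream, sketched below in Python. This is not Elasticsearch's parser code: the `TokenStream` class, the token names, and every function here are made up for illustration. It only shows that a parser which gives up in the middle of an object leaves the stream short of END_OBJECT unless the remaining tokens are consumed.

```python
START, END = "START_OBJECT", "END_OBJECT"
FIELD, NULL, NUM = "FIELD_NAME", "VALUE_NULL", "VALUE_NUMBER"

# Tokens for the inner object of {"geo": {"lat": null, "lon": null}}.
TOKENS = [(START, None), (FIELD, "lat"), (NULL, None),
          (FIELD, "lon"), (NULL, None), (END, None)]


class TokenStream:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def current(self):
        return self.tokens[self.pos]

    def advance(self):
        self.pos += 1
        return self.tokens[self.pos]


def parse_geo_point(stream):
    """Read {"lat": n, "lon": n}; raise ValueError on the first non-numeric value."""
    if stream.current()[0] != START:
        raise ValueError("expected START_OBJECT")
    point = {}
    kind, value = stream.advance()
    while kind != END:
        name = value  # current token is a FIELD_NAME
        kind, value = stream.advance()
        if kind != NUM:
            raise ValueError(f"{name} must be a number")
        point[name] = value
        kind, value = stream.advance()
    return point


def skip_to_end_of_object(stream):
    """Consume tokens until the object we are inside of is closed."""
    depth = 1
    while depth > 0:
        kind, _ = stream.advance()
        if kind == START:
            depth += 1
        elif kind == END:
            depth -= 1


def parse_ignoring_malformed(stream, consume_rest):
    """Return the point, or None for malformed input (the ignore_malformed case)."""
    try:
        return parse_geo_point(stream)
    except ValueError:
        if consume_rest:
            skip_to_end_of_object(stream)
        return None


broken = TokenStream(TOKENS)
parse_ignoring_malformed(broken, consume_rest=False)
print(broken.current()[0])  # stream is left on the null value of "lat"

fixed = TokenStream(TOKENS)
parse_ignoring_malformed(fixed, consume_rest=True)
print(fixed.current()[0])  # stream is left on END_OBJECT
```

In the first run the caller finds itself on a value token where it expects the end of the object, which is the kind of state that would surface as an "extra data after parsing" style error.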

henningandersen added a commit to henningandersen/elasticsearch that referenced this issue Mar 26, 2019
When geo point parsing threw a parse exception, it did not consume
remaining tokens from the parser. This in turn meant that
indexing documents with malformed geo points into mappings with
ignore_malformed=true would fail in some cases, since DocumentParser
expects geo_point parsing to end on the END_OBJECT token.

Related to elastic#17617
henningandersen added a commit to henningandersen/elasticsearch that referenced this issue Mar 27, 2019
Improved XContentSubParser to allow any token, which is useful for
wrapping in cases where both object and values are allowed.

Related to elastic#17617
henningandersen added a commit to henningandersen/elasticsearch that referenced this issue Mar 28, 2019
Reverted to a minimalistic change.

Related to elastic#17617
henningandersen added a commit that referenced this issue Mar 28, 2019
When geo point parsing threw a parse exception, it did not consume
remaining tokens from the parser. This in turn meant that
indexing documents with malformed geo points into mappings with
ignore_malformed=true would fail in some cases, since DocumentParser
expects geo_point parsing to end on the END_OBJECT token.

Related to #17617
henningandersen added a commit that referenced this issue Mar 29, 2019
When geo point parsing threw a parse exception, it did not consume
remaining tokens from the parser. This in turn meant that
indexing documents with malformed geo points into mappings with
ignore_malformed=true would fail in some cases, since DocumentParser
expects geo_point parsing to end on the END_OBJECT token.

Related to #17617
henningandersen added a commit that referenced this issue Apr 8, 2019
When geo point parsing threw a parse exception, it did not consume
remaining tokens from the parser. This in turn meant that
indexing documents with malformed geo points into mappings with
ignore_malformed=true would fail in some cases, since DocumentParser
expects geo_point parsing to end on the END_OBJECT token.

Related to #17617

henningandersen commented

Fixed the above-mentioned bug in geo_point parsing and clarified the meaning of conflicts: proceed in reindex. Closing this issue.
