Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Arrays of geo_shape are broken after v2.8.0 #14193

Open
kurtmckee opened this issue Jun 11, 2024 · 8 comments · May be fixed by #14196
Open

[BUG] Arrays of geo_shape are broken after v2.8.0 #14193

kurtmckee opened this issue Jun 11, 2024 · 8 comments · May be fixed by #14196
Labels
bug Something isn't working Geospatial Other

Comments

@kurtmckee
Copy link

kurtmckee commented Jun 11, 2024

Describe the bug

While investigating whether a product I work on could migrate from ElasticSearch to OpenSearch, I found that OpenSearch support for ingesting arrays of geo_shape objects is broken after v2.8.0.

Specifically, the error message I get back during ingest is:

{
  "error" : {
    "type" : "illegal_argument_exception",
    "reason" : "DocValuesField \"location\" appears more than once in this document (only one value is allowed per field)"
  }
}

I created a YAML test and used git bisect between v2.8.0 and v2.9.0 commits to track the issue to commit ebe6f2c2878f036cb4c1147683c30091747d7632, introduced in PRs #4266 and #8301.

Related component

Other

To Reproduce

Note

I've opened PR #14196 using a branch containing the YAML test file that demonstrates this issue. An OpenSearch maintainer can pull the changes, add fix commits, and push back to the associated PR.

I'm running the test suite using the following command with an additional YAML file that demonstrates the problem.

./gradlew :modules:geo:yamlRestTest

YAML file location:

modules/geo/src/yamlRestTest/resources/rest-api-spec/test/geo_shape/300_array_of_geo_shapes.yml

YAML file contents (with comments added only here):

Details

setup:
  - do:
      indices.create:
        index: test
        body:
          settings:
            number_of_replicas: 0
          mappings:
            properties:
              location:
                type: "geo_shape"

---
# This works across all git commits in 2.8.0..2.9.0
"Single shape":
  - do:
      bulk:
        refresh: true
        body:
          - index:
              _index: test
              _id: 1
          - location:
              type: "polygon"
              coordinates: [
                [
                  [ -87.0, 41.0 ],
                  [ -88.0, 41.0 ],
                  [ -88.0, 42.0 ],
                  [ -87.0, 42.0 ],
                  [ -87.0, 41.0 ],
                ]
              ]

  - match: {errors: false}

---
# This fails starting at commit ebe6f2c2878f036cb4c1147683c30091747d7632
"Array of shapes":
  - do:
      bulk:
        refresh: true
        body:
          - index:
              _index: test
              _id: 1
          - location:
              - type: "polygon"
                coordinates: [
                  [
                    [ -87.0, 41.0 ],
                    [ -88.0, 41.0 ],
                    [ -88.0, 42.0 ],
                    [ -87.0, 42.0 ],
                    [ -87.0, 41.0 ],
                  ]
                ]
              - type: "polygon"
                coordinates: [
                  [
                    [ -97.0, 41.0 ],
                    [ -98.0, 41.0 ],
                    [ -98.0, 42.0 ],
                    [ -97.0, 42.0 ],
                    [ -97.0, 41.0 ],
                  ]
                ]

  - match: {errors: false}

Expected behavior

I expected that I could creating a "location" field with a geo_shape mapping and then ingest an array of geo_shape objects. The product I work on uses ElasticSearch 7.10 and arrays of geo_shape objects can be ingested, and this also is how arrays are documented to work:

There is no dedicated array field type in OpenSearch. Instead, you can pass an array of values into any field.

Additional Details

None. I'm just running the test suite with an additional YAML file.

@kurtmckee kurtmckee added bug Something isn't working untriaged labels Jun 11, 2024
kurtmckee added a commit to kurtmckee/pr-OpenSearch that referenced this issue Jun 11, 2024
This was first broken in commit ebe6f2c

Demonstrates opensearch-project#14193

Signed-off-by: Kurt McKee <contactme@kurtmckee.org>
@github-actions github-actions bot added the Other label Jun 11, 2024
@dblock
Copy link
Member

dblock commented Jun 12, 2024

Thank you for a very detailed report, a YAML REST test, and a bisect, this is super helpful @kurtmckee. Are you trying to fix it? /cc: @heemin32 @navneet1v

@kurtmckee
Copy link
Author

@dblock You're welcome! Unfortunately I won't be able to dive in to fix this (I lack experience with Java). I'm grateful you double-checked, though, to avoid possible duplication of effort!

@andrross
Copy link
Member

[Triage - attendees 1 2 3 4]

Thanks for filing. Looks like a clear bug. @kurtmckee thanks for the investigation and detail provided!

@heemin32
Copy link
Contributor

@navneet1v Could you take a look?

@navneet1v
Copy link
Contributor

@kurtmckee on which version of Opensearch you are facing the issue? Can you switch to 2.11/2.13 version of Opensearch?

@kurtmckee
Copy link
Author

@navneet1v

which version of Opensearch you are facing the issue

This bug is present in all versions of OpenSearch after v2.8.0. I opened a PR, #14196, based off main, that demonstrates this issue.

The Jenkins run for that PR shows that STDOUT contains the error I described in this bug report.

I backtracked through the Docker images provided by OpenSearch in our test suite until I found that 2.8.0 worked; I then cloned the OpenSearch repo, created the YAML test case, and used git bisect to find the commit that introduced the regression.

@navneet1v
Copy link
Contributor

@navneet1v

which version of Opensearch you are facing the issue

This bug is present in all versions of OpenSearch after v2.8.0. I opened a PR, #14196, based off main, that demonstrates this issue.

The Jenkins run for that PR shows that STDOUT contains the error I described in this bug report.

I backtracked through the Docker images provided by OpenSearch in our test suite until I found that 2.8.0 worked; I then cloned the OpenSearch repo, created the YAML test case, and used git bisect to find the commit that introduced the regression.

Can you try one thing, for the geo_shapes field can you try disabling the doc_value using "doc_values": false in mappings.

I will go ahead and look into what part of code is creating the issue, because I do know that we can index more than 1 geo_shape in 1 field.

@kurtmckee
Copy link
Author

@navneet1v Thanks for that suggestion. It appears that if doc_values: false is set in the mapping then arrays of geo_shape can be ingested.

I've added an additional commit that also tests doc_values: false to PR #14196 to both demonstrate that this avoids the problem, and to ensure that the test suite guards against future regressions. You should have permissions to push commits to that PR branch.

# Using the GitHub CLI
gh co 14196

# Using git
git remote add kurtmckee git@github.com:kurtmckee/pr-OpenSearch.git
git fetch kurtmckee
git checkout -b fix-arrays-of-geo_shape-issue-14193 kurtmckee/fix-arrays-of-geo_shape-issue-14193

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Geospatial Other
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants