Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] In flat_object fields null values indexed as "null" strings #13880

Closed
akramarev opened this issue May 29, 2024 · 8 comments · Fixed by #14069
Closed

[BUG] In flat_object fields null values indexed as "null" strings #13880

akramarev opened this issue May 29, 2024 · 8 comments · Fixed by #14069
Labels
bug Something isn't working Indexing Indexing, Bulk Indexing and anything related to indexing

Comments

@akramarev
Copy link

Describe the bug

Flat_object documentation doesn't define null values indexing rules, though it looks unexpected that documents containing fields with null values can be searched with exists or even prefix: "nu" queries.

Related component

Indexing

To Reproduce

  1. create a new index with flat_object field mapping
PUT demo-flattened
{
  "mappings": {
    "properties": {
      "record": {
        "type": "flat_object",
      }
    }
  }
}
  1. index a document with record.name: null
PUT demo-flattened/_doc/1
{
  "record": {
    "name": null
  }
}
  1. search the document using prefix query:
POST demo-flattened/_search
{
  "query": {
    "prefix": {
      "record.name": {
        "value": "nu",
      }
    }
  }
}
  1. OR search the document using exists query:
POST demo-flattened/_search
{
  "query": {
    "exists": {
      "field": "record.name"
    }
  }
}

Expected behavior

Both queries 3 and 4 should return no documents, but they do return the ingested document. Tested using latest opensearchproject/opensearch:2.14.0 image.

Additional Details

Host/Environment:

  • Version opensearchproject/opensearch:2.14.0

Additional context

  • validated my expectations with docker.elastic.co/elasticsearch/elasticsearch:7.15.0, with flattened field mapping instead of flat_object
  • I know that flat_object mapping today doesn't support additional parameters ([Enhancement] Add Open Parameters to Flat_object Field Type #7137) including null_value, though the currently selected default when null is treated as "null" (based on the behavior described above) doesn't look right.
@akramarev akramarev added bug Something isn't working untriaged labels May 29, 2024
@github-actions github-actions bot added the Indexing Indexing, Bulk Indexing and anything related to indexing label May 29, 2024
@kkewwei
Copy link
Contributor

kkewwei commented May 31, 2024

@akramarev it seems that query 4 return document is expected. I tested on the pr #7137, it doesn't return document in query 3.

@akramarev
Copy link
Author

Thanks for the rely @kkewwei. It's great that your PR fixes my main concern - test case 3 (prefix query by "nu" for null field). I should spin it up and check locally as well.

Regarding test case 4 (Exists query on the null field) though, I'm not sure the behavior is expected. My expectations are based on the documentation:

1. The flat object field type supports the following queries: ..., Exists
2. An indexed value will not exist for a document field in any of the following cases: ..., The field in the source JSON is null or [].

So the Exists behavior that I expect from text field for example, I equally expect from flat_object fields, at least from root level json properties (e.g. record.name). Also as I mentioned elasticsearch:7.15.0 confirms my expectations.

Text field example just illustrate the point above:

PUT demo-text
{
  "mappings": {
    "properties": {
      "record": {
        "type": "text",
      }
    }
  }
}

PUT demo-text/_doc/1
{
  "record": null
}

# expected/actual hits.total.value=0
POST demo-text/_search
{
  "query": {
    "exists": {
      "field": "record"
    }
  }
}

@kkewwei
Copy link
Contributor

kkewwei commented Jun 3, 2024

@akramarev I update my statement. In query case 3, it doesn't return document too. We don't build any index for the null value, you can see it from the unit test:
https://github.com/opensearch-project/OpenSearch/pull/13853/files#diff-b8bb41a040735cadecdd9200b835224f11a6a213e6c61bffc8ef4a74180f58e2R201

In query case 3, it will return doc when we set the null_value for the field.

@akramarev
Copy link
Author

akramarev commented Jun 3, 2024

@kkewwei I'm confused with case 3 and case 3 in the latest reply, could you clarify? I think you meant "In query case 3, it shouldn't return the document too".

In the meantime, I pulled your branch and verified that Prefix query doesn't return the document as expected (thanks for the fix!). While Exists query still returns it, I believe it's a bug. The quickest repro with curl commands, same as before, but just a bit simpler to copy&paste in a terminal to validate.

❯ git status
On branch flat_object_para
Your branch is up to date with 'origin/flat_object_para'.

nothing to commit, working tree clean

❯ curl -X PUT "localhost:9200/demo-flattened" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "record": {
        "type": "flat_object"
      }
    }
  }
}'
{"acknowledged":true,"shards_acknowledged":true,"index":"demo-flattened"}%
                                                                                                           
❯ curl -X PUT "localhost:9200/demo-flattened/_doc/1" -H 'Content-Type: application/json' -d'
{
  "record": {
    "name": null
  }
}'
{"_index":"demo-flattened","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}%
                                 
❯ curl -X POST "localhost:9200/demo-flattened/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "exists": {
      "field": "record.name"
    }
  }
}'
{"took":30,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":1.0,"hits":[{"_index":"demo-flattened","_id":"1","_score":1.0,"_source":
{
  "record": {
    "name": null
  }
}}]}}%

@peternied
Copy link
Member

[Triage - attendees 1 2 3 4 5 6 7]
@akramarev Thanks for creating this issue, could you make a pull request to address?

@kkewwei
Copy link
Contributor

kkewwei commented Jun 6, 2024

@peternied I'm working on it, and almost done, please assign it to me, @akramarev, it looks more complicated than expected, there are too many cases about null value for flat_object type, I will pull separate pr to fix the null value.

@kkewwei
Copy link
Contributor

kkewwei commented Jun 8, 2024

@akramarev, please try the new branch #14069 when you are free. I test two of cases ok now.

@akramarev
Copy link
Author

Thanks, @kkewwei, I can confirm - the test cases mentioned above work properly in your PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Indexing Indexing, Bulk Indexing and anything related to indexing
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants