null_pointer_exception returned when search to documents with invalid value #42187

Closed
HanguChoi opened this issue May 16, 2019 · 5 comments
Labels: :Search Relevance/Analysis, Team:Search Relevance

Comments

HanguChoi commented May 16, 2019

Elasticsearch version (bin/elasticsearch --version):

Version: 7.0.0, Build: default/tar/b7e28a7/2019-04-05T22:55:32.697037Z, JVM: 1.8.0_112

Plugins installed:

  • analysis-nori

JVM version (java -version):

java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b16, mixed mode)

OS version (uname -a if on a Unix-like system):

  • macOS High Sierra 10.13.2 (17C88)
  • Linux (AWS); I can check the exact version if you need it.

Description of the problem including expected versus actual behavior:

  • expected: no errors.
  • actual: a null_pointer_exception is returned when querying.
  • condition: seems to happen when a doc value is not "valid"
    (meaning: a, 1, 한글 are OK, but '', ., null are not OK.)

Steps to reproduce:

  1. Create the index.
  2. Add a doc with an invalid value.
  3. Query with a multi_match on the affected fields.

While investigating this bug, I found that the error conditions vary a lot.
I describe them with the logs below.

Provide logs (if relevant):

  • Create the index and add docs.
  • The first user's introduction.text value is '' (empty string).
curl -X DELETE "localhost:9700/some_index"

curl -X PUT "localhost:9700/some_index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1,
    "analysis": {}
  },
  "mappings": {
    "dynamic": false,
    "properties": {
      "user_id": {
        "type": "integer"
      },
      "skills": {
        "type": "object",
        "properties": {
          "id": {
            "type": "integer"
          },
          "name": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "keyword"
              }
            }
          }
        }
      },
      "introduction": {
        "type": "object",
        "properties": {
          "text": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
  }
}
'




curl -X PUT "localhost:9700/some_index/_doc/1" -H "Content-Type: application/json" -d'
{
  "user_id": 1,
  "introduction": {
    "text": ""
  },
  "skills": [
    {
      "id": 1,
      "name": "rails"
    }
  ]
}
'

curl -X PUT "localhost:9700/some_index/_doc/2" -H "Content-Type: application/json" -d'
{
  "user_id": 2,
  "introduction": {
    "text": "my name is hangu"
  },
  "skills": [
    {
      "id": 1,
      "name": "rails"
    }
  ]
}
'
  • Then run the query.
curl -X GET "localhost:9700/some_index/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "multi_match": {
                "query": "rails",
                "type": "cross_fields",
                "fields": [
                  "introduction.text",
                  "skills.name"
                ],
                "slop": 10
              }
            }
          ]
        }
      },
      "functions": []
    }
  },
  "from": 0,
  "size": 10
}
'

The response is:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 4,
    "skipped": 0,
    "failed": 1,
    "failures": [
      {
        "shard": 4,
        "index": "some_index",
        "node": "1QxjLQfUQ1yuIBV6dIs4Xg",
        "reason": {
          "type": "null_pointer_exception",
          "reason": null
        }
      }
    ]
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "some_index",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "user_id": 2,
          "introduction": {
            "text": "my name is hangu"
          },
          "skills": [
            {
              "id": 1,
              "name": "rails"
            }
          ]
        }
      }
    ]
  }
}

When a shard fails, it seems that none of the documents on that shard are returned as hits.
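
A minimal sketch, assuming a client only has the raw response body: because the search still returns hits from the healthy shards, the `_shards` section is the place to look for this kind of partial failure. The helper name `shard_failures` is hypothetical, and the sample is a trimmed copy of the response above.

```python
import json


def shard_failures(response_body):
    """Return (shard, error type) pairs for every failed shard in a search response."""
    shards = json.loads(response_body).get("_shards", {})
    return [
        (failure.get("shard"), (failure.get("reason") or {}).get("type"))
        for failure in shards.get("failures", [])
    ]


# Trimmed copy of the response shown above (hits omitted).
SAMPLE_RESPONSE = """
{
  "_shards": {
    "total": 5, "successful": 4, "skipped": 0, "failed": 1,
    "failures": [
      {
        "shard": 4,
        "index": "some_index",
        "reason": {"type": "null_pointer_exception", "reason": null}
      }
    ]
  }
}
"""

print(shard_failures(SAMPLE_RESPONSE))  # [(4, 'null_pointer_exception')]
```

An empty list means every shard answered, so the hit count can be trusted.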

@HanguChoi HanguChoi changed the title null_pointer_exception returned when search to documents with invalid value null_pointer_exception returned when search to documents with invalid value May 16, 2019
HanguChoi (Author) commented May 16, 2019

I found various conditions for this error.

settings

  • if I set "number_of_shards": 2 instead of 5, then no error (oh?? shard 3, 4 also error)

What is a valid doc?

no error

  • if I set the first user's introduction.text to a instead of '', then no error
  • if I set the value to 1 , then no error
  • if I set the value to 한글 , then no error
  • if I set the value to &nbsp; , then no error <== the best workaround I have found so far.

error

  • if I set the value to . , then error
  • if I set the value to - , then error
  • if I set the value to _ , then error
  • if I set the value to ... , then error
  • if I set the value to ' ' (whitespace-only string), then error
  • if I set the value to null , then error
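
One way to probe the pattern in the two lists above is to ask the index how each value is analyzed. The sketch below is an aid under stated assumptions, not something confirmed in this thread: it builds requests for the `_analyze` API against `introduction.text` and counts the tokens that come back. The guess that the failing values are the ones producing zero tokens is unverified here. The helper names are hypothetical, and the host and port are copied from the curl commands above. `null` cannot be sent as analyzable text, so it is left out.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9700"  # same host/port as the curl commands above


def analyze_request(text, index="some_index", field="introduction.text"):
    """Build (but do not send) an _analyze request that uses the field's analyzer."""
    body = json.dumps({"field": field, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_count(analyze_response_body):
    """Count the tokens in an _analyze response body."""
    return len(json.loads(analyze_response_body).get("tokens", []))


def probe(values):
    """Send one _analyze request per value; needs a running node."""
    counts = {}
    for value in values:
        with urllib.request.urlopen(analyze_request(value)) as resp:
            counts[value] = token_count(resp.read())
    return counts
```

With a node running, `probe(["a", "1", "한글", ".", "-", "_", "...", " "])` returns a dict mapping each value to its token count, which can then be compared against the "no error" and "error" lists.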

search

  • if I use the multi_match field introduction.text.raw instead of introduction.text, then no error
  • if I use the multi_match field skills.name.raw instead of skills.name, then no error
  • if I query with hangu instead of rails, then no error (I think because only the valid doc matches)

I'll add others if I find more.

Thank you.

HanguChoi (Author) commented May 16, 2019

Omitting properties

  • if I add docs like the ones below, the error also occurs (the second user has no skills property)
curl -X PUT "localhost:9700/some_index/_doc/1" -H "Content-Type: application/json" -d'
{
  "user_id": 1,
  "introduction": {
    "text": "some"
  },
  "skills": [
    {
      "id": 1,
      "name": "rails"
    }
  ]
}
'

curl -X PUT "localhost:9700/some_index/_doc/2" -H "Content-Type: application/json" -d'
{
  "user_id": 2,
  "introduction": {
    "text": "my name is hangu rails developer"
  }
}
'

  • skills: [] => error
  • skills: [{ }] => error

@dnhatn dnhatn added the :Search Relevance/Analysis How text is split into tokens label May 17, 2019
elasticmachine (Collaborator) commented

Pinging @elastic/es-search

codebird (Contributor) commented May 18, 2019

This is a duplicate of #41118 and is fixed by #41125.

jimczi (Contributor) commented May 20, 2019

Thanks for spotting this, @codebird. However, the fix introduced a new bug in 7.0.1, which is resolved in the upcoming release (7.1.0).
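
For anyone checking whether a cluster already has the fix, a small sketch: going only by the comment above, 7.1.0 is the release that resolves this. The helper compares a version string (for example `version.number` from the node's root endpoint) against 7.1.0. The function names are hypothetical, and the sketch makes no claim about which releases before 7.0.0 are affected.

```python
FIXED_IN = (7, 1, 0)  # release named in the comment above


def parse_version(number):
    """Turn a version string such as '7.0.1' into a comparable tuple of ints."""
    core = number.split("-", 1)[0]  # drop any qualifier suffix after a dash
    return tuple(int(part) for part in core.split("."))


def has_fix(number):
    """True if the version is at or after the release named as resolving the issue."""
    return parse_version(number) >= FIXED_IN


for version in ("7.0.0", "7.0.1", "7.1.0"):
    print(version, has_fix(version))
```

This prints `False` for 7.0.0 and 7.0.1 and `True` for 7.1.0.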

@jimczi jimczi closed this as completed May 20, 2019
@javanna javanna added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Jul 16, 2024
