/_cat/indices: race condition/exception thrown when creating/deleting indices rapidly #17395

Closed
joshuar opened this issue Mar 30, 2016 · 6 comments · Fixed by #18545
Labels
>bug · :Distributed Coordination/Allocation (All issues relating to the decision making around placing a shard, both master logic & on the nodes) · help wanted · adoptme

Comments

@joshuar
Contributor

joshuar commented Mar 30, 2016

Elasticsearch Versions:
2.1.x - 2.2.x

JVM Version:
OpenJDK 64-Bit Server VM (build 25.77-b03, mixed mode)

Description of the problem including expected versus actual behavior:

It looks like if you create and delete indices very rapidly, you can sometimes get a 404 index_not_found_exception from the /_cat/indices API. Rather than seeing the indices that still exist, you get an exception because an index was created and deleted between the endpoint resolving the index names and then calling the indices stats API for those indices.

Steps to reproduce:

Attached is a script that should hopefully reproduce the problem, though it can take some time (a few minutes, depending on how quickly your ES cluster handles requests :)

cat-indices-issue.sh.zip

The script is Bash. It runs a number of background processes that simply spin, creating and then deleting an index with the REST API, and it waits until the output of _cat/indices produces the exception, then exits. NOTE: it will clean up any background processes it creates but won't clean up any test indices that might still exist.
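
Since the attachment isn't inlined here, a minimal sketch of what the script does (the endpoint, index names, and worker count are illustrative, not the original script):

    #!/usr/bin/env bash
    # Sketch of the attached repro script, not the original.
    ES=http://localhost:9200

    churn() {
      # Spin creating and then deleting the same index via the REST API.
      while true; do
        curl -s -XPUT "$ES/testindex$1" >/dev/null
        curl -s -XDELETE "$ES/testindex$1" >/dev/null
      done
    }

    for i in 1 2 3; do churn "$i" & done
    trap 'kill $(jobs -p) 2>/dev/null' EXIT   # clean up background workers

    # Poll _cat/indices until it returns the exception instead of a table.
    while true; do
      out=$(curl -s "$ES/_cat/indices")
      if grep -q index_not_found_exception <<<"$out"; then
        echo "$out"
        break
      fi
    done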

When the script dies, it will produce output like:

{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"testindex3","index":"testindex3"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"testindex3","index":"testindex3"},"status":404}
...output of the script cleaning up after itself...
@kwilczynski

@joshuar thank you for logging this! (via @faeldt)

To add to the above, sometimes a "type": "null_pointer_exception" would show up when querying /_cat/indices or /_cat/shards, etc. (sorry, I don't have the full JSON output at hand).

Since this is machine-parseable output, it would be delightful for it to suppress any errors and just not return JSON (or any errors for that matter, unless you add ?e=true or something).

@clintongormley
Contributor

@kwilczynski if you happen to come across this NPE again, please paste the stack trace from the logs into this issue

@kwilczynski

@clintongormley I will try to fish something out on the Elasticsearch side. Since I fixed this on my side (by retrying on error) and I don't log at debug level in production, I've lost it for now, sadly.
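
(A client-side retry along those lines might look like the sketch below; the function name and endpoint are illustrative assumptions, not the actual code referenced above.)

    # Assumed shape of the retry-on-error workaround: retry _cat/indices
    # a few times if it returns a JSON error payload instead of a table.
    cat_indices_with_retry() {
      local es=${1:-http://localhost:9200} out attempt
      for attempt in 1 2 3 4 5; do
        out=$(curl -s "$es/_cat/indices")
        # Tabular _cat output has no "error" key; the failure mode does.
        if ! grep -q '"error"' <<<"$out"; then
          printf '%s\n' "$out"
          return 0
        fi
        sleep 1
      done
      return 1
    }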

@kwilczynski

@clintongormley I found the following NPE going through the older logs:

RemoteTransportException[[prd11-c-tky-master-search-catalogpf][10.184.20.209:9300][cluster:monitor/stats[n]]]; nested: NullPointerException;
Caused by: java.lang.NullPointerException
     at org.elasticsearch.action.admin.cluster.health.ClusterIndexHealth.<init>(ClusterIndexHealth.java:73)
     at org.elasticsearch.action.admin.cluster.stats.TransportClusterStatsAction.nodeOperation(TransportClusterStatsAction.java:127)
     at org.elasticsearch.action.admin.cluster.stats.TransportClusterStatsAction.nodeOperation(TransportClusterStatsAction.java:58)
     at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:211)
     at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:207)
     at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:350)
     at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
     at java.lang.Thread.run(Thread.java:745)

@clintongormley
Contributor

thanks @kwilczynski

@djschny
Contributor

djschny commented May 9, 2016

Note I am also receiving a similar error with _cat/indices on ES 2.3.2. It happens when I'm running a node with multiple path.data entries and remove one of the drives backing path.data:

{
   "error": {
      "root_cause": [
         {
            "type": "null_pointer_exception",
            "reason": null
         }
      ],
      "type": "null_pointer_exception",
      "reason": null
   },
   "status": 500
}

With the following stack trace from the node:

[2016-05-09 11:40:24,520][WARN ][rest.suppressed          ] /_cat/indices Params: {}
java.lang.NullPointerException
    at org.elasticsearch.rest.action.cat.RestIndicesAction.buildTable(RestIndicesAction.java:331)
    at org.elasticsearch.rest.action.cat.RestIndicesAction.access$100(RestIndicesAction.java:52)
    at org.elasticsearch.rest.action.cat.RestIndicesAction$1$1$1.buildResponse(RestIndicesAction.java:97)
    at org.elasticsearch.rest.action.cat.RestIndicesAction$1$1$1.buildResponse(RestIndicesAction.java:94)
    at org.elasticsearch.rest.action.support.RestResponseListener.processResponse(RestResponseListener.java:43)
    at org.elasticsearch.rest.action.support.RestActionListener.onResponse(RestActionListener.java:49)
    at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:89)
    at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:85)
    at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction.onCompletion(TransportBroadcastByNodeAction.java:378)
    at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction.onNodeResponse(TransportBroadcastByNodeAction.java:347)
    at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction$1.handleResponse(TransportBroadcastByNodeAction.java:319)
    at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction$1.handleResponse(TransportBroadcastByNodeAction.java:311)
    at org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:819)
    at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:803)
    at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:793)
    at org.elasticsearch.transport.DelegatingTransportChannel.sendResponse(DelegatingTransportChannel.java:58)
    at org.elasticsearch.transport.RequestHandlerRegistry$TransportChannelWrapper.sendResponse(RequestHandlerRegistry.java:134)
    at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:412)
    at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:386)
    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33)
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
    at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
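
For context, on ES 2.x a node with multiple data paths can be started along these lines (paths are illustrative); the NPE above surfaced after one of those backing drives was removed:

    # ES 2.x passes settings as system properties; paths are illustrative.
    bin/elasticsearch -Des.path.data=/mnt/disk1,/mnt/disk2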

ywelsch added a commit to ywelsch/elasticsearch that referenced this issue May 24, 2016
ywelsch added a commit that referenced this issue May 25, 2016
Closed indices are already displayed when no indices are explicitly selected. This commit ensures that closed indices are also shown when wildcard filtering is used.

It also addresses another issue, caused by the fact that the cat action is internally based on three different cluster states (one when we query the cluster state to get all indices, one when we query cluster health, and one when we query indices stats). We currently fail the cat request when the user specifies as a parameter a concrete index that does not exist. The implementation works as intended in that regard: it checks this not only for the first cluster state request but also for the subsequent indices stats request, so if the index is deleted before the cat action has queried the indices stats, it rightfully fails.

When the user provides wildcards (or no parameter at all), however, we pass the resolved concrete indices to the indices stats request and fail to distinguish whether these indices were resolved from wildcards or explicitly requested by the user. This means that if an index is deleted before the indices stats request executes, we fail the overall cat request. The fix is to let the indices stats request do the resolving again instead of passing it the concrete indices.

Closes #16419
Closes #17395
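
The user-visible effect of the fix can be sketched with curl (host and index names illustrative): wildcard or unqualified requests no longer fail when a matched index disappears mid-request, while an explicitly named missing index still returns 404.

    # After the fix, the indices stats step re-resolves the wildcard itself,
    # so a concurrently deleted index is skipped rather than failing the call.
    curl -s 'http://localhost:9200/_cat/indices/testindex*'

    # An explicitly named index that does not exist still fails, by design:
    curl -s 'http://localhost:9200/_cat/indices/testindex3'   # 404 index_not_found_exception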
ywelsch added a commit that referenced this issue May 25, 2016
ywelsch added a commit that referenced this issue May 25, 2016
@lcawl lcawl added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. and removed :Allocation labels Feb 13, 2018
@clintongormley clintongormley added :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) and removed :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. labels Feb 14, 2018