java.lang.NullPointerException in Tribe node #20298

Closed
zcola opened this issue Sep 2, 2016 · 7 comments

zcola commented Sep 2, 2016

Elasticsearch version: 2.3.4

Plugins installed: [ 'analysis-ik', 'kopf', 'license', 'marvel-agent' ]

JVM version: Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)

OS version: Debian 8

Description of the problem including expected versus actual behavior:

elasticsearch.yml
network.host: 10.160.98.78
node.name: "${HOSTNAME}_${NODE_ZONE}"
tribe:
    e100:
        cluster.name: logstash-es
        discovery.zen.ping.timeout: 100s
        network.host: 10.160.98.78
        discovery.zen.ping.multicast.enabled: false
        discovery.zen.ping.unicast.hosts: ["10.120.69.96", "10.120.69.97", ...]
    e101:
        cluster.name: es-new
        discovery.zen.ping.timeout: 100s
        network.host: 10.160.98.78
        discovery.zen.ping.multicast.enabled: false
        discovery.zen.ping.unicast.hosts: [ "10.63.72.10", "10.63.72.11", ... ]
    e102:
        cluster.name: es-102
        discovery.zen.ping.timeout: 100s
        network.host: 10.160.98.78
        discovery.zen.ping.multicast.enabled: false
        discovery.zen.ping.unicast.hosts: [ "10.63.72.12" ]
    blocks.indices.write: e100
    on_conflict: prefer_e100
script.engine.groovy.inline.search: true
script.engine.groovy.inline.aggs: true
path.plugins: "/home/elk/running/elasticsearch/plugins_tribe"

Steps to reproduce:

  • When the nightly scheduled job runs to clean up indices, the tribe node may stop working properly. (We have 2 tribe nodes: 10.63.19.36 seems to work fine, while 10.160.98.78 returns an error.)
curl 10.160.98.78:9200/_cat/indices
{"error":{"root_cause":[{"type":"null_pointer_exception","reason":null}],"type":"null_pointer_exception","reason":null},"status":500}
curl http://10.63.19.36:9200/_cat/indices
green open g15_zzz-2016.06.30                           1 0         3     0   6.8kb   6.8kb
green open g18_tmp_nat_iptables-fileupload              1 0      1172     0   283kb   283kb
green open xy2freeclient-2016.09                        1 1     70172     0  13.7mb   6.8mb
green open xy2freeclient-2016.08                        1 1   1434906     0 250.1mb 125.1mb
green open appdown_accesslog-2016.08                    1 1 107015373     0  34.6gb  17.3gb
  • We use curator to close or delete indices
    def delete(self, indices):
        """
        :param indices: list of indices to delete
        :return:
        """
        w = indices if isinstance(indices, list) else [indices]
        return curator.delete_indices(self.client, w)

    def close(self, indices):
        """

        :param indices: list of indices to close
        :return:
        """
        w = indices if isinstance(indices, list) else [indices]
        return curator.close_indices(self.client, w)
2016-09-01 03:30:24,773 - __main__ - INFO - ---------------------- voicelog start ----------------------
GET / {} None
GET / {} None
GET /_cat/indices/voicelog-2016.08.26 {'h': 'status', u'format': 'json'} None
GET / {} None
POST /voicelog-2016.08.26/_flush/synced {} None
POST /voicelog-2016.08.26/_close {'ignore_unavailable': 'true'} None
2016-09-01 03:30:26,722 - __main__ - INFO - closed index: voicelog-2016.08.26 ret:True
2016-09-01 03:30:26,722 - __main__ - INFO - closing index older than 5 days: voicelog-2016.08.26 days expired: 6
  • After restarting the tribe node, 10.160.98.78 went back to normal

Provide logs (if relevant):

[2016-09-01 03:30:26,331][INFO ][tribe                    ] [elk-edata04-101_tribe] [e100] removing index [voicelog-2016.08.26]
[2016-09-01 03:30:26,332][WARN ][tribe                    ] [elk-edata04-101_tribe] failed to process [cluster event from e100, zen-disco-receive(from master [{elk-edata05-100}{Kf1SqrhFR9ywP_fMAB_Jdw}{10.120.69.109}{10.120.69.109:9300}{master=true}])]
java.lang.NullPointerException
[2016-09-01 03:30:26,741][WARN ][cluster.service          ] [elk-edata04-101_tribe/e100] failed to notify ClusterStateListener
java.lang.ClassCastException: org.elasticsearch.license.plugin.core.LicensesMetaData cannot be cast to org.elasticsearch.license.plugin.core.LicensesMetaData
    at org.elasticsearch.license.plugin.core.LicensesService.clusterChanged(LicensesService.java:466)
    at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[2016-09-01 03:30:26,748][INFO ][tribe                    ] [elk-edata04-101_tribe] [e100] removing index [voicelog-2016.08.26]
[2016-09-01 03:30:26,748][WARN ][tribe                    ] [elk-edata04-101_tribe] failed to process [cluster event from e100, zen-disco-receive(from master [{elk-edata05-100}{Kf1SqrhFR9ywP_fMAB_Jdw}{10.120.69.109}{10.120.69.109:9300}{master=true}])]
java.lang.NullPointerException
[2016-09-01 03:30:31,685][WARN ][cluster.service          ] [elk-edata04-101_tribe/e100] failed to notify ClusterStateListener
java.lang.ClassCastException: org.elasticsearch.license.plugin.core.LicensesMetaData cannot be cast to org.elasticsearch.license.plugin.core.LicensesMetaData
    at org.elasticsearch.license.plugin.core.LicensesService.clusterChanged(LicensesService.java:466)
    at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[2016-09-02 18:28:18,359][WARN ][rest.suppressed          ] path: /_cat/indices, params: {}
java.lang.NullPointerException
    at org.elasticsearch.rest.action.cat.RestIndicesAction.buildTable(RestIndicesAction.java:345)
    at org.elasticsearch.rest.action.cat.RestIndicesAction.access$100(RestIndicesAction.java:52)
    at org.elasticsearch.rest.action.cat.RestIndicesAction$1$1$1.buildResponse(RestIndicesAction.java:111)
    at org.elasticsearch.rest.action.cat.RestIndicesAction$1$1$1.buildResponse(RestIndicesAction.java:108)
    at org.elasticsearch.rest.action.support.RestResponseListener.processResponse(RestResponseListener.java:43)
    at org.elasticsearch.rest.action.support.RestActionListener.onResponse(RestActionListener.java:49)
    at org.elasticsearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:89)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
@clintongormley
Contributor

@zcola Are you sure you're using 2.3.4 on both the tribe nodes and the clusters? I ask because this should have been fixed by #18545.

@zcola
Author

zcola commented Sep 3, 2016

@clintongormley Yes, all nodes are running version 2.3.4. On the tribe node, it is not only _cat/indices requests that return 500; requests for some indices also return 404. We have been upgrading from 2.3.2 to 2.3.4, and this problem appears occasionally (with low probability). We do not know whether the two are related.

@clintongormley
Contributor

@areek could you take a look at this please?

@areek
Contributor

areek commented Sep 13, 2016

@zcola I assume you are running _cat/indices against the tribe node and getting the NullPointerException transiently after removing/closing indices in some underlying cluster.

The tribe node keeps track of all the indices of the underlying clusters. When you remove or close indices, the cluster state of the underlying cluster is updated first, and the tribe node updates its own state afterwards. If a _cat/indices request hits the tribe node after an underlying cluster has closed or removed its indices but before the tribe node has updated its state, you can run into an NPE, because the closed/removed indices do not have any available shards to report stats from. I will work on a PR to fix this, but in the meantime can you report:

  • if this error is transient
  • whether _cat/indices succeeds when you point it directly at any underlying cluster
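
To make the timing window described above concrete, here is a toy Python model. It is not Elasticsearch code: `build_table`, `stale_indices`, and the dictionary layout are hypothetical names chosen for illustration. The index names and document count are taken from the report. The model assumes the tribe node still lists an index for which the underlying cluster no longer returns stats.

```python
# Toy model only: this is not how Elasticsearch is implemented.

def build_table(listed_indices, stats_by_index):
    """Build (index, doc count) rows, assuming every listed index has stats."""
    rows = []
    for name in listed_indices:
        stats = stats_by_index.get(name)  # None for a closed/removed index
        rows.append((name, stats["docs"]))  # raises TypeError when stats is None
    return rows


def stale_indices(listed_indices, stats_by_index):
    """Return indices that are still listed but have no stats to report."""
    return [name for name in listed_indices if name not in stats_by_index]


# The tribe node's view has not caught up with the close of voicelog-2016.08.26.
listed = ["voicelog-2016.08.26", "xy2freeclient-2016.09"]
stats = {"xy2freeclient-2016.09": {"docs": 70172}}

try:
    build_table(listed, stats)
except TypeError as exc:
    # Python's analogue of the NullPointerException in the report.
    print("table build failed:", exc)

print("stale:", stale_indices(listed, stats))
```

In this model the failure goes away once the listed indices and the stats agree again. That would match a transient error, but it would not explain an error that persists until the node is restarted.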

areek added a commit to areek/elasticsearch that referenced this issue Sep 19, 2016
Currently, when an index exists in the cluster state but has no shards for reporting stats,
the missing stats object causes a `NullPointerException` when requesting the indices stats.
In this commit, missing stats objects for an index are initialized as empty stats instead
of null, honoring the stats flags set in the stats request. The commit fixes the issue for all
APIs that use the indices stats API, namely `_cat/indices`, `_cat/shards` and `_stats`.

closes elastic#20298
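
The approach described in the commit message can be illustrated with a small Python sketch. This is not the actual Java change: `empty_stats`, `index_stats`, and the flag names `docs` and `store` are assumptions made for the example. The idea is that an index without shard stats gets a zero-valued stats object containing only the requested sections, rather than a null.

```python
# Illustrative sketch only; the real fix lives in Elasticsearch's Java code.

def empty_stats(flags):
    """Zero-valued stats holding only the sections named in the request flags."""
    return {flag: 0 for flag in flags}


def index_stats(cluster_state_indices, shard_stats, flags):
    """Return stats for every index in the cluster state.

    Indices with no shard-level stats get empty stats instead of None,
    so callers never have to handle a missing object.
    """
    result = {}
    for name in cluster_state_indices:
        found = shard_stats.get(name)
        result[name] = found if found is not None else empty_stats(flags)
    return result


stats = index_stats(
    ["voicelog-2016.08.26", "xy2freeclient-2016.09"],
    {"xy2freeclient-2016.09": {"docs": 70172, "store": 0}},
    flags=["docs", "store"],
)
print(stats["voicelog-2016.08.26"])
```

With this shape, code that builds a table from the result can read every requested field without a null check.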
@zcola
Author

zcola commented Aug 11, 2017

This still occurs in 2.4.4. We use _cat/indices for node health monitoring.
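
A probe of the kind mentioned here might look like the following minimal sketch, written against the Python standard library. The function names `cat_indices_status` and `is_healthy` and the `opener` parameter are hypothetical, and this is not the monitoring code used in the report. It only shows how an HTTP 500 from _cat/indices would surface as an unhealthy result.

```python
import urllib.error
import urllib.request


def cat_indices_status(base_url, timeout=5.0, opener=urllib.request.urlopen):
    """Return the HTTP status of GET <base_url>/_cat/indices, or None if unreachable."""
    url = base_url.rstrip("/") + "/_cat/indices"
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; keep the status code.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return None


def is_healthy(status):
    """Treat only a 200 response as healthy."""
    return status == 200


if __name__ == "__main__":
    # Tribe node address taken from the report.
    status = cat_indices_status("http://10.160.98.78:9200")
    print("status:", status, "healthy:", is_healthy(status))
```

Under this check, the 500 response shown in the original report marks the tribe node as unhealthy, even if other endpoints on the node still respond.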

@bleskes bleskes added the help wanted adoptme label Aug 22, 2017
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Sep 28, 2017
Currently, when an index exists in the cluster state but has no shards for reporting stats,
the missing stats object causes a `NullPointerException` when requesting the indices stats.
In this commit, missing stats objects for an index are initialized as empty stats instead
of null, honoring the stats flags set in the stats request. The commit fixes the issue for all
APIs that use the indices stats API, namely `_cat/indices`, `_cat/shards` and `_stats`.

closes elastic#20298
@rjernst
Member

rjernst commented Oct 10, 2017

Note that a fix for this issue existed in #20464, but needs to be revived to be mergeable.

@rjernst
Member

rjernst commented Apr 17, 2018

Tribe node was removed in #28443, so this issue is no longer relevant.

@rjernst rjernst closed this as completed Apr 17, 2018