Unable to change elasticsearchClusterName #416
Hi @nberthet, thanks for getting in touch. Your problem has nothing to do with the CLI parameters; what you have put is correct. The problem is that the framework is unable to register with the master. This is usually caused by not being able to connect to ZooKeeper or to the master. There is an open issue to sanity-check the framework state before this method is called, to prevent this stack trace. But the real issue is the inability to register as a framework. Check IP addresses, hostnames, hostname resolution, DNS, firewalls, auth, etc. Thanks, Phil
Hi @philwinder, thanks for the quick heads-up. I actually launch the framework from Marathon, and the task starts "properly". After that, I can see the framework registered in Mesos; that is where I accessed the framework web UI from. Also, according to netstat, I can see established connections to both the Mesos master and ZooKeeper.
Anything else I can look at?
Because you started it from Marathon, it will always keep the scheduler alive. Just because the scheduler is "on" doesn't mean it is working. All I can say is that the scheduler can't register the Elasticsearch framework for some reason. As to why, there are a number of possible reasons, and each user will have different problems. It is usually hostname resolution that causes the problem. Ah, I see that you are running the scheduler in "BRIDGE" mode. Are you sure your master can route to your scheduler? Have you got a DNS/service discovery mechanism set up so that the master can obtain the correct IP address of the scheduler? If not, and I suspect not, then try running the scheduler in HOST mode, so that the scheduler takes the IP address of the slave it is running on. Thanks, Phil
Hi @philwinder, I can confirm that running in HOST mode works. I'm just trying to understand why, because the port mapping was correct and all the config uses IPs rather than hostnames. What connectivity (i.e. which ports) is required between the framework and the Mesos master, and what is being advertised?
OK, great. So the scheduler needs to communicate with the master. The scheduler requests to register the framework, then the master acknowledges this. This is done asynchronously, so the scheduler needs to be routable from the master. For that, the scheduler must advertise itself on the correct hostname/IP address, which is normally taken from the mesos-agent hostname. Unfortunately, that is the hostname of the agent machine, not of the scheduler's container. So, in short, you would need a proxy running on the agent to forward the traffic to the scheduler, although I haven't tested this. I will add a task to update the docs to recommend HOST mode, as BRIDGE mode is probably more trouble than it's worth for the scheduler. Thanks.
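As a rough pre-flight check, the Python sketch below flags scheduler addresses that a master on another host probably cannot connect back to: loopback addresses and addresses on Docker's default bridge subnet. The 172.17.0.0/16 subnet is an assumption (Docker's usual default for `docker0`), and `routability_warning` is a hypothetical helper, not part of Mesos or the Elasticsearch framework.

```python
import ipaddress
from typing import Optional

# Assumption: Docker's default bridge (docker0) subnet. Adjust this if the
# Docker daemon is configured with a different bridge network.
DOCKER_DEFAULT_BRIDGE = ipaddress.ip_network("172.17.0.0/16")


def routability_warning(advertised_ip: str) -> Optional[str]:
    """Return a warning if a master on another host likely cannot reach this address."""
    addr = ipaddress.ip_address(advertised_ip)
    if addr.is_loopback:
        # 127.0.0.0/8 (IPv4) or ::1 (IPv6): only the same machine can connect.
        return "loopback address: only reachable from the same host"
    if addr.version == 4 and addr in DOCKER_DEFAULT_BRIDGE:
        # Container-internal bridge IPs are normally not routable from other hosts.
        return "Docker bridge address: usually not routable from other hosts"
    return None


if __name__ == "__main__":
    for ip in ["127.0.1.1", "172.17.0.5", "10.0.0.12"]:
        print(ip, "->", routability_warning(ip) or "no obvious problem")
```

A `None` result only means the address is not obviously wrong; firewalls and subnet routing still have to be checked separately.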
Hi @philwinder, thanks for your help. I'll give the bridged configuration another shot whenever I have some time to experiment. At least for the time being, everything is peachy.
I am running in HOST mode and getting the same error. The log clearly says at the beginning that it successfully connected to ZooKeeper. Mesos version: 0.25.0-0.2.70.ubuntu1404 (I tried 0.26 first, then downgraded to the version you tested on). es.json:
Here is the full log from start of the container till the error:
Any ideas, @philwinder?
I was looking at #411. Could it be because I don't use hostnames at all? My config is IP-based only (see above).
Hi @zoza1982, in the upcoming release, which should land within the next couple of days, I have added some sanity checks to ensure the framework is registered. This will remove this obscure ZooKeeper error and replace it with a simpler "the framework cannot register with the master" type of message. I can't see from your logs what your specific problem is. I would recommend waiting a couple of days and trying the new 0.7.0 version. If that doesn't work, then we can look again. Unfortunately, the framework registration code is an async callback from Mesos, so it's hard to know why it hasn't registered. The only thing you can do is trawl through the master logs and try to find an error message that relates to the ES framework.
Thank you, @philwinder. I will wait for the 0.7 release and then come back here.
To cut a long story short, if you are using hostnames, then the hostnames need to be resolvable. Otherwise Mesos and ES won't know which address a hostname refers to. In 0.7 we've added the option to override the hostnames with IP addresses, like Mesos can. This might help your situation. Either way, I would recommend always thinking about your network, e.g. "is that routable, is that on a different subnet, is that hostname resolvable", etc. It is the hardest thing to get right in clusters, especially when using Docker.
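The "is that hostname resolvable" question can be checked from the agent itself. This is a minimal Python sketch, not a tool shipped with Mesos or the framework; `resolved_addresses` and `check_hostname` are hypothetical names. It resolves a name with the local resolver (which consults `/etc/hosts` and DNS) and warns when the name is unresolvable or resolves only to loopback addresses that a remote master cannot reach.

```python
import ipaddress
import socket
from typing import List


def resolved_addresses(hostname: str) -> List[str]:
    """Return the distinct IPv4 addresses the local resolver gives for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET)
    except socket.gaierror:
        return []  # the name does not resolve at all
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def check_hostname(hostname: str) -> str:
    """Summarize whether hostname resolves to something other than loopback."""
    addrs = resolved_addresses(hostname)
    if not addrs:
        return f"{hostname}: NOT resolvable"
    if all(ipaddress.ip_address(a).is_loopback for a in addrs):
        return f"{hostname}: resolves only to loopback {addrs}"
    return f"{hostname}: OK {addrs}"


if __name__ == "__main__":
    # Check this machine's own hostname; a loopback result is a red flag
    # on a Mesos agent that runs a scheduler.
    print(check_hostname(socket.gethostname()))
```

Running it on each master and agent against every hostname in the config is a cheap way to catch resolution problems before digging through master logs.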
Here is my case:
I shrank the cluster to 1 Mesos master and 1 slave to narrow down the troubleshooting, and defined the hostnames manually in /etc/hosts on both. What names do you suggest I use? mesos.master? zookeepers?
I fixed it! Here is the issue. Digging deep into the master logs, I found something weird. This keeps looping:
Notice that, from the master's perspective, the Elasticsearch framework is at @127.0.1.1:53203. What? That can't be right; it should be the slave's IP address. So I checked /etc/hosts on the slave and found a strange line that was there by default (Ubuntu):
So I corrected it to
And now I see this in the Mesos master logs :-)
And everything else started working. 👍
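The root cause here was the Debian/Ubuntu default `/etc/hosts` entry that maps the machine's own hostname to 127.0.1.1, so the scheduler advertised a loopback address to the master. The Python sketch below scans hosts-file text for that pattern; `loopback_mappings` is a hypothetical helper and `mesos-slave-1` is an invented sample hostname.

```python
import ipaddress
from typing import List

# Invented sample resembling a default Ubuntu /etc/hosts file.
SAMPLE_HOSTS = """\
127.0.0.1 localhost
127.0.1.1 mesos-slave-1
::1 localhost ip6-localhost ip6-loopback
"""


def loopback_mappings(hosts_text: str, hostname: str) -> List[str]:
    """Return the loopback addresses that hosts_text maps hostname to."""
    found = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        ip, names = fields[0], fields[1:]
        try:
            is_loopback = ipaddress.ip_address(ip).is_loopback
        except ValueError:
            continue  # first field is not an IP address
        if hostname in names and is_loopback:
            found.append(ip)
    return found


if __name__ == "__main__":
    print(loopback_mappings(SAMPLE_HOSTS, "mesos-slave-1"))
```

On a real agent you would pass the contents of `/etc/hosts` and `socket.gethostname()`. A non-empty result means the hostname points at loopback and should instead map to an address the master can reach.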
Hi,
I haven't tried different combinations yet, but I was trying to deploy the framework using the Marathon JSON example.
I tried setting both frameworkName and elasticsearchClusterName. The framework started properly, but it was unable to start the Elasticsearch nodes. It seems a ZooKeeper path is being computed incorrectly (an additional /); please see the stack trace I included.
EDIT: After playing around with the options, I can confirm it's only about elasticsearchClusterName; changing frameworkName is fine.
EDIT: After further verification, the problem seems to happen anyway, whether or not I specify the cluster name or framework name.
My Marathon JSON:
Stack trace found in the framework logs: