Sync is failing when no agents are stored in cache #191

Closed
tomez opened this issue Feb 28, 2017 · 11 comments

@tomez
Collaborator

tomez commented Feb 28, 2017

In rare cases, when marathon-consul starts a sync there may be no agents available in the cache, for example just after start-up.

This results in the following error:

{
  "error" : "Can't get Consul services: No Consul client available in agents cache",
  "level" : "error",
  "msg" : "An error occured while performing sync",
  "time" : "2017-02-28T10:44:15+01:00"
}

The sync is then dropped.

This happens because the agent cache is populated only on registrations and de-registrations, whereas sync begins by requesting the list of applications, and to do that the agent cache must not be empty.
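The failure mode described above can be modelled in a few lines. This is only a sketch: `AgentsCache`, `Touch`, `Any` and `Sync` are hypothetical names chosen for illustration, not marathon-consul's actual types; only the two error strings are taken from the log above.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoAgents mirrors the message in the log above. All type and function
// names in this sketch are hypothetical, not marathon-consul's own.
var ErrNoAgents = errors.New("No Consul client available in agents cache")

// AgentsCache is a toy stand-in for the agents cache: it only learns about
// an agent when a task on that host is registered or de-registered.
type AgentsCache struct {
	hosts map[string]struct{}
}

// NewAgentsCache returns an empty cache, as it would be right after start-up.
func NewAgentsCache() *AgentsCache {
	return &AgentsCache{hosts: make(map[string]struct{})}
}

// Touch records an agent host, as a register/de-register call would.
func (c *AgentsCache) Touch(host string) { c.hosts[host] = struct{}{} }

// Any returns some cached agent host, or ErrNoAgents if the cache is empty.
func (c *AgentsCache) Any() (string, error) {
	for h := range c.hosts {
		return h, nil
	}
	return "", ErrNoAgents
}

// Sync needs an agent before it can list anything, so it fails on an
// empty cache.
func Sync(c *AgentsCache) error {
	if _, err := c.Any(); err != nil {
		return fmt.Errorf("Can't get Consul services: %w", err)
	}
	return nil
}

func main() {
	cache := NewAgentsCache()
	fmt.Println(Sync(cache)) // fails: nothing has been registered yet
	cache.Touch("agent-1.example")
	fmt.Println(Sync(cache)) // succeeds once one agent is cached
}
```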

Summarizing: no register/de-register before sync -> marathon-consul will fail on sync.

To resolve this, we could add a consul-master setting to the configuration as a fallback agent and use it when the cache is empty (nil).

@janisz
Contributor

janisz commented Mar 3, 2017

Summarizing: no register/de-register before sync -> marathon-consul will fail on sync.

That's not true. Sync is the first thing marathon-consul does after start. If there is no marathon-consul task in Marathon, the agents cache will not be populated and this error will occur.

We can solve it by adding a --consul-agent-location parameter so we always have at least one agent in the cache.

@xueyi28

xueyi28 commented Mar 31, 2017

How do I add the --consul-agent-location parameter?

@janisz
Contributor

janisz commented Mar 31, 2017

It needs to be added to consul.Config, the flag needs to be handled in config.parseFlags, and finally this agent needs to be added to the agents map in agents.NewAgents(config *Config).
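The three steps above could look roughly like this. It is a minimal, self-contained sketch and not the actual patch: the `Config` and `Agents` types, the `AgentLocation` field and the `Len` helper are assumptions made for illustration, and only the flag name and the function names `parseFlags` and `NewAgents` come from the discussion.

```go
package main

import (
	"flag"
	"fmt"
)

// Config is a hypothetical stand-in for consul.Config.
type Config struct {
	// AgentLocation is the host of a Consul agent that is always usable.
	AgentLocation string
}

// parseFlags is a hypothetical stand-in for config.parseFlags.
func parseFlags(args []string) (*Config, error) {
	cfg := &Config{}
	fs := flag.NewFlagSet("marathon-consul", flag.ContinueOnError)
	fs.StringVar(&cfg.AgentLocation, "consul-agent-location", "",
		"Consul agent host used to seed the agents cache")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return cfg, nil
}

// Agents is a toy agents cache keyed by agent host.
type Agents struct {
	hosts map[string]struct{}
}

// NewAgents seeds the cache with the configured agent, if one was given,
// so the cache is not empty when the first sync runs.
func NewAgents(config *Config) *Agents {
	a := &Agents{hosts: make(map[string]struct{})}
	if config.AgentLocation != "" {
		a.hosts[config.AgentLocation] = struct{}{}
	}
	return a
}

// Len reports how many agents are cached.
func (a *Agents) Len() int { return len(a.hosts) }

func main() {
	cfg, err := parseFlags([]string{"--consul-agent-location=localhost"})
	if err != nil {
		panic(err)
	}
	fmt.Println(NewAgents(cfg).Len()) // prints 1
}
```

Leaving the flag empty keeps the old behaviour in this sketch, so existing deployments would not be affected.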

@hokiegeek2

hokiegeek2 commented Apr 5, 2017

Hey, I am getting tripped up by this as well.

I have marathon-consul running as a Marathon task with the following args: --marathon-location=server1:8080 --marathon-leader=server2:8080 --sse-enabled=true

I can see the Marathon events coming into the logs but the corresponding marathon apps are not being registered or de-registered in consul. Here are the error messages I get:

Syncing services started

"An error occured while performing sync" error="Can't get Consul services: No Consul client available in agents cache"

"There was a problem deregistering task" ... error="No Consul client available in agents cache"

Also, marathon-consul is receiving TASK_RUNNING events, but not registering the tasks.

Any ideas?

--John

@janisz
Contributor

janisz commented Apr 6, 2017

@hokiegeek2 marathon-consul will not register a task (or populate the agents cache) if the task does not have a consul label. Can you add more logs?

@hokiegeek2

hokiegeek2 commented Apr 6, 2017

@janisz Thanks so much for responding so quickly! I really appreciate it. This looks like a great tool and I am looking forward to getting it running correctly.

I apologize, I should have been more precise in my question. I have one Marathon task labeled consul=jupyterhub-nginx, which is the task I am looking for, and I am pegging it to a node that has a Mesos slave and a Consul agent. Finally, I have marathon-consul running on all four of my Marathon masters. marathon-consul is deployed in Docker containers that are launched locally (not via Marathon) with the args I specified above, although I did set the log-level to debug as well.

When I start the jupyterhub-nginx in Marathon I get two messages in the marathon-consul logs:

level="info" msg="Got StatusEvent" Id=jupyterhub-nginx******** TaskStatus="TASK_RUNNING"
level="debug" msg="Not handled task status" Id=jupyterhub-nginx******** taskStatus="TASK_RUNNING"

In Consul I see that the node I am launching the Marathon task on is listed as a node with an agent health status of "Agent alive and reachable", but I don't see my task.

So the marathon events are being sent to and acknowledged by marathon-consul, but the corresponding marathon tasks are not being registered with Consul.

--John

@janisz
Contributor

janisz commented Apr 6, 2017

@hokiegeek2 Your task is not registered because marathon-consul registers a task only when it becomes healthy. Did you specify a health check for the task?

The task is registered when Marathon marks it as alive.
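Put together with the label requirement mentioned earlier, the rule can be sketched as a single predicate. The `Task` type and `shouldRegister` function are hypothetical and only illustrate the two conditions named in this thread; they are not marathon-consul's code.

```go
package main

import "fmt"

// Task is a simplified, hypothetical view of a Marathon task.
type Task struct {
	ID      string
	Labels  map[string]string // labels of the app the task belongs to
	Healthy bool              // true once Marathon reports the task as alive
}

// shouldRegister encodes the two conditions from the discussion: the app
// carries a "consul" label and Marathon has marked the task healthy.
func shouldRegister(t Task) bool {
	_, labelled := t.Labels["consul"]
	return labelled && t.Healthy
}

func main() {
	task := Task{
		ID:     "jupyterhub-nginx",
		Labels: map[string]string{"consul": "jupyterhub-nginx"},
	}
	fmt.Println(shouldRegister(task)) // false: TASK_RUNNING alone is not enough
	task.Healthy = true
	fmt.Println(shouldRegister(task)) // true: labelled and healthy
}
```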

@hokiegeek2

@janisz Yay! That was it! Many apologies for missing this in the docs. Thanks!

--John

@janisz
Contributor

janisz commented Apr 6, 2017

No problem. I created an issue for updating the readme: #221

janisz added a commit that referenced this issue Apr 6, 2017
In some cases consul agents cache could be empty.
This change allows providing consul agent hostname to
initially populate the cache.

Fixes: #191

janisz added a commit that referenced this issue Apr 7, 2017
In some cases consul agents cache could be empty.
This change allows providing consul agent hostname to
initially populate the cache.

Fixes: #191

janisz added a commit that referenced this issue Apr 10, 2017
In some cases consul agents cache could be empty.
This change allows providing consul agent hostname to
initially populate the cache.

Fixes: #191

@xueyi28

xueyi28 commented Apr 26, 2017

  1. The Marathon app must be running and must carry the label "consul": "".
  2. The machine that runs the Marathon app (the Docker container) must have a Consul agent installed with port 8500 open, so that marathon-consul can connect to it and call it.
  3. The Marathon app must have a health check configured.
  4. The local host (the one running marathon-consul) must also run a Consul agent, because marathon-consul's application sync depends on it. This is a very powerful feature: sync means that Marathon failures are corrected promptly.

Overall, this is not a bug in marathon-consul's development; it is caused by the documentation not explaining in detail how things work. When I have time I will draw a diagram to make it clear.
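The connectivity requirement (a Consul agent reachable on port 8500) can be checked from the marathon-consul host before starting it. The sketch below assumes the agent's HTTP API is served without TLS and that `/v1/agent/self` is readable without an ACL token; `checkAgent` and `stubAgent` are hypothetical helpers, and the stub exists only so that the example runs without a real cluster.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// checkAgent requests {baseURL}/v1/agent/self from a Consul agent and
// returns an error unless the agent answers with 200 OK.
func checkAgent(baseURL string) error {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(baseURL + "/v1/agent/self")
	if err != nil {
		return fmt.Errorf("agent unreachable: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("agent returned %s", resp.Status)
	}
	return nil
}

// stubAgent starts a local stand-in for a Consul agent so the sketch runs
// without a real cluster; it answers only /v1/agent/self.
func stubAgent() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/agent/self", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "{}")
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := stubAgent()
	defer srv.Close()
	// Against a real agent the base URL would be http://<agent-host>:8500.
	fmt.Println(checkAgent(srv.URL))
}
```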

@janisz
Contributor

janisz commented Apr 26, 2017

@xueyi28 I can't speak Chinese. Could you please translate?
