This isn't a big problem for me because my retry logic quickly recovers from it, but I figured I'd file an issue. I have a service that comes up, registers itself with a health check, and then tries to acquire a lock, creating a session tied to both the serfHealth check and its own service check. I've noticed that the first attempt(s) to create the session sometimes fail with a "missing check registration" error. Judging from the logs, the error seems to happen when the session creation lands before the agent's service/health sync; after the agent sync, everything works smoothly.
Sample logs:
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
Node name: 'localhost.localdomain'
Datacenter: 'dc1'
Server: true (bootstrap: true)
Client Addr: 127.0.0.1 (HTTP: 8500, DNS: 8600, RPC: 8400)
Cluster Addr: 10.0.2.15 (LAN: 8301, WAN: 8302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
==> Log data will now stream in as it occurs:
2014/06/25 22:03:15 [INFO] serf: EventMemberJoin: localhost.localdomain 10.0.2.15
2014/06/25 22:03:15 [INFO] serf: EventMemberJoin: localhost.localdomain.dc1 10.0.2.15
2014/06/25 22:03:15 [INFO] raft: Node at 10.0.2.15:8300 [Follower] entering Follower state
2014/06/25 22:03:15 [INFO] consul: adding server localhost.localdomain (Addr: 10.0.2.15:8300) (DC: dc1)
2014/06/25 22:03:15 [INFO] consul: adding server localhost.localdomain.dc1 (Addr: 10.0.2.15:8300) (DC: dc1)
2014/06/25 22:03:15 [DEBUG] dns: request for {_test.consul. 255 1} (94.705us)
2014/06/25 22:03:15 [DEBUG] dns: request for {_test.consul. 255 1} (141.358us)
2014/06/25 22:03:15 [ERR] agent: failed to sync remote state: No cluster leader
2014/06/25 22:03:17 [WARN] raft: Heartbeat timeout reached, starting election
2014/06/25 22:03:17 [INFO] raft: Node at 10.0.2.15:8300 [Candidate] entering Candidate state
2014/06/25 22:03:17 [DEBUG] raft: Votes needed: 1
2014/06/25 22:03:17 [DEBUG] raft: Vote granted. Tally: 1
2014/06/25 22:03:17 [INFO] raft: Election won. Tally: 1
2014/06/25 22:03:17 [INFO] raft: Node at 10.0.2.15:8300 [Leader] entering Leader state
2014/06/25 22:03:17 [INFO] consul: cluster leadership acquired
2014/06/25 22:03:17 [INFO] consul: New leader elected: localhost.localdomain
2014/06/25 22:03:17 [INFO] consul: member 'localhost.localdomain' joined, marking health alive
2014/06/25 22:03:18 [DEBUG] http: Request /v1/agent/self (237.181us)
2014/06/25 22:03:18 [DEBUG] http: Request /v1/agent/service/register (127.111us)
2014/06/25 22:03:18 [DEBUG] Check 'service:redacted-1' status is now passing
2014/06/25 22:03:18 [DEBUG] http: Request /v1/agent/check/pass/service:redacted-1 (108.952us)
2014/06/25 22:03:18 [DEBUG] http: Request /v1/catalog/service/redacted (202.819us)
2014/06/25 22:03:18 [ERR] http: Request /v1/session/create, error: Missing check 'service:redacted-1' registration
2014/06/25 22:03:18 [DEBUG] http: Request /v1/session/create (1.520863ms)
2014/06/25 22:03:18 [DEBUG] http: Request /v1/catalog/service/redacted?index=3 (32.946089ms)
2014/06/25 22:03:18 [INFO] agent: Synced service 'redacted-1'
2014/06/25 22:03:18 [DEBUG] http: Request /v1/catalog/service/redacted?index=5 (1.052369ms)
2014/06/25 22:03:18 [INFO] agent: Synced check 'service:redacted-1'
2014/06/25 22:03:19 [DEBUG] http: Request /v1/session/create (2.277945ms)
2014/06/25 22:03:19 [DEBUG] http: Request /v1/kv/service/redacted/leader?acquire=433db3cc-d442-6ef9-536f-08dc1882f148 (1.204903ms)
2014/06/25 22:03:19 [DEBUG] http: Request /v1/kv/service/redacted/leader (103.923us)
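For reference, the retry-around-session-create pattern described above can be sketched roughly like this. It is a minimal illustration using only the HTTP endpoints visible in the logs (`/v1/session/create`); the agent address and the exact session payload fields are assumptions, and the check name `service:redacted-1` is taken from the log output:

```python
import json
import time
import urllib.request

CONSUL = "http://127.0.0.1:8500"  # assumed local agent address, as in the logs


def retry(fn, attempts=5, delay=0.5):
    """Call fn until it succeeds, with exponential backoff between attempts."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(delay * (2 ** i))


def create_session():
    """PUT /v1/session/create, tying the session to serfHealth plus the
    service's own check. Until the agent has synced the check to the
    catalog, this returns the 'Missing check ... registration' error."""
    body = json.dumps({"Checks": ["serfHealth", "service:redacted-1"]}).encode()
    req = urllib.request.Request(
        CONSUL + "/v1/session/create", data=body, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["ID"]


# session_id = retry(create_session)
```

Wrapping `create_session` in `retry` rides out the window between check registration and the anti-entropy sync.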
Thanks for reporting this, I've run into it myself. I think the issue is that the /v1/agent endpoints simply update the local state and kick off the anti-entropy run. Since we respond to the client before that run finishes, there is a race. I think we need to change the /v1/agent endpoints to block until the sync is done (or at least attempted, since it may fail).
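The race, and the proposed fix, can be illustrated with a toy agent model: the racy variant returns to the client as soon as local state is updated and syncs to the catalog on a background thread, while the fixed variant waits for the sync attempt to complete before responding. This is an illustrative sketch, not Consul's actual implementation; all names here are hypothetical:

```python
import threading
import time


class ToyAgent:
    """Toy model of the race: local state updates immediately, but the
    catalog (anti-entropy) sync happens on a separate thread."""

    def __init__(self):
        self.local_checks = set()
        self.catalog_checks = set()
        self._synced = threading.Event()

    def _anti_entropy(self, check):
        time.sleep(0.05)                # simulated sync latency
        self.catalog_checks.add(check)
        self._synced.set()

    def register_check_racy(self, check):
        """Returns before the sync finishes -- the reported bug."""
        self.local_checks.add(check)
        threading.Thread(target=self._anti_entropy, args=(check,)).start()

    def register_check_blocking(self, check):
        """Proposed fix: kick off the sync, then wait for it to finish
        (or at least be attempted) before responding to the client."""
        self._synced.clear()
        self.local_checks.add(check)
        threading.Thread(target=self._anti_entropy, args=(check,)).start()
        self._synced.wait()

    def create_session(self, check):
        """Session creation validates checks against the catalog, so it
        fails if the sync has not happened yet."""
        if check not in self.catalog_checks:
            raise RuntimeError("Missing check '%s' registration" % check)
        return "session-id"
```

With `register_check_racy`, a `create_session` issued immediately afterwards can observe the catalog before the sync lands; with `register_check_blocking`, it cannot.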