
Crash at shutdown when a check configuration is incorrectly reloaded #987

Closed
falzm opened this issue May 29, 2015 · 3 comments
Labels
type/bug Feature does not function as expected


falzm commented May 29, 2015

Hi

I was messing around with Consul when I found that the agent crashes at shutdown after a check reconfiguration is performed and that reconfiguration is erroneous.

Example: at startup, the agent has one check defined in /etc/consul.d/check.json:

{
  "check": {
    "name": "my service",
    "id": "my service",
    "script": "/tmp/check_service.sh",
    "interval": "10s"
  }
}

The agent starts correctly, and starts executing its check:

==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Joining cluster...
    Join completed. Synced with 1 initial agents
==> Consul agent running!
         Node name: 'xxxxxx'
        Datacenter: 'dc1'
            Server: false (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
      Cluster Addr: 10.91.19.21 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

==> Log data will now stream in as it occurs:

    2015/05/29 21:06:51 [INFO] serf: EventMemberJoin: xxxxxx 10.91.19.21
    2015/05/29 21:06:51 [INFO] serf: Attempting re-join to previously known node: xxxxxx: 10.91.19.20:8301
    2015/05/29 21:06:51 [INFO] agent: (LAN) joining: [xxxxxx]
    2015/05/29 21:06:51 [INFO] serf: EventMemberJoin: xxxxxx 10.91.19.20
    2015/05/29 21:06:51 [INFO] serf: Re-joined to previously known node: xxxxxx: 10.91.19.20:8301
    2015/05/29 21:06:51 [INFO] consul: adding server xxxxxx (Addr: 10.91.19.20:8300) (DC: dc1)
    2015/05/29 21:06:51 [INFO] agent: (LAN) joined: 1 Err: <nil>
    2015/05/29 21:06:51 [INFO] agent: Synced check 'my service'
    2015/05/29 21:06:53 [INFO] agent: Synced check 'my service'

I edit the check details as:

{
  "check": {
    "name": "my service",
    "id": "my service",
    "service_id": "my service",
    "script": "/tmp/check_service.sh",
    "interval": "10s"
  }
}

Then I refresh the agent by sending a HUP signal, but there seems to be an error so the agent deregisters the check:

==> Caught signal: hangup
==> Reloading configuration...
==> Failed reloading checks: Failed to register check 'my service': ServiceID "my service" does not exist &{my service my service  my service   {/tmp/check_service.sh  10s 0 0  }}
    2015/05/29 21:07:01 [INFO] agent: Deregistered check 'my service'

Then I shut down the agent (either by issuing a <ctrl+c> or by sending a TERM signal):

==> Caught signal: terminated
    2015/05/29 21:07:07 [INFO] agent: requesting shutdown
    2015/05/29 21:07:07 [INFO] consul: shutting down client
    2015/05/29 21:07:07 [WARN] serf: Shutdown without a Leave
    2015/05/29 21:07:07 [INFO] agent: shutdown complete
    2015/05/29 20:54:30 [INFO] agent: requesting shutdown
    2015/05/29 20:54:30 [INFO] consul: shutting down client
    2015/05/29 20:54:30 [WARN] serf: Shutdown without a Leave
    2015/05/29 20:54:30 [INFO] agent: shutdown complete
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x1a9 pc=0x4a21ac]

goroutine 1 [running]:
github.com/hashicorp/consul/command/agent.(*Command).handleSignals(0xc2080480b0, 0x0, 0xc20807e060, 0xc20807e0c0, 0x1)
        /opt/gopath/src/github.com/hashicorp/consul/command/agent/command.go:763 +0x54c
github.com/hashicorp/consul/command/agent.(*Command).Run(0xc2080480b0, 0xc20800a020, 0x7, 0x7, 0x0)
        /opt/gopath/src/github.com/hashicorp/consul/command/agent/command.go:727 +0x24bc
github.com/mitchellh/cli.(*CLI).Run(0xc2080303c0, 0xc20802b8e0, 0x0, 0x0)
        /opt/gopath/src/github.com/mitchellh/cli/cli.go:100 +0x3ce
main.realMain(0x41325f)
        /opt/gopath/src/github.com/hashicorp/consul/main.go:40 +0x23b
main.main()
        /opt/gopath/src/github.com/hashicorp/consul/main.go:12 +0x1f

goroutine 5 [syscall]:
os/signal.loop()
        /usr/local/go/src/os/signal/signal_unix.go:21 +0x1f
created by os/signal.init·1
        /usr/local/go/src/os/signal/signal_unix.go:27 +0x35

goroutine 10 [select, 19 minutes]:
github.com/armon/go-metrics.(*InmemSignal).run(0xc2080ac1c0)
        /opt/gopath/src/github.com/armon/go-metrics/inmem_signal.go:63 +0xf1
created by github.com/armon/go-metrics.NewInmemSignal
        /opt/gopath/src/github.com/armon/go-metrics/inmem_signal.go:37 +0x258

goroutine 11 [sleep]:
github.com/armon/go-metrics.(*Metrics).collectStats(0xc208042f50)
        /opt/gopath/src/github.com/armon/go-metrics/metrics.go:67 +0x2c
created by github.com/armon/go-metrics.New
        /opt/gopath/src/github.com/armon/go-metrics/start.go:61 +0xe8

goroutine 13 [runnable]:
github.com/hashicorp/consul/consul.(*ConnPool).reap(0xc208043270)
        /opt/gopath/src/github.com/hashicorp/consul/consul/pool.go:367 +0x4bb
created by github.com/hashicorp/consul/consul.NewPool
        /opt/gopath/src/github.com/hashicorp/consul/consul/pool.go:160 +0x165

goroutine 29 [runnable]:
github.com/hashicorp/consul/command/agent.(*Agent).handleEvents(0xc208045180)
        /opt/gopath/src/github.com/hashicorp/consul/command/agent/user_event.go:114 +0x2cc
created by github.com/hashicorp/consul/command/agent.Create
        /opt/gopath/src/github.com/hashicorp/consul/command/agent/agent.go:184 +0xba2

goroutine 385 [IO wait]:
net.(*pollDesc).Wait(0xc2082e4450, 0x72, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc2082e4450, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).Read(0xc2082e43f0, 0xc20824e000, 0x2, 0xffff, 0x0, 0x7fdb18f8ce18, 0xc208182350)
        /usr/local/go/src/net/fd_unix.go:242 +0x40f
net.(*conn).Read(0xc20803a130, 0xc20824e000, 0x2, 0xffff, 0x41ff01, 0x0, 0x0)
        /usr/local/go/src/net/net.go:121 +0xdc
github.com/miekg/dns.(*Conn).Read(0xc20830c460, 0xc20824e000, 0xffff, 0xffff, 0xffff, 0x0, 0x0)
        /opt/gopath/src/github.com/miekg/dns/client.go:231 +0x150
github.com/miekg/dns.(*Conn).ReadMsg(0xc20830c460, 0xc20800b3b0, 0x0, 0x0)
        /opt/gopath/src/github.com/miekg/dns/client.go:203 +0x128
github.com/miekg/dns.(*Client).exchange(0xc20830c410, 0xc20800b3b0, 0xc208182290, 0xe, 0x0, 0x0, 0x0, 0x0)
        /opt/gopath/src/github.com/miekg/dns/client.go:181 +0x362
github.com/miekg/dns.(*Client).Exchange(0xc20830c410, 0xc20800b3b0, 0xc208182290, 0xe, 0xffffffffffffff01, 0x425df9, 0x0, 0x0)
        /opt/gopath/src/github.com/miekg/dns/client.go:108 +0x83
created by github.com/miekg/dns.(*Server).Shutdown
        /opt/gopath/src/github.com/miekg/dns/server.go:357 +0x4cd

goroutine 35 [select, 19 minutes]:
github.com/hashicorp/go-checkpoint.func·001()
        /opt/gopath/src/github.com/hashicorp/go-checkpoint/checkpoint.go:188 +0x154
created by github.com/hashicorp/go-checkpoint.CheckInterval
        /opt/gopath/src/github.com/hashicorp/go-checkpoint/checkpoint.go:196 +0x1d5

goroutine 49 [syscall, 19 minutes, locked to thread]:
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:2232 +0x1

goroutine 38 [runnable]:
github.com/hashicorp/consul/command/agent.(*localState).antiEntropy(0xc2080451b0, 0xc208076780)
        /opt/gopath/src/github.com/hashicorp/consul/command/agent/local.go:323 +0x33d
created by github.com/hashicorp/consul/command/agent.(*Agent).StartSync
        /opt/gopath/src/github.com/hashicorp/consul/command/agent/agent.go:523 +0x4a

goroutine 383 [IO wait]:
net.(*pollDesc).Wait(0xc2082e4300, 0x72, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc2082e4300, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).Read(0xc2082e42a0, 0xc2080d2800, 0x200, 0x200, 0x0, 0x7fdb18f8ce18, 0xc208182268)
        /usr/local/go/src/net/fd_unix.go:242 +0x40f
net.(*conn).Read(0xc20803a050, 0xc2080d2800, 0x200, 0x200, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/net.go:121 +0xdc
github.com/miekg/dns.(*Conn).Read(0xc20830c370, 0xc2080d2800, 0x200, 0x200, 0x200, 0x0, 0x0)
        /opt/gopath/src/github.com/miekg/dns/client.go:258 +0x4a4
github.com/miekg/dns.(*Conn).ReadMsg(0xc20830c370, 0xc20800a2d0, 0x0, 0x0)
        /opt/gopath/src/github.com/miekg/dns/client.go:203 +0x128
github.com/miekg/dns.(*Client).exchange(0xc20830c320, 0xc20800a2d0, 0xc2081821e0, 0xe, 0x0, 0x0, 0x0, 0x0)
        /opt/gopath/src/github.com/miekg/dns/client.go:181 +0x362
github.com/miekg/dns.(*Client).Exchange(0xc20830c320, 0xc20800a2d0, 0xc2081821e0, 0xe, 0x1, 0x425df9, 0x0, 0x0)
        /opt/gopath/src/github.com/miekg/dns/client.go:108 +0x83
created by github.com/miekg/dns.(*Server).Shutdown
        /opt/gopath/src/github.com/miekg/dns/server.go:357 +0x4cd

This crash is reproducible every time (version 0.5.2 on Linux amd64).

BTW, I stumbled upon this issue while trying to use "service-bound checks" as described in the documentation, but I don't really get what I'm doing wrong when I look at the example... Any help would be much appreciated :)

@ryanuber (Member) commented:

@falzm I've reproduced this, thanks for the report! The problem is definitely that the service ID is specified improperly in the first invocation of Consul. If the service_id field is not specified, then the check is assumed to be a node-level check, so the check registers correctly. In order to fix this, you should first register a service definition using configuration files or by using the API. See the "Service Definitions" page in the Consul Agent documentation here for details.
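For reference, a single configuration file that first registers the service and then binds the check to it might look like the sketch below. This follows the service definition format from the Consul docs of that era; the names mirror the reporter's config, and the exact field set should be verified against the Service Definitions documentation:

```json
{
  "service": {
    "name": "my service",
    "id": "my service",
    "check": {
      "script": "/tmp/check_service.sh",
      "interval": "10s"
    }
  }
}
```

With the service registered this way, a standalone check definition that sets "service_id": "my service" has an existing ServiceID to bind to, and the reload should no longer fail.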

This is definitely a bug though: any panic in Consul is a bug. I'll take a look at this.

@ryanuber ryanuber added the type/bug Feature does not function as expected label May 31, 2015
@ryanuber (Member) commented:

The panic is fixed in 5226e29!


falzm commented May 31, 2015

Great, thanks!
