
0.8.0 short-time till "leave" leads to EOF #2880

Closed
ilovezfs opened this issue Apr 8, 2017 · 41 comments
Labels: theme/api (Relating to the HTTP API interface), type/bug (Feature does not function as expected)

ilovezfs commented Apr 8, 2017

The following issue did not affect 0.7.5, but it does affect 0.8.0 and HEAD.

The error is "Error leaving: Put http://127.0.0.1:8500/v1/agent/leave: EOF"

==> Starting Consul agent...
==> Consul agent running!
           Version: 'v0.8.0-25-g3ef7dde6-dev (3ef7dde6+CHANGES)'
           Node ID: '0badd3ca-3553-380f-8a09-3b02ba961b45'
         Node name: 'iMac-TMP.local'
        Datacenter: 'dc1'
            Server: false (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600)
      Cluster Addr: 10.0.1.15 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

==> Log data will now stream in as it occurs:

    2017/04/08 07:48:27 [INFO] serf: EventMemberJoin: iMac-TMP.local 10.0.1.15
    2017/04/08 07:48:27 [WARN] manager: No servers available
    2017/04/08 07:48:27 [ERR] agent: failed to sync remote state: No known Consul servers
==> /usr/local/Cellar/consul/HEAD-3ef7dde/bin/consul leave
    2017/04/08 07:48:28 [INFO] consul: client starting leave
    2017/04/08 07:48:28 [INFO] serf: EventMemberLeave: iMac-TMP.local 10.0.1.15
    2017/04/08 07:48:28 [INFO] agent: requesting shutdown
    2017/04/08 07:48:28 [INFO] consul: shutting down client
    2017/04/08 07:48:28 [INFO] manager: shutting down
    2017/04/08 07:48:28 [INFO] agent: shutdown complete
Error leaving: Put http://127.0.0.1:8500/v1/agent/leave: EOF

If I crank the sleep up to 30 seconds in the test before running consul leave, then it exits gracefully, as follows:

iMac-TMP:dir joe$ brew test consul -vd
/usr/local/Homebrew/Library/Homebrew/brew.rb (Formulary::FormulaLoader): loading /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/consul.rb
Testing consul
==> Using the sandbox
/usr/bin/sandbox-exec -f /tmp/homebrew20170408-30654-1xf5lmd.sb /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/bin/ruby -W0 -I /usr/local/Homebrew/Library/Homebrew -- /usr/local/Homebrew/Library/Homebrew/test.rb /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/consul.rb -vd --HEAD
/usr/local/Homebrew/Library/Homebrew/test.rb (Formulary::FromPathLoader): loading /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/consul.rb
==> Starting Consul agent...
==> Consul agent running!
           Version: 'v0.8.0-25-g3ef7dde6-dev (3ef7dde6+CHANGES)'
           Node ID: '0badd3ca-3553-380f-8a09-3b02ba961b45'
         Node name: 'iMac-TMP.local'
        Datacenter: 'dc1'
            Server: false (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600)
      Cluster Addr: 10.0.1.15 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

==> Log data will now stream in as it occurs:

    2017/04/08 07:49:26 [INFO] serf: EventMemberJoin: iMac-TMP.local 10.0.1.15
    2017/04/08 07:49:26 [WARN] manager: No servers available
    2017/04/08 07:49:26 [ERR] agent: failed to sync remote state: No known Consul servers
    2017/04/08 07:49:45 [WARN] manager: No servers available
    2017/04/08 07:49:45 [ERR] agent: failed to sync remote state: No known Consul servers
==> /usr/local/Cellar/consul/HEAD-3ef7dde/bin/consul leave
    2017/04/08 07:49:56 [INFO] consul: client starting leave
    2017/04/08 07:49:56 [INFO] serf: EventMemberLeave: iMac-TMP.local 10.0.1.15
    2017/04/08 07:49:56 [INFO] agent: requesting shutdown
    2017/04/08 07:49:56 [INFO] consul: shutting down client
    2017/04/08 07:49:56 [INFO] manager: shutting down
    2017/04/08 07:49:56 [INFO] agent: shutdown complete
Graceful leave complete
ilovezfs commented Apr 8, 2017

Note the test is just

  test do
    fork do
      exec "#{bin}/consul", "agent", "-data-dir", "."
    end
    sleep 3
    system "#{bin}/consul", "leave"
  end

@slackpad slackpad modified the milestones: 0.8.1, 0.8.2 Apr 12, 2017
@slackpad slackpad removed this from the 0.8.2 milestone Apr 25, 2017
@ilovezfs

Gentle ping on this.

@slackpad slackpad added the type/bug Feature does not function as expected label May 25, 2017
@slackpad

We are probably closing down the web server too soon - this is a bit of a race condition since the agent is shutting down, so we need a clean way to keep the agent up until the OK is sent back from the leave request.

@slackpad slackpad added the theme/api Relating to the HTTP API interface label May 25, 2017
@magiconair

This might already be in #3037

consul/command/agent/agent.go

Lines 1029 to 1046 in 92fb316

for _, srv := range a.httpServers {
	// http server is HTTPS if TLSConfig is not nil and NextProtos does not only contain "h2"
	// the latter seems to be a side effect of HTTP/2 support in go 1.8. TLSConfig != nil is
	// no longer sufficient to check for an HTTPS server.
	if srv.proto == "https" {
		a.logger.Println("[INFO] agent: Stopping HTTPS server", srv.Addr)
	} else {
		a.logger.Println("[INFO] agent: Stopping HTTP server", srv.Addr)
	}
	// old behavior: just die
	// srv.Close()
	// graceful shutdown
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	srv.Shutdown(ctx)
}

@ilovezfs

Gentle ping on this.

@magiconair

This one is a bit of a challenge. I've recently changed the code so that we shut down the external endpoints before we shut down the internal ones. Since you are triggering a consul leave via the HTTP API, the HTTP endpoint is being shut down before the server itself is down. I've tried to solve this but I need to take a closer look.

magiconair added a commit that referenced this issue Jun 19, 2017
When the agent is triggered to shutdown via an external 'consul leave'
command delivered via the HTTP API then the client expects to receive a
response when the agent is down. This creates a race on when to shutdown
the agent itself like the RPC server, the checks and the state and the
external endpoints like DNS and HTTP. Ideally, the external endpoints
should be shutdown before the internal state but if the goal is to
respond reliably that the agent is down then this is not possible.

This patch splits the agent shutdown into two parts implemented in a
single method to keep it simple and unambiguous for the caller. The first
stage shuts down the internal state, checks, RPC server, ...
synchronously and then triggers the shutdown of the external endpoints
asynchronously. This way the caller is guaranteed that the internal state
services are down when Shutdown returns and there remains enough time to
send a response.

Fixes #2880
magiconair added a commit that referenced this issue Jun 19, 2017
@magiconair

@ilovezfs could you check whether this fixes it for you please?

@magiconair magiconair self-assigned this Jun 19, 2017
magiconair added a commit that referenced this issue Jun 19, 2017
@magiconair

Please wait. I was too quick.

@ilovezfs

Same thing even if they're in separate windows.

@magiconair

not for me:

Tue Jun 20 11:45 frank@192-168-100-100 $ consul agent -data-dir . -bind 127.0.0.1
==> Starting Consul agent...
==> Consul agent running!
           Version: 'v0.8.4-27-g35ccc755-dev (35ccc755)'
           Node ID: 'cb1f6030-a220-4f92-57dc-7baaabdc3823'
         Node name: '192-168-100-100.local'
        Datacenter: 'dc1'
            Server: false (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600)
      Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false

==> Log data will now stream in as it occurs:

    2017/06/20 11:45:33 [INFO] serf: EventMemberJoin: 192-168-100-100.local 127.0.0.1
    2017/06/20 11:45:33 [INFO] agent: Started DNS server 127.0.0.1:8600 (udp)
    2017/06/20 11:45:33 [INFO] agent: Started DNS server 127.0.0.1:8600 (tcp)
    2017/06/20 11:45:33 [INFO] agent: Started HTTP server on 127.0.0.1:8500
    2017/06/20 11:45:33 [WARN] manager: No servers available
    2017/06/20 11:45:33 [ERR] agent: failed to sync remote state: No known Consul servers
^Z
[1]+  Stopped                 consul agent -data-dir . -bind 127.0.0.1
[~/temp/bla]
Tue Jun 20 11:45 frank@192-168-100-100 $ bg
[1]+ consul agent -data-dir . -bind 127.0.0.1 &
[~/temp/bla]
Tue Jun 20 11:45 frank@192-168-100-100 $ consul leave
    2017/06/20 11:45:41 [INFO] consul: client starting leave
    2017/06/20 11:45:41 [INFO] serf: EventMemberLeave: 192-168-100-100.local 127.0.0.1
    2017/06/20 11:45:41 [INFO] agent: Requesting shutdown
    2017/06/20 11:45:41 [INFO] consul: shutting down client
    2017/06/20 11:45:41 [INFO] manager: shutting down
    2017/06/20 11:45:41 [INFO] agent: consul client down
    2017/06/20 11:45:41 [INFO] agent: shutdown complete
    2017/06/20 11:45:41 [INFO] agent: Stopping DNS server 127.0.0.1:8600 (tcp)
    2017/06/20 11:45:41 [INFO] agent: Stopping DNS server 127.0.0.1:8600 (udp)
    2017/06/20 11:45:41 [INFO] agent: Stopping HTTP server 127.0.0.1:8500
    2017/06/20 11:45:41 [INFO] agent: Waiting for endpoints to shut down
    2017/06/20 11:45:41 [INFO] agent: Endpoints down
    2017/06/20 11:45:41 [INFO] Exit code:  0
Graceful leave complete
[1]+  Done                    consul agent -data-dir . -bind 127.0.0.1
[~/temp/bla]

ilovezfs commented Jun 20, 2017

The window where I started consul agent -data-dir . is left hanging after I run consul leave in the other window, if the agent was running in the background.

bash-3.2$ consul agent -data-dir .
==> Starting Consul agent...
==> Consul agent running!
           Version: 'v0.8.4-26-gb083ce1-dev (b083ce1+CHANGES)'
           Node ID: '1b3b6489-6c47-0a5e-c8cf-60463cfdfbd6'
         Node name: 'iMac-TMP.local'
        Datacenter: 'dc1'
            Server: false (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600)
      Cluster Addr: 10.0.1.15 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false

==> Log data will now stream in as it occurs:

    2017/06/20 02:47:26 [INFO] serf: EventMemberJoin: iMac-TMP.local 10.0.1.15
    2017/06/20 02:47:26 [INFO] agent: Started DNS server 127.0.0.1:8600 (udp)    2017/06/20 02:47:26 [ERR] agent: failed to sync remote state: No known Consul servers
    2017/06/20 02:47:26 [INFO] agent: Started DNS server 127.0.0.1:8600 (tcp)

    2017/06/20 02:47:26 [INFO] agent: Started HTTP server on 127.0.0.1:8500
    2017/06/20 02:47:26 [WARN] manager: No servers available
^Z
[1]+  Stopped                 consul agent -data-dir .
bash-3.2$ bg
[1]+ consul agent -data-dir . &
bash-3.2$     2017/06/20 02:47:31 [INFO] consul: client starting leave
    2017/06/20 02:47:31 [INFO] serf: EventMemberLeave: iMac-TMP.local 10.0.1.15
    2017/06/20 02:47:31 [INFO] agent: Requesting shutdown
    2017/06/20 02:47:31 [INFO] consul: shutting down client
    2017/06/20 02:47:31 [INFO] manager: shutting down
    2017/06/20 02:47:31 [INFO] agent: consul client down
    2017/06/20 02:47:31 [INFO] agent: shutdown complete
    2017/06/20 02:47:31 [INFO] agent: Stopping DNS server 127.0.0.1:8600 (tcp)
    2017/06/20 02:47:31 [INFO] agent: Stopping DNS server 127.0.0.1:8600 (udp)
    2017/06/20 02:47:31 [INFO] agent: Stopping HTTP server 127.0.0.1:8500
    2017/06/20 02:47:31 [INFO] agent: Waiting for endpoints to shut down
    2017/06/20 02:47:31 [INFO] agent: Endpoints down
    2017/06/20 02:47:31 [INFO] Exit code:  0

@magiconair

Asking colleagues to verify. My working hypothesis is that this is something specific to your machine. What is your shell?

@magiconair

🤦‍♂️ bash-3.2 ...

@ilovezfs

Tried with bash 3.2 and 4.4.

@magiconair

Colleagues can't repro either. Which macOS version are you using?

@ilovezfs

10.11. Going to try on a 10.12 box ....

@magiconair

I'm on 10.12.5

@magiconair

ruby version?

ilovezfs commented Jun 20, 2017

Same deal on 10.12:

bash-3.2$ mkdir /tmp/consultest
bash-3.2$ cd /tmp/consultest
bash-3.2$ consul agent -data-dir .
==> Starting Consul agent...
==> Consul agent running!
           Version: 'v0.8.4-26-gb083ce17-dev (b083ce17+CHANGES)'
           Node ID: 'cb1f6030-a220-4f92-57dc-7baaabdc3823'
         Node name: 'Josephs-MacBook-Pro.local'
        Datacenter: 'dc1'
            Server: false (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600)
      Cluster Addr: 10.0.1.13 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false

==> Log data will now stream in as it occurs:

    2017/06/20 03:01:59 [INFO] serf: EventMemberJoin: Josephs-MacBook-Pro.local 10.0.1.13
    2017/06/20 03:01:59 [INFO] agent: Started DNS server 127.0.0.1:8600 (udp)
    2017/06/20 03:01:59 [INFO] agent: Started DNS server 127.0.0.1:8600 (tcp)
    2017/06/20 03:01:59 [INFO] agent: Started HTTP server on 127.0.0.1:8500
    2017/06/20 03:01:59 [WARN] manager: No servers available
    2017/06/20 03:01:59 [ERR] agent: failed to sync remote state: No known Consul servers
^Z
[1]+  Stopped                 consul agent -data-dir .
bash-3.2$ bg
[1]+ consul agent -data-dir . &
bash-3.2$ consul leave
    2017/06/20 03:02:05 [INFO] consul: client starting leave
    2017/06/20 03:02:05 [INFO] serf: EventMemberLeave: Josephs-MacBook-Pro.local 10.0.1.13
    2017/06/20 03:02:05 [INFO] agent: Requesting shutdown
    2017/06/20 03:02:05 [INFO] consul: shutting down client
    2017/06/20 03:02:05 [INFO] manager: shutting down
    2017/06/20 03:02:05 [INFO] agent: consul client down
    2017/06/20 03:02:05 [INFO] agent: shutdown complete
    2017/06/20 03:02:05 [INFO] agent: Stopping DNS server 127.0.0.1:8600 (tcp)
    2017/06/20 03:02:05 [INFO] agent: Stopping DNS server 127.0.0.1:8600 (udp)
    2017/06/20 03:02:05 [INFO] agent: Stopping HTTP server 127.0.0.1:8500
Graceful leave complete
bash-3.2$     2017/06/20 03:02:05 [INFO] agent: Waiting for endpoints to shut down
    2017/06/20 03:02:05 [INFO] agent: Endpoints down
    2017/06/20 03:02:05 [INFO] Exit code:  0

@magiconair

Just for grins, can you test with consul agent -dev?

@ilovezfs

Note this is b083ce1 + #3163

@ilovezfs

bash-3.2$ consul agent -dev
==> Starting Consul agent...
==> Consul agent running!
           Version: 'v0.8.4-26-gb083ce17-dev (b083ce17+CHANGES)'
           Node ID: 'cb1f6030-a220-4f92-57dc-7baaabdc3823'
         Node name: 'Josephs-MacBook-Pro.local'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600)
      Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false

==> Log data will now stream in as it occurs:

    2017/06/20 03:06:25 [DEBUG] Using unique ID "cb1f6030-a220-4f92-57dc-7baaabdc3823" from host as node ID
    2017/06/20 03:06:25 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:127.0.0.1:8300 Address:127.0.0.1:8300}]
    2017/06/20 03:06:25 [INFO] raft: Node at 127.0.0.1:8300 [Follower] entering Follower state (Leader: "")
    2017/06/20 03:06:25 [INFO] serf: EventMemberJoin: Josephs-MacBook-Pro.local 127.0.0.1
    2017/06/20 03:06:25 [INFO] consul: Adding LAN server Josephs-MacBook-Pro.local (Addr: tcp/127.0.0.1:8300) (DC: dc1)
    2017/06/20 03:06:25 [INFO] serf: EventMemberJoin: Josephs-MacBook-Pro.local.dc1 127.0.0.1
    2017/06/20 03:06:25 [INFO] consul: Handled member-join event for server "Josephs-MacBook-Pro.local.dc1" in area "wan"
    2017/06/20 03:06:25 [INFO] agent: Started DNS server 127.0.0.1:8600 (udp)
    2017/06/20 03:06:25 [INFO] agent: Started DNS server 127.0.0.1:8600 (tcp)
    2017/06/20 03:06:25 [INFO] agent: Started HTTP server on 127.0.0.1:8500
    2017/06/20 03:06:25 [WARN] raft: Heartbeat timeout from "" reached, starting election
    2017/06/20 03:06:25 [INFO] raft: Node at 127.0.0.1:8300 [Candidate] entering Candidate state in term 2
    2017/06/20 03:06:25 [DEBUG] raft: Votes needed: 1
    2017/06/20 03:06:25 [DEBUG] raft: Vote granted from 127.0.0.1:8300 in term 2. Tally: 1
    2017/06/20 03:06:25 [INFO] raft: Election won. Tally: 1
    2017/06/20 03:06:25 [INFO] raft: Node at 127.0.0.1:8300 [Leader] entering Leader state
    2017/06/20 03:06:25 [INFO] consul: cluster leadership acquired
    2017/06/20 03:06:25 [INFO] consul: New leader elected: Josephs-MacBook-Pro.local
    2017/06/20 03:06:25 [DEBUG] consul: reset tombstone GC to index 3
    2017/06/20 03:06:25 [INFO] consul: member 'Josephs-MacBook-Pro.local' joined, marking health alive
    2017/06/20 03:06:25 [INFO] agent: Synced service 'consul'
    2017/06/20 03:06:25 [DEBUG] agent: Node info in sync
^Z
[1]+  Stopped                 consul agent -dev
bash-3.2$ bg
[1]+ consul agent -dev &
bash-3.2$ consul leave
    2017/06/20 03:06:34 [INFO] consul: server starting leave
    2017/06/20 03:06:34 [INFO] serf: EventMemberLeave: Josephs-MacBook-Pro.local.dc1 127.0.0.1
    2017/06/20 03:06:34 [INFO] serf: EventMemberLeave: Josephs-MacBook-Pro.local 127.0.0.1
    2017/06/20 03:06:34 [INFO] agent: Requesting shutdown
    2017/06/20 03:06:34 [INFO] consul: shutting down server
    2017/06/20 03:06:34 [INFO] consul: Handled member-leave event for server "Josephs-MacBook-Pro.local.dc1" in area "wan"
    2017/06/20 03:06:34 [INFO] manager: shutting down
    2017/06/20 03:06:34 [INFO] agent: consul server down
    2017/06/20 03:06:34 [INFO] agent: shutdown complete
    2017/06/20 03:06:34 [INFO] agent: Stopping DNS server 127.0.0.1:8600 (tcp)
    2017/06/20 03:06:34 [DEBUG] http: Request PUT /v1/agent/leave (396.429µs) from=127.0.0.1:55132
    2017/06/20 03:06:34 [INFO] agent: Stopping DNS server 127.0.0.1:8600 (udp)
    2017/06/20 03:06:34 [INFO] agent: Stopping HTTP server 127.0.0.1:8500
Graceful leave complete
bash-3.2$     2017/06/20 03:06:34 [INFO] agent: Waiting for endpoints to shut down
    2017/06/20 03:06:34 [INFO] agent: Endpoints down
    2017/06/20 03:06:34 [INFO] Exit code:  0

@magiconair

Can you DM me on Twitter?

@ilovezfs

Are you on IRC?

@magiconair

Not really. I'd prefer something with audio: FaceTime or Skype.

@magiconair

Skype doesn't work for me. If you send an email to frank at hashicorp.com, I'll send you a link.

@magiconair

@ilovezfs ?

@magiconair

I'm off to lunch now. Back in 30 min.

@ilovezfs

It seems to be non-deterministic. About a third of the time I see the behavior you're describing.

@magiconair

OK, then we'll leave it as is. I'll ask for some more feedback internally.

magiconair added a commit that referenced this issue Jun 21, 2017
When the agent is triggered to shutdown via an external 'consul leave'
command delivered via the HTTP API then the client expects to receive a
response when the agent is down. This creates a race on when to shutdown
the agent itself like the RPC server, the checks and the state and the
external endpoints like DNS and HTTP.

This patch splits the shutdown process into two parts:

 * shutdown the agent
 * shutdown the endpoints (http and dns)

They can be executed multiple times, concurrently and in any order but
should be executed first agent, then endpoints to provide consistent
behavior across all use cases. Both calls have to be executed for a
proper shutdown.

This could be partially hidden in a single function but would introduce
some magic that happens behind the scenes which one has to know of but
isn't obvious.

Fixes #2880
magiconair added a commit that referenced this issue Jun 21, 2017
@magiconair

@ilovezfs our current assumption is that somewhere in the logging path we're buffering something. I'll keep looking.

@ilovezfs

@magiconair no problem. I am very happy the primary issue here was addressed because it will fix our CI for the fabio formula whenever your next release is published.
