Consul exec silently fails to wait for command to complete
#2757
Labels
type/bug
Feature does not function as expected
consul version for both Client and Server
Client: v0.6.4
Server: v0.6.4
consul info for both Client and Server
N/A.
Operating system and Environment details
Ubuntu 14.04
Description of the Issue (and unexpected/desired result)
We use consul exec to run jobs that sometimes take a long time (hours) to complete. We expect consul exec to block until this execution has completed on all nodes. This usually works as expected.
I witnessed one incident where a network hiccup caused Consul to lose quorum and elect a new leader while consul exec was in progress. (The flood of logs related to this event can be shared on request.)
Then consul exec emitted the following messages:
and returned an exit status of 0.
Looking at the code, I see a plausible explanation. We hit the timeout branch in the big for loop. This led to a break statement that ended the loop. At the very end, the logic checks whether any command returned a non-zero exit status. But no commands had completed, so there were no exit statuses, zero or otherwise.
I think that function should have another if branch to return a non-zero status if exitCount < ackCount.
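A minimal sketch of the check I have in mind, written as a standalone Go program rather than a patch. The helper name exitStatus, the badExit counter, and the specific return codes 1 and 2 are assumptions for illustration only; the real function in Consul may be structured differently.

```go
package main

import "fmt"

// exitStatus is a hypothetical helper, not Consul's actual code.
// ackCount:  nodes that acknowledged the job
// exitCount: nodes that reported an exit status
// badExit:   nodes whose exit status was non-zero (assumed counter)
func exitStatus(ackCount, exitCount, badExit int) int {
	// Existing behavior as described above: fail if any command failed.
	if badExit > 0 {
		return 2 // assumed code for "a command failed"
	}
	// Proposed extra branch: some nodes acknowledged but never reported
	// an exit status, so the run must not be reported as a success.
	if exitCount < ackCount {
		return 1 // assumed code for "incomplete run"
	}
	return 0
}

func main() {
	// Timeout hit before any node finished: 3 acks, 0 exits.
	fmt.Println(exitStatus(3, 0, 0))
	// Normal run: every acknowledged node exited cleanly.
	fmt.Println(exitStatus(3, 3, 0))
}
```

With a check along these lines, the incident described above (acknowledgements received, no exit statuses) would produce a non-zero status instead of 0.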
Reproduction steps
N/A
Log Fragments or Link to gist
More detailed logs available on request.