[Windows] Consul 0.6.4 loses leadership when raft database truncation fails due to the BoltDB mapping issue #2354

Closed
sitano opened this issue Sep 22, 2016 · 1 comment

sitano commented Sep 22, 2016

consul version for both Client and Server

Consul v0.6.4
Consul Protocol: 3 (Understands back to: 1)

consul info for both Client and Server

agent:
        check_monitors = 6
        check_ttls = 0
        checks = 6
        services = 2
build:
        prerelease =
        revision = 26a0ef8c
        version = 0.6.4
consul:
        bootstrap = true
        known_datacenters = ...
        leader = false
        server = true
raft:
        applied_index = 156201
        commit_index = 156201
        fsm_pending = 0
        last_contact = 2h50m17.6203775s
        last_log_index = 156201
        last_log_term = 2
        last_snapshot_index = 156190
        last_snapshot_term = 2
        num_peers = 0
        state = Follower
        term = 2
runtime:
        arch = amd64
        cpu_count = 8
        goroutines = 98
        max_procs = 2
        os = windows
        version = go1.6
serf_lan:
        encrypted = true
        event_queue = 0
        event_time = 6
        failed = 0
        intent_queue = 0
        left = 0
        member_time = 17
        members = 4
        query_queue = 0
        query_time = 1
serf_wan:
        encrypted = true
        event_queue = 0
        event_time = 2
        failed = 1
        intent_queue = 0
        left = 0
        member_time = 215401
        members = ...
        query_queue = 0
        query_time = 1               

Operating system and Environment details

PS> [System.Environment]::OSVersion

Platform ServicePack Version    VersionString
-------- ----------- -------    -------------
 Win32NT             6.3.9600.0 Microsoft Windows NT 6.3.9600.0

Description of the Issue (and unexpected/desired result)

When the raft BoltDB database reaches 32 megabytes in size, the storage engine tries to resize (truncate) the file.
The truncation fails due to a bug in BoltDB versions up to, and possibly including, 1.2.0.
Immediately afterwards, Consul loses leadership and is never re-elected.

Reproduction steps

  1. Start a Consul 0.6.4 cluster on Windows
  2. Grow the raft database to 32 MB, or slightly less
  3. When the log reaches 32 MB, wait for the truncation to fail (it may not fail every time)

Log Fragments

    2016/09/22 04:58:43 [INFO] consul.fsm: snapshot created in 0
    2016/09/22 04:58:43 [INFO] raft: Starting snapshot up to 156190
    2016/09/22 04:58:43 [INFO] snapshot: Creating new snapshot at C:\Tools\Consul\data\raft\snapshots\2-156190-1474520323417.tmp
    2016/09/22 04:58:43 [INFO] snapshot: reaping snapshot C:\Tools\Consul\data\raft\snapshots\1-139749-1474459801156
    2016/09/22 04:58:43 [INFO] raft: Compacting logs from 137716 to 145950
    2016/09/22 04:58:43 [ERR] raft: Failed to take snapshot: log compaction failed: file resize error: truncate C:\Tools\Consul\data\raft\raft.db: The requested operation cannot be performed on a file with a user-mapped section open.
    2016/09/22 04:59:29 [ERR] raft: Failed to commit logs: file resize error: truncate C:\Tools\Consul\data\raft\raft.db: The requested operation cannot be performed on a file with a user-mapped section open.
    2016/09/22 04:59:29 [INFO] raft: Node at ...:8300 [Follower] entering Follower state
    2016/09/22 04:59:29 [INFO] consul: cluster leadership lost
    2016/09/22 04:59:29 [WARN] consul.coordinate: Batch update failed: file resize error: truncate C:\Tools\Consul\data\raft\raft.db: The requested operation cannot be performed on a file with a user-mapped section open.
    2016/09/22 04:59:30 [WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.
    2016/09/22 04:59:46 [ERR] agent: coordinate update error: No cluster leader
    2016/09/22 05:00:01 [ERR] agent: coordinate update error: No cluster leader

Related Issues

Backports

https://github.com/sitano/consul/commits/v0.6.4-bp

@slackpad
Contributor

Hi @sitano, thanks for the info in case someone wants to update an older fork to fix this. I'll close this out since this is fixed in 0.7.0.
