Closed
Description
consul version
for both Client and Server
Client: 0.9.2 (upgraded from 0.8.x)
Server: 0.9.2 (upgraded from 0.8.x)
consul info
for both Client and Server
Server: ACL enabled, but set to allow all.
Operating system and Environment details
Windows Server 2012 R2
Consul is started as a service by the NSSM daemon. Account: SYSTEM for both processes.
Description of the Issue (and unexpected/desired result)
After updating Consul to 0.9.2 and fixing problems with ACLs, a file system issue was discovered:
the folder 'Consul\data\raft\snapshots' grew to 40 GB.
Reproduction steps
Unknown. At the moment it's stable. The problem begins some time after restarting.
Log Fragments (TRACE level):
2017/08/22 13:50:14 [INFO] raft: Starting snapshot up to 75495602
2017/08/22 13:50:14 [INFO] snapshot: Creating new snapshot at data\raft\snapshots\71-75495602-1503399014216.tmp
2017/08/22 13:50:14 [ERR] snapshot: Failed syncing parent directory data\raft\snapshots, error: sync data\raft\snapshots: The handle is invalid.
2017/08/22 13:50:14 [ERR] raft: Failed to take snapshot: failed to close snapshot: sync data\raft\snapshots: The handle is invalid.
2017/08/22 13:50:19 [INFO] consul.fsm: snapshot created in 0s
2017/08/22 13:50:19 [INFO] raft: Starting snapshot up to 75495650
2017/08/22 13:50:19 [INFO] snapshot: Creating new snapshot at data\raft\snapshots\71-75495650-1503399019803.tmp
2017/08/22 13:50:19 [ERR] snapshot: Failed syncing parent directory data\raft\snapshots, error: sync data\raft\snapshots: The handle is invalid.
2017/08/22 13:50:19 [ERR] raft: Failed to take snapshot: failed to close snapshot: sync data\raft\snapshots: The handle is invalid.
2017/08/22 13:50:28 [INFO] consul.fsm: snapshot created in 0s
2017/08/22 13:50:28 [INFO] raft: Starting snapshot up to 75495748
2017/08/22 13:50:28 [INFO] snapshot: Creating new snapshot at data\raft\snapshots\71-75495748-1503399028720.tmp
2017/08/22 13:50:28 [ERR] snapshot: Failed syncing parent directory data\raft\snapshots, error: sync data\raft\snapshots: The handle is invalid.
2017/08/22 13:50:28 [ERR] raft: Failed to take snapshot: failed to close snapshot: sync data\raft\snapshots: The handle is invalid.
Activity
preetapan commented on Aug 22, 2017
@Lexus-3141 Thanks for the report, this looks related to hashicorp/raft#232
Saving snapshots now calls the "sync" syscall to ensure that data is actually persisted to disk correctly. Sync's implementation is different for Windows vs *nix. For Windows, this calls FlushFileBuffers. As far as I can tell this is the right implementation. Do you see this auto-recover after restarting once it has emitted the above error, or is it never able to save a snapshot successfully?
Lexus-3141 commented on Aug 22, 2017
Yes, auto-recovery works successfully. Log after restart:
But I'm not sure about '(Leader: "")'. Is that the correct way of leader discovery?
preetapan commented on Aug 22, 2017
Yes, the Leader:"" log message is expected to be seen when it starts up as a follower.
preetapan commented on Aug 22, 2017
@Lexus-3141 according to the documentation for FlushFileBuffers:
The code changes for hashicorp/raft#232 perform an fsync on the parent directory, which is failing according to the logs you added above. This could be because the Consul agent is not running with admin privileges. Can you double check that?
Lexus-3141 commented on Aug 22, 2017
It does. And it has an elevated token.

I once found that in Windows the 'System' principal has fewer rights than 'Administrator'. But I think that's enough to interact with the file system.
preetapan commented on Aug 22, 2017
@Lexus-3141 can you see if the error goes away with Administrator instead of System as the principal? Asking because, while having write permissions allows you to fsync and save the snapshot file, it looks like fsyncing the snapshot directory requires administrator privileges according to that doc page. This is a difference in behavior between *nix and Windows. On *nix systems, write permissions are sufficient to do fsyncs.
Lexus-3141 commented on Aug 22, 2017
I got the same result again.
Lexus-3141 commented on Aug 22, 2017
I tried auditing file system access and found nothing. All operations report a success result.
Looks like you're doing it wrong. ( WinAPI is very dense.
Lexus-3141 commented on Aug 22, 2017
And the same as local administrator too.
slackpad commented on Aug 22, 2017
I was able to get this to happen in a Windows 10 x64 VM, so it doesn't appear to be specific to the Windows Server 2012r2 version reported here. We might need to do some kind of alternate thing for Windows for the directory sync.
Update raft library for windows snapshot fsync fixes. This fixes #3409
Merge pull request #3416 from hashicorp/issue_3409
s-vitaliy commented on Feb 15, 2018
Hello, I have a similar issue with Nomad (Windows Server 2016, Nomad version 0.6.2). Is this issue fixed in Nomad as well?