Consul fails to start with No space left on disk #820

Closed
fraenkel opened this issue Mar 27, 2015 · 6 comments

Comments

@fraenkel

After about a day of running unit tests, we end up in a situation where we can no longer start Consul.
All we get is:
[o][consul_cluster[0]] ==> Starting Consul agent...
[o][consul_cluster[0]] ==> Error starting agent: Failed to start Consul server: Failed to start Raft: No space left on device

Digging into it, StateStore.initialize() fails when calling s.env.Open(s.path, flags, 0755).
Something seems to be leaking badly, since we have to reboot to get healthy again.

@armon
Member

armon commented Mar 27, 2015

Hmm, it shouldn't be "leaking" disk space. Depending on your test suite, it may just have exhausted the disk. Can you explain your benchmarks more?

@fraenkel
Author

This is after a day of unit tests in which we repeatedly start and stop Consul, although we stop it by killing the process.
It's not disk space, since we have 15% free. I am not sure what device it's referring to.

@armon
Member

armon commented Mar 27, 2015

Hmm. It could be due to the way LMDB works. We mmap a 32GB file in, but the file is sparse. It's possible that there isn't enough space for the mmap to succeed, so the no space left error is being returned.
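
To make the sparse-file point concrete, here is a minimal Go sketch, not Consul's or LMDB's actual code. It creates a sparse file, maps it read-only with mmap, and compares its apparent size with the bytes actually allocated. The helper name `sparseSizes`, the 1 GiB size (instead of 32GB) and the temp-file pattern are assumptions for illustration, and the 512-byte block unit assumes Linux.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// sparseSizes (hypothetical helper) creates a sparse file of the given
// apparent size in dir, maps it read-only, and returns the apparent size
// and the number of bytes actually allocated on disk.
func sparseSizes(dir string, size int64) (int64, int64, error) {
	f, err := os.CreateTemp(dir, "sparse-*.db")
	if err != nil {
		return 0, 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	// Truncate extends the file without writing data, leaving a hole.
	if err := f.Truncate(size); err != nil {
		return 0, 0, err
	}

	// Map the whole file, roughly as an mmap-backed store maps its data file.
	data, err := syscall.Mmap(int(f.Fd()), 0, int(size), syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		return 0, 0, fmt.Errorf("mmap: %w", err)
	}
	defer syscall.Munmap(data)

	var st syscall.Stat_t
	if err := syscall.Fstat(int(f.Fd()), &st); err != nil {
		return 0, 0, err
	}
	// st.Blocks is counted in 512-byte units on Linux.
	return st.Size, st.Blocks * 512, nil
}

func main() {
	apparent, allocated, err := sparseSizes("", 1<<30)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("apparent=%d bytes, allocated=%d bytes\n", apparent, allocated)
}
```

On a filesystem that supports sparse files, the allocated figure should be far smaller than the apparent size, so a large mapped file does not by itself use that much disk.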

@fraenkel
Author

That would seem a bit odd if it was the file. We are careful to create all of the files under our tmpdir and to remove them all. We check that we have enough memory and disk before we run. It's kind of like a switch: once we see the error, it fails until we reboot the machine.
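
A pre-run disk check like the one described could look roughly like the following Go sketch. It is an illustration only, not the test suite's actual check. The helper name `freeSpace` is hypothetical and the code assumes a Unix-like system where `syscall.Statfs` is available. It also reports free inodes, since ENOSPC can be returned when a filesystem runs out of inodes even with free blocks left; whether that applies to this report is not known.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// freeSpace (hypothetical helper) reports the percentage of blocks available
// to unprivileged users and the number of free inodes on the filesystem
// that contains path.
func freeSpace(path string) (float64, uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, 0, err
	}
	if st.Blocks == 0 {
		return 0, 0, fmt.Errorf("%s: filesystem reports zero blocks", path)
	}
	pct := 100 * float64(st.Bavail) / float64(st.Blocks)
	return pct, uint64(st.Ffree), nil
}

func main() {
	// Check the directory the tests write to; os.TempDir() is an assumption.
	pct, inodes, err := freeSpace(os.TempDir())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%.1f%% of blocks free, %d inodes free\n", pct, inodes)
}
```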

An example of our tests can be found here:
https://github.com/cloudfoundry-incubator/consuladapter/blob/wip/lock_test.go

We have others, but they are mostly variations on the theme. The tests in consuladapter create 3 servers, but all other tests start only 1. The number of servers doesn't matter once the error begins.

@ryanuber
Member

This should now be a non-issue, as we've completely replaced the LMDB-backed state store with an in-memory index so we never need to touch the disk. @fraenkel I know this ticket is old but are you still seeing this? Trying with the newest Consul RC might give you significantly better results.

@fraenkel
Author

@ryanuber Nope. I will close it.
