unexpected end of JSON input after server crash #1221
Comments
Yeah, this could probably be done in a slightly safer manner to reduce the chances of corrupting them. It should be safe to delete them and restart, as long as you re-register the checks, since the agent is the source of truth about what's registered. |
👍 for this. We ran into this as well:
|
The same problem :( |
saw this as well with |
This happened about half the times I made changes to I did see When there was 2 empty files, I actually got a useful error for the second one:
|
I have a pull request up for this in #1750 |
This looks like the same problem that nomad had in hashicorp/nomad#1357, which IMHO needs a two-part solution. First, make the write atomic as in #1750; second, still allow startup with corrupt state. |
Out of curiosity, did you find something in my implementation that you didn't agree with, and is startup with corrupt state a desirable thing for a system responsible for dns and service discovery? I personally feel like consul's config files need to be as guaranteed as possible. |
Not sure if hashicorp/nomad#1367 is the same thing so I am also posting here. +1 for this fix. Occasionally I see this issue with Nomad 0.4.0
It turns out that some state.json files are empty
Deleting /nomad_directory/ helped to overcome the issue. Not sure what caused this. It might have been the shutdown of the nodes. There was no shortage of disk space. |
I just realized that my previous comment was for a similar issue with Nomad. Now I have experienced it with Consul too on 0.6.3.dev.
|
This issue is still valid in 0.8.5 with an empty JSON file in
|
Hi @zg did you experience a crash/running out of disk space to get the empty file, or did it occur some other way? |
I created the tmp file myself as a proof of concept, but previously there was an incident where consul couldn’t start due to zero byte tmp files that were in the data directory that we previously were unaware of. |
I see. I'll kick this back open so we can take a look. |
@zg how did you resolve this? I have an issue where I just pushed a rolling cluster update using kops and one of my consul instances failed to start as it's looking for a service file that doesn't exist.
|
@pjelar delete the empty services file and restart consul |
@MrMMorris there wasn't a file there. I upgraded to 1.0.0 and the problem went away. |
It could be that an upgrade deletes the tmp files. |
The files weren't there, @zg; I deleted them. I tried placing valid .json files there too. More likely 1.0.0 handles this issue better than 0.8.5. It wasn't a normal situation, as I also had to do a manual peers.json recovery. |
That’s how I resolved my issue as well, by deleting the empty JSON files. |
Previously, a change was made to make the file writing atomic, but that wasn't enough to cover something like an OS crash, so we needed something here to handle the situation more gracefully. Fixes #1221.
Consul v1.15.3 still does not write the
|
I also still have the same problem |
In my case, I rm'd |
After rebooting a server running Consul agent (in server mode), attempting to restart Consul's agent with its previously used parameters throws these errors:
==> Starting Consul agent...
==> Error starting agent: unexpected end of JSON input
Looking in the directories: <consul_data_dir>/checks & <consul_data_dir>/services, I see several files with size 0.
The server's reboot occurred at the same time that we registered health checks to Consul, so I believe that the files are corrupt (empty actually) because they are not being written atomically.
Thanks,
Amir