Spontaneous "cannot verify wal state" errors #395
Comments
Found some old logs. I may be mistaken, this might be an issue that occurs on startup; I'm having a hard time telling when fly is restarting the process. Here's an example of a 10m run:
I cannot account for the 10 minute gap between
Another data point:
I see logs like this each time the fly.io VM restarts:
I'm wondering whether it's detaching the volume forcibly, causing some kind of disk issue that exacerbates the startup process. I'm gonna try to attach the fly.io disk to a debugging VM and run
Thanks for the testing, @ryanpbrewster. I'm closing this issue per the live replication announcement here: #8 (comment)
I'm running litestream in a fly.io environment using live replication (based on https://github.com/benbjohnson/litestream-read-replica-example). It works well on startup, but after a few hours it reliably gets into a broken state where it emits logs like this:
I'm invoking my app via the litestream --exec flag:
I can reliably fix the issue by restarting the container. Indeed, I am currently working around this issue by piping litestream into head -n 1000 so that it automatically exits (and is then restarted) anytime it begins spamming logs.
A bit of extra info in case that helps:
I am running litestream v0.4.0-beta.2 installed via Docker with
The output of litestream version is
The folder it's complaining about actually does exist (at least at the points in time when I am manually inspecting it):
/app/config/primary.yaml looks like
I have a replica listening from another region, and that replica's config looks like
This seems to happen mostly when the database is idle, but I can't be too confident. I've had sessions that lasted multiple hours before hitting this error, and some sessions that hit it within 10 minutes.
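For anyone wanting something a bit more targeted than the head -n 1000 workaround mentioned above, here is a rough sketch of a wrapper that only bails out when the error text from the issue title repeats many times in a row. This is not part of litestream. The names supervise and next_streak and the threshold of 20 are made up for illustration, and it assumes the error lines show up on stdout or stderr of the wrapped process.

```python
import subprocess
import sys
from typing import Sequence

# Error text taken from the issue title; the exact log format is not assumed.
ERROR_MARKER = "cannot verify wal state"


def next_streak(streak: int, line: str) -> int:
    """Return the updated count of consecutive lines containing ERROR_MARKER."""
    return streak + 1 if ERROR_MARKER in line else 0


def supervise(cmd: Sequence[str], max_consecutive: int = 20) -> int:
    """Run cmd, echo its output, and terminate it once the error repeats.

    max_consecutive is a hypothetical threshold; tune it to taste.
    Returns the child's exit code so the caller can decide to restart.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # assumption: logs may arrive on either stream
        text=True,
    )
    assert proc.stdout is not None
    streak = 0
    for line in proc.stdout:
        sys.stdout.write(line)  # pass the log line through unchanged
        streak = next_streak(streak, line)
        if streak >= max_consecutive:
            proc.terminate()  # stop the child so the platform can restart it
            break
    return proc.wait()


if __name__ == "__main__":
    # Usage: python wrapper.py <the existing litestream command and arguments>
    sys.exit(supervise(sys.argv[1:]))
```

Unlike head -n 1000, this does not force a restart after 1000 lines of healthy output, but it still relies on the container being restarted after the process exits.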