Node failed to recover after running out of disk space #2625

Closed
bowenwang1996 opened this issue May 10, 2020 · 5 comments
Labels
A-storage Area: storage and databases

Comments

@bowenwang1996
Collaborator

When I tried to recover a node after it ran out of disk space, I got:

thread 'main' panicked at 'Failed to open the database: DBError(Error { message: "Corruption: SST file is ahead of WALs" })', core/store/src/lib.rs:232:23
bowenwang1996 added the A-storage Area: storage and databases label May 10, 2020
@ailisp
Member

ailisp commented May 11, 2020

This could be seen as a RocksDB bug, but we should not have to handle the case of running out of disk space (many parts of the system stop working once the disk is full). In short, here is why it happens:

  1. RocksDB creates a new SST file.
  2. RocksDB tries to flush the WAL (write-ahead log) into the new SST file.
  3. Step 2 fails because the disk is out of space.

As a result, the SST file appears to be ahead of the WALs.
I think deleting the highest-numbered SST file and restarting the node would work.
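The suggested workaround is to remove the SST file with the largest number. The following is a minimal, hedged sketch of how one might locate that file before acting on it. It assumes RocksDB's usual naming of table files as `<number>.sst` (for example `000123.sst`) in the database directory; the function name `largest_sst` is hypothetical and not part of nearcore or the `rocksdb` crate. It deliberately only reports the file and does not delete anything, since removing an SST file discards the data in it; back up the data directory before trying the workaround.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: return the `.sst` file with the highest numeric
/// file name in `dir`, or `None` if there is none.
/// Assumes RocksDB-style names such as `000123.sst`.
fn largest_sst(dir: &Path) -> io::Result<Option<PathBuf>> {
    let mut best: Option<(u64, PathBuf)> = None;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // Skip anything that is not a `.sst` file (WAL `.log` files, MANIFEST, CURRENT, ...).
        if path.extension().and_then(|e| e.to_str()) != Some("sst") {
            continue;
        }
        // The file stem must parse as a number; leading zeros are accepted by `parse`.
        let number = match path
            .file_stem()
            .and_then(|s| s.to_str())
            .and_then(|s| s.parse::<u64>().ok())
        {
            Some(n) => n,
            None => continue,
        };
        // Keep the entry with the largest number seen so far.
        if best.as_ref().map_or(true, |(b, _)| number > *b) {
            best = Some((number, path));
        }
    }
    Ok(best.map(|(_, p)| p))
}

fn main() -> io::Result<()> {
    // Directory to inspect; defaults to the current directory.
    let dir = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    match largest_sst(Path::new(&dir))? {
        Some(p) => println!("largest SST file: {}", p.display()),
        None => println!("no SST files found in {}", dir),
    }
    Ok(())
}
```

Run it against the node's database directory (for example `cargo run -- <path-to-data-dir>`) and inspect the reported file by hand before deciding whether to move it aside.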

@bowenwang1996
Collaborator Author

@ailisp It is fine if the node crashes when it runs out of disk space, but it should not end up in an inconsistent state, i.e., we should be able to recover the node.

@ailisp
Member

ailisp commented May 12, 2020 via email

@bowenwang1996
Collaborator Author

I see. Is it a known issue or should we report it?

@ailisp
Member

ailisp commented May 13, 2020

To me it looks like this (high level, half guessed 😃):

  • RocksDB opens both the latest SST file and the WAL file. Both open successfully; since the SST file didn't exist before, a new empty file is created.
  • Writing to the SST file then fails because the disk is out of space.

It depends on whether you consider a write failure caused by running out of disk space a normal error condition that should be handled. IMO it's not; running out of disk space should be avoided in the first place.

ailisp closed this as completed May 26, 2020