Can't auto resume after no space error #8217

Closed
karelrooted opened this issue Apr 22, 2021 · 6 comments
@karelrooted

karelrooted commented Apr 22, 2021

The DB can't auto resume after a no space error when that error is triggered only by db_->Write, without any other background action (compaction/flush).

Expected behavior

The DB auto resumes once the disk has free space again.

Actual behavior

The DB does not auto resume.

Steps to reproduce the behavior

  1. On a 40G disk, fill 39G with a placeholder file (ocupy_file).
  2. Start a db->Write loop to trigger the no space error.
  3. Remove ocupy_file. The DB does not auto resume in the following scenario: the no space error is triggered only by db_->Write, without any other background action (compaction/flush). I hit this while debugging the auto resume feature after no space errors; it happened 2 times in 30 test runs.

@anand1976
Contributor

Do you use 2PC (options.allow_2pc)? And is it able to auto resume when the error is triggered by flush/compaction?

@karelrooted
Author

options.allow_2pc is left at its default (false).

If EventListener::OnBackgroundError reports a no space error with any other reason (kMemTable, kCompaction, kFlush), the DB is able to auto resume.

When the non-resuming scenario happens, EventListener::OnBackgroundError reports only a no space error with reason kWriteCallback.

@karelrooted
Author

When the bug is triggered, the following loop runs in db_impl_write.cc:

  1. line 67: DBImpl::WriteImpl
  2. line 231: status = PreprocessWrite(); status is the no space error
  3. line 254: the if (status.ok()) {} block is skipped because status is not OK, so the code that assigns a value to io_s never runs
  4. line 419: WriteStatusCheck uses the old SetBGError(Status) function, which does not trigger auto resume
  5. the next write goes back to step 1, and the loop continues
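
The loop above can be sketched as a small Python model. This is a simplified, hypothetical illustration, not RocksDB code: `ModelDB` and its methods are made-up stand-ins for `PreprocessWrite()`, the WAL write that assigns `io_s`, and the two `SetBGError` overloads. Following this report, it assumes that only the IOStatus overload starts auto resume, and that once the DB is stopped, PreprocessWrite returns the stored error.

```python
class ModelDB:
    """Hypothetical model of the DBImpl::WriteImpl loop described above.

    Not RocksDB code: every name here is an illustrative stand-in.
    """

    def __init__(self):
        self.disk_full = True
        self.bg_error = None        # stored background error (DB stopped if set)
        self.resume_attempts = 0    # how many times auto resume was started
        self.status_path_calls = 0  # calls that went through the old Status path

    def preprocess_write(self):
        # Stand-in for PreprocessWrite(): a stopped DB returns its stored error.
        return self.bg_error if self.bg_error else "OK"

    def write_wal(self):
        # Stand-in for the WAL write, the only place io_s gets a value.
        return "NoSpace" if self.disk_full else "OK"

    def set_bg_error_io_status(self, io_s):
        # Stand-in for the IOStatus overload: stores the error and starts auto
        # resume, which clears the error only if space is available.
        self.bg_error = io_s
        self.resume_attempts += 1
        if not self.disk_full:
            self.bg_error = None

    def set_bg_error_status(self, status):
        # Stand-in for the old Status overload reached via WriteStatusCheck:
        # per the report, it does not start auto resume.
        self.status_path_calls += 1

    def write_impl(self):
        status = self.preprocess_write()    # step 2
        io_s = None
        if status == "OK":                  # step 3: io_s is only set here
            io_s = self.write_wal()
            status = io_s
        if status != "OK":                  # step 4
            if io_s is not None:
                self.set_bg_error_io_status(io_s)
            else:
                self.set_bg_error_status(status)
        return status


if __name__ == "__main__":
    db = ModelDB()
    print(db.write_impl(), db.resume_attempts)  # first write: WAL fails, 1 attempt
    print(db.write_impl(), db.resume_attempts)  # later writes take the Status path
    db.disk_full = False                        # rm ocupy_file
    print(db.write_impl(), db.resume_attempts)  # still NoSpace, still 1 attempt
```

In this model, only the first write reaches the IOStatus path, while the disk is still full, so its single resume attempt fails. Every later write fails earlier, in `preprocess_write`, leaves `io_s` unset, and never starts another attempt. Calling `set_bg_error_io_status` after space is freed does clear the error, which is roughly how the model would reflect the report's observation that other error reasons do auto resume.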

@anand1976
Contributor

I can't think of why no space error with reason kWriteCallback would not trigger auto resume. Can you try to step through the code using gdb? ErrorHandler::SetBGError would be a good place to start.

@karelrooted
Author

Stepping through the code with gdb, the only clue is the one in my previous comment: the code never reaches the new SetBGError(io_status) and only calls the old SetBGError(status). The old SetBGError(status) triggers auto resume only the first time an error of a given severity is raised. In my test runs, the first kHardError is raised while the disk is still full, so the auto resume attempt fails. The second time kHardError is raised, SetBGError returns the error directly without trying to auto resume.
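
The early return described above can be illustrated with a minimal Python model. It is a hypothetical simplification based only on this comment, not the actual ErrorHandler::SetBGError implementation: `Severity`, `ModelErrorHandler`, and `set_bg_error` are made-up names, and "starting auto resume" is reduced to a counter.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Simplified, hypothetical subset of error severities for the model.
    NO_ERROR = 0
    SOFT_ERROR = 1
    HARD_ERROR = 2


class ModelErrorHandler:
    """Hypothetical model of the old SetBGError(status) behavior described above."""

    def __init__(self):
        self.bg_severity = Severity.NO_ERROR
        self.resume_attempts = 0

    def set_bg_error(self, severity):
        # An error that is not more severe than the stored one returns early:
        # the stored error is kept and no new auto resume is started.
        if severity <= self.bg_severity:
            return False
        self.bg_severity = severity
        self.resume_attempts += 1  # stand-in for kicking off auto resume
        return True


if __name__ == "__main__":
    handler = ModelErrorHandler()
    # First kHardError: auto resume starts (in the report it fails, disk full).
    print(handler.set_bg_error(Severity.HARD_ERROR), handler.resume_attempts)
    # Second kHardError, even after space is freed: early return, no new attempt.
    print(handler.set_bg_error(Severity.HARD_ERROR), handler.resume_attempts)
```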

@anand1976
Contributor

This issue might be fixed in #8376. Please try it, and reopen this issue if it is not resolved.
