_gdbm.error: Database needs recovery #1398

Open
HungPhann opened this issue Oct 2, 2024 · 3 comments

@HungPhann

HungPhann commented Oct 2, 2024

Describe the bug
I'm running the Flower service (flower==1.2.0) in K8s with replicas=1. The Flower database file is stored on a K8s Persistent Volume:
[Screenshot 2024-10-02 at 11 47 13 AM]

From time to time, the Flower service crashes because of the corrupted database:

celery -A main flower --address=0.0.0.0 --port=5555 --purge_offline_workers=0 --max_tasks=100000 --persistent=True --db=/var/flower/flower.db --state_save_interval=5000
Traceback (most recent call last):
  File "/usr/local/bin/celery", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/celery/__main__.py", line 15, in main
    sys.exit(_main())
  File "/usr/local/lib/python3.9/site-packages/celery/bin/celery.py", line 217, in main
    return celery(auto_envvar_prefix="CELERY")
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/flower/command.py", line 42, in flower
    flower = Flower(capp=app, options=options, **settings)
  File "/usr/local/lib/python3.9/site-packages/flower/app.py", line 62, in __init__
    self.events = events or Events(
  File "/usr/local/lib/python3.9/site-packages/flower/events.py", line 128, in __init__
    if state:
  File "/usr/local/lib/python3.9/shelve.py", line 99, in __len__
    return len(self.dict)
_gdbm.error: Database needs recovery
make: *** [Makefile:13: flower] Error 1
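
For readers following the traceback: the last two frames show that the truthiness test `if state:` on a `shelve.Shelf` falls through to `Shelf.__len__`, which asks the underlying dbm object for its length. That is why the error surfaces during startup rather than on an explicit read. A minimal sketch of that behavior, using a throwaway shelf in a temporary directory (the path and key names are made up for illustration and have nothing to do with Flower's own state layout):

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical scratch path, not Flower's real database.
    db_path = os.path.join(tmp_dir, "state")
    with shelve.open(db_path, "c") as state:
        # bool(shelf) has no __bool__ to call, so Python uses
        # Shelf.__len__, which returns len() of the backing dbm object.
        empty_is_truthy = bool(state)
        state["events"] = {"count": 1}
        filled_is_truthy = bool(state)

print(empty_is_truthy, filled_is_truthy)
```

With a corrupted gdbm file, that same length lookup is presumably where `_gdbm.error` is raised, matching the `shelve.py` frame in the traceback.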

Expected behavior
Database corruption should not happen, since the database file is accessed only by the Flower service.

HungPhann added the bug label Oct 2, 2024

@timorickli

I have the same issue. Does anybody know what causes it?

@erwannbst

Same issue here

@HungPhann

I couldn't find a way to avoid this issue, so I run a sidecar container that periodically checks the database file and, if needed, repairs it with gdbmtool.

"""Checking and recovering Flower's db corruption"""

import logging
import shelve
import subprocess
import time

import _gdbm

DB_PATH = "/var/flower/flower.db"

# The script logs through a module-level logger, so configure one here.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def check_flower_db(path: str) -> bool:
    """
    Check Flower's database for corruption.

    :return: False if the database needs to be recovered, True otherwise
    """

    try:
        with shelve.open(path, "r") as db_file:
            list(db_file.keys())     # Accessing the keys as a basic check
        logger.info(f"Database '{path}' is valid and not corrupted.")
        return True
    except _gdbm.error as exc:
        if "Database needs recovery" in str(exc):
            logger.error(f"Database '{path}' is corrupted and needs recovery.")
            return False

        # Other gdbm errors are logged but do not trigger a recovery.
        logger.error(f"Database '{path}' encountered a gdbm error: {exc}")
        return True
    except Exception as exc:    # pylint: disable=broad-exception-caught
        logger.error(f"An error occurred while checking the database '{path}': {exc}")
        return True


def recover_flower_db(path: str) -> None:
    """Recover Flower's database by sending a recover command to gdbmtool."""

    # Equivalent to: echo "recover verbose summary" | gdbmtool <path>
    subprocess.run(
        ["gdbmtool", path],
        input="recover verbose summary\n",
        text=True,
        check=False,
    )


if __name__ == "__main__":
    logger.info(f"Checking database: {DB_PATH}")

    while True:
        if not check_flower_db(path=DB_PATH):
            recover_flower_db(path=DB_PATH)

        time.sleep(60)
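
A possible variation on the check above, for setups where gdbmtool is not available in the image: run a one-shot check before Flower starts (for example from an init step) and move an unreadable file out of the way instead of repairing it. This is only a sketch under assumptions. The function names are hypothetical, it discards the persisted state rather than recovering it, and it assumes Flower creates a fresh database when the file is missing, which should be verified first. It catches `dbm.error`, the tuple of exceptions the standard library's dbm backends can raise, so it does not depend on the private `_gdbm` module.

```python
import dbm
import os
import shelve
import time


def db_is_readable(path: str) -> bool:
    """Return True if the shelf at `path` opens read-only and can be sized."""
    try:
        with shelve.open(path, "r") as shelf:
            len(shelf)  # the same length lookup that fails in the traceback
        return True
    except dbm.error:  # dbm.error is a tuple of exception classes
        return False


def quarantine_if_unreadable(path: str) -> bool:
    """Rename an unreadable database file so a fresh one can be created.

    Returns True if a file was moved, False if nothing was done.
    """
    if db_is_readable(path) or not os.path.exists(path):
        return False
    # Keep the damaged file for later inspection instead of deleting it.
    backup = f"{path}.corrupt-{int(time.time())}"
    os.replace(path, backup)
    return True
```

Calling `quarantine_if_unreadable("/var/flower/flower.db")` before launching Flower would trade the stored state for a clean start; the renamed file could still be passed to gdbmtool afterwards.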
