
Handle case where background thread dies due to out of memory #92

Open
baragona opened this issue Mar 26, 2022 · 3 comments

@baragona

I suspect I hit a case where an out-of-memory condition on Linux triggered the OOM killer, and the victim it picked may have contained a background refresher from this library.

After the OOM condition was resolved, many processes seemingly held the same locks, which ought to be impossible (unless the background refresher died, that is).

Note that I'm not completely sure this is how it happened.

Does anyone know whether the Linux OOM killer will behave this way, and whether Python will behave this way (letting the rest of the process continue if a thread dies)?

I'm not sure how one would even solve this in this library, though: how can you react to a thread crashing without using another thread to check on it? That wouldn't solve anything.

@ionelmc
Owner

ionelmc commented Mar 26, 2022

The OOM killer will simply send a SIGKILL to processes that use too much memory. That means the process is aborted abruptly.

Threads share the same address space as the main thread, so everything gets killed together.

Perhaps your actual problem is the background refresher not working properly. Are you using anything that can affect how Python threads work (like eventlet or gevent)?
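
If you're not sure, both libraries can report whether they have monkey-patched the threading machinery. This is just a diagnostic sketch to run inside the affected process (neither package may be installed, hence the guards):

```python
try:
    from gevent import monkey
    print("gevent patched threading:", monkey.is_module_patched("threading"))
except ImportError:
    pass

try:
    from eventlet import patcher
    print("eventlet patched thread:", patcher.is_monkey_patched("thread"))
except ImportError:
    pass
```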

@baragona
Author

Thanks for the info! Not using anything like eventlet or gevent here.

I looked at the error logging and I see a bunch of these from when the problem occurred:

        raise NotAcquired("Lock %s is not acquired or it already expired." % self._name)

So maybe an alternate theory is that if a refresher gets blocked for too long (due to high load or swapping), some other process may grab the lock.

It might be nice to detect this and optionally raise an error in the thread that thinks it holds a lock but doesn't.
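
For illustration, here is a minimal sketch of that kind of check, written against redis-py directly rather than this library's internals. The key name, the stored token, and the `assert_still_held` helper are hypothetical; the idea is simply to verify that the value at the lock key still matches the token we wrote at acquire time.

```python
import uuid

from redis import StrictRedis


class LockLost(Exception):
    """Raised when the lock key no longer holds our token."""


conn = StrictRedis()
token = str(uuid.uuid4())

# Acquire: SET key value NX EX, the usual single-key Redis lock recipe.
acquired = conn.set("lock:my-resource", token, nx=True, ex=60)


def assert_still_held():
    # If the refresher stalled and the key expired, another process may have
    # overwritten the key with its own token (or the key may be gone entirely).
    current = conn.get("lock:my-resource")
    if current is None or current.decode() != token:
        raise LockLost("lock:my-resource is not held by this process anymore")
```

Calling something like `assert_still_held()` before the critical work would at least turn a silently lost lock into a loud error.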

@ionelmc
Owner

ionelmc commented Mar 26, 2022

Well, in theory there could be some platform-specific handling (for example, on Linux another thread could check that the lock is still valid and SIGALRM the main thread to interrupt it and raise the exception you want), but there are so many corner cases and problems that it's not worth doing at all.
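
Just to make the idea (and its fragility) concrete, a rough Linux-only sketch of what that might look like; `still_holds_lock` is a hypothetical placeholder, and this says nothing about restartable syscalls, C extensions holding the GIL, or work happening outside the main thread:

```python
import os
import signal
import threading
import time


class LockLost(Exception):
    pass


def _on_alarm(signum, frame):
    # Signal handlers run in the main thread, so this raises where the
    # lock-protected work is happening.
    raise LockLost("watchdog thread says the lock is no longer valid")


# Must be installed from the main thread.
signal.signal(signal.SIGALRM, _on_alarm)


def still_holds_lock():
    # Hypothetical placeholder: e.g. compare the token stored in Redis
    # against the one this process wrote when it acquired the lock.
    return True


def watchdog(interval=5):
    while True:
        time.sleep(interval)
        if not still_holds_lock():
            os.kill(os.getpid(), signal.SIGALRM)  # interrupt the main thread
            return


threading.Thread(target=watchdog, daemon=True).start()

try:
    while True:  # stand-in for the work done while "holding" the lock
        time.sleep(1)
except LockLost as exc:
    print("aborting:", exc)
```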

We should discuss what could eventually lead to a reproducer instead. Describe your code more. Does it really swap a lot? Do you run code that blocks the GIL (and doesn't allow the renewal thread to actually run)? Do you have any connection failures? What do you have in the logs? Perhaps configure logging to enable redis-lock's DEBUG-level output.
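
For example, something along these lines early in your program (this assumes the library logs under the `redis_lock` logger name, matching its module name; adjust if your version differs):

```python
import logging

logging.basicConfig(level=logging.INFO)
logging.getLogger("redis_lock").setLevel(logging.DEBUG)
```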
