Hitting the limit of open files on Linux when using selectors #117

Open
pjurgielewicz opened this issue Jul 31, 2023 · 6 comments

Comments

@pjurgielewicz

pjurgielewicz commented Jul 31, 2023

First of all, congratulations on the acceptance of PEP 703, @colesbury, and thank you for all the effort you are putting into it!

Recently I had trouble tracking down a problem that occurs on Linux with nogil 3.9. Long story short: any code using the selectors module (only on nogil CPython on Linux) must unregister and close the selector once the instance is no longer needed; otherwise the app quickly hits the system limit on open files and an OSError is raised.

One example of vulnerable code is the inputimeout package. Please compare the original code:

https://github.com/johejo/inputimeout/blob/master/inputimeout/inputimeout.py

with my slightly modified fork:

https://github.com/pjurgielewicz/inputimeout/blob/master/inputimeout/inputimeout.py
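The fork's code is not reproduced here. As a minimal sketch of what explicit cleanup can look like, assuming a hypothetical helper named wait_readable and a pipe standing in for stdin, the selector can be used as a context manager so that close() runs as soon as the wait is over:

```python
import os
import selectors


def wait_readable(fileobj, timeout):
    """Hypothetical helper: True if fileobj becomes readable within timeout seconds."""
    # Leaving the with-block calls sel.close(), which releases the selector's
    # own descriptor (the epoll fd on Linux) right away instead of leaving
    # that to the garbage collector.
    with selectors.DefaultSelector() as sel:
        sel.register(fileobj, selectors.EVENT_READ)
        events = sel.select(timeout)
        sel.unregister(fileobj)
    return bool(events)


# Usage: a pipe stands in for stdin so the example needs no user input.
read_fd, write_fd = os.pipe()
empty_result = wait_readable(read_fd, 0.01)   # nothing written yet
os.write(write_fd, b"x")
ready_result = wait_readable(read_fd, 0.01)   # one byte is waiting
os.close(read_fd)
os.close(write_fd)
print(empty_result, ready_result)
```

With this pattern the number of open descriptors should stay flat no matter how many times the helper is called or from which thread.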

Exactly the same behaviour occurs with the psutil package (https://pypi.org/project/psutil/). I suspect these errors result from a design flaw in the module rather than a problem with nogil as such, but this finding may speed up the development and testing of packages that will be ported in the future.

Note that this problem happens only when both conditions are met: nogil 3.9 (I tried the newest version and f7e45d6) and Linux (latest Ubuntu and Debian). Regular CPython 3.9 on Linux is not affected (everything also works on Windows).

UPDATE: I also tested this with nogil 3.12 on Linux, with the same behaviour. Here is the MWE:

from inputimeout import inputimeout as inp
from threading import Thread

def fun():
    for i in range(2000):
        try:
            # The problem does not depend on how frequently input is checked;
            # the program also breaks when timeout is set to 1, but the test takes much longer
            inp(f"{i}", timeout=0.01)
        except TimeoutError:
            pass

# Running in a separate thread breaks around the 1000th iteration
# (depends on the user's max open files limit; mine is set to 1024)
t = Thread(target=fun)
t.start()
t.join()

# Running in the main thread, however, does not break, whether or not the selector cleanup is performed
# fun()

Congratulations once again, you are creating the future!

@davfsa

davfsa commented Jul 31, 2023

Hey @pjurgielewicz!

I don't have much to do with this project, I just like to keep an eye on it as I find it super interesting and the writeup to be amazing.

I just wanted to inform you that it might be worth testing this issue against the 3.12 version of nogil, which is stored at https://github.com/colesbury/nogil-3.12, as that is more up to date and your issue might be fixed there.

@pjurgielewicz
Author

Hi @davfsa,

I am aware of nogil 3.12, but as @colesbury says, its sole purpose was to get performance metrics against the newest regular CPython. That is why there is no support for modules like NumPy and many others required by the software we are developing at the Dose-3D project.

Nevertheless, I built nogil 3.12 and reran the test: the same behaviour is observed.

I updated the original issue with an MWE. Interestingly, it breaks only when the selectors are not freed explicitly and are created by child threads.

@colesbury
Owner

Hi @pjurgielewicz, I can reproduce the issue when I lower the file descriptor limit. It seems like this is an issue with reference cycles. Adding a gc.collect() after the inp() call avoids the error, although it runs much more slowly.
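A rough sketch of that workaround, with a stand-in for inp() so it runs without the inputimeout package or user input. The names poll_once, fun, and collect_every are illustrative assumptions. Collecting only every few iterations is a variation on the idea rather than something tested in this thread; it trades a bounded number of pending descriptors for fewer GC passes:

```python
import gc
import selectors
import threading


def poll_once():
    # Stand-in for inp(): creates a selector and never closes it,
    # so its descriptor is released only when the object is collected.
    selectors.DefaultSelector()


def fun(iterations, collect_every=1):
    """Run the loop, forcing a GC pass every collect_every iterations.

    Returns the number of forced collections.
    """
    collections = 0
    for i in range(iterations):
        poll_once()
        if (i + 1) % collect_every == 0:
            gc.collect()
            collections += 1
    return collections


# Usage: run in a child thread, as in the MWE.
t = threading.Thread(target=fun, args=(200, 50))
t.start()
t.join()
```

With collect_every=1 this matches calling gc.collect() after every inp() call; larger values keep at most roughly collect_every unclosed selectors alive between collections, which has to stay well below the open-files limit.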

The nogil forks tend to trigger the GC less frequently, so if there is a reference cycle involving file descriptors, the process is more likely to hit the system limit. I'm not sure what creates the cycle. The inputimeout package looks fine, so maybe it's something in the selectors package.

@colesbury
Owner

I've pushed a commit that avoids the reference cycle when creating selectors.

@Evan6998

Hi @colesbury !
I am curious how your commit avoids the reference cycle. I notice that, in the original code, self._map was set to None when close() was called on a selector. Doesn't this release the reference?

@colesbury
Owner

Hi @Evan6998 - yes, if selector.close() were called, the reference cycle would be broken. The problem was that inputimeout and psutil did not call close(). The commit avoids the reference cycle by removing the reference from the selector to the _SelectorMap (it removed self._map).
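To illustrate the mechanism only, here is a toy model rather than the actual selectors code. MapView and CyclicResource are hypothetical stand-ins for _SelectorMap and a selector, and the sketch assumes a reference-counting CPython with the automatic GC switched off for the duration of the demo:

```python
import gc


class MapView:
    """Hypothetical stand-in for _SelectorMap: holds a reference to its owner."""

    def __init__(self, owner):
        self._owner = owner


class CyclicResource:
    """Hypothetical stand-in for a selector that stores its map on self."""

    def __init__(self, log):
        self._log = log
        self._map = MapView(self)  # self -> _map -> self: a reference cycle

    def close(self):
        self._map = None  # breaks the cycle

    def __del__(self):
        # Stands in for releasing the underlying file descriptor.
        self._log.append("released")


def released_flags(call_close):
    """Return (released before a GC pass, released after a GC pass)."""
    log = []
    gc.disable()  # keep the automatic collector out of the demo
    try:
        res = CyclicResource(log)
        if call_close:
            res.close()
        del res
        before = bool(log)
        gc.collect()
        after = bool(log)
    finally:
        gc.enable()
    return before, after


print("without close():", released_flags(False))
print("with close():   ", released_flags(True))
```

Without close(), the object survives until the cycle collector runs, so the descriptor it stands for stays open for as long as the GC is not triggered. With close(), or with no back-reference at all, it is released as soon as the last reference goes away.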

4 participants