Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible leak when exception is raised in inner coroutine #3346

Open
ian-h-chamberlain opened this issue Nov 8, 2023 · 1 comment
Open
Labels

Comments

@ian-h-chamberlain
Copy link

ian-h-chamberlain commented Nov 8, 2023

Hi, I have been debugging a memory leak in an application that uses Tornado and I was able to narrow down the behavior to what I believe is a representative minimum example, based on an existing unit test.

Edit:

  • Reproducing with CPython 3.9.16 on macOS, although the original issue was on Linux with CPython 3.6.8 (I know, super old! but hopefully any workaround would still apply).
  • Tested against master as of a6dfd70 but from what I can tell this reproduces very far back to old versions of Tornado (e.g. 4.5+)

Possibly related, from what I can gather (although closed for a long time now): #1872 #2229

I have a couple questions:

  1. Is this expected behavior / is there some reason I'm missing for why a reference cycle forms here? I don't have a ton of experience with Tornado directly so maybe this is just a side effect of how the IOLoop is implemented or something, and it's expected that GC is needed to clean up in this scenario.
  2. Is there a "correct" workaround that would help free the exception / traceback, etc.? I can del the local variables, but my fear is that in that case the remaining stack frames, tracebacks etc. would stick around, possibly resulting in other leaked variables that aren't deleted (particularly if the exception is bubbled through several coroutines in a more complicated example).

Here is the test I'm reproducing with, adapted from the existing one in gen_test.py:

    @skipNotCPython
    @unittest.skipIf(
        (3,) < sys.version_info < (3, 6), "asyncio.Future has reference cycles"
    )
    def test_coroutine_refcounting(self):
        # On CPython, tasks and their arguments should be released immediately
        # without waiting for garbage collection.
        @gen.coroutine
        def raise_exc():
            yield
            raise ValueError("Some error")

        @gen.coroutine
        def inner():
            class Foo(object):
                pass

            local_var = Foo()
            self.local_ref = weakref.ref(local_var)

            try:
                yield raise_exc()
            except ValueError:
                # del local_var # <- this works to free the variable
                pass

        @gen.coroutine
        def inner2():
            yield inner()

            # Error! There is a still a strong reference to the local variable via
            # the ValueError traceback -> `inner` frame, but why is that?
            self.assertIsNone(self.local_ref())

        self.io_loop.run_sync(inner2, timeout=3)

        self.assertIsNone(self.local_ref())
        self.finished = True

Thanks!

@bdarnell
Copy link
Member

Is this expected behavior / is there some reason I'm missing for why a reference cycle forms here? I don't have a ton of experience with Tornado directly so maybe this is just a side effect of how the IOLoop is implemented or something, and it's expected that GC is needed to clean up in this scenario.

I wouldn't say it's "expected" (I do treat this kind of thing as a bug when I find it), but it's a fairly common consequence of the way the coroutine decorator and Runner work. It's reimplementing (in python) a lot of things that the python interpreter does, and especially on the exception-handling path it's really easy to get reference cycles through tracebacks. The answer is usually to del a local variable somewhere in tornado.gen.Runner.

FWIW the situation is improved with native (async def) coroutines, although I think the issue is still more likely to occur than in normal synchronous code.

Is there a "correct" workaround that would help free the exception / traceback, etc.?

So far we've always been able to fix these kinds of issues by fixing the coroutine runner so no workaround is needed in the application. I'd have to dig into this to see where exactly the issue is but with a reproducible test case it shouldn't be too hard. The relatively-new circlerefs_test.py is how I diagnose these issues and keep them from recurring.

@bdarnell bdarnell added the gen label Jun 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants