Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace worker use of threading.local() with ContextVar #5485

Open
gjoseph92 opened this issue Nov 1, 2021 · 6 comments · May be fixed by #5517
Open

Replace worker use of threading.local() with ContextVar #5485

gjoseph92 opened this issue Nov 1, 2021 · 6 comments · May be fixed by #5517
Labels
enhancement Improve existing functionality or make things work better

Comments

@gjoseph92
Copy link
Collaborator

Context variables are now the recommended way to manage thread/context-local state:

Context managers that have state should use Context Variables instead of threading.local() to prevent their state from bleeding to other code unexpectedly, when used in concurrent code.
https://docs.python.org/3/library/contextvars.html

They work for threads, but as a bonus, also work for async code.

In many places in the worker (and even more in distributed in general), we use threading.local() variables currently. However, workers also support being used in an async mode (used in most tests), and increasingly support running async functions (#5151). The fact that thread-local variables can have the wrong value when used in an async context means there are likely lots of bugs around async mode. #4959 is just one example.

If we want to support async mode properly, switching Worker use of thread-local variables to ContextVars would be a good place to start. These sorts of bugs are generally very confusing and tedious to track down. It might be worth getting ahead of these bugs instead of repeatedly hunting them down.

As a bonus, we're inconsistent about ContextVars vs thread-local currently (for example, get_client checks a ContextVar, but get_worker checks a thread-local var). Consolidating on one interface might make future development smoother.

cc @jcrist @crusaderky @fjetter

@fjetter
Copy link
Member

fjetter commented Nov 2, 2021

I often stumbled over incorrect async detection when using a Client due to this. This feels like it should be a more robust way of working with it?

@jrbourbeau jrbourbeau added the enhancement Improve existing functionality or make things work better label Nov 2, 2021
@jrbourbeau
Copy link
Member

Thanks for the suggestion @gjoseph92. It sounds like switching to using context variables will resolve some tricky issues that already exist and help prevent future ones from popping up. The fact that it's also explicitly recommended over threading.local() in the Python docs is a good signal too

@crusaderky
Copy link
Collaborator

+1 for replacing thread-local variables with contetxvars wherever possible.

gjoseph92 added a commit to gjoseph92/distributed that referenced this issue Nov 16, 2021
Progress towards dask#5485.

This was surprisingly easy and actually seems to work. Not sure yet what it breaks.

I was pleasantly surprised to find that Tornado `loop.add_callback`s, `PeriodicCallback`s, etc. all play correctly with asyncio's [built-in contextvar management](https://www.python.org/dev/peps/pep-0567/#asyncio). With that, just setting the contextvar during `__init__` and `start` probably catches almost all cases, because all the long-running callbacks/coroutines (including comms) will inherit the context that's set when they're created. Where else should we add this `as_current_worker` decorator?

This gives me confidence we'll be able to use the same pattern for a single current client contextvar as mentioned in dask#5467 (comment).
@gjoseph92 gjoseph92 linked a pull request Nov 16, 2021 that will close this issue
3 tasks
@fjetter
Copy link
Member

fjetter commented May 25, 2022

See also #6107 for a summary of an earlier attempt to introduce contextvars specifically for usage with stimulus_ids

@fjetter
Copy link
Member

fjetter commented Jun 14, 2022

FYI This is soft-blocked by us still using tornado callbacks. Tornado callbacks are not dealing with ctxvars properly as pointed out in #6107 and we'd need to build our own handling for this. A refactoring like

@gjoseph92
Copy link
Collaborator Author

I'm surprised to hear that. I thought I found the opposite was the case in #5517.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improve existing functionality or make things work better
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants