Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pymc3.model.Context is not thread-safe #1552

Closed
burnpanck opened this issue Nov 25, 2016 · 1 comment
Closed

pymc3.model.Context is not thread-safe #1552

burnpanck opened this issue Nov 25, 2016 · 1 comment

Comments

@burnpanck
Copy link
Contributor

When I first saw the context manager redesign in PyMC3 of what was a module level global in PyMC2, I welcomed this as a step forward in library compatibility by not relying on global state. However, the implementation is in fact the opposite: It relies on hidden global state, making the situation worse: The context manager suggests that the global activation of the model is scoped to the code within the context manager, but in fact it is global, only scoped in time! While for the average interactive statistician, this will probably not be much of a problem, but it makes the incorporation of PyMC3(-models) into libraries difficult, because a library should never modify global state.

I think the solution is simple: Context should use thread-local storage to store it's active model stack, however, I am just discovering PyMC3, so I could be missing something. Still, unless there are objections by more seasoned developers, I'd volunteer to write a PR for this.

@fonnesbeck
Copy link
Member

I was not aware of this, so thanks for pointing it out. We would obviously welcome a PR to address this.

burnpanck added a commit to burnpanck/pymc3 that referenced this issue Nov 28, 2016
burnpanck added a commit to burnpanck/pymc3 that referenced this issue Nov 29, 2016
twiecki pushed a commit that referenced this issue Nov 30, 2016
* placed context stack inside thread-local data space

* fixed thread-local context manager, and added a regression test

* fixed docstring of #1552 regression test
ColCarroll pushed a commit to ColCarroll/pymc3 that referenced this issue Dec 2, 2016
* placed context stack inside thread-local data space

* fixed thread-local context manager, and added a regression test

* fixed docstring of pymc-devs#1552 regression test
ColCarroll pushed a commit to ColCarroll/pymc3 that referenced this issue Dec 3, 2016
* placed context stack inside thread-local data space

* fixed thread-local context manager, and added a regression test

* fixed docstring of pymc-devs#1552 regression test
thomasaarholt added a commit to thomasaarholt/pymc that referenced this issue Aug 23, 2024
We create a global instance of this model within this module.

We inherit from threading.local to ensure thread safety when working
with models on multiple threads. See pymc-devs#1552 for the reasoning. This is
already tested in `test_thread_safety`.
thomasaarholt added a commit to thomasaarholt/pymc that referenced this issue Aug 23, 2024
We create a global instance of it within this module, which is similar
to how it worked before, where a `context_class` attribute was attached
to the Model class.

We inherit from threading.local to ensure thread safety when working
with models on multiple threads. See pymc-devs#1552 for the reasoning. This is
already tested in `test_thread_safety`.
thomasaarholt added a commit to thomasaarholt/pymc that referenced this issue Aug 28, 2024
We create a global instance of it within this module, which is similar
to how it worked before, where a `context_class` attribute was attached
to the Model class.

We inherit from threading.local to ensure thread safety when working
with models on multiple threads. See pymc-devs#1552 for the reasoning. This is
already tested in `test_thread_safety`.
thomasaarholt added a commit to thomasaarholt/pymc that referenced this issue Oct 4, 2024
We create a global instance of it within this module, which is similar
to how it worked before, where a `context_class` attribute was attached
to the Model class.

We inherit from threading.local to ensure thread safety when working
with models on multiple threads. See pymc-devs#1552 for the reasoning. This is
already tested in `test_thread_safety`.
thomasaarholt added a commit to thomasaarholt/pymc that referenced this issue Oct 9, 2024
We create a global instance of it within this module, which is similar
to how it worked before, where a `context_class` attribute was attached
to the Model class.

We inherit from threading.local to ensure thread safety when working
with models on multiple threads. See pymc-devs#1552 for the reasoning. This is
already tested in `test_thread_safety`.
thomasaarholt added a commit to thomasaarholt/pymc that referenced this issue Oct 10, 2024
We create a global instance of it within this module, which is similar
to how it worked before, where a `context_class` attribute was attached
to the Model class.

We inherit from threading.local to ensure thread safety when working
with models on multiple threads. See pymc-devs#1552 for the reasoning. This is
already tested in `test_thread_safety`.
ricardoV94 pushed a commit that referenced this issue Oct 10, 2024
* Type get_context correctly

get_context returns an instance of a Model, not a ContextMeta object
We don't need the typevar, since we don't use it for anything special

* Import from future to use delayed evaluation of annotations

All of these are supported on python>=3.9.

* New ModelManager class for managing model contexts

We create a global instance of it within this module, which is similar
to how it worked before, where a `context_class` attribute was attached
to the Model class.

We inherit from threading.local to ensure thread safety when working
with models on multiple threads. See #1552 for the reasoning. This is
already tested in `test_thread_safety`.

* Model class is now the context manager directly

* Fix type of UNSET in type definition

UNSET is the instance of the _UnsetType type.
We should be typing the latter here.

* Set model parent in init rather than in __new__

We use the new ModelManager.parent_context property to reliably set any
parent context, or else set it to None.

* Replace get_context in metaclass with classmethod

We set this directly on the class as a classmethod, which is clearer
than going via the metaclass.

* Remove get_contexts from metaclass

The original function does not behave as I expected.
In the following example I expected that it would return only the final
model, not root.

This method is not used anywhere in the pymc codebase, so I have dropped
it from the codebase. I originally included the following code to replace
it, but since it is not used anyway, it is better to remove it.

```python`
@classmethod
def get_contexts(cls) -> list[Model]:
    """Return a list of the currently active model contexts."""
    return MODEL_MANAGER.active_contexts
```

Example for testing behaviour in current main branch:
```python
import pymc as pm

with pm.Model(name="root") as root:
    print([c.name for c in pm.Model.get_contexts()])
    with pm.Model(name="first") as first:
        print([c.name for c in pm.Model.get_contexts()])
    with pm.Model(name="m_with_model_None", model=None) as m_with_model_None:
        # This one doesn't make much sense:
        print([c.name for c in pm.Model.get_contexts()])
```

* Simplify ContextMeta

We only keep the __call__ method, which is necessary to keep the
model context itself active during that model's __init__.

* Type Model.register_rv for for downstream typing

In pymc/distributions/distribution.py, this change allows the type
checker to infer that `rv_out` can only be a TensorVariable.

Thanks to @ricardoV94 for type hint on rv_var.

* Include np.ndarray as possible type for coord values

I originally tried numpy's ArrayLike, replacing Sequence entirely, but then I realized
that ArrayLike also allows non-sequences like integers and floats.

I am not certain if `values="a string"` should be legal. With the type hint sequence, it is.
Might be more accurate, but verbose to use `list | tuple | set | np.ndarray | None`.

* Use function-scoped new_dims to handle type hint varying throughout function

We don't want to allow the user to pass a `dims=[None, None]` to our function, but current behaviour
set `dims=[None] * N` at the end of `determine_coords`.

To handle this, I created a `new_dims` with a larger type scope which matches
the return type of `dims` in `determine_coords`.

Then I did the same within def Data to support this new type hint.

* Fix case of dims = [None, None, ...]

The only case where dims=[None, ...] is when the user has passed dims=None. Since the user passed dims=None,
they shouldn't be expecting any coords to match that dimension. Thus we don't need to try to add any
more coords to the model.

* Remove unused hack
mkusnetsov pushed a commit to mkusnetsov/pymc that referenced this issue Oct 26, 2024
* Type get_context correctly

get_context returns an instance of a Model, not a ContextMeta object
We don't need the typevar, since we don't use it for anything special

* Import from future to use delayed evaluation of annotations

All of these are supported on python>=3.9.

* New ModelManager class for managing model contexts

We create a global instance of it within this module, which is similar
to how it worked before, where a `context_class` attribute was attached
to the Model class.

We inherit from threading.local to ensure thread safety when working
with models on multiple threads. See pymc-devs#1552 for the reasoning. This is
already tested in `test_thread_safety`.

* Model class is now the context manager directly

* Fix type of UNSET in type definition

UNSET is the instance of the _UnsetType type.
We should be typing the latter here.

* Set model parent in init rather than in __new__

We use the new ModelManager.parent_context property to reliably set any
parent context, or else set it to None.

* Replace get_context in metaclass with classmethod

We set this directly on the class as a classmethod, which is clearer
than going via the metaclass.

* Remove get_contexts from metaclass

The original function does not behave as I expected.
In the following example I expected that it would return only the final
model, not root.

This method is not used anywhere in the pymc codebase, so I have dropped
it from the codebase. I originally included the following code to replace
it, but since it is not used anyway, it is better to remove it.

```python`
@classmethod
def get_contexts(cls) -> list[Model]:
    """Return a list of the currently active model contexts."""
    return MODEL_MANAGER.active_contexts
```

Example for testing behaviour in current main branch:
```python
import pymc as pm

with pm.Model(name="root") as root:
    print([c.name for c in pm.Model.get_contexts()])
    with pm.Model(name="first") as first:
        print([c.name for c in pm.Model.get_contexts()])
    with pm.Model(name="m_with_model_None", model=None) as m_with_model_None:
        # This one doesn't make much sense:
        print([c.name for c in pm.Model.get_contexts()])
```

* Simplify ContextMeta

We only keep the __call__ method, which is necessary to keep the
model context itself active during that model's __init__.

* Type Model.register_rv for for downstream typing

In pymc/distributions/distribution.py, this change allows the type
checker to infer that `rv_out` can only be a TensorVariable.

Thanks to @ricardoV94 for type hint on rv_var.

* Include np.ndarray as possible type for coord values

I originally tried numpy's ArrayLike, replacing Sequence entirely, but then I realized
that ArrayLike also allows non-sequences like integers and floats.

I am not certain if `values="a string"` should be legal. With the type hint sequence, it is.
Might be more accurate, but verbose to use `list | tuple | set | np.ndarray | None`.

* Use function-scoped new_dims to handle type hint varying throughout function

We don't want to allow the user to pass a `dims=[None, None]` to our function, but current behaviour
set `dims=[None] * N` at the end of `determine_coords`.

To handle this, I created a `new_dims` with a larger type scope which matches
the return type of `dims` in `determine_coords`.

Then I did the same within def Data to support this new type hint.

* Fix case of dims = [None, None, ...]

The only case where dims=[None, ...] is when the user has passed dims=None. Since the user passed dims=None,
they shouldn't be expecting any coords to match that dimension. Thus we don't need to try to add any
more coords to the model.

* Remove unused hack
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants