
Multicore inference not working #445

Closed
dharasim opened this issue Jan 28, 2022 · 7 comments

@dharasim

First of all, thank you very much for working on this great project!

My issue is that even for very simple models, running chains on multiple cores doesn't work.

Content of test_bambi.py:

import bambi as bmb
import pandas as pd
import numpy as np

data = pd.DataFrame({
    "y": np.random.normal(size=50),
    "x1": np.random.normal(size=50),
    "x2": np.random.normal(size=50)
})

model = bmb.Model("y ~ x1 + x2", data)
fitted = model.fit(cores=2)

Output:

> python3 test_bambi.py               
/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/statsmodels/compat/pandas.py:65: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import Int64Index as NumericIndex
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [y_sigma, x2, x1, Intercept]
/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/statsmodels/compat/pandas.py:65: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import Int64Index as NumericIndex
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [y_sigma, x2, x1, Intercept]
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/forkserver.py", line 274, in main
    code = _serve_one(child_r, fds,
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/forkserver.py", line 313, in _serve_one
    code = spawn._main(child_r, parent_sentinel)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/harasim/Documents/repos/python/test-bambi/test_bambi.py", line 12, in <module>
    fitted = model.fit(cores=2)
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/bambi/models.py", line 278, in fit
    return self.backend.run(
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/bambi/backend/pymc.py", line 90, in run
    result = self._run_mcmc(
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/bambi/backend/pymc.py", line 217, in _run_mcmc
    idata = pm.sample(
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/pymc3/sampling.py", line 559, in sample
    trace = _mp_sample(**sample_args, **parallel_args)
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/pymc3/sampling.py", line 1461, in _mp_sample
    sampler = ps.ParallelSampler(
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/pymc3/parallel_sampling.py", line 431, in __init__
    self._samplers = [
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/pymc3/parallel_sampling.py", line 432, in <listcomp>
    ProcessAdapter(
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/pymc3/parallel_sampling.py", line 292, in __init__
    self._process.start()
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 291, in _Popen
    return Popen(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_forkserver.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
Traceback (most recent call last):
  File "/Users/harasim/Documents/repos/python/test-bambi/test_bambi.py", line 12, in <module>
    fitted = model.fit(cores=2)
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/bambi/models.py", line 278, in fit
    return self.backend.run(
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/bambi/backend/pymc.py", line 90, in run
    result = self._run_mcmc(
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/bambi/backend/pymc.py", line 217, in _run_mcmc
    idata = pm.sample(
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/pymc3/sampling.py", line 559, in sample
    trace = _mp_sample(**sample_args, **parallel_args)
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/pymc3/sampling.py", line 1461, in _mp_sample
    sampler = ps.ParallelSampler(
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/pymc3/parallel_sampling.py", line 431, in __init__
    self._samplers = [
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/pymc3/parallel_sampling.py", line 432, in <listcomp>
    ProcessAdapter(
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/pymc3/parallel_sampling.py", line 292, in __init__
    self._process.start()
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 291, in _Popen
    return Popen(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_forkserver.py", line 58, in _launch
    f.write(buf.getbuffer())
BrokenPipeError: [Errno 32] Broken pipe

However, if I set model.fit(cores=1) it runs the chains sequentially and succeeds.

> python3 test_bambi.py
/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/statsmodels/compat/pandas.py:65: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import Int64Index as NumericIndex
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [y_sigma, x2, x1, Intercept]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 3 seconds.
100.00% [2000/2000 00:01<00:00 Sampling chain 1, 0 divergences]

I used a fresh installation in a virtual environment with Python 3.9.7 on macOS 11.6, on a machine with a 2.6 GHz 6-Core Intel Core i7.

> python3 -m pip install bambi        
Collecting bambi
  Using cached bambi-0.7.1-py3-none-any.whl (72 kB)
Collecting scipy>=1.7.0
  Using cached scipy-1.7.3-cp39-cp39-macosx_10_9_x86_64.whl (33.2 MB)
Collecting formulae==0.2.0
  Using cached formulae-0.2.0-py3-none-any.whl (43 kB)
Collecting pymc3>=3.9.0
  Using cached pymc3-3.11.4-py3-none-any.whl (869 kB)
Collecting pandas>=1.0.0
  Using cached pandas-1.4.0-cp39-cp39-macosx_10_9_x86_64.whl (11.5 MB)
Collecting statsmodels>=0.9
  Using cached statsmodels-0.13.1-cp39-cp39-macosx_10_15_x86_64.whl (9.6 MB)
Collecting arviz>=0.11.2
  Using cached arviz-0.11.4-py3-none-any.whl (1.6 MB)
Collecting numpy<1.22.0,>=1.16.1
  Using cached numpy-1.21.5-cp39-cp39-macosx_10_9_x86_64.whl (17.0 MB)
Collecting xarray>=0.16.1
  Using cached xarray-0.20.2-py3-none-any.whl (845 kB)
Collecting packaging
  Using cached packaging-21.3-py3-none-any.whl (40 kB)
Collecting typing-extensions<4,>=3.7.4.3
  Using cached typing_extensions-3.10.0.2-py3-none-any.whl (26 kB)
Requirement already satisfied: setuptools>=38.4 in ./env/lib/python3.9/site-packages (from arviz>=0.11.2->bambi) (57.4.0)
Collecting netcdf4
  Using cached netCDF4-1.5.8-cp39-cp39-macosx_10_9_x86_64.whl (4.2 MB)
Collecting matplotlib>=3.0
  Using cached matplotlib-3.5.1-cp39-cp39-macosx_10_9_x86_64.whl (7.3 MB)
Collecting python-dateutil>=2.8.1
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting pytz>=2020.1
  Using cached pytz-2021.3-py2.py3-none-any.whl (503 kB)
Collecting theano-pymc==1.1.2
  Using cached Theano_PyMC-1.1.2-py3-none-any.whl
Collecting semver>=2.13.0
  Using cached semver-2.13.0-py2.py3-none-any.whl (12 kB)
Collecting fastprogress>=0.2.0
  Using cached fastprogress-1.0.0-py3-none-any.whl (12 kB)
Collecting cachetools>=4.2.1
  Using cached cachetools-5.0.0-py3-none-any.whl (9.1 kB)
Collecting patsy>=0.5.1
  Using cached patsy-0.5.2-py2.py3-none-any.whl (233 kB)
Collecting dill
  Using cached dill-0.3.4-py2.py3-none-any.whl (86 kB)
Collecting filelock
  Using cached filelock-3.4.2-py3-none-any.whl (9.9 kB)
Collecting cycler>=0.10
  Using cached cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting fonttools>=4.22.0
  Using cached fonttools-4.29.0-py3-none-any.whl (895 kB)
Collecting pillow>=6.2.0
  Using cached Pillow-9.0.0-cp39-cp39-macosx_10_10_x86_64.whl (3.0 MB)
Collecting pyparsing>=2.2.1
  Using cached pyparsing-3.0.7-py3-none-any.whl (98 kB)
Collecting kiwisolver>=1.0.1
  Using cached kiwisolver-1.3.2-cp39-cp39-macosx_10_9_x86_64.whl (61 kB)
Collecting six
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting cftime
  Using cached cftime-1.5.2-cp39-cp39-macosx_10_9_x86_64.whl (222 kB)
Installing collected packages: six, pytz, python-dateutil, pyparsing, numpy, pillow, pandas, packaging, kiwisolver, fonttools, cycler, cftime, xarray, typing-extensions, scipy, netcdf4, matplotlib, filelock, theano-pymc, semver, patsy, fastprogress, dill, cachetools, arviz, statsmodels, pymc3, formulae, bambi
Successfully installed arviz-0.11.4 bambi-0.7.1 cachetools-5.0.0 cftime-1.5.2 cycler-0.11.0 dill-0.3.4 fastprogress-1.0.0 filelock-3.4.2 fonttools-4.29.0 formulae-0.2.0 kiwisolver-1.3.2 matplotlib-3.5.1 netcdf4-1.5.8 numpy-1.21.5 packaging-21.3 pandas-1.4.0 patsy-0.5.2 pillow-9.0.0 pymc3-3.11.4 pyparsing-3.0.7 python-dateutil-2.8.2 pytz-2021.3 scipy-1.7.3 semver-2.13.0 six-1.16.0 statsmodels-0.13.1 theano-pymc-1.1.2 typing-extensions-3.10.0.2 xarray-0.20.2
@tomicapretto
Collaborator

I'm sorry, but I cannot reproduce your error. I'm using Ubuntu 20.04.3 LTS, which makes me think this may be a macOS-specific problem.

Do you have another computer to test the same code?

I'm not familiar with macOS at all. I could try to bring in some folks who use macOS.
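One plausible factor behind the Ubuntu/macOS difference, offered as a sketch rather than a diagnosis: the traceback shows the child process being started through multiprocessing/popen_forkserver.py, i.e. the forkserver start method. Both the spawn and forkserver start methods re-import the main module in each child process, whereas fork copies the parent process and does not. With the Python 3.9 used in this thread, fork is the default on Linux, while macOS has defaulted to spawn since Python 3.8. The helper start_method_report below is hypothetical and only uses the standard multiprocessing module to show what a given interpreter supports:

```python
import multiprocessing


def start_method_report():
    """Summarize the process start methods this interpreter supports.

    Hypothetical helper. Per the multiprocessing docs,
    get_all_start_methods() lists the platform default first.
    """
    methods = multiprocessing.get_all_start_methods()
    return {
        "default": methods[0],
        "available": methods,
        # 'spawn' and 'forkserver' children re-import the main module;
        # 'fork' children inherit a copy of the parent's memory instead.
        "reimports_main": [m for m in methods if m != "fork"],
    }


if __name__ == "__main__":
    print(start_method_report())
```

Running it on each machine would show whether the two environments differ in their default start method. It does not reveal which context PyMC3 itself picks; for that, the traceback (popen_forkserver.py) is the evidence.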

@hadjipantelis

hadjipantelis commented Jan 30, 2022

Just for reference, I use both a Mac and an Ubuntu system, and on the Mac @dharasim's code worked fine. The error is not reproducible on my side either.

Python 3.9.10 | packaged by conda-forge | (main, Jan 28 2022, 19:24:57) 
[Clang 11.1.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import bambi as bmb
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
/Users/phadjipa/opt/anaconda3/envs/pymc3_env/lib/python3.9/site-packages/statsmodels/compat/pandas.py:65: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import Int64Index as NumericIndex
>>> import pandas as pd
>>> import numpy as np
>>> 
>>> data = pd.DataFrame({
...     "y": np.random.normal(size=50),
...     "x1": np.random.normal(size=50),
...     "x2": np.random.normal(size=50)
... })
>>> 
>>> model = bmb.Model("y ~ x1 + x2", data)
>>> fitted = model.fit(cores=2)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [y_sigma, x2, x1, Intercept]
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 12 seconds. [4000/4000 00:03<00:00 Sampling 2 chains, 0 divergences]
>>> 

I am on macOS 11.5.2. It seems to me that this is Python-related rather than bambi-related. I know this is a long shot, but maybe you could try a clean conda environment where you install pymc3 according to https://github.com/pymc-devs/pymc/wiki/Installation-Guide-(MacOS)?

@dharasim
Author

Thank you very much for the suggestion! I tried it, installing pymc3 and bambi in a clean conda environment, but unfortunately I get the same error.

@hadjipantelis

Does PyMC3 work? Can you try running https://docs.pymc.io/en/v3/pymc-examples/examples/generalized_linear_models/GLM-binomial-regression.html (so we take bambi out of the equation if it's unnecessary)?

@dharasim
Author

Indeed, I get the same error running that code. So I confirm it's not a bambi issue.

@dharasim
Author

This pymc3 issue seems to be relevant: pymc-devs/pymc#3140

One possible workaround is to wrap the sampling in an if __name__ == '__main__': block:

import bambi as bmb
import pandas as pd
import numpy as np

data = pd.DataFrame({
    "y": np.random.normal(size=50),
    "x1": np.random.normal(size=50),
    "x2": np.random.normal(size=50)
})

if __name__ == '__main__':
    model = bmb.Model("y ~ x1 + x2", data)
    fitted = model.fit()
    print('done')

This code works for me.
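Why the guard helps, as a minimal standard-library sketch: the first traceback passes through multiprocessing/spawn.py, where _fixup_main_from_path calls runpy.run_path to re-execute test_bambi.py in the child under the module name __mp_main__ (this mirrors what CPython's spawn.py does). An unguarded top-level model.fit(cores=2) therefore runs again inside the child and tries to start new processes while the child is still bootstrapping, which is the RuntimeError reported above. The toy script and the simulate helper below are hypothetical; no bambi or PyMC3 is involved:

```python
import os
import runpy
import tempfile
import textwrap

# Hypothetical toy script: one statement at top level, one under the guard.
SCRIPT = textwrap.dedent("""
    calls = []
    calls.append("top-level")  # runs on every execution, including in children
    if __name__ == "__main__":
        calls.append("guarded")  # runs only when executed as the main program
""")


def simulate(run_name):
    """Execute SCRIPT from a real file with __name__ set to run_name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_script.py")
        with open(path, "w") as f:
            f.write(SCRIPT)
        # run_path returns the module globals after execution.
        namespace = runpy.run_path(path, run_name=run_name)
    return namespace["calls"]


if __name__ == "__main__":
    print(simulate("__main__"))     # parent, like `python toy_script.py` -> ['top-level', 'guarded']
    print(simulate("__mp_main__"))  # spawn/forkserver child re-import -> ['top-level']
```

In the workaround above, the data generation still sits at module level, so it is re-executed in each child process; that appears harmless here, but expensive setup is commonly moved under the guard as well.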

@tomicapretto
Collaborator

@dharasim, thanks for reporting a possible solution! I was completely unaware of this problem. Glad you found a workaround!
