Multiprocessing fails when sampling multiple chains using multiple cores #3140
Possibly Windows-related... @aseyboldt |
Yes, this looks like an issue with multiprocessing on Windows. Can you try this:

import numpy as np
import pandas as pd
import theano
import pymc3 as pm

print('*** Start script ***')
print(f'{pm.__name__}: v. {pm.__version__}')
print(f'{theano.__name__}: v. {theano.__version__}')

if __name__ == '__main__':
    SEED = 20180730
    np.random.seed(SEED)

    # Generate data
    mu_real = 0
    sd_real = 1
    n_samples = 1000
    y = np.random.normal(loc=mu_real, scale=sd_real, size=n_samples)

    # Bayesian modelling
    with pm.Model() as model:
        mu = pm.Normal('mu', mu=0, sd=10)
        sd = pm.HalfNormal('sd', sd=10)
        # Likelihood
        likelihood = pm.Normal('likelihood', mu=mu, sd=sd, observed=y)
        trace = pm.sample(chains=2, cores=2, random_seed=SEED)

    print('Done!')

But I don't really understand why it has trouble in the notebook. Can you post the versions of pyzmq, jupyter and ipython?
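(A quick way to collect those versions, as a sketch; note that the import names differ from the package names:)

import zmq
import IPython
import jupyter_core

print('pyzmq:', zmq.__version__)
print('ipython:', IPython.__version__)
print('jupyter_core:', jupyter_core.__version__)

|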
If I use the if statement then the sampling works. Still, the print statements are executed multiple times:

*** Start script ***
pymc3: v. 3.5
theano: v. 1.0.2
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, mu]
*** Start script ***
pymc3: v. 3.5
theano: v. 1.0.2
*** Start script ***
pymc3: v. 3.5
theano: v. 1.0.2
Sampling 2 chains: 100%|███████████████████████████████████████████████████████████████████| 2000/2000 [00:02<00:00, 724.48draws/s] Done!
Comment on the Jupyter notebook: this particular script runs fine in a Jupyter notebook (it crashed only once, after several attempts; the crash output is below).

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [beta_]
Sampling 2 chains: 0%| | 0/8000 [00:00<?, ?draws/s]
forrtl: error (200): program aborting due to control-C event
Image PC Routine Line Source
libifcoremd.dll 00007FFC9C7F94C4 Unknown Unknown Unknown
KERNELBASE.dll 00007FFCD18B56FD Unknown Unknown Unknown
KERNEL32.DLL 00007FFCD38E3034 Unknown Unknown Unknown
ntdll.dll 00007FFCD4A11431 Unknown Unknown Unknown
forrtl: error (200): program aborting due to control-C event
Image PC Routine Line Source
libifcoremd.dll 00007FFC9C7F94C4 Unknown Unknown Unknown
KERNELBASE.dll 00007FFCD18B56FD Unknown Unknown Unknown
KERNEL32.DLL 00007FFCD38E3034 Unknown Unknown Unknown
ntdll.dll 00007FFCD4A11431 Unknown Unknown Unknown
forrtl: error (200): program aborting due to control-C event
Image PC Routine Line Source
libifcoremd.dll 00007FFC9C7F94C4 Unknown Unknown Unknown
KERNELBASE.dll 00007FFCD18B56FD Unknown Unknown Unknown
KERNEL32.DLL 00007FFCD38E3034 Unknown Unknown Unknown
ntdll.dll 00007FFCD4A11431 Unknown Unknown Unknown
forrtl: error (200): program aborting due to control-C event
Image PC Routine Line Source
libifcoremd.dll 00007FFC9C7F94C4 Unknown Unknown Unknown
KERNELBASE.dll 00007FFCD18B56FD Unknown Unknown Unknown
KERNEL32.DLL 00007FFCD38E3034 Unknown Unknown Unknown
ntdll.dll 00007FFCD4A11431 Unknown Unknown Unknown
[I 14:26:40.033 NotebookApp] Interrupted...
[I 14:26:40.033 NotebookApp] Shutting down 2 kernels
[I 14:26:40.135 NotebookApp] Kernel shutdown: eaa60eb4-6bae-4c91-82bf-6bd5648ddf35
[I 14:26:40.135 NotebookApp] Kernel shutdown: e41f13f3-e731-4812-8130-97a7a6220fd7

If I run the softmax regression script as a python script (without the if __name__ == '__main__' guard), I get:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [beta_]
3.5
1.0.2
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [beta_]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Miniconda3\envs\bayes\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Miniconda3\envs\bayes\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Miniconda3\envs\bayes\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\dev\GLM_with_PyMC3\notebooks\test_softmax_multicore.py", line 38, in <module>
    trace = pm.sample(draws=3000, tune=1000, chains=2, cores=4, random_seed=SEED)
  File "d:\dev\pymc3\pymc3\sampling.py", line 451, in sample
    trace = _mp_sample(**sample_args)
  File "d:\dev\pymc3\pymc3\sampling.py", line 998, in _mp_sample
    chain, progressbar)
  File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 275, in __init__
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 275, in <listcomp>
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 182, in __init__
    self._process.start()
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Traceback (most recent call last):
  File "test_softmax_multicore.py", line 38, in <module>
    trace = pm.sample(draws=3000, tune=1000, chains=2, cores=4, random_seed=SEED)
  File "d:\dev\pymc3\pymc3\sampling.py", line 451, in sample
    trace = _mp_sample(**sample_args)
  File "d:\dev\pymc3\pymc3\sampling.py", line 998, in _mp_sample
    chain, progressbar)
  File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 275, in __init__
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 275, in <listcomp>
    for chain, seed, start in zip(range(chains), seeds, start_points)
  File "d:\dev\pymc3\pymc3\parallel_sampling.py", line 182, in __init__
    self._process.start()
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

If I wrap the script into an if __name__ == '__main__' block, I get:

Sampling 2 chains: 0%| | 0/8000 [00:00<?, ?draws/s]
You can find the C code in this temporary file: C:\Users\moran\AppData\Local\Temp\theano_compilation_error__a0g2s_m
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\compile\function_module.py", line 1082, in _constructor_Function
f = maker.create(input_storage, trustme=True)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\compile\function_module.py", line 1715, in create
input_storage=input_storage_lists, storage_map=storage_map)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\link.py", line 699, in make_thunk
storage_map=storage_map)[:3]
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\vm.py", line 1091, in make_all
impl=impl))
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\op.py", line 955, in make_thunk
no_recycling)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\op.py", line 858, in make_c_thunk
output_storage=node_output_storage)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1217, in make_thunk
keep_lock=keep_lock)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1157, in __compile__
keep_lock=keep_lock)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1620, in cthunk_factory
key=key, lnk=self, keep_lock=keep_lock)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cmodule.py", line 1181, in module_from_key
module = lnk.compile_cmodule(location)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1523, in compile_cmodule
preargs=preargs)
File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cmodule.py", line 2388, in compile_str
(status, compile_stderr.replace('\n', '. ')))
Exception: ('The following error happened while compiling the node', Softmax(Dot22.0), '\n', 'Compilation failed (return status=3): ', '[Softmax(<TensorType(float64, matrix)>)]')
forrtl: error (200): program aborting due to control-C event
Image PC Routine Line Source
libifcoremd.dll 00007FFC98B294C4 Unknown Unknown Unknown
KERNELBASE.dll 00007FFCD18B56FD Unknown Unknown Unknown
KERNEL32.DLL 00007FFCD38E3034 Unknown Unknown Unknown
ntdll.dll 00007FFCD4A11431 Unknown Unknown Unknown
forrtl: error (200): program aborting due to control-C event
Image PC Routine Line Source
libifcoremd.dll 00007FFC98B294C4 Unknown Unknown Unknown
KERNELBASE.dll 00007FFCD18B56FD Unknown Unknown Unknown
KERNEL32.DLL 00007FFCD38E3034 Unknown Unknown Unknown
ntdll.dll 00007FFCD4A11431 Unknown Unknown Unknown |
So it seems that there are two issues here. I'm trying to reproduce this locally; can you send me an example that fails with the second error? I have some vague ideas where this might be coming from, and if my hunch is right, setting one of

import os

# one of
os.environ['MKL_THREADING_LAYER'] = 'sequential'
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['MKL_THREADING_LAYER'] = 'GNU'

before you import anything else might help. And thank you for reporting this :-) |
The numpy configuration is:

mkl_info:
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:/Miniconda3/envs/bayes\\Library\\lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Miniconda3/envs/bayes\\Library\\include']
blas_mkl_info:
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:/Miniconda3/envs/bayes\\Library\\lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Miniconda3/envs/bayes\\Library\\include']
blas_opt_info:
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:/Miniconda3/envs/bayes\\Library\\lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Miniconda3/envs/bayes\\Library\\include']
lapack_mkl_info:
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:/Miniconda3/envs/bayes\\Library\\lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Miniconda3/envs/bayes\\Library\\include']
lapack_opt_info:
libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
library_dirs = ['C:/Miniconda3/envs/bayes\\Library\\lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Miniconda3/envs/bayes\\Library\\include']

I tried to set the environment variables, but it does not solve the issue, unfortunately. Thank you for looking into this. |
It works for me, but I have a different blas installed. How did you install python/numpy/pymc3? |
Can you maybe also post the output of conda list and pip freeze? |
I installed numpy (and scipy and all the PyMC3 dependencies) via conda. The outputs of conda list and pip freeze are attached.
|
I did some digging. I found out that the error is the following:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\compile\function_module.py", line 1082, in _constructor_Function
    f = maker.create(input_storage, trustme=True)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\compile\function_module.py", line 1715, in create
    input_storage=input_storage_lists, storage_map=storage_map)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\link.py", line 699, in make_thunk
    storage_map=storage_map)[:3]
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\vm.py", line 1091, in make_all
    impl=impl))
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\op.py", line 955, in make_thunk
    no_recycling)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\op.py", line 858, in make_c_thunk
    output_storage=node_output_storage)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1217, in make_thunk
    keep_lock=keep_lock)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1157, in __compile__
    keep_lock=keep_lock)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1620, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cmodule.py", line 1151, in module_from_key
    with compilelock.lock_ctx(keep_lock=keep_lock):
  File "C:\Miniconda3\envs\bayes\lib\contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\compilelock.py", line 40, in lock_ctx
    get_lock(lock_dir=lock_dir, **kw)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\compilelock.py", line 86, in _get_lock
    lock(get_lock.lock_dir, **kw)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\compilelock.py", line 273, in lock
    time.sleep(random.uniform(min_wait, max_wait))
KeyboardInterrupt

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Miniconda3\envs\bayes\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\compile\function_module.py", line 1082, in _constructor_Function
    f = maker.create(input_storage, trustme=True)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\compile\function_module.py", line 1715, in create
    input_storage=input_storage_lists, storage_map=storage_map)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\link.py", line 699, in make_thunk
    storage_map=storage_map)[:3]
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\vm.py", line 1091, in make_all
    impl=impl))
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\op.py", line 955, in make_thunk
    no_recycling)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\op.py", line 858, in make_c_thunk
    output_storage=node_output_storage)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1217, in make_thunk
    keep_lock=keep_lock)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1157, in __compile__
    keep_lock=keep_lock)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1620, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cmodule.py", line 1181, in module_from_key
    module = lnk.compile_cmodule(location)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cc.py", line 1523, in compile_cmodule
    preargs=preargs)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\gof\cmodule.py", line 2343, in compile_str
    p_out = output_subprocess_Popen(cmd)
  File "C:\Miniconda3\envs\bayes\lib\site-packages\theano\misc\windows.py", line 80, in output_subprocess_Popen
    out = p.communicate()
  File "C:\Miniconda3\envs\bayes\lib\subprocess.py", line 843, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "C:\Miniconda3\envs\bayes\lib\subprocess.py", line 1092, in _communicate
    self.stdout_thread.join(self._remaining_time(endtime))
  File "C:\Miniconda3\envs\bayes\lib\threading.py", line 1056, in join
    self._wait_for_tstate_lock()
  File "C:\Miniconda3\envs\bayes\lib\threading.py", line 1072, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt

[I 11:43:03.371 NotebookApp] Interrupted...
[I 11:43:03.371 NotebookApp] Shutting down 1 kernel
[I 11:43:08.431 NotebookApp] Kernel shutdown: cb25a99e-15f2-4f7f-b3c0-9706ab711a70

I hope this helps to shed light on the issue. |
I have a similar error (Windows 2012 + PyMC3 3.5 (master) + Theano 1.0.3 (master)).
|
I also have a short program that blows up (with a "broken pipe" message) as soon as I set chains > 1. I have a multicore machine (but then, who doesn't?). The code:
I have a GeForce GTX 1050 GPU running CUDA 8.0, cuDNN 7.1.3, Theano 1.0.3, PyMC3 3.5, Python 3.6.6. My .theanorc has [global], [nvcc], [lib], [gpuarray], and [scan] sections. The error message:
|
I'm pretty sure that is not the same issue as the original (which is Windows-related). About the original bug: I've been trying to reproduce this on my own machine for some time, but so far I haven't managed to do that. This makes it rather hard to fix. @Jeff-Winchell About your problem:
|
Have you run the very short code example I gave and replicated the bug? If you have, it's not clear why most of your post was written. If you haven't run it, it's unclear why any of your post was written. I was frankly taken aback by your post, but maybe you don't see why. I'm a software engineer, not a hacker. My teachers (and LinkedIn connections) include Ward Cunningham, Bertrand Meyer, Meilir Page-Jones, Gerry Weinberg, James Bach, Andy Hunt. I don't ship code to production with known bugs in it. Ever. If you can't replicate the bug I'd be happy to help come up with ideas why not. Otherwise, it's unproductive. |
@Jeff-Winchell have you tried running the suggestion by @aseyboldt? These are all valid suggestions; what would be productive is for you to try to follow these suggestions first. Also, name-dropping is not a valid way to have a productive conversation. We do not appreciate these hostile attitudes towards our developers/users; if you keep doing this (either privately or publicly) I will have to block and report you according to our community guidelines. |
The first message to me was more hostile than my response was. Different people have different ideas about name dropping. So I guess you can ban me for saying my address is Jeff_Winchell@g.HARVARD.edu. What else was hostile besides making it clear that I know a lot more than the first poster assumed I did when asking me to do a bunch of things that aren't useful? |
FYI, none of those names I mentioned would even DREAM of banning someone for posting the message I did. |
So go ahead and block me. The mere threat you made about doing so, so frivolously makes me want to challenge bullies publicly, just like they challenge me. |
Related discourse thread |
I looked at that thread. If I move ONLY the pymc3.sample function into an if __name__ == '__main__' block AND I make sure my GPU is globally turned off, then it won't crash. As I ran into the same problem with some other code that uses the NUTS sampler, I saw that the same workaround corrects that. However, disabling the GPU globally is not a great solution, so the GPU problem needs to be fixed, and I don't know how more complex code can be managed with the if __name__ workaround. The real solution is to change the pymc3/theano/whatever code so that it executes under both Linux and Windows instead of only worrying about Linux and ignoring the most widely used OS from the company with the largest market capitalization in the world. |
The main problem is that the broken pipe error is not helpful for debugging. We have seen that the broken pipe is raised by the main process: when it tries to spawn the worker pool that should do the sampling, the workers raise exceptions before they are fully spawned, so they never manage to communicate their failure to the main process, and once the main process tries to communicate with the pool, it finds the communication pipe broken. The first thing we are focusing on fixing is capturing the exceptions raised while the worker pool is being spawned; these exceptions are the key to debugging the sources of the failures. Some of them were caused by the lack of the if __name__ == '__main__' block, and others were caused by functions that were not pickleable. Once we sort that out, we will be able to help better with whatever is happening because of the GPU.
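To see the first failure mode in isolation, here is a minimal sketch with plain multiprocessing (no PyMC3 involved; the function name is just illustrative). On Windows, which uses the spawn start method, deleting the guard below makes the spawned child re-execute the module from the top and try to start its own pool, producing exactly the RuntimeError/BrokenPipeError pair quoted above:

import multiprocessing as mp

def sample_one_chain(chain_id):
    # stand-in for the work one chain does
    return chain_id ** 2

# Without this guard, spawning fails on Windows with the "bootstrapping
# phase" RuntimeError in the child and a BrokenPipeError in the parent.
if __name__ == '__main__':
    with mp.Pool(processes=2) as pool:
        print(pool.map(sample_one_chain, range(2)))

|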
* Fix for #3225. Made Triangular `c` attribute be handled consistently with scipy.stats. Added test and updated example code.
* Added a more detailed error message for broken pipes.
* Not a fix for #3140.
* Fixed import of time. Trimmed the broken pipe exception handling. Added release notes.
* Moved maintenance message to release notes of pymc3.7
Following commit 98fd63e, I ran the script that kept failing under Windows again. The script under test is:

import pymc3 as pm
print(pm.__version__)
import theano.tensor as tt
import theano
print(theano.__version__)
import patsy
import pandas as pd
import numpy as np

SEED = 20180727

df = pd.read_csv(r'https://gist.githubusercontent.com/JackCaster/d74b36a66c172e80d1bdcee61d6975bf/raw/a2aab8690af7cebbe39ec5e5b425fe9a9b9a674d/data.csv',
                 dtype={'Y':'category'})
_, X = patsy.dmatrices('Y ~ 1 + X1 + X2', data=df)
# Number of categories
n_cat = df.Y.cat.categories.size
# Number of predictors
n_pred = X.shape[1]

with pm.Model() as model:
    ## `p`--quantity that I want to model--needs to have size (n_obs, n_cat).
    ## Because `X` has size (n_obs, n_pred), then `beta` needs to have size (n_pred, n_cat)
    # priors for categories 1-2, excluding reference category 0 which is set to zero below (see DBDA2 p. 651 for explanation).
    beta_ = pm.Normal('beta_', mu=0, sd=50, shape=(n_pred, n_cat-1))
    # add prior values zero for reference category 0. (add a column)
    beta = pm.Deterministic('beta', tt.concatenate([tt.zeros((n_pred, 1)), beta_], axis=1))
    # The softmax function will squash the values in the range 0-1
    p = tt.nnet.softmax(tt.dot(np.asarray(X), beta))
    likelihood = pm.Categorical('likelihood', p=p, observed=df.Y.cat.codes.values)

    trace = pm.sample(chains=2, cores=2)

print('DONE')

Unfortunately, the sampling still fails.
The traceback, which points to a compilation error, was:
The temporary, compiled C code reports in the last line:
Does this shed more light on this matter? EDIT: I also confirmed (as suggested by @elfwired) that setting |
This looks like a Theano problem, can you open an issue there? It looks very archaic to me. |
Done, let's see 🤞 EDIT: Just a note. When there is a compilation error, the traceback points to the temporary C code. At the end of that code, there is a line saying:
I tried to run the command post-mortem, but the temp folder had already been deleted. |
I got a similar error for this snippet in an MCMC application:
Error message:
Running on Windows 10 with latest packages of everything. |
Same thing for me (Windows 10, Spyder, installed through Anaconda):

Multiprocess sampling (4 chains in 4 jobs)
  File "", line 18, in <module>
  File "C:\Users\butle\Anaconda3\lib\site-packages\pymc3\sampling.py", line 437, in sample
  File "C:\Users\butle\Anaconda3\lib\site-packages\pymc3\sampling.py", line 965, in _mp_sample
  File "C:\Users\butle\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py", line 361, in __init__
  File "C:\Users\butle\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py", line 361, in <listcomp>
  File "C:\Users\butle\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py", line 251, in __init__
RuntimeError: The communication pipe between the main process and its spawned children is broken. |
Same for me. Windows 10. cores=1 works fine. Theano with CUDA. I am just getting into PyMC and was following along with the code in Osvaldo Martin's book.

import numpy as np
from scipy import stats
import pymc3 as pm

np.random.seed(123)

if __name__ == "__main__":
    trials = 4
    theta_real = 0.35
    data = stats.bernoulli.rvs(p=theta_real, size=trials)
    with pm.Model() as our_first_model:
        theta = pm.Beta("theta", alpha=1., beta=1.)
        y = pm.Bernoulli("y", p=theta, observed=data)
        trace = pm.sample(1000, random_seed=123)

The following is the traceback:

Traceback (most recent call last):
File "test.py", line 16, in <module>
trace = pm.sample(1000, random_seed=123)
File "C:\Anaconda\lib\site-packages\pymc3\sampling.py", line 437, in sample
trace = _mp_sample(**sample_args)
File "C:\Anaconda\lib\site-packages\pymc3\sampling.py", line 965, in _mp_sample
chain, progressbar)
File "C:\Anaconda\lib\site-packages\pymc3\parallel_sampling.py", line 361, in __init__
for chain, seed, start in zip(range(chains), seeds, start_points)
File "C:\Anaconda\lib\site-packages\pymc3\parallel_sampling.py", line 361, in <listcomp>
for chain, seed, start in zip(range(chains), seeds, start_points)
File "C:\Anaconda\lib\site-packages\pymc3\parallel_sampling.py", line 251, in __init__
raise exc
RuntimeError: The communication pipe between the main process and its spawned children is broken.
In Windows OS, this usually means that the child process raised an exception while it was being spawned, before it was setup to communicate to the main process.
The exceptions raised by the child process while spawning cannot be caught or handled from the main process, and when running from an IPython or jupyter notebook interactive kernel, the child's exception and traceback appears to be lost.
A known way to see the child's error, and try to fix or handle it, is to run the problematic code as a batch script from a system's Command Prompt. The child's exception will be printed to the Command Prompt's stderr, and it should be visible above this error and traceback.
Note that if running a jupyter notebook that was invoked from a Command Prompt, the child's exception should have been printed to the Command Prompt on which the notebook is running. |
I am facing the same issue on a Debian machine, in particular the default one on Google Dataproc (https://cloud.google.com/compute/docs/images#debian, 1.5-debian). Setting one of:
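(The list itself was lost in formatting; presumably it refers to the environment variables suggested earlier in the thread, along these lines:)

import os

# one of:
os.environ['MKL_THREADING_LAYER'] = 'sequential'
os.environ['OMP_NUM_THREADS'] = '1'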
allowed me to make the thing run, but I suspect this is preventing me from scaling things up. Indeed, I noticed single chains appear to use just one CPU each. Is this a known issue for certain Linux distributions? Is there a Linux distro where multiprocessing is known to work well? |
Hi all, Just a note to say that as a new user running simple example code, I'm also seeing this problem in Spyder 4.2.0 on Windows using a fresh install of pymc3==3.8. Adding an if __name__ == "__main__": guard sorts it out (but of course removes a lot of Spyder's functionality). |
Can you try with pymc3 3.10?
|
Also in 3.10, I'm afraid: I'm seeing the same duplicated messages followed by the error. My code is:
|
And same in 3.11 too |
I found the following workaround for the issue described above. It involves setting an OpenMP environment variable so that the number of threads per sampling process is reduced to 1, by adding the following at the beginning of the Python script (after the imports):
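(The snippet was lost in formatting; judging from the fuller script later in this thread, it is presumably:)

import os
os.environ['OMP_NUM_THREADS'] = '1'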
I'd be interested to know if anyone has successfully managed to execute multiple chain processes with multiple threads per chain process. |
Thanks for reporting back. @charlesbaynham, can you test if this works for you too? |
@twiecki I should have said that I have managed to get this behaviour with the following environment:

# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 4.5 1_gnu
arviz 0.11.4 pyhd8ed1ab_0 conda-forge
asttokens 2.0.5 pyhd8ed1ab_0 conda-forge
automateml 0.0.0 dev_0 <develop>
azure-core 1.22.0 pypi_0 pypi
azure-cosmos 4.2.0 pypi_0 pypi
azure-storage-blob 12.9.0 pypi_0 pypi
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge
black 22.1.0 pyhd8ed1ab_0 conda-forge
brotli 1.0.9 h7f98852_5 conda-forge
brotli-bin 1.0.9 h7f98852_5 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.18.1 h7f8727e_0
ca-certificates 2021.10.8 ha878542_0 conda-forge
cachetools 5.0.0 pyhd8ed1ab_0 conda-forge
certifi 2021.10.8 py39hf3d152e_1 conda-forge
cffi 1.15.0 pypi_0 pypi
cftime 1.5.1.1 py39hce1f21e_0
charset-normalizer 2.0.11 pypi_0 pypi
click 8.0.3 py39hf3d152e_1 conda-forge
cryptography 36.0.1 pypi_0 pypi
curl 7.80.0 h7f8727e_0
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
dataclasses 0.8 pyhc8e2a94_3 conda-forge
dbus 1.13.6 he372182_0 conda-forge
debugpy 1.5.1 py39h295c915_0
decorator 5.1.1 pyhd8ed1ab_0 conda-forge
dill 0.3.4 pyhd8ed1ab_0 conda-forge
entrypoints 0.4 pyhd8ed1ab_0 conda-forge
executing 0.8.2 pyhd8ed1ab_0 conda-forge
expat 2.2.10 h9c3ff4c_0 conda-forge
fastprogress 1.0.0 py_0 conda-forge
filelock 3.4.2 pyhd8ed1ab_1 conda-forge
fontconfig 2.13.1 h6c09931_0
fonttools 4.25.0 pyhd3eb1b0_0
freetype 2.10.4 h0708190_1 conda-forge
glib 2.69.1 h4ff587b_1
gst-plugins-base 1.14.0 hbbd80ab_1
gstreamer 1.14.0 h28cd5cc_2
hdf4 4.2.13 h3ca952b_2
hdf5 1.10.6 nompi_h6a2412b_1114 conda-forge
icu 58.2 hf484d3e_1000 conda-forge
idna 3.3 pypi_0 pypi
importlib-metadata 4.10.1 py39hf3d152e_0 conda-forge
importlib_metadata 4.10.1 hd8ed1ab_0 conda-forge
intel-openmp 2021.4.0 h06a4308_3561
ipykernel 6.9.0 py39hef51801_0 conda-forge
ipython 8.0.1 py39hf3d152e_0 conda-forge
isodate 0.6.1 pypi_0 pypi
jedi 0.18.1 py39hf3d152e_0 conda-forge
joblib 1.1.0 pyhd8ed1ab_0 conda-forge
jpeg 9d h7f8727e_0
jupyter_client 7.1.2 pyhd8ed1ab_0 conda-forge
jupyter_core 4.9.1 py39hf3d152e_1 conda-forge
kiwisolver 1.3.1 py39h2531618_0
krb5 1.19.2 hcc1bbae_0 conda-forge
lcms2 2.12 hddcbb42_0 conda-forge
ld_impl_linux-64 2.35.1 h7274673_9
libblas 3.9.0 11_linux64_openblas conda-forge
libbrotlicommon 1.0.9 h7f98852_5 conda-forge
libbrotlidec 1.0.9 h7f98852_5 conda-forge
libbrotlienc 1.0.9 h7f98852_5 conda-forge
libcblas 3.9.0 11_linux64_openblas conda-forge
libcurl 7.80.0 h0b77cf5_0
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libffi 3.3 he6710b0_2
libgcc-ng 9.3.0 h5101ec6_17
libgfortran-ng 11.2.0 h69a702a_12 conda-forge
libgfortran5 11.2.0 h5c6108e_12 conda-forge
libgomp 9.3.0 h5101ec6_17
liblapack 3.9.0 11_linux64_openblas conda-forge
libnetcdf 4.8.1 h42ceab0_1
libnghttp2 1.46.0 hce63b2e_0
libopenblas 0.3.17 pthreads_h8fe5266_1 conda-forge
libpng 1.6.37 h21135ba_2 conda-forge
libsodium 1.0.18 h36c2ea0_1 conda-forge
libssh2 1.9.0 h1ba5d50_1
libstdcxx-ng 9.3.0 hd4cf53a_17
libtiff 4.2.0 h85742a9_0
libuuid 1.0.3 h7f8727e_2
libwebp-base 1.2.0 h27cfd23_0
libxcb 1.13 h7f98852_1003 conda-forge
libxml2 2.9.12 h03d6c58_0
libzip 1.8.0 h4de3113_0 conda-forge
lz4-c 1.9.3 h9c3ff4c_1 conda-forge
matplotlib 3.4.3 py39hf3d152e_2 conda-forge
matplotlib-base 3.4.3 py39hbbc1b5f_0
matplotlib-inline 0.1.3 pyhd8ed1ab_0 conda-forge
mkl 2021.4.0 h06a4308_640
mkl-service 2.4.0 py39h3811e60_0 conda-forge
msrest 0.6.21 pypi_0 pypi
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
mypy_extensions 0.4.3 py39hf3d152e_4 conda-forge
ncurses 6.3 h7f8727e_2
nest-asyncio 1.5.4 pyhd8ed1ab_0 conda-forge
netcdf4 1.5.7 py39ha0f2276_1
numpy 1.20.3 py39hdbf815f_1 conda-forge
oauthlib 3.2.0 pypi_0 pypi
olefile 0.46 pyh9f0ad1d_1 conda-forge
openssl 1.1.1m h7f8727e_0
packaging 21.3 pyhd8ed1ab_0 conda-forge
pandas 1.2.3 py39hde0f152_0 conda-forge
parso 0.8.3 pyhd8ed1ab_0 conda-forge
pathspec 0.9.0 pyhd8ed1ab_0 conda-forge
patsy 0.5.2 pyhd8ed1ab_0 conda-forge
pcre 8.45 h9c3ff4c_0 conda-forge
pexpect 4.8.0 pyh9f0ad1d_2 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 7.2.0 py39h6f3857e_2 conda-forge
pip 21.2.4 py39h06a4308_0
platformdirs 2.4.1 pyhd8ed1ab_1 conda-forge
prompt-toolkit 3.0.26 pyha770c72_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pure_eval 0.2.2 pyhd8ed1ab_0 conda-forge
pycparser 2.21 pypi_0 pypi
pydantic 1.9.0 pypi_0 pypi
pygments 2.11.2 pyhd8ed1ab_0 conda-forge
pymc3 3.11.4 py39hb070fc8_0
pyparsing 3.0.7 pyhd8ed1ab_0 conda-forge
pyqt 5.9.2 py39h2531618_6
python 3.9.7 h12debd9_1
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python_abi 3.9 2_cp39 conda-forge
pytz 2021.3 pyhd8ed1ab_0 conda-forge
pyzmq 19.0.2 py39hb69f2a1_2 conda-forge
qt 5.9.7 h5867ecd_1
readline 8.1.2 h7f8727e_1
requests 2.27.1 pypi_0 pypi
requests-oauthlib 1.3.1 pypi_0 pypi
scikit-learn 1.0.2 pypi_0 pypi
scipy 1.5.3 py39hee8e79c_0 conda-forge
seaborn 0.11.2 hd8ed1ab_0 conda-forge
seaborn-base 0.11.2 pyhd8ed1ab_0 conda-forge
semver 2.13.0 pyh9f0ad1d_0 conda-forge
setuptools 58.0.4 py39h06a4308_0
sip 4.19.13 py39h295c915_0
six 1.16.0 pyh6c4a22f_0 conda-forge
smartreturntools 0.1.4 dev_0 <develop>
sqlite 3.37.2 hc218d9a_0
stack_data 0.1.4 pyhd8ed1ab_0 conda-forge
statsmodels 0.13.0 pypi_0 pypi
theano-pymc 1.1.2 py39h51133e4_0
threadpoolctl 3.1.0 pyh8a188c0_0 conda-forge
tk 8.6.11 h1ccaba5_0
tomli 2.0.1 pyhd8ed1ab_0 conda-forge
tornado 6.1 py39h3811e60_1 conda-forge
traitlets 5.1.1 pyhd8ed1ab_0 conda-forge
typed-ast 1.4.3 py39h3811e60_0 conda-forge
typing-extensions 3.10.0.2 hd8ed1ab_0 conda-forge
typing_extensions 3.10.0.2 pyha770c72_0 conda-forge
tzdata 2021e hda174b7_0
urllib3 1.26.8 pypi_0 pypi
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
wheel 0.37.1 pyhd3eb1b0_0
xarray 0.21.1 pyhd8ed1ab_0 conda-forge
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xz 5.2.5 h7b6447c_0
zeromq 4.3.4 h9c3ff4c_0 conda-forge
zipp 3.7.0 pyhd8ed1ab_1 conda-forge
zlib 1.2.11 h7f8727e_4
zstd 1.4.9 ha95c52a_0 conda-forge

These examples were executed in a Python script (not in a notebook), in a tmux session. The script was encapsulated as follows:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib import cm
from pathlib import Path
import os
from sklearn.model_selection import train_test_split
import patsy
import pymc3 as pm
import pickle

#os.environ['MKL_THREADING_LAYER'] = 'sequential'
os.environ['OMP_NUM_THREADS'] = '1'

if __name__ == '__main__':

    # Bayesian Multiple Regression
    sample_params = {
        'draws' : 10000,
        'chains' : 4,
        'tune' : 4000,
        'target_accept' : 0.87,
        'random_seed' : 123456,
        'init' : 'auto',
        'cores' : 4
    }

    # Model inference
    run_posterior_sampling = True
    n_inference_samples = 5000

    ## DATA_CONDITIONING
    X_full_norm_dwsmp = get_data_from_pkl(pkl_file)  # This is a simplification of my data processing steps, but the data arrives as a numpy array of regressors (cols) and samples (rows).

    ## BAYESIAN MODELLING
    # Construct the wavelength column names and the PATSY string model formula
    model_formula_str = "y ~ "
    columns = []
    for i in np.arange(0, X_full_norm_dwsmp.shape[1], 1):
        columns.append("w"+str(i))
        if i == X_full_norm_dwsmp.shape[1] - 1:
            model_formula_str = model_formula_str + "w{}".format(i)
        else:
            model_formula_str = model_formula_str + "w{} + ".format(i)

    # Make the data dataframe with the wavelengths and target value.
    X_df = pd.DataFrame(X_full_norm_dwsmp, columns=[columns])
    Y_df = pd.DataFrame(y, columns=['y'])
    data_df = Y_df.join(X_df, how='left')
    data_df.columns = ['y'] + columns

    print('Model formula:: {}'.format(model_formula_str))
    print('[INFO] - PATSY model configuration')
    # Define model formula.
    formula = model_formula_str
    # Create features.
    y, x = patsy.dmatrices(formula_like=formula, data=data_df)
    y = np.asarray(y).flatten()
    labels = x.design_info.column_names
    x = np.asarray(x)

    print('[INFO] - Train Test Splitting Data')
    x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.9, random_state=123456)

    print('[INFO] - Starting the modelling...', end='')
    with pm.Model() as model:
        # Set data container.
        print('creating data container...', end='')
        data = pm.Data("data", x_train)
        # Define GLM family.
        family = pm.glm.families.Normal()
        # Set priors.
        #priors = {
        #    "Intercept": pm.Normal.dist(mu=0, sd=10),
        #    "x1": pm.Normal.dist(mu=0, sd=10),
        #    "x2": pm.Normal.dist(mu=0, sd=10),
        #    "x1:x2": pm.Normal.dist(mu=0, sd=10),
        #}
        # Specify model.
        print('Building the model...', end='')
        pm.glm.GLM(y=y_train, x=data, family=family, intercept=False, labels=labels)  # , priors=priors)
        print('Complete.')

        print('[INFO] - Sampling the model...')
        # Configure sampler.
        trace = pm.sample(**sample_params)

    trained_model = {
        'data' : data,
        'model' : model,
        'trace' : trace
    }

    if run_posterior_sampling:
        print('[INFO] - Running inference')
        # Update data reference.
        pm.set_data({"data": x_test}, model=model)
        # Generate posterior samples.
        trained_model['ppc_test'] = pm.sample_posterior_predictive(trace, model=model, samples=n_inference_samples)
        # Update the pickle file name to reflect that ppc results are included
        ppc_pkl_file = os.path.join(str(Path(output_results_file).absolute().parent), Path(output_results_file).parts[1].split('.')[0]+'_ppc.pkl')

    with open(output_results_file, 'wb') as outfile:
        pickle.dump(trained_model, outfile)

|
@twiecki, I also have another question. What's the rationale behind limiting the number of cores? I am not very well acquainted with the code, but if the sampling of the chains consists of separate MCMC processes that are independent of each other, then running more than 4 chains in parallel would still be reasonable?
@ecm200 You can certainly run more but usually 4x1000 posterior samples is enough for most convergence diagnostics. |
@twiecki thanks very much for the feedback, very much appreciated. Any idea how the sampler scales with threads? My machine defaults to 4 threads per sampling process (whether that is a variable set in the Theano backend, I don't know). So, to take advantage of the compute power we have, I was wondering whether sequential chain sampling with more threads per chain would be more efficient. Ideally, of course, the multiprocessing sampler would play nicely with multi-threaded processes in parallel, with the obvious caveat of making sure that one doesn't oversubscribe the discrete compute cores of the system. |
I would like to further motivate the above question. Suppose I would like to apply MCMC methods to an existing type of model. (I.e. I use some black-box likelihood function as a theano/aesara op.) Suppose further that the existing code for this model can also be parallelized in certain cases. Then — is there any best-practice for allocating the parallel compute resources between sampling, openmp computations, and this black-box likelihood? I understand that the likelihood complication is somewhat outside the scope of the pymc-dev's influence, and some empirical testing might be the way to go. Perhaps it's also a pretty rare case. I just mean to say though: in general, it could be very nice to have in the documentation some deeper explanation/exploration of how parallelization occurs in pymc, and best practices for tuning it. |
The general logic is that you only need to parallelize at the highest level if you can max out at that level. But I agree that some more info would be helpful, can you help draft something?
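As a sketch, one way such an allocation can look in a script (the numbers are illustrative assumptions, not recommendations):

import os

# One BLAS/OpenMP thread per chain process, so four parallel chains
# use four cores in total instead of oversubscribing the machine.
os.environ['OMP_NUM_THREADS'] = '1'

import pymc3 as pm  # import the numeric stack only after setting the variable

# with model:
#     trace = pm.sample(chains=4, cores=4)  # 4 processes x 1 thread each

|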
I have a similar issue, and I have tried the suggestions above.
|
Can you try upgrading to pymc 4.0.0b3? |
My initial script was an attempt to use bambi, which is built on top of PyMC3. So I tried another example, and not even the import statement works.
|
@JackCaster You can try again with pymc 4.0.0b3. |
I also reproduced the same issue. Works in a Jupyter notebook but not in a script. This is on OSX. |
Looks like this thread has accumulated a bunch of different issues that are unrelated to the original one from four years ago. |
Since PR #3011, I have been having trouble sampling multiple chains with multiple cores. In a Jupyter notebook I get random kernel shutdowns, and therefore I haven't managed to pinpoint the problem (it seems that the more complicated the model is, the higher the crash rate). However, I found a systematic issue when using the Python interpreter only (not the Jupyter kernel): if I sample more than one chain using more than one core (say, 2 chains and 2 cores), Python crashes. Sampling multiple chains with 1 core, or 1 chain with multiple cores, is fine. On a Jupyter notebook I do not encounter any problems.
The minimal example is attached (please run it as a script, and not on a Jupyter kernel):
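(The attachment did not survive formatting; the following is a reconstruction based on the same script quoted near the top of the thread, without the if __name__ == '__main__': guard that was suggested later:)

import numpy as np
import pandas as pd
import theano
import pymc3 as pm

print('*** Start script ***')
print(f'{pm.__name__}: v. {pm.__version__}')
print(f'{theano.__name__}: v. {theano.__version__}')

SEED = 20180730
np.random.seed(SEED)

# Generate data
mu_real = 0
sd_real = 1
n_samples = 1000
y = np.random.normal(loc=mu_real, scale=sd_real, size=n_samples)

# Bayesian modelling
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=10)
    sd = pm.HalfNormal('sd', sd=10)
    # Likelihood
    likelihood = pm.Normal('likelihood', mu=mu, sd=sd, observed=y)
    trace = pm.sample(chains=2, cores=2, random_seed=SEED)

print('Done!')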
Running with chains=2 and cores=2 throws the error. The interesting thing is that the print statements in the script are duplicated (which does not happen when chains=2 and cores=1, or chains=1 and cores=2). I am on master on both PyMC3 and Theano.