Compilation error with multiprocessing under Windows #6678

Open
JackCaster opened this issue Jan 14, 2019 · 9 comments

Comments

@JackCaster

As outlined in pymc-devs/pymc#3140, I am having trouble using PyMC3 under Windows with multiprocessing. @twiecki suggested that I also open an issue here, as the problem seems to be related to Theano, the backend of PyMC3.

In short:

  • the problem is limited to Windows;
  • the graph compilation succeeds either by disabling multiprocessing or by setting theano.config.mode = 'FAST_COMPILE';
  • the problem happens both when using Jupyter and when using the Python script only.
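
For reference, the second workaround can also be applied through the THEANO_FLAGS environment variable, which Theano reads when it is first imported and which child processes inherit from the parent. A minimal sketch, assuming a hypothetical helper add_theano_flag (not part of Theano) that is called before `import theano`:

```python
import os


def add_theano_flag(flag, env=os.environ):
    """Set one key=value entry in THEANO_FLAGS, replacing any earlier value for that key.

    Hypothetical helper for illustration; THEANO_FLAGS itself is a
    comma-separated list of key=value pairs.
    """
    key = flag.split("=", 1)[0]
    current = [f for f in env.get("THEANO_FLAGS", "").split(",") if f]
    kept = [f for f in current if f.split("=", 1)[0] != key]
    env["THEANO_FLAGS"] = ",".join(kept + [flag])
    return env["THEANO_FLAGS"]


# Must run before the first `import theano` in the parent process.
add_theano_flag("mode=FAST_COMPILE")
# import theano  # left commented out so the sketch runs without Theano installed
```

The first workaround needs no configuration: pass cores=1 to pm.sample so that no worker processes are spawned.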

The traceback points to a compilation error:

You can find the C code in this temporary file: C:\Users\moran\AppData\Local\Temp\theano_compilation_error_jfoh7gz2
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\compile\function_module.py", line 1082, in _constructor_Function
    f = maker.create(input_storage, trustme=True)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\compile\function_module.py", line 1715, in create
    input_storage=input_storage_lists, storage_map=storage_map)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\link.py", line 699, in make_thunk
    storage_map=storage_map)[:3]
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\compile\debugmode.py", line 1678, in make_all
    no_recycling)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\op.py", line 858, in make_c_thunk
    output_storage=node_output_storage)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\cc.py", line 1217, in make_thunk
    keep_lock=keep_lock)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\cc.py", line 1157, in __compile__
    keep_lock=keep_lock)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\cc.py", line 1620, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\cmodule.py", line 1181, in module_from_key
    module = lnk.compile_cmodule(location)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\cc.py", line 1523, in compile_cmodule
    preargs=preargs)
  File "C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\cmodule.py", line 2391, in compile_str
    (status, compile_stderr.replace('\n', '. ')))
Exception: ('Compilation failed (return status=3): ', '[Elemwise{Composite{((-(i0 * i1)) / i2)}}(<TensorType(float64, matrix)>, <TensorType(float64, matrix)>, <TensorType(float64, row)>)]')
forrtl: error (200): program aborting due to control-C event
Image              PC                Routine            Line        Source
libifcoremd.dll    00007FFE107494C4  Unknown               Unknown  Unknown
KERNELBASE.dll     00007FFE79672763  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFE7ABD7E94  Unknown               Unknown  Unknown
ntdll.dll          00007FFE7D2CA251  Unknown               Unknown  Unknown
forrtl: error (200): program aborting due to control-C event
Image              PC                Routine            Line        Source
libifcoremd.dll    00007FFE107494C4  Unknown               Unknown  Unknown
KERNELBASE.dll     00007FFE79672763  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFE7ABD7E94  Unknown               Unknown  Unknown
ntdll.dll          00007FFE7D2CA251  Unknown               Unknown  Unknown

The last line of the temporary file containing the generated C code reports:

Problem occurred during compilation with the command line below:
"c:\miniconda3\envs\intro_to_pymc3\library\mingw-w64\bin\g++.exe" -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -march=haswell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mbmi2 -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-clwb -mno-pcommit -mno-mwaitx --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=4096 -mtune=haswell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -DMS_WIN64 -I"C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\numpy\core\include" -I"C:\Miniconda3\envs\intro_to_pymc3\include" -I"C:\Miniconda3\envs\intro_to_pymc3\lib\site-packages\theano\gof\c_code" -L"C:\Miniconda3\envs\intro_to_pymc3\libs" -L"C:\Miniconda3\envs\intro_to_pymc3" -o "C:\Users\moran\AppData\Local\Theano\compiledir_Windows-10-10.0.17763-SP0-Intel64_Family_6_Model_69_Stepping_1_GenuineIntel-3.6.6-64\tmp2h9a4y2s\mfbb242647a0afa02ea639375cc1adad8acf3db0451156e5a8136d7bd222ef702.pyd" "C:\Users\moran\AppData\Local\Theano\compiledir_Windows-10-10.0.17763-SP0-Intel64_Family_6_Model_69_Stepping_1_GenuineIntel-3.6.6-64\tmp2h9a4y2s\mod.cpp" -lpython36

I also noticed that the tmp folder ...\tmp2h9a4y2s\..., which is referenced in the compilation command, does not exist.

I have updated both PyMC3 and Theano to master.
Please let me know if you need further information to investigate the issue.

@nouiz
Member

nouiz commented Jan 16, 2019 via email

@JackCaster
Author

Did you try running once with just one thread to make sure that Theano compiles all the files it needs?

Thank you for the answer. Yes, the compilation is successful with 1 thread only. I will ping @aseyboldt and @lucianopaz (PyMC3 developers) as they may have a better understanding of what's going on.

@lucianopaz
Contributor

lucianopaz commented Jan 17, 2019

From my experience with errors like these with pymc3 multiprocessing on Windows, the problem is usually that something goes wrong during unpickling.

Briefly, pymc3 tries to create a pool of worker processes using multiprocessing. On Windows, these are created with spawn. The parent process sends two packets of data to start each spawned process, both serialized with ForkingPickler. What usually happens is that the ForkingPickler on the receiving side fails to unpickle the second message (which contains the serialized theano graphs, ops, variables, etc.), and the spawned process dies before finishing its initialization. I don't understand why the unpickling should recompile anything.
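
On the last point: the traceback in the original report shows pickle.load calling _constructor_Function, which in turn calls maker.create, so loading a pickled function rebuilds it on the receiving side rather than restoring a finished object. The general pickle mechanism that allows this can be sketched with the standard library alone. CompiledExpr and COMPILE_LOG below are hypothetical stand-ins, not Theano code, and Python's built-in compile() stands in for C compilation:

```python
import pickle

COMPILE_LOG = []  # one entry per "compilation", so rebuilds are visible


class CompiledExpr:
    """Hypothetical stand-in for a compiled function object."""

    def __init__(self, source):
        self.source = source
        # Stand-in for the expensive step (for Theano, this would be
        # generating and compiling C code).
        self._code = compile(source, "<expr>", "eval")
        COMPILE_LOG.append(source)

    def __call__(self, x):
        return eval(self._code, {"x": x})

    def __reduce__(self):
        # Only the source is serialized. On load, pickle calls
        # CompiledExpr(source) again, which compiles a second time.
        return (CompiledExpr, (self.source,))


f = CompiledExpr("x * 2 + 1")
# A spawned worker performs the loads() half of this round trip at startup.
g = pickle.loads(pickle.dumps(f))
print(g(3), len(COMPILE_LOG))
```

In this sketch the compile step runs once in the sender and once more during loads(), which is the same shape as the pickle.load frames at the top of the reported traceback.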

@lucianopaz
Contributor

I'm linking a discourse thread where a pymc3 user experienced a similar compilation error.

@nouiz, both issues seem to happen because a multiprocessing.Process is spawned but fails to unpickle an expression that has already been compiled with theano.function. Would it help in any way to compile the expressions in the spawned process instead of trying to transfer the theano.function to it?

@lucianopaz
Contributor

@JackCaster, I just made a branch that does not move compiled theano.function's around. Could you check it out and see if you still get the error? I can't reproduce the error you get on my machine, so I cannot test if it fixes the problem or not.

@JackCaster
Author

@JackCaster, I just made a branch that does not move compiled theano.function's around. Could you check it out and see if you still get the error? I can't reproduce the error you get on my machine, so I cannot test if it fixes the problem or not.

Sorry, I missed the notification. I will try tomorrow or Monday.

@JackCaster
Author

Hi @lucianopaz. I ran the following code

import numpy as np
import pandas as pd
import patsy
import pymc3 as pm
import theano
import theano.tensor as tt

SEED = 20180727

df = pd.read_csv(r'https://gist.githubusercontent.com/JackCaster/d74b36a66c172e80d1bdcee61d6975bf/raw/a2aab8690af7cebbe39ec5e5b425fe9a9b9a674d/data.csv', 
                 dtype={'Y':'category'})

_, X = patsy.dmatrices('Y ~ 1 + X1 + X2', data=df)

n_cat = df.Y.cat.categories.size
n_pred = X.shape[1]

if __name__ == "__main__":
    with pm.Model() as model:
        beta_ = pm.Normal('beta_', mu=0, sd=50, shape=(n_pred, n_cat-1))
        beta = pm.Deterministic('beta', tt.concatenate([tt.zeros((n_pred, 1)), beta_], axis=1))
        p = tt.nnet.softmax(tt.dot(np.asarray(X), beta))
        likelihood = pm.Categorical('likelihood', p=p, observed=df.Y.cat.codes.values)
        trace = pm.sample(draws=1000, tune=1000, chains=2, cores=1)

in a fresh environment defined as:

name: test_theano

channels:
 - defaults
 - msys2 
 - conda-forge

dependencies:
 - python=3.6
 - git
 - libpython
 - m2w64-toolchain
 - mkl-service
 - cython
 - matplotlib
 - seaborn
 - scipy 
 - patsy 
 - pandas
 - theano
 - joblib
 - tqdm
 - h5py
 - pip:
   - git+https://github.com/lucianopaz/pymc3.git@differ_step_compile#egg=pymc3

It failed when I set cores=2, and the traceback was:

You can find the C code in this temporary file: C:\Users\moran\AppData\Local\Temp\theano_compilation_error_ausi2ns4
ERROR (theano.gof.opt): Optimization failure due to: constant_folding
ERROR (theano.gof.opt): node: MakeVector{dtype='int64'}(TensorConstant{3}, TensorConstant{3})
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
  File "C:\Miniconda3\envs\test_theano\lib\site-packages\theano\gof\opt.py", line 2034, in process_node
    replacements = lopt.transform(node)
  File "C:\Miniconda3\envs\test_theano\lib\site-packages\theano\tensor\opt.py", line 6516, in constant_folding
    no_recycling=[], impl=impl)
  File "C:\Miniconda3\envs\test_theano\lib\site-packages\theano\gof\op.py", line 955, in make_thunk
    no_recycling)
  File "C:\Miniconda3\envs\test_theano\lib\site-packages\theano\gof\op.py", line 858, in make_c_thunk
    output_storage=node_output_storage)
  File "C:\Miniconda3\envs\test_theano\lib\site-packages\theano\gof\cc.py", line 1217, in make_thunk
    keep_lock=keep_lock)
  File "C:\Miniconda3\envs\test_theano\lib\site-packages\theano\gof\cc.py", line 1157, in __compile__
    keep_lock=keep_lock)
  File "C:\Miniconda3\envs\test_theano\lib\site-packages\theano\gof\cc.py", line 1620, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
  File "C:\Miniconda3\envs\test_theano\lib\site-packages\theano\gof\cmodule.py", line 1181, in module_from_key
    module = lnk.compile_cmodule(location)
  File "C:\Miniconda3\envs\test_theano\lib\site-packages\theano\gof\cc.py", line 1523, in compile_cmodule
    preargs=preargs)
  File "C:\Miniconda3\envs\test_theano\lib\site-packages\theano\gof\cmodule.py", line 2391, in compile_str
    (status, compile_stderr.replace('\n', '. ')))
Exception: ('Compilation failed (return status=3): ', "[MakeVector{dtype='int64'}(TensorConstant{3}, TensorConstant{3})]")

forrtl: error (200): program aborting due to control-C event
Image              PC                Routine            Line        Source
libifcoremd.dll    00007FFD9A2E3B58  Unknown               Unknown  Unknown
KERNELBASE.dll     00007FFE1A0C2763  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFE1B847E94  Unknown               Unknown  Unknown
ntdll.dll          00007FFE1D39A251  Unknown               Unknown  Unknown
forrtl: error (200): program aborting due to control-C event
Image              PC                Routine            Line        Source
libifcoremd.dll    00007FFD9A2E3B58  Unknown               Unknown  Unknown
KERNELBASE.dll     00007FFE1A0C2763  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFE1B847E94  Unknown               Unknown  Unknown
ntdll.dll          00007FFE1D39A251  Unknown               Unknown  Unknown
forrtl: error (200): program aborting due to control-C event
Image              PC                Routine            Line        Source
libifcoremd.dll    00007FFD9A2E3B58  Unknown               Unknown  Unknown
KERNELBASE.dll     00007FFE1A0C2763  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFE1B847E94  Unknown               Unknown  Unknown
ntdll.dll          00007FFE1D39A251  Unknown               Unknown  Unknown

@camantis

Any update on this? I'm having the exact same issue, with the same error, on Windows using pymc3, both in a Jupyter notebook and when running from the console.

theano 1.0.4+23.g630974a7b

@lucianopaz
Contributor

@camantis, sorry, but no update. Could I ask for some details about your installation?

  • Did you install pymc3 into a conda environment? If yes, was it the root conda environment?
  • Did you also install mkl along with theano?
  • Do you have other C compilers installed outside of the environment where you installed pymc3?
  • Do you have BLAS libraries other than mkl installed outside the environment where you installed pymc3?

I'm asking all of these questions because we were never able to reliably reproduce this error ourselves. We only have guesses as to the cause of what you are experiencing. The current guess is that Python subprocesses don't conda activate into the environment of the root process; they just copy over the environment variables. Maybe some encapsulation is being broken by the spawn, so that the wrong C compiler, headers, or libraries are used, and that breaks the unpickling and compilation of the model.
The only thing that we can recommend for now is to sample with a single core.
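
A quick way to gather part of this information is to compare what the parent interpreter and a freshly started child interpreter each resolve as g++. This is only a diagnostic sketch under stated assumptions: it starts the child with subprocess rather than through multiprocessing's spawn, and toolchain_report is a hypothetical helper name. Both mechanisms pass the parent's environment variables to the child.

```python
import json
import shutil
import subprocess
import sys

# Code run by the child interpreter: report its executable and the g++ on its PATH.
CHILD_SNIPPET = (
    "import json, shutil, sys;"
    "print(json.dumps({'python': sys.executable, 'gxx': shutil.which('g++')}))"
)


def toolchain_report():
    """Return (parent, child) dicts with the interpreter and g++ each one sees."""
    parent = {"python": sys.executable, "gxx": shutil.which("g++")}
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SNIPPET],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    child = json.loads(result.stdout)
    return parent, child


parent, child = toolchain_report()
print("parent:", parent)
print("child: ", child)
```

If the two g++ paths differ, or if either one points outside the conda environment, that would be consistent with the guess above, though it would not prove it.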
