New DimShuffle C-code fails on Windows #707

Closed
ricardoV94 opened this issue Dec 23, 2021 · 8 comments · Fixed by #762

Labels: bug, C-backend, help wanted, Windows

Comments

@ricardoV94
Contributor

This was first seen in pymc-devs/pymc#5279.

The following tests are failing on my Windows machine:

  • test_elemwise.py::TestDimShuffle::test_infer_shape
  • test_elemwise.py::TestDimShuffle::test_too_big_rank
  • test_elemwise.py::TestDimShuffle::test_c_views

The first two tests, which predate #701, pass before the relevant commit (e593b0a) and fail after it.
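The regression was located by hand at e593b0a. The sketch below shows one way that check could be automated with `git bisect run`. It assumes the test paths are relative to the repository root, and the wrapper and its constants are illustrative rather than part of Aesara. Running pytest in a child process matters here: a native crash such as 0xC0000374 would otherwise take the wrapper down with it, and `git bisect run` aborts, instead of marking the commit bad, when it sees an exit code outside 0..127.

```python
import subprocess
import sys

# Node IDs of the two DimShuffle tests that, per this report, pass before
# e593b0a and fail after it (paths relative to the repository root).
TESTS = [
    "tests/tensor/test_elemwise.py::TestDimShuffle::test_infer_shape",
    "tests/tensor/test_elemwise.py::TestDimShuffle::test_too_big_rank",
]

# Verdicts understood by `git bisect run`: 0 = good, 1..127 (except 125) = bad,
# 125 = skip this commit.
GOOD, BAD, SKIP = 0, 1, 125

# pytest exit codes: 4 = usage error (e.g. a node ID that does not exist at
# this commit), 5 = no tests were collected.
PYTEST_USAGE_ERROR, PYTEST_NO_TESTS = 4, 5


def classify(returncode: int) -> int:
    """Map a pytest exit status to a `git bisect run` verdict."""
    if returncode == 0:
        return GOOD
    if returncode in (PYTEST_USAGE_ERROR, PYTEST_NO_TESTS):
        return SKIP
    # Ordinary test failures and hard crashes (e.g. 0xC0000374) are both "bad".
    # Crash codes must be remapped, because `git bisect run` aborts on exit
    # codes outside 0..127.
    return BAD


def main() -> int:
    # Run pytest in a child process so a native crash kills the child,
    # not this wrapper.
    proc = subprocess.run([sys.executable, "-m", "pytest", "-q", *TESTS])
    return classify(proc.returncode)
```

To use it, one could save it as, for example, `bisect_dimshuffle.py` (a hypothetical name), append `sys.exit(main())`, and run `git bisect start <bad> <good>` followed by `git bisect run python bisect_dimshuffle.py` from the environment that has the local Aesara checkout on its path.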

As a sanity check, all tests in test_elemwise.py::TestBroadcast pass on main.

Traceback

When running test_elemwise.py::TestDimShuffle::test_c_views:

C:\Users\ricar\miniconda3\envs\aesara-dev-custom\python.exe "C:\Program Files\JetBrains\PyCharm Community Edition 2020.1.1\plugins\python-ce\helpers\pycharm\_jb_pytest_runner.py" --target test_elemwise.py::TestDimShuffle.test_c_views
Launching pytest with arguments test_elemwise.py::TestDimShuffle::test_c_views in C:\Users\ricar\Documents\aesara\tests\tensor

============================= test session starts =============================
platform win32 -- Python 3.9.9, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- C:\Users\ricar\miniconda3\envs\aesara-dev-custom\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\ricar\Documents\aesara, configfile: setup.cfg
collecting ... collected 1 item

test_elemwise.py::TestDimShuffle::test_c_views Windows fatal exception: code 0xc0000374

Current thread 0x00002704 (most recent call first):
  File "C:\Users\ricar\Documents\aesara\aesara\link\c\basic.py", line 1747 in __call__
  File "C:\Users\ricar\Documents\aesara\tests\tensor\test_elemwise.py", line 135 in test_c_views
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\_pytest\python.py", line 183 in pytest_pyfunc_call
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\pluggy\_callers.py", line 39 in _multicall
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\pluggy\_hooks.py", line 265 in __call__
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\_pytest\python.py", line 1641 in runtest
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\_pytest\runner.py", line 162 in pytest_runtest_call
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\pluggy\_callers.py", line 39 in _multicall
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\pluggy\_hooks.py", line 265 in __call__
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\_pytest\runner.py", line 255 in <lambda>
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\_pytest\runner.py", line 311 in from_call
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\_pytest\runner.py", line 254 in call_runtest_hook
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\_pytest\runner.py", line 215 in call_and_report
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\_pytest\runner.py", line 126 in runtestprotocol
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\_pytest\runner.py", line 109 in pytest_runtest_protocol
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\pluggy\_callers.py", line 39 in _multicall
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\pluggy\_hooks.py", line 265 in __call__
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\_pytest\main.py", line 348 in pytest_runtestloop
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\pluggy\_callers.py", line 39 in _multicall
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\pluggy\_hooks.py", line 265 in __call__
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\_pytest\main.py", line 323 in _main
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\_pytest\main.py", line 269 in wrap_session
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\_pytest\main.py", line 316 in pytest_cmdline_main
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\pluggy\_callers.py", line 39 in _multicall
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\pluggy\_hooks.py", line 265 in __call__
  File "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\lib\site-packages\_pytest\config\__init__.py", line 162 in main
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.1.1\plugins\python-ce\helpers\pycharm\_jb_pytest_runner.py", line 43 in <module>

Process finished with exit code -1073740940 (0xC0000374)
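The exit code printed by the PyCharm runner, -1073740940, is the same 32-bit value as 0xC0000374 read as a signed integer. 0xC0000374 is the NTSTATUS code STATUS_HEAP_CORRUPTION. It is usually raised when the heap manager notices the damage, which can be some time after the faulty write. So the frame shown (aesara/link/c/basic.py, line 1747, the compiled thunk call) marks where the corruption was detected, not necessarily where it happened. The sketch below converts between the two forms. The helper names and the small lookup table are illustrative only.

```python
# A few NTSTATUS values relevant to native crashes (from the Windows SDK's ntstatus.h).
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000374: "STATUS_HEAP_CORRUPTION",
}


def ntstatus(exit_code: int) -> int:
    """Reinterpret a (possibly negative) process exit code as an unsigned 32-bit NTSTATUS."""
    # Windows exit codes are 32-bit DWORDs; tools that print them as signed
    # integers show large status codes as negative numbers.
    return exit_code & 0xFFFFFFFF


def describe(exit_code: int) -> str:
    """Format an exit code as hex, with its symbolic name when it is known."""
    status = ntstatus(exit_code)
    name = KNOWN_STATUS.get(status, "unknown status")
    return f"0x{status:08X} ({name})"
```

For example, `describe(-1073740940)` and `describe(3221226356)` both yield `0xC0000374 (STATUS_HEAP_CORRUPTION)`.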

Versions and main components

  • Aesara version: main
  • Aesara config (python -c "import aesara; print(aesara.config)")
  • Python version: 3.9.9
  • Operating system: Windows 10
  • How did you install Aesara: with the following conda environment.yml:
name: aesara-dev-custom
channels:
- conda-forge
- defaults
dependencies:
 # base dependencies (see install guide for Windows)
- aesara=2.3.3
- pip
- python=3.9
# Extra stuff for dev, testing and docs build
- ipython
- pre-commit
- pytest

Then I removed aesara with conda remove --force aesara so that the local branch is used instead.
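After `conda remove --force aesara`, it is worth confirming that Python really resolves `aesara` to the local checkout and not to a leftover installed copy. The sketch below does this with `importlib.util.find_spec`, which locates a top-level module without executing it. The helper names are hypothetical, and the checkout path in the usage note is taken from the traceback's rootdir.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_origin(name: str) -> Optional[Path]:
    """Return the file a top-level module would be imported from, without importing it."""
    spec = importlib.util.find_spec(name)
    # Built-in, frozen, and namespace modules have no file location.
    if spec is None or not spec.has_location:
        return None
    return Path(spec.origin).resolve()


def comes_from(name: str, checkout: Path) -> bool:
    """True if `name` resolves to a file inside the given source checkout."""
    origin = module_origin(name)
    if origin is None:
        return False
    try:
        origin.relative_to(checkout.resolve())
    except ValueError:
        return False
    return True
```

In this setup, `comes_from("aesara", Path(r"C:\Users\ricar\Documents\aesara"))` would be expected to return `True`. If it returns `False`, `module_origin("aesara")` shows which copy is being picked up instead.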

ricardoV94 added the bug and Windows labels on Dec 23, 2021
@ricardoV94
Contributor Author

A quick Google search suggests it may be a reference-counting issue (0xC0000374 is STATUS_HEAP_CORRUPTION): https://stackoverflow.com/a/64960890
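In C, a view that points into another array's buffer must own a reference to that array. With the NumPy C API this is usually done through `PyArray_SetBaseObject`, which steals a reference, so the caller has to `Py_INCREF` the base first. A missing INCREF, or an extra DECREF, can let the buffer be freed while the view still uses it, and that is one plausible route to heap corruption. The Python-level sketch below assumes NumPy and uses a hypothetical helper standing in for the real DimShuffle Op. It shows the invariant the generated C code has to reproduce: creating the view adds exactly one reference to the base, and deleting it gives that reference back. It illustrates the hypothesis; it does not diagnose this bug.

```python
import sys

import numpy as np


def dimshuffle_like_view(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a DimShuffle view: drop the length-1 middle
    axis of a (n, 1, m) array and swap the other two, giving shape (m, n)."""
    return x[:, 0, :].T


def view_refcount_deltas(x: np.ndarray) -> tuple:
    """Return (extra refs on x while the view lives, extra refs after deleting it)."""
    before = sys.getrefcount(x)
    y = dimshuffle_like_view(x)
    # The view holds one reference to x, directly or via an intermediate view.
    alive = sys.getrefcount(x) - before
    assert np.shares_memory(x, y)
    del y
    # Deleting the view must give that reference back.
    released = sys.getrefcount(x) - before
    return alive, released
```

On CPython, `view_refcount_deltas(np.zeros((3, 1, 4)))` is expected to return `(1, 0)`. A view built from a temporary, such as `dimshuffle_like_view(np.arange(12.0).reshape(3, 1, 4))`, also stays valid after the temporary goes out of scope, because its base reference keeps the buffer alive.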

@brandonwillard
Member

What's the python -c "import aesara; print(aesara.config)" output?

@ricardoV94
Contributor Author

ricardoV94 commented Dec 28, 2021

What's the python -c "import aesara; print(aesara.config)" output?

local
floatX ({'float64', 'float16', 'float32'}) 
    Doc:  Default floating-point precision for python casts.

Note: float16 support is experimental, use at your own risk.
    Value:  float64

warn_float64 ({'warn', 'raise', 'pdb', 'ignore'}) 
    Doc:  Do an action when a tensor variable with float64 dtype is created. They can't be run on the GPU with the current(old) gpu back-end and are slow with gamer GPUs.
    Value:  ignore

pickle_test_value (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F80210310>>) 
    Doc:  Dump test values while pickling model. If True, test values will be dumped with model.
    Value:  True

cast_policy ({'custom', 'numpy+floatX'}) 
    Doc:  Rules for implicit type casting
    Value:  custom

deterministic ({'more', 'default'}) 
    Doc:  If `more`, sometimes we will select some implementation that are more deterministic, but slower. In particular, on the GPU, we will avoid using AtomicAdd. Sometimes we will still use non-deterministic implementation, e.g. when we do not have a GPU implementation that is deterministic. Also see the dnn.conv.algo* flags to cover more cases.
    Value:  default

device (cpu, opencl*, cuda*) 
    Doc:  Default device for computations. If cuda* or opencl*, change thedefault to try to move computation to the GPU. Do not use upper caseletters, only lower case even if NVIDIA uses capital letters. 'gpu' means let the driver select the gpu (needed for gpu in exclusive mode). 'gpuX' mean use the gpu number X.
    Value:  cpu

init_gpu_device (, opencl*, cuda*) 
    Doc:  Initialize the gpu device to use, works only if device=cpu. Unlike 'device', setting this option will NOT move computations, nor shared variables, to the specified GPU. It can be used to run GPU-specific tests on a particular GPU.
    Value:  

force_device (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F834D5CD0>>) 
    Doc:  Raise an error if we can't use the specified device
    Value:  False

conv__assert_shape (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F834D5580>>) 
    Doc:  If True, AbstractConv* ops will verify that user-provided shapes match the runtime shapes (debugging option, may slow down compilation)
    Value:  False

print_global_stats (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F834D5940>>) 
    Doc:  Print some global statistics (time spent) at the end
    Value:  False

<aesara.configparser.ContextsParam object at 0x0000020F834D5A60>
    Doc:  
        Context map for multi-gpu operation. Format is a
        semicolon-separated list of names and device names in the
        'name->dev_name' format. An example that would map name 'test' to
        device 'cuda0' and name 'test2' to device 'opencl0:0' follows:
        "test->cuda0;test2->opencl0:0".

        Invalid context names are 'cpu', 'cuda*' and 'opencl*'
        
    Value:  

print_active_device (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F834D5A00>>) 
    Doc:  Print active device at when the GPU device is initialized.
    Value:  True

gpuarray__preallocate (<class 'float'>) 
    Doc:  If negative it disables the allocation cache. If
                 between 0 and 1 it enables the allocation cache and
                 preallocates that fraction of the total GPU memory.  If 1
                 or greater it will preallocate that amount of memory (in
                 megabytes).
    Value:  0.0

gpuarray__sched ({'single', 'multi', 'default'}) 
    Doc:  The sched parameter passed for context creation to pygpu.
                    With CUDA, using "multi" is equivalent to using the parameter
                    cudaDeviceScheduleBlockingSync. This is useful to lower the
                    CPU overhead when waiting for GPU. One user found that it
                    speeds up his other processes that was doing data augmentation.
                 
    Value:  default

gpuarray__single_stream (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F834F2B20>>) 
    Doc:  
                 If your computations are mostly lots of small elements,
                 using single-stream will avoid the synchronization
                 overhead and usually be faster.  For larger elements it
                 does not make a difference yet.  In the future when true
                 multi-stream is enabled in libgpuarray, this may change.
                 If you want to make sure to have optimal performance,
                 check both options.
                 
    Value:  True

cuda__root (<class 'str'>) 
    Doc:  Location of the cuda installation
    Value:  

cuda__include_path (<class 'str'>) 
    Doc:  Location of the cuda includes
    Value:  

assert_no_cpu_op ({'warn', 'raise', 'pdb', 'ignore'}) 
    Doc:  Raise an error/warning if there is a CPU op in the computational graph.
    Value:  ignore

unpickle_function (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F834F2C40>>) 
    Doc:  Replace unpickled Aesara functions with None. This is useful to unpickle old graphs that pickled them when it shouldn't
    Value:  True

reoptimize_unpickled_function (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F834F2CA0>>) 
    Doc:  Re-optimize the graph when an Aesara function is unpickled from the disk.
    Value:  False

dnn__conv__algo_fwd ({'winograd', 'fft_tiling', 'none', 'guess_on_shape_change', 'small', 'time_on_shape_change', 'large', 'guess_once', 'winograd_non_fused', 'time_once', 'fft'}) 
    Doc:  Default implementation to use for cuDNN forward convolution.
    Value:  small

dnn__conv__algo_bwd_data ({'winograd', 'fft_tiling', 'deterministic', 'none', 'guess_on_shape_change', 'time_on_shape_change', 'guess_once', 'winograd_non_fused', 'time_once', 'fft'}) 
    Doc:  Default implementation to use for cuDNN backward convolution to get the gradients of the convolution with regard to the inputs.
    Value:  none

dnn__conv__algo_bwd_filter ({'fft_tiling', 'deterministic', 'none', 'small', 'guess_on_shape_change', 'time_on_shape_change', 'guess_once', 'winograd_non_fused', 'time_once', 'fft'}) 
    Doc:  Default implementation to use for cuDNN backward convolution to get the gradients of the convolution with regard to the filters.
    Value:  none

dnn__conv__precision ({'float32', 'float16', 'float64', 'as_input_f32', 'as_input'}) 
    Doc:  Default data precision to use for the computation in cuDNN convolutions (defaults to the same dtype as the inputs of the convolutions, or float32 if inputs are float16).
    Value:  as_input_f32

dnn__base_path (<class 'str'>) 
    Doc:  Install location of cuDNN.
    Value:  

dnn__include_path (<class 'str'>) 
    Doc:  Location of the cudnn header
    Value:  

dnn__library_path (<class 'str'>) 
    Doc:  Location of the cudnn link library.
    Value:  

dnn__bin_path (<class 'str'>) 
    Doc:  Location of the cuDNN load library (on non-windows platforms, this is the same as dnn__library_path)
    Value:  

dnn__enabled ({'no_check', 'False', 'True', 'auto'}) 
    Doc:  'auto', use cuDNN if available, but silently fall back to not using it if not present. If True and cuDNN can not be used, raise an error. If False, disable cudnn even if present. If no_check, assume present and the version between header and library match (so less compilation at context init)
    Value:  auto

magma__include_path (<class 'str'>) 
    Doc:  Location of the magma header
    Value:  

magma__library_path (<class 'str'>) 
    Doc:  Location of the magma library
    Value:  

magma__enabled (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83503070>>) 
    Doc:   If True, use magma for matrix computation. If False, disable magma
    Value:  False

<aesara.configparser.ConfigParam object at 0x0000020F83503040>
    Doc:  Default compilation mode
    Value:  Mode

cxx (<class 'str'>) 
    Doc:  The C++ compiler to use. Currently only g++ is supported, but supporting additional compilers should not be too difficult. If it is empty, no C++ code is compiled.
    Value:  "C:\Users\ricar\miniconda3\envs\aesara-dev-custom\Library\mingw-w64\bin\g++.exe"

linker ({'c', 'py', 'cvm', 'c|py_nogc', 'cvm_nogc', 'vm', 'vm_nogc', 'c|py'}) 
    Doc:  Default linker used if the aesara flags mode is Mode
    Value:  cvm

allow_gc (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F835033D0>>) 
    Doc:  Do we default to delete intermediate results during Aesara function calls? Doing so lowers the memory requirement, but asks that we reallocate memory at the next function call. This is implemented for the default linker, but may not work for all linkers.
    Value:  True

optimizer ({'o2', 'o4', 'o3', 'merge', 'None', 'fast_compile', 'unsafe', 'o1', 'fast_run'}) 
    Doc:  Default optimizer. If not None, will use this optimizer with the Mode
    Value:  o4

optimizer_verbose (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83503130>>) 
    Doc:  If True, we print all optimization being applied
    Value:  False

on_opt_error ({'warn', 'raise', 'pdb', 'ignore'}) 
    Doc:  What to do when an optimization crashes: warn and skip it, raise the exception, or fall into the pdb debugger.
    Value:  warn

nocleanup (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83503280>>) 
    Doc:  Suppress the deletion of code files that did not compile cleanly
    Value:  False

on_unused_input ({'warn', 'raise', 'ignore'}) 
    Doc:  What to do if a variable in the 'inputs' list of  aesara.function() is not used in the graph.
    Value:  raise

gcc__cxxflags (<class 'str'>) 
    Doc:  Extra compiler flags for gcc
    Value:  

cmodule__warn_no_version (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F835033A0>>) 
    Doc:  If True, will print a warning when compiling one or more Op with C code that can't be cached because there is no c_code_cache_version() function associated to at least one of those Ops.
    Value:  False

cmodule__remove_gxx_opt (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83503220>>) 
    Doc:  If True, will remove the -O* parameter passed to g++.This is useful to debug in gdb modules compiled by Aesara.The parameter -g is passed by default to g++
    Value:  False

cmodule__compilation_warning (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83503340>>) 
    Doc:  If True, will print compilation warnings.
    Value:  False

cmodule__preload_cache (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F835032E0>>) 
    Doc:  If set to True, will preload the C module cache at import time
    Value:  False

cmodule__age_thresh_use (<class 'int'>) 
    Doc:  In seconds. The time after which Aesara won't reuse a compile c module.
    Value:  2073600

cmodule__debug (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83503400>>) 
    Doc:  If True, define a DEBUG macro (if not exists) for any compiled C code.
    Value:  False

compile__wait (<class 'int'>) 
    Doc:  Time to wait before retrying to acquire the compile lock.
    Value:  5

compile__timeout (<class 'int'>) 
    Doc:  In seconds, time that a process will wait before deciding to
    override an existing lock. An override only happens when the existing
    lock is held by the same owner *and* has not been 'refreshed' by this
    owner for more than this period. Refreshes are done every half timeout
    period for running processes.
    Value:  120

ctc__root (<class 'str'>) 
    Doc:  Directory which contains the root of Baidu CTC library. It is assumed         that the compiled library is either inside the build, lib or lib64         subdirectory, and the header inside the include directory.
    Value:  

tensor__cmp_sloppy (<class 'int'>) 
    Doc:  Relax aesara.tensor.math._allclose (0) not at all, (1) a bit, (2) more
    Value:  0

tensor__local_elemwise_fusion (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F835035B0>>) 
    Doc:  Enable or not in fast_run mode(fast_run optimization) the elemwise fusion optimization
    Value:  True

lib__amblibm (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83503670>>) 
    Doc:  Use amd's amdlibm numerical library
    Value:  False

tensor__insert_inplace_optimizer_validate_nb (<class 'int'>) 
    Doc:  -1: auto, if graph have less then 500 nodes 1, else 10
    Value:  -1

traceback__limit (<class 'int'>) 
    Doc:  The number of stack to trace. -1 mean all.
    Value:  8

traceback__compile_limit (<class 'int'>) 
    Doc:  The number of stack to trace to keep during compilation. -1 mean all. If greater then 0, will also make us save Aesara internal stack trace.
    Value:  0

experimental__unpickle_gpu_on_cpu (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F835037F0>>) 
    Doc:  Allow unpickling of pickled GpuArrays as numpy.ndarrays.This is useful, if you want to open a GpuArray without having cuda installed.If you have cuda installed, this will force unpickling tobe done on the cpu to numpy.ndarray.Please be aware that this may get you access to the data,however, trying to unpicke gpu functions will not succeed.This flag is experimental and may be removed any time, whengpu<>cpu transparency is solved.
    Value:  False

experimental__local_alloc_elemwise (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83503820>>) 
    Doc:  DEPRECATED: If True, enable the experimental optimization local_alloc_elemwise. Generates error if not True. Use optimizer_excluding=local_alloc_elemwise to disable.
    Value:  True

experimental__local_alloc_elemwise_assert (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F835038B0>>) 
    Doc:  When the local_alloc_elemwise is applied, add an assert to highlight shape errors.
    Value:  True

warn__ignore_bug_before ({'1.0', '0.9', '0.4', 'None', '0.8.2', '0.3', '1.0.3', '0.5', 'all', '1.0.2', '0.10', '0.7', '0.4.1', '0.6', '0.8.1', '1.0.4', '1.0.1', '1.0.5', '0.8'}) 
    Doc:  If 'None', we warn about all Aesara bugs found by default. If 'all', we don't warn about Aesara bugs found by default. If a version, we print only the warnings relative to Aesara bugs found after that version. Warning for specific bugs can be configured with specific [warn] flags.
    Value:  0.9

exception_verbosity ({'high', 'low'}) 
    Doc:  If 'low', the text of exceptions will generally refer to apply nodes with short names such as Elemwise{add_no_inplace}. If 'high', some exceptions will also refer to apply nodes with long descriptions  like:
        A. Elemwise{add_no_inplace}
                B. log_likelihood_v_given_h
                C. log_likelihood_h
    Value:  low

print_test_value (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F835039A0>>) 
    Doc:  If 'True', the __eval__ of an Aesara variable will return its test_value when this is available. This has the practical conseguence that, e.g., in debugging `my_var` will print the same as `my_var.tag.test_value` when a test value is defined.
    Value:  False

compute_test_value ({'raise', 'ignore', 'off', 'pdb', 'warn'}) 
    Doc:  If 'True', Aesara will run each op at graph build time, using Constants, SharedVariables and the tag 'test_value' as inputs to the function. This helps the user track down problems in the graph before it gets optimized.
    Value:  off

compute_test_value_opt ({'raise', 'ignore', 'off', 'pdb', 'warn'}) 
    Doc:  For debugging Aesara optimization only. Same as compute_test_value, but is used during Aesara optimization
    Value:  off

check_input (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83503A00>>) 
    Doc:  Specify if types should check their input in their C code. It can be used to speed up compilation, reduce overhead (particularly for scalars) and reduce the number of generated C files.
    Value:  True

NanGuardMode__nan_is_error (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83503A30>>) 
    Doc:  Default value for nan_is_error
    Value:  True

NanGuardMode__inf_is_error (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83503A90>>) 
    Doc:  Default value for inf_is_error
    Value:  True

NanGuardMode__big_is_error (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83503B20>>) 
    Doc:  Default value for big_is_error
    Value:  True

NanGuardMode__action ({'warn', 'raise', 'pdb'}) 
    Doc:  What NanGuardMode does when it finds a problem
    Value:  raise

DebugMode__patience (<class 'int'>) 
    Doc:  Optimize graph this many times to detect inconsistency
    Value:  10

DebugMode__check_c (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83503BB0>>) 
    Doc:  Run C implementations where possible
    Value:  True

DebugMode__check_py (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83503C40>>) 
    Doc:  Run Python implementations where possible
    Value:  True

DebugMode__check_finite (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83503C10>>) 
    Doc:  True -> complain about NaN/Inf results
    Value:  True

DebugMode__check_strides (<class 'int'>) 
    Doc:  Check that Python- and C-produced ndarrays have same strides. On difference: (0) - ignore, (1) warn, or (2) raise error
    Value:  0

DebugMode__warn_input_not_reused (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83503CA0>>) 
    Doc:  Generate a warning when destroy_map or view_map says that an op works inplace, but the op did not reuse the input for its output.
    Value:  True

DebugMode__check_preallocated_output (<class 'str'>) 
    Doc:  Test thunks with pre-allocated memory as output storage. This is a list of strings separated by ":". Valid values are: "initial" (initial storage in storage map, happens with Scan),"previous" (previously-returned memory), "c_contiguous", "f_contiguous", "strided" (positive and negative strides), "wrong_size" (larger and smaller dimensions), and "ALL" (all of the above).
    Value:  

DebugMode__check_preallocated_output_ndim (<class 'int'>) 
    Doc:  When testing with "strided" preallocated output memory, test all combinations of strides over that number of (inner-most) dimensions. You may want to reduce that number to reduce memory or time usage, but it is advised to keep a minimum of 2.
    Value:  4

profiling__time_thunks (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83503D30>>) 
    Doc:  Time individual thunks when profiling
    Value:  True

profiling__n_apply (<class 'int'>) 
    Doc:  Number of Apply instances to print by default
    Value:  20

profiling__n_ops (<class 'int'>) 
    Doc:  Number of Ops to print by default
    Value:  20

profiling__output_line_width (<class 'int'>) 
    Doc:  Max line width for the profiling output
    Value:  512

profiling__min_memory_size (<class 'int'>) 
    Doc:  For the memory profile, do not print Apply nodes if the size
                 of their outputs (in bytes) is lower than this threshold
    Value:  1024

profiling__min_peak_memory (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83503E80>>) 
    Doc:  The min peak memory usage of the order
    Value:  False

profiling__destination (<class 'str'>) 
    Doc:  File destination of the profiling output
    Value:  stderr

profiling__debugprint (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83503EE0>>) 
    Doc:  Do a debugprint of the profiled functions
    Value:  False

profiling__ignore_first_call (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83503F10>>) 
    Doc:  Do we ignore the first call of an Aesara function.
    Value:  False

on_shape_error ({'warn', 'raise'}) 
    Doc:  warn: print a warning and use the default value. raise: raise an error
    Value:  warn

openmp (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83503F70>>) 
    Doc:  Allow (or not) parallel computation on the CPU with OpenMP. This is the default value used when creating an Op that supports OpenMP parallelization. It is preferable to define it via the Aesara configuration file ~/.aesararc or with the environment variable AESARA_FLAGS. Parallelization is only done for some operations that implement it, and even for operations that implement parallelism, each operation is free to respect this flag or not. You can control the number of threads used with the environment variable OMP_NUM_THREADS. If it is set to 1, we disable openmp in Aesara by default.
    Value:  False

openmp_elemwise_minsize (<class 'int'>) 
    Doc:  If OpenMP is enabled, this is the minimum size of vectors for which the openmp parallelization is enabled in element wise ops.
    Value:  200000

optimizer_excluding (<class 'str'>) 
    Doc:  When using the default mode, we will remove optimizer with these tags. Separate tags with ':'.
    Value:  

optimizer_including (<class 'str'>) 
    Doc:  When using the default mode, we will add optimizer with these tags. Separate tags with ':'.
    Value:  

optimizer_requiring (<class 'str'>) 
    Doc:  When using the default mode, we will require optimizer with these tags. Separate tags with ':'.
    Value:  

optdb__position_cutoff (<class 'float'>) 
    Doc:  Where to stop eariler during optimization. It represent the position of the optimizer where to stop.
    Value:  inf

optdb__max_use_ratio (<class 'float'>) 
    Doc:  A ratio that prevent infinite loop in EquilibriumOptimizer.
    Value:  8.0

cycle_detection ({'regular', 'fast'}) 
    Doc:  If cycle_detection is set to regular, most inplaces are allowed,but it is slower. If cycle_detection is set to faster, less inplacesare allowed, but it makes the compilation faster.The interaction of which one give the lower peak memory usage iscomplicated and not predictable, so if you are close to the peakmemory usage, triyng both could give you a small gain.
    Value:  regular

check_stack_trace ({'off', 'raise', 'warn', 'log'}) 
    Doc:  A flag for checking the stack trace during the optimization process. default (off): does not check the stack trace of any optimization log: inserts a dummy stack trace that identifies the optimizationthat inserted the variable that had an empty stack trace.warn: prints a warning if a stack trace is missing and also a dummystack trace is inserted that indicates which optimization insertedthe variable that had an empty stack trace.raise: raises an exception if a stack trace is missing
    Value:  off

metaopt__verbose (<class 'int'>) 
    Doc:  0 for silent, 1 for only warnings, 2 for full output withtimings and selected implementation
    Value:  0

metaopt__optimizer_excluding (<class 'str'>) 
    Doc:  exclude optimizers with these tags. Separate tags with ':'.
    Value:  

metaopt__optimizer_including (<class 'str'>) 
    Doc:  include optimizers with these tags. Separate tags with ':'.
    Value:  

profile (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F8350B280>>) 
    Doc:  If VM should collect profile information
    Value:  False

profile_optimizer (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F8350B2E0>>) 
    Doc:  If VM should collect optimizer profile information
    Value:  False

profile_memory (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F8350B310>>) 
    Doc:  If VM should collect memory profile information and print it
    Value:  False

<aesara.configparser.ConfigParam object at 0x0000020F8350B340>
    Doc:  Useful only for the vm linkers. When lazy is None, auto detect if lazy evaluation is needed and use the appropriate version. If lazy is True/False, force the version used between Loop/LoopGC and Stack.
    Value:  None

cache_optimizations (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F8350B3A0>>) 
    Doc:  WARNING: work in progress, does not work yet. Specify if the optimization cache should be used. This cache will any optimized graph and its optimization. Actually slow downs a lot the first optimization, and could possibly still contains some bugs. Use at your own risks.
    Value:  False

unittests__rseed (<class 'str'>) 
    Doc:  Seed to use for randomized unit tests. Special value 'random' means using a seed of None.
    Value:  666

warn__round (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F8350B460>>) 
    Doc:  Warn when using `tensor.round` with the default mode. Round changed its default from `half_away_from_zero` to `half_to_even` to have the same default as NumPy.
    Value:  False

compiledir_format (<class 'str'>) 
    Doc:  Format string for platform-dependent compiled module subdirectory
(relative to base_compiledir). Available keys: aesara_version, device,
gxx_version, hostname, numpy_version, platform, processor,
python_bitwidth, python_int_bitwidth, python_version, short_platform.
Defaults to compiledir_%(short_platform)s-%(processor)s-%(python_versi
on)s-%(python_bitwidth)s.
    Value:  compiledir_%(short_platform)s-%(processor)s-%(python_version)s-%(python_bitwidth)s

<aesara.configparser.ConfigParam object at 0x0000020F8350B8E0>
    Doc:  platform-independent root directory for compiled modules
    Value:  C:\Users\ricar\AppData\Local\Aesara

<aesara.configparser.ConfigParam object at 0x0000020F8350B8B0>
    Doc:  platform-dependent cache directory for compiled modules
    Value:  C:\Users\ricar\AppData\Local\Aesara\compiledir_Windows-10-10.0.19041-SP0-AMD64_Family_23_Model_17_Stepping_0_AuthenticAMD-3.9.9-64

<aesara.configparser.ConfigParam object at 0x0000020F8350B940>
    Doc:  Directory to cache pre-compiled kernels for the gpuarray backend.
    Value:  C:\Users\ricar\AppData\Local\Aesara\compiledir_Windows-10-10.0.19041-SP0-AMD64_Family_23_Model_17_Stepping_0_AuthenticAMD-3.9.9-64\gpuarray_kernels

blas__ldflags (<class 'str'>) 
    Doc:  lib[s] to include for [Fortran] level-3 blas implementation
    Value:  

blas__check_openmp (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F83920550>>) 
    Doc:  Check for openmp library conflict.
WARNING: Setting this to False leaves you open to wrong results in blas-related operations.
    Value:  True

scan__allow_gc (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F8350B970>>) 
    Doc:  Allow/disallow gc inside of Scan (default: False)
    Value:  False

scan__allow_output_prealloc (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x0000020F85ECCA00>>) 
    Doc:  Allow/disallow memory preallocation for outputs inside of scan (default: True)
    Value:  True

@brandonwillard added the help wanted and C-backend labels on Jan 7, 2022
@twiecki
Contributor

twiecki commented Jan 10, 2022

Any ideas on a path forward here?

@brandonwillard
Member

Any ideas on a path forward here?

Someone with a good Windows development setup who can reproduce the issue needs to start debugging it. My first assumption is that this is just another reference count problem.

Hopefully, there's just a bug in the new implementation and one can find it by manually tracking the reference counts (e.g. print them all throughout the DimShuffle C code and look for cases where the count is 0 and the variable is still being actively used or passed off to be used). The fix would then be the addition of strategically placed Py_INCREFs and/or Py_DECREFs.
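To make the suggested approach concrete, here is a toy sketch in plain C. It deliberately does not use the CPython API: toy_obj, toy_incref, toy_decref and trace_use are hypothetical stand-ins, and in the real DimShuffle C code one would print Py_REFCNT(obj) at each point instead of o->refcnt. The idea is only to log the count at named points and flag any use after the count has dropped to 0. (In real CPython code an object is deallocated as soon as its count reaches 0, so the tell-tale sign is usually a count of 1 just before a Py_DECREF that is followed by further use.)

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for a reference-counted object (NOT CPython's PyObject). */
typedef struct {
    const char *name;
    long refcnt;
} toy_obj;

/* Hypothetical analogues of Py_INCREF / Py_DECREF (no deallocation here). */
static void toy_incref(toy_obj *o) { o->refcnt++; }
static void toy_decref(toy_obj *o) { o->refcnt--; }

/* Log the count at a named point in the code and report whether the object
 * is still safe to use there. Returns 1 if the count is positive, 0 if the
 * object is being used after its count has dropped to 0 or below. */
static int trace_use(const toy_obj *o, const char *where)
{
    assert(o != NULL);
    fprintf(stderr, "[%s] %s refcnt=%ld\n", where, o->name, o->refcnt);
    if (o->refcnt <= 0) {
        fprintf(stderr, "[%s] WARNING: %s used with refcnt=%ld\n",
                where, o->name, o->refcnt);
        return 0;
    }
    return 1;
}
```

Sprinkling calls like trace_use(&obj, "after reshape") between the steps of the generated code gives a timeline of counts, which is what makes a missing or extra Py_INCREF/Py_DECREF stand out.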

In the worst case, the issue could be caused by a CPython version/implementation discrepancy, in which case the above might fix things on Windows but break them on Linux (or introduce a memory leak).

Regardless, someone needs to do some simple debugging (and remember to run aesara-cache clear between changes).

@brandonwillard
Member

brandonwillard commented Jan 17, 2022

The issue seems to be the use of PyDimMem_FREE here. Changing it to free(reshape_shape.ptr) or free(_reshape_shape) fixes the issue.
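For context, a minimal plain-C sketch of the rule the fix restores: a buffer must be released by the deallocator that matches its allocator. In NumPy's C headers, PyDimMem_FREE is a macro that releases memory through NumPy's own deallocator (PyArray_free) rather than the C runtime's free, so if the reshape buffer came from plain malloc (which the fact that free() fixes it suggests), the two do not pair up. Why this only crashes on Windows isn't established in this thread; one plausible explanation (an assumption) is that the compiled module and the Python/NumPy binaries use different C runtime heaps there, which would fit the 0xc0000374 (heap corruption) exit code. The names dim_t, alloc_shape, free_shape and shape_size below are hypothetical and not part of Aesara or NumPy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for NumPy's npy_intp; this sketch does not use NumPy. */
typedef ptrdiff_t dim_t;

/* Hypothetical helper: allocate a shape buffer with the C runtime's malloc. */
static dim_t *alloc_shape(size_t ndim)
{
    return (dim_t *)malloc(ndim * sizeof(dim_t));
}

/* The matching release. Memory obtained from malloc must be returned with
 * free; handing it to a different allocator family (e.g. a macro wrapping
 * another library's free) is undefined behaviour and may silently corrupt
 * the heap instead of failing cleanly. */
static void free_shape(dim_t *shape)
{
    free(shape);
}

/* Hypothetical helper: number of elements described by a shape. */
static long long shape_size(const dim_t *shape, size_t ndim)
{
    assert(shape != NULL || ndim == 0);
    long long n = 1;
    for (size_t i = 0; i < ndim; i++)
        n *= (long long)shape[i];
    return n;
}
```

Routing every allocation and release of such a buffer through one paired set of helpers keeps the choice of allocator in a single place, so a mismatched free cannot creep in at an individual call site.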

@twiecki
Contributor

twiecki commented Jan 18, 2022

👍 how painful was that to find?

@brandonwillard
Member

brandonwillard commented Jan 18, 2022

+1 how painful was that to find?

It took literally five minutes to find it after about an hour of building a Windows VM, setting up a dev environment, finding out how nearly impossible it is to get gdb working with a Conda m2w64-toolchain setup, etc., etc.
