Numba Scan fails when multiple None values are passed in outputs_info #1176

Closed
jessegrabowski opened this issue Sep 13, 2022 · 3 comments · Fixed by #1198
Labels
bug Something isn't working help wanted Extra attention is needed Numba Involves Numba transpilation Scan Involves the `Scan` `Op`

Comments

@jessegrabowski
Contributor

Referencing an error raised in #1174: when an Aesara Scan's outputs_info includes more than one None (that is, more than one output that behaves as a map), the placeholder variable name for these outputs appears to be reused, resulting in a syntax error in the generated Numba code.

MRP:

import aesara
import aesara.tensor as at

k = at.iscalar('k')
A = at.dvector('A')

def power_step(prior_result, x):
    return prior_result * x, prior_result * x * x, prior_result * x * x * x

result, _ = aesara.scan(power_step,
                        non_sequences=[A],
                        outputs_info=[at.ones_like(A), None, None],
                        n_steps=k)

numba_power = aesara.function([k, A], result, mode='NUMBA')
SyntaxError: duplicate argument 'auto_25426' in function definition

Traceback (most recent call last):

  File ~/opt/anaconda3/envs/econ/lib/python3.10/site-packages/IPython/core/interactiveshell.py:3398 in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  Input In [16] in <cell line: 1>
    numba_power = aesara.function([k, A], result, mode='NUMBA')

  File ~/opt/anaconda3/envs/econ/lib/python3.10/site-packages/aesara/compile/function/__init__.py:317 in function
    fn = pfunc(

  File ~/opt/anaconda3/envs/econ/lib/python3.10/site-packages/aesara/compile/function/pfunc.py:374 in pfunc
    return orig_function(

  File ~/opt/anaconda3/envs/econ/lib/python3.10/site-packages/aesara/compile/function/types.py:1763 in orig_function
    fn = m.create(defaults)

  File ~/opt/anaconda3/envs/econ/lib/python3.10/site-packages/aesara/compile/function/types.py:1656 in create
    _fn, _i, _o = self.linker.make_thunk(

  File ~/opt/anaconda3/envs/econ/lib/python3.10/site-packages/aesara/link/basic.py:254 in make_thunk
    return self.make_all(

  File ~/opt/anaconda3/envs/econ/lib/python3.10/site-packages/aesara/link/basic.py:698 in make_all
    thunks, nodes, jit_fn = self.create_jitable_thunk(

  File ~/opt/anaconda3/envs/econ/lib/python3.10/site-packages/aesara/link/basic.py:642 in create_jitable_thunk
    converted_fgraph = self.fgraph_convert(

  File ~/opt/anaconda3/envs/econ/lib/python3.10/site-packages/aesara/link/numba/linker.py:10 in fgraph_convert
    return numba_funcify(fgraph, **kwargs)

  File ~/opt/anaconda3/envs/econ/lib/python3.10/functools.py:889 in wrapper
    return dispatch(args[0].__class__)(*args, **kw)

  File ~/opt/anaconda3/envs/econ/lib/python3.10/site-packages/aesara/link/numba/dispatch/basic.py:381 in numba_funcify_FunctionGraph
    return fgraph_to_python(

  File ~/opt/anaconda3/envs/econ/lib/python3.10/site-packages/aesara/link/utils.py:741 in fgraph_to_python
    compiled_func = op_conversion_fn(

  File ~/opt/anaconda3/envs/econ/lib/python3.10/functools.py:889 in wrapper
    return dispatch(args[0].__class__)(*args, **kw)

  File ~/opt/anaconda3/envs/econ/lib/python3.10/site-packages/aesara/link/numba/dispatch/scan.py:153 in numba_funcify_Scan
    scalar_op_fn = compile_function_src(

  File ~/opt/anaconda3/envs/econ/lib/python3.10/site-packages/aesara/link/utils.py:605 in compile_function_src
    mod_code = compile(src, filename, mode="exec")

  File /var/folders/wy/ph4j9vrx23v000gc9flt78y40000gn/T/tmpq2whnvfo:2
    def scan(n_steps, auto_29768, auto_25426, auto_25426, auto_25427):
                                              ^
SyntaxError: duplicate argument 'auto_25426' in function definition
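
The last frame of the traceback can be reproduced without Aesara or Numba, which may help confirm that the failure happens when the generated source is compiled and not inside the scan itself. The sketch below is only an illustration: `try_compile` is a hypothetical helper, and the "repaired" signature is an assumed example of what distinct names could look like, not what Aesara actually emits.

```python
# Stand-alone reproduction of the final error; no Aesara or Numba involved.
def try_compile(src):
    """Return None if `src` compiles, otherwise the SyntaxError message."""
    try:
        compile(src, "<generated>", mode="exec")
    except SyntaxError as exc:
        return exc.msg
    return None


# Same shape as the generated signature shown in the traceback above.
duplicated = (
    "def scan(n_steps, auto_29768, auto_25426, auto_25426, auto_25427):\n"
    "    pass\n"
)
# Hypothetical repaired signature: each placeholder gets a distinct name.
distinct = (
    "def scan(n_steps, auto_29768, auto_25426_0, auto_25426_1, auto_25427):\n"
    "    pass\n"
)

dup_error = try_compile(duplicated)
ok_error = try_compile(distinct)
print(dup_error)
print(ok_error)
```

Python rejects the first definition at compile time because two parameters share a name, so any fix has to make the generated parameter names unique before `compile` is called.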

Versions and main components

  • Aesara version: 2.7.5
Aesara config (`python -c "import aesara; print(aesara.config)"`)

floatX ({'float64', 'float32', 'float16'}) 
    Doc:  Default floating-point precision for python casts.

Note: float16 support is experimental, use at your own risk.
    Value:  float64

warn_float64 ({'raise', 'ignore', 'pdb', 'warn'}) 
    Doc:  Do an action when a tensor variable with float64 dtype is created.
    Value:  ignore

pickle_test_value (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x10fea35b0>>) 
    Doc:  Dump test values while pickling model. If True, test values will be dumped with model.
    Value:  True

cast_policy ({'custom', 'numpy+floatX'}) 
    Doc:  Rules for implicit type casting
    Value:  custom

deterministic ({'more', 'default'}) 
    Doc:  If `more`, sometimes we will select some implementation that are more deterministic, but slower.  Also see the dnn.conv.algo* flags to cover more cases.
    Value:  default

device (cpu)
    Doc:  Default device for computations. only cpu is supported for now
    Value:  cpu

force_device (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea41f0>>) 
    Doc:  Raise an error if we can't use the specified device
    Value:  False

conv__assert_shape (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea4250>>) 
    Doc:  If True, AbstractConv* ops will verify that user-provided shapes match the runtime shapes (debugging option, may slow down compilation)
    Value:  False

print_global_stats (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea4280>>) 
    Doc:  Print some global statistics (time spent) at the end
    Value:  False

assert_no_cpu_op ({'raise', 'ignore', 'pdb', 'warn'}) 
    Doc:  Raise an error/warning if there is a CPU op in the computational graph.
    Value:  ignore

unpickle_function (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea5ba0>>) 
    Doc:  Replace unpickled Aesara functions with None. This is useful to unpickle old graphs that pickled them when it shouldn't
    Value:  True

<aesara.configparser.ConfigParam object at 0x115ea5bd0>
    Doc:  Default compilation mode
    Value:  Mode

cxx (<class 'str'>) 
    Doc:  The C++ compiler to use. Currently only g++ is supported, but supporting additional compilers should not be too difficult. If it is empty, no C++ code is compiled.
    Value:  /Users/jessegrabowski/opt/anaconda3/envs/econ/bin/clang++

linker ({'cvm_nogc', 'py', 'c|py_nogc', 'c|py', 'cvm', 'c', 'vm', 'vm_nogc'}) 
    Doc:  Default linker used if the aesara flags mode is Mode
    Value:  cvm

allow_gc (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea5b40>>) 
    Doc:  Do we default to delete intermediate results during Aesara function calls? Doing so lowers the memory requirement, but asks that we reallocate memory at the next function call. This is implemented for the default linker, but may not work for all linkers.
    Value:  True

optimizer ({'fast_compile', 'o2', 'o3', 'fast_run', 'None', 'unsafe', 'o1', 'o4', 'merge'}) 
    Doc:  Default optimizer. If not None, will use this optimizer with the Mode
    Value:  o4

optimizer_verbose (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea5d50>>) 
    Doc:  If True, we print all optimization being applied
    Value:  False

on_opt_error ({'raise', 'ignore', 'pdb', 'warn'}) 
    Doc:  What to do when an optimization crashes: warn and skip it, raise the exception, or fall into the pdb debugger.
    Value:  warn

nocleanup (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea5de0>>) 
    Doc:  Suppress the deletion of code files that did not compile cleanly
    Value:  False

on_unused_input ({'raise', 'ignore', 'warn'}) 
    Doc:  What to do if a variable in the 'inputs' list of  aesara.function() is not used in the graph.
    Value:  raise

gcc__cxxflags (<class 'str'>) 
    Doc:  Extra compiler flags for gcc
    Value:   -Wno-c++11-narrowing -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables

cmodule__warn_no_version (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea5c90>>) 
    Doc:  If True, will print a warning when compiling one or more Op with C code that can't be cached because there is no c_code_cache_version() function associated to at least one of those Ops.
    Value:  False

cmodule__remove_gxx_opt (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea5e40>>) 
    Doc:  If True, will remove the -O* parameter passed to g++.This is useful to debug in gdb modules compiled by Aesara.The parameter -g is passed by default to g++
    Value:  False

cmodule__compilation_warning (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea5ed0>>) 
    Doc:  If True, will print compilation warnings.
    Value:  False

cmodule__preload_cache (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea5f00>>) 
    Doc:  If set to True, will preload the C module cache at import time
    Value:  False

cmodule__age_thresh_use (<class 'int'>) 
    Doc:  In seconds. The time after which Aesara won't reuse a compile c module.
    Value:  2073600

cmodule__debug (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea5f60>>) 
    Doc:  If True, define a DEBUG macro (if not exists) for any compiled C code.
    Value:  False

compile__wait (<class 'int'>) 
    Doc:  Time to wait before retrying to acquire the compile lock.
    Value:  5

compile__timeout (<class 'int'>) 
    Doc:  In seconds, time that a process will wait before deciding to
    override an existing lock. An override only happens when the existing
    lock is held by the same owner *and* has not been 'refreshed' by this
    owner for more than this period. Refreshes are done every half timeout
    period for running processes.
    Value:  120

ctc__root (<class 'str'>) 
    Doc:  Directory which contains the root of Baidu CTC library. It is assumed         that the compiled library is either inside the build, lib or lib64         subdirectory, and the header inside the include directory.
    Value:  

tensor__cmp_sloppy (<class 'int'>) 
    Doc:  Relax aesara.tensor.math._allclose (0) not at all, (1) a bit, (2) more
    Value:  0

tensor__local_elemwise_fusion (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea6110>>) 
    Doc:  Enable or not in fast_run mode(fast_run optimization) the elemwise fusion optimization
    Value:  True

lib__amblibm (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea61a0>>) 
    Doc:  Use amd's amdlibm numerical library
    Value:  False

tensor__insert_inplace_optimizer_validate_nb (<class 'int'>) 
    Doc:  -1: auto, if graph have less then 500 nodes 1, else 10
    Value:  -1

traceback__limit (<class 'int'>) 
    Doc:  The number of stack to trace. -1 mean all.
    Value:  8

traceback__compile_limit (<class 'int'>) 
    Doc:  The number of stack to trace to keep during compilation. -1 mean all. If greater then 0, will also make us save Aesara internal stack trace.
    Value:  0

experimental__local_alloc_elemwise (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea6320>>) 
    Doc:  DEPRECATED: If True, enable the experimental optimization local_alloc_elemwise. Generates error if not True. Use optimizer_excluding=local_alloc_elemwise to disable.
    Value:  True

experimental__local_alloc_elemwise_assert (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea6350>>) 
    Doc:  When the local_alloc_elemwise is applied, add an assert to highlight shape errors.
    Value:  True

warn__ignore_bug_before ({'1.0.2', '0.4', '1.0.4', '1.0', '0.9', '0.8.1', '0.10', 'None', '0.8.2', '0.5', '0.3', '0.8', '0.7', '1.0.5', '1.0.3', '0.4.1', 'all', '1.0.1', '0.6'}) 
    Doc:  If 'None', we warn about all Aesara bugs found by default. If 'all', we don't warn about Aesara bugs found by default. If a version, we print only the warnings relative to Aesara bugs found after that version. Warning for specific bugs can be configured with specific [warn] flags.
    Value:  0.9

exception_verbosity ({'low', 'high'}) 
    Doc:  If 'low', the text of exceptions will generally refer to apply nodes with short names such as Elemwise{add_no_inplace}. If 'high', some exceptions will also refer to apply nodes with long descriptions  like:
        A. Elemwise{add_no_inplace}
                B. log_likelihood_v_given_h
                C. log_likelihood_h
    Value:  low

print_test_value (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea64a0>>) 
    Doc:  If 'True', the __eval__ of an Aesara variable will return its test_value when this is available. This has the practical conseguence that, e.g., in debugging `my_var` will print the same as `my_var.tag.test_value` when a test value is defined.
    Value:  False

compute_test_value ({'raise', 'ignore', 'pdb', 'off', 'warn'}) 
    Doc:  If 'True', Aesara will run each op at graph build time, using Constants, SharedVariables and the tag 'test_value' as inputs to the function. This helps the user track down problems in the graph before it gets optimized.
    Value:  off

compute_test_value_opt ({'raise', 'ignore', 'pdb', 'off', 'warn'}) 
    Doc:  For debugging Aesara optimization only. Same as compute_test_value, but is used during Aesara optimization
    Value:  off

check_input (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea6530>>) 
    Doc:  Specify if types should check their input in their C code. It can be used to speed up compilation, reduce overhead (particularly for scalars) and reduce the number of generated C files.
    Value:  True

NanGuardMode__nan_is_error (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea6560>>) 
    Doc:  Default value for nan_is_error
    Value:  True

NanGuardMode__inf_is_error (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea6590>>) 
    Doc:  Default value for inf_is_error
    Value:  True

NanGuardMode__big_is_error (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea6620>>) 
    Doc:  Default value for big_is_error
    Value:  True

NanGuardMode__action ({'raise', 'pdb', 'warn'}) 
    Doc:  What NanGuardMode does when it finds a problem
    Value:  raise

DebugMode__patience (<class 'int'>) 
    Doc:  Optimize graph this many times to detect inconsistency
    Value:  10

DebugMode__check_c (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea66b0>>) 
    Doc:  Run C implementations where possible
    Value:  True

DebugMode__check_py (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea6740>>) 
    Doc:  Run Python implementations where possible
    Value:  True

DebugMode__check_finite (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea6770>>) 
    Doc:  True -> complain about NaN/Inf results
    Value:  True

DebugMode__check_strides (<class 'int'>) 
    Doc:  Check that Python- and C-produced ndarrays have same strides. On difference: (0) - ignore, (1) warn, or (2) raise error
    Value:  0

DebugMode__warn_input_not_reused (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea67d0>>) 
    Doc:  Generate a warning when destroy_map or view_map says that an op works inplace, but the op did not reuse the input for its output.
    Value:  True

DebugMode__check_preallocated_output (<class 'str'>) 
    Doc:  Test thunks with pre-allocated memory as output storage. This is a list of strings separated by ":". Valid values are: "initial" (initial storage in storage map, happens with Scan),"previous" (previously-returned memory), "c_contiguous", "f_contiguous", "strided" (positive and negative strides), "wrong_size" (larger and smaller dimensions), and "ALL" (all of the above).
    Value:  

DebugMode__check_preallocated_output_ndim (<class 'int'>) 
    Doc:  When testing with "strided" preallocated output memory, test all combinations of strides over that number of (inner-most) dimensions. You may want to reduce that number to reduce memory or time usage, but it is advised to keep a minimum of 2.
    Value:  4

profiling__time_thunks (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea6860>>) 
    Doc:  Time individual thunks when profiling
    Value:  True

profiling__n_apply (<class 'int'>) 
    Doc:  Number of Apply instances to print by default
    Value:  20

profiling__n_ops (<class 'int'>) 
    Doc:  Number of Ops to print by default
    Value:  20

profiling__output_line_width (<class 'int'>) 
    Doc:  Max line width for the profiling output
    Value:  512

profiling__min_memory_size (<class 'int'>) 
    Doc:  For the memory profile, do not print Apply nodes if the size
                 of their outputs (in bytes) is lower than this threshold
    Value:  1024

profiling__min_peak_memory (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea69b0>>) 
    Doc:  The min peak memory usage of the order
    Value:  False

profiling__destination (<class 'str'>) 
    Doc:  File destination of the profiling output
    Value:  stderr

profiling__debugprint (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea6a10>>) 
    Doc:  Do a debugprint of the profiled functions
    Value:  False

profiling__ignore_first_call (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea6a40>>) 
    Doc:  Do we ignore the first call of an Aesara function.
    Value:  False

on_shape_error ({'raise', 'warn'}) 
    Doc:  warn: print a warning and use the default value. raise: raise an error
    Value:  warn

openmp (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea6aa0>>) 
    Doc:  Allow (or not) parallel computation on the CPU with OpenMP. This is the default value used when creating an Op that supports OpenMP parallelization. It is preferable to define it via the Aesara configuration file ~/.aesararc or with the environment variable AESARA_FLAGS. Parallelization is only done for some operations that implement it, and even for operations that implement parallelism, each operation is free to respect this flag or not. You can control the number of threads used with the environment variable OMP_NUM_THREADS. If it is set to 1, we disable openmp in Aesara by default.
    Value:  False

openmp_elemwise_minsize (<class 'int'>) 
    Doc:  If OpenMP is enabled, this is the minimum size of vectors for which the openmp parallelization is enabled in element wise ops.
    Value:  200000

optimizer_excluding (<class 'str'>) 
    Doc:  When using the default mode, we will remove optimizer with these tags. Separate tags with ':'.
    Value:  

optimizer_including (<class 'str'>) 
    Doc:  When using the default mode, we will add optimizer with these tags. Separate tags with ':'.
    Value:  

optimizer_requiring (<class 'str'>) 
    Doc:  When using the default mode, we will require optimizer with these tags. Separate tags with ':'.
    Value:  

optdb__position_cutoff (<class 'float'>) 
    Doc:  Where to stop eariler during optimization. It represent the position of the optimizer where to stop.
    Value:  inf

optdb__max_use_ratio (<class 'float'>) 
    Doc:  A ratio that prevent infinite loop in EquilibriumOptimizer.
    Value:  8.0

cycle_detection ({'fast', 'regular'}) 
    Doc:  If cycle_detection is set to regular, most inplaces are allowed,but it is slower. If cycle_detection is set to faster, less inplacesare allowed, but it makes the compilation faster.The interaction of which one give the lower peak memory usage iscomplicated and not predictable, so if you are close to the peakmemory usage, triyng both could give you a small gain.
    Value:  regular

check_stack_trace ({'off', 'raise', 'log', 'warn'}) 
    Doc:  A flag for checking the stack trace during the optimization process. default (off): does not check the stack trace of any optimization log: inserts a dummy stack trace that identifies the optimizationthat inserted the variable that had an empty stack trace.warn: prints a warning if a stack trace is missing and also a dummystack trace is inserted that indicates which optimization insertedthe variable that had an empty stack trace.raise: raises an exception if a stack trace is missing
    Value:  off

metaopt__verbose (<class 'int'>) 
    Doc:  0 for silent, 1 for only warnings, 2 for full output withtimings and selected implementation
    Value:  0

metaopt__optimizer_excluding (<class 'str'>) 
    Doc:  exclude optimizers with these tags. Separate tags with ':'.
    Value:  

metaopt__optimizer_including (<class 'str'>) 
    Doc:  include optimizers with these tags. Separate tags with ':'.
    Value:  

profile (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea6da0>>) 
    Doc:  If VM should collect profile information
    Value:  False

profile_optimizer (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea6dd0>>) 
    Doc:  If VM should collect optimizer profile information
    Value:  False

profile_memory (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea6e00>>) 
    Doc:  If VM should collect memory profile information and print it
    Value:  False

<aesara.configparser.ConfigParam object at 0x115ea6e30>
    Doc:  Useful only for the VM Linkers. When lazy is None, auto detect if lazy evaluation is needed and use the appropriate version. If the C loop isn't being used and lazy is True, use the Stack VM; otherwise, use the Loop VM.
    Value:  None

unittests__rseed (<class 'str'>) 
    Doc:  Seed to use for randomized unit tests. Special value 'random' means using a seed of None.
    Value:  666

warn__round (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea6ef0>>) 
    Doc:  Warn when using `tensor.round` with the default mode. Round changed its default from `half_away_from_zero` to `half_to_even` to have the same default as NumPy.
    Value:  False

numba__vectorize_target ({'cuda', 'cpu', 'parallel'}) 
    Doc:  Default target for numba.vectorize.
    Value:  cpu

numba__fastmath (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea6fb0>>) 
    Doc:  If True, use Numba's fastmath mode.
    Value:  True

numba__cache (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x115ea7040>>) 
    Doc:  If True, use Numba's file based caching.
    Value:  True

compiledir_format (<class 'str'>) 
    Doc:  Format string for platform-dependent compiled module subdirectory
(relative to base_compiledir). Available keys: aesara_version, device,
gxx_version, hostname, numpy_version, platform, processor,
python_bitwidth, python_int_bitwidth, python_version, short_platform.
Defaults to compiledir_%(short_platform)s-%(processor)s-
%(python_version)s-%(python_bitwidth)s.
    Value:  compiledir_%(short_platform)s-%(processor)s-%(python_version)s-%(python_bitwidth)s

<aesara.configparser.ConfigParam object at 0x115ea70d0>
    Doc:  platform-independent root directory for compiled modules
    Value:  /Users/jessegrabowski/.aesara

<aesara.configparser.ConfigParam object at 0x115ea7010>
    Doc:  platform-dependent cache directory for compiled modules
    Value:  /Users/jessegrabowski/.aesara/compiledir_macOS-10.15.7-x86_64-i386-64bit-i386-3.10.6-64

blas__ldflags (<class 'str'>) 
    Doc:  lib[s] to include for [Fortran] level-3 blas implementation
    Value:  -L/Users/jessegrabowski/opt/anaconda3/envs/econ/lib -lmkl_core -lmkl_intel_thread -lmkl_rt -Wl,-rpath,/Users/jessegrabowski/opt/anaconda3/envs/econ/lib

blas__check_openmp (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x11611d900>>) 
    Doc:  Check for openmp library conflict.
WARNING: Setting this to False leaves you open to wrong results in blas-related operations.
    Value:  True

scan__allow_gc (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x14c746740>>) 
    Doc:  Allow/disallow gc inside of Scan (default: False)
    Value:  False

scan__allow_output_prealloc (<bound method BoolParam._apply of <aesara.configparser.BoolParam object at 0x14c718910>>) 
    Doc:  Allow/disallow memory preallocation for outputs inside of scan (default: True)
    Value:  True

  • Python version: 3.10.6
  • Operating system: Mac OS Catalina 10.15.7
  • How did you install Aesara: conda
@rlouf
Member

rlouf commented Sep 20, 2022

I could trace back the issue to the names automatically attributed to the nit-sot variables:

print(outer_nit_sot_names)
# ['auto_12', 'auto_12']

A quick hack indeed solves this issue:

input_names = [f"{n.auto_name}_{i}" for i,n in enumerate(node.inputs[1:])]
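
To see why suffixing the position works, here is a minimal sketch that does not import Aesara. `FakeVariable` is a hypothetical stand-in that models only the `auto_name` attribute, and `signature` and `compiles` are illustrative helpers, not part of the Numba dispatch code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeVariable:
    """Hypothetical stand-in for an Aesara variable; only `auto_name` is modelled."""

    auto_name: str


k = FakeVariable("auto_12")
A = FakeVariable("auto_13")
# Mimics node.inputs[1:] when the same variable fills two nit-sot slots.
inputs = [k, k, A]

# Names taken directly from the variables collide...
plain_names = [n.auto_name for n in inputs]
# ...while appending the position makes every name unique.
indexed_names = [f"{n.auto_name}_{i}" for i, n in enumerate(inputs)]


def signature(names):
    """Build a function definition with the given parameter names."""
    return f"def scan(n_steps, {', '.join(names)}):\n    pass\n"


def compiles(src):
    """Return True if `src` is accepted by the Python compiler."""
    try:
        compile(src, "<generated>", "exec")
    except SyntaxError:
        return False
    return True


print(plain_names, compiles(signature(plain_names)))
print(indexed_names, compiles(signature(indexed_names)))
```

The index suffix guarantees uniqueness even when one variable object appears several times in the input list, at the cost of names that no longer match `auto_name` exactly.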

Instead of pushing a hack, let's try to understand the root cause and the assumptions in Variable or Scan that lead to this duplicate naming situation.

Let's add a breakpoint in aesara.link.numba.dispatch.scan.py and work our way up from there. The problem pre-exists the application of op_conversion_fn in fgraph_to_python. Indeed, if I stop right before the scan is transpiled to Numba:

op
# forall_inplace,cpu,scan_fn}(k, IncSubtensor{InplaceSet;:int64:}.0, k, k, A)

op.inputs
# [k, IncSubtensor{InplaceSet;:int64:}.0, k, k, A]

op.inputs[2] == op.inputs[3]
# True

op.inputs[2].dtype
# int32

The node, after optimization, contains two identical nit-sots, and for some reason they are equal to the variable k that represents the number of steps.

Let's go back for a second to the original graph, pre-compilation:

result[1].owner.inputs
# [k, IncSubtensor{Set;:int64:}.0, k, k, A]

result[1].owner.inputs[2] == result[1].owner.inputs[3]
# True

result[1].owner.inputs[1] == result[1].owner.inputs[3]
# True

We can thus eliminate optimizations from the list of potential culprits. But now we need to understand why the variable that corresponds to the number of steps (k) is used as a dummy input for each nit-sot. First note that k is duplicated regardless of the number of nit-sots (I have tried removing one None and adding one).

Let's look at aesara.scan.basic.scan, right after the Op is created, in step 8. _scan_inputs is indeed the culprit as it uses actual_n_steps as a placeholder variable for the nit-sots:

    _scan_inputs = (
        scan_seqs
        + mit_mot_scan_inputs
        + mit_sot_scan_inputs
        + sit_sot_scan_inputs
        + shared_scan_inputs
        + [actual_n_steps for x in range(n_nit_sot)]
        + other_shared_scan_args
        + other_scan_args
    )
_scan_inputs
# [IncSubtensor{Set;:int64:}.0, k, k, A]

actual_n_steps
# k
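
The duplication follows directly from how the list comprehension is written: it repeats a reference to one object and creates no new variables. A plain-Python sketch, where `Placeholder` is a hypothetical stand-in for the `actual_n_steps` variable:

```python
class Placeholder:
    """Hypothetical stand-in for the `actual_n_steps` variable."""

    def __init__(self, name):
        self.name = name


actual_n_steps = Placeholder("k")
n_nit_sot = 2

# Same construction as in `_scan_inputs`: the expression is re-evaluated on
# each iteration, but it always evaluates to the same object.
nit_sot_inputs = [actual_n_steps for _ in range(n_nit_sot)]

same_object = all(x is actual_n_steps for x in nit_sot_inputs)
distinct_ids = len({id(x) for x in nit_sot_inputs})
names = [x.name for x in nit_sot_inputs]
print(same_object, distinct_ids, names)
```

Every nit-sot slot therefore carries the same object and the same name, which is consistent with the `op.inputs[2] == op.inputs[3]` check above.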

Now if I try to provide cloned actual_n_steps as nit_sots, Aesara complains during compilation that these cloned variables are not provided an initial value:

# using [actual_n_steps.clone() for x in range(n_nit_sot)] to define _scan_inputs
# aesara.graph.utils.MissingInputError: Input 2 (k) of the graph (indices start from 0), used to compute for{cpu,scan_fn}(k, IncSubtensor{Set;:int64:}.0, k, k, A), was not provided and not given a value. Use the Aesara flag exception_verbosity='high', for more information on this error.

So this issue touches on something more fundamental about Aesara's IR and how loops are represented in it. The hackish solution doesn't look so bad now, as a temporary fix.

I am now wondering how using actual_n_steps for nit_sots might trigger other bugs that we haven't encountered yet (I am thinking about AeHMC's NUTS sampler which has many None inputs to each scan loop).

To be continued

@brandonwillard
Member

> Instead of pushing a hack, let's try to understand the root cause and the assumptions in Variable or Scan that lead to this duplicate naming situation.

It looks like the only problem is that we're generating invalid Numba code because there are name collisions, so making sure that the names are unique is the solution.

@rlouf
Member

rlouf commented Sep 21, 2022

The solution is to have different names for variables that are different in the first place. This is just a workaround imo.
