Add numerical stabilization for difference of exponentials #1399
I'm unable to test the changes locally because of a file-lock issue on macOS. Running the following command times out:
% pytest -v tests/tensor/test_math.py::test_log1mexp_grad_lim
Timeout Error
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <filelock._unix.UnixFileLock object at 0x1368913c0>, timeout = 120
poll_interval = 0.05
    def acquire(
        self,
        timeout: float | None = None,
        poll_interval: float = 0.05,
        *,
        poll_intervall: float | None = None,
        blocking: bool = True,
    ) -> AcquireReturnProxy:
        """
        Try to acquire the file lock.
        :param timeout: maximum wait time for acquiring the lock, ``None`` means use the default :attr:`~timeout` is and
         if ``timeout < 0``, there is no timeout and this method will block until the lock could be acquired
        :param poll_interval: interval of trying to acquire the lock file
        :param poll_intervall: deprecated, kept for backwards compatibility, use ``poll_interval`` instead
        :param blocking: defaults to True. If False, function will return immediately if it cannot obtain a lock on the
         first attempt. Otherwise this method will block until the timeout expires or the lock is acquired.
        :raises Timeout: if fails to acquire lock within the timeout period
        :return: a context object that will unlock the file when the context is exited
        .. code-block:: python
            # You can use this method in the context manager (recommended)
            with lock.acquire():
                pass
            # Or use an equivalent try-finally construct:
            lock.acquire()
            try:
                pass
            finally:
                lock.release()
        .. versionchanged:: 2.0.0
            This method returns now a *proxy* object instead of *self*,
            so that it can be used in a with statement without side effects.
        """
        # Use the default timeout, if no timeout is provided.
        if timeout is None:
            timeout = self.timeout
        if poll_intervall is not None:
            msg = "use poll_interval instead of poll_intervall"
            warnings.warn(msg, DeprecationWarning, stacklevel=2)
            poll_interval = poll_intervall
        # Increment the number right at the beginning. We can still undo it, if something fails.
        with self._thread_lock:
            self._lock_counter += 1
        lock_id = id(self)
        lock_filename = self._lock_file
        start_time = time.monotonic()
        try:
            while True:
                with self._thread_lock:
                    if not self.is_locked:
                        _LOGGER.debug("Attempting to acquire lock %s on %s", lock_id, lock_filename)
                        self._acquire()
                if self.is_locked:
                    _LOGGER.debug("Lock %s acquired on %s", lock_id, lock_filename)
                    break
                elif blocking is False:
                    _LOGGER.debug("Failed to immediately acquire lock %s on %s", lock_id, lock_filename)
                    raise Timeout(self._lock_file)
                elif 0 <= timeout < time.monotonic() - start_time:
                    _LOGGER.debug("Timeout on acquiring lock %s on %s", lock_id, lock_filename)
>                   raise Timeout(self._lock_file)
E filelock._error.Timeout: The file lock 'The file lock '/Users/thebigbool/.aesara/compiledir_macOS-13.0-x86_64-i386-64bit-i386-3.10.8-64/.lock' could not be acquired.
E Apply node that caused the error: Elemwise{Composite{Switch(IsInf(Composite{(i0 / expm1((-i1)))}(i0, i1)), i2, Composite{(i0 / expm1((-i1)))}(i0, i1))}}(TensorConstant{-1.0}, x, TensorConstant{-inf})
E Toposort index: 0
E Inputs types: [TensorType(float64, ()), TensorType(float64, ()), TensorType(float32, ())]
E
E HINT: Use a linker other than the C linker to print the inputs' shapes and strides.
E HINT: Re-running with most Aesara optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the Aesara flag 'optimizer=fast_compile'. If that does not work, Aesara optimizations can be disabled with 'optimizer=None'.
E HINT: Use the Aesara flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.' could not be acquired.
../../opt/anaconda3/envs/aesara-dev/lib/python3.10/site-packages/filelock/_api.py:183: Timeout
============================= slowest 50 durations =============================
120.03s call tests/tensor/test_math.py::test_log1mexp_grad_lim
(2 durations < 0.005s hidden. Use -vv to show these durations.)
=========================== short test summary info ============================
FAILED tests/tensor/test_math.py::test_log1mexp_grad_lim - filelock._error.Timeout: The file lock 'The file lock '/Users/thebigbool/.a...
======================== 1 failed in 140.14s (0:02:20) =========================
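For readers following the traceback: the Apply node named in the error evaluates `-1 / expm1(-x)` and swaps any infinite result for `-inf`. That expression matches the derivative of `log(1 - exp(x))`, which diverges as `x` approaches 0 from below. The following is only a minimal pure-Python sketch of that expression, not Aesara's implementation; the function name `log1mexp_grad_sketch` and the explicit handling of Python's division and overflow errors are assumptions made for illustration.

```python
import math


def log1mexp_grad_sketch(x: float) -> float:
    """Sketch of d/dx log(1 - exp(x)) = -1 / expm1(-x) for x <= 0.

    Hypothetical helper; it mirrors the Switch(IsInf(...), -inf, ...)
    pattern shown in the Apply node, using only the standard library.
    """
    try:
        denom = math.expm1(-x)
    except OverflowError:
        # exp(-x) overflows for very negative x; the gradient tends to 0.
        return -0.0
    if denom == 0.0:
        # Python raises on division by zero, so return the limit directly.
        return -math.inf
    value = -1.0 / denom
    # Replace any infinite quotient with -inf, as the Switch in the node does.
    return -math.inf if math.isinf(value) else value


print(log1mexp_grad_sketch(0.0))
print(log1mexp_grad_sketch(-1.0))
```

Writing the quotient with `expm1` rather than `1 - exp(x)` avoids cancellation when `x` is close to 0, which is the regime the test name `test_log1mexp_grad_lim` suggests is being checked.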
Seems like an OS issue, as the tests already run on CI. Any pointers on this, @brandonwillard?
769eeb3 to 7b6dd98
I would first check that Does this happen literally every time something is compiled in Aesara, or only when that particular test is run?
This happens to me whenever I try to run the tests.
Very interesting! Don't fix anything just yet; I would like to get on a call first and check it out. If it's an Aesara bug, this would be a great opportunity to squash it. In the meantime, you can always test things using the Python backend by setting the Aesara
Codecov Report
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1399   +/-   ##
=======================================
  Coverage   74.69%   74.69%
=======================================
  Files         194      194
  Lines       49730    49751   +21
  Branches    10527    10532    +5
=======================================
+ Hits        37145    37162   +17
- Misses      10262    10264    +2
- Partials     2323     2325    +2
This was solved by re-constructing the environment using the
8cbed4d to 5fbbc8a
Looks good; we just need to squash before/while merging.
Looks good, merging!
Closes #1303.