System information
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux CentOS 8 / Windows 11
TensorFlow version and how it was installed (source or binary): 2.6.0, binary
TensorFlow-Addons version and how it was installed (source or binary): 0.14.0, binary
Python version: 3.8.6 (Linux) / 3.9.7 (Windows)
Is GPU used? (yes/no): yes
Describe the bug
When training a model with the Triangular2 cyclical learning rate policy, with scale_mode set to 'iterations' and step_size equal to the number of steps in an epoch, including a TensorBoard callback causes training to stop after 1 epoch with the following error:
Traceback (most recent call last):
File "F:\error_example.py", line 49, in <module>
model.fit(ds_train, epochs=6, validation_data=ds_test, callbacks=[tensorboard_callback])
File "C:\Users\varun\mlenv\lib\site-packages\keras\engine\training.py", line 1230, in fit
callbacks.on_epoch_end(epoch, epoch_logs)
File "C:\Users\varun\mlenv\lib\site-packages\keras\callbacks.py", line 413, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "C:\Users\varun\mlenv\lib\site-packages\keras\callbacks.py", line 2444, in on_epoch_end
self._log_epoch_metrics(epoch, logs)
File "C:\Users\varun\mlenv\lib\site-packages\keras\callbacks.py", line 2492, in _log_epoch_metrics
train_logs = self._collect_learning_rate(train_logs)
File "C:\Users\varun\mlenv\lib\site-packages\keras\callbacks.py", line 2471, in _collect_learning_rate
logs['learning_rate'] = lr_schedule(self.model.optimizer.iterations)
File "C:\Users\varun\mlenv\lib\site-packages\tensorflow_addons\optimizers\cyclical_learning_rate.py", line 102, in __call__
) * tf.maximum(tf.cast(0, dtype), (1 - x)) * self.scale_fn(mode_step)
File "C:\Users\varun\mlenv\lib\site-packages\tensorflow_addons\optimizers\cyclical_learning_rate.py", line 238, in <lambda>
scale_fn=lambda x: 1 / (2.0 ** (x - 1)),
File "C:\Users\varun\mlenv\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1399, in r_binary_op_wrapper
y, x = maybe_promote_tensors(y, x)
File "C:\Users\varun\mlenv\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1335, in maybe_promote_tensors
ops.convert_to_tensor(tensor, dtype, name="x"))
File "C:\Users\varun\mlenv\lib\site-packages\tensorflow\python\profiler\trace.py", line 163, in wrapped
return func(*args, **kwargs)
File "C:\Users\varun\mlenv\lib\site-packages\tensorflow\python\framework\ops.py", line 1566, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "C:\Users\varun\mlenv\lib\site-packages\tensorflow\python\framework\tensor_conversion_registry.py", line 52, in _default_conversion_function
return constant_op.constant(value, dtype, name=name)
File "C:\Users\varun\mlenv\lib\site-packages\tensorflow\python\framework\constant_op.py", line 271, in constant
return _constant_impl(value, dtype, shape, name, verify_shape=False,
File "C:\Users\varun\mlenv\lib\site-packages\tensorflow\python\framework\constant_op.py", line 283, in _constant_impl
return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
File "C:\Users\varun\mlenv\lib\site-packages\tensorflow\python\framework\constant_op.py", line 308, in _constant_eager_impl
t = convert_to_eager_tensor(value, ctx, dtype)
File "C:\Users\varun\mlenv\lib\site-packages\tensorflow\python\framework\constant_op.py", line 106, in convert_to_eager_tensor
return ops.EagerTensor(value, ctx.device_name, dtype)
TypeError: Cannot convert 2.0 to EagerTensor of dtype int64
This error occurs irrespective of whether mixed precision is used. I was able to reproduce it in both Linux and Windows environments. I tried the nightly version of tensorflow-addons, with no luck. I also tried passing my own lambda function as the scale_fn to the base CyclicalLearningRate class, but the same error occurred.
This error is also confirmed to occur with the ExponentialCyclicalLearningRate policy.
Only the TriangularCyclicalLearningRate policy runs without errors of any sort.
The error occurs only when logging training: if the TensorBoard callback is not included, training proceeds without any issues. The problem occurs even with write_graph set to True.
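The failure can also be isolated outside of model.fit. During training the optimizer casts the step to float before calling the schedule, but the TensorBoard callback passes the raw optimizer.iterations (an int64 variable, see the traceback above), and the Triangular2 scale_fn computes 2.0 ** (x - 1) without casting. A minimal sketch of my own (the step_size value here is just a placeholder):

```python
import tensorflow as tf
import tensorflow_addons as tfa

lr_schedule = tfa.optimizers.Triangular2CyclicalLearningRate(
    initial_learning_rate=1e-4,
    maximal_learning_rate=1e-2,
    step_size=100,
    scale_mode="iterations",
)

# Works: the optimizer casts the step to float32 before calling the schedule.
print(lr_schedule(tf.constant(100.0)))

# Fails with the TypeError above: the TensorBoard callback passes
# optimizer.iterations directly, an int64 tensor, and the Triangular2
# scale_fn evaluates 2.0 ** (x - 1) without casting x to float.
print(lr_schedule(tf.constant(100, dtype=tf.int64)))
```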
Code to reproduce the issue
This code can reproduce the error:
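A minimal sketch along the lines of the traceback above; the MNIST data pipeline, model architecture, and learning rate values are placeholder assumptions, with only the model.fit call taken verbatim from the traceback:

```python
import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa

# Placeholder dataset; any tf.data pipeline will do.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
ds_train = tf.data.Dataset.from_tensor_slices(
    (x_train.astype(np.float32) / 255.0, y_train)).batch(128)
ds_test = tf.data.Dataset.from_tensor_slices(
    (x_test.astype(np.float32) / 255.0, y_test)).batch(128)

# step_size equal to the number of steps in an epoch, scale_mode 'iterations'.
steps_per_epoch = len(ds_train)
lr_schedule = tfa.optimizers.Triangular2CyclicalLearningRate(
    initial_learning_rate=1e-4,
    maximal_learning_rate=1e-2,
    step_size=steps_per_epoch,
    scale_mode="iterations",
)

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# Training crashes at the end of epoch 1, when the TensorBoard callback
# logs the learning rate by calling the schedule with optimizer.iterations.
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir="./logs")
model.fit(ds_train, epochs=6, validation_data=ds_test, callbacks=[tensorboard_callback])
```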
This is a very weird bug; it would be great if there were a workaround of some kind that does not prevent the generation of logs.
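One possible workaround, assuming the failure is only the int64 promotion inside the Triangular2 scale_fn, might be to build the schedule from the base CyclicalLearningRate class with a scale_fn that casts its argument to float32 first. I have not confirmed this against the setup above, and the explicit tf.cast is the only difference from the custom lambda I already tried:

```python
import tensorflow as tf
import tensorflow_addons as tfa

steps_per_epoch = 469  # batches per epoch in the repro above (60000 / 128)

# Same schedule shape as Triangular2, but the (possibly int64) step is cast
# to float32 before 2.0 ** (x - 1), so the int64 optimizer.iterations passed
# in by the TensorBoard callback no longer triggers the failed promotion.
lr_schedule = tfa.optimizers.CyclicalLearningRate(
    initial_learning_rate=1e-4,
    maximal_learning_rate=1e-2,
    step_size=steps_per_epoch,
    scale_fn=lambda x: 1 / (2.0 ** (tf.cast(x, tf.float32) - 1)),
    scale_mode="iterations",
)
```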
I have this problem as well with ExponentialCyclicalLearningRate. Will there be a fix? This is incredibly off-putting when running several experiments to try out optimiser and learning rate settings...