
Triangular2/Exponential cyclical learning rates do not work when logging with Tensorboard #2593

Open
varun-parthasarathy opened this issue Oct 31, 2021 · 4 comments


varun-parthasarathy commented Oct 31, 2021

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux CentOS 8/Windows 11
  • TensorFlow version and how it was installed (source or binary): 2.6.0 binary
  • TensorFlow-Addons version and how it was installed (source or binary): 0.14.0 binary
  • Python version: 3.8.6 (Linux)/3.9.7 (Windows)
  • Is GPU used? (yes/no): yes

Describe the bug

When training a model with the Triangular2 cyclical learning rate policy (scale_mode='iterations' and step_size equal to the number of steps per epoch), including a TensorBoard callback causes training to stop after one epoch with the following error -

Traceback (most recent call last):
  File "F:\error_example.py", line 49, in <module>
    model.fit(ds_train, epochs=6, validation_data=ds_test, callbacks=[tensorboard_callback])
  File "C:\Users\varun\mlenv\lib\site-packages\keras\engine\training.py", line 1230, in fit
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "C:\Users\varun\mlenv\lib\site-packages\keras\callbacks.py", line 413, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "C:\Users\varun\mlenv\lib\site-packages\keras\callbacks.py", line 2444, in on_epoch_end
    self._log_epoch_metrics(epoch, logs)
  File "C:\Users\varun\mlenv\lib\site-packages\keras\callbacks.py", line 2492, in _log_epoch_metrics
    train_logs = self._collect_learning_rate(train_logs)
  File "C:\Users\varun\mlenv\lib\site-packages\keras\callbacks.py", line 2471, in _collect_learning_rate
    logs['learning_rate'] = lr_schedule(self.model.optimizer.iterations)
  File "C:\Users\varun\mlenv\lib\site-packages\tensorflow_addons\optimizers\cyclical_learning_rate.py", line 102, in __call__
    ) * tf.maximum(tf.cast(0, dtype), (1 - x)) * self.scale_fn(mode_step)
  File "C:\Users\varun\mlenv\lib\site-packages\tensorflow_addons\optimizers\cyclical_learning_rate.py", line 238, in <lambda>
    scale_fn=lambda x: 1 / (2.0 ** (x - 1)),
  File "C:\Users\varun\mlenv\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1399, in r_binary_op_wrapper
    y, x = maybe_promote_tensors(y, x)
  File "C:\Users\varun\mlenv\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1335, in maybe_promote_tensors
    ops.convert_to_tensor(tensor, dtype, name="x"))
  File "C:\Users\varun\mlenv\lib\site-packages\tensorflow\python\profiler\trace.py", line 163, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\varun\mlenv\lib\site-packages\tensorflow\python\framework\ops.py", line 1566, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "C:\Users\varun\mlenv\lib\site-packages\tensorflow\python\framework\tensor_conversion_registry.py", line 52, in _default_conversion_function
    return constant_op.constant(value, dtype, name=name)
  File "C:\Users\varun\mlenv\lib\site-packages\tensorflow\python\framework\constant_op.py", line 271, in constant
    return _constant_impl(value, dtype, shape, name, verify_shape=False,
  File "C:\Users\varun\mlenv\lib\site-packages\tensorflow\python\framework\constant_op.py", line 283, in _constant_impl
    return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
  File "C:\Users\varun\mlenv\lib\site-packages\tensorflow\python\framework\constant_op.py", line 308, in _constant_eager_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "C:\Users\varun\mlenv\lib\site-packages\tensorflow\python\framework\constant_op.py", line 106, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
TypeError: Cannot convert 2.0 to EagerTensor of dtype int64

The error occurs regardless of whether mixed precision is used, and I was able to reproduce it on both Linux and Windows. Using the nightly build of tensorflow-addons did not help, nor did defining my own lambda as the scale_fn with the base CyclicalLearningRate class.

The error is also confirmed to occur with the ExponentialCyclicalLearningRate policy; only TriangularCyclicalLearningRate runs without errors of any sort.

The error only occurs when logging training with the TensorBoard callback; if the callback is not included, training proceeds without any issues. The problem occurs regardless of the write_graph setting.
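
From the traceback, the failure appears to come from the Triangular2 scale_fn (lambda x: 1 / (2.0 ** (x - 1))) receiving the optimizer's int64 iterations tensor when the TensorBoard callback queries the schedule. A minimal sketch of the suspected dtype mismatch (my own illustration, not code from tensorflow-addons) -

import tensorflow as tf

# The TensorBoard callback calls the schedule with optimizer.iterations,
# an int64 variable, so scale_fn receives an int64 tensor.
step = tf.constant(400, dtype=tf.int64)

# The float literal 2.0 cannot be promoted to int64, which raises:
# TypeError: Cannot convert 2.0 to EagerTensor of dtype int64
scale = 1 / (2.0 ** (step - 1))

This would also explain why TriangularCyclicalLearningRate is unaffected: its scale_fn (lambda x: 1.0) never combines a float literal with the step.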

Code to reproduce the issue
The following code reproduces the error -

import tensorflow as tf
import tensorflow_addons as tfa
import tensorflow_datasets as tfds
from tensorflow.keras import mixed_precision

#policy = mixed_precision.Policy('mixed_float16')
#mixed_precision.set_global_policy(policy)

(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)
def normalize_img(image, label):
  return tf.cast(image, tf.float32) / 255., label

ds_train = ds_train.map(normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(128)
ds_train = ds_train.prefetch(tf.data.AUTOTUNE)
ds_test = ds_test.map(normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_test = ds_test.batch(128)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.AUTOTUNE)

lr = tfa.optimizers.Triangular2CyclicalLearningRate(initial_learning_rate=0.001,
                                                    maximal_learning_rate=0.1,
                                                    step_size=200,
                                                    scale_mode='iterations')
log_dir = './logs/log_now'
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, update_freq=100,
                                                      write_graph=False)

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, dtype=tf.float32)
])
model.compile(
    optimizer=tf.keras.optimizers.SGD(lr),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()]
)

model.fit(ds_train, epochs=6, validation_data=ds_test, callbacks=[tensorboard_callback])

This is a very strange bug; it would be great if there were a workaround of some kind that does not prevent generation of logs.
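
One possible workaround sketch, assuming the root cause is the uncast int64 step reaching scale_fn: build the schedule from the base tfa.optimizers.CyclicalLearningRate class with a scale_fn that casts its argument before exponentiating. (I tried a custom lambda earlier without success, so this may only help if the cast is what was missing.)

import tensorflow as tf
import tensorflow_addons as tfa

# Same policy as Triangular2, but casting inside scale_fn keeps the
# float literal 2.0 from being promoted to the step's int64 dtype.
lr = tfa.optimizers.CyclicalLearningRate(
    initial_learning_rate=0.001,
    maximal_learning_rate=0.1,
    step_size=200,
    scale_fn=lambda x: 1.0 / (2.0 ** (tf.cast(x, tf.float32) - 1.0)),
    scale_mode='iterations',
)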

@varun-parthasarathy varun-parthasarathy changed the title Triangular2 cyclical learning rate does not work when logging with Tensorboard Triangular2 and Exponential cyclical learning rates do not work when logging with Tensorboard Oct 31, 2021
@varun-parthasarathy varun-parthasarathy changed the title Triangular2 and Exponential cyclical learning rates do not work when logging with Tensorboard Triangular2/Exponential cyclical learning rates do not work when logging with Tensorboard Oct 31, 2021
@vulkomilev

Okay, I can reproduce the bug. I will look at it.

@vulkomilev

Okay, I have fixed it. I just need to merge the solution.

@varun-parthasarathy
Author

@vulkomilev can I ask why this was occurring, and how you fixed it?

@romanovzky

I have this problem as well with ExponentialCyclicalLearningRate; will there be a fix? This is incredibly off-putting when running several experiments to try out optimiser and learning rate details...
