This repository has been archived by the owner on Sep 6, 2022. It is now read-only.

Training a model with --fp16_run does not work #25

Open
dyc3 opened this issue May 9, 2020 · 1 comment

dyc3 commented May 9, 2020

Logs

2020-05-09 09:07:55.688678: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-05-09 09:07:55.688739: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-05-09 09:07:55.688748: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2020-05-09 09:07:56.202999: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
Using 16-bit float precision.
2020-05-09 09:07:56.205958: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-05-09 09:07:56.226625: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-09 09:07:56.227180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:2d:00.0 name: GeForce RTX 2080 SUPER computeCapability: 7.5
coreClock: 1.845GHz coreCount: 48 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 462.00GiB/s
2020-05-09 09:07:56.227205: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-05-09 09:07:56.228990: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-09 09:07:56.230509: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-05-09 09:07:56.230693: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-05-09 09:07:56.232219: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-05-09 09:07:56.232921: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-05-09 09:07:56.235715: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-09 09:07:56.235830: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-09 09:07:56.236162: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-09 09:07:56.236420: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-05-09 09:07:56.236714: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-05-09 09:07:56.267004: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3799930000 Hz
2020-05-09 09:07:56.268490: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x60950d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-09 09:07:56.268514: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-05-09 09:07:56.369166: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-09 09:07:56.370172: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x60fa9e0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-05-09 09:07:56.370195: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2080 SUPER, Compute Capability 7.5
2020-05-09 09:07:56.370362: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-09 09:07:56.371040: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:2d:00.0 name: GeForce RTX 2080 SUPER computeCapability: 7.5
coreClock: 1.845GHz coreCount: 48 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 462.00GiB/s
2020-05-09 09:07:56.371082: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-05-09 09:07:56.371097: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-09 09:07:56.371109: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-05-09 09:07:56.371121: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-05-09 09:07:56.371133: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-05-09 09:07:56.371144: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-05-09 09:07:56.371156: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-09 09:07:56.371222: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-09 09:07:56.371764: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-09 09:07:56.372356: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-05-09 09:07:56.373863: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-09 09:07:56.373881: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2020-05-09 09:07:56.373890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2020-05-09 09:07:56.374008: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-09 09:07:56.374597: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-09 09:07:56.376374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4858 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 SUPER, pci bus id: 0000:2d:00.0, compute capability: 7.5)
W0509 09:07:56.762424 139620787169088 run_common_voice.py:106] Physical devices cannot be modified after being initialized
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I0509 09:07:56.765624 139620787169088 mirrored_strategy.py:435] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
WARNING:tensorflow:From /home/carson/Documents/code/other/rnnt-speech-recognition/model.py:57: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
W0509 09:07:57.166774 139620787169088 deprecation.py:323] From /home/carson/Documents/code/other/rnnt-speech-recognition/model.py:57: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb70059240>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
W0509 09:07:57.167124 139620787169088 rnn_cell_impl.py:904] <tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb70059240>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
WARNING:tensorflow:From /home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.add_weight` method instead.
W0509 09:07:57.169332 139620787169088 deprecation.py:323] From /home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.add_weight` method instead.
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb700597f0>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
W0509 09:07:57.229205 139620787169088 rnn_cell_impl.py:904] <tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb700597f0>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb602a64a8>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
W0509 09:07:57.382876 139620787169088 rnn_cell_impl.py:904] <tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb602a64a8>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb60228c18>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
W0509 09:07:57.421375 139620787169088 rnn_cell_impl.py:904] <tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb60228c18>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb6019d198>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
W0509 09:07:57.460607 139620787169088 rnn_cell_impl.py:904] <tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb6019d198>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb601814e0>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
W0509 09:07:57.556079 139620787169088 rnn_cell_impl.py:904] <tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb601814e0>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb600cdc50>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
W0509 09:07:57.593875 139620787169088 rnn_cell_impl.py:904] <tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb600cdc50>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb600b3128>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
W0509 09:07:57.632421 139620787169088 rnn_cell_impl.py:904] <tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb600b3128>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb4c160080>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
W0509 09:07:57.943511 139620787169088 rnn_cell_impl.py:904] <tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb4c160080>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb0c15f0f0>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
W0509 09:07:57.987845 139620787169088 rnn_cell_impl.py:904] <tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb0c15f0f0>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
I0509 09:07:58.163432 139620787169088 run_common_voice.py:432] Using word-piece encoder with vocab size: 4088
Model: "encoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, None, 80)]        0
_________________________________________________________________
rnn (RNN)                    (None, None, 640)         7217152
_________________________________________________________________
layer_normalization (LayerNo (None, None, 640)         1280
_________________________________________________________________
rnn_1 (RNN)                  (None, None, 640)         11804672
_________________________________________________________________
layer_normalization_1 (Layer (None, None, 640)         1280
_________________________________________________________________
time_reduction (TimeReductio (None, None, 1280)        0
_________________________________________________________________
rnn_2 (RNN)                  (None, None, 640)         17047552
_________________________________________________________________
layer_normalization_2 (Layer (None, None, 640)         1280
_________________________________________________________________
rnn_3 (RNN)                  (None, None, 640)         11804672
_________________________________________________________________
layer_normalization_3 (Layer (None, None, 640)         1280
_________________________________________________________________
rnn_4 (RNN)                  (None, None, 640)         11804672
_________________________________________________________________
layer_normalization_4 (Layer (None, None, 640)         1280
_________________________________________________________________
rnn_5 (RNN)                  (None, None, 640)         11804672
_________________________________________________________________
layer_normalization_5 (Layer (None, None, 640)         1280
_________________________________________________________________
rnn_6 (RNN)                  (None, None, 640)         11804672
_________________________________________________________________
layer_normalization_6 (Layer (None, None, 640)         1280
_________________________________________________________________
rnn_7 (RNN)                  (None, None, 640)         11804672
_________________________________________________________________
layer_normalization_7 (Layer (None, None, 640)         1280
=================================================================
Total params: 95,102,976
Trainable params: 95,102,976
Non-trainable params: 0
_________________________________________________________________
Model: "prediction_network"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         [(None, None)]            0
_________________________________________________________________
embedding (Embedding)        (None, None, 384)         1569792
_________________________________________________________________
rnn_8 (RNN)                  (None, None, 640)         9707520
_________________________________________________________________
layer_normalization_8 (Layer (None, None, 640)         1280
_________________________________________________________________
rnn_9 (RNN)                  (None, None, 640)         11804672
_________________________________________________________________
layer_normalization_9 (Layer (None, None, 640)         1280
=================================================================
Total params: 23,084,544
Trainable params: 23,084,544
Non-trainable params: 0
_________________________________________________________________
Model: "transducer"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
mel_specs (InputLayer)          [(None, None, 80)]   0
__________________________________________________________________________________________________
pred_inp (InputLayer)           [(None, None)]       0
__________________________________________________________________________________________________
encoder (Model)                 (None, None, 640)    95102976    mel_specs[0][0]
__________________________________________________________________________________________________
prediction_network (Model)      (None, None, 640)    23084544    pred_inp[0][0]
__________________________________________________________________________________________________
tf_op_layer_Shape (TensorFlowOp [(3,)]               0           prediction_network[1][0]
__________________________________________________________________________________________________
tf_op_layer_Shape_1 (TensorFlow [(3,)]               0           encoder[1][0]
__________________________________________________________________________________________________
tf_op_layer_strided_slice (Tens [()]                 0           tf_op_layer_Shape[0][0]
__________________________________________________________________________________________________
tf_op_layer_strided_slice_1 (Te [()]                 0           tf_op_layer_Shape_1[0][0]
__________________________________________________________________________________________________
tf_op_layer_ExpandDims (TensorF [(None, None, 1, 640 0           encoder[1][0]
__________________________________________________________________________________________________
tf_op_layer_stack_inp_enc (Tens [(4,)]               0           tf_op_layer_strided_slice[0][0]
__________________________________________________________________________________________________
tf_op_layer_ExpandDims_1 (Tenso [(None, 1, None, 640 0           prediction_network[1][0]
__________________________________________________________________________________________________
tf_op_layer_stack_pred_out (Ten [(4,)]               0           tf_op_layer_strided_slice_1[0][0]
__________________________________________________________________________________________________
tf_op_layer_Tile (TensorFlowOpL [(None, None, None,  0           tf_op_layer_ExpandDims[0][0]
                                                                 tf_op_layer_stack_inp_enc[0][0]
__________________________________________________________________________________________________
tf_op_layer_Tile_1 (TensorFlowO [(None, None, None,  0           tf_op_layer_ExpandDims_1[0][0]
                                                                 tf_op_layer_stack_pred_out[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, None, None, 1 0           tf_op_layer_Tile[0][0]
                                                                 tf_op_layer_Tile_1[0][0]
__________________________________________________________________________________________________
dense (Dense)                   (None, None, None, 6 819840      concatenate[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, None, None, 4 2620408     dense[0][0]
==================================================================================================
Total params: 121,627,768
Trainable params: 121,627,768
Non-trainable params: 0
__________________________________________________________________________________________________
Starting training.
Performing evaluation.
WARNING:tensorflow:AutoGraph could not transform <function warp_rnnt at 0x7efb72f16730> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Input 'acts' of 'WarpRNNT' Op has type float16 that does not match expected type of float32.
W0509 09:07:59.091631 139604069963520 ag_logging.py:146] AutoGraph could not transform <function warp_rnnt at 0x7efb72f16730> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Input 'acts' of 'WarpRNNT' Op has type float16 that does not match expected type of float32.
INFO:tensorflow:Error reported to Coordinator: Input 'acts' of 'WarpRNNT' Op has type float16 that does not match expected type of float32.
Traceback (most recent call last):
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 468, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1290, in convert_to_tensor
    (dtype.name, value.dtype.name, value))
ValueError: Tensor conversion requested dtype float32 for Tensor with dtype float16: <tf.Tensor 'transducer/dense_1/BiasAdd:0' shape=(32, 162, 37, 4088) dtype=float16>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 522, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 332, in _call_unconverted
    return f(*args)
  File "<string>", line 81, in warp_rnnt
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 491, in _apply_op_helper
    (prefix, dtypes.as_dtype(input_arg.type).name))
TypeError: Input 'acts' of 'WarpRNNT' Op has type float16 that does not match expected type of float32.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 468, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1290, in convert_to_tensor
    (dtype.name, value.dtype.name, value))
ValueError: Tensor conversion requested dtype float32 for Tensor with dtype float16: <tf.Tensor 'transducer/dense_1/BiasAdd:0' shape=(32, 162, 37, 4088) dtype=float16>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/training/coordinator.py", line 297, in stop_on_exception
    yield
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 917, in run
    self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
  File "/tmp/tmpd1pqtii1.py", line 26, in step_fn
    loss = ag__.converted_call(loss_fn, (labels, outputs), dict(spec_lengths=spec_lengths, label_lengths=label_lengths), fscope_1)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 565, in converted_call
    result = converted_f(*effective_args, **kwargs)
  File "/tmp/tmpv50tjgtx.py", line 29, in tf___loss_fn
    loss = ag__.converted_call(rnnt_loss, (y_pred, y_true, spec_lengths, label_lengths - 1), None, fscope)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 567, in converted_call
    result = converted_f(*effective_args)
  File "/tmp/tmpa6vi89_4.py", line 32, in tf__rnnt_loss
    loss, _ = ag__.converted_call(_warprnnt.warp_rnnt, (acts, labels, input_lengths, label_lengths, blank_label), None, fscope)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 560, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 332, in _call_unconverted
    return f(*args)
  File "<string>", line 81, in warp_rnnt
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 491, in _apply_op_helper
    (prefix, dtypes.as_dtype(input_arg.type).name))
TypeError: Input 'acts' of 'WarpRNNT' Op has type float16 that does not match expected type of float32.
I0509 09:07:59.092102 139604069963520 coordinator.py:219] Error reported to Coordinator: Input 'acts' of 'WarpRNNT' Op has type float16 that does not match expected type of float32.
Traceback (most recent call last):
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 468, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1290, in convert_to_tensor
    (dtype.name, value.dtype.name, value))
ValueError: Tensor conversion requested dtype float32 for Tensor with dtype float16: <tf.Tensor 'transducer/dense_1/BiasAdd:0' shape=(32, 162, 37, 4088) dtype=float16>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 522, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 332, in _call_unconverted
    return f(*args)
  File "<string>", line 81, in warp_rnnt
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 491, in _apply_op_helper
    (prefix, dtypes.as_dtype(input_arg.type).name))
TypeError: Input 'acts' of 'WarpRNNT' Op has type float16 that does not match expected type of float32.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 468, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1290, in convert_to_tensor
    (dtype.name, value.dtype.name, value))
ValueError: Tensor conversion requested dtype float32 for Tensor with dtype float16: <tf.Tensor 'transducer/dense_1/BiasAdd:0' shape=(32, 162, 37, 4088) dtype=float16>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/training/coordinator.py", line 297, in stop_on_exception
    yield
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 917, in run
    self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
  File "/tmp/tmpd1pqtii1.py", line 26, in step_fn
    loss = ag__.converted_call(loss_fn, (labels, outputs), dict(spec_lengths=spec_lengths, label_lengths=label_lengths), fscope_1)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 565, in converted_call
    result = converted_f(*effective_args, **kwargs)
  File "/tmp/tmpv50tjgtx.py", line 29, in tf___loss_fn
    loss = ag__.converted_call(rnnt_loss, (y_pred, y_true, spec_lengths, label_lengths - 1), None, fscope)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 567, in converted_call
    result = converted_f(*effective_args)
  File "/tmp/tmpa6vi89_4.py", line 32, in tf__rnnt_loss
    loss, _ = ag__.converted_call(_warprnnt.warp_rnnt, (acts, labels, input_lengths, label_lengths, blank_label), None, fscope)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 560, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 332, in _call_unconverted
    return f(*args)
  File "<string>", line 81, in warp_rnnt
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 491, in _apply_op_helper
    (prefix, dtypes.as_dtype(input_arg.type).name))
TypeError: Input 'acts' of 'WarpRNNT' Op has type float16 that does not match expected type of float32.
2020-05-09 09:07:59.108461: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled
2020-05-09 09:07:59.109964: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled
Traceback (most recent call last):
  File "run_common_voice.py", line 537, in <module>
    app.run(main)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "run_common_voice.py", line 497, in main
    gpus=gpus)
  File "run_common_voice.py", line 253, in run_training
    train()
  File "run_common_voice.py", line 203, in train
    gpus=gpus)
  File "run_common_voice.py", line 307, in run_evaluate
    loss, metrics_results = eval_step(inputs)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 615, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 497, in _initialize
    *args, **kwds))
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2389, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2703, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2593, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 978, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 439, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 968, in wrapper
    raise e.ag_error_metadata.to_exception(e)
TypeError: in converted code:

    run_common_voice.py:277 step_fn  *
        loss = loss_fn(labels, outputs,
    /home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py:763 experimental_run_v2
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /home/carson/Documents/code/other/rnnt-speech-recognition/utils/loss.py:32 _loss_fn  *
        loss = rnnt_loss(y_pred, y_true,
    /home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/warprnnt_tensorflow-0.1-py3.6-linux-x86_64.egg/warprnnt_tensorflow/__init__.py:32 rnnt_loss  *
        loss, _ = _warprnnt.warp_rnnt(acts, labels, input_lengths,
    <string>:81 warp_rnnt

    /home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py:491 _apply_op_helper
        (prefix, dtypes.as_dtype(input_arg.type).name))

    TypeError: Input 'acts' of 'WarpRNNT' Op has type float16 that does not match expected type of float32.

noahchalifour (Owner) commented

@dyc3 From what I can tell, --fp16_run doesn't work right now because the RNNT loss function does not support float16. I am not very good with C/C++, so if someone else adds float16 support to the warp-transducer package, I will gladly merge it.
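
A minimal Python-side sketch of a possible workaround, assuming the `rnnt_loss` call chain visible in the traceback above (`warprnnt_tensorflow.rnnt_loss(acts, labels, input_lengths, label_lengths, blank_label)`): keep the network itself in float16, but cast the joint-network logits back to float32 right before the loss, since the `WarpRNNT` op only accepts float32 `acts`. The wrapper name `rnnt_loss_fp16_safe` is hypothetical, and this does not add real float16 support to the underlying C++ kernels.

```python
import tensorflow as tf
from warprnnt_tensorflow import rnnt_loss


def rnnt_loss_fp16_safe(y_pred, y_true, spec_lengths, label_lengths, blank_label=0):
    """Cast half-precision activations up to float32 so the WarpRNNT op accepts them."""
    acts = tf.cast(y_pred, tf.float32)                # WarpRNNT expects float32 'acts'
    labels = tf.cast(y_true, tf.int32)                # labels and lengths should be int32 for the op
    input_lengths = tf.cast(spec_lengths, tf.int32)
    label_lengths = tf.cast(label_lengths, tf.int32)
    return rnnt_loss(acts, labels, input_lengths, label_lengths, blank_label)
```

Casting only the loss inputs would preserve most of the memory and speed benefit of `--fp16_run` while avoiding the dtype mismatch; proper float16 support would still require changes to the warp-transducer kernels, as noted above.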
