This repository has been archived by the owner on Sep 6, 2022. It is now read-only.

Training a model with --fp16_run does not work #25

Open
dyc3 opened this issue May 9, 2020 · 1 comment

dyc3 commented May 9, 2020

Logs

2020-05-09 09:07:55.688678: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-05-09 09:07:55.688739: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-05-09 09:07:55.688748: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2020-05-09 09:07:56.202999: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
Using 16-bit float precision.
2020-05-09 09:07:56.205958: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-05-09 09:07:56.226625: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-09 09:07:56.227180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:2d:00.0 name: GeForce RTX 2080 SUPER computeCapability: 7.5
coreClock: 1.845GHz coreCount: 48 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 462.00GiB/s
2020-05-09 09:07:56.227205: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-05-09 09:07:56.228990: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-09 09:07:56.230509: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-05-09 09:07:56.230693: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-05-09 09:07:56.232219: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-05-09 09:07:56.232921: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-05-09 09:07:56.235715: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-09 09:07:56.235830: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-09 09:07:56.236162: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-09 09:07:56.236420: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-05-09 09:07:56.236714: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-05-09 09:07:56.267004: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3799930000 Hz
2020-05-09 09:07:56.268490: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x60950d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-09 09:07:56.268514: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-05-09 09:07:56.369166: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-09 09:07:56.370172: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x60fa9e0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-05-09 09:07:56.370195: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2080 SUPER, Compute Capability 7.5
2020-05-09 09:07:56.370362: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-09 09:07:56.371040: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:2d:00.0 name: GeForce RTX 2080 SUPER computeCapability: 7.5
coreClock: 1.845GHz coreCount: 48 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 462.00GiB/s
2020-05-09 09:07:56.371082: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-05-09 09:07:56.371097: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-09 09:07:56.371109: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-05-09 09:07:56.371121: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-05-09 09:07:56.371133: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-05-09 09:07:56.371144: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-05-09 09:07:56.371156: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-09 09:07:56.371222: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-09 09:07:56.371764: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-09 09:07:56.372356: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-05-09 09:07:56.373863: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-09 09:07:56.373881: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2020-05-09 09:07:56.373890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2020-05-09 09:07:56.374008: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-09 09:07:56.374597: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-09 09:07:56.376374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4858 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 SUPER, pci bus id: 0000:2d:00.0, compute capability: 7.5)
W0509 09:07:56.762424 139620787169088 run_common_voice.py:106] Physical devices cannot be modified after being initialized
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I0509 09:07:56.765624 139620787169088 mirrored_strategy.py:435] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
WARNING:tensorflow:From /home/carson/Documents/code/other/rnnt-speech-recognition/model.py:57: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
W0509 09:07:57.166774 139620787169088 deprecation.py:323] From /home/carson/Documents/code/other/rnnt-speech-recognition/model.py:57: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb70059240>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
W0509 09:07:57.167124 139620787169088 rnn_cell_impl.py:904] <tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb70059240>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
WARNING:tensorflow:From /home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.add_weight` method instead.
W0509 09:07:57.169332 139620787169088 deprecation.py:323] From /home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.add_weight` method instead.
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb700597f0>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
W0509 09:07:57.229205 139620787169088 rnn_cell_impl.py:904] <tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb700597f0>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb602a64a8>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
W0509 09:07:57.382876 139620787169088 rnn_cell_impl.py:904] <tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb602a64a8>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb60228c18>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
W0509 09:07:57.421375 139620787169088 rnn_cell_impl.py:904] <tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb60228c18>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb6019d198>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
W0509 09:07:57.460607 139620787169088 rnn_cell_impl.py:904] <tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb6019d198>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb601814e0>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
W0509 09:07:57.556079 139620787169088 rnn_cell_impl.py:904] <tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb601814e0>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb600cdc50>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
W0509 09:07:57.593875 139620787169088 rnn_cell_impl.py:904] <tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb600cdc50>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb600b3128>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
W0509 09:07:57.632421 139620787169088 rnn_cell_impl.py:904] <tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb600b3128>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb4c160080>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
W0509 09:07:57.943511 139620787169088 rnn_cell_impl.py:904] <tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb4c160080>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb0c15f0f0>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
W0509 09:07:57.987845 139620787169088 rnn_cell_impl.py:904] <tensorflow.python.ops.rnn_cell_impl.LSTMCell object at 0x7efb0c15f0f0>: Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU.
I0509 09:07:58.163432 139620787169088 run_common_voice.py:432] Using word-piece encoder with vocab size: 4088
Model: "encoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, None, 80)]        0
_________________________________________________________________
rnn (RNN)                    (None, None, 640)         7217152
_________________________________________________________________
layer_normalization (LayerNo (None, None, 640)         1280
_________________________________________________________________
rnn_1 (RNN)                  (None, None, 640)         11804672
_________________________________________________________________
layer_normalization_1 (Layer (None, None, 640)         1280
_________________________________________________________________
time_reduction (TimeReductio (None, None, 1280)        0
_________________________________________________________________
rnn_2 (RNN)                  (None, None, 640)         17047552
_________________________________________________________________
layer_normalization_2 (Layer (None, None, 640)         1280
_________________________________________________________________
rnn_3 (RNN)                  (None, None, 640)         11804672
_________________________________________________________________
layer_normalization_3 (Layer (None, None, 640)         1280
_________________________________________________________________
rnn_4 (RNN)                  (None, None, 640)         11804672
_________________________________________________________________
layer_normalization_4 (Layer (None, None, 640)         1280
_________________________________________________________________
rnn_5 (RNN)                  (None, None, 640)         11804672
_________________________________________________________________
layer_normalization_5 (Layer (None, None, 640)         1280
_________________________________________________________________
rnn_6 (RNN)                  (None, None, 640)         11804672
_________________________________________________________________
layer_normalization_6 (Layer (None, None, 640)         1280
_________________________________________________________________
rnn_7 (RNN)                  (None, None, 640)         11804672
_________________________________________________________________
layer_normalization_7 (Layer (None, None, 640)         1280
=================================================================
Total params: 95,102,976
Trainable params: 95,102,976
Non-trainable params: 0
_________________________________________________________________
Model: "prediction_network"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         [(None, None)]            0
_________________________________________________________________
embedding (Embedding)        (None, None, 384)         1569792
_________________________________________________________________
rnn_8 (RNN)                  (None, None, 640)         9707520
_________________________________________________________________
layer_normalization_8 (Layer (None, None, 640)         1280
_________________________________________________________________
rnn_9 (RNN)                  (None, None, 640)         11804672
_________________________________________________________________
layer_normalization_9 (Layer (None, None, 640)         1280
=================================================================
Total params: 23,084,544
Trainable params: 23,084,544
Non-trainable params: 0
_________________________________________________________________
Model: "transducer"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
mel_specs (InputLayer)          [(None, None, 80)]   0
__________________________________________________________________________________________________
pred_inp (InputLayer)           [(None, None)]       0
__________________________________________________________________________________________________
encoder (Model)                 (None, None, 640)    95102976    mel_specs[0][0]
__________________________________________________________________________________________________
prediction_network (Model)      (None, None, 640)    23084544    pred_inp[0][0]
__________________________________________________________________________________________________
tf_op_layer_Shape (TensorFlowOp [(3,)]               0           prediction_network[1][0]
__________________________________________________________________________________________________
tf_op_layer_Shape_1 (TensorFlow [(3,)]               0           encoder[1][0]
__________________________________________________________________________________________________
tf_op_layer_strided_slice (Tens [()]                 0           tf_op_layer_Shape[0][0]
__________________________________________________________________________________________________
tf_op_layer_strided_slice_1 (Te [()]                 0           tf_op_layer_Shape_1[0][0]
__________________________________________________________________________________________________
tf_op_layer_ExpandDims (TensorF [(None, None, 1, 640 0           encoder[1][0]
__________________________________________________________________________________________________
tf_op_layer_stack_inp_enc (Tens [(4,)]               0           tf_op_layer_strided_slice[0][0]
__________________________________________________________________________________________________
tf_op_layer_ExpandDims_1 (Tenso [(None, 1, None, 640 0           prediction_network[1][0]
__________________________________________________________________________________________________
tf_op_layer_stack_pred_out (Ten [(4,)]               0           tf_op_layer_strided_slice_1[0][0]
__________________________________________________________________________________________________
tf_op_layer_Tile (TensorFlowOpL [(None, None, None,  0           tf_op_layer_ExpandDims[0][0]
                                                                 tf_op_layer_stack_inp_enc[0][0]
__________________________________________________________________________________________________
tf_op_layer_Tile_1 (TensorFlowO [(None, None, None,  0           tf_op_layer_ExpandDims_1[0][0]
                                                                 tf_op_layer_stack_pred_out[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, None, None, 1 0           tf_op_layer_Tile[0][0]
                                                                 tf_op_layer_Tile_1[0][0]
__________________________________________________________________________________________________
dense (Dense)                   (None, None, None, 6 819840      concatenate[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, None, None, 4 2620408     dense[0][0]
==================================================================================================
Total params: 121,627,768
Trainable params: 121,627,768
Non-trainable params: 0
__________________________________________________________________________________________________
Starting training.
Performing evaluation.
WARNING:tensorflow:AutoGraph could not transform <function warp_rnnt at 0x7efb72f16730> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Input 'acts' of 'WarpRNNT' Op has type float16 that does not match expected type of float32.
W0509 09:07:59.091631 139604069963520 ag_logging.py:146] AutoGraph could not transform <function warp_rnnt at 0x7efb72f16730> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Input 'acts' of 'WarpRNNT' Op has type float16 that does not match expected type of float32.
INFO:tensorflow:Error reported to Coordinator: Input 'acts' of 'WarpRNNT' Op has type float16 that does not match expected type of float32.
Traceback (most recent call last):
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 468, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1290, in convert_to_tensor
    (dtype.name, value.dtype.name, value))
ValueError: Tensor conversion requested dtype float32 for Tensor with dtype float16: <tf.Tensor 'transducer/dense_1/BiasAdd:0' shape=(32, 162, 37, 4088) dtype=float16>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 522, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 332, in _call_unconverted
    return f(*args)
  File "<string>", line 81, in warp_rnnt
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 491, in _apply_op_helper
    (prefix, dtypes.as_dtype(input_arg.type).name))
TypeError: Input 'acts' of 'WarpRNNT' Op has type float16 that does not match expected type of float32.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 468, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1290, in convert_to_tensor
    (dtype.name, value.dtype.name, value))
ValueError: Tensor conversion requested dtype float32 for Tensor with dtype float16: <tf.Tensor 'transducer/dense_1/BiasAdd:0' shape=(32, 162, 37, 4088) dtype=float16>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/training/coordinator.py", line 297, in stop_on_exception
    yield
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 917, in run
    self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
  File "/tmp/tmpd1pqtii1.py", line 26, in step_fn
    loss = ag__.converted_call(loss_fn, (labels, outputs), dict(spec_lengths=spec_lengths, label_lengths=label_lengths), fscope_1)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 565, in converted_call
    result = converted_f(*effective_args, **kwargs)
  File "/tmp/tmpv50tjgtx.py", line 29, in tf___loss_fn
    loss = ag__.converted_call(rnnt_loss, (y_pred, y_true, spec_lengths, label_lengths - 1), None, fscope)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 567, in converted_call
    result = converted_f(*effective_args)
  File "/tmp/tmpa6vi89_4.py", line 32, in tf__rnnt_loss
    loss, _ = ag__.converted_call(_warprnnt.warp_rnnt, (acts, labels, input_lengths, label_lengths, blank_label), None, fscope)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 560, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 332, in _call_unconverted
    return f(*args)
  File "<string>", line 81, in warp_rnnt
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 491, in _apply_op_helper
    (prefix, dtypes.as_dtype(input_arg.type).name))
TypeError: Input 'acts' of 'WarpRNNT' Op has type float16 that does not match expected type of float32.
I0509 09:07:59.092102 139604069963520 coordinator.py:219] Error reported to Coordinator: Input 'acts' of 'WarpRNNT' Op has type float16 that does not match expected type of float32.
Traceback (most recent call last):
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 468, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1290, in convert_to_tensor
    (dtype.name, value.dtype.name, value))
ValueError: Tensor conversion requested dtype float32 for Tensor with dtype float16: <tf.Tensor 'transducer/dense_1/BiasAdd:0' shape=(32, 162, 37, 4088) dtype=float16>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 522, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 332, in _call_unconverted
    return f(*args)
  File "<string>", line 81, in warp_rnnt
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 491, in _apply_op_helper
    (prefix, dtypes.as_dtype(input_arg.type).name))
TypeError: Input 'acts' of 'WarpRNNT' Op has type float16 that does not match expected type of float32.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 468, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1290, in convert_to_tensor
    (dtype.name, value.dtype.name, value))
ValueError: Tensor conversion requested dtype float32 for Tensor with dtype float16: <tf.Tensor 'transducer/dense_1/BiasAdd:0' shape=(32, 162, 37, 4088) dtype=float16>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/training/coordinator.py", line 297, in stop_on_exception
    yield
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 917, in run
    self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
  File "/tmp/tmpd1pqtii1.py", line 26, in step_fn
    loss = ag__.converted_call(loss_fn, (labels, outputs), dict(spec_lengths=spec_lengths, label_lengths=label_lengths), fscope_1)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 565, in converted_call
    result = converted_f(*effective_args, **kwargs)
  File "/tmp/tmpv50tjgtx.py", line 29, in tf___loss_fn
    loss = ag__.converted_call(rnnt_loss, (y_pred, y_true, spec_lengths, label_lengths - 1), None, fscope)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 567, in converted_call
    result = converted_f(*effective_args)
  File "/tmp/tmpa6vi89_4.py", line 32, in tf__rnnt_loss
    loss, _ = ag__.converted_call(_warprnnt.warp_rnnt, (acts, labels, input_lengths, label_lengths, blank_label), None, fscope)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 560, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 332, in _call_unconverted
    return f(*args)
  File "<string>", line 81, in warp_rnnt
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 491, in _apply_op_helper
    (prefix, dtypes.as_dtype(input_arg.type).name))
TypeError: Input 'acts' of 'WarpRNNT' Op has type float16 that does not match expected type of float32.
2020-05-09 09:07:59.108461: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled
2020-05-09 09:07:59.109964: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled
Traceback (most recent call last):
  File "run_common_voice.py", line 537, in <module>
    app.run(main)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "run_common_voice.py", line 497, in main
    gpus=gpus)
  File "run_common_voice.py", line 253, in run_training
    train()
  File "run_common_voice.py", line 203, in train
    gpus=gpus)
  File "run_common_voice.py", line 307, in run_evaluate
    loss, metrics_results = eval_step(inputs)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 615, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 497, in _initialize
    *args, **kwds))
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2389, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2703, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2593, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 978, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 439, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 968, in wrapper
    raise e.ag_error_metadata.to_exception(e)
TypeError: in converted code:

    run_common_voice.py:277 step_fn  *
        loss = loss_fn(labels, outputs,
    /home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py:763 experimental_run_v2
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /home/carson/Documents/code/other/rnnt-speech-recognition/utils/loss.py:32 _loss_fn  *
        loss = rnnt_loss(y_pred, y_true,
    /home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/warprnnt_tensorflow-0.1-py3.6-linux-x86_64.egg/warprnnt_tensorflow/__init__.py:32 rnnt_loss  *
        loss, _ = _warprnnt.warp_rnnt(acts, labels, input_lengths,
    <string>:81 warp_rnnt

    /home/carson/Documents/code/other/rnnt-speech-recognition/.env/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py:491 _apply_op_helper
        (prefix, dtypes.as_dtype(input_arg.type).name))

    TypeError: Input 'acts' of 'WarpRNNT' Op has type float16 that does not match expected type of float32.

noahchalifour (Owner) commented

@dyc3 From what I can tell, --fp16_run doesn't work right now because the RNNT loss function does not support float16. I am not very good with C/C++, so if someone else adds float16 support to the warp-transducer package, I will gladly merge it.
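
A minimal Python-side sketch of a possible workaround, assuming the `rnnt_loss` call chain visible in the traceback above (`warprnnt_tensorflow.rnnt_loss(acts, labels, input_lengths, label_lengths, blank_label)`): keep the network itself in float16, but cast the joint-network logits back to float32 right before the loss, since the `WarpRNNT` op only accepts float32 `acts`. The wrapper name `rnnt_loss_fp16_safe` is hypothetical, and this does not add real float16 support to the underlying C++ kernels.

```python
import tensorflow as tf
from warprnnt_tensorflow import rnnt_loss


def rnnt_loss_fp16_safe(y_pred, y_true, spec_lengths, label_lengths, blank_label=0):
    """Cast half-precision activations up to float32 so the WarpRNNT op accepts them."""
    acts = tf.cast(y_pred, tf.float32)                # WarpRNNT expects float32 'acts'
    labels = tf.cast(y_true, tf.int32)                # labels and lengths should be int32 for the op
    input_lengths = tf.cast(spec_lengths, tf.int32)
    label_lengths = tf.cast(label_lengths, tf.int32)
    return rnnt_loss(acts, labels, input_lengths, label_lengths, blank_label)
```

Casting only the loss inputs would preserve most of the memory and speed benefit of `--fp16_run` while avoiding the dtype mismatch; proper float16 support would still require changes to the warp-transducer kernels, as noted above.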
