I am using warp-transducer successfully on other machines (Ubuntu 18.04), but on one machine, which runs CentOS, I get a segmentation fault right at the beginning of training.
I am not sure what is causing this. The only difference I can point out is that the CentOS machine uses gcc/g++ 4.8.5 (I also tried 5.3.1) instead of the 5.4.x used on my other machines. Could this be the cause?
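One way to test the compiler hypothesis is to compare the C++ ABI that TensorFlow was built with against the ABI the bindings assume. This is an assumption to check, not a confirmed cause. The setup.py warning in the compilation output says the bindings assume TensorFlow was compiled without the C++11 ABI unless `TF_CXX11_ABI=1` is set. gcc 4.8.5 also predates the dual libstdc++ ABI introduced in GCC 5.1, so a mismatch there could plausibly crash a custom op when it is created. The sketch below relies only on `tf.sysconfig.get_compile_flags()` and `tf.version.COMPILER_VERSION`. The helper names `cxx11_abi_from_flags` and `report` are hypothetical and are not part of warp-transducer or TensorFlow.

```python
import re
from typing import Iterable, Optional


def cxx11_abi_from_flags(flags: Iterable[str]) -> Optional[int]:
    """Hypothetical helper: return the _GLIBCXX_USE_CXX11_ABI value (0 or 1)
    found in a list of compiler flags, or None if the define is absent."""
    for flag in flags:
        match = re.fullmatch(r"-D_GLIBCXX_USE_CXX11_ABI=([01])", flag)
        if match:
            return int(match.group(1))
    return None


def report() -> None:
    """Hypothetical helper: print TensorFlow's compiler version and its
    C++11 ABI setting, if TensorFlow can be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        print("tensorflow is not installed in this environment")
        return
    abi = cxx11_abi_from_flags(tf.sysconfig.get_compile_flags())
    print("TensorFlow built with compiler:", tf.version.COMPILER_VERSION)
    print("TensorFlow _GLIBCXX_USE_CXX11_ABI:", abi)
    if abi == 1:
        # Per the setup.py warning, the bindings assume ABI 0 unless TF_CXX11_ABI=1 is set.
        print("Bindings may need to be rebuilt with TF_CXX11_ABI=1")


if __name__ == "__main__":
    report()
```

Run this in the same conda environment (`asr2`) that is used for training. If it reports ABI 1, the setup.py warning suggests setting `TF_CXX11_ABI=1` and touching some files in `src` to force recompilation of the bindings.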
Compilation Output
$ CUDA_HOME=/usr/local/cuda ./scripts/build_rnnt.sh
Removing existing build/ directory ..
#################################################################
Running cmake for warp-transducer ..
-- The C compiler identification is GNU 4.8.5
-- The CXX compiler identification is GNU 4.8.5
-- Check for working C compiler: /usr/lib64/ccache/cc
-- Check for working C compiler: /usr/lib64/ccache/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/lib64/ccache/c++
-- Check for working CXX compiler: /usr/lib64/ccache/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found CUDA: /usr/local/cuda (found version "11.0")
-- cuda found TRUE
-- Building shared library with GPU support
-- Configuring done
-- Generating done
CMake Warning:
Manually-specified variables were not used by the project:
CMAKE_CXX_COMPILER_LAUNCHER
CMAKE_C_COMPILER_LAUNCHER
-- Build files have been written to: /home/sfalk/workspaces/git/speech-v2/warp-transducer/build
#################################################################
Running make ..
[ 11%] Building NVCC (Device) object CMakeFiles/warprnnt.dir/src/./warprnnt_generated_rnnt_entrypoint.cu.o
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
Scanning dependencies of target warprnnt
Linking CXX shared library libwarprnnt.so
[ 11%] Built target warprnnt
Scanning dependencies of target test_cpu
[ 22%] Building CXX object CMakeFiles/test_cpu.dir/tests/test_cpu.cpp.o
[ 33%] Building CXX object CMakeFiles/test_cpu.dir/tests/random.cpp.o
Linking CXX executable test_cpu
[ 33%] Built target test_cpu
[ 44%] Building NVCC (Device) object CMakeFiles/test_gpu.dir/tests/./test_gpu_generated_test_gpu.cu.o
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
Scanning dependencies of target test_gpu
[ 55%] Building CXX object CMakeFiles/test_gpu.dir/tests/random.cpp.o
Linking CXX executable test_gpu
[ 55%] Built target test_gpu
Scanning dependencies of target test_time
[ 66%] Building CXX object CMakeFiles/test_time.dir/tests/test_time.cpp.o
[ 77%] Building CXX object CMakeFiles/test_time.dir/tests/random.cpp.o
Linking CXX executable test_time
[ 77%] Built target test_time
[ 88%] Building NVCC (Device) object CMakeFiles/test_time_gpu.dir/tests/./test_time_gpu_generated_test_time.cu.o
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
Scanning dependencies of target test_time_gpu
[100%] Building CXX object CMakeFiles/test_time_gpu.dir/tests/random.cpp.o
Linking CXX executable test_time_gpu
[100%] Built target test_time_gpu
#################################################################
Running setup.py for tensorflow bindings ..
2021-03-11 08:32:27.494442: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
setup.py:63: UserWarning: Assuming tensorflow was compiled without C++11 ABI. It is generally true if you are using binary pip package. If you compiled tensorflow from source with gcc >= 5 and didn't set -D_GLIBCXX_USE_CXX11_ABI=0 during compilation, you need to set environment variable TF_CXX11_ABI=1 when compiling this bindings. Also be sure to touch some files in src to trigger recompilation. Also, you need to set (or unsed) this environment variable if getting undefined symbol: _ZN10tensorflow... errors
warnings.warn("Assuming tensorflow was compiled without C++11 ABI. "
running install
running bdist_egg
running egg_info
writing warprnnt_tensorflow.egg-info/PKG-INFO
writing dependency_links to warprnnt_tensorflow.egg-info/dependency_links.txt
writing top-level names to warprnnt_tensorflow.egg-info/top_level.txt
reading manifest file 'warprnnt_tensorflow.egg-info/SOURCES.txt'
writing manifest file 'warprnnt_tensorflow.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/warprnnt_tensorflow
copying build/lib.linux-x86_64-3.8/warprnnt_tensorflow/__init__.py -> build/bdist.linux-x86_64/egg/warprnnt_tensorflow
copying build/lib.linux-x86_64-3.8/warprnnt_tensorflow/kernels.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/warprnnt_tensorflow
byte-compiling build/bdist.linux-x86_64/egg/warprnnt_tensorflow/__init__.py to __init__.cpython-38.pyc
creating stub loader for warprnnt_tensorflow/kernels.cpython-38-x86_64-linux-gnu.so
byte-compiling build/bdist.linux-x86_64/egg/warprnnt_tensorflow/kernels.py to kernels.cpython-38.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying warprnnt_tensorflow.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying warprnnt_tensorflow.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying warprnnt_tensorflow.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying warprnnt_tensorflow.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
warprnnt_tensorflow.__pycache__.__init__.cpython-38: module references __path__
warprnnt_tensorflow.__pycache__.kernels.cpython-38: module references __file__
creating 'dist/warprnnt_tensorflow-0.1-py3.8-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing warprnnt_tensorflow-0.1-py3.8-linux-x86_64.egg
creating /home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/warprnnt_tensorflow-0.1-py3.8-linux-x86_64.egg
Extracting warprnnt_tensorflow-0.1-py3.8-linux-x86_64.egg to /home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages
Adding warprnnt-tensorflow 0.1 to easy-install.pth file
Installed /home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/warprnnt_tensorflow-0.1-py3.8-linux-x86_64.egg
Processing dependencies for warprnnt-tensorflow==0.1
Finished processing dependencies for warprnnt-tensorflow==0.1
(asr2) [sfalk@everest speech-v2]$ python -c "from warprnnt_tensorflow import rnnt_loss"
2021-03-11 08:32:42.757357: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-03-11 08:32:44.642952: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Segmentation Fault
Epoch 1/5000
Fatal Python error: Segmentation fault
Current thread 0x00007f8ea1ffa700 (most recent call first):
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1853 in _create_c_op
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 2015 in __init__
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 3528 in _create_op_internal
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 590 in _create_op_internal
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/framework/op_def_library.py", line 748 in _apply_op_helper
File "<string>", line 80 in warp_rnnt
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/warprnnt_tensorflow-0.1-py3.8-linux-x86_64.egg/warprnnt_tensorflow/__init__.py", line 32 in rnnt_loss
File "/home/sfalk/workspaces/git/speech-v2/asr/model/transducer/__init__.py", line 252 in rnnt_loss_wrapper
File "/home/sfalk/workspaces/git/speech-v2/asr/model/transducer/__init__.py", line 209 in rnnt_gradient
File "/home/sfalk/workspaces/git/speech-v2/asr/model/transducer/__init__.py", line 163 in train_step
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 788 in run_step
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 478 in _call_unconverted
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 396 in converted_call
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 667 in wrapper
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/distribute/mirrored_run.py", line 323 in run
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/threading.py", line 932 in _bootstrap_inner
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/threading.py", line 890 in _bootstrap
Thread 0x00007f9389663740 (most recent call first):
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/threading.py", line 302 in wait
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/threading.py", line 558 in wait
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/distribute/mirrored_run.py", line 196 in _call_for_each_replica
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/distribute/mirrored_run.py", line 93 in call_for_each_replica
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 628 in _call_for_each_replica
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 2730 in call_for_each_replica
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1259 in run
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 795 in step_function
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 479 in _call_unconverted
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 396 in converted_call
File "/tmp/tmpembj6sob.py", line 16 in tf__train_function
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 459 in converted_call
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 966 in wrapper
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 634 in wrapped_fn
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 990 in func_graph_from_py_func
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3196 in _create_graph_function
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3361 in _maybe_define_function
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2969 in _get_concrete_function_internal_garbage_collected
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 725 in _initialize
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 871 in _call
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 828 in __call__
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1100 in fit
File "asr/bin/train_keras.py", line 256 in run_training
File "asr/bin/train_keras.py", line 292 in main
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/absl/app.py", line 251 in _run_main
File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/site-packages/absl/app.py", line 300 in run
File "asr/bin/train_keras.py", line 381 in <module>
Segmentation fault (core dumped)