
Segmentation fault (core dumped) When run anything on gpu. #1024

Closed
vuquocan1987 opened this issue Jun 24, 2020 · 5 comments

vuquocan1987 commented Jun 24, 2020

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
  • OS: Ubuntu 18.04.4
  • TensorFlow installed from: pip3 install tensorflow-rocm
  • TensorFlow version: 2.2.0
  • Python version: 3.7
  • GPU model and memory: Radeon RX 580, 8 GB
  • ROCm version: 3.5.1
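Some of the fields above can be gathered programmatically. The following is a minimal stdlib sketch, not part of the issue template. `collect_system_info` is a hypothetical name, and it reports `None` for TensorFlow when the package cannot be imported. ROCm and GPU details are not covered; on a ROCm install those usually come from tools such as `rocminfo`.

```python
import platform


def collect_system_info() -> dict:
    """Gather the OS, Python and TensorFlow versions for a bug report (sketch)."""
    info = {
        "os": platform.platform(),
        "python": platform.python_version(),
    }
    try:
        # Importing TensorFlow did not crash in this issue (model creation did),
        # but on a broken GPU setup this import may still be slow or fail.
        import tensorflow as tf
        info["tensorflow"] = tf.__version__
    except ImportError:
        info["tensorflow"] = None  # TensorFlow not installed in this environment
    return info
```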

Describe the current behavior
When I try to import a pretrained model, or create any model that requires GPU operations, I get the error `Segmentation fault (core dumped)`.
Describe the expected behavior
The program should run without error.

Standalone code to reproduce the issue
```python
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50

model = ResNet50(weights='imagenet')
```

or

```python
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])
```

Both snippets above crash while creating the model, with the same error.
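Because a segfault kills the Python interpreter outright, it can help to run each snippet in its own process and inspect the exit status. This sketch is not from the thread. It assumes a Linux host, where `subprocess.run` reports a child killed by SIGSEGV with a return code of `-signal.SIGSEGV`, and it passes Python's `-X faulthandler` option so the child also prints a Python-level traceback when it crashes. The `SNIPPETS` dict and `run_isolated` function are hypothetical names.

```python
import signal
import subprocess
import sys

# The two reproduction snippets from the issue, as source strings.
SNIPPETS = {
    "resnet50": (
        "import tensorflow as tf\n"
        "from tensorflow.keras.applications.resnet50 import ResNet50\n"
        "model = ResNet50(weights='imagenet')\n"
    ),
    "sequential": (
        "import tensorflow as tf\n"
        "model = tf.keras.models.Sequential([\n"
        "    tf.keras.layers.Flatten(input_shape=(28, 28)),\n"
        "    tf.keras.layers.Dense(128, activation='relu'),\n"
        "    tf.keras.layers.Dropout(0.2),\n"
        "    tf.keras.layers.Dense(10),\n"
        "])\n"
    ),
}


def run_isolated(code: str, timeout: float = 600.0) -> str:
    """Run `code` in a fresh interpreter and classify how it exited."""
    # -X faulthandler dumps the Python traceback to stderr on SIGSEGV,
    # then lets the signal terminate the child as usual.
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == -signal.SIGSEGV:
        return "segfault"
    return "ok" if proc.returncode == 0 else f"error ({proc.returncode})"
```

For example, `for name, code in SNIPPETS.items(): print(name, run_isolated(code))` would report which snippet crashes without taking down the calling script.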

Here is the full error output:

```
2020-06-25 02:57:19.930061: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libhip_hcc.so
2020-06-25 02:57:19.985372: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1579] Found device 0 with properties: pciBusID: 0000:01:00.0 name: Ellesmere [Radeon RX 470/480/570/570X/580/580X] ROCm AMD GPU ISA: gfx803 coreClock: 1.34GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: -1B/s
2020-06-25 02:57:20.033347: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
2020-06-25 02:57:20.034196: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
2020-06-25 02:57:20.038766: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocfft.so
2020-06-25 02:57:20.038977: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocrand.so
2020-06-25 02:57:20.039037: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-06-25 02:57:20.039253: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
2020-06-25 02:57:20.043716: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3600000000 Hz
2020-06-25 02:57:20.043918: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x556aef8ca0e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-25 02:57:20.043929: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-06-25 02:57:20.045297: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x556aef8cbba0 initialized for platform ROCM (this does not guarantee that XLA will be used). Devices:
2020-06-25 02:57:20.045326: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Ellesmere [Radeon RX 470/480/570/570X/580/580X], AMDGPU ISA version: gfx803
2020-06-25 02:57:20.045435: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1579] Found device 0 with properties: pciBusID: 0000:01:00.0 name: Ellesmere [Radeon RX 470/480/570/570X/580/580X] ROCm AMD GPU ISA: gfx803 coreClock: 1.34GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: -1B/s
2020-06-25 02:57:20.045464: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
2020-06-25 02:57:20.045475: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
2020-06-25 02:57:20.045484: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocfft.so
2020-06-25 02:57:20.045493: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocrand.so
2020-06-25 02:57:20.045523: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-06-25 02:57:20.777733: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-25 02:57:20.777775: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-06-25 02:57:20.777780: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-06-25 02:57:20.777909: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7399 MB memory) -> physical GPU (device: 0, name: Ellesmere [Radeon RX 470/480/570/570X/580/580X], pci bus id: 0000:01:00.0)
Segmentation fault (core dumped)
```

vuquocan1987 (author)

Here is a link to someone with the same issue:
tensorflow#40751 (comment)


ghost commented Jun 24, 2020

same error here

vuquocan1987 (author)

> same error here

I am not sure why, but I solved the problem by setting the path to the HIP library:
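The exact path used in the linked comment is not reproduced in this thread. The sketch below assumes ROCm's default prefix `/opt/rocm` (the same prefix the maintainer uses below) with its shared libraries under `/opt/rocm/lib`; both are assumptions. One design point matters here: the dynamic loader reads `LD_LIBRARY_PATH` when a process starts, so assigning `os.environ["LD_LIBRARY_PATH"]` inside an already-running Python script before `import tensorflow` is not a reliable fix. The variable has to be set for a new process, which is what the hypothetical `relaunch` helper does.

```python
import os
import subprocess
import sys
from typing import Mapping


def with_rocm_libs(base_env: Mapping[str, str], rocm_path: str = "/opt/rocm") -> dict:
    """Return a copy of base_env with <rocm_path>/lib prepended to LD_LIBRARY_PATH."""
    env = dict(base_env)  # never mutate the caller's mapping
    lib_dir = os.path.join(rocm_path, "lib")  # assumed ROCm library directory
    existing = [p for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p]
    if lib_dir not in existing:
        existing.insert(0, lib_dir)
    env["LD_LIBRARY_PATH"] = os.pathsep.join(existing)
    return env


def relaunch(script: str, rocm_path: str = "/opt/rocm") -> int:
    """Run `script` in a new interpreter whose loader sees the ROCm libraries."""
    env = with_rocm_libs(os.environ, rocm_path)
    return subprocess.run([sys.executable, script], env=env).returncode
```

From a shell, the equivalent would be `LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH python3 your_script.py`, under the same path assumption.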

ROCm/ROCm#1163 (comment)

jerryyin (member) commented Jun 25, 2020

Instead of setting LD_LIBRARY_PATH, could you try the alternative: set ROCM_PATH and make sure you have the latest hip-rocclr installed? Please let us know whether that fixes the issue.

```shell
$ sudo apt-get install hip-rocclr
$ export ROCM_PATH=/opt/rocm
```
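To confirm that the suggested variable actually reaches the process running TensorFlow, one can inspect the environment from Python. This is a hypothetical helper (`rocm_path_problems` is not part of TensorFlow or ROCm). It only checks that `ROCM_PATH` is set and points at a directory containing a `lib/` subdirectory, which is an assumption about the ROCm install layout; it does not verify that `hip-rocclr` is installed.

```python
import os
from typing import List, Mapping


def rocm_path_problems(env: Mapping[str, str]) -> List[str]:
    """List reasons ROCM_PATH in `env` looks misconfigured (empty list = looks fine)."""
    rocm_path = env.get("ROCM_PATH")
    if not rocm_path:
        return ["ROCM_PATH is not set (the maintainer suggests /opt/rocm)"]
    problems = []
    if not os.path.isdir(rocm_path):
        problems.append(f"ROCM_PATH={rocm_path!r} is not a directory")
    elif not os.path.isdir(os.path.join(rocm_path, "lib")):
        # Assumption: ROCm keeps its shared libraries under <ROCM_PATH>/lib.
        problems.append(f"{rocm_path!r} has no lib/ subdirectory")
    return problems
```

Calling `rocm_path_problems(os.environ)` at the top of the failing script shows what that script sees. On Ubuntu, `dpkg -s hip-rocclr` reports whether the package is installed and which version.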

jerryyin self-assigned this Jun 25, 2020

jerryyin (member) commented Jul 7, 2020

Closing the issue for inactivity.

jerryyin closed this as completed Jul 7, 2020