Tensorflow 2.0 AMD support #362

Closed
Cvikli opened this issue Mar 20, 2019 · 58 comments

@Cvikli

Cvikli commented Mar 20, 2019

I am curious whether TensorFlow 2.0 works with the AMD Radeon VII.

Also, if it does, are there any benchmark comparisons against the 2080 Ti on a standard network, so we can judge whether to invest in Radeon VII clusters?

@sunway513

Hi @Cvikli , we are finalizing the 2.0-alpha docker image and it will be available soon. Please stay tuned.

@sunway513 sunway513 self-assigned this Mar 21, 2019
@sunway513

Hi @Cvikli , we've pushed out the preview build docker image for TF2.0-alpha0:
rocm/tensorflow:tf2.0-alpha0-preview
Please help review it and let us know your feedback :-)
Here's the link to our dockerhub repo:
https://cloud.docker.com/u/rocm/repository/docker/rocm/tensorflow/general

@Cvikli
Author

Cvikli commented Mar 23, 2019

Great!
Just ordered our first card for testing. :) If the delivery and tests go well, then I will be back with results by April 2.

Thank you for the fast work! I am really excited about it!

@dagamayank

Please open a new issue if bugs are found with the 2.0 docker.

@Cvikli
Author

Cvikli commented Apr 3, 2019

Sorry for reopening the thread, but I owe you guys a lot!

The Radeon VII's performance with TensorFlow 2.0a is crazy.
In our tests it came close to the speed of our 2080 Ti (about 10-15% slower)! But the Radeon VII has more memory, which was the bottleneck in our case. At this price, we think this card is the best value for machine learning in our company!

We are glad we opened our eyes to AMD products. We are buying our first configuration, which is 40% cheaper and, by our measurements, performs better in our scenario than our well-optimised server configuration.

Thank you for all the work!

@briansp2020

@Cvikli

We are glad we opened our eyes to AMD products. We are buying our first configuration, which is 40% cheaper and, by our measurements, performs better in our scenario than our well-optimised server configuration.

Could you give a bit more detail? How much faster is the Radeon VII for your application? What type of model are you running (CNN/RNN/GAN/etc.)? What processor are you running?

Just curious.

@sunway513

Thank you @Cvikli , great to hear that your experiment went well and that you are going to invest more in ROCm and AMD GPUs!

@Cvikli
Author

Cvikli commented May 12, 2019

The system is something like this:

  • 1x ASRock x399 taichi
  • 1x AMD TR4 2950X
  • 1x Samsung 970 EVO 1TB M.2 PCIe MZ-V7E1T0BW
  • 4x SAPPHIRE Radeon VII
  • 2x G.SKILL FlareX 64GB
  • 1x Thermaltake Toughpower 1500W Gold
  • 1x FRYZEN fan
The other system is nearly identical, except that it has 4 Nvidia 1080 Ti cards.

With RNNs, the results on one Radeon VII and one 1080 Ti were nearly the same.

Now that we have switched over to 4 Radeon VIIs, we face two big scaling issues with convolutional networks.

  1. One of our computers has 4 AMD Radeon VIIs, but we can't run more than one computation on it without the error below, even when each computation should use a separate GPU. The second computation, running on the other GPU, prints this:
2019-05-12 15:28:04.632396: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 14.95G (16049923584 bytes) from device: hipError_t(1002)
2019-05-12 15:28:04.632456: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 13.45G (14444931072 bytes) from device: hipError_t(1002)
2019-05-12 15:28:04.632475: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 12.11G (13000437760 bytes) from device: hipError_t(1002)
... many lines like this
2019-05-12 15:36:58.756188: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 310.35M (325421568 bytes) from device: hipError_t(1002)
2019-05-12 15:36:58.756226: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 279.31M (292879616 bytes) from device: hipError_t(1002)
2019-05-12 15:36:58.756252: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 251.38M (263591680 bytes) from device: hipError_t(1002)
2019-05-12 15:36:58.756279: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 226.24M (237232640 bytes) from device: hipError_t(1002)
2019-05-12 15:36:58.756304: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 203.62M (213509376 bytes) from device: hipError_t(1002)
2019-05-12 15:36:58.756323: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 183.26M (192158464 bytes) from device: hipError_t(1002)
2019-05-12 15:36:58.756343: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 164.93M (172942848 bytes) from device: hipError_t(1002)
2019-05-12 15:37:01.337949: E tensorflow/stream_executor/rocm/rocm_driver.cc:493] failed to memset memory: HIP_ERROR_InvalidValue
Segmentation fault (core dumped)

We are pretty sure this should work, because it worked with the Nvidia 1080 Ti. However, even though it reports that it failed to allocate memory, the program still starts and seems to run normally, I think.

Could it be because of the docker image that we can't use separate GPUs for different runs?

  2. Comparing convolutional performance between the 4 AMD and the 4 Nvidia cards, the difference is huge, presumably because of cuDNN on the Nvidia cards. We get more than 10x the performance from the 1080 Ti compared with the Radeon VII. This gap in image-recognition speed seems too big to us; I can't believe the hardware shouldn't be able to achieve about the same.

What do you guys think about this? Is it normal that we are 10x slower when cuDNN comes into play? (To me cuDNN sounds like pure software with better-optimised arithmetic operations, I guess. Is it possible to improve on this?)
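
One detail worth noting in the allocation log under the first item: each failed request is about 90% of the size of the one before it, which suggests a single allocation being retried with progressively smaller sizes rather than many unrelated failures. A minimal sketch for checking this, using three lines copied from the log (the helper names are made up for illustration):

```python
import re

# Three consecutive lines copied from the log above.
LOG = """\
2019-05-12 15:28:04.632396: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 14.95G (16049923584 bytes) from device: hipError_t(1002)
2019-05-12 15:28:04.632456: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 13.45G (14444931072 bytes) from device: hipError_t(1002)
2019-05-12 15:28:04.632475: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 12.11G (13000437760 bytes) from device: hipError_t(1002)
"""

# Captures the exact byte count printed in parentheses.
ALLOC_RE = re.compile(r"failed to allocate \S+ \((\d+) bytes\)")


def requested_sizes(log_text):
    """Return the byte count of every failed allocation, in log order."""
    return [int(match.group(1)) for match in ALLOC_RE.finditer(log_text)]


def retry_ratios(sizes):
    """Return the ratio of each request to the request before it."""
    return [later / earlier for earlier, later in zip(sizes, sizes[1:])]


sizes = requested_sizes(LOG)
ratios = [round(ratio, 3) for ratio in retry_ratios(sizes)]
print(sizes)
print(ratios)
```

Running the same helpers over the full log file would show whether the pattern holds all the way down to the final, smallest requests.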

@sunway513

Hi @Cvikli , let's step back a bit and look at your system configuration:

  • 4x SAPPHIRE Radeon VII
  • 2x G.SKILL FlareX 64GB
  • 1x Thermaltake Toughpower 1500W Gold

The typical gold workstation power supply would run at 87% efficiency at full load, therefore it can supposedly power up to 1307W.
TR 2950x TDP is measured at 180W, Radeon VII TDP is 300W, but the peak power consumption can go up to 321.8W (according to third-party measurement here).
Considering the other components in your workstation, the current 1500W supply is not sufficient for your system at full load. We'd recommend going for an 1800W PSU or dual 1000W PSUs to provide sufficient juice for 4 Radeon VII GPUs.
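
The arithmetic behind this estimate can be written out as a short sketch. It only uses the figures quoted above and counts just the CPU and the four GPUs, so the real draw would be somewhat higher; 0.87 x 1500 comes to about 1305 W, in line with the roughly 1307 W quoted. The constant and function names are illustrative only:

```python
# Figures quoted in the comment above; nothing here is measured.
PSU_RATED_W = 1500
PSU_EFFICIENCY = 0.87   # assumed full-load efficiency of a Gold unit
CPU_TDP_W = 180         # TR 2950X
GPU_TDP_W = 300         # Radeon VII
GPU_PEAK_W = 321.8      # third-party peak measurement
NUM_GPUS = 4


def total_draw(gpu_watts, num_gpus=NUM_GPUS, cpu_watts=CPU_TDP_W):
    """Sum CPU and GPU draw only; board, RAM, SSD and fans are ignored."""
    return num_gpus * gpu_watts + cpu_watts


budget = PSU_RATED_W * PSU_EFFICIENCY
tdp_load = total_draw(GPU_TDP_W)
peak_load = total_draw(GPU_PEAK_W)

print(f"budget    ~ {budget:.0f} W")
print(f"TDP load  = {tdp_load:.0f} W")
print(f"peak load ~ {peak_load:.1f} W")
print("over budget at TDP:", tdp_load > budget)
```

Even before adding the remaining components, the CPU and GPUs alone already exceed the estimated budget.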

2019-05-12 15:28:04.632396: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 14.95G (16049923584 bytes) from device: hipError_t(1002)

The above error message indicates that the target GPU's device memory has already been allocated by other processes.
There are a few ways to expose only selected GPUs to the user process:

  1. Use the HIP_VISIBLE_DEVICES environment variable to select the target GPUs for the process at the HIP level, e.g. use the following to select the first GPU:
  • export HIP_VISIBLE_DEVICES=0
  2. Use the ROCR_VISIBLE_DEVICES environment variable to select the target GPUs at the ROCr (ROCm user-bit driver) level, e.g. use the following to select the first GPU:
  • export ROCR_VISIBLE_DEVICES=0
  3. Pass selected GPU driver interfaces (/dev/dri/render#) to the Docker container, e.g. use the following docker run command options to select the first GPU:
  • sudo docker run -it --network=host --device=/dev/kfd --device=/dev/dri/renderD128 --group-add video
    Note that you should see the following four interfaces on your 4x Radeon VII system:
    $ ls /dev/dri/render*
    /dev/dri/renderD128 /dev/dri/renderD129 /dev/dri/renderD130 /dev/dri/renderD131

We recommend approach #3, as that would isolate the GPUs at a relatively lower level of the ROCm stack.
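
For the environment-variable approaches, the same effect can be had from a launcher script that starts one process per GPU. The following is only a sketch: env_for_gpu is a hypothetical helper, and the child process is a stand-in that prints the variable it received, where a real job would be a training script.

```python
import os
import subprocess
import sys


def env_for_gpu(gpu_index, base_env=None, variable="ROCR_VISIBLE_DEVICES"):
    """Return a copy of the environment that exposes a single GPU index."""
    env = dict(os.environ if base_env is None else base_env)
    env[variable] = str(gpu_index)
    return env


# Stand-in child process: it only reports which device index it was given.
child = [sys.executable, "-c",
         "import os; print(os.environ['ROCR_VISIBLE_DEVICES'])"]
result = subprocess.run(child, env=env_for_gpu(1),
                        capture_output=True, text=True, check=True)
print(result.stdout.strip())
```

Passing variable="HIP_VISIBLE_DEVICES" selects at the HIP level instead. Either way the variable has to be set before the process initialises the GPU runtime, which is why it is set on the child's environment rather than inside the training code.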

For your concern on mGPU performance, could you provide the exact commands to reproduce your observations?

Just FYI, we have been actively running regression tests for single-node multi-GPU performance, and no mGPU performance regression has been reported for TF1.13 on the ROCm2.4 release.
Once the power supply concern is resolved, taking tf_cnn_benchmarks resnet50 as an example, you should see near-linear scalability on FP32 using the following command with 4 GPUs:
TF_ROCM_FUSION_ENABLE=1 python3 tf_cnn_benchmarks.py --data_format=NCHW --batch_size=128 --model=resnet50 --optimizer=sgd --num_batches=100 --variable_update=replicated --nodistortions --gpu_thread_mode=gpu_shared --num_gpus=4 --all_reduce_spec=pscpu --print_training_accuracy=True --display_every=10

@Cvikli
Author

Cvikli commented May 13, 2019

Thank you for the 3 different ways to manage visible devices.
The second solution (with export ROCR_VISIBLE_DEVICES=0) WORKED like a charm for us!
Interestingly, the third solution didn't restrict the GPU devices available in the docker container.

We ran some tests with TF2.0 on ROCm2.4, and performance on a MobileNetV2 benchmark is still a lot lower than what an Nvidia 1080 Ti provides, which still bothers us a little.
To give some direction on TF2.0 with ROCm2.4, I thought I would share these logs.
This is printed before the calculations start for a MobileNetV2:

2019-05-13 18:48:40.653042: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library librocblas.so
2019-05-13 18:48:40.683726: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libMIOpen.so
2019-05-13 18:48:44.998231: I tensorflow/core/kernels/conv_grad_input_ops.cc:997] running auto-tune for Backward-Data
2019-05-13 18:48:45.094061: I tensorflow/core/kernels/conv_grad_filter_ops.cc:886] running auto-tune for Backward-Filter
... 2x14 lines like this with Backward-Data and Backward-Filter
2019-05-13 18:48:48.854030: I tensorflow/core/kernels/conv_grad_input_ops.cc:997] running auto-tune for Backward-Data
2019-05-13 18:48:48.945517: I tensorflow/core/kernels/conv_grad_filter_ops.cc:886] running auto-tune for Backward-Filter
2019-05-13 18:48:49.207930: I tensorflow/core/kernels/conv_grad_input_ops.cc:997] running auto-tune for Backward-Data
2019-05-13 18:48:49.295100: I tensorflow/core/kernels/conv_grad_filter_ops.cc:886] running auto-tune for Backward-Filter
2019-05-13 18:48:50.639570: I tensorflow/core/kernels/conv_grad_filter_ops.cc:886] running auto-tune for Backward-Filter

So my feeling is that we are running some operations 19 times, which leads to the 10-15x speed loss, but it is only a guess. If I can help in any other way, let me know.

PS: on TF2.0 ROCm2.4, I couldn't run tf_cnn_benchmarks.py because tensorflow.contrib is missing.
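
To put a number on how often these messages appear, the log can be tallied per auto-tune kind. This is a minimal sketch over three lines copied from the log above; count_autotune is a made-up helper name, and the same function could be pointed at a full log file.

```python
import re
from collections import Counter

# Three lines copied from the log above.
LOG = """\
2019-05-13 18:48:49.207930: I tensorflow/core/kernels/conv_grad_input_ops.cc:997] running auto-tune for Backward-Data
2019-05-13 18:48:49.295100: I tensorflow/core/kernels/conv_grad_filter_ops.cc:886] running auto-tune for Backward-Filter
2019-05-13 18:48:50.639570: I tensorflow/core/kernels/conv_grad_filter_ops.cc:886] running auto-tune for Backward-Filter
"""

AUTOTUNE_RE = re.compile(r"running auto-tune for (\S+)")


def count_autotune(log_text):
    """Count auto-tune messages, grouped by the kind named in the message."""
    return Counter(AUTOTUNE_RE.findall(log_text))


counts = count_autotune(LOG)
for kind, n in sorted(counts.items()):
    print(kind, n)
```

Comparing the tally from a first run with the tally from a second run of the same script would show whether the messages recur or only appear once.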

@sunway513

Hi @Cvikli , glad the ROCr env var worked for you!
For approach #3, if you run ROCr-level utilities (e.g. /opt/rocm/bin/rocminfo) you should see the restricted access; however, since rocm_smi uses a different approach to query GPU status, you can still see all the GPUs with rocm_smi even if you pass only a limited set of GPU device interfaces to the docker container. Adding @jlgreathouse @y2kenny for awareness.

2019-05-13 18:48:44.998231: I tensorflow/core/kernels/conv_grad_input_ops.cc:997] running auto-tune for Backward-Data
2019-05-13 18:48:45.094061: I tensorflow/core/kernels/conv_grad_filter_ops.cc:886] running auto-tune for Backward-Filter

The above logs indicate that the time was actually spent by MIOpen compiling kernels; please refer to my previous comment here for reference.
That is a one-time cost: on later runs MIOpen just picks up the cached kernels under ~/.cache/miopen instead of compiling them again. If you use docker containers for dev work, consider committing the container once the MIOpen cache has been populated, so you can reuse it later.
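
To see whether the kernel cache has been populated before committing a container, something like the following could be used. It is a sketch that assumes the default ~/.cache/miopen location mentioned above; cache_summary is a hypothetical helper, not part of MIOpen.

```python
import os


def cache_summary(cache_dir):
    """Return (file_count, total_bytes) for regular files under cache_dir."""
    file_count = 0
    total_bytes = 0
    # os.walk yields nothing for a missing directory, so this returns (0, 0).
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path):
                file_count += 1
                total_bytes += os.path.getsize(path)
    return file_count, total_bytes


miopen_cache = os.path.expanduser("~/.cache/miopen")
files, size = cache_summary(miopen_cache)
print(f"{miopen_cache}: {files} files, {size} bytes")
```

A result of zero files after a training run would suggest the cache lives somewhere else in that environment, or that it was not kept when the container exited.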

@sunway513

Besides, if your application is built on the TF1.x API, you might use the following TF1.13 release instead of the TF2.0 branch built with --config=v1:
rocm/tensorflow:rocm2.4-tf1.13-python3

@Cvikli
Author

Cvikli commented May 23, 2019

We ported our code from tf2.0 to tf1.13 and ran the MobileNetV2 implementation from tf.keras.applications on the configuration you suggested (TF1.13 on ROCm2.4 release), and we still see NO improvement in speed.
The Nvidia 1080 Ti still performs 5-10x faster. I don't know whether that is because cuDNN or CUDA is not available for Radeon cards, but the performance difference is pretty big.

@sunway513

Hi @Cvikli , could you provide the exact steps to repro your observation?
FYI, Tensorflow-ROCm deploys the ROCm MIOpen library to accelerate the DL workloads, the repo is here:
https://github.com/ROCmSoftwarePlatform/MIOpen

@quantuminformation

Has anyone tested with the latest MacBook Pros?

@quocdat32461997

I ran into the error "failed to allocate 14.95G (16049923584 bytes) from device: hipError_t(1002)" as above.
System info:
Intel® Xeon(R) CPU E5-2630 v2 @ 2.60GHz × 12
Radeon VII
1500 W PSU
ROCm installed with Tensorflow-rocm 1.13.1 (through pip3)

I have not tried installing tensorflow-rocm through docker.

Any help?

@sunway513

Hi @quocdat32461997 , can you try setting the following environment variable:
export HIP_HIDDEN_FREE_MEM=500
If it still fails, please create a new issue and provide more complete logs.

@quocdat32461997

Problem solved by re-installing ROCm and Tensorflow-rocm. Probably I did not install ROCm properly. Thanks a lot.

@Cvikli
Author

Cvikli commented Jun 11, 2019

Hey there!
I would like to know whether there will be a new docker image with tensorflow==2.0.0b installed, because so far only the alpha version is available for tf2.0.
By the way, we ran the https://github.com/lambdal/lambda-tensorflow-benchmark tests, and the difference between the Nvidia and the Radeon cards is smaller than stated above.
If you are interested, I can share the test results here.

@sunway513

Hi @Cvikli , we are preparing the TF2.0 beta release, it's currently under QA test coverage.
We'll update here after the new docker image is available.

@Cvikli
Author

Cvikli commented Jun 11, 2019

You guys, you are crazy! I love it! :) Thank you for this speed!

@satvikpendem

Looks like the link at the beginning of the thread redirects to https://hub.docker.com, here's the link I'm using to track releases: https://hub.docker.com/r/rocm/tensorflow/tags

@sunway513

Hi @Cvikli , we have published the docker container for TF-ROCm 2.0 Beta1. Please kindly check it and let us know if you have any questions:
rocm/tensorflow:rocm2.5-tf2.0-beta1-config-v2

@ghost

ghost commented Jun 21, 2019

Hi everyone,
when I run the rocm/tensorflow:rocm2.5-tf2.0-beta1-config-v2 docker container, or any other container with tensorflow 2.0, trying to import tensorflow results in the following error:
>>> import tensorflow as tf
Illegal instruction (core dumped)

I am using an RX 480 with ROCm 2.5, and ROCm with tensorflow 1.13 works fine.

@sunway513

Hi @moonshine502 , I've tried a couple of samples using the rocm2.5-tf2.0-beta1-config-v2 docker image on my GFX803 node, those are working fine.
Could you provide the steps to reproduce your issue?

@ghost

ghost commented Jun 22, 2019

Hi @sunway513,
thank you for your response.

Hardware: Intel Celeron G3900 (Skylake), AMD Radeon RX 480 (gfx803)
Software:

Issue:
Executing python3 -c "import tensorflow as tf" inside the docker results in
python3 -c "import tensorflow as tf"
Illegal instruction (core dumped)

I am guessing that this error is caused by the cpu not being compatible with the new tensorflow version. Could this be the case?

@dundir

dundir commented Jun 25, 2019

@moonshine502 I'm running almost the exact same system setup and it's able to load and train for me.

The only difference appears to be the CPU, or possibly the card. I'm using a Ryzen 5 2400G; everything else looks near the same. I'm using a RX560 14cu, which registers in linux as an RX480 (gfx803), ROCM 2.5.27.

I ran through all the steps for training on the MNIST dataset at the link below to confirm tf2.0 was actually working. The evaluation accuracy wasn't the best (~87.7% vs. 98%), but it was able to compute.

https://www.tensorflow.org/beta/tutorials/quickstart/beginner

Edit: included more info.

@ghost

ghost commented Jun 25, 2019

Hi @dundir, @sunway513,

I am now pretty sure that the cause of the problem is my cpu which does not support avx instructions. It seems that previous versions of tensorflow with rocm were compiled without avx, because they work on my machine. So I may try to build tensorflow 2.0 without avx or get a new cpu.

Thank you for your help.
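
For anyone hitting the same "Illegal instruction" on import, the AVX guess can be checked without TensorFlow by reading the CPU feature flags. This sketch assumes an x86 Linux system, where /proc/cpuinfo has a "flags" line; the function names are illustrative only.

```python
def cpu_flags(cpuinfo_text):
    """Collect the feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def supports_avx(cpuinfo_text):
    """True only if the exact flag 'avx' is present (not just 'avx2')."""
    return "avx" in cpu_flags(cpuinfo_text)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Return the file contents, or an empty string if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


print("AVX supported:", supports_avx(read_cpuinfo()))
```

On systems without /proc/cpuinfo the sketch simply reports False, so a negative result is only meaningful on Linux.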

@bionicles

With memory being the bottleneck, can we use bfloat16, int8, float8, or float16? Just curious.

@salmanulhaq

salmanulhaq commented Nov 28, 2019

We ported our code from tf2.0 to tf1.13 and ran the MobileNetV2 implementation from tf.keras.applications on the configuration you suggested (TF1.13 on ROCm2.4 release), and we still see NO improvement in speed.
The Nvidia 1080 Ti still performs 5-10x faster. I don't know whether that is because cuDNN or CUDA is not available for Radeon cards, but the performance difference is pretty big.

cuDNN is not a purely software play; it is backed by actual silicon (dedicated tensor cores for MAD ops), which boosts half-precision performance. I'll need to check whether the Radeon VII has dedicated tensor cores as well. Also, Nvidia won't automatically optimize code to make use of tensor cores; that has to be done using cuDNN extensions.

@michaelklachko

@salmanulhaq 1080Ti has no tensor cores.

@raxbits

raxbits commented Nov 28, 2019

We ported our code from tf2.0 to tf1.13 and ran the MobileNetV2 implementation from tf.keras.applications on the configuration you suggested (TF1.13 on ROCm2.4 release), and we still see NO improvement in speed.
The Nvidia 1080 Ti still performs 5-10x faster. I don't know whether that is because cuDNN or CUDA is not available for Radeon cards, but the performance difference is pretty big.

cuDNN is not a purely software play; it is backed by actual silicon (dedicated tensor cores for MAD ops), which boosts half-precision performance. I'll need to check whether the Radeon VII has dedicated tensor cores as well. Also, Nvidia won't automatically optimize code to make use of tensor cores; that has to be done using cuDNN extensions.

Do you have a reference for hardware being involved in cuDNN?

cuDNN, afaik, is a pure software play with optimizations and whatnot. What you may be referring to is the tensor cores, which were added with Volta and carried over to Turing silicon.

@roschler

Has anybody tried TF 2.0 with a Radeon RX 580 with 8GB of RAM? Does it work? If it does, has anybody tried running multiple cards in parallel?

I have one of the first-generation Nvidia Titan X cards (pre-Pascal). I'm finally giving up on it. It can only run CUDA drivers from a long time ago, from the year the card was first produced. I've tried everything newer, and the card won't initialize (i.e. the O/S rejects it at the device level). I'm very sad about this since I paid a ton for it, but it's time to move on.

@himanshugoel2797

It ought to work but I'm not convinced that there's a point in running multiple 580s on a single training task. I don't think they'd be fast enough to gain a meaningful speedup (I didn't test rocm, but in a rendering task between a VII and a 580, it was faster to just use the VII than to have them both work together).

@kuabhish

kuabhish commented Jan 8, 2020

Has anyone tested with the latest MacBook Pros?

Can anyone reply to @quantuminformation's question, please?

@quantuminformation

I've now upgraded to the new MBP 16, but not used TFJS for a while, might get into py soon.

@sunway513

Hi @quantuminformation @kuabhish , please refer to the following doc for ROCm support coverage over OSes:
https://github.com/RadeonOpenCompute/ROCm#deploying-rocm
There's another thread discussing the Mac support on main ROCm repo:
ROCm/ROCm#262

@sumannelli

Hi Cvikli,

I have a Radeon VII but am not able to configure it with TensorFlow. Please guide me. I have been struggling to configure this for more than 15 days. Can I use my GPU without docker? Can I use TensorFlow 1.x with the GPU? I have installed ROCm, but the GPU is still not responding while training my model.

My system config:
OS: Ubuntu 18.04
Thanks
Suman

@sunway513

Hi @sumannelli , did you follow the instructions below to install TF?
https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/rocm_docs/tensorflow-install-basic.md

And certainly, you can use your GPU without docker; that's just a matter of deployment approach. Using docker would likely save you some time configuring the user-bit environment with ROCm.

@sumannelli-Ib

Hi Sunway513,
Thanks for the reply. I am able to use the AMD Radeon VII with TensorFlow 2.1, but while my model is training it uses only 3% of the memory.
OS: ubuntu 18.04
kernel: 5.3
rocm:3.1.3
tensorflow:2.1
If I am using an incompatible version, please let me know. Once again, thanks for the quick reply.
Thanks
Suman Nelli

@Sifatul22

Hi, guys
My system specs are
Ryzen 5 3600 and AMD Radeon RX 5500 XT
Is there any way I could enable TensorFlow GPU using ROCm or other platforms? Please help me out

@sunway513

Hi @Sifatul22 , your configuration should work.
Please follow the document here to install ROCm and Tensorflow-rocm:
https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/rocm_docs/tensorflow-install-basic.md
Let us know if you have questions, thanks.

@briansp2020

@sunway513 Is Navi now supported? Radeon RX 5500 XT is Navi, isn't it?

@sunway513

Hi @briansp2020 , Navi is not supported by ROCm yet; please refer to the following document for the list of GPUs supported by ROCm:
https://github.com/RadeonOpenCompute/ROCm#supported-gpus

@sumannelli

Hi sunway513,
I followed the link you provided to install ROCm, but it installs with Python 2.7, and I want to install with Python 3.6.
https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/rocm_docs/tensorflow-install-basic.md
Please advise me on this.
Thanks

@sunway513

Hi @sumannelli , in the same document, if you follow the steps to install the python3 dependencies, you should be able to configure it correctly, depending on the default python3 version in your environment.

@sumannelli

Hi @sunway513,

Thanks for the reply. Now I can run tensorflow2 on the AMD Radeon VII.
But now I am using the object detection API, which supports tensorflow 1.15.0. When I installed tensorflow-rocm==1.15.0, I got this error:
Traceback (most recent call last):
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 2453, in <module>
    from tensorflow.python.util import deprecation
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 25, in <module>
    from tensorflow.python.platform import tf_logging as logging
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/platform/tf_logging.py", line 38, in <module>
    from tensorflow.python.util.tf_export import tf_export
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/util/tf_export.py", line 48, in <module>
    from tensorflow.python.util import tf_decorator
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/util/tf_decorator.py", line 64, in <module>
    from tensorflow.python.util import tf_stack
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/util/tf_stack.py", line 29, in <module>
    from tensorflow.python import _tf_stack
ImportError: /home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/_tf_stack.so: undefined symbol: PySlice_AdjustIndices

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "a.ipynb", line 1, in <module>
    from tensorflow.keras.datasets import mnist
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/__init__.py", line 99, in <module>
    from tensorflow_core import *
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/__init__.py", line 28, in <module>
    from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/__init__.py", line 50, in __getattr__
    module = self._load()
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/__init__.py", line 44, in _load
    module = _importlib.import_module(self.__name__)
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 2453, in <module>
    from tensorflow.python.util import deprecation
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 25, in <module>
    from tensorflow.python.platform import tf_logging as logging
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/platform/tf_logging.py", line 38, in <module>
    from tensorflow.python.util.tf_export import tf_export
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/util/tf_export.py", line 48, in <module>
    from tensorflow.python.util import tf_decorator
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/util/tf_decorator.py", line 64, in <module>
    from tensorflow.python.util import tf_stack
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/util/tf_stack.py", line 29, in <module>
    from tensorflow.python import _tf_stack
ImportError: /home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/_tf_stack.so: undefined symbol: PySlice_AdjustIndices

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.

Thanks
Suman Nelli

@sumannelli-Ib

sumannelli-Ib commented Mar 25, 2020

Hi sunway513,
ROCm 3.1 is not working with tensorflow-rocm==1.15.0. Please provide a link or reference for downloading ROCm 2.10.
Note:
The command below downloads ROCm 3.1, but I need 2.10.

sudo apt install rocm-dkms
My work has stopped because of this. Kindly reply.

@sunway513

Hi @sumannelli-Ib , you can use the following for ROCm2.10 package:
http://repo.radeon.com/rocm/apt/2.10.0/
You can edit the repository entry in the following file on your system so that it points to that URL (the default entry looks like this):

~$ cat /etc/apt/sources.list.d/rocm.list 
deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main

Then

sudo apt update 
sudo apt autoremove rocm-dkms && sudo apt install rocm-dkms -y

If you want to stick with ROCm3.1, you need to pull the latest tensorflow-rocm whl packages; please consult our document below:
https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/rocm_docs/tensorflow-install-basic.md
Specifically, you'll need the following command:
pip3 install --user tensorflow-rocm==1.15.2

In the future, please make sure your tensorflow-rocm version is compatible with the ROCm build installed on your system. The compatibility info is here:
https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/rocm_docs/tensorflow-rocm-release.md

@sumannelli-Ib

sumannelli-Ib commented Apr 9, 2020

Hi @sunway,

I tried as mentioned above, but I get the error below when trying to train the model (using the TensorFlow object detection API).

warning: :0:0: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering
warning: :0:0: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering
In file included from /opt/rocm-3.1.0/hip/../include/hip/hip_runtime_api.h:342:
/opt/rocm-3.1.0/hip/../include/hip/hcc_detail/hip_runtime_api.h:48:10: fatal error: 'hsa/hsa.h' file not found
#include <hsa/hsa.h>

@sunway513

Hi @sumannelli-Ib , it seems your system is still on ROCm 3.1, is that what you want?
I'd suggest cleaning up your ROCm installations and reinstalling the ROCm & TF packages from scratch.
Alternatively, you can try the following docker containers, so you don't need to deal with the user-bit configuration, which can be pretty convoluted if there are obsolete ROCm packages installed on your system.
Docker containers:
https://hub.docker.com/r/rocm/tensorflow

@sumannelli

sumannelli commented Apr 10, 2020

Hi
Thanks for the quick reply.
I reinstalled my OS and tried again. Earlier I used ROCm 2.10 with tensorflow 1.15.0, but the GPU was slower than my CPU. So I updated ROCm to 3.1 and tensorflow to 1.15.3, and when I tried training on the MNIST dataset it worked like a charm and was fast. But when I train with the TFOD API, I get the error mentioned above.

Dealing with docker is hectic work for me. I have never used it before.

@sunway513

Hi @sumannelli , you are welcome! However, I don't think we have brought up the TFOD API for the tensorflow-rocm project yet.

@sumannelli

sumannelli commented Apr 10, 2020

But tensorflow-rocm==1.15.0 works, whereas tensorflow-rocm==1.15.3 shows the above error.
Kindly help me fix this somehow. I purchased the AMD GPU especially for this work. If it does not work for this project, there was no point in purchasing it.

@sunway513

Hi @sumannelli , can you open a new issue and provide us the steps to reproduce the problem?
If ROCm2.10 + TF1.15.0 works, you might just try the following docker container:
rocm/tensorflow:rocm2.10.0-tf1.15-dev

ROCm enabled the relocatable package feature in ROCm3.1.0; that feature may have introduced the regressions you've observed with the TFOD API.
However, we need to know more details before offering further help. Thank you for your understanding.

@sumannelli-Ib

Hi @sunway513,

It has been two days since I raised an issue, but nobody has been assigned. Could you please help me with the new issue linked below?
#927
