
[aarch64] Running Python hits a core dump failure on TF 1.12.2 with ROCm 2.4 #506

@wormwang

Description


Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Ubuntu 18.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): build from source
  • TensorFlow version (use command below): 1.12
  • Python version: 3.6
  • Bazel version (if compiling from source): 0.15.2
  • GCC/Compiler version (if compiling from source): 7.4
  • ROCm/MIOpen version: 2.3
  • GPU model and memory: Vega64 8GB

You can collect some of this information using our environment capture script.
You can also obtain the TensorFlow version with:
  • TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
  • TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"
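For completeness, a minimal sketch (TF 1.x API, assuming the tensorflow-rocm build ships the standard device_lib helper) that prints the installed version and lists the devices TensorFlow can see; on this machine the Vega64 should show up as /device:GPU:0:

import tensorflow as tf
from tensorflow.python.client import device_lib

# Version info reported by this wheel (TF 1.x attribute names).
print(tf.GIT_VERSION, tf.VERSION)

# Devices TensorFlow can see; the ROCm GPU should appear as /device:GPU:0.
for d in device_lib.list_local_devices():
    print(d.name, d.device_type, d.memory_limit)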

Describe the current behavior
Running a simple GPU test Python script fails with a core dump:

2019-06-13 14:07:31.880680: I tensorflow/core/common_runtime/placer.cc:927] transpose/perm: (Const)/job:localhost/replica:0/task:0/device:GPU:0
Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-06-13 14:07:31.880698: I tensorflow/core/common_runtime/placer.cc:927] Const: (Const)/job:localhost/replica:0/task:0/device:GPU:0
terminate called after throwing an instance of 'std::exception'
what(): std::exception
Aborted (core dumped)

script
cat tf-gpu.py
import sys
import numpy as np
import tensorflow as tf
from datetime import datetime

device_name = sys.argv[1] # Choose device from cmd line. Options: gpu or cpu
shape = (int(sys.argv[2]), int(sys.argv[2]))
if device_name == "gpu":
device_name = "/gpu:0"
else:
device_name = "/cpu:0"

with tf.device(device_name):
random_matrix = tf.random_uniform(shape=shape, minval=0, maxval=1)
dot_operation = tf.matmul(random_matrix, tf.transpose(random_matrix))
sum_operation = tf.reduce_sum(dot_operation)

startTime = datetime.now()
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
result = session.run(sum_operation)
print(result)

//It can be hard to see the results on the terminal with lots of output -- add some newlines to improve readability.
print("\n" * 5)
print("Shape:", shape, "Device:", device_name)
print("Time taken:", datetime.now() - startTime)

print("\n" * 5)

Describe the expected behavior
Another Python script runs successfully, and the HIP Examples also run successfully.

# cat test_single_gpu.py
from __future__ import print_function
'''
Basic Multi GPU computation example using TensorFlow library.
Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
'''

'''
This tutorial requires your machine to have 1 GPU
"/cpu:0": The CPU of your machine.
"/gpu:0": The first GPU of your machine
'''

import numpy as np
import tensorflow as tf
import datetime

# Processing Units logs
log_device_placement = True

# Num of multiplications to perform
n = 10

'''
Example: compute A^n + B^n on 2 GPUs
Results on 8 cores with 2 GTX-980:
 * Single GPU computation time: 0:00:11.277449
 * Multi GPU computation time: 0:00:07.131701
'''
# Create random large matrix
A = np.random.rand(10000, 10000).astype('float32')
B = np.random.rand(10000, 10000).astype('float32')

# Create a graph to store results
c1 = []
c2 = []

def matpow(M, n):
    if n < 1:  # Abstract cases where n < 1
        return M
    else:
        return tf.matmul(M, matpow(M, n-1))

'''
Single GPU computing
'''
with tf.device('/gpu:0'):
    a = tf.placeholder(tf.float32, [10000, 10000])
    b = tf.placeholder(tf.float32, [10000, 10000])
    # Compute A^n and B^n and store results in c1
    c1.append(matpow(a, n))
    c1.append(matpow(b, n))

with tf.device('/cpu:0'):
    sum = tf.add_n(c1)  # Addition of all elements in c1, i.e. A^n + B^n

t1_1 = datetime.datetime.now()
with tf.Session(config=tf.ConfigProto(log_device_placement=log_device_placement)) as sess:
    # Run the op.
    sess.run(sum, {a:A, b:B})
t2_1 = datetime.datetime.now()

print("Single GPU computation time: " + str(t2_1-t1_1))

Code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem.

Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
