-
Notifications
You must be signed in to change notification settings - Fork 99
Description
Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
Ubuntu 18.04 - Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
- TensorFlow installed from (source or binary): build from source
- TensorFlow version (use command below): 1.12
- Python version: 3.6
- Bazel version (if compiling from source): 0.15.2
- GCC/Compiler version (if compiling from source): 7.4
- ROCm/MIOpen version: 2.3
- GPU model and memory: Vega64 8GB
You can collect some of this information using our environment capture
script
You can also obtain the TensorFlow version with: 1. TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)" 2. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"
Describe the current behavior
run simple test GPU python script but error at core dump
2019-06-13 14:07:31.880680: I tensorflow/core/common_runtime/placer.cc:927] transpose/perm: (Const)/job:localhost/replica:0/task:0/device:GPU:0
Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-06-13 14:07:31.880698: I tensorflow/core/common_runtime/placer.cc:927] Const: (Const)/job:localhost/replica:0/task:0/device:GPU:0
terminate called after throwing an instance of 'std::exception'
what(): std::exception
Aborted (core dumped)
script
cat tf-gpu.py
import sys
import numpy as np
import tensorflow as tf
from datetime import datetime
device_name = sys.argv[1] # Choose device from cmd line. Options: gpu or cpu
shape = (int(sys.argv[2]), int(sys.argv[2]))
if device_name == "gpu":
device_name = "/gpu:0"
else:
device_name = "/cpu:0"
with tf.device(device_name):
random_matrix = tf.random_uniform(shape=shape, minval=0, maxval=1)
dot_operation = tf.matmul(random_matrix, tf.transpose(random_matrix))
sum_operation = tf.reduce_sum(dot_operation)
startTime = datetime.now()
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
result = session.run(sum_operation)
print(result)
//It can be hard to see the results on the terminal with lots of output -- add some newlines to improve readability.
print("\n" * 5)
print("Shape:", shape, "Device:", device_name)
print("Time taken:", datetime.now() - startTime)
print("\n" * 5)
Describe the expected behavior
But run another python script successfully & HIP Examples also run successfully
//cat test_single_gpu.py
from future import print_function
'''
Basic Multi GPU computation example using TensorFlow library.
Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
'''
'''
This tutorial requires your machine to have 1 GPU
"/cpu:0": The CPU of your machine.
"/gpu:0": The first GPU of your machine
'''
import numpy as np
import tensorflow as tf
import datetime
// Processing Units logs
log_device_placement = True
// Num of multiplications to perform
n = 10
'''
Example: compute A^n + B^n on 2 GPUs
Results on 8 cores with 2 GTX-980:
- Single GPU computation time: 0:00:11.277449
- Multi GPU computation time: 0:00:07.131701
'''
// Create random large matrix
A = np.random.rand(10000, 10000).astype('float32')
B = np.random.rand(10000, 10000).astype('float32')
// Create a graph to store results
c1 = []
c2 = []
def matpow(M, n):
if n < 1: #Abstract cases where n < 1
return M
else:
return tf.matmul(M, matpow(M, n-1))
'''
Single GPU computing
'''
with tf.device('/gpu:0'):
a = tf.placeholder(tf.float32, [10000, 10000])
b = tf.placeholder(tf.float32, [10000, 10000])
# Compute A^n and B^n and store results in c1
c1.append(matpow(a, n))
c1.append(matpow(b, n))
with tf.device('/cpu:0'):
sum = tf.add_n(c1) #Addition of all elements in c1, i.e. A^n + B^n
t1_1 = datetime.datetime.now()
with tf.Session(config=tf.ConfigProto(log_device_placement=log_device_placement)) as sess:
# Run the op.
sess.run(sum, {a:A, b:B})
t2_1 = datetime.datetime.now()
print("Single GPU computation time: " + str(t2_1-t1_1))
Code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem.
Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.