Resnet50 - ResourceExhaustedError on R9 Nano? #29

@lukeiwanski

Hello,

First of all, thanks for the effort!

I encountered a problem while following the instructions at https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/rocm_docs/tensorflow-install-basic.md and https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/rocm_docs/tensorflow-quickstart.md#tensorflows-tf_cnn_benchmarks

The TF version used is: http://repo.radeon.com/rocm/misc/tensorflow/tensorflow-1.3.0-cp27-cp27mu-manylinux1_x86_64.whl

The GPU used is an R9 Nano:

Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP.internal (2617.0)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_object_metadata cl_amd_event_callback 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 1
  Device Name                                     gfx803
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.2 
  Driver Version                                  2617.0 (HSA1.1,LC)
  Device OpenCL C Version                         OpenCL C 2.0 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Board Name (AMD)                         Fiji [Radeon R9 FURY / NANO Series]
  Device Topology (AMD)                           PCI-E, 01:00.0
  Max compute units                               64

The following command fails with ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[64,512,28,28]:
python tf_cnn_benchmarks.py --model=resnet50 --num_gpus=1
The full log is in this gist: https://gist.github.com/lukeiwanski/f20596d0c7812b977a70d40e13f4a45d
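
For scale, here is a rough estimate of the size of the tensor named in the error. It is only a sketch: it assumes 4-byte float32 elements (the dtype is not shown in the line quoted above), and tensor_bytes is a hypothetical helper, not part of TensorFlow or tf_cnn_benchmarks.

```python
from functools import reduce
from operator import mul


def tensor_bytes(shape, bytes_per_element=4):
    """Footprint of a dense tensor; 4 bytes per element assumes float32."""
    return reduce(mul, shape, 1) * bytes_per_element


shape = (64, 512, 28, 28)  # shape reported in the OOM message
size = tensor_bytes(shape)
print(f"{size} bytes = {size / 2**20:.1f} MiB")
```

Under that assumption a single tensor of this shape is about 98 MiB, which is small next to the 4 GB of HBM on the R9 Nano. The failure therefore probably reflects the total of all live allocations rather than this one tensor. If the leading 64 is the batch size, a smaller batch (for example via the --batch_size flag of tf_cnn_benchmarks) might be worth trying.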

Have you seen anything like this before?

Thanks,
