Description
Context
- PyTorch version: 2.6.0+rocm6.2.4
- Operating System and version: Ubuntu 24.04.2 LTS x86_64
Your Environment
- Installed using source? [yes/no]: no
- Are you planning to deploy it using docker container? [yes/no]: no
- Is it a CPU or GPU environment?: GPU
- Which example are you using: mnist
- Link to code or data to repro [if any]: mnist
Expected Behavior
Train Epoch: 1 [0/60000 (0%)] Loss: 2.326473
Train Epoch: 1 [640/60000 (1%)] Loss: 1.377825
Train Epoch: 1 [1280/60000 (2%)] Loss: 0.828890
Train Epoch: 1 [1920/60000 (3%)] Loss: 0.623807
Train Epoch: 1 [2560/60000 (4%)] Loss: 0.447925
Train Epoch: 1 [3200/60000 (5%)] Loss: 0.293224
Train Epoch: 1 [3840/60000 (6%)] Loss: 0.163648
Train Epoch: 1 [4480/60000 (7%)] Loss: 0.633399
Train Epoch: 1 [5120/60000 (9%)] Loss: 0.226126
Train Epoch: 1 [5760/60000 (10%)] Loss: 0.226796
...
Current Behavior
Traceback (most recent call last):
  File "/home/USER/Desktop/PYTHON Document/examples/mnist/main.py", line 147, in <module>
    main()
  File "/home/USER/Desktop/PYTHON Document/examples/mnist/main.py", line 138, in main
    train(args, model, device, train_loader, optimizer, epoch)
  File "/home/USER/Desktop/PYTHON Document/examples/mnist/main.py", line 45, in train
    output = model(data)
             ^^^^^^^^^^^
  File "/home/USER/Desktop/PYTHON Document/PhyRevE/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/USER/Desktop/PYTHON Document/PhyRevE/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/USER/Desktop/PYTHON Document/examples/mnist/main.py", line 25, in forward
    x = self.conv1(x)
        ^^^^^^^^^^^^^
  File "/home/USER/Desktop/PYTHON Document/PhyRevE/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/USER/Desktop/PYTHON Document/PhyRevE/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/USER/Desktop/PYTHON Document/PhyRevE/.venv/lib/python3.12/site-packages/torch/nn/modules/conv.py", line 554, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/USER/Desktop/PYTHON Document/PhyRevE/.venv/lib/python3.12/site-packages/torch/nn/modules/conv.py", line 549, in _conv_forward
    return F.conv2d(
           ^^^^^^^^^
RuntimeError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
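An "invalid device function" error often means the installed binary has no kernel compiled for the GPU's architecture, though that is only one possible cause. The sketch below compares the architectures the wheel was built for with what each visible device reports. It assumes the ROCm build exposes `torch.cuda.get_arch_list()` and a `gcnArchName` field on the device properties; the helper names `base_arch`, `arch_supported`, and `report` are hypothetical and not part of PyTorch.

```python
def base_arch(gcn_arch_name):
    """Strip feature flags, e.g. 'gfx90a:sramecc+:xnack-' -> 'gfx90a'."""
    return gcn_arch_name.split(":", 1)[0].strip()


def arch_supported(gcn_arch_name, compiled_archs):
    """Return True if the device architecture is among the compiled targets."""
    return base_arch(gcn_arch_name) in {base_arch(a) for a in compiled_archs}


def report():
    """Collect architecture facts from the installed torch build, if any."""
    try:
        import torch
    except ImportError:
        return {"error": "torch is not installed"}

    available = torch.cuda.is_available()
    info = {
        "torch": torch.__version__,
        "hip": getattr(torch.version, "hip", None),  # None on non-ROCm builds
        "compiled_archs": torch.cuda.get_arch_list() if available else [],
        "devices": [],
    }
    if available:
        for index in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(index)
            # Assumption: gcnArchName is present on ROCm builds; fall back to "".
            arch = getattr(props, "gcnArchName", "")
            info["devices"].append({
                "index": index,
                "name": props.name,
                "arch": arch,
                "in_compiled_archs": arch_supported(arch, info["compiled_archs"]),
            })
    return info


if __name__ == "__main__":
    print(report())
```

If a device shows `in_compiled_archs: False`, the wheel probably ships no kernels for that GPU, which would be consistent with the failure in `F.conv2d` above.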
Possible Solution
I tried the following environment variables, but the error persists:
export HIP_VISIBLE_DEVICES=1
export HSA_OVERRIDE_GFX_VERSION=11.0.0
export PYTORCH_ROCM_ARCH="gfx1100"
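For reference, the override value `11.0.0` and the target name `gfx1100` express the same architecture in two notations. The sketch below converts one to the other, assuming the usual naming convention in which the target name encodes a decimal major version followed by one hexadecimal digit each for minor and stepping. The function `hsa_override_for` is a hypothetical helper, not a ROCm or PyTorch API.

```python
import re


def hsa_override_for(gfx_target):
    """Convert a target name such as 'gfx1100' to 'major.minor.stepping'.

    Assumption: the last two characters are single hex digits for minor
    and stepping, and the leading digits are the decimal major version.
    """
    match = re.fullmatch(
        r"gfx(\d{1,2})([0-9a-f])([0-9a-f])", gfx_target.strip().lower()
    )
    if match is None:
        raise ValueError(f"unrecognized gfx target: {gfx_target!r}")
    major = int(match.group(1))
    minor = int(match.group(2), 16)
    stepping = int(match.group(3), 16)
    return f"{major}.{minor}.{stepping}"


if __name__ == "__main__":
    for target in ("gfx1100", "gfx1030", "gfx90a"):
        print(target, "->", hsa_override_for(target))
```

Two caveats that may be relevant here. The override only helps if it names an architecture the installed wheel was actually built for and the GPU is compatible with, and it has to be set before the process initializes HIP. As far as I know, `PYTORCH_ROCM_ARCH` is read when building PyTorch from source, so it would not be expected to change the behavior of a prebuilt pip wheel.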
Steps to Reproduce
- Install the latest PyTorch with:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2.4
- Clone the examples repository, cd into it, and run:
python3 mnist/main.py
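Since the traceback fails in the first convolution, a smaller script may be enough to reproduce the problem without the dataset download. This is a minimal sketch, assuming the first layer is `nn.Conv2d(1, 32, 3, 1)` and the input is a batch of 64 single-channel 28x28 images; both are assumptions about the mnist example rather than something confirmed above. `conv_smoke_test` is a hypothetical helper name.

```python
def conv_smoke_test(device_name=None):
    """Run one convolution forward pass and return a short status string."""
    try:
        import torch
        import torch.nn as nn
    except ImportError:
        return "skipped: torch is not installed"

    if device_name is None:
        # ROCm builds of PyTorch expose the GPU through the "cuda" device type.
        device_name = "cuda" if torch.cuda.is_available() else "cpu"
    device = torch.device(device_name)

    try:
        conv = nn.Conv2d(1, 32, 3, 1).to(device)  # assumed shape of conv1
        x = torch.randn(64, 1, 28, 28, device=device)
        y = conv(x)
        if device.type == "cuda":
            # Force asynchronous kernel errors to surface here.
            torch.cuda.synchronize()
        return f"ok on {device}: output shape {tuple(y.shape)}"
    except RuntimeError as exc:
        return f"failed on {device}: {exc}"


if __name__ == "__main__":
    print(conv_smoke_test())
```

Running it once with `conv_smoke_test("cpu")` and once with the default device separates a GPU-specific failure from a problem with the installation itself.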
Failure Logs [if any]
Output of AMD_LOG_LEVEL=3 python main.py
AMD_LOG.log