
Native API returns: -996 invalid_kernel("uses-fp64-math") #285

Closed

JustSomeRandomUsername opened this issue Jan 18, 2023 · 18 comments

Labels: ARC (ARC GPU), Crash (Execution crashes)

@JustSomeRandomUsername

I am trying to train a YOLOv7 model on my A770 on Ubuntu 22.04. I have oneAPI and Intel Extension for PyTorch installed, and I have trained a different model successfully.

But with this model I am getting:

"RuntimeError: Native API failed. Native API returns: -996 (Function exists but address is not available)
invalid_kernel("uses-fp64-math")
-996 (Function exists but address is not available)"

I am running the code from https://github.com/WongKinYiu/yolov7 with slight modifications for running on the xpu device:

- import intel_extension_for_pytorch
- changed the device to "xpu"
- model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.float32)

I am still a bit new to working on neural nets. Am I missing something obvious, or is this an Intel-specific driver problem?
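
For context, here is a minimal sketch of that kind of modification, with a toy nn.Linear standing in for the YOLOv7 model (the model, shapes, and learning rate here are illustrative, not from the repo):

import torch
import intel_extension_for_pytorch as ipex

# Toy stand-in for the YOLOv7 model; the actual repo code differs.
model = torch.nn.Linear(8, 2).to('xpu')
model.train()  # ipex.optimize expects training mode when an optimizer is passed
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Same call as described above.
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.float32)

x = torch.randn(4, 8, device='xpu')
loss = model(x).sum()

optimizer.zero_grad()
loss.backward()
optimizer.step()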

@jingxu10
Contributor

Thanks for reporting the issue. We will try to reproduce it.

@fredlarochelle

fredlarochelle commented Feb 8, 2023

Here is a quick way to reproduce the problem; it works fine on cpu, but not on xpu:

import torch
import intel_extension_for_pytorch as ipex

device = torch.device('xpu' if torch.xpu.is_available() else 'cpu')
print(f'Using device: {device}, named {torch.xpu.get_device_name(0)}.')

# float16, bfloat16 and float32 get the same error...
x = torch.randn([100], dtype=torch.float16, device=device)

print(x * 0.5)

Btw, it would be nice if torch.xpu.get_device_name(0) returned the name of the GPU instead of the ID!

@jingxu10
Contributor

jingxu10 commented Feb 8, 2023

@fredlarochelle Thanks for sharing the reproducer.
By the way, could you share the output of that device name print? Also, what is your GPU? Arc? Are you using 1.13.10+xpu or 1.10?
Running the script in my environment shows Using device: xpu, named Intel(R) Data Center GPU Flex Series 170 [0x56c0].

@fredlarochelle

fredlarochelle commented Feb 8, 2023

I am using an A770 and when I call the function get_device_name(), it returns Intel(R) Graphics [0x56a0]. Running my script, it returns Using device: xpu, named Intel(R) Graphics [0x56a0]..

I am using 1.13.10+xpu, on Python 3.10.6 (the rest is the exact setup in the latest installation guide).

@jingxu10
Contributor

jingxu10 commented Feb 9, 2023

The reason this error is triggered by print is that the double data type is used internally by print, which is not supported by the GPU hardware. Please move the tensor to the CPU first, then print.

import torch
import intel_extension_for_pytorch as ipex

device = torch.device('xpu' if torch.xpu.is_available() else 'cpu')
print(f'Using device: {device}, named {torch.xpu.get_device_name(0)}.')

# float16, bfloat16 and float32 get the same error...
x = torch.randn([100], dtype=torch.float16, device=device)
y = x * 0.5
y = y.to('cpu')
print(y)

Regarding the device name, it is retrieved entirely from the driver. May I know your driver info, like the version and/or how you installed it?

@fredlarochelle

That doesn't seem to solve the problem; running your code example, I still get the same error. Here is the full output:

[screenshot of the full error output]

But you are right that it seems related to the print function, or actually to the conversion to string. Trying print(x) gives the same error, and replacing print() with str() also gives the same error. But, digging around a bit, I found a weird case:

import torch
import intel_extension_for_pytorch as ipex

# works fine
x = torch.ones([10], dtype=torch.float16, device='cpu')
x.to('xpu')
x.to('cpu')
print(x)

# doesn't work
x = torch.ones([10], dtype=torch.float16, device='xpu')
x.to('cpu')
print(x)

# but casting to integer works?
x = torch.ones([10], dtype=torch.float16, device='xpu')
x.to('cpu')
print(x.int())

Here is another weird example:

import torch
import intel_extension_for_pytorch as ipex

# works fine
torch.arange(10, device='cpu')

# works fine
torch.arange(10, device='xpu')

# works fine
torch.arange(10, dtype=torch.float32, device='cpu')

# doesn't work
torch.arange(10, dtype=torch.float32, device='xpu')

About the device name, I just checked with XPU Manager and it too can't retrieve the full name. It seems to be a problem upstream with the driver, not with ipex. For the driver install, I followed the exact instructions here, and my driver version according to clinfo is 22.49.25018.21.

@jingxu10
Contributor

jingxu10 commented Feb 9, 2023

Oh, sorry. I'll correct it in the code snippet as well.

...
y = y.to('cpu')
print(y)

I'll check the arange one.

@jingxu10
Contributor

jingxu10 commented Feb 9, 2023

The arange one works on my side.
@ashokei @sanchitintel, could you double-check on your side?

@jingxu10
Contributor

jingxu10 commented Feb 9, 2023

Hi @JustSomeRandomUsername, since your workload is a training task, double precision could be used internally in this case. Arc GPUs don't support double precision at the hardware level. Currently we don't recommend running training on Arc and ATS-M GPUs.

@fredlarochelle

fredlarochelle commented Feb 9, 2023

Oh yeah, my bad, I didn't catch it either! I can confirm that it works on my side too.

And just to make sure, I double-checked, and I still get the -996 error for the arange one.

A note should be added somewhere in the documentation; all of this is different behavior from "standard" PyTorch with CUDA, where you don't need to transfer the tensor back to the CPU to print it.
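
For reference, a corrected version of the earlier "weird case" snippet; the missing piece, as spotted above, is that Tensor.to() returns a new tensor rather than modifying the tensor in place:

import torch
import intel_extension_for_pytorch as ipex

x = torch.ones([10], dtype=torch.float16, device='xpu')
x = x.to('cpu')  # assign the result; .to() is not in-place
print(x)         # prints normally once the tensor is on the CPU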

@jingxu10
Contributor

jingxu10 commented Feb 9, 2023

Exactly. Thanks for the advice. We will document this as a known issue.

@JustSomeRandomUsername
Author

@jingxu10 It seems likely that the code uses doubles somewhere; I'll check whether removing the use of doubles solves my problem. I had never heard that training is not recommended on Arc. Is this temporary? Do you expect training on Arc to get simpler with software updates?

@jingxu10
Contributor

jingxu10 commented Feb 9, 2023

To clarify this statement: compared to inference, training is more likely to invoke the double precision data type. Since the hardware itself doesn't support double, it is OK if the training workload doesn't depend on the usage of double. If it does, there will be accuracy issues.
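
A minimal sketch of the kind of guard this implies, assuming fp32 precision is acceptable for the workload (this only covers Python-level tensors; the -996 error can also come from fp64 used internally by an operator):

import torch
import intel_extension_for_pytorch as ipex

def to_xpu_fp32(t):
    # Hypothetical helper: downcast doubles before moving to an Arc GPU,
    # since the hardware has no fp64 support.
    if t.dtype == torch.float64:
        t = t.float()
    return t.to('xpu')

x = to_xpu_fp32(torch.randn(4, dtype=torch.float64))
print(x.to('cpu'))  # move back to the CPU before printing, as discussed above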

@tripzero

I'm seeing this using BoostingMonocularDepth with torch._C._nn.upsample_bilinear2d and an Arc A770. Isn't there a way to emulate fp64? An environment variable, perhaps?

@fredlarochelle

Here is a quick reproducer:

import torch
import intel_extension_for_pytorch as ipex

X = torch.rand(9, dtype=torch.float32, device='xpu').reshape((1, 1, 3, 3))
up = torch.nn.UpsamplingBilinear2d((9, 9))
X_up = up(X)

Also, the function upsample_bilinear2d_out_frame() is not implemented for BFloat16. Is there any plan to implement it, or is it related to pytorch/pytorch#88536?

And the error @tripzero is getting is probably related to all the int64_t in that function. Is there any plan to diverge from PyTorch and reimplement the operators that use fp64 math in fp32, or is emulating fp64 the way you are planning to go forward?

@tripzero

Looks like setting IGC_EnableDPEmulation=1 doesn't help, though you would think it would...
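
One possible pitfall, as an assumption on my part: these compute-runtime flags are read at driver initialization, so they have to be in the environment before torch and ipex load, and Intel's compute-runtime documentation pairs IGC_EnableDPEmulation with OverrideDefaultFP64Settings. A sketch:

import os

# Assumption: both flags must be set before the driver is initialized,
# i.e. before torch / intel_extension_for_pytorch are imported.
os.environ['OverrideDefaultFP64Settings'] = '1'
os.environ['IGC_EnableDPEmulation'] = '1'

import torch
import intel_extension_for_pytorch as ipex

x = torch.arange(10, dtype=torch.float32, device='xpu')  # the op that raised -996 above
print(x.to('cpu'))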

@jingxu10 added the ARC (ARC GPU) and Crash (Execution crashes) labels on Feb 26, 2023
@billxc

billxc commented Mar 18, 2023

> The reason this error is triggered by print is that the double data type is used internally by print, which is not supported by the GPU hardware. Please move the tensor to the CPU first, then print.

Is it possible to wrap the _str function in a future release so that we can use the print function safely?

@jingxu10
Contributor

It is fixed in the latest code. Would you check out the xpu-master branch and use the compile_bundle.sh script under the scripts folder to build a binary with the latest code base? Thank you.
