NEO driver does not detect GPU when using kernel 6.8.x. #710
Comments
I can reproduce this with the latest drm-tip. I do not see any difference in |
Thank you for reproducing this. On 6.7.x, the GPU is recognized. |
Yes, it works with 6.7 (drm-tip) kernel also for me, just not with 6.8 (i915 KMD). EDIT: that was with public Xe KMD repo, not drm-tip. With drm-tip, the issue is already with earlier kernel version (see below). |
I tested with 6.8.0-rc1 (6.8.0-060800rc1-generic) and the issue reproduces there. It probably appeared somewhere between 6.7 and 6.8.0-rc1. I noticed several commits in 6.8.0-rc1 related to the new Intel Xe driver and to eDP/DisplayPort fixes. I do not have time to bisect to find which commit(s) cause this behaviour. |
Dang. I was comparing "drm-tip" on TGL against the "xe-drm-next" kernel on DG1, but their i915 KMD code seems to progress at different rates, so I had to do a quick bisection using already existing nightly "drm-tip" builds... While things still work with the 6.7 version of the "xe-drm-next" kernel repo, with the "drm-tip" repo kernel,
(Commits named like those, or the original commits, are no longer in the "drm-tip" repo, as it gets constantly rebased onto upstream, so I cannot provide a list of the commits between them any more.) |
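A minimal sketch of the kind of quick bisection over existing nightly builds described above. It assumes the builds are ordered oldest to newest, that the first one is known good, the last one is known bad, and that the regression persists once introduced. The build names and the `is_bad` check are hypothetical placeholders; in practice the check would mean booting the build and seeing whether the GPU shows up.

```python
from typing import Callable, Sequence


def first_bad_build(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest build for which is_bad() is true.

    Assumes builds[0] is good, builds[-1] is bad, and every build
    after the first bad one is also bad.
    """
    if len(builds) < 2:
        raise ValueError("need at least one known-good and one known-bad build")
    lo, hi = 0, len(builds) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return builds[hi]
```

With N nightly builds this needs roughly log2(N) test boots instead of N, which is why it is practical even without access to the individual commits.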
Hi folks, |
Media and 3D drivers seem to work fine with that change, so why is it a problem for the L0/compute stack? (I'm wondering whether this change should be reported upstream as a kernel stable ABI breakage...) Looking at the compute-runtime code, it seems to affect SVM capability & address space size: Whereas in Mesa code: |
Yes, with those both |
Hi @eero-t, using the latest drm-tip version with the variable set in the environment, the GPU appears.
|
In this case, is the issue in the kernel or in the NEO driver/OpenCL? |
Well, it depends on whether the GTT size value returned by the KMD is considered part of the stable ABI, but I do not see how it could be, as there can be different reasons for those values to differ. I would think that NEO should accept / adapt to sensible GTT size values, potentially with a warning when a value differs from what is expected, instead of barfing out when it does not exactly match its expectations. |
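A minimal sketch of the "accept and warn" behaviour suggested above. This is not the compute-runtime implementation; the function name is hypothetical, and it assumes that a GTT size slightly below a power of two (for example because part of the address space is reserved) should be treated as belonging to the next power-of-two address space.

```python
import warnings


def address_bits_for_gtt_size(gtt_size: int) -> int:
    """Map a reported GTT size in bytes to an address-space width in bits.

    Hypothetical helper: rounds the size up to the next power of two
    instead of requiring an exact match, and warns when rounding was needed.
    """
    if gtt_size <= 0:
        raise ValueError("GTT size must be positive")
    bits = (gtt_size - 1).bit_length()  # smallest n with 2**n >= gtt_size
    if gtt_size != 1 << bits:
        warnings.warn(
            f"GTT size {gtt_size:#x} is not a power of two; "
            f"treating it as a {bits}-bit address space"
        )
    return bits
```

For example, both `1 << 47` and `(1 << 47) - 4096` map to 47 bits; only the second one triggers the warning.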
Tested 6.8.0-rc3 based Xe KMD, and compute/Sysman driver worked with that, so this issue seems to be i915 KMD specific (as expected). |
I can reproduce this on Arch |
I can reproduce this on Arch with Linux 6.8 release (6.8.1-arch1-1) using i915. Exporting these works fine:
|
In this case, will the NEO compute driver be adapted to work with the new behaviour? |
Encountered this issue also. |
On 6.8:
On 6.7:
The issue seems to lie here: compute-runtime/shared/source/memory_manager/gfx_partition.cpp Lines 250 to 253 in 0307854
|
It seems that the change in the value reported by the GTT size (i.e. the KMD would only internally use the "usable" GTT size value, and report the full address space to user space, including the reserved parts, and distros using the 6.8.0 kernel need to patch their kernels until upstream releases an updated kernel.) @JablonskiMateusz Maybe |
Note that the upcoming Ubuntu 24.04 LTS uses the non-LTS 6.8 kernel. Hopefully it can be fixed before it's released next month. Otherwise OpenCL will not be available on many distros based on it. |
Thanks |
|
rusticl is still an experimental implementation and according to Mesa it is currently broken on Arc GPUs. My use case is video processing and only NEO supports zero-copy interop between VA-API and OpenCL through |
Just adding that I'm also experiencing this issue on NixOS when running the latest kernel (6.8.1). The GPU (Intel N100, Alder Lake) does not show up in clinfo. On an N5105 machine (Jasper Lake), however, the GPU was detected by clinfo on the latest kernel. Downgrading to 6.7.10 on the N100 machine immediately resolved the issue. |
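Since the reports in this thread hinge on the kernel series (6.7.x working, 6.8.x failing with i915), a small helper for pulling major.minor out of release strings such as the ones quoted here may be handy when collecting reports. This is only a sketch; the function name is hypothetical, and the kernel version alone does not say whether a given GPU is affected, as the N5105 report shows.

```python
import re
from typing import Tuple


def kernel_major_minor(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a kernel release string.

    Works on strings like "6.8.1-arch1-1" or "6.8.0-060800rc1-generic";
    anything after major.minor is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release string: {release!r}")
    return int(match.group(1)), int(match.group(2))
```

On a running system the release string can be obtained with `uname -r` or Python's `platform.release()`.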
uploaded the fix to noble, thanks for the ping |
Fixes an upstream issue in the last version intel/compute-runtime#710
This issue seems to be fixed with |
Since the issue seems to be fixed, can we close it now? |
Hello @JablonskiMateusz, I think this issue is fixed now. I think it is fine to close this ticket. |
@ionutnechita-intel Sorry, but this doesn't work inside an OCI container with podman for whatever reason. Not sure if it is also an issue with Docker but I would presume it would be a problem as well. You have to export the two environment variables |
@simonlui Are you sure that the version of the Intel Compute Runtime installed inside the container contains the fix? I can imagine your situation happening if this were not the case. For reference, my iGPU appears to be correctly detected by clinfo inside an Arch Linux-based container. |
@joanbm Yeah that was it. I was confused why I was hitting this in the oneapi-basekit Docker image but it was last updated a month ago at the time of writing this so it makes sense why it still had the issue without the updated version of the runtime inside the container. |
@JablonskiMateusz When will this fix be posted to the apt repo at https://repositories.intel.com/gpu/ubuntu? |
Hi @simonlui, I understand what you are saying, but it must be checked more thoroughly, with several OS variants as containers. I tested it on Ubuntu 24.04, directly on the physical machine, with the latest updates, and I didn't see the problem anymore. |
@ionutnechita-intel The problem was fixed, it was an outdated compute runtime package inside the oneapi-basekit Docker image which didn't have the updated runtime installed by default. Updating the package manually fixed the issue. |
Hi @simonlui, thank you for the feedback. Have a good day. |
I am having the same issue with Rocky Linux. After upgrading from 9.2 to 9.4, I can no longer see the Arc GPU in clinfo. I see my Arc 750 in "lspci" but not in clinfo, and I cannot run code on it. If I use the two environment variables above, it works! (This is the first fix I have found.) Will this be fixed in the next driver release that supports RHEL 9.4? |
The NEO driver does not detect the GPU when using kernel 6.8.x.
With kernel 6.5.x and 6.6.x, this is present:
And with kernel 6.8.x, I get this: