drmOpen("nvidia", NULL) returns -1 or garbage value (reopened) #263
Comments
Well, nothing is going to work as long as you have this error I see in your logs:
So, let's start with that. The preceding lines look concerning:
The "Bad threadStateDatabase.timeout.flags" messages are not expected, but worse is the firmware load error. It looks like it is failing to find nvidia/515.43.04/gsp.bin. Does /lib/firmware/nvidia/515.43.04/gsp.bin exist on your system?
No
How would I retrieve that firmware?
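For anyone checking the same thing, the "does the file exist" question above can be answered with a small C helper. This is only a sketch: the name `firmware_status` is made up for illustration, and it tests a single path, whereas the kernel's firmware loader may search other directories as well. The "error -2" in the dmesg line "Direct firmware load ... failed with error -2" is -ENOENT, i.e. "No such file or directory", which is what this helper returns for a missing file.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of any NVIDIA or libdrm API.
 * Returns 0 if `path` exists and is readable by the caller,
 * otherwise the errno value reported by access(2)
 * (ENOENT, which is 2, when the file is missing).
 */
int firmware_status(const char *path)
{
    if (access(path, R_OK) == 0)
        return 0;
    return errno;
}
```

Calling `firmware_status("/lib/firmware/nvidia/515.43.04/gsp.bin")` and printing `strerror()` of a non-zero result gives the same information as the kernel message, but from user space.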
One way is described in https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/README.md
Is there an easy way to get the firmware without the userland components, i.e. without installing and then uninstalling the driver?
You can extract just the gsp.bin if you prefer. Something like this:
Problem persists
I added a printf for the handle and made sure that card0 was a valid file. dmesg:
Solution: O_CREAT smh
NVIDIA Open GPU Kernel Modules Version
ce3d74f
Does this happen with the proprietary driver (of the same version) as well?
I cannot test this
Operating System and Version
Description: Fedora release 36 (Thirty Six)
Kernel Release
Linux fedora 5.17.9-300.fc36.x86_64 #1 SMP PREEMPT Wed May 18 15:08:23 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Hardware: GPU
It's an RTX 2060 from GIGABYTE. I am not going to install the proprietary tool that is suggested.
Describe the bug
Opening the GPU device returns a file descriptor of -1 or 3.
To Reproduce
Try to open /dev/dri/cardx or use the drmOpen() function.
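A minimal sketch of the first reproduction path, using plain open(2) rather than drmOpen(), so it needs no libdrm. The helper name `open_drm_node` is hypothetical. It returns the errno value on failure, because a bare -1 does not say why the open failed. Note that a return value of 3 is not by itself an error: descriptors 0, 1 and 2 are normally stdin, stdout and stderr, so 3 is typically the first descriptor a process receives from a successful open().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only.
 * Opens an existing device node read/write (no O_CREAT).
 * Returns a descriptor >= 0 on success, or -errno on failure,
 * e.g. -ENOENT if the node is missing or -EACCES on a
 * permission problem.
 */
int open_drm_node(const char *path)
{
    int fd = open(path, O_RDWR);
    return fd >= 0 ? fd : -errno;
}
```

A caller would pass something like "/dev/dri/card0", print `strerror(-ret)` when the result is negative, and close() the descriptor otherwise.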
Bug Incidence
Always
nvidia-bug-report.log.gz
More Info
[ 4.751792] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751793] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751793] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751793] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751794] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751794] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751795] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751795] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751796] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751796] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751797] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751797] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751798] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751798] ACPI: [Firmware Bug]: No valid BIOS _PSS frequency found for processor 5
[ 4.751799] ACPI: [Firmware Bug]: BIOS needs update for CPU frequency support
[ 4.751827] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751827] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751828] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751828] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751829] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751829] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751830] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751830] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751830] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751831] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751831] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751832] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751832] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751833] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751833] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751834] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751834] ACPI: [Firmware Bug]: No valid BIOS _PSS frequency found for processor 6
[ 4.751835] ACPI: [Firmware Bug]: BIOS needs update for CPU frequency support
[ 4.751863] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751863] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751864] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751864] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751865] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751865] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751866] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751866] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751866] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751867] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751867] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751868] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751868] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751869] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751869] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751870] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751870] ACPI: [Firmware Bug]: No valid BIOS _PSS frequency found for processor 7
[ 4.751871] ACPI: [Firmware Bug]: BIOS needs update for CPU frequency support
[ 5.385291] nvidia-gpu 0000:01:00.3: i2c timeout error e0000000
[ 5.385294] ucsi_ccg 0-0008: i2c_transfer failed -110
[ 5.385295] ucsi_ccg 0-0008: ucsi_ccg_init failed - -110
[ 5.385298] ucsi_ccg: probe of 0-0008 failed with error -110
[ 5.398888] kauditd_printk_skb: 136 callbacks suppressed
[ 5.398889] audit: type=1130 audit(1653589722.262:145): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-udev-settle comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 5.444839] audit: type=1130 audit(1653589722.308:146): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-fsck@dev-disk-by\x2duuid-cd5cf0c9\x2db7ce\x2d41da\x2dbcf1\x2dae0ccb7c629a comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 5.459774] audit: type=1130 audit(1653589722.323:147): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-fsck@dev-disk-by\x2duuid-5B81\x2d8B7D comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 5.463346] EXT4-fs (sda2): mounted filesystem with ordered data mode. Quota mode: none.
[ 5.487808] audit: type=1130 audit(1653589722.351:148): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dracut-shutdown comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 5.511918] audit: type=1130 audit(1653589722.375:149): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=plymouth-read-write comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 5.519858] audit: type=1130 audit(1653589722.383:150): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=import-state comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 5.567885] audit: type=1130 audit(1653589722.431:151): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 5.570480] audit: type=1334 audit(1653589722.433:152): prog-id=60 op=LOAD
[ 5.570670] audit: type=1334 audit(1653589722.434:153): prog-id=61 op=LOAD
[ 5.570726] audit: type=1334 audit(1653589722.434:154): prog-id=62 op=LOAD
[ 5.602867] RPC: Registered named UNIX socket transport module.
[ 5.602869] RPC: Registered udp transport module.
[ 5.602870] RPC: Registered tcp transport module.
[ 5.602870] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 5.771489] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[ 5.771491] Bluetooth: BNEP filters: protocol multicast
[ 5.771494] Bluetooth: BNEP socket layer initialized
[ 5.905379] NET: Registered PF_QIPCRTR protocol family
[ 6.526274] iwlwifi 0000:00:14.3: Conflict between TLV & NVM regarding enabling LAR (TLV = enabled NVM =disabled)
[ 6.712661] iwlwifi 0000:00:14.3: Conflict between TLV & NVM regarding enabling LAR (TLV = enabled NVM =disabled)
[ 9.192208] e1000e 0000:00:1f.6 eno2: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 9.192258] IPv6: ADDRCONF(NETDEV_CHANGE): eno2: link becomes ready
[ 9.768045] thermal cooling_device11: Setting cooling device state is deprecated
[ 11.529700] rfkill: input handler disabled
[ 11.930058] Bluetooth: RFCOMM TTY layer initialized
[ 11.930063] Bluetooth: RFCOMM socket layer initialized
[ 11.930096] Bluetooth: RFCOMM ver 1.11
[ 15.899439] logitech-hidpp-device 0003:046D:1025.0007: HID++ 1.0 device connected.
[ 28.879886] rfkill: input handler enabled
[ 249.288102] nvidia-modeset: Unloading
[ 249.304665] NVOC: __nvoc_objDelete: Child class OBJIOVASPACE not freed from parent class OBJVMM.Allocator 00000000d4fbfba6 released with memory allocations
[ 249.304686] [NvPort] *************************************************
[ 249.304686] NvPort memory tracking information for allocator 00000000d4fbfba6:
[ 249.304687] ACTIVE: 1 allocations, 644 bytes allocated (616 useful, 28 meta)
[ 249.304688] TOTAL: 150 allocations, 512133 bytes allocated (507933 useful, 4200 meta)
[ 249.304689] PEAK: 148 allocations, 511980 bytes allocated (507836 useful, 4144 meta)
[ 249.304689] [NvPort] *************************************************
[ 249.304702] nvidia-nvlink: Unregistered Nvlink Core, major device number 234
[ 249.326369] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
[ 249.326373] NVRM getCpuCounts: RmInitCpuCounts: physical 0x8 logical 0x8
[ 249.326722] NVRM rmapiControlCacheInit: using cache mode 1
[ 249.327021] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=io+mem
[ 249.327024] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 249.327038] NVRM halmgrGetHalForGpu_IMPL: Matching PMC_BOOT_42 = 0x164a1000 to HAL_IMPL_TU104
[ 249.327070] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 249.375107] NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 515.43.04 Release Build (yusufkhan@) Tue May 24 06:08:38 PM EDT 2022
[ 1317.099154] intel_powerclamp: Start idle injection to reduce power
[ 1318.444129] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
[ 1318.445139] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
[ 1318.446128] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
[ 1318.447127] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
[ 1318.448126] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
[ 1319.405115] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
[ 1319.406135] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
[ 1319.407115] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
[ 1320.797081] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
[ 1320.798091] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
[ 1326.121176] intel_powerclamp: Stop forced idle injection
[ 1343.137619] intel_powerclamp: Start idle injection to reduce power
[ 1355.185450] intel_powerclamp: Stop forced idle injection
[ 1372.202022] intel_powerclamp: Start idle injection to reduce power
[ 1384.226791] intel_powerclamp: Stop forced idle injection
[ 1401.243395] intel_powerclamp: Start idle injection to reduce power
[ 1411.275389] intel_powerclamp: Stop forced idle injection
[ 1637.846061] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64 515.43.04 Release Build (yusufkhan@) Tue May 24 06:08:29 PM EDT 2022
[ 1637.846069] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 1637.846071] NVRM rmapiAllocWithSecInfo: client:0x0 parent:0x0 object:0x0 class:0x0
[ 1637.846074] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 1637.846083] NVRM rmapiAllocWithSecInfo: allocation complete
[ 1637.847149] nvidia_drm: unknown parameter 'NVreg_RmMsg' ignored
[ 1637.847478] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 1637.847686] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 1637.847688] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 1637.847759] nvidia 0000:01:00.0: Direct firmware load for nvidia/515.43.04/gsp.bin failed with error -2
[ 1637.847771] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x61:0x0:1610)
[ 1637.847787] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 1637.847824] [drm:nv_drm_load [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[ 1637.847891] [drm:nv_drm_probe_devices [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000100] Failed to register device
[ 2690.068695] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 2690.068700] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 2690.068702] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 2690.068710] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 2690.068711] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 2690.068712] NVRM rm_get_firmware_version: rm_get_firmware_version: Failed to query gpu build versions, status=0x40
The program I attempted to run on this was
with my nvidia-next libdrm branch https://gitlab.freedesktop.org/YusufKhan-gamedev/drm/-/tree/nvidia-next
Please note that the dmesg output didn't change immediately after I ran that program.