-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
snd_bcm2835: NULL pointer dereference #621
Comments
vchiu_queue_is_empty just does:
so queue is presumably NULL. I think that means vchi_msg_dequeue was called with handle=null. This means instance->vchi_handle[0] is null. Rebuilding kernel with AUDIO_DEBUG_ENABLE defined in sound/arm/bcm2835-vchiq.c will write a load of logging to dmesg. Might be worth trying to catch a log when you hit this failure. |
According to my logs it was not trying to close the stream. There is a small chance that some buffer underrun caused it to try reopening the device. But I would not expect it. One additional note: If this error occured no further usage of audio/video is possible, but will just hang until the Pi is reset. |
Just ran into the next occurance. It looks indeed as if a device close/open happens, not sure why it happens though.
|
I think you have 36ab679 - it's just been squashed into initial driver commit. |
Actually in my tree 3.14 it wasn't included. Hadn't rebased a few weeks, What I think does happen is a race condition on closing the device.
The vchi instance |
Can you try with the latest 3.14 tree? |
Alright. Test is already in progress. If it does not occur again within the next days I'll close the bug. Otherwise I'll update with more details |
Haven't seem the issue with 3.14.6 anymore. So I assume it's fixed. If it happens again I'll reopen. |
Just that I haved said it, I have to reopen :(
This time I can say for sure that it happened on a device close/open cycle, which was triggered by an audio format change. TV show being broadcasted as AC3, commercials as MPEG Audio. |
Realizing that this happened on the audio codec switch in the stream I did a short recording which contains such a switch and am playing it in an endless loop. This allowed me to reproduce the issue within an hour. |
Ignoring the callback if vchi_handle[0] is NULL does avoid the Ooops, but the application which was accessing the device at that point is stalled then. A hard kill works though and the VideoCore stays usable afterwards. |
I think the (updating samples consumed) callback is purely informative and won't cause a lockup if discarded. |
I'm trying to make up a little testtool to trigger it. |
Can you enable AUDIO_DEBUG_ENABLE and AUDIO_VERBOSE_DEBUG_ENABLE in bcm2835.h ? |
Sure. Just building a kernel and starting debug again So far I was unable to create a standalone test-case. Only am able to trigger the test with playing back that video file. And it happens once every few hours... I'll post a new log as soon as I have one. |
This time it was fast :) You can see the log here: |
Do you have and easy way to reproduce this? A while back when investigating a previous panic I did feel we needed a higher level mutex to protect the opens and closes. It turned out not to fix the panic I was investigating and it got stashed. Your panic does sound like a similar issue, so I've just dug it out and compile tested it - it could be worth a try: |
I've just running a test with it. Looping VLC with my test-snippet usually made the crash happen after some minutes yesterday. Right now it runs for almost 2 hours without a crash already. So this looks promising. If it's still running in the evening I would assume this fixes the issue. |
Unfortunately the error just occured again:
Further ideas? |
Yes, pre-built vlc with sample file would be useful. |
@popcornmix providing a proper raspbian build is a bit more painful than I had expected as my builds are requiring a newer glibc than raspbian has. But I'm at it. Hope to be able to provide something later today. |
I do have gcc 4.8 installed which includes newer glibc, so it may be okay on my install. |
Ah great, in this case you can try the following build: Just untar in your home folder, then:
You might need to install some dependencies like libdvbpsi though. If it fails just post the error log, I'll take a look then. If it works, just let it run. After a while you should see that it stalls and dmesg will show the Ooops. |
Sorry - I've been at this for ages, but can't get new glibc and old libav to be installed together. I started with clean wheezy install. I did "apt-get install vlc" hoping to get most of the dependencies. I use this scheme to install gcc 4.8 which solved the glibc issue, but still has missing libs: Now I can't install libs:
I can install the libs from jessie, but then it still fails to link as my libs are newer than expected. |
I've just uploaded a new tarball (same url), in which I included libdvbpsi, libav and some more. I think it should contain all critical libs that were missing on your system. |
Looks like just: I have this installed: |
That plugin shouldn't be needed for playing the sample. Can you paste the full log? |
I've setup a raspbian with gcc 4.8 following the blog post you linked to now. |
Have you overclocked your Pi? Mine is running with 950 MHz, maybe a higher clock increases chances to hit it? |
Another panic last night with more logging. However I think the message is persisting to the next instance of audio service and the old message arrives while the new service is being created. instance->vchi_handle[0] is null when the callback occurs. Doing an early return from the callback if instance->vchi_handle[0] is null will probably help. But the real fix will be to vchiq. A colleague is investigating this. |
This matches my experience. But an early return if vchi_handle[0] is null seemed not to be sufficient. It avoided the panic, but the VLC process was still stalled. Having a fix in vchiq sounds proper :) |
Any update from the vchiq guy yet? |
I have a (vchiq) patch that does a synchronous close, but it didn't fix the panic. |
Interesting. Is it still getting that late callback? |
Still looks the same from my log. The callback occurs before the open has returned, and so the vchiq handle is null. |
Have you had a chance to talk to him these days? |
Yes, spoken to him. He's aware of an issue still present that could explain this behaviour and will fix it, but he's got a more urgent job on at the moment. Hopefully next week. |
Sounds promising. Looking forward to an update :) |
You can try: I'm currently testing it (with my earlier alsa mutex patch) and it's been running for nearly 2 hours. |
Test is running for roughly 4 hours now. Looks promising! |
Test ran until my remote connection dropped at 2am. I think we can assume the patch is working. Did your tests succeed as well? |
Still running now since last night. Looks good to me. |
See: raspberrypi/linux#621 kernel: hid: Reduce default mouse polling interval to 60Hz See: http://www.raspberrypi.org/forums/viewtopic.php?t=75365&p=556534#p540272 firmware: gpioman: Add support for configuring gpio from external blob file firmware: Backport latest VCSM firmware: arm_loader: Add extended command line parameters before filling in device tree See: raspberrypi/linux#627 firmware: bplus: Support config option max_usb_current See: http://www.raspberrypi.org/forums/viewtopic.php?f=63&t=81736&start=50#p577877 firmware: bplus: Invert left/right channels for pwm audio on bplus
See: raspberrypi/linux#621 kernel: hid: Reduce default mouse polling interval to 60Hz See: http://www.raspberrypi.org/forums/viewtopic.php?t=75365&p=556534#p540272 firmware: gpioman: Add support for configuring gpio from external blob file firmware: Backport latest VCSM firmware: arm_loader: Add extended command line parameters before filling in device tree See: raspberrypi/linux#627 firmware: bplus: Support config option max_usb_current See: http://www.raspberrypi.org/forums/viewtopic.php?f=63&t=81736&start=50#p577877 firmware: bplus: Invert left/right channels for pwm audio on bplus
This is in latest firmware update. Please close when you are happy it's resolved. |
In my program I often get this error on the Raspberry Pi 2. I've updated to the latest kernel ( Here is the full log, the
Could this also be a bug in my code or this is definitely a bug in the kernel or ALSA driver? My code works fine on x86 Linuxes like Mint and Ubuntu but on Raspbian it crashes very often with the What can I do about this? |
Can you provide a test program (ideally minimal) that provokes the error? |
You should create a new issue. No evidence that your issue is connected to the original issue. |
Added #1560 |
See: raspberrypi/linux#621 kernel: hid: Reduce default mouse polling interval to 60Hz See: http://www.raspberrypi.org/forums/viewtopic.php?t=75365&p=556534#p540272 firmware: gpioman: Add support for configuring gpio from external blob file firmware: Backport latest VCSM firmware: arm_loader: Add extended command line parameters before filling in device tree See: raspberrypi/linux#627 firmware: bplus: Support config option max_usb_current See: http://www.raspberrypi.org/forums/viewtopic.php?f=63&t=81736&start=50#p577877 firmware: bplus: Invert left/right channels for pwm audio on bplus
[ Upstream commit 2fb2799 ] rmnet can have only two bridge interface. One of them is a link interface and another one is added by the master operation. rmnet interface shouldn't allow adding additional bridge interfaces by mater operation. But, there is no code to deny additional interfaces. So, interface leak occurs. Test commands: ip link add dummy0 type dummy ip link add dummy1 type dummy ip link add dummy2 type dummy ip link add rmnet0 link dummy0 type rmnet mux_id 1 ip link set dummy1 master rmnet0 ip link set dummy2 master rmnet0 ip link del rmnet0 In the above test command, the dummy0 was attached to rmnet as VND mode. Then, dummy1 was attached to rmnet0 as BRIDGE mode. At this point, dummy0 mode is switched from VND to BRIDGE automatically. Then, dummy2 is attached to rmnet as BRIDGE mode. At this point, rmnet0 should deny this operation. But, rmnet0 doesn't deny this. So that below splat occurs when the rmnet0 interface is deleted. Splat looks like: [ 186.684787][ C2] WARNING: CPU: 2 PID: 1009 at net/core/dev.c:8992 rollback_registered_many+0x986/0xcf0 [ 186.684788][ C2] Modules linked in: rmnet dummy openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_x [ 186.684805][ C2] CPU: 2 PID: 1009 Comm: ip Not tainted 5.8.0-rc1+ #621 [ 186.684807][ C2] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 186.684808][ C2] RIP: 0010:rollback_registered_many+0x986/0xcf0 [ 186.684811][ C2] Code: 41 8b 4e cc 45 31 c0 31 d2 4c 89 ee 48 89 df e8 e0 47 ff ff 85 c0 0f 84 cd fc ff ff 5 [ 186.684812][ C2] RSP: 0018:ffff8880cd9472e0 EFLAGS: 00010287 [ 186.684815][ C2] RAX: ffff8880cc56da58 RBX: ffff8880ab21c000 RCX: ffffffff9329d323 [ 186.684816][ C2] RDX: 1ffffffff2be6410 RSI: 0000000000000008 RDI: ffffffff95f32080 [ 186.684818][ C2] RBP: dffffc0000000000 R08: fffffbfff2be6411 R09: fffffbfff2be6411 [ 186.684819][ C2] R10: ffffffff95f32087 R11: 0000000000000001 R12: ffff8880cd947480 [ 186.684820][ C2] R13: ffff8880ab21c0b8 R14: ffff8880cd947400 R15: ffff8880cdf10640 [ 186.684822][ C2] FS: 00007f00843890c0(0000) GS:ffff8880d4e00000(0000) knlGS:0000000000000000 [ 186.684823][ C2] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 186.684825][ C2] CR2: 000055b8ab1077b8 CR3: 00000000ab612006 CR4: 00000000000606e0 [ 186.684826][ C2] Call Trace: [ 186.684827][ C2] ? lockdep_hardirqs_on_prepare+0x379/0x540 [ 186.684829][ C2] ? netif_set_real_num_tx_queues+0x780/0x780 [ 186.684830][ C2] ? rmnet_unregister_real_device+0x56/0x90 [rmnet] [ 186.684831][ C2] ? __kasan_slab_free+0x126/0x150 [ 186.684832][ C2] ? kfree+0xdc/0x320 [ 186.684834][ C2] ? rmnet_unregister_real_device+0x56/0x90 [rmnet] [ 186.684835][ C2] unregister_netdevice_many.part.135+0x13/0x1b0 [ 186.684836][ C2] rtnl_delete_link+0xbc/0x100 [ ... ] [ 238.440071][ T1009] unregister_netdevice: waiting for rmnet0 to become free. Usage count = 1 Fixes: 037f9cd ("net: rmnet: use upper/lower device infrastructure") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2fb2799 ] rmnet can have only two bridge interface. One of them is a link interface and another one is added by the master operation. rmnet interface shouldn't allow adding additional bridge interfaces by mater operation. But, there is no code to deny additional interfaces. So, interface leak occurs. Test commands: ip link add dummy0 type dummy ip link add dummy1 type dummy ip link add dummy2 type dummy ip link add rmnet0 link dummy0 type rmnet mux_id 1 ip link set dummy1 master rmnet0 ip link set dummy2 master rmnet0 ip link del rmnet0 In the above test command, the dummy0 was attached to rmnet as VND mode. Then, dummy1 was attached to rmnet0 as BRIDGE mode. At this point, dummy0 mode is switched from VND to BRIDGE automatically. Then, dummy2 is attached to rmnet as BRIDGE mode. At this point, rmnet0 should deny this operation. But, rmnet0 doesn't deny this. So that below splat occurs when the rmnet0 interface is deleted. Splat looks like: [ 186.684787][ C2] WARNING: CPU: 2 PID: 1009 at net/core/dev.c:8992 rollback_registered_many+0x986/0xcf0 [ 186.684788][ C2] Modules linked in: rmnet dummy openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_x [ 186.684805][ C2] CPU: 2 PID: 1009 Comm: ip Not tainted 5.8.0-rc1+ #621 [ 186.684807][ C2] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 186.684808][ C2] RIP: 0010:rollback_registered_many+0x986/0xcf0 [ 186.684811][ C2] Code: 41 8b 4e cc 45 31 c0 31 d2 4c 89 ee 48 89 df e8 e0 47 ff ff 85 c0 0f 84 cd fc ff ff 5 [ 186.684812][ C2] RSP: 0018:ffff8880cd9472e0 EFLAGS: 00010287 [ 186.684815][ C2] RAX: ffff8880cc56da58 RBX: ffff8880ab21c000 RCX: ffffffff9329d323 [ 186.684816][ C2] RDX: 1ffffffff2be6410 RSI: 0000000000000008 RDI: ffffffff95f32080 [ 186.684818][ C2] RBP: dffffc0000000000 R08: fffffbfff2be6411 R09: fffffbfff2be6411 [ 186.684819][ C2] R10: ffffffff95f32087 R11: 0000000000000001 R12: ffff8880cd947480 [ 186.684820][ C2] R13: ffff8880ab21c0b8 R14: ffff8880cd947400 R15: ffff8880cdf10640 [ 186.684822][ C2] FS: 00007f00843890c0(0000) GS:ffff8880d4e00000(0000) knlGS:0000000000000000 [ 186.684823][ C2] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 186.684825][ C2] CR2: 000055b8ab1077b8 CR3: 00000000ab612006 CR4: 00000000000606e0 [ 186.684826][ C2] Call Trace: [ 186.684827][ C2] ? lockdep_hardirqs_on_prepare+0x379/0x540 [ 186.684829][ C2] ? netif_set_real_num_tx_queues+0x780/0x780 [ 186.684830][ C2] ? rmnet_unregister_real_device+0x56/0x90 [rmnet] [ 186.684831][ C2] ? __kasan_slab_free+0x126/0x150 [ 186.684832][ C2] ? kfree+0xdc/0x320 [ 186.684834][ C2] ? rmnet_unregister_real_device+0x56/0x90 [rmnet] [ 186.684835][ C2] unregister_netdevice_many.part.135+0x13/0x1b0 [ 186.684836][ C2] rtnl_delete_link+0xbc/0x100 [ ... ] [ 238.440071][ T1009] unregister_netdevice: waiting for rmnet0 to become free. Usage count = 1 Fixes: 037f9cd ("net: rmnet: use upper/lower device infrastructure") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…rors commit 085a9f4 upstream. Use down_read_nested() and down_write_nested() when taking the ctrl->reset_lock rw-sem, passing the number of PCIe hotplug controllers in the path to the PCI root bus as lock subclass parameter. This fixes the following false-positive lockdep report when unplugging a Lenovo X1C8 from a Lenovo 2nd gen TB3 dock: pcieport 0000:06:01.0: pciehp: Slot(1): Link Down pcieport 0000:06:01.0: pciehp: Slot(1): Card not present ============================================ WARNING: possible recursive locking detected 5.16.0-rc2+ #621 Not tainted -------------------------------------------- irq/124-pciehp/86 is trying to acquire lock: ffff8e5ac4299ef8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_check_presence+0x23/0x80 but task is already holding lock: ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&ctrl->reset_lock); lock(&ctrl->reset_lock); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by irq/124-pciehp/86: #0: ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180 #1: ffffffffa3b024e8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pciehp_unconfigure_device+0x31/0x110 #2: ffff8e5ac1ee2248 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x1c/0x40 stack backtrace: CPU: 4 PID: 86 Comm: irq/124-pciehp Not tainted 5.16.0-rc2+ #621 Hardware name: LENOVO 20U90SIT19/20U90SIT19, BIOS N2WET30W (1.20 ) 08/26/2021 Call Trace: <TASK> dump_stack_lvl+0x59/0x73 __lock_acquire.cold+0xc5/0x2c6 lock_acquire+0xb5/0x2b0 down_read+0x3e/0x50 pciehp_check_presence+0x23/0x80 pciehp_runtime_resume+0x5c/0xa0 device_for_each_child+0x45/0x70 pcie_port_device_runtime_resume+0x20/0x30 pci_pm_runtime_resume+0xa7/0xc0 __rpm_callback+0x41/0x110 rpm_callback+0x59/0x70 rpm_resume+0x512/0x7b0 __pm_runtime_resume+0x4a/0x90 __device_release_driver+0x28/0x240 device_release_driver+0x26/0x40 pci_stop_bus_device+0x68/0x90 pci_stop_bus_device+0x2c/0x90 pci_stop_and_remove_bus_device+0xe/0x20 pciehp_unconfigure_device+0x6c/0x110 pciehp_disable_slot+0x5b/0xe0 pciehp_handle_presence_or_link_change+0xc3/0x2f0 pciehp_ist+0x179/0x180 This lockdep warning is triggered because with Thunderbolt, hotplug ports are nested. When removing multiple devices in a daisy-chain, each hotplug port's reset_lock may be acquired recursively. It's never the same lock, so the lockdep splat is a false positive. Because locks at the same hierarchy level are never acquired recursively, a per-level lockdep class is sufficient to fix the lockdep warning. The choice to use one lockdep subclass per pcie-hotplug controller in the path to the root-bus was made to conserve class keys because their number is limited and the complexity grows quadratically with number of keys according to Documentation/locking/lockdep-design.rst. Link: https://lore.kernel.org/linux-pci/20190402021933.GA2966@mit.edu/ Link: https://lore.kernel.org/linux-pci/de684a28-9038-8fc6-27ca-3f6f2f6400d7@redhat.com/ Link: https://lore.kernel.org/r/20211217141709.379663-1-hdegoede@redhat.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=208855 Reported-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…rors commit 085a9f4 upstream. Use down_read_nested() and down_write_nested() when taking the ctrl->reset_lock rw-sem, passing the number of PCIe hotplug controllers in the path to the PCI root bus as lock subclass parameter. This fixes the following false-positive lockdep report when unplugging a Lenovo X1C8 from a Lenovo 2nd gen TB3 dock: pcieport 0000:06:01.0: pciehp: Slot(1): Link Down pcieport 0000:06:01.0: pciehp: Slot(1): Card not present ============================================ WARNING: possible recursive locking detected 5.16.0-rc2+ #621 Not tainted -------------------------------------------- irq/124-pciehp/86 is trying to acquire lock: ffff8e5ac4299ef8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_check_presence+0x23/0x80 but task is already holding lock: ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&ctrl->reset_lock); lock(&ctrl->reset_lock); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by irq/124-pciehp/86: #0: ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180 #1: ffffffffa3b024e8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pciehp_unconfigure_device+0x31/0x110 #2: ffff8e5ac1ee2248 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x1c/0x40 stack backtrace: CPU: 4 PID: 86 Comm: irq/124-pciehp Not tainted 5.16.0-rc2+ #621 Hardware name: LENOVO 20U90SIT19/20U90SIT19, BIOS N2WET30W (1.20 ) 08/26/2021 Call Trace: <TASK> dump_stack_lvl+0x59/0x73 __lock_acquire.cold+0xc5/0x2c6 lock_acquire+0xb5/0x2b0 down_read+0x3e/0x50 pciehp_check_presence+0x23/0x80 pciehp_runtime_resume+0x5c/0xa0 device_for_each_child+0x45/0x70 pcie_port_device_runtime_resume+0x20/0x30 pci_pm_runtime_resume+0xa7/0xc0 __rpm_callback+0x41/0x110 rpm_callback+0x59/0x70 rpm_resume+0x512/0x7b0 __pm_runtime_resume+0x4a/0x90 __device_release_driver+0x28/0x240 device_release_driver+0x26/0x40 pci_stop_bus_device+0x68/0x90 pci_stop_bus_device+0x2c/0x90 pci_stop_and_remove_bus_device+0xe/0x20 pciehp_unconfigure_device+0x6c/0x110 pciehp_disable_slot+0x5b/0xe0 pciehp_handle_presence_or_link_change+0xc3/0x2f0 pciehp_ist+0x179/0x180 This lockdep warning is triggered because with Thunderbolt, hotplug ports are nested. When removing multiple devices in a daisy-chain, each hotplug port's reset_lock may be acquired recursively. It's never the same lock, so the lockdep splat is a false positive. Because locks at the same hierarchy level are never acquired recursively, a per-level lockdep class is sufficient to fix the lockdep warning. The choice to use one lockdep subclass per pcie-hotplug controller in the path to the root-bus was made to conserve class keys because their number is limited and the complexity grows quadratically with number of keys according to Documentation/locking/lockdep-design.rst. Link: https://lore.kernel.org/linux-pci/20190402021933.GA2966@mit.edu/ Link: https://lore.kernel.org/linux-pci/de684a28-9038-8fc6-27ca-3f6f2f6400d7@redhat.com/ Link: https://lore.kernel.org/r/20211217141709.379663-1-hdegoede@redhat.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=208855 Reported-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…rors commit 085a9f4 upstream. Use down_read_nested() and down_write_nested() when taking the ctrl->reset_lock rw-sem, passing the number of PCIe hotplug controllers in the path to the PCI root bus as lock subclass parameter. This fixes the following false-positive lockdep report when unplugging a Lenovo X1C8 from a Lenovo 2nd gen TB3 dock: pcieport 0000:06:01.0: pciehp: Slot(1): Link Down pcieport 0000:06:01.0: pciehp: Slot(1): Card not present ============================================ WARNING: possible recursive locking detected 5.16.0-rc2+ #621 Not tainted -------------------------------------------- irq/124-pciehp/86 is trying to acquire lock: ffff8e5ac4299ef8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_check_presence+0x23/0x80 but task is already holding lock: ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&ctrl->reset_lock); lock(&ctrl->reset_lock); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by irq/124-pciehp/86: #0: ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180 #1: ffffffffa3b024e8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pciehp_unconfigure_device+0x31/0x110 #2: ffff8e5ac1ee2248 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x1c/0x40 stack backtrace: CPU: 4 PID: 86 Comm: irq/124-pciehp Not tainted 5.16.0-rc2+ #621 Hardware name: LENOVO 20U90SIT19/20U90SIT19, BIOS N2WET30W (1.20 ) 08/26/2021 Call Trace: <TASK> dump_stack_lvl+0x59/0x73 __lock_acquire.cold+0xc5/0x2c6 lock_acquire+0xb5/0x2b0 down_read+0x3e/0x50 pciehp_check_presence+0x23/0x80 pciehp_runtime_resume+0x5c/0xa0 device_for_each_child+0x45/0x70 pcie_port_device_runtime_resume+0x20/0x30 pci_pm_runtime_resume+0xa7/0xc0 __rpm_callback+0x41/0x110 rpm_callback+0x59/0x70 rpm_resume+0x512/0x7b0 __pm_runtime_resume+0x4a/0x90 __device_release_driver+0x28/0x240 device_release_driver+0x26/0x40 pci_stop_bus_device+0x68/0x90 pci_stop_bus_device+0x2c/0x90 pci_stop_and_remove_bus_device+0xe/0x20 pciehp_unconfigure_device+0x6c/0x110 pciehp_disable_slot+0x5b/0xe0 pciehp_handle_presence_or_link_change+0xc3/0x2f0 pciehp_ist+0x179/0x180 This lockdep warning is triggered because with Thunderbolt, hotplug ports are nested. When removing multiple devices in a daisy-chain, each hotplug port's reset_lock may be acquired recursively. It's never the same lock, so the lockdep splat is a false positive. Because locks at the same hierarchy level are never acquired recursively, a per-level lockdep class is sufficient to fix the lockdep warning. The choice to use one lockdep subclass per pcie-hotplug controller in the path to the root-bus was made to conserve class keys because their number is limited and the complexity grows quadratically with number of keys according to Documentation/locking/lockdep-design.rst. Link: https://lore.kernel.org/linux-pci/20190402021933.GA2966@mit.edu/ Link: https://lore.kernel.org/linux-pci/de684a28-9038-8fc6-27ca-3f6f2f6400d7@redhat.com/ Link: https://lore.kernel.org/r/20211217141709.379663-1-hdegoede@redhat.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=208855 Reported-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…rors commit 085a9f4 upstream. Use down_read_nested() and down_write_nested() when taking the ctrl->reset_lock rw-sem, passing the number of PCIe hotplug controllers in the path to the PCI root bus as lock subclass parameter. This fixes the following false-positive lockdep report when unplugging a Lenovo X1C8 from a Lenovo 2nd gen TB3 dock: pcieport 0000:06:01.0: pciehp: Slot(1): Link Down pcieport 0000:06:01.0: pciehp: Slot(1): Card not present ============================================ WARNING: possible recursive locking detected 5.16.0-rc2+ raspberrypi#621 Not tainted -------------------------------------------- irq/124-pciehp/86 is trying to acquire lock: ffff8e5ac4299ef8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_check_presence+0x23/0x80 but task is already holding lock: ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&ctrl->reset_lock); lock(&ctrl->reset_lock); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by irq/124-pciehp/86: #0: ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180 raspberrypi#1: ffffffffa3b024e8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pciehp_unconfigure_device+0x31/0x110 raspberrypi#2: ffff8e5ac1ee2248 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x1c/0x40 stack backtrace: CPU: 4 PID: 86 Comm: irq/124-pciehp Not tainted 5.16.0-rc2+ raspberrypi#621 Hardware name: LENOVO 20U90SIT19/20U90SIT19, BIOS N2WET30W (1.20 ) 08/26/2021 Call Trace: <TASK> dump_stack_lvl+0x59/0x73 __lock_acquire.cold+0xc5/0x2c6 lock_acquire+0xb5/0x2b0 down_read+0x3e/0x50 pciehp_check_presence+0x23/0x80 pciehp_runtime_resume+0x5c/0xa0 device_for_each_child+0x45/0x70 pcie_port_device_runtime_resume+0x20/0x30 pci_pm_runtime_resume+0xa7/0xc0 __rpm_callback+0x41/0x110 rpm_callback+0x59/0x70 rpm_resume+0x512/0x7b0 __pm_runtime_resume+0x4a/0x90 __device_release_driver+0x28/0x240 device_release_driver+0x26/0x40 pci_stop_bus_device+0x68/0x90 pci_stop_bus_device+0x2c/0x90 pci_stop_and_remove_bus_device+0xe/0x20 pciehp_unconfigure_device+0x6c/0x110 pciehp_disable_slot+0x5b/0xe0 pciehp_handle_presence_or_link_change+0xc3/0x2f0 pciehp_ist+0x179/0x180 This lockdep warning is triggered because with Thunderbolt, hotplug ports are nested. When removing multiple devices in a daisy-chain, each hotplug port's reset_lock may be acquired recursively. It's never the same lock, so the lockdep splat is a false positive. Because locks at the same hierarchy level are never acquired recursively, a per-level lockdep class is sufficient to fix the lockdep warning. The choice to use one lockdep subclass per pcie-hotplug controller in the path to the root-bus was made to conserve class keys because their number is limited and the complexity grows quadratically with number of keys according to Documentation/locking/lockdep-design.rst. Link: https://lore.kernel.org/linux-pci/20190402021933.GA2966@mit.edu/ Link: https://lore.kernel.org/linux-pci/de684a28-9038-8fc6-27ca-3f6f2f6400d7@redhat.com/ Link: https://lore.kernel.org/r/20211217141709.379663-1-hdegoede@redhat.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=208855 Reported-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In long runs with VLC playing back TV we sometimes run into a freeze of the playback which seems to be caused by a NULL pointer dereference:
Any thoughts what might cause this?
It happens with all kernel trees we've been testing with so far (3.6.11, 3.10 and 3.14).
The text was updated successfully, but these errors were encountered: