Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

snd_bcm2835: NULL pointer dereference #621

Closed
julianscheel opened this issue Jun 17, 2014 · 54 comments
Closed

snd_bcm2835: NULL pointer dereference #621

julianscheel opened this issue Jun 17, 2014 · 54 comments

Comments

@julianscheel
Copy link
Contributor

In long runs with VLC playing back TV we sometimes run into a freeze of the playback which seems to be caused by a NULL pointer dereference:

[335302.110579] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[335302.110766] pgd = c0004000
[335302.110829] [00000008] *pgd=00000000
[335302.110931] Internal error: Oops: 17 [#1] PREEMPT ARM
[335302.111031] Modules linked in: fuse snd_soc_pcm1796 snd_soc_bcm2708_i2s snd_bcm2835 regmap_mmio bcm2708_wdog snd_soc_core snd_compress regmap_i2c snd_pcm_dmaengine regmap_spi leds_gpio led_class snd_pcm snd_timer spi_bcm2708 vfat snd i2c_bcm2708 fat
[335302.111604] CPU: 0 PID: 33 Comm: VCHIQ-0 Not tainted 3.14.0 #1
[335302.111695] task: d356ec00 ti: d35c6000 task.ti: d35c6000
[335302.111798] PC is at vchiu_queue_is_empty+0xc/0x24
[335302.111880] LR is at vchi_msg_dequeue+0x3c/0xa8
[335302.111956] pc : [<c024747c>]    lr : [<c0246e20>]    psr: 60000013
[335302.111956] sp : d35c7e50  ip : d35c7e60  fp : d35c7e5c
[335302.112108] r10: 0000001f  r9 : c051eba4  r8 : d35c7e94
[335302.112186] r7 : 00000014  r6 : d35c7e90  r5 : 00000000  r4 : 00000004
[335302.112279] r3 : d35c7e90  r2 : 00000014  r1 : d35c7e94  r0 : 00000004
[335302.112372] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[335302.112476] Control: 00c5387d  Table: 13774008  DAC: 00000017
[335302.112559] Process VCHIQ-0 (pid: 33, stack limit = 0xd35c61b8)
[335302.112646] Stack: (0xd35c7e50 to 0xd35c8000)
[335302.112720] 7e40:                                     d35c7e84 d35c7e60 c0246e20 c024747c
[335302.112836] 7e60: d35c7e90 d24b6180 00000000 00000000 00000000 d36e1800 d35c7ec4 d35c7e88
[335302.112953] 7e80: bf09a7bc c0246df0 00000000 ffffffff d255a684 d255a690 d35c7ec4 d35c7ea8
[335302.113068] 7ea0: c024753c c004af10 d255a680 d255a680 00000002 d400a928 d35c7ee4 d35c7ec8
[335302.113184] 7ec0: c0246c10 bf09a778 c0246b4c 4501f010 d400a928 00000020 d35c7f64 d35c7ee8
[335302.113300] 7ee0: c0240afc c0246b58 c004259c c0042428 d35c7f34 d35c7f00 c00496d0 c0042594
[335302.113417] 7f00: 48c20646 00000000 c0cac058 00000014 d36d1588 d4004020 d4004194 c07a8948
[335302.113532] 7f20: 00000010 d4004194 d35c7f4c 00000000 d35bcf60 c051eba4 c023fce8 00000000
[335302.113648] 7f40: d35bcf60 c051eba4 c023fce8 00000000 00000000 00000000 d35c7fac d35c7f68
[335302.113764] 7f60: c003a8bc c023fcf4 d0c0c050 00000000 4a40c044 c051eba4 00000000 d35c7f7c
[335302.113879] 7f80: d35c7f7c 00000000 d35c7f88 d35c7f88 d35bcf60 c003a7f8 00000000 00000000
[335302.113993] 7fa0: 00000000 d35c7fb0 c000e738 c003a804 00000000 00000000 00000000 00000000
[335302.114109] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[335302.114224] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 050240c0 e2c6c0d0
[335302.114329] Backtrace: 
[335302.114408] [<c0247470>] (vchiu_queue_is_empty) from [<c0246e20>] (vchi_msg_dequeue+0x3c/0xa8)
[335302.114550] [<c0246de4>] (vchi_msg_dequeue) from [<bf09a7bc>] (audio_vchi_callback+0x50/0xb8 [snd_bcm2835])
[335302.114680]  r8:d36e1800 r7:00000000 r6:00000000 r5:00000000 r4:d24b6180 r3:d35c7e90
[335302.114871] [<bf09a76c>] (audio_vchi_callback [snd_bcm2835]) from [<c0246c10>] (shim_callback+0xc4/0x114)
[335302.114999]  r6:d400a928 r5:00000002 r4:d255a680
[335302.115123] [<c0246b4c>] (shim_callback) from [<c0240afc>] (slot_handler_func+0xe14/0x174c)
[335302.115236]  r7:00000020 r6:d400a928 r5:4501f010 r4:c0246b4c
[335302.115391] [<c023fce8>] (slot_handler_func) from [<c003a8bc>] (kthread+0xc4/0xe0)
[335302.115495]  r10:00000000 r9:00000000 r8:00000000 r7:c023fce8 r6:c051eba4 r5:d35bcf60
[335302.115667]  r4:00000000
[335302.115744] [<c003a7f8>] (kthread) from [<c000e738>] (ret_from_fork+0x14/0x3c)
[335302.115845]  r7:00000000 r6:00000000 r5:c003a7f8 r4:d35bcf60
[335302.115985] Code: c0461af0 e1a0c00d e92dd800 e24cb004 (e5902004) 
[335302.116761] ---[ end trace 2fa5c3f368110486 ]---

Any thoughts what might cause this?
It happens with all kernel trees we've been testing with so far (3.6.11, 3.10 and 3.14).

@popcornmix
Copy link
Collaborator

vchiu_queue_is_empty just does:

return queue->read == queue->write;

so queue is presumably NULL. I think that means vchi_msg_dequeue was called with handle=null.

This means instance->vchi_handle[0] is null.
I guess most likely vc_vchi_audio_deinit has been called. Is it possible VLC was in the process of closing the audio stream?

Rebuilding kernel with AUDIO_DEBUG_ENABLE defined in sound/arm/bcm2835-vchiq.c will write a load of logging to dmesg. Might be worth trying to catch a log when you hit this failure.

@julianscheel
Copy link
Contributor Author

According to my logs it was not trying to close the stream. There is a small chance that some buffer underrun caused it to try reopening the device. But I would not expect it.
I'm just building a kernel with AUDIO_DEBUG_ENABLE set. Might take a while before I can post debug logs the bug happens once in a few days only.

One additional note: If this error occured no further usage of audio/video is possible, but will just hang until the Pi is reset.

@julianscheel
Copy link
Contributor Author

Just ran into the next occurance. It looks indeed as if a device close/open happens, not sure why it happens though.
Still it probably shouldn't oops, even if it's reopening the device. See the following dmesg log. It's with a 3.14 kernel. I just saw that on 3.10 series is a commit in the alsa drivers which somehow feels related: 36ab679
Do you think that one might be what happens here?

[ 1119.393111] bcm2835_audio_write_worker:779  .. IN
[ 1119.393122] bcm2835_audio_write_worker:781  Writing 6144 bytes from d255a630
[ 1119.393226] bcm2835_audio_write_worker:848  .. OUT
[ 1119.393239] my_wq_function:111  .. OUT 0
[ 1119.401749] bcm2835_audio_write:160  .. IN
[ 1119.401792] bcm2835_audio_write:176  .. OUT 0
[ 1119.401842] my_wq_function:94  .. IN d245c0c0:2
[ 1119.401854] bcm2835_audio_write_worker:779  .. IN
[ 1119.401884] bcm2835_audio_write_worker:781  Writing 6144 bytes from d255be30
[ 1119.401979] bcm2835_audio_write_worker:848  .. OUT
[ 1119.401990] my_wq_function:111  .. OUT 0
[ 1119.402033] audio_vchi_callback:206  .. IN instance=d24cec80, param=d24cec80, reason=1, handle=  (null)
[ 1119.402053] audio_vchi_callback:224  .. instance=d24cec80, m.type=VC_AUDIO_MSG_TYPE_COMPLETE, complete=1920
[ 1119.402065] audio_vchi_callback:235  .. OUT
[ 1119.412414] audio_vchi_callback:206  .. IN instance=d24cec80, param=d24cec80, reason=1, handle=  (null)
[ 1119.412562] audio_vchi_callback:224  .. instance=d24cec80, m.type=VC_AUDIO_MSG_TYPE_COMPLETE, complete=2016
[ 1119.412585] audio_vchi_callback:235  .. OUT
[ 1119.421974] audio_vchi_callback:206  .. IN instance=d24cec80, param=d24cec80, reason=1, handle=  (null)
[ 1119.422018] audio_vchi_callback:224  .. instance=d24cec80, m.type=VC_AUDIO_MSG_TYPE_COMPLETE, complete=1824
[ 1119.422032] audio_vchi_callback:235  .. OUT
[ 1119.431913] audio_vchi_callback:206  .. IN instance=d24cec80, param=d24cec80, reason=1, handle=  (null)
[ 1119.431946] audio_vchi_callback:224  .. instance=d24cec80, m.type=VC_AUDIO_MSG_TYPE_COMPLETE, complete=1920
[ 1119.431958] audio_vchi_callback:235  .. OUT
[ 1119.441976] audio_vchi_callback:206  .. IN instance=d24cec80, param=d24cec80, reason=1, handle=  (null)
[ 1119.442007] audio_vchi_callback:224  .. instance=d24cec80, m.type=VC_AUDIO_MSG_TYPE_COMPLETE, complete=1920
[ 1119.442019] audio_vchi_callback:235  .. OUT
[ 1119.452325] audio_vchi_callback:206  .. IN instance=d24cec80, param=d24cec80, reason=1, handle=  (null)
[ 1119.452396] audio_vchi_callback:224  .. instance=d24cec80, m.type=VC_AUDIO_MSG_TYPE_COMPLETE, complete=2000
[ 1119.452409] audio_vchi_callback:235  .. OUT
[ 1119.462084] audio_vchi_callback:206  .. IN instance=d24cec80, param=d24cec80, reason=1, handle=  (null)
[ 1119.462127] audio_vchi_callback:224  .. instance=d24cec80, m.type=VC_AUDIO_MSG_TYPE_COMPLETE, complete=1840
[ 1119.462140] audio_vchi_callback:235  .. OUT
[ 1119.471909] audio_vchi_callback:206  .. IN instance=d24cec80, param=d24cec80, reason=1, handle=  (null)
[ 1119.471939] audio_vchi_callback:224  .. instance=d24cec80, m.type=VC_AUDIO_MSG_TYPE_COMPLETE, complete=1920
[ 1119.471971] audio_vchi_callback:235  .. OUT
[ 1119.477288] bcm2835_audio_write:160  .. IN
[ 1119.477343] bcm2835_audio_write:176  .. OUT 0
[ 1119.477453] my_wq_function:94  .. IN d245c0c0:2
[ 1119.477467] bcm2835_audio_write_worker:779  .. IN
[ 1119.477478] bcm2835_audio_write_worker:781  Writing 6144 bytes from d255d630
[ 1119.477574] bcm2835_audio_write_worker:848  .. OUT
[ 1119.477586] my_wq_function:111  .. OUT 0
[ 1119.482617] audio_vchi_callback:206  .. IN instance=d24cec80, param=d24cec80, reason=1, handle=  (null)
[ 1119.482647] audio_vchi_callback:224  .. instance=d24cec80, m.type=VC_AUDIO_MSG_TYPE_COMPLETE, complete=2032
[ 1119.482658] audio_vchi_callback:235  .. OUT
[ 1119.485726] bcm2835_audio_write:160  .. IN
[ 1119.485765] bcm2835_audio_write:176  .. OUT 0
[ 1119.485775] bcm2835_audio_write:160  .. IN
[ 1119.485785] bcm2835_audio_write:176  .. OUT 0
[ 1119.485819] my_wq_function:94  .. IN d245c0c0:2
[ 1119.485830] bcm2835_audio_write_worker:779  .. IN
[ 1119.485840] bcm2835_audio_write_worker:781  Writing 4560 bytes from d255ee30
[ 1119.486061] bcm2835_audio_write_worker:848  .. OUT
[ 1119.486076] my_wq_function:111  .. OUT 0
[ 1119.486090] my_wq_function:94  .. IN d245c0c0:2
[ 1119.486098] bcm2835_audio_write_worker:779  .. IN
[ 1119.486108] bcm2835_audio_write_worker:781  Writing 1584 bytes from d2550000
[ 1119.486138] bcm2835_audio_write_worker:848  .. OUT
[ 1119.486147] my_wq_function:111  .. OUT 0
[ 1119.490952] bcm2835_audio_write:160  .. IN
[ 1119.490993] bcm2835_audio_write:176  .. OUT 0
[ 1119.491029] my_wq_function:94  .. IN d245c0c0:2
[ 1119.491041] bcm2835_audio_write_worker:779  .. IN
[ 1119.491051] bcm2835_audio_write_worker:781  Writing 6144 bytes from d2550630
[ 1119.491144] bcm2835_audio_write_worker:848  .. OUT
[ 1119.491156] my_wq_function:111  .. OUT 0
[ 1119.491913] audio_vchi_callback:206  .. IN instance=d24cec80, param=d24cec80, reason=1, handle=  (null)
[ 1119.491939] audio_vchi_callback:224  .. instance=d24cec80, m.type=VC_AUDIO_MSG_TYPE_COMPLETE, complete=1808
[ 1119.491950] audio_vchi_callback:235  .. OUT
[ 1119.501968] audio_vchi_callback:206  .. IN instance=d24cec80, param=d24cec80, reason=1, handle=  (null)
[ 1119.502009] audio_vchi_callback:224  .. instance=d24cec80, m.type=VC_AUDIO_MSG_TYPE_COMPLETE, complete=1920
[ 1119.502023] audio_vchi_callback:235  .. OUT
[ 1119.512516] audio_vchi_callback:206  .. IN instance=d24cec80, param=d24cec80, reason=1, handle=  (null)
[ 1119.512551] audio_vchi_callback:224  .. instance=d24cec80, m.type=VC_AUDIO_MSG_TYPE_COMPLETE, complete=2032
[ 1119.512592] audio_vchi_callback:235  .. OUT
[ 1119.519016] bcm2835_audio_write:160  .. IN
[ 1119.519060] bcm2835_audio_write:176  .. OUT 0
[ 1119.519098] my_wq_function:94  .. IN d245c0c0:2
[ 1119.519110] bcm2835_audio_write_worker:779  .. IN
[ 1119.519120] bcm2835_audio_write_worker:781  Writing 6144 bytes from d2551e30
[ 1119.519226] bcm2835_audio_write_worker:848  .. OUT
[ 1119.519238] my_wq_function:111  .. OUT 0
[ 1119.521925] audio_vchi_callback:206  .. IN instance=d24cec80, param=d24cec80, reason=1, handle=  (null)
[ 1119.521956] audio_vchi_callback:224  .. instance=d24cec80, m.type=VC_AUDIO_MSG_TYPE_COMPLETE, complete=1808
[ 1119.521968] audio_vchi_callback:235  .. OUT
[ 1119.531916] audio_vchi_callback:206  .. IN instance=d24cec80, param=d24cec80, reason=1, handle=  (null)
[ 1119.531951] audio_vchi_callback:224  .. instance=d24cec80, m.type=VC_AUDIO_MSG_TYPE_COMPLETE, complete=1920
[ 1119.531962] audio_vchi_callback:235  .. OUT
[ 1119.542709] audio_vchi_callback:206  .. IN instance=d24cec80, param=d24cec80, reason=1, handle=  (null)
[ 1119.542748] audio_vchi_callback:224  .. instance=d24cec80, m.type=VC_AUDIO_MSG_TYPE_COMPLETE, complete=2064
[ 1119.542763] audio_vchi_callback:235  .. OUT
[ 1119.551915] audio_vchi_callback:206  .. IN instance=d24cec80, param=d24cec80, reason=1, handle=  (null)
[ 1119.552061] audio_vchi_callback:224  .. instance=d24cec80, m.type=VC_AUDIO_MSG_TYPE_COMPLETE, complete=1776
[ 1119.552102] audio_vchi_callback:235  .. OUT
[ 1119.559923] bcm2835_audio_stop:138  .. IN
[ 1119.559967] bcm2835_audio_stop:152  .. OUT 0
[ 1119.560116] my_wq_function:94  .. IN d245c0c0:0
[ 1119.560131] bcm2835_audio_stop_worker:674  .. IN
[ 1119.560217] bcm2835_audio_stop_worker:704  .. OUT
[ 1119.560232] my_wq_function:111  .. OUT 0
[ 1119.560477] bcm2835_audio_close:714  .. IN
[ 1119.565478] audio_vchi_callback:206  .. IN instance=d24cec80, param=d24cec80, reason=1, handle=  (null)
[ 1119.565503] audio_vchi_callback:217  .. instance=d24cec80, m.type=VC_AUDIO_MSG_TYPE_RESULT, success=0
[ 1119.565523] audio_vchi_callback:235  .. OUT
[ 1119.565554] vc_vchi_audio_deinit:307  .. IN
[ 1119.565565] vc_vchi_audio_deinit:315  .. about to lock (1)
[ 1119.565575] vc_vchi_audio_deinit:325  .. 0:closing d252ee40
[ 1119.565777] audio_vchi_callback:206  .. IN instance=d24cec80, param=d24cec80, reason=8, handle=  (null)
[ 1119.565818] vc_vchi_audio_deinit:340  .. OUT
[ 1119.565828] bcm2835_audio_close:767  .. OUT
[ 1119.571645] bcm2835_audio_open:408  .. IN
[ 1119.571725] bcm2835_audio_open_connection:351  .. IN
[ 1119.571736] bcm2835_audio_open_connection:353 bcm2835_audio_open_connection: start
[ 1119.571758] vc_vchi_audio_init:247 vc_vchi_audio_init: start<6>[ 1119.571858] audio_vchi_callback:206  .. IN instance=d252ee40, param=d252ee40, reason=1, handle=  (null)
[ 1119.571888] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[ 1119.572035] pgd = c0004000
[ 1119.572091] [00000008] *pgd=00000000
[ 1119.572318] Internal error: Oops: 17 [#1] PREEMPT ARM
[ 1119.572404] Modules linked in: snd_soc_pcm1796 snd_soc_bcm2708_i2s snd_bcm2835 regmap_mmio bcm2708_wdog snd_soc_core snd_compress regmap_i2c snd_pcm_dmaengine regmap_spi snd_pcm leds_gpio led_class snd_timer spi_bcm2708 i2c_bcm2708 snd vfat fat
[ 1119.572943] CPU: 0 PID: 33 Comm: VCHIQ-0 Not tainted 3.14.0 #1
[ 1119.573033] task: d356ec00 ti: d35c6000 task.ti: d35c6000
[ 1119.573135] PC is at vchiu_queue_is_empty+0xc/0x24
[ 1119.573216] LR is at vchi_msg_dequeue+0x3c/0xa8
[ 1119.573292] pc : [<c024747c>]    lr : [<c0246e20>]    psr: 68000013
[ 1119.573292] sp : d35c7e48  ip : d35c7e58  fp : d35c7e54
[ 1119.573442] r10: 0000001f  r9 : c051eba4  r8 : d35c7e94
[ 1119.573520] r7 : 00000014  r6 : d35c7e90  r5 : 00000000  r4 : 00000004
[ 1119.573612] r3 : d35c7e90  r2 : 00000014  r1 : d35c7e94  r0 : 00000004
[ 1119.573705] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[ 1119.573807] Control: 00c5387d  Table: 124f0008  DAC: 00000017
[ 1119.573890] Process VCHIQ-0 (pid: 33, stack limit = 0xd35c61b8)
[ 1119.573978] Stack: (0xd35c7e48 to 0xd35c8000)
[ 1119.574053] 7e40:                   d35c7e7c d35c7e58 c0246e20 c024747c d35c7e90 d252ee40
[ 1119.574170] 7e60: 00000000 00000000 00000000 d34fa800 d35c7ec4 d35c7e80 bf09a9e8 c0246df0
[ 1119.574286] 7e80: 00000000 00000001 00000000 d248c800 d252e144 d252e150 d35c7ec4 d35c7ea8
[ 1119.574401] 7ea0: c024753c c004af10 d252e140 00000002 d4008a08 00000000 d35c7ee4 d35c7ec8
[ 1119.574516] 7ec0: c0246c10 bf09a980 c0246b4c 4501f010 d4008a08 00000020 d35c7f64 d35c7ee8
[ 1119.574633] 7ee0: c0240afc c0246b58 c004259c c0042428 d35c7f34 d35c7f00 c00496d0 c0042594
[ 1119.574748] 7f00: c4c0d0d0 00000000 04c4c2c6 00000014 d36d0f88 d4004020 d4004194 04359a28
[ 1119.574864] 7f20: 00000010 d4004194 d35c7f4c 00000000 d35bcf60 c051eba4 c023fce8 00000000
[ 1119.574978] 7f40: d35bcf60 c051eba4 c023fce8 00000000 00000000 00000000 d35c7fac d35c7f68
[ 1119.575093] 7f60: c003a8bc c023fcf4 40c044c0 00000000 c0c0c0c2 c051eba4 00000000 d35c7f7c
[ 1119.575210] 7f80: d35c7f7c 00000000 d35c7f88 d35c7f88 d35bcf60 c003a7f8 00000000 00000000
[ 1119.575325] 7fa0: 00000000 d35c7fb0 c000e738 c003a804 00000000 00000000 00000000 00000000
[ 1119.575440] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1119.575553] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 c0c0c0c4 c084c4c2
[ 1119.575658] Backtrace: 
[ 1119.575736] [<c0247470>] (vchiu_queue_is_empty) from [<c0246e20>] (vchi_msg_dequeue+0x3c/0xa8)
[ 1119.575876] [<c0246de4>] (vchi_msg_dequeue) from [<bf09a9e8>] (audio_vchi_callback+0x74/0x170 [snd_bcm2835])
[ 1119.576005]  r8:d34fa800 r7:00000000 r6:00000000 r5:00000000 r4:d252ee40 r3:d35c7e90
[ 1119.576199] [<bf09a974>] (audio_vchi_callback [snd_bcm2835]) from [<c0246c10>] (shim_callback+0xc4/0x114)
[ 1119.576325]  r7:00000000 r6:d4008a08 r5:00000002 r4:d252e140
[ 1119.576473] [<c0246b4c>] (shim_callback) from [<c0240afc>] (slot_handler_func+0xe14/0x174c)
[ 1119.576587]  r7:00000020 r6:d4008a08 r5:4501f010 r4:c0246b4c
[ 1119.576739] [<c023fce8>] (slot_handler_func) from [<c003a8bc>] (kthread+0xc4/0xe0)
[ 1119.576844]  r10:00000000 r9:00000000 r8:00000000 r7:c023fce8 r6:c051eba4 r5:d35bcf60
[ 1119.577018]  r4:00000000
[ 1119.577093] [<c003a7f8>] (kthread) from [<c000e738>] (ret_from_fork+0x14/0x3c)
[ 1119.577193]  r7:00000000 r6:00000000 r5:c003a7f8 r4:d35bcf60
[ 1119.577334] Code: c0461af0 e1a0c00d e92dd800 e24cb004 (e5902004) 
[ 1119.578010] ---[ end trace 74896e46ee1bae7e ]---
[ 1119.578923] bcm2835_audio_open_connection:395  success !
[ 1119.579115] bcm2835_audio_open_connection:397  .. OUT
[ 1119.579267] bcm2835_audio_open:447  .. OUT
[ 1119.593807] bcm2835_audio_set_params:552  .. IN
[ 1119.594075] bcm2835_audio_set_params:556  Setting ALSA channels(2), samplerate(48000), bits-per-sample(16)
[ 1119.594278] bcm2835_audio_set_ctls_chan:458  .. IN
[ 1119.594435] bcm2835_audio_set_ctls_chan:461  Setting ALSA dest(0), volume(0)

@popcornmix
Copy link
Collaborator

I think you have 36ab679 - it's just been squashed into initial driver commit.

@julianscheel
Copy link
Contributor Author

Actually in my tree 3.14 it wasn't included. Hadn't rebased a few weeks,
Anyway after some closer investigation I doubt it's related.

What I think does happen is a race condition on closing the device.
In the log you see

[ 1119.565575] vc_vchi_audio_deinit:325  .. 0:closing d252ee40
[ 1119.565777] audio_vchi_callback:206  .. IN instance=d24cec80, param=d24cec80, reason=8, handle=  (null)
[ 1119.565818] vc_vchi_audio_deinit:340  .. OUT
[ 1119.565828] bcm2835_audio_close:767  .. OUT
[ 1119.571645] bcm2835_audio_open:408  .. IN
[ 1119.571725] bcm2835_audio_open_connection:351  .. IN
[ 1119.571736] bcm2835_audio_open_connection:353 bcm2835_audio_open_connection: start
[ 1119.571758] vc_vchi_audio_init:247 vc_vchi_audio_init: start<6>[ 1119.571858] audio_vchi_callback:206  .. IN instance=d252ee40, param=d252ee40, reason=1, handle=  (null)
[ 1119.571888] Unable to handle kernel NULL pointer dereference at virtual address 00000008

The vchi instance d252ee40 is closed by vc_vchi_audio_deinit right before the callback for this instance is arriving.
Now I wonder what's the best approach to handle this. Just checking for instance->vchi[0] being not NULL does not feel right as probably the whole instance should be invalid at that point in time. What do you think?

@popcornmix
Copy link
Collaborator

Can you try with the latest 3.14 tree?
The patch to fix a null pointer access does directly affect when the vc_vchi_audio_init call occurs, which is where you get your null pointer access from.

@julianscheel
Copy link
Contributor Author

Alright. Test is already in progress. If it does not occur again within the next days I'll close the bug. Otherwise I'll update with more details

@julianscheel
Copy link
Contributor Author

Haven't seem the issue with 3.14.6 anymore. So I assume it's fixed. If it happens again I'll reopen.

@julianscheel
Copy link
Contributor Author

Just that I haved said it, I have to reopen :(
Another trace unfortunately without debug log:

[56184.240574] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[56184.240776] pgd = c0004000
[56184.240903] [00000008] *pgd=00000000
[56184.241006] Internal error: Oops: 17 [#1] PREEMPT ARM
[56184.241117] Modules linked in: snd_soc_pcm1796 snd_soc_bcm2708_i2s snd_bcm2835 regmap_mmio bcm2708_wdog snd_soc_core snd_compress regmap_i2c snd_pcm_dmaengine regmap_spi leds_gpio led_class snd_pcm snd_timer spi_bcm2708 snd i2c_bcm2708 vfat fat
[56184.241737] CPU: 0 PID: 33 Comm: VCHIQ-0 Not tainted 3.14.6 #1
[56184.241849] task: d3598c00 ti: d35ee000 task.ti: d35ee000
[56184.241955] PC is at vchiu_queue_is_empty+0xc/0x24
[56184.242061] LR is at vchi_msg_dequeue+0x3c/0xa8
[56184.242149] pc : [<c0247e20>]    lr : [<c02477c4>]    psr: 68000013
[56184.242149] sp : d35efe50  ip : d35efe60  fp : d35efe5c
[56184.242330] r10: 0000001f  r9 : c0523388  r8 : d35efe94
[56184.242420] r7 : 00000014  r6 : d35efe90  r5 : 00000000  r4 : 00000004
[56184.242537] r3 : d35efe90  r2 : 00000014  r1 : d35efe94  r0 : 00000004
[56184.242650] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[56184.242788] Control: 00c5387d  Table: 00958008  DAC: 00000015
[56184.242889] Process VCHIQ-0 (pid: 33, stack limit = 0xd35ee1b8)
[56184.242993] Stack: (0xd35efe50 to 0xd35f0000)
[56184.243082] fe40:                                     d35efe84 d35efe60 c02477c4 c0247e20
[56184.243214] fe60: d35efe90 d3733f40 00000000 00000000 00000000 d3551000 d35efec4 d35efe88
[56184.243358] fe80: bf09a7b8 c0247794 00000000 c004ae1c 88000013 d401d7a8 d35efebc d35efea8
[56184.243488] fea0: d37681c0 00000002 d401d7b8 d37681c0 00000002 d401d7b8 d35efee4 d35efec8
[56184.243637] fec0: c02475b4 bf09a774 c02474f0 4501f010 d401d7b8 00000020 d35eff64 d35efee8
[56184.243770] fee0: c02412d0 c02474fc c00424a8 c0042334 d35eff34 d35eff00 c00495dc c00424a0
[56184.243917] ff00: 00000000 00000000 00000000 00000014 d3551f88 d4004020 d4004194 e3feb7d8
[56184.244068] ff20: 00000010 d4004194 fffffef9 ffffffff 00000000 00000000 c0240448 00000000
[56184.244194] ff40: d35e5f80 c0523388 c0240448 00000000 00000000 00000000 d35effac d35eff68
[56184.244341] ff60: c003a7b8 c0240454 00000000 00000000 08000000 c0523388 00000000 d35eff7c
[56184.244467] ff80: d35eff7c 00000000 d35eff88 d35eff88 d35e5f80 c003a6f4 00000000 00000000
[56184.244614] ffa0: 00000000 d35effb0 c000e738 c003a700 00000000 00000000 00000000 00000000
[56184.244741] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[56184.244891] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00800000
[56184.245011] Backtrace: 
[56184.245096] [<c0247e14>] (vchiu_queue_is_empty) from [<c02477c4>] (vchi_msg_dequeue+0x3c/0xa8)
[56184.245273] [<c0247788>] (vchi_msg_dequeue) from [<bf09a7b8>] (audio_vchi_callback+0x50/0xb8 [snd_bcm2835])
[56184.245440]  r8:d3551000 r7:00000000 r6:00000000 r5:00000000 r4:d3733f40 r3:d35efe90
[56184.245650] [<bf09a768>] (audio_vchi_callback [snd_bcm2835]) from [<c02475b4>] (shim_callback+0xc4/0x114)
[56184.245810]  r6:d401d7b8 r5:00000002 r4:d37681c0
[56184.245943] [<c02474f0>] (shim_callback) from [<c02412d0>] (slot_handler_func+0xe88/0x1808)
[56184.246086]  r7:00000020 r6:d401d7b8 r5:4501f010 r4:c02474f0
[56184.246255] [<c0240448>] (slot_handler_func) from [<c003a7b8>] (kthread+0xc4/0xe0)
[56184.246377]  r10:00000000 r9:00000000 r8:00000000 r7:c0240448 r6:c0523388 r5:d35e5f80
[56184.246569]  r4:00000000
[56184.246649] [<c003a6f4>] (kthread) from [<c000e738>] (ret_from_fork+0x14/0x3c)
[56184.246776]  r7:00000000 r6:00000000 r5:c003a6f4 r4:d35e5f80
[56184.246924] Code: c0463c74 e1a0c00d e92dd800 e24cb004 (e5902004) 
[56184.247656] ---[ end trace b55216747c8950db ]---

This time I can say for sure that it happened on a device close/open cycle, which was triggered by an audio format change. TV show being broadcasted as AC3, commercials as MPEG Audio.
I will try to get a debug log again, but I guess it will just look the same
Maybe we should just add a NULL pointer check before calling queue_is_empty?

@julianscheel julianscheel reopened this Jun 27, 2014
@julianscheel
Copy link
Contributor Author

Realizing that this happened on the audio codec switch in the stream I did a short recording which contains such a switch and am playing it in an endless loop. This allowed me to reproduce the issue within an hour.
I'll try to add a NULL pointer check for vchi_handle[0] in the callback as a first step now.

@julianscheel
Copy link
Contributor Author

Ignoring the callback if vchi_handle[0] is NULL does avoid the Ooops, but the application which was accessing the device at that point is stalled then. A hard kill works though and the VideoCore stays usable afterwards.
So the question is: What shall have been done by the callback so that the device shutdown is working properly?

@popcornmix
Copy link
Collaborator

I think the (updating samples consumed) callback is purely informative and won't cause a lockup if discarded.
However if vchi_handle[0] is null and you discard the response to a message (like close) then ALSA will hang.
Does the log still look the same as earlier? I'll try to study the log and code to work out exactly what the bad sequence is.
An arm binary that when run produces a panic (ideally quickly) may be useful to catch additional logging, or to test potential fixes.

@julianscheel
Copy link
Contributor Author

I'm trying to make up a little testtool to trigger it.
I have full debug logging enabled again now, so I will let you know as soon as it happens again.

@popcornmix
Copy link
Collaborator

Can you enable AUDIO_DEBUG_ENABLE and AUDIO_VERBOSE_DEBUG_ENABLE in bcm2835.h ?
I'd like to see some logging from bcm2835-pcm.c

@julianscheel
Copy link
Contributor Author

Sure. Just building a kernel and starting debug again

So far I was unable to create a standalone test-case. Only am able to trigger the test with playing back that video file. And it happens once every few hours... I'll post a new log as soon as I have one.

@julianscheel
Copy link
Contributor Author

This time it was fast :) You can see the log here:
http://pastebin.com/K1P0HnW4

@popcornmix
Copy link
Collaborator

Do you have and easy way to reproduce this?

A while back when investigating a previous panic I did feel we needed a higher level mutex to protect the opens and closes. It turned out not to fix the panic I was investigating and it got stashed.

Your panic does sound like a similar issue, so I've just dug it out and compile tested it - it could be worth a try:
http://pastebin.com/rNwJeYQt

@julianscheel
Copy link
Contributor Author

I've just running a test with it. Looping VLC with my test-snippet usually made the crash happen after some minutes yesterday. Right now it runs for almost 2 hours without a crash already. So this looks promising. If it's still running in the evening I would assume this fixes the issue.

@julianscheel
Copy link
Contributor Author

Unfortunately the error just occured again:

[11626.749402] bcm2835_audio_write:160  .. IN
[11626.749431] bcm2835_audio_write:176  .. OUT 0
[11626.749466] my_wq_function:94  .. IN d37f7900:2
[11626.749477] bcm2835_audio_write_worker:793  .. IN
[11626.749486] bcm2835_audio_write_worker:795  Writing 6144 bytes from c099a074
[11626.749587] bcm2835_audio_write_worker:862  .. OUT
[11626.749599] my_wq_function:111  .. OUT 0
[11626.751305] snd_bcm2835_pcm_pointer:432  .. IN
[11626.751331] snd_bcm2835_pcm_pointer:437 pcm_pointer... (0) hwptr=1758208 appl=1816692 pos=54272
[11626.751339] snd_bcm2835_pcm_pointer:439  .. OUT
[11626.753677] audio_vchi_callback:206  .. IN instance=c080d100, param=c080d100, reason=1, handle=  (null)
[11626.753707] audio_vchi_callback:224  .. instance=c080d100, m.type=VC_AUDIO_MSG_TYPE_COMPLETE, complete=1936
[11626.753717] bcm2835_playback_fifo_irq:70  .. IN
[11626.753727] bcm2835_playback_fifo_irq:73 alsa_stream=d37f7900 substream=d36f1a00
[11626.753742] bcm2835_playback_fifo_irq:93 updating pos cur: 54272 + 1936 max:65536 period_bytes:61440, hw_ptr: 1758208 new_period:0
[11626.753751] bcm2835_playback_fifo_irq:105  .. OUT
[11626.753758] audio_vchi_callback:235  .. OUT
[11626.755273] snd_bcm2835_pcm_pointer:432  .. IN
[11626.755299] snd_bcm2835_pcm_pointer:437 pcm_pointer... (0) hwptr=1758208 appl=1822836 pos=56208
[11626.755307] snd_bcm2835_pcm_pointer:439  .. OUT
[11626.755316] bcm2835_audio_write:160  .. IN
[11626.755342] bcm2835_audio_write:176  .. OUT 0
[11626.755378] my_wq_function:94  .. IN d37f7900:2
[11626.755388] bcm2835_audio_write_worker:793  .. IN
[11626.755398] bcm2835_audio_write_worker:795  Writing 6144 bytes from c099b874
[11626.755497] bcm2835_audio_write_worker:862  .. OUT
[11626.755510] my_wq_function:111  .. OUT 0
[11626.755597] snd_bcm2835_pcm_pointer:432  .. IN
[11626.755613] snd_bcm2835_pcm_pointer:437 pcm_pointer... (0) hwptr=1760144 appl=1822836 pos=56208
[11626.755620] snd_bcm2835_pcm_pointer:439  .. OUT
[11626.763336] audio_vchi_callback:206  .. IN instance=c080d100, param=c080d100, reason=1, handle=  (null)
[11626.763361] audio_vchi_callback:224  .. instance=c080d100, m.type=VC_AUDIO_MSG_TYPE_COMPLETE, complete=1856
[11626.763370] bcm2835_playback_fifo_irq:70  .. IN
[11626.763380] bcm2835_playback_fifo_irq:73 alsa_stream=d37f7900 substream=d36f1a00
[11626.763394] bcm2835_playback_fifo_irq:93 updating pos cur: 56208 + 1856 max:65536 period_bytes:61440, hw_ptr: 1760144 new_period:0
[11626.763402] bcm2835_playback_fifo_irq:105  .. OUT
[11626.763409] audio_vchi_callback:235  .. OUT
[11626.773472] audio_vchi_callback:206  .. IN instance=c080d100, param=c080d100, reason=1, handle=  (null)
[11626.773514] audio_vchi_callback:224  .. instance=c080d100, m.type=VC_AUDIO_MSG_TYPE_COMPLETE, complete=1952
[11626.773525] bcm2835_playback_fifo_irq:70  .. IN
[11626.773537] bcm2835_playback_fifo_irq:73 alsa_stream=d37f7900 substream=d36f1a00
[11626.773557] bcm2835_playback_fifo_irq:93 updating pos cur: 58064 + 1952 max:65536 period_bytes:61440, hw_ptr: 1760144 new_period:0
[11626.773567] bcm2835_playback_fifo_irq:105  .. OUT
[11626.773574] audio_vchi_callback:235  .. OUT
[11626.784187] audio_vchi_callback:206  .. IN instance=c080d100, param=c080d100, reason=1, handle=  (null)
[11626.784221] audio_vchi_callback:224  .. instance=c080d100, m.type=VC_AUDIO_MSG_TYPE_COMPLETE, complete=2000
[11626.784231] bcm2835_playback_fifo_irq:70  .. IN
[11626.784241] bcm2835_playback_fifo_irq:73 alsa_stream=d37f7900 substream=d36f1a00
[11626.784255] bcm2835_playback_fifo_irq:93 updating pos cur: 60016 + 2000 max:65536 period_bytes:61440, hw_ptr: 1760144 new_period:1
[11626.784266] snd_bcm2835_pcm_pointer:432  .. IN
[11626.784277] snd_bcm2835_pcm_pointer:437 pcm_pointer... (0) hwptr=1760144 appl=1822836 pos=62016
[11626.784285] snd_bcm2835_pcm_pointer:439  .. OUT
[11626.784313] bcm2835_playback_fifo_irq:105  .. OUT
[11626.784322] audio_vchi_callback:235  .. OUT
[11626.784399] snd_bcm2835_pcm_pointer:432  .. IN
[11626.784413] snd_bcm2835_pcm_pointer:437 pcm_pointer... (0) hwptr=1765952 appl=1822836 pos=62016
[11626.784421] snd_bcm2835_pcm_pointer:439  .. OUT
[11626.793360] audio_vchi_callback:206  .. IN instance=c080d100, param=c080d100, reason=1, handle=  (null)
[11626.793392] audio_vchi_callback:224  .. instance=c080d100, m.type=VC_AUDIO_MSG_TYPE_COMPLETE, complete=1808
[11626.793403] bcm2835_playback_fifo_irq:70  .. IN
[11626.793413] bcm2835_playback_fifo_irq:73 alsa_stream=d37f7900 substream=d36f1a00
[11626.793430] bcm2835_playback_fifo_irq:93 updating pos cur: 62016 + 1808 max:65536 period_bytes:61440, hw_ptr: 1765952 new_period:0
[11626.793439] bcm2835_playback_fifo_irq:105  .. OUT
[11626.793446] audio_vchi_callback:235  .. OUT
[11626.794674] snd_bcm2835_pcm_trigger:378  .. IN
[11626.794699] snd_bcm2835_pcm_trigger:402 bcm2835_AUDIO_TRIGGER_STOP running=1 draining=0
[11626.794707] snd_bcm2835_pcm_trigger:407 DROPPING
[11626.794715] bcm2835_audio_stop:138  .. IN
[11626.794742] bcm2835_audio_stop:152  .. OUT 0
[11626.794752] snd_bcm2835_pcm_trigger:421  .. OUT
[11626.794914] my_wq_function:94  .. IN d37f7900:0
[11626.794930] bcm2835_audio_stop_worker:688  .. IN
[11626.795038] bcm2835_audio_stop_worker:718  .. OUT
[11626.795054] my_wq_function:111  .. OUT 0
[11626.795200] snd_bcm2835_pcm_hw_free:291  .. IN
[11626.795254] snd_bcm2835_pcm_hw_free:291  .. IN
[11626.795266] snd_bcm2835_playback_close:217  .. IN
[11626.795274] snd_bcm2835_playback_close:227 Alsa close
[11626.795282] bcm2835_audio_close:728  .. IN
[11626.796633] audio_vchi_callback:206  .. IN instance=c080d100, param=c080d100, reason=1, handle=  (null)
[11626.796661] audio_vchi_callback:217  .. instance=c080d100, m.type=VC_AUDIO_MSG_TYPE_RESULT, success=0
[11626.796720] audio_vchi_callback:235  .. OUT
[11626.796757] vc_vchi_audio_deinit:315  .. IN
[11626.796768] vc_vchi_audio_deinit:323  .. about to lock (1)
[11626.796779] vc_vchi_audio_deinit:333  .. 0:closing c080d480
[11626.796935] audio_vchi_callback:206  .. IN instance=c080d100, param=c080d100, reason=8, handle=  (null)
[11626.796975] vc_vchi_audio_deinit:348  .. OUT
[11626.796985] bcm2835_audio_close:781  .. OUT
[11626.796994] snd_bcm2835_playback_close:258  .. OUT
[11626.797008] snd_bcm2835_playback_free:58 Freeing up alsa stream here ..
[11626.802290] snd_bcm2835_playback_open_generic:120  .. IN (0)
[11626.802316] snd_bcm2835_playback_open_generic:127 Alsa open (0)
[11626.802327] bcm2835_audio_open:420  .. IN
[11626.802981] bcm2835_audio_open_connection:360  .. IN
[11626.803000] bcm2835_audio_open_connection:362 bcm2835_audio_open_connection: start
[11626.803010] vc_vchi_audio_init:247 vc_vchi_audio_init: start
[11626.803024] vc_vchi_audio_init:280 vc_vchi_audio_init: about to open 0
[11626.803315] audio_vchi_callback:206  .. IN instance=c0857300, param=c0857300, reason=1, handle=  (null)
[11626.803351] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[11626.806310] pgd = c0004000
[11626.809065] [00000008] *pgd=00000000
[11626.811808] Internal error: Oops: 17 [#1] PREEMPT ARM
[11626.814451] Modules linked in: snd_soc_pcm1796 snd_soc_bcm2708_i2s snd_bcm2835 regmap_mmio bcm2708_wdog snd_soc_core snd_compress regmap_i2c snd_pcm_dmaengine regmap_spi leds_gpio led_class snd_pcm spi_bcm2708 snd_timer snd i2c_bcm2708 vfat fat
[11626.820074] CPU: 0 PID: 33 Comm: VCHIQ-0 Not tainted 3.14.6 #1
[11626.822789] task: d3598c00 ti: d35ee000 task.ti: d35ee000
[11626.825536] PC is at vchiu_queue_is_empty+0xc/0x24
[11626.828315] LR is at vchi_msg_dequeue+0x3c/0xa8
[11626.831102] pc : [<c0247e20>]    lr : [<c02477c4>]    psr: 60000013
[11626.831102] sp : d35efe48  ip : d35efe58  fp : d35efe54
[11626.836652] r10: 0000001f  r9 : c0523388  r8 : d35efe94
[11626.839446] r7 : 00000014  r6 : d35efe90  r5 : 00000000  r4 : 00000004
[11626.842289] r3 : d35efe90  r2 : 00000014  r1 : d35efe94  r0 : 00000004
[11626.845165] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[11626.848106] Control: 00c5387d  Table: 0081c008  DAC: 00000015
[11626.851065] Process VCHIQ-0 (pid: 33, stack limit = 0xd35ee1b8)
[11626.854086] Stack: (0xd35efe48 to 0xd35f0000)
[11626.857098] fe40:                   d35efe7c d35efe58 c02477c4 c0247e20 d35efe90 c0857300
[11626.860169] fe60: 00000000 00000000 00000000 d3552400 d35efec4 d35efe80 bf09afc4 c0247794
[11626.863329] fe80: 00000000 00000001 00000000 c004ae1c 80000013 d4017178 d35ee000 00000000
[11626.866518] fea0: c080d100 00000002 c080d100 00000002 d4017188 00000000 d35efee4 d35efec8
[11626.869738] fec0: c02475b4 bf09af5c c02474f0 4501f009 d4017188 00000020 d35eff64 d35efee8
[11626.872981] fee0: c02412d0 c02474fc c00424a8 c0042334 d35eff34 d35eff00 c00495dc c00424a0
[11626.876286] ff00: d35ee000 00000000 00000000 00000014 d3532388 d4004020 d4004194 266bd1a8
[11626.879642] ff20: 00000009 d4004194 fffffef9 ffffffff 00000000 00000000 c0240448 00000000
[11626.883018] ff40: d35e5f80 c0523388 c0240448 00000000 00000000 00000000 d35effac d35eff68
[11626.886408] ff60: c003a7b8 c0240454 bed11adc 00000000 c000e824 c0523388 00000000 d35eff7c
[11626.889838] ff80: d35eff7c 00000000 d35eff88 d35eff88 d35e5f80 c003a6f4 00000000 00000000
[11626.893317] ffa0: 00000000 d35effb0 c000e738 c003a700 00000000 00000000 00000000 00000000
[11626.896815] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[11626.900340] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00800000
[11626.903873] Backtrace: 
[11626.907455] [<c0247e14>] (vchiu_queue_is_empty) from [<c02477c4>] (vchi_msg_dequeue+0x3c/0xa8)
[11626.911120] [<c0247788>] (vchi_msg_dequeue) from [<bf09afc4>] (audio_vchi_callback+0x74/0x170 [snd_bcm2835])
[11626.914797]  r8:d3552400 r7:00000000 r6:00000000 r5:00000000 r4:c0857300 r3:d35efe90
[11626.918628] [<bf09af50>] (audio_vchi_callback [snd_bcm2835]) from [<c02475b4>] (shim_callback+0xc4/0x114)
[11626.922440]  r7:00000000 r6:d4017188 r5:00000002 r4:c080d100
[11626.926360] [<c02474f0>] (shim_callback) from [<c02412d0>] (slot_handler_func+0xe88/0x1808)
[11626.930287]  r7:00000020 r6:d4017188 r5:4501f009 r4:c02474f0
[11626.934359] [<c0240448>] (slot_handler_func) from [<c003a7b8>] (kthread+0xc4/0xe0)
[11626.938397]  r10:00000000 r9:00000000 r8:00000000 r7:c0240448 r6:c0523388 r5:d35e5f80
[11626.942518]  r4:00000000
[11626.946561] [<c003a6f4>] (kthread) from [<c000e738>] (ret_from_fork+0x14/0x3c)
[11626.950668]  r7:00000000 r6:00000000 r5:c003a6f4 r4:d35e5f80
[11626.954868] Code: c0463c74 e1a0c00d e92dd800 e24cb004 (e5902004) 
[11626.959382] ---[ end trace ca2dd7acdd6b9029 ]---
[11626.972559] vc_vchi_audio_init:283 vc_vchi_audio_init: opened 0: c080d100=0
[11626.988259] vc_vchi_audio_init:295 vc_vchi_audio_init: okay
[11627.002077] bcm2835_audio_open_connection:407  success !
[11627.012925] bcm2835_audio_open_connection:409  .. OUT
[11627.022028] bcm2835_audio_open:430  instance (c0857300)
[11627.031057] bcm2835_audio_open:460  .. OUT
[11627.040213] snd_bcm2835_playback_open_generic:197  .. OUT =0
[11627.051254] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=1, arg=d36f2000 (0) ret=0
[11627.061068] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=1, arg=d36f2000 (0) ret=0
[11627.082133] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (1) ret=0
[11627.086893] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (1) ret=0
[11627.091659] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (0) ret=0
[11627.096230] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (1) ret=0
[11627.100778] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (1) ret=0
[11627.105461] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (0) ret=0
[11627.109978] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (1) ret=0
[11627.118302] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (1) ret=0
[11627.124337] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (0) ret=0
[11627.129947] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (1) ret=0
[11627.134514] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (1) ret=0
[11627.139129] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (0) ret=0
[11627.143565] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (1) ret=0
[11627.152320] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (1) ret=0
[11627.158077] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (0) ret=0
[11627.162246] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (1) ret=0
[11627.166249] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (1) ret=0
[11627.170320] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (0) ret=0
[11627.174167] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (1) ret=0
[11627.178146] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (0) ret=0
[11627.181961] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (1) ret=0
[11627.189108] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (1) ret=0
[11627.195384] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (0) ret=0
[11627.199888] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (1) ret=0
[11627.203656] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (1) ret=0
[11627.207361] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (1) ret=0
[11627.211062] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=4, arg=c0891800 (1) ret=0
[11627.214563] snd_bcm2835_pcm_hw_params:271  .. IN
[11627.217973] snd_bcm2835_pcm_hw_params:283  .. OUT
[11627.221427] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=2, arg=c0863e64 (0) ret=0
[11627.228020] snd_bcm2835_pcm_lib_ioctl:450  .. substream=d36f1a00, cmd=2, arg=c0863e64 (1) ret=0
[11627.234505] snd_bcm2835_pcm_prepare:304  .. IN
[11627.238274] bcm2835_audio_set_params:566  .. IN
[11627.241652] bcm2835_audio_set_params:570  Setting ALSA channels(2), samplerate(48000), bits-per-sample(16)
[11627.245059] bcm2835_audio_set_ctls_chan:471  .. IN
[11627.248394] bcm2835_audio_set_ctls_chan:474  Setting ALSA dest(0), volume(0)

Further ideas?
I could provide you a VLC build along with the file if you want to be able to test it yourself?

@popcornmix
Copy link
Collaborator

Yes, pre-built vlc with sample file would be useful.

@julianscheel
Copy link
Contributor Author

@popcornmix providing a proper raspbian build is a bit more painful than I had expected as my builds are requiring a newer glibc than raspbian has. But I'm at it. Hope to be able to provide something later today.

@popcornmix
Copy link
Collaborator

I do have gcc 4.8 installed which includes newer glibc, so it may be okay on my install.

@julianscheel
Copy link
Contributor Author

Ah great, in this case you can try the following build:
http://jusst.de/files/vlc.tar.xz
And this sample video: http://jusst.de/files/audio-change.ts

Just untar in your home folder, then:

cd vlc/
source ./env
vlc -vvv -L audio-change.ts

You might need to install some dependencies like libdvbpsi though. If it fails just post the error log, I'll take a look then.

If it works, just let it run. After a while you should see that it stalls and dmesg will show the Ooops.

@popcornmix
Copy link
Collaborator

Sorry - I've been at this for ages, but can't get new glibc and old libav to be installed together.

I started with clean wheezy install. I did "apt-get install vlc" hoping to get most of the dependencies.
I then ran your VLC, and got the expected glibc issue.

I use this scheme to install gcc 4.8
http://somewideopenspace.wordpress.com/2014/02/28/gcc-4-8-on-raspberry-pi-wheezy/

which solved the glibc issue, but still has missing libs:
http://pastebin.com/KRZcnbte

Now I can't install libs:

sudo apt-get install libdvbpsi-dev
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Package libdvbpsi-dev is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'libdvbpsi-dev' has no installation candidate

I can install the libs from jessie, but then it still fails to link as my libs are newer than expected.

@julianscheel
Copy link
Contributor Author

I've just uploaded a new tarball (same url), in which I included libdvbpsi, libav and some more. I think it should contain all critical libs that were missing on your system.
Hopefully this works.

@popcornmix
Copy link
Collaborator

Looks like just:
[000128f8] core libvlc warning: cannot load module `/home/pi/alsaoops/vlc/usr/lib/vlc/plugins/access/liblinsys_sdi_plugin.so' (libpng16.so.16: cannot open shared object file: No such file or directory)

I have this installed:
/lib/arm-linux-gnueabihf/libpng12.so.0.49.0

@julianscheel
Copy link
Contributor Author

That plugin shouldn't be needed for playing the sample. Can you paste the full log?

@julianscheel
Copy link
Contributor Author

I've setup a raspbian with gcc 4.8 following the blog post you linked to now.
I had to add my libasound build. So using the latest tarball http://jusst.de/files/vlc.tar.xz you should be able to get it running.
vlc -L -vvv --mmal-adjust-refreshrate audio-change.ts

@julianscheel
Copy link
Contributor Author

Have you overclocked your Pi? Mine is running with 950 MHz, maybe a higher clock increases chances to hit it?
Most of the times I am able to hit it within 2 hours.

@popcornmix
Copy link
Collaborator

Another panic last night with more logging.
My suspicion is that a message from previous instance of audio service is sent from gpu just as close message comes through. This message should delay the close until it is delivered.

However I think the message is persisting to the next instance of audio service and the old message arrives while the new service is being created. instance->vchi_handle[0] is null when the callback occurs.

Doing an early return from the callback if instance->vchi_handle[0] is null will probably help.

But the real fix will be to vchiq. A colleague is investigating this.

@julianscheel
Copy link
Contributor Author

This matches my experience. But an early return if vchi_handle[0] is null seemed not to be sufficient. It avoided the panic, but the VLC process was still stalled.
Killing the process and starting a new one did work in that case though.

Having a fix in vchiq sounds proper :)

@julianscheel
Copy link
Contributor Author

Any update from the vchiq guy yet?

@popcornmix
Copy link
Collaborator

I have a (vchiq) patch that does a synchronous close, but it didn't fix the panic.

@julianscheel
Copy link
Contributor Author

Interesting. Is it still getting that late callback?

@popcornmix
Copy link
Collaborator

Still looks the same from my log. The callback occurs before the open has returned, and so the vchiq handle is null.
Need to talk to vchiq guy (but didn't see him yesterday or today)

@julianscheel
Copy link
Contributor Author

Have you had a chance to talk to him these days?

@popcornmix
Copy link
Collaborator

Yes, spoken to him. He's aware of an issue still present that could explain this behaviour and will fix it, but he's got a more urgent job on at the moment. Hopefully next week.

@julianscheel
Copy link
Contributor Author

Sounds promising. Looking forward to an update :)

@popcornmix
Copy link
Collaborator

You can try:
http://pastebin.com/FyeKFmhU

I'm currently testing it (with my earlier alsa mutex patch) and it's been running for nearly 2 hours.
I need to kill it to do something else, but I'll test it overnight.

@julianscheel
Copy link
Contributor Author

Test is running for roughly 4 hours now. Looks promising!

@julianscheel
Copy link
Contributor Author

Test ran until my remote connection dropped at 2am. I think we can assume the patch is working. Did your tests succeed as well?

@popcornmix
Copy link
Collaborator

Still running now since last night. Looks good to me.

popcornmix added a commit to raspberrypi/firmware that referenced this issue Jul 15, 2014
See: raspberrypi/linux#621

kernel: hid: Reduce default mouse polling interval to 60Hz
See: http://www.raspberrypi.org/forums/viewtopic.php?t=75365&p=556534#p540272

firmware: gpioman: Add support for configuring gpio from external blob file

firmware: Backport latest VCSM

firmware: arm_loader: Add extended command line parameters before filling in device tree
See: raspberrypi/linux#627

firmware: bplus: Support config option max_usb_current
See: http://www.raspberrypi.org/forums/viewtopic.php?f=63&t=81736&start=50#p577877

firmware: bplus: Invert left/right channels for pwm audio on bplus
popcornmix added a commit to Hexxeh/rpi-firmware that referenced this issue Jul 15, 2014
See: raspberrypi/linux#621

kernel: hid: Reduce default mouse polling interval to 60Hz
See: http://www.raspberrypi.org/forums/viewtopic.php?t=75365&p=556534#p540272

firmware: gpioman: Add support for configuring gpio from external blob file

firmware: Backport latest VCSM

firmware: arm_loader: Add extended command line parameters before filling in device tree
See: raspberrypi/linux#627

firmware: bplus: Support config option max_usb_current
See: http://www.raspberrypi.org/forums/viewtopic.php?f=63&t=81736&start=50#p577877

firmware: bplus: Invert left/right channels for pwm audio on bplus
@popcornmix
Copy link
Collaborator

This is in latest firmware update. Please close when you are happy it's resolved.

@afalkenhahn
Copy link

In my program I often get this error on the Raspberry Pi 2. I've updated to the latest kernel (Linux raspberrypi 4.4.14-v7+ #896 SMP Sat Jul 2 15:09:43 BST 2016 armv7l GNU/Linux) but it didn't help. I still get the error and it usually renders the system unusable. Even rebooting is often not possible any more after the error has occurred making this quite a nightmare to debug.

Here is the full log, the NULL pointer reference is at 13:29:32:

Jul 10 13:26:37 raspberrypi kernel: [   58.723345] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:26:37 raspberrypi kernel: [   58.723388] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:26:47 raspberrypi kernel: [   68.566732] bcm2835_audio_set_params:583  Alsa controls not supported
Jul 10 13:26:47 raspberrypi kernel: [   68.566756] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:26:47 raspberrypi kernel: [   68.568377] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:27:04 raspberrypi kernel: [   84.908452] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:27:05 raspberrypi kernel: [   85.988429] bcm2835_audio_set_params:583  Alsa controls not supported
Jul 10 13:27:05 raspberrypi kernel: [   85.988455] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:27:05 raspberrypi kernel: [   85.988490] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:27:08 raspberrypi kernel: [   88.852333] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:27:21 raspberrypi kernel: [  102.173166] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:27:23 raspberrypi kernel: [  104.296299] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:27:23 raspberrypi kernel: [  104.296777] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:27:24 raspberrypi kernel: [  105.071317] bcm2835_audio_set_params:583  Alsa controls not supported
Jul 10 13:27:24 raspberrypi kernel: [  105.071345] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:27:24 raspberrypi kernel: [  105.071782] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:27:24 raspberrypi kernel: [  105.071813] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:27:24 raspberrypi kernel: [  105.226809] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:27:24 raspberrypi kernel: [  105.227236] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:27:24 raspberrypi kernel: [  105.557247] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:27:24 raspberrypi kernel: [  105.557448] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:27:24 raspberrypi kernel: [  105.557478] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:27:24 raspberrypi kernel: [  105.557501] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:27:24 raspberrypi kernel: [  105.701905] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:27:25 raspberrypi kernel: [  105.845243] bcm2835_audio_set_params:583  Alsa controls not supported
Jul 10 13:27:25 raspberrypi kernel: [  105.845268] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:27:25 raspberrypi kernel: [  105.845339] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:27:25 raspberrypi kernel: [  105.845369] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:27:25 raspberrypi kernel: [  105.845462] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:27:25 raspberrypi kernel: [  105.845491] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:27:25 raspberrypi kernel: [  105.845514] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:27:25 raspberrypi kernel: [  105.954211] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:27:25 raspberrypi kernel: [  105.954827] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:27:25 raspberrypi kernel: [  106.058414] bcm2835_audio_set_params:583  Alsa controls not supported
Jul 10 13:27:25 raspberrypi kernel: [  106.058440] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:27:25 raspberrypi kernel: [  106.058474] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:27:25 raspberrypi kernel: [  106.058498] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:27:25 raspberrypi kernel: [  106.426792] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:27:25 raspberrypi kernel: [  106.426838] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:27:25 raspberrypi kernel: [  106.571810] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:27:42 raspberrypi kernel: [  123.087638] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:27:43 raspberrypi kernel: [  124.494882] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:27:43 raspberrypi kernel: [  124.494935] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:27:43 raspberrypi kernel: [  124.494962] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:27:43 raspberrypi kernel: [  124.651874] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:27:43 raspberrypi kernel: [  124.651913] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:27:43 raspberrypi kernel: [  124.794293] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:27:44 raspberrypi kernel: [  124.954301] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:27:44 raspberrypi kernel: [  124.954904] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:27:44 raspberrypi kernel: [  124.954945] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:27:44 raspberrypi kernel: [  125.104276] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:27:44 raspberrypi kernel: [  125.261245] bcm2835_audio_set_params:583  Alsa controls not supported
Jul 10 13:27:44 raspberrypi kernel: [  125.261272] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:27:44 raspberrypi kernel: [  125.390072] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:27:45 raspberrypi kernel: [  125.992627] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:27:45 raspberrypi kernel: [  125.992673] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:27:45 raspberrypi kernel: [  126.582654] bcm2835_audio_set_params:583  Alsa controls not supported
Jul 10 13:27:45 raspberrypi kernel: [  126.582682] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:27:58 raspberrypi kernel: [  139.324682] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:27:58 raspberrypi kernel: [  139.534663] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:28:11 raspberrypi kernel: [  152.097032] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:28:17 raspberrypi kernel: [  158.224712] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:28:17 raspberrypi kernel: [  158.224755] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:28:19 raspberrypi kernel: [  159.963772] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:28:19 raspberrypi kernel: [  159.963817] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:28:19 raspberrypi kernel: [  159.963843] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:28:19 raspberrypi kernel: [  159.964184] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:28:19 raspberrypi kernel: [  160.124924] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:28:19 raspberrypi kernel: [  160.279402] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:28:19 raspberrypi kernel: [  160.279726] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:28:19 raspberrypi kernel: [  160.279758] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:28:19 raspberrypi kernel: [  160.279781] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:28:19 raspberrypi kernel: [  160.426672] bcm2835_audio_set_params:583  Alsa controls not supported
Jul 10 13:28:19 raspberrypi kernel: [  160.426698] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:28:19 raspberrypi kernel: [  160.426801] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:28:19 raspberrypi kernel: [  160.426835] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:28:19 raspberrypi kernel: [  160.427258] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:28:19 raspberrypi kernel: [  160.574714] bcm2835_audio_set_params:583  Alsa controls not supported
Jul 10 13:28:19 raspberrypi kernel: [  160.574739] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:28:19 raspberrypi kernel: [  160.574772] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:28:19 raspberrypi kernel: [  160.725709] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:28:20 raspberrypi kernel: [  160.835947] bcm2835_audio_set_params:583  Alsa controls not supported
Jul 10 13:28:20 raspberrypi kernel: [  160.835971] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:28:20 raspberrypi kernel: [  160.836439] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:28:20 raspberrypi kernel: [  160.926598] bcm2835_audio_set_params:583  Alsa controls not supported
Jul 10 13:28:20 raspberrypi kernel: [  160.926622] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:28:20 raspberrypi kernel: [  160.926659] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:28:20 raspberrypi kernel: [  160.926815] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:28:20 raspberrypi kernel: [  161.454756] bcm2835_audio_set_params:583  Alsa controls not supported
Jul 10 13:28:20 raspberrypi kernel: [  161.454782] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:28:20 raspberrypi kernel: [  161.749707] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:28:20 raspberrypi kernel: [  161.749752] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:28:59 raspberrypi kernel: [  200.001649] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:29:19 raspberrypi kernel: [  219.861713] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:29:20 raspberrypi kernel: [  221.076686] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:29:25 raspberrypi kernel: [  226.166698] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:29:26 raspberrypi kernel: [  227.191703] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:29:32 raspberrypi kernel: [  233.249388] Unable to handle kernel NULL pointer dereference at virtual address 00000000
Jul 10 13:29:32 raspberrypi kernel: [  233.255890] bcm2835_audio_set_params:583  Alsa controls not supported
Jul 10 13:29:32 raspberrypi kernel: [  233.255897] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:29:32 raspberrypi kernel: [  233.255924] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:29:32 raspberrypi kernel: [  233.256710] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:29:32 raspberrypi kernel: [  233.257505] pgd = 80004000
Jul 10 13:29:32 raspberrypi kernel: [  233.260208] [00000000] *pgd=00000000
Jul 10 13:29:32 raspberrypi kernel: [  233.261711] bcm2835_audio_set_params:583  Alsa controls not supported
Jul 10 13:29:32 raspberrypi kernel: [  233.261716] snd_bcm2835_pcm_prepare:329  error setting hw params
Jul 10 13:29:32 raspberrypi kernel: [  233.261739] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 0
Jul 10 13:29:32 raspberrypi kernel: [  233.263612] 
Jul 10 13:29:32 raspberrypi kernel: [  233.263806] Internal error: Oops: 5 [#1] SMP ARM
Jul 10 13:29:32 raspberrypi kernel: [  233.265888] bcm2835_audio_set_ctls:556 Couldn't set the controls for stream 1
Jul 10 13:29:32 raspberrypi kernel: [  233.275537] Modules linked in: arc4 ecb md4 md5 hmac nls_utf8 cifs snd_bcm2835 snd_pcm snd_seq snd_seq_device snd_timer snd bcm2835_gpiomem bcm2835_wdt evdev joydev uio_pdrv_genirq uio
Jul 10 13:29:32 raspberrypi kernel: [  233.292104] CPU: 1 PID: 70 Comm: VCHIQ-0 Not tainted 4.4.14-v7+ #896
Jul 10 13:29:32 raspberrypi kernel: [  233.298450] Hardware name: BCM2709
Jul 10 13:29:32 raspberrypi kernel: [  233.301849] task: b9081140 ti: b9068000 task.ti: b9068000
Jul 10 13:29:32 raspberrypi kernel: [  233.307256] PC is at __wake_up_common+0x30/0x90
Jul 10 13:29:32 raspberrypi kernel: [  233.311787] LR is at __wake_up_locked+0x24/0x2c
Jul 10 13:29:32 raspberrypi kernel: [  233.316319] pc : [<80063238>]    lr : [<800632bc>]    psr: a0050093
Jul 10 13:29:32 raspberrypi kernel: [  233.316319] sp : b9069e18  ip : b9069e48  fp : b9069e44
Jul 10 13:29:32 raspberrypi kernel: [  233.327788] r10: 00000001  r9 : 00000003  r8 : 00000000
Jul 10 13:29:32 raspberrypi kernel: [  233.327794] r7 : 00000000  r6 : b7612118  r5 : b7612110  r4 : b7612114
Jul 10 13:29:32 raspberrypi kernel: [  233.327799] r3 : 00000000  r2 : 00000001  r1 : 00000000  r0 : b7612114
Jul 10 13:29:32 raspberrypi kernel: [  233.327808] Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Jul 10 13:29:32 raspberrypi kernel: [  233.327833] Control: 10c5387d  Table: 389d006a  DAC: 00000055
Jul 10 13:29:32 raspberrypi kernel: [  233.327840] Process VCHIQ-0 (pid: 70, stack limit = 0xb9068210)
Jul 10 13:29:32 raspberrypi kernel: [  233.327848] Stack: (0xb9069e18 to 0xb906a000)
Jul 10 13:29:32 raspberrypi kernel: [  233.327856] 9e00:                                                       00000000 b7612114
Jul 10 13:29:32 raspberrypi kernel: [  233.327868] 9e20: b7612110 60050013 00000000 b9341200 bac91c90 8095ea64 b9069e5c b9069e48
Jul 10 13:29:32 raspberrypi kernel: [  233.327880] 9e40: 800632bc 80063214 00000000 b7612110 b9069e7c b9069e60 80063b60 800632a4
Jul 10 13:29:32 raspberrypi kernel: [  233.327890] 9e60: 00000000 b7612100 00000000 bac91c90 b9069eb4 b9069e80 7f07be9c 80063b28
Jul 10 13:29:32 raspberrypi kernel: [  233.327902] 9e80: 00000000 b9341200 00000014 00000000 00000000 3df6e9bc 3dfaef04 00000000
Jul 10 13:29:32 raspberrypi kernel: [  233.327913] 9ea0: b75e6900 00000002 b9069ed4 b9069eb8 803d4154 7f07bde8 00000020 803d4080
Jul 10 13:29:32 raspberrypi kernel: [  233.327924] 9ec0: 45043003 808a7f1c b9069f5c b9069ed8 803cc800 803d408c 00000000 80875408
Jul 10 13:29:32 raspberrypi kernel: [  233.327935] 9ee0: 808618c4 b9081424 b9069f44 b9069ef8 805b4d08 8004988c fff9fee9 b9911ccc
Jul 10 13:29:32 raspberrypi kernel: [  233.327947] 9f00: 805ed708 bac80194 bac80020 00000014 000c9cb0 00000003 00000014 bac80194
Jul 10 13:29:32 raspberrypi kernel: [  233.327958] 9f20: fff9fee9 ffffffff 00000000 00000000 00000000 00000000 b906b8c0 8095ea64
Jul 10 13:29:32 raspberrypi kernel: [  233.327970] 9f40: 803cb8f8 00000000 00000000 00000000 b9069fac b9069f60 80042678 803cb904
Jul 10 13:29:32 raspberrypi kernel: [  233.327981] 9f60: 39718000 00000000 00000000 8095ea64 00000000 00000000 b9069f78 b9069f78
Jul 10 13:29:32 raspberrypi kernel: [  233.327992] 9f80: 00000000 00000000 b9069f88 b9069f88 b906b8c0 8004258c 00000000 00000000
Jul 10 13:29:32 raspberrypi kernel: [  233.328002] 9fa0: 00000000 b9069fb0 8000fbc8 80042598 00000000 00000000 00000000 00000000
Jul 10 13:29:32 raspberrypi kernel: [  233.328013] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Jul 10 13:29:32 raspberrypi kernel: [  233.328023] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
Jul 10 13:29:32 raspberrypi kernel: [  233.328059] [<80063238>] (__wake_up_common) from [<800632bc>] (__wake_up_locked+0x24/0x2c)
Jul 10 13:29:32 raspberrypi kernel: [  233.328077] [<800632bc>] (__wake_up_locked) from [<80063b60>] (complete+0x44/0x54)
Jul 10 13:29:32 raspberrypi kernel: [  233.328118] [<80063b60>] (complete) from [<7f07be9c>] (audio_vchi_callback+0xc0/0x130 [snd_bcm2835])
Jul 10 13:29:32 raspberrypi kernel: [  233.328161] [<7f07be9c>] (audio_vchi_callback [snd_bcm2835]) from [<803d4154>] (shim_callback+0xd4/0x118)
Jul 10 13:29:32 raspberrypi kernel: [  233.328182] [<803d4154>] (shim_callback) from [<803cc800>] (slot_handler_func+0xf08/0x1858)
Jul 10 13:29:32 raspberrypi kernel: [  233.328199] [<803cc800>] (slot_handler_func) from [<80042678>] (kthread+0xec/0x104)
Jul 10 13:29:32 raspberrypi kernel: [  233.328219] [<80042678>] (kthread) from [<8000fbc8>] (ret_from_fork+0x14/0x2c)
Jul 10 13:29:32 raspberrypi kernel: [  233.328233] Code: e1a0a002 e1a08003 e1560001 e59b7004 (e5914000) 
Jul 10 13:29:32 raspberrypi kernel: [  233.328245] ---[ end trace b848eb8e41ea8852 ]--- 

Could this also be a bug in my code or this is definitely a bug in the kernel or ALSA driver? My code works fine on x86 Linuxes like Mint and Ubuntu but on Raspbian it crashes very often with the NULL pointer exception quoted above.

What can I do about this?

@popcornmix
Copy link
Collaborator

Can you provide a test program (ideally minimal) that provokes the error?

@popcornmix
Copy link
Collaborator

popcornmix commented Jul 11, 2016

You should create a new issue. No evidence that your issue is connected to the original issue.
And as this issue is closed, it is likely to get overlooked.

@afalkenhahn
Copy link

Added #1560

neuschaefer pushed a commit to neuschaefer/raspi-binary-firmware that referenced this issue Feb 27, 2017
See: raspberrypi/linux#621

kernel: hid: Reduce default mouse polling interval to 60Hz
See: http://www.raspberrypi.org/forums/viewtopic.php?t=75365&p=556534#p540272

firmware: gpioman: Add support for configuring gpio from external blob file

firmware: Backport latest VCSM

firmware: arm_loader: Add extended command line parameters before filling in device tree
See: raspberrypi/linux#627

firmware: bplus: Support config option max_usb_current
See: http://www.raspberrypi.org/forums/viewtopic.php?f=63&t=81736&start=50#p577877

firmware: bplus: Invert left/right channels for pwm audio on bplus
popcornmix pushed a commit that referenced this issue Jul 20, 2020
[ Upstream commit 2fb2799 ]

rmnet can have only two bridge interface.
One of them is a link interface and another one is added by
the master operation.
rmnet interface shouldn't allow adding additional
bridge interfaces by mater operation.
But, there is no code to deny additional interfaces.
So, interface leak occurs.

Test commands:
    ip link add dummy0 type dummy
    ip link add dummy1 type dummy
    ip link add dummy2 type dummy
    ip link add rmnet0 link dummy0 type rmnet mux_id 1
    ip link set dummy1 master rmnet0
    ip link set dummy2 master rmnet0
    ip link del rmnet0

In the above test command, the dummy0 was attached to rmnet as VND mode.
Then, dummy1 was attached to rmnet0 as BRIDGE mode.
At this point, dummy0 mode is switched from VND to BRIDGE automatically.
Then, dummy2 is attached to rmnet as BRIDGE mode.
At this point, rmnet0 should deny this operation.
But, rmnet0 doesn't deny this.
So that below splat occurs when the rmnet0 interface is deleted.

Splat looks like:
[  186.684787][    C2] WARNING: CPU: 2 PID: 1009 at net/core/dev.c:8992 rollback_registered_many+0x986/0xcf0
[  186.684788][    C2] Modules linked in: rmnet dummy openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_x
[  186.684805][    C2] CPU: 2 PID: 1009 Comm: ip Not tainted 5.8.0-rc1+ #621
[  186.684807][    C2] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  186.684808][    C2] RIP: 0010:rollback_registered_many+0x986/0xcf0
[  186.684811][    C2] Code: 41 8b 4e cc 45 31 c0 31 d2 4c 89 ee 48 89 df e8 e0 47 ff ff 85 c0 0f 84 cd fc ff ff 5
[  186.684812][    C2] RSP: 0018:ffff8880cd9472e0 EFLAGS: 00010287
[  186.684815][    C2] RAX: ffff8880cc56da58 RBX: ffff8880ab21c000 RCX: ffffffff9329d323
[  186.684816][    C2] RDX: 1ffffffff2be6410 RSI: 0000000000000008 RDI: ffffffff95f32080
[  186.684818][    C2] RBP: dffffc0000000000 R08: fffffbfff2be6411 R09: fffffbfff2be6411
[  186.684819][    C2] R10: ffffffff95f32087 R11: 0000000000000001 R12: ffff8880cd947480
[  186.684820][    C2] R13: ffff8880ab21c0b8 R14: ffff8880cd947400 R15: ffff8880cdf10640
[  186.684822][    C2] FS:  00007f00843890c0(0000) GS:ffff8880d4e00000(0000) knlGS:0000000000000000
[  186.684823][    C2] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  186.684825][    C2] CR2: 000055b8ab1077b8 CR3: 00000000ab612006 CR4: 00000000000606e0
[  186.684826][    C2] Call Trace:
[  186.684827][    C2]  ? lockdep_hardirqs_on_prepare+0x379/0x540
[  186.684829][    C2]  ? netif_set_real_num_tx_queues+0x780/0x780
[  186.684830][    C2]  ? rmnet_unregister_real_device+0x56/0x90 [rmnet]
[  186.684831][    C2]  ? __kasan_slab_free+0x126/0x150
[  186.684832][    C2]  ? kfree+0xdc/0x320
[  186.684834][    C2]  ? rmnet_unregister_real_device+0x56/0x90 [rmnet]
[  186.684835][    C2]  unregister_netdevice_many.part.135+0x13/0x1b0
[  186.684836][    C2]  rtnl_delete_link+0xbc/0x100
[ ... ]
[  238.440071][ T1009] unregister_netdevice: waiting for rmnet0 to become free. Usage count = 1

Fixes: 037f9cd ("net: rmnet: use upper/lower device infrastructure")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
popcornmix pushed a commit that referenced this issue Aug 12, 2020
[ Upstream commit 2fb2799 ]

rmnet can have only two bridge interface.
One of them is a link interface and another one is added by
the master operation.
rmnet interface shouldn't allow adding additional
bridge interfaces by mater operation.
But, there is no code to deny additional interfaces.
So, interface leak occurs.

Test commands:
    ip link add dummy0 type dummy
    ip link add dummy1 type dummy
    ip link add dummy2 type dummy
    ip link add rmnet0 link dummy0 type rmnet mux_id 1
    ip link set dummy1 master rmnet0
    ip link set dummy2 master rmnet0
    ip link del rmnet0

In the above test command, the dummy0 was attached to rmnet as VND mode.
Then, dummy1 was attached to rmnet0 as BRIDGE mode.
At this point, dummy0 mode is switched from VND to BRIDGE automatically.
Then, dummy2 is attached to rmnet as BRIDGE mode.
At this point, rmnet0 should deny this operation.
But, rmnet0 doesn't deny this.
So that below splat occurs when the rmnet0 interface is deleted.

Splat looks like:
[  186.684787][    C2] WARNING: CPU: 2 PID: 1009 at net/core/dev.c:8992 rollback_registered_many+0x986/0xcf0
[  186.684788][    C2] Modules linked in: rmnet dummy openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_x
[  186.684805][    C2] CPU: 2 PID: 1009 Comm: ip Not tainted 5.8.0-rc1+ #621
[  186.684807][    C2] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  186.684808][    C2] RIP: 0010:rollback_registered_many+0x986/0xcf0
[  186.684811][    C2] Code: 41 8b 4e cc 45 31 c0 31 d2 4c 89 ee 48 89 df e8 e0 47 ff ff 85 c0 0f 84 cd fc ff ff 5
[  186.684812][    C2] RSP: 0018:ffff8880cd9472e0 EFLAGS: 00010287
[  186.684815][    C2] RAX: ffff8880cc56da58 RBX: ffff8880ab21c000 RCX: ffffffff9329d323
[  186.684816][    C2] RDX: 1ffffffff2be6410 RSI: 0000000000000008 RDI: ffffffff95f32080
[  186.684818][    C2] RBP: dffffc0000000000 R08: fffffbfff2be6411 R09: fffffbfff2be6411
[  186.684819][    C2] R10: ffffffff95f32087 R11: 0000000000000001 R12: ffff8880cd947480
[  186.684820][    C2] R13: ffff8880ab21c0b8 R14: ffff8880cd947400 R15: ffff8880cdf10640
[  186.684822][    C2] FS:  00007f00843890c0(0000) GS:ffff8880d4e00000(0000) knlGS:0000000000000000
[  186.684823][    C2] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  186.684825][    C2] CR2: 000055b8ab1077b8 CR3: 00000000ab612006 CR4: 00000000000606e0
[  186.684826][    C2] Call Trace:
[  186.684827][    C2]  ? lockdep_hardirqs_on_prepare+0x379/0x540
[  186.684829][    C2]  ? netif_set_real_num_tx_queues+0x780/0x780
[  186.684830][    C2]  ? rmnet_unregister_real_device+0x56/0x90 [rmnet]
[  186.684831][    C2]  ? __kasan_slab_free+0x126/0x150
[  186.684832][    C2]  ? kfree+0xdc/0x320
[  186.684834][    C2]  ? rmnet_unregister_real_device+0x56/0x90 [rmnet]
[  186.684835][    C2]  unregister_netdevice_many.part.135+0x13/0x1b0
[  186.684836][    C2]  rtnl_delete_link+0xbc/0x100
[ ... ]
[  238.440071][ T1009] unregister_netdevice: waiting for rmnet0 to become free. Usage count = 1

Fixes: 037f9cd ("net: rmnet: use upper/lower device infrastructure")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Jan 27, 2022
…rors

commit 085a9f4 upstream.

Use down_read_nested() and down_write_nested() when taking the
ctrl->reset_lock rw-sem, passing the number of PCIe hotplug controllers in
the path to the PCI root bus as lock subclass parameter.

This fixes the following false-positive lockdep report when unplugging a
Lenovo X1C8 from a Lenovo 2nd gen TB3 dock:

  pcieport 0000:06:01.0: pciehp: Slot(1): Link Down
  pcieport 0000:06:01.0: pciehp: Slot(1): Card not present
  ============================================
  WARNING: possible recursive locking detected
  5.16.0-rc2+ #621 Not tainted
  --------------------------------------------
  irq/124-pciehp/86 is trying to acquire lock:
  ffff8e5ac4299ef8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_check_presence+0x23/0x80

  but task is already holding lock:
  ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180

   other info that might help us debug this:
   Possible unsafe locking scenario:

	 CPU0
	 ----
    lock(&ctrl->reset_lock);
    lock(&ctrl->reset_lock);

   *** DEADLOCK ***

   May be due to missing lock nesting notation

  3 locks held by irq/124-pciehp/86:
   #0: ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
   #1: ffffffffa3b024e8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pciehp_unconfigure_device+0x31/0x110
   #2: ffff8e5ac1ee2248 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x1c/0x40

  stack backtrace:
  CPU: 4 PID: 86 Comm: irq/124-pciehp Not tainted 5.16.0-rc2+ #621
  Hardware name: LENOVO 20U90SIT19/20U90SIT19, BIOS N2WET30W (1.20 ) 08/26/2021
  Call Trace:
   <TASK>
   dump_stack_lvl+0x59/0x73
   __lock_acquire.cold+0xc5/0x2c6
   lock_acquire+0xb5/0x2b0
   down_read+0x3e/0x50
   pciehp_check_presence+0x23/0x80
   pciehp_runtime_resume+0x5c/0xa0
   device_for_each_child+0x45/0x70
   pcie_port_device_runtime_resume+0x20/0x30
   pci_pm_runtime_resume+0xa7/0xc0
   __rpm_callback+0x41/0x110
   rpm_callback+0x59/0x70
   rpm_resume+0x512/0x7b0
   __pm_runtime_resume+0x4a/0x90
   __device_release_driver+0x28/0x240
   device_release_driver+0x26/0x40
   pci_stop_bus_device+0x68/0x90
   pci_stop_bus_device+0x2c/0x90
   pci_stop_and_remove_bus_device+0xe/0x20
   pciehp_unconfigure_device+0x6c/0x110
   pciehp_disable_slot+0x5b/0xe0
   pciehp_handle_presence_or_link_change+0xc3/0x2f0
   pciehp_ist+0x179/0x180

This lockdep warning is triggered because with Thunderbolt, hotplug ports
are nested. When removing multiple devices in a daisy-chain, each hotplug
port's reset_lock may be acquired recursively. It's never the same lock, so
the lockdep splat is a false positive.

Because locks at the same hierarchy level are never acquired recursively, a
per-level lockdep class is sufficient to fix the lockdep warning.

The choice to use one lockdep subclass per pcie-hotplug controller in the
path to the root-bus was made to conserve class keys because their number
is limited and the complexity grows quadratically with number of keys
according to Documentation/locking/lockdep-design.rst.

Link: https://lore.kernel.org/linux-pci/20190402021933.GA2966@mit.edu/
Link: https://lore.kernel.org/linux-pci/de684a28-9038-8fc6-27ca-3f6f2f6400d7@redhat.com/
Link: https://lore.kernel.org/r/20211217141709.379663-1-hdegoede@redhat.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208855
Reported-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Jan 28, 2022
…rors

commit 085a9f4 upstream.

Use down_read_nested() and down_write_nested() when taking the
ctrl->reset_lock rw-sem, passing the number of PCIe hotplug controllers in
the path to the PCI root bus as lock subclass parameter.

This fixes the following false-positive lockdep report when unplugging a
Lenovo X1C8 from a Lenovo 2nd gen TB3 dock:

  pcieport 0000:06:01.0: pciehp: Slot(1): Link Down
  pcieport 0000:06:01.0: pciehp: Slot(1): Card not present
  ============================================
  WARNING: possible recursive locking detected
  5.16.0-rc2+ #621 Not tainted
  --------------------------------------------
  irq/124-pciehp/86 is trying to acquire lock:
  ffff8e5ac4299ef8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_check_presence+0x23/0x80

  but task is already holding lock:
  ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180

   other info that might help us debug this:
   Possible unsafe locking scenario:

	 CPU0
	 ----
    lock(&ctrl->reset_lock);
    lock(&ctrl->reset_lock);

   *** DEADLOCK ***

   May be due to missing lock nesting notation

  3 locks held by irq/124-pciehp/86:
   #0: ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
   #1: ffffffffa3b024e8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pciehp_unconfigure_device+0x31/0x110
   #2: ffff8e5ac1ee2248 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x1c/0x40

  stack backtrace:
  CPU: 4 PID: 86 Comm: irq/124-pciehp Not tainted 5.16.0-rc2+ #621
  Hardware name: LENOVO 20U90SIT19/20U90SIT19, BIOS N2WET30W (1.20 ) 08/26/2021
  Call Trace:
   <TASK>
   dump_stack_lvl+0x59/0x73
   __lock_acquire.cold+0xc5/0x2c6
   lock_acquire+0xb5/0x2b0
   down_read+0x3e/0x50
   pciehp_check_presence+0x23/0x80
   pciehp_runtime_resume+0x5c/0xa0
   device_for_each_child+0x45/0x70
   pcie_port_device_runtime_resume+0x20/0x30
   pci_pm_runtime_resume+0xa7/0xc0
   __rpm_callback+0x41/0x110
   rpm_callback+0x59/0x70
   rpm_resume+0x512/0x7b0
   __pm_runtime_resume+0x4a/0x90
   __device_release_driver+0x28/0x240
   device_release_driver+0x26/0x40
   pci_stop_bus_device+0x68/0x90
   pci_stop_bus_device+0x2c/0x90
   pci_stop_and_remove_bus_device+0xe/0x20
   pciehp_unconfigure_device+0x6c/0x110
   pciehp_disable_slot+0x5b/0xe0
   pciehp_handle_presence_or_link_change+0xc3/0x2f0
   pciehp_ist+0x179/0x180

This lockdep warning is triggered because with Thunderbolt, hotplug ports
are nested. When removing multiple devices in a daisy-chain, each hotplug
port's reset_lock may be acquired recursively. It's never the same lock, so
the lockdep splat is a false positive.

Because locks at the same hierarchy level are never acquired recursively, a
per-level lockdep class is sufficient to fix the lockdep warning.

The choice to use one lockdep subclass per pcie-hotplug controller in the
path to the root-bus was made to conserve class keys because their number
is limited and the complexity grows quadratically with number of keys
according to Documentation/locking/lockdep-design.rst.

Link: https://lore.kernel.org/linux-pci/20190402021933.GA2966@mit.edu/
Link: https://lore.kernel.org/linux-pci/de684a28-9038-8fc6-27ca-3f6f2f6400d7@redhat.com/
Link: https://lore.kernel.org/r/20211217141709.379663-1-hdegoede@redhat.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208855
Reported-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Jan 28, 2022
…rors

commit 085a9f4 upstream.

Use down_read_nested() and down_write_nested() when taking the
ctrl->reset_lock rw-sem, passing the number of PCIe hotplug controllers in
the path to the PCI root bus as lock subclass parameter.

This fixes the following false-positive lockdep report when unplugging a
Lenovo X1C8 from a Lenovo 2nd gen TB3 dock:

  pcieport 0000:06:01.0: pciehp: Slot(1): Link Down
  pcieport 0000:06:01.0: pciehp: Slot(1): Card not present
  ============================================
  WARNING: possible recursive locking detected
  5.16.0-rc2+ #621 Not tainted
  --------------------------------------------
  irq/124-pciehp/86 is trying to acquire lock:
  ffff8e5ac4299ef8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_check_presence+0x23/0x80

  but task is already holding lock:
  ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180

   other info that might help us debug this:
   Possible unsafe locking scenario:

	 CPU0
	 ----
    lock(&ctrl->reset_lock);
    lock(&ctrl->reset_lock);

   *** DEADLOCK ***

   May be due to missing lock nesting notation

  3 locks held by irq/124-pciehp/86:
   #0: ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
   #1: ffffffffa3b024e8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pciehp_unconfigure_device+0x31/0x110
   #2: ffff8e5ac1ee2248 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x1c/0x40

  stack backtrace:
  CPU: 4 PID: 86 Comm: irq/124-pciehp Not tainted 5.16.0-rc2+ #621
  Hardware name: LENOVO 20U90SIT19/20U90SIT19, BIOS N2WET30W (1.20 ) 08/26/2021
  Call Trace:
   <TASK>
   dump_stack_lvl+0x59/0x73
   __lock_acquire.cold+0xc5/0x2c6
   lock_acquire+0xb5/0x2b0
   down_read+0x3e/0x50
   pciehp_check_presence+0x23/0x80
   pciehp_runtime_resume+0x5c/0xa0
   device_for_each_child+0x45/0x70
   pcie_port_device_runtime_resume+0x20/0x30
   pci_pm_runtime_resume+0xa7/0xc0
   __rpm_callback+0x41/0x110
   rpm_callback+0x59/0x70
   rpm_resume+0x512/0x7b0
   __pm_runtime_resume+0x4a/0x90
   __device_release_driver+0x28/0x240
   device_release_driver+0x26/0x40
   pci_stop_bus_device+0x68/0x90
   pci_stop_bus_device+0x2c/0x90
   pci_stop_and_remove_bus_device+0xe/0x20
   pciehp_unconfigure_device+0x6c/0x110
   pciehp_disable_slot+0x5b/0xe0
   pciehp_handle_presence_or_link_change+0xc3/0x2f0
   pciehp_ist+0x179/0x180

This lockdep warning is triggered because with Thunderbolt, hotplug ports
are nested. When removing multiple devices in a daisy-chain, each hotplug
port's reset_lock may be acquired recursively. It's never the same lock, so
the lockdep splat is a false positive.

Because locks at the same hierarchy level are never acquired recursively, a
per-level lockdep class is sufficient to fix the lockdep warning.

The choice to use one lockdep subclass per pcie-hotplug controller in the
path to the root-bus was made to conserve class keys because their number
is limited and the complexity grows quadratically with number of keys
according to Documentation/locking/lockdep-design.rst.

Link: https://lore.kernel.org/linux-pci/20190402021933.GA2966@mit.edu/
Link: https://lore.kernel.org/linux-pci/de684a28-9038-8fc6-27ca-3f6f2f6400d7@redhat.com/
Link: https://lore.kernel.org/r/20211217141709.379663-1-hdegoede@redhat.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208855
Reported-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
whdgmawkd pushed a commit to whdgmawkd/linux that referenced this issue Feb 11, 2022
…rors

commit 085a9f4 upstream.

Use down_read_nested() and down_write_nested() when taking the
ctrl->reset_lock rw-sem, passing the number of PCIe hotplug controllers in
the path to the PCI root bus as lock subclass parameter.

This fixes the following false-positive lockdep report when unplugging a
Lenovo X1C8 from a Lenovo 2nd gen TB3 dock:

  pcieport 0000:06:01.0: pciehp: Slot(1): Link Down
  pcieport 0000:06:01.0: pciehp: Slot(1): Card not present
  ============================================
  WARNING: possible recursive locking detected
  5.16.0-rc2+ raspberrypi#621 Not tainted
  --------------------------------------------
  irq/124-pciehp/86 is trying to acquire lock:
  ffff8e5ac4299ef8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_check_presence+0x23/0x80

  but task is already holding lock:
  ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180

   other info that might help us debug this:
   Possible unsafe locking scenario:

	 CPU0
	 ----
    lock(&ctrl->reset_lock);
    lock(&ctrl->reset_lock);

   *** DEADLOCK ***

   May be due to missing lock nesting notation

  3 locks held by irq/124-pciehp/86:
   #0: ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
   raspberrypi#1: ffffffffa3b024e8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pciehp_unconfigure_device+0x31/0x110
   raspberrypi#2: ffff8e5ac1ee2248 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x1c/0x40

  stack backtrace:
  CPU: 4 PID: 86 Comm: irq/124-pciehp Not tainted 5.16.0-rc2+ raspberrypi#621
  Hardware name: LENOVO 20U90SIT19/20U90SIT19, BIOS N2WET30W (1.20 ) 08/26/2021
  Call Trace:
   <TASK>
   dump_stack_lvl+0x59/0x73
   __lock_acquire.cold+0xc5/0x2c6
   lock_acquire+0xb5/0x2b0
   down_read+0x3e/0x50
   pciehp_check_presence+0x23/0x80
   pciehp_runtime_resume+0x5c/0xa0
   device_for_each_child+0x45/0x70
   pcie_port_device_runtime_resume+0x20/0x30
   pci_pm_runtime_resume+0xa7/0xc0
   __rpm_callback+0x41/0x110
   rpm_callback+0x59/0x70
   rpm_resume+0x512/0x7b0
   __pm_runtime_resume+0x4a/0x90
   __device_release_driver+0x28/0x240
   device_release_driver+0x26/0x40
   pci_stop_bus_device+0x68/0x90
   pci_stop_bus_device+0x2c/0x90
   pci_stop_and_remove_bus_device+0xe/0x20
   pciehp_unconfigure_device+0x6c/0x110
   pciehp_disable_slot+0x5b/0xe0
   pciehp_handle_presence_or_link_change+0xc3/0x2f0
   pciehp_ist+0x179/0x180

This lockdep warning is triggered because with Thunderbolt, hotplug ports
are nested. When removing multiple devices in a daisy-chain, each hotplug
port's reset_lock may be acquired recursively. It's never the same lock, so
the lockdep splat is a false positive.

Because locks at the same hierarchy level are never acquired recursively, a
per-level lockdep class is sufficient to fix the lockdep warning.

The choice to use one lockdep subclass per pcie-hotplug controller in the
path to the root-bus was made to conserve class keys because their number
is limited and the complexity grows quadratically with number of keys
according to Documentation/locking/lockdep-design.rst.

Link: https://lore.kernel.org/linux-pci/20190402021933.GA2966@mit.edu/
Link: https://lore.kernel.org/linux-pci/de684a28-9038-8fc6-27ca-3f6f2f6400d7@redhat.com/
Link: https://lore.kernel.org/r/20211217141709.379663-1-hdegoede@redhat.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208855
Reported-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants