Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Synopsys USB OTG driver to v2.94a #68

Closed
wants to merge 9 commits into from
Closed

Update Synopsys USB OTG driver to v2.94a #68

wants to merge 9 commits into from

Conversation

ddv2005
Copy link

@ddv2005 ddv2005 commented Jul 30, 2012

This is update for Synopsys USB OTG driver to version 2.94a. It is very clean and looks much better than the current version.

@popcornmix
Copy link
Collaborator

Have you tested this much?
Does this have all patches that have been applied to 2.90 over the last few months?

We did originally get v2.94a working just before the first github kernel release, but at that time we could make it lock up much more easily than the 2.90 driver. (I believe running netcat would make it hang).
Now, there have been numerous fixes to the kernel (both in dwc_otg and outside), the firmware and the distribution (e.g. vm.min_free_kbytes) which may have cured this hang.

I have been thinking about revisiting this, so the patch is welcome, but it will need some testing.

@popcornmix
Copy link
Collaborator

Does it build when applied to a clean tree for you? I get:
drivers/usb/host/dwc_otg/dwc_otg_driver.c:51:28: fatal error: dwc_otg_os_dep.h: No such file or directory compilation terminated.

@ddv2005
Copy link
Author

ddv2005 commented Jul 30, 2012

I did not tested it much but I guess that 2.94 have to be base for all
feature changes. Because current driver is missing many basic locks
and it need so much refactoring to be safe.

I found only 2 patches. I have applied patch to change count of host channels
from 12 to 8. But my patch for USB audio is not necessary
because 2.94 work fine without it.

Could you provide command line to check netcat hang?

I just forgot to add new files. Now new files on place

@popcornmix
Copy link
Collaborator

I think these fixes are still needed:
Fixed issue with some keyboards giving too much data : f679f05
Fix for DWC OTG HCD URB Dequeue has NULL URB panic: d25ac7a
Fix (hopefully) for DWC_MEMCPY kernel panics: 30b708f

@asb
Copy link

asb commented Jul 31, 2012

The netcat lockup when we last looked at 2.94 was reproducible like so:

$ cat ncrecv
#!/bin/sh
nc -l -p 1234 -v > /dev/null

$ cat ncsend
#!/bin/sh
nc $1 1234 < /dev/zero

On the pi: ./ncrev and on the computer connected to the pi: ./ncsend $RASPI_IP. This would produce a lockup within 10 minutes or so (likely much less). Neither the smsc95xx.turbo_mode=N or vm.min_free_kbytes would have an effect. I agree that if we can verify this issue either no longer occurs or can fix it, 2.94 is probably a better base for future development.

@popcornmix
Copy link
Collaborator

The netcat test does fail with this pull request appied.
The sending end just returns. The Pi just hangs. No panic shown, but ctrl-c / ctrl-z don't stop it.

Using sysreq to interrupt it works, and I can see in dmesg log

<4>[  417.122159] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
<4>[  417.122209] smsc95xx 1-1.1:1.0: eth0: MII is busy in smsc95xx_mdio_read
<4>[  422.121795] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
<4>[  422.121829] smsc95xx 1-1.1:1.0: eth0: MII is busy in smsc95xx_mdio_read
<4>[  428.221322] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
<5>[  432.410997] nfs: server 10.177.65.68 not responding, still trying
<4>[  433.220955] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000114
<4>[  438.220594] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
<4>[  443.220229] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000118
<4>[  448.219875] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
<4>[  453.219507] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000114
<4>[  458.219135] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
<4>[  463.218777] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000118
<5>[  468.768378] nfs: server 10.177.65.68 not responding, still trying
<4>[  497.176209] ------------[ cut here ]------------
<4>[  497.176271] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x2e8/0x308()
<6>[  497.176290] NETDEV WATCHDOG: eth0 (smsc95xx): transmit queue 0 timed out
<4>[  497.176316] Modules linked in: snd_bcm2835 snd_pcm snd_seq snd_timer snd_seq_device snd snd_page_alloc evdev
more>
<4>[  497.176418] [<c0015004>] (unwind_backtrace+0x0/0xfc) from [<c03fb1f0>] (dump_stack+0x20/0x24)
<4>[  497.176474] [<c03fb1f0>] (dump_stack+0x20/0x24) from [<c002e77c>] (warn_slowpath_common+0x5c/0x74)
<4>[  497.176513] [<c002e77c>] (warn_slowpath_common+0x5c/0x74) from [<c002e850>] (warn_slowpath_fmt+0x40/0x48)
<4>[  497.176549] [<c002e850>] (warn_slowpath_fmt+0x40/0x48) from [<c0363d20>] (dev_watchdog+0x2e8/0x308)
<4>[  497.176606] [<c0363d20>] (dev_watchdog+0x2e8/0x308) from [<c003e7e0>] (run_timer_softirq+0x158/0x41c)
<4>[  497.176654] [<c003e7e0>] (run_timer_softirq+0x158/0x41c) from [<c00358d4>] (__do_softirq+0xbc/0x23c)
<4>[  497.176703] [<c00358d4>] (__do_softirq+0xbc/0x23c) from [<c0035f30>] (irq_exit+0xa8/0xb4)
<4>[  497.176750] [<c0035f30>] (irq_exit+0xa8/0xb4) from [<c000ef50>] (handle_IRQ+0x44/0x94)
<4>[  497.176785] [<c000ef50>] (handle_IRQ+0x44/0x94) from [<c00084a0>] (asm_do_IRQ+0x18/0x1c)
<4>[  497.176843] [<c00084a0>] (asm_do_IRQ+0x18/0x1c) from [<c04016b8>] (__irq_svc+0x38/0xd0)
<4>[  497.176865] Exception stack(0xc05d3f38 to 0xc05d3f80)
<4>[  497.176885] 3f20:                                                       c05d4e94 00000000
<4>[  497.176912] 3f40: c05d3f80 00000000 c05d2000 c0603684 c05d902c c08529a0 00004008 410fb767
<4>[  497.176951] 3f60: 005ba91c c05d3f8c c05d3f80 c05d3f80 c000f0cc c000f0d0 60000013 ffffffff
<4>[  497.176994] [<c04016b8>] (__irq_svc+0x38/0xd0) from [<c000f0d0>] (default_idle+0x38/0x3c)
<4>[  497.177029] [<c000f0d0>] (default_idle+0x38/0x3c) from [<c000f30c>] (cpu_idle+0xb8/0xe4)
<4>[  497.177076] [<c000f30c>] (cpu_idle+0xb8/0xe4) from [<c03fa094>] (rest_init+0xa0/0xb8)
<4>[  497.177115] [<c03fa094>] (rest_init+0xa0/0xb8) from [<c05a07a4>] (start_kernel+0x294/0x2e0)
<4>[  497.177138] ---[ end trace 128ecba635631019 ]---
<5>[  509.495403] nfs: server 10.177.65.68 not responding, still trying
<5>[  517.204853] nfs: server 10.177.65.68 not responding, still trying
<5>[  593.099366] nfs: server 10.177.65.68 not responding, still trying
more>
<5>[ 1768.913811] nfs: server 10.177.65.68 not responding, still trying
<5>[ 1796.411805] nfs: server 10.177.65.68 not responding, still trying

@popcornmix
Copy link
Collaborator

What effect does the additional commit have? Can you see the netcat hang, and does this commit help?

@ddv2005
Copy link
Author

ddv2005 commented Aug 5, 2012

Unfortunately It still hangs but I still trying to find the problem.

@ddv2005
Copy link
Author

ddv2005 commented Aug 10, 2012

Hi,

I found that it hangs because driver stopped to receive host channel interrupts, but I can't find why it is happened. I have detailed log file and it is look fine https://gist.github.com/3315356.

1296.983854] [ 1281.254588] <7>[ 1281.254589] DWC_otg: DWC OTG HCD Interrupt Detected gintsts&gintmsk=0x02000000 core_if=ce8f0000
[ 1297.002106] [ 1281.254599] <7>[ 1281.254600] DWC_otg: --Host Channel Interrupt--, Channel 3
[ 1297.017205] [ 1281.254605] <7>[ 1281.254606] DWC_otg: hcint 0x00000023, hcintmsk 0x00000006, hcint&hcintmsk 0x00000002
[ 1297.035009] [ 1281.254610] <7>[ 1281.254611] DWC_otg: --Host Channel 3 Interrupt: Channel Halted--
[ 1297.051095] [ 1281.254615] <7>[ 1281.254616] DWC_otg: --Host Channel 3 Interrupt: Transfer Complete--
[ 1297.067541] [ 1281.254619] <7>[ 1281.254620] DWC_otg: Bulk transfer complete
[ 1297.078514] [ 1281.254831] <7>[ 1281.254832] DWC_otg: DWC_otg: update_urb_state_xfer_comp: IN, channel 3
[ 1297.095195] [ 1281.254839] <7>[ 1281.254840] DWC_otg: hc->xfer_len 18944
[ 1297.105744] [ 1281.254842] <7>[ 1281.254843] DWC_otg: hctsiz.xfersize 610
[ 1297.116246] [ 1281.254845] <7>[ 1281.254846] DWC_otg: urb->transfer_buffer_length 18944
[ 1297.131254] [ 1281.254848] <7>[ 1281.254849] DWC_otg: urb->actual_length 18334
[ 1297.145469] [ 1281.254851] <7>[ 1281.254852] DWC_otg: short_read 1, xfer_done 1
[ 1297.159784] [ 1281.254856] _complete: urb ce9893a0, device 3, ep 1 IN, status=0
[ 1297.170540] [ 1281.254901] dwc_otg_urb_enqueue, urb ce9893a0
[ 1297.179608] [ 1281.254907] Device address: 3
[ 1297.187418] [ 1281.254909] Endpoint: 1, IN
[ 1297.195061] [ 1281.254912] Endpoint type: BULK
[ 1297.203036] [ 1281.254914] Speed: HIGH
[ 1297.210210] [ 1281.254916] Max packet size: 512
[ 1297.218068] [ 1281.254918] Data buffer length: 18944
[ 1297.226306] [ 1281.254920] Transfer buffer: cda50002, Transfer DMA: 4da50002
[ 1297.236614] [ 1281.254924] Setup buffer: (null), Setup DMA: (null)
[ 1297.246395] [ 1281.254927] Interval: 0
[ 1297.253285] [ 1281.254943] <7>[ 1281.254945] DWC_otg: release_channel: channel 3, halt_status 2, xfer_len 18944
[ 1297.269389] [ 1281.254953] <7>[ 1281.254953] DWC_otg: deactivate_qh(ce988000,ce9cfee0,1)
[ 1297.283430] [ 1281.254966] <7>[ 1281.254967] DWC_otg: assign_and_init_hc(ce988000,ce9cfee0) - urb ce9cf2a0, actual_length 0
[ 1297.300574] [ 1281.254980] <7>[ 1281.254981] DWC_otg: dwc_otg_hc_init: Channel 2, Dev Addr 3, EP #1
[ 1297.315687] [ 1281.254987] <7>[ 1281.254988] DWC_otg: Is In 1, Is Low Speed 0, EP Type 2, Max Pkt 512, Multi Cnt 0
[ 1297.332591] [ 1281.254995] <7>[ 1281.254996] DWC_otg: Queue non-periodic transactions
[ 1297.346746] [ 1281.254998] <7>[ 1281.254999] DWC_otg: NP Tx Req Queue Space Avail (before queue): 8
[ 1297.362334] [ 1281.255002] <7>[ 1281.255002] DWC_otg: NP Tx FIFO Space Avail (before queue): 256
[ 1297.377672] [ 1281.255010] <7>[ 1281.255010] DWC_otg: dwc_otg_hc_start_transfer: Channel 2
[ 1297.392571] [ 1281.255014] <7>[ 1281.255015] DWC_otg: Xfer Size: 18944
[ 1297.402820] [ 1281.255018] <7>[ 1281.255019] DWC_otg: Num Pkts: 37
[ 1297.412695] [ 1281.255020] <7>[ 1281.255021] DWC_otg: Start PID: 0
[ 1297.422578] [ 1281.255026] <7>[ 1281.255026] DWC_otg: transfer 2 from core_if ce8f0000
[ 1297.437410] [ 1281.255038] <7>[ 1281.255039] DWC_otg: DWC OTG HCD Finished Servicing Interrupts
[ 1297.453216] [ 1281.255041] <7>[ 1281.255042] DWC_otg: DWC OTG HCD gintsts=0x04600009
[ 1297.468173] [ 1281.255045] <7>[ 1281.255046] DWC_otg: DWC OTG HCD gintmsk=0xf300080e

At this point we are setup DMA transfer for HC 2. HC interrupts is enabled. Every 1ms it got SOF interrupt but we never got any other HC interrupts for ANY channel. After 10 seconds it got xfer timeout.

[ 1298.598884] [ 1291.271150] <4>[ 1291.271152] WARN::hc_xfer_timeout:2602: hc_xfer_timeout: timeout on channel 2
[ 1298.601855]
[ 1298.617037] [ 1291.271164] <4>[ 1291.271165] WARN::hc_xfer_timeout:2604: start_hcchar_val 0x00d88a00
[ 1298.620010]
[ 1298.634774] DWC_otg: DWC OTG HCD Interrupts
[ 1298.641409] DWC_otg: DWC OTG HCD gintsts=0x04600031
[ 1298.648642] DWC_otg: DWC OTG HCD gintmsk=0xf300080e
[ 1298.655967] WARN::hc_xfer_timeout:2615: Real start_hcchar_val 0x80d88a00

You can see that interrupts still enabled and not HC interrupts queued (0x2000000 is not set). And HC transfer still enabled (0x80000000 in hcchar).
I don't understand why it stopped. Maybe you have some advice about it.

Best Regards,
Dmitry

@popcornmix
Copy link
Collaborator

Gray who tried to get 2.94 working some time ago responded:
"I suppose the system is still operational is it? If not, could we have
system interrupts disabled still? Perhaps a lock that was never
cleared?

(As far as I remember, my investigations suggested that we never
completed the processing of a transmit(?) queue eventually (minutes)
leading to a watchdog firing. Queues are updated under a lock ...

For me the eventual consequence was that the watchdog caused the USB
driver to tidy up ... which meant that it needed to traverse the queue
in question ... which meant that it took out the queue's lock
recursively. Which wasn't good either.)"

@ddv2005
Copy link
Author

ddv2005 commented Aug 11, 2012

I checked all locks and everything fine. I compiled kernel with lock detection flags and it can't find any deadlocked locks. You can see that interrupt handler successfully exit " [ 1281.255038] [ 1281.255039] DWC_otg: DWC OTG HCD Finished Servicing Interrupts" but never enter again except the SOF. Interrupts still enabled and every 1ms it got SOF interrupt in same IRQ handler. In case of deadlock IRQ handler can't finish. It is meant that in this case is no deadlock.

I have dump of HC registers at time of problem and it is look fine (channels enabled and no pending IRQ):

[ 1577.141978] DWC_otg: DWC OTG HCD gintsts=0x04600031
[ 1577.149039] DWC_otg: DWC OTG HCD gintmsk=0xf300080e
[ 1577.156119] WARN::hc_xfer_timeout:2615: Real hcchar_val 0x80d81200
[ 1577.156131]
[ 1577.168087] Host Global Registers
[ 1577.173650] HCFG @0xCF840400 : 0x00000000
[ 1577.180021] HFIR @0xCF840404 : 0x00001D4C
[ 1577.186426] HFNUM @0xCF840408 : 0x055838DC
[ 1577.192731] HPTXSTS @0xCF840410 : 0xB9080200
[ 1577.199307] HAINT @0xCF840414 : 0x00000000
[ 1577.205721] HAINTMSK @0xCF840418 : 0x000000FF
[ 1577.212308] HPRT0 @0xCF840440 : 0x00001005
[ 1577.218718] Host Channel 0 Specific Registers
[ 1577.225295] HCCHAR @0xCF840500 : 0xC09C8801
[ 1577.231733] HCSPLT @0xCF840504 : 0x00000000
[ 1577.238183] HCINT @0xCF840508 : 0x00000000
[ 1577.244542] HCINTMSK @0xCF84050C : 0x00000006
[ 1577.251144] HCTSIZ @0xCF840510 : 0x00080001
[ 1577.257621] HCDMA @0xCF840514 : 0x4EA69C80
[ 1577.263999] Host Channel 1 Specific Registers
[ 1577.270512] HCCHAR @0xCF840520 : 0x00D88A00
[ 1577.276901] HCSPLT @0xCF840524 : 0x00000000
[ 1577.283184] HCINT @0xCF840528 : 0x00000000
[ 1577.289358] HCINTMSK @0xCF84052C : 0x00000000
[ 1577.295734] HCTSIZ @0xCF840530 : 0xC0080262
[ 1577.301907] HCDMA @0xCF840534 : 0x4DDF47A0
[ 1577.308071] Host Channel 2 Specific Registers
[ 1577.314428] HCCHAR @0xCF840540 : 0x80D81200
[ 1577.320612] HCSPLT @0xCF840544 : 0x00000000
[ 1577.326768] HCINT @0xCF840548 : 0x00000000
[ 1577.332808] HCINTMSK @0xCF84054C : 0x00000006
[ 1577.339132] HCTSIZ @0xCF840550 : 0x0008004E
[ 1577.345313] HCDMA @0xCF840554 : 0x4DE80000
[ 1577.351407] Host Channel 3 Specific Registers
[ 1577.357717] HCCHAR @0xCF840560 : 0xE0DC9810
[ 1577.363923] HCSPLT @0xCF840564 : 0x00000000
[ 1577.370080] HCINT @0xCF840568 : 0x00000000
[ 1577.376163] HCINTMSK @0xCF84056C : 0x00000006
[ 1577.382482] HCTSIZ @0xCF840570 : 0x40080010
[ 1577.388660] HCDMA @0xCF840574 : 0x4EA57680
[ 1577.394755] Host Channel 4 Specific Registers
[ 1577.401013] HCCHAR @0xCF840580 : 0x00D81200
[ 1577.407251] HCSPLT @0xCF840584 : 0x00000000
[ 1577.413371] HCINT @0xCF840588 : 0x00000000
[ 1577.419470] HCINTMSK @0xCF84058C : 0x00000000
[ 1577.425828] HCTSIZ @0xCF840590 : 0x4000004E
[ 1577.431954] HCDMA @0xCF840594 : 0x4DE80050
[ 1577.438061] Host Channel 5 Specific Registers
[ 1577.444351] HCCHAR @0xCF8405A0 : 0x00D81200
[ 1577.450525] HCSPLT @0xCF8405A4 : 0x00000000
[ 1577.456701] HCINT @0xCF8405A8 : 0x00000000
[ 1577.462707] HCINTMSK @0xCF8405AC : 0x00000000
[ 1577.469043] HCTSIZ @0xCF8405B0 : 0x0000004E
[ 1577.475200] HCDMA @0xCF8405B4 : 0x4DE80050
[ 1577.481246] Host Channel 6 Specific Registers
[ 1577.487535] HCCHAR @0xCF8405C0 : 0x20DC9810
[ 1577.493729] HCSPLT @0xCF8405C4 : 0x00000000
[ 1577.499875] HCINT @0xCF8405C8 : 0x00000000
[ 1577.505948] HCINTMSK @0xCF8405CC : 0x00000000
[ 1577.512220] HCTSIZ @0xCF8405D0 : 0xC0080010
[ 1577.518389] HCDMA @0xCF8405D4 : 0x4EA57680
[ 1577.524505] Host Channel 7 Specific Registers
[ 1577.530728] HCCHAR @0xCF8405E0 : 0x80D88A00
[ 1577.536940] HCSPLT @0xCF8405E4 : 0x00000000
[ 1577.543074] HCINT @0xCF8405E8 : 0x00000020
[ 1577.549170] HCINTMSK @0xCF8405EC : 0x00000006
[ 1577.555525] HCTSIZ @0xCF8405F0 : 0x00A02C00
[ 1577.561622] HCDMA @0xCF8405F4 : 0x4DDF1E00

@popcornmix
Copy link
Collaborator

I'm afraid I don't know a great deal about this driver, but is it possible to identify the cause of the netcat lockup by gradually merging from 2.90 to 2.94 and find which change breaks netcat?

This is straightforward when all changes are independent. Obviously not all will be, which makes this much tougher to do, but it may be worth considering.

@ddv2005
Copy link
Author

ddv2005 commented Aug 11, 2012

I am not sure that it is possible. 2.94 have so many core changes. Next week I will try to compare 2.90 and 2.94 logs.

@ddv2005
Copy link
Author

ddv2005 commented Aug 13, 2012

I hope that I found the problem and now I am testing the fix. It is look like they just remove Broadcom workaround for DMA burst len.

@popcornmix
Copy link
Collaborator

Interesting! I'm keen to move to this code if testing is good. Fingers crossed...

@ddv2005
Copy link
Author

ddv2005 commented Aug 14, 2012

7+ hours of testing and ... it still work :-)

@popcornmix
Copy link
Collaborator

Do you think you can rebase this against latest code?

It seems to have sdcard commits included, which don't apply (a second time).
Also the SOF patch has been reverted, which affects the DWC_OTG code.

@ddv2005
Copy link
Author

ddv2005 commented Aug 14, 2012

I did it but now I can't compile the kernel because it is required removed __aeabi_uldivmod . Could you revert this patch 9592f71#arch/arm/boot ?

@ddv2005
Copy link
Author

ddv2005 commented Aug 14, 2012

#83

@ddv2005 ddv2005 closed this Aug 14, 2012
@haolongzhang
Copy link

dear ddv2005
What is the final solution about blow log ,Thanks
[33525.261931@0] WARN::hc_xfer_timeout:2647: hc_xfer_timeout: timeout on channel 9
[33525.263649@0] WARN::hc_xfer_timeout:2649: start_hcchar_val 0x0258a200
[33525.270171@0] WARN::hc_xfer_timeout:2654: chn-9,ep4-IN:type:2,speed:2,len:4096,addr9

@jeanlemotan jeanlemotan mentioned this pull request Feb 1, 2015
anholt referenced this pull request in anholt/linux Aug 20, 2016
Below call chain causes system crash when OMAP device is
removed by calling of_platform_depopulate()/device_del():

device_del()
- blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 			     BUS_NOTIFY_DEL_DEVICE, dev);
  - _omap_device_notifier_call()
    - omap_device_delete()
      - od->pdev->archdata.od = NULL;
	kfree(od->hwmods);
	kfree(od);
  - bus_remove_device()
    - device_release_driver()
      - __device_release_driver()
	- pm_runtime_get_sync()
	   - _od_runtime_resume()
	     - omap_hwmod_enable() <- OOPS od's delted already

Backtrace:
Unable to handle kernel NULL pointer dereference at virtual address 0000000d
pgd = eb100000
[0000000d] *pgd=ad6e1831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] PREEMPT SMP ARM
CPU: 1 PID: 1273 Comm: modprobe Not tainted 4.4.15-rt19-00115-ge4d3cd3-dirty #68
Hardware name: Generic DRA74X (Flattened Device Tree)
task: eb1ee800 ti: ec962000 task.ti: ec962000
PC is at omap_device_enable+0x10/0x90
LR is at _od_runtime_resume+0x10/0x24
[...]
[<c00299dc>] (omap_device_enable) from [<c0029a6c>] (_od_runtime_resume+0x10/0x24)
[<c0029a6c>] (_od_runtime_resume) from [<c04ad404>] (__rpm_callback+0x20/0x34)
[<c04ad404>] (__rpm_callback) from [<c04ad438>] (rpm_callback+0x20/0x80)
[<c04ad438>] (rpm_callback) from [<c04aee28>] (rpm_resume+0x48c/0x964)
[<c04aee28>] (rpm_resume) from [<c04af360>] (__pm_runtime_resume+0x60/0x88)
[<c04af360>] (__pm_runtime_resume) from [<c04a4974>] (__device_release_driver+0x30/0x100)
[<c04a4974>] (__device_release_driver) from [<c04a4a60>] (device_release_driver+0x1c/0x28)
[<c04a4a60>] (device_release_driver) from [<c04a38c0>] (bus_remove_device+0xec/0x144)
[<c04a38c0>] (bus_remove_device) from [<c04a0764>] (device_del+0x10c/0x210)
[<c04a0764>] (device_del) from [<c04a67b0>] (platform_device_del+0x18/0x84)
[<c04a67b0>] (platform_device_del) from [<c04a6828>] (platform_device_unregister+0xc/0x20)
[<c04a6828>] (platform_device_unregister) from [<c05adcfc>] (of_platform_device_destroy+0x8c/0x90)
[<c05adcfc>] (of_platform_device_destroy) from [<c04a02f0>] (device_for_each_child+0x4c/0x78)
[<c04a02f0>] (device_for_each_child) from [<c05adc5c>] (of_platform_depopulate+0x30/0x44)
[<c05adc5c>] (of_platform_depopulate) from [<bf123920>] (cpsw_remove+0x68/0xf4 [ti_cpsw])
[<bf123920>] (cpsw_remove [ti_cpsw]) from [<c04a68d8>] (platform_drv_remove+0x24/0x3c)
[<c04a68d8>] (platform_drv_remove) from [<c04a49c8>] (__device_release_driver+0x84/0x100)
[<c04a49c8>] (__device_release_driver) from [<c04a4b20>] (driver_detach+0xac/0xb0)
[<c04a4b20>] (driver_detach) from [<c04a3be8>] (bus_remove_driver+0x60/0xd4)
[<c04a3be8>] (bus_remove_driver) from [<c00d9870>] (SyS_delete_module+0x184/0x20c)
[<c00d9870>] (SyS_delete_module) from [<c0010540>] (ret_fast_syscall+0x0/0x1c)
Code: e3500000 e92d4070 1590630c 01a06000 (e5d6300d)

Hence, fix it by using BUS_NOTIFY_REMOVED_DEVICE event for OMAP device
deletion which is sent when DD has finished processing of device
deletion.

Cc: Tony Lindgren <tony@atomide.com>
Cc: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
popcornmix pushed a commit that referenced this pull request Sep 5, 2016
of_mm_gpiochip_add_data() calls mm_gc->save_regs() before
setting the data. Therefore ->save_regs() cannot use
gpiochip_get_data()

[    0.275940] Unable to handle kernel paging request for data at address 0x00000130
[    0.283120] Faulting instruction address: 0xc01b44cc
[    0.288175] Oops: Kernel access of bad area, sig: 11 [#1]
[    0.293343] PREEMPT CMPC885
[    0.296141] CPU: 0 PID: 1 Comm: swapper Not tainted 4.7.0-g65124df-dirty #68
[    0.304131] task: c6074000 ti: c6080000 task.ti: c6080000
[    0.309459] NIP: c01b44cc LR: c0011720 CTR: c0011708
[    0.314372] REGS: c6081d90 TRAP: 0300   Not tainted  (4.7.0-g65124df-dirty)
[    0.322267] MSR: 00009032 <EE,ME,IR,DR,RI>  CR: 24000028  XER: 20000000
[    0.328813] DAR: 00000130 DSISR: c0000000
GPR00: c01b6d0c c6081e40 c6074000 c6017000 c9028000 c601d028 c6081dd8 00000000
GPR08: c601d028 00000000 ffffffff 00000001 24000044 00000000 c0002790 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 c05643b0 00000083
GPR24: c04a1a6c c0560000 c04a8308 c04c6480 c0012498 c6017000 c7ffcc78 c6017000
[    0.360806] NIP [c01b44cc] gpiochip_get_data+0x4/0xc
[    0.365684] LR [c0011720] cpm1_gpio16_save_regs+0x18/0x44
[    0.370972] Call Trace:
[    0.373451] [c6081e50] [c01b6d0c] of_mm_gpiochip_add_data+0x70/0xdc
[    0.379624] [c6081e70] [c00124c0] cpm_init_par_io+0x28/0x118
[    0.385238] [c6081e80] [c04a8ac0] do_one_initcall+0xb0/0x17c
[    0.390819] [c6081ef0] [c04a8cbc] kernel_init_freeable+0x130/0x1dc
[    0.396924] [c6081f30] [c00027a4] kernel_init+0x14/0x110
[    0.402177] [c6081f40] [c000b424] ret_from_kernel_thread+0x5c/0x64
[    0.408233] Instruction dump:
[    0.411168] 4182fafc 3f80c040 48234c6d 3bc0fff0 3b9c5ed0 4bfffaf4 81290020 712a0004
[    0.418825] 4182fb34 48234c51 4bfffb2c 81230004 <80690130> 4e800020 7c0802a6 9421ffe0
[    0.426763] ---[ end trace fe4113ee21d72ffa ]---

fixes: e65078f ("powerpc: sysdev: cpm1: use gpiochip data pointer")
fixes: a14a2d4 ("powerpc: cpm_common: use gpiochip data pointer")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
popcornmix pushed a commit that referenced this pull request Sep 24, 2016
commit 41017a7 upstream.

of_mm_gpiochip_add_data() calls mm_gc->save_regs() before
setting the data. Therefore ->save_regs() cannot use
gpiochip_get_data()

[    0.275940] Unable to handle kernel paging request for data at address 0x00000130
[    0.283120] Faulting instruction address: 0xc01b44cc
[    0.288175] Oops: Kernel access of bad area, sig: 11 [#1]
[    0.293343] PREEMPT CMPC885
[    0.296141] CPU: 0 PID: 1 Comm: swapper Not tainted 4.7.0-g65124df-dirty #68
[    0.304131] task: c6074000 ti: c6080000 task.ti: c6080000
[    0.309459] NIP: c01b44cc LR: c0011720 CTR: c0011708
[    0.314372] REGS: c6081d90 TRAP: 0300   Not tainted  (4.7.0-g65124df-dirty)
[    0.322267] MSR: 00009032 <EE,ME,IR,DR,RI>  CR: 24000028  XER: 20000000
[    0.328813] DAR: 00000130 DSISR: c0000000
GPR00: c01b6d0c c6081e40 c6074000 c6017000 c9028000 c601d028 c6081dd8 00000000
GPR08: c601d028 00000000 ffffffff 00000001 24000044 00000000 c0002790 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 c05643b0 00000083
GPR24: c04a1a6c c0560000 c04a8308 c04c6480 c0012498 c6017000 c7ffcc78 c6017000
[    0.360806] NIP [c01b44cc] gpiochip_get_data+0x4/0xc
[    0.365684] LR [c0011720] cpm1_gpio16_save_regs+0x18/0x44
[    0.370972] Call Trace:
[    0.373451] [c6081e50] [c01b6d0c] of_mm_gpiochip_add_data+0x70/0xdc
[    0.379624] [c6081e70] [c00124c0] cpm_init_par_io+0x28/0x118
[    0.385238] [c6081e80] [c04a8ac0] do_one_initcall+0xb0/0x17c
[    0.390819] [c6081ef0] [c04a8cbc] kernel_init_freeable+0x130/0x1dc
[    0.396924] [c6081f30] [c00027a4] kernel_init+0x14/0x110
[    0.402177] [c6081f40] [c000b424] ret_from_kernel_thread+0x5c/0x64
[    0.408233] Instruction dump:
[    0.411168] 4182fafc 3f80c040 48234c6d 3bc0fff0 3b9c5ed0 4bfffaf4 81290020 712a0004
[    0.418825] 4182fb34 48234c51 4bfffb2c 81230004 <80690130> 4e800020 7c0802a6 9421ffe0
[    0.426763] ---[ end trace fe4113ee21d72ffa ]---

fixes: e65078f ("powerpc: sysdev: cpm1: use gpiochip data pointer")
fixes: a14a2d4 ("powerpc: cpm_common: use gpiochip data pointer")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this pull request Jul 20, 2020
… list

commit 04e484c upstream.

[BUG]
The following small test script can trigger ASSERT() at unmount time:

  mkfs.btrfs -f $dev
  mount $dev $mnt
  mount -o remount,discard=async $mnt
  umount $mnt

The call trace:
  assertion failed: atomic_read(&block_group->count) == 1, in fs/btrfs/block-group.c:3431
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/ctree.h:3204!
  invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 4 PID: 10389 Comm: umount Tainted: G           O      5.8.0-rc3-custom+ #68
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Call Trace:
   btrfs_free_block_groups.cold+0x22/0x55 [btrfs]
   close_ctree+0x2cb/0x323 [btrfs]
   btrfs_put_super+0x15/0x17 [btrfs]
   generic_shutdown_super+0x72/0x110
   kill_anon_super+0x18/0x30
   btrfs_kill_super+0x17/0x30 [btrfs]
   deactivate_locked_super+0x3b/0xa0
   deactivate_super+0x40/0x50
   cleanup_mnt+0x135/0x190
   __cleanup_mnt+0x12/0x20
   task_work_run+0x64/0xb0
   __prepare_exit_to_usermode+0x1bc/0x1c0
   __syscall_return_slowpath+0x47/0x230
   do_syscall_64+0x64/0xb0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

The code:
                ASSERT(atomic_read(&block_group->count) == 1);
                btrfs_put_block_group(block_group);

[CAUSE]
Obviously it's some btrfs_get_block_group() call doesn't get its put
call.

The offending btrfs_get_block_group() happens here:

  void btrfs_mark_bg_unused(struct btrfs_block_group *bg)
  {
  	if (list_empty(&bg->bg_list)) {
  		btrfs_get_block_group(bg);
		list_add_tail(&bg->bg_list, &fs_info->unused_bgs);
  	}
  }

So every call sites removing the block group from unused_bgs list should
reduce the ref count of that block group.

However for async discard, it didn't follow the call convention:

  void btrfs_discard_punt_unused_bgs_list(struct btrfs_fs_info *fs_info)
  {
  	list_for_each_entry_safe(block_group, next, &fs_info->unused_bgs,
  				 bg_list) {
  		list_del_init(&block_group->bg_list);
  		btrfs_discard_queue_work(&fs_info->discard_ctl, block_group);
  	}
  }

And in btrfs_discard_queue_work(), it doesn't call
btrfs_put_block_group() either.

[FIX]
Fix the problem by reducing the reference count when we grab the block
group from unused_bgs list.

Reported-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Fixes: 6e80d4f ("btrfs: handle empty block_group removal for async discard")
CC: stable@vger.kernel.org # 5.6+
Tested-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this pull request Nov 3, 2020
Each EMAD transaction stores the skb used to issue the EMAD request
('trans->tx_skb') so that the request could be retried in case of a
timeout. The skb can be freed when a corresponding response is received
or as part of the retry logic (e.g., failed retransmit, exceeded maximum
number of retries).

The two tasks (i.e., response processing and retransmits) are
synchronized by the atomic 'trans->active' field which ensures that
responses to inactive transactions are ignored.

In case of a failed retransmit the transaction is finished and all of
its resources are freed. However, the current code does not mark it as
inactive. Syzkaller was able to hit a race condition in which a
concurrent response is processed while the transaction's resources are
being freed, resulting in a use-after-free [1].

Fix the issue by making sure to mark the transaction as inactive after a
failed retransmit and free its resources only if a concurrent task did
not already do that.

[1]
BUG: KASAN: use-after-free in consume_skb+0x30/0x370
net/core/skbuff.c:833
Read of size 4 at addr ffff88804f570494 by task syz-executor.0/1004

CPU: 0 PID: 1004 Comm: syz-executor.0 Not tainted 5.8.0-rc7+ #68
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xf6/0x16e lib/dump_stack.c:118
 print_address_description.constprop.0+0x1c/0x250
mm/kasan/report.c:383
 __kasan_report mm/kasan/report.c:513 [inline]
 kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
 check_memory_region_inline mm/kasan/generic.c:186 [inline]
 check_memory_region+0x14e/0x1b0 mm/kasan/generic.c:192
 instrument_atomic_read include/linux/instrumented.h:56 [inline]
 atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
 refcount_read include/linux/refcount.h:147 [inline]
 skb_unref include/linux/skbuff.h:1044 [inline]
 consume_skb+0x30/0x370 net/core/skbuff.c:833
 mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592
 mlxsw_emad_process_response drivers/net/ethernet/mellanox/mlxsw/core.c:651 [inline]
 mlxsw_emad_rx_listener_func+0x5c9/0xac0 drivers/net/ethernet/mellanox/mlxsw/core.c:672
 mlxsw_core_skb_receive+0x4df/0x770 drivers/net/ethernet/mellanox/mlxsw/core.c:2063
 mlxsw_pci_cqe_rdq_handle drivers/net/ethernet/mellanox/mlxsw/pci.c:595 [inline]
 mlxsw_pci_cq_tasklet+0x12a6/0x2520 drivers/net/ethernet/mellanox/mlxsw/pci.c:651
 tasklet_action_common.isra.0+0x13f/0x3e0 kernel/softirq.c:550
 __do_softirq+0x223/0x964 kernel/softirq.c:292
 asm_call_on_stack+0x12/0x20 arch/x86/entry/entry_64.S:711

Allocated by task 1006:
 save_stack+0x1b/0x40 mm/kasan/common.c:48
 set_track mm/kasan/common.c:56 [inline]
 __kasan_kmalloc mm/kasan/common.c:494 [inline]
 __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467
 slab_post_alloc_hook mm/slab.h:586 [inline]
 slab_alloc_node mm/slub.c:2824 [inline]
 slab_alloc mm/slub.c:2832 [inline]
 kmem_cache_alloc+0xcd/0x2e0 mm/slub.c:2837
 __build_skb+0x21/0x60 net/core/skbuff.c:311
 __netdev_alloc_skb+0x1e2/0x360 net/core/skbuff.c:464
 netdev_alloc_skb include/linux/skbuff.h:2810 [inline]
 mlxsw_emad_alloc drivers/net/ethernet/mellanox/mlxsw/core.c:756 [inline]
 mlxsw_emad_reg_access drivers/net/ethernet/mellanox/mlxsw/core.c:787 [inline]
 mlxsw_core_reg_access_emad+0x1ab/0x1420 drivers/net/ethernet/mellanox/mlxsw/core.c:1817
 mlxsw_reg_trans_query+0x39/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:1831
 mlxsw_sp_sb_pm_occ_clear drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:260 [inline]
 mlxsw_sp_sb_occ_max_clear+0xbff/0x10a0 drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:1365
 mlxsw_devlink_sb_occ_max_clear+0x76/0xb0 drivers/net/ethernet/mellanox/mlxsw/core.c:1037
 devlink_nl_cmd_sb_occ_max_clear_doit+0x1ec/0x280 net/core/devlink.c:1765
 genl_family_rcv_msg_doit net/netlink/genetlink.c:669 [inline]
 genl_family_rcv_msg net/netlink/genetlink.c:714 [inline]
 genl_rcv_msg+0x617/0x980 net/netlink/genetlink.c:731
 netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2470
 genl_rcv+0x24/0x40 net/netlink/genetlink.c:742
 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
 netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1330
 netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:651 [inline]
 sock_sendmsg+0x150/0x190 net/socket.c:671
 ____sys_sendmsg+0x6d8/0x840 net/socket.c:2359
 ___sys_sendmsg+0xff/0x170 net/socket.c:2413
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2446
 do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:384
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 73:
 save_stack+0x1b/0x40 mm/kasan/common.c:48
 set_track mm/kasan/common.c:56 [inline]
 kasan_set_free_info mm/kasan/common.c:316 [inline]
 __kasan_slab_free+0x12c/0x170 mm/kasan/common.c:455
 slab_free_hook mm/slub.c:1474 [inline]
 slab_free_freelist_hook mm/slub.c:1507 [inline]
 slab_free mm/slub.c:3072 [inline]
 kmem_cache_free+0xbe/0x380 mm/slub.c:3088
 kfree_skbmem net/core/skbuff.c:622 [inline]
 kfree_skbmem+0xef/0x1b0 net/core/skbuff.c:616
 __kfree_skb net/core/skbuff.c:679 [inline]
 consume_skb net/core/skbuff.c:837 [inline]
 consume_skb+0xe1/0x370 net/core/skbuff.c:831
 mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592
 mlxsw_emad_transmit_retry.isra.0+0x9d/0xc0 drivers/net/ethernet/mellanox/mlxsw/core.c:613
 mlxsw_emad_trans_timeout_work+0x43/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:625
 process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269
 worker_thread+0x9e/0x1050 kernel/workqueue.c:2415
 kthread+0x355/0x470 kernel/kthread.c:291
 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293

The buggy address belongs to the object at ffff88804f5703c0
 which belongs to the cache skbuff_head_cache of size 224
The buggy address is located 212 bytes inside of
 224-byte region [ffff88804f5703c0, ffff88804f5704a0)
The buggy address belongs to the page:
page:ffffea00013d5c00 refcount:1 mapcount:0 mapping:0000000000000000
index:0x0
flags: 0x100000000000200(slab)
raw: 0100000000000200 dead000000000100 dead000000000122 ffff88806c625400
raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88804f570380: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
 ffff88804f570400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88804f570480: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
                         ^
 ffff88804f570500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88804f570580: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc

Fixes: caf7297 ("mlxsw: core: Introduce support for asynchronous EMAD register access")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
sigmaris pushed a commit to sigmaris/linux that referenced this pull request Nov 7, 2020
[ Upstream commit 0daf2bf ]

Each EMAD transaction stores the skb used to issue the EMAD request
('trans->tx_skb') so that the request could be retried in case of a
timeout. The skb can be freed when a corresponding response is received
or as part of the retry logic (e.g., failed retransmit, exceeded maximum
number of retries).

The two tasks (i.e., response processing and retransmits) are
synchronized by the atomic 'trans->active' field which ensures that
responses to inactive transactions are ignored.

In case of a failed retransmit the transaction is finished and all of
its resources are freed. However, the current code does not mark it as
inactive. Syzkaller was able to hit a race condition in which a
concurrent response is processed while the transaction's resources are
being freed, resulting in a use-after-free [1].

Fix the issue by making sure to mark the transaction as inactive after a
failed retransmit and free its resources only if a concurrent task did
not already do that.

[1]
BUG: KASAN: use-after-free in consume_skb+0x30/0x370
net/core/skbuff.c:833
Read of size 4 at addr ffff88804f570494 by task syz-executor.0/1004

CPU: 0 PID: 1004 Comm: syz-executor.0 Not tainted 5.8.0-rc7+ raspberrypi#68
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xf6/0x16e lib/dump_stack.c:118
 print_address_description.constprop.0+0x1c/0x250
mm/kasan/report.c:383
 __kasan_report mm/kasan/report.c:513 [inline]
 kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
 check_memory_region_inline mm/kasan/generic.c:186 [inline]
 check_memory_region+0x14e/0x1b0 mm/kasan/generic.c:192
 instrument_atomic_read include/linux/instrumented.h:56 [inline]
 atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
 refcount_read include/linux/refcount.h:147 [inline]
 skb_unref include/linux/skbuff.h:1044 [inline]
 consume_skb+0x30/0x370 net/core/skbuff.c:833
 mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592
 mlxsw_emad_process_response drivers/net/ethernet/mellanox/mlxsw/core.c:651 [inline]
 mlxsw_emad_rx_listener_func+0x5c9/0xac0 drivers/net/ethernet/mellanox/mlxsw/core.c:672
 mlxsw_core_skb_receive+0x4df/0x770 drivers/net/ethernet/mellanox/mlxsw/core.c:2063
 mlxsw_pci_cqe_rdq_handle drivers/net/ethernet/mellanox/mlxsw/pci.c:595 [inline]
 mlxsw_pci_cq_tasklet+0x12a6/0x2520 drivers/net/ethernet/mellanox/mlxsw/pci.c:651
 tasklet_action_common.isra.0+0x13f/0x3e0 kernel/softirq.c:550
 __do_softirq+0x223/0x964 kernel/softirq.c:292
 asm_call_on_stack+0x12/0x20 arch/x86/entry/entry_64.S:711

Allocated by task 1006:
 save_stack+0x1b/0x40 mm/kasan/common.c:48
 set_track mm/kasan/common.c:56 [inline]
 __kasan_kmalloc mm/kasan/common.c:494 [inline]
 __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467
 slab_post_alloc_hook mm/slab.h:586 [inline]
 slab_alloc_node mm/slub.c:2824 [inline]
 slab_alloc mm/slub.c:2832 [inline]
 kmem_cache_alloc+0xcd/0x2e0 mm/slub.c:2837
 __build_skb+0x21/0x60 net/core/skbuff.c:311
 __netdev_alloc_skb+0x1e2/0x360 net/core/skbuff.c:464
 netdev_alloc_skb include/linux/skbuff.h:2810 [inline]
 mlxsw_emad_alloc drivers/net/ethernet/mellanox/mlxsw/core.c:756 [inline]
 mlxsw_emad_reg_access drivers/net/ethernet/mellanox/mlxsw/core.c:787 [inline]
 mlxsw_core_reg_access_emad+0x1ab/0x1420 drivers/net/ethernet/mellanox/mlxsw/core.c:1817
 mlxsw_reg_trans_query+0x39/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:1831
 mlxsw_sp_sb_pm_occ_clear drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:260 [inline]
 mlxsw_sp_sb_occ_max_clear+0xbff/0x10a0 drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:1365
 mlxsw_devlink_sb_occ_max_clear+0x76/0xb0 drivers/net/ethernet/mellanox/mlxsw/core.c:1037
 devlink_nl_cmd_sb_occ_max_clear_doit+0x1ec/0x280 net/core/devlink.c:1765
 genl_family_rcv_msg_doit net/netlink/genetlink.c:669 [inline]
 genl_family_rcv_msg net/netlink/genetlink.c:714 [inline]
 genl_rcv_msg+0x617/0x980 net/netlink/genetlink.c:731
 netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2470
 genl_rcv+0x24/0x40 net/netlink/genetlink.c:742
 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
 netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1330
 netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:651 [inline]
 sock_sendmsg+0x150/0x190 net/socket.c:671
 ____sys_sendmsg+0x6d8/0x840 net/socket.c:2359
 ___sys_sendmsg+0xff/0x170 net/socket.c:2413
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2446
 do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:384
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 73:
 save_stack+0x1b/0x40 mm/kasan/common.c:48
 set_track mm/kasan/common.c:56 [inline]
 kasan_set_free_info mm/kasan/common.c:316 [inline]
 __kasan_slab_free+0x12c/0x170 mm/kasan/common.c:455
 slab_free_hook mm/slub.c:1474 [inline]
 slab_free_freelist_hook mm/slub.c:1507 [inline]
 slab_free mm/slub.c:3072 [inline]
 kmem_cache_free+0xbe/0x380 mm/slub.c:3088
 kfree_skbmem net/core/skbuff.c:622 [inline]
 kfree_skbmem+0xef/0x1b0 net/core/skbuff.c:616
 __kfree_skb net/core/skbuff.c:679 [inline]
 consume_skb net/core/skbuff.c:837 [inline]
 consume_skb+0xe1/0x370 net/core/skbuff.c:831
 mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592
 mlxsw_emad_transmit_retry.isra.0+0x9d/0xc0 drivers/net/ethernet/mellanox/mlxsw/core.c:613
 mlxsw_emad_trans_timeout_work+0x43/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:625
 process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269
 worker_thread+0x9e/0x1050 kernel/workqueue.c:2415
 kthread+0x355/0x470 kernel/kthread.c:291
 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293

The buggy address belongs to the object at ffff88804f5703c0
 which belongs to the cache skbuff_head_cache of size 224
The buggy address is located 212 bytes inside of
 224-byte region [ffff88804f5703c0, ffff88804f5704a0)
The buggy address belongs to the page:
page:ffffea00013d5c00 refcount:1 mapcount:0 mapping:0000000000000000
index:0x0
flags: 0x100000000000200(slab)
raw: 0100000000000200 dead000000000100 dead000000000122 ffff88806c625400
raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88804f570380: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
 ffff88804f570400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88804f570480: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
                         ^
 ffff88804f570500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88804f570580: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc

Fixes: caf7297 ("mlxsw: core: Introduce support for asynchronous EMAD register access")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
popcornmix pushed a commit that referenced this pull request Nov 9, 2020
[ Upstream commit 0daf2bf ]

Each EMAD transaction stores the skb used to issue the EMAD request
('trans->tx_skb') so that the request could be retried in case of a
timeout. The skb can be freed when a corresponding response is received
or as part of the retry logic (e.g., failed retransmit, exceeded maximum
number of retries).

The two tasks (i.e., response processing and retransmits) are
synchronized by the atomic 'trans->active' field which ensures that
responses to inactive transactions are ignored.

In case of a failed retransmit the transaction is finished and all of
its resources are freed. However, the current code does not mark it as
inactive. Syzkaller was able to hit a race condition in which a
concurrent response is processed while the transaction's resources are
being freed, resulting in a use-after-free [1].

Fix the issue by making sure to mark the transaction as inactive after a
failed retransmit and free its resources only if a concurrent task did
not already do that.

[1]
BUG: KASAN: use-after-free in consume_skb+0x30/0x370
net/core/skbuff.c:833
Read of size 4 at addr ffff88804f570494 by task syz-executor.0/1004

CPU: 0 PID: 1004 Comm: syz-executor.0 Not tainted 5.8.0-rc7+ #68
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xf6/0x16e lib/dump_stack.c:118
 print_address_description.constprop.0+0x1c/0x250
mm/kasan/report.c:383
 __kasan_report mm/kasan/report.c:513 [inline]
 kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
 check_memory_region_inline mm/kasan/generic.c:186 [inline]
 check_memory_region+0x14e/0x1b0 mm/kasan/generic.c:192
 instrument_atomic_read include/linux/instrumented.h:56 [inline]
 atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
 refcount_read include/linux/refcount.h:147 [inline]
 skb_unref include/linux/skbuff.h:1044 [inline]
 consume_skb+0x30/0x370 net/core/skbuff.c:833
 mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592
 mlxsw_emad_process_response drivers/net/ethernet/mellanox/mlxsw/core.c:651 [inline]
 mlxsw_emad_rx_listener_func+0x5c9/0xac0 drivers/net/ethernet/mellanox/mlxsw/core.c:672
 mlxsw_core_skb_receive+0x4df/0x770 drivers/net/ethernet/mellanox/mlxsw/core.c:2063
 mlxsw_pci_cqe_rdq_handle drivers/net/ethernet/mellanox/mlxsw/pci.c:595 [inline]
 mlxsw_pci_cq_tasklet+0x12a6/0x2520 drivers/net/ethernet/mellanox/mlxsw/pci.c:651
 tasklet_action_common.isra.0+0x13f/0x3e0 kernel/softirq.c:550
 __do_softirq+0x223/0x964 kernel/softirq.c:292
 asm_call_on_stack+0x12/0x20 arch/x86/entry/entry_64.S:711

Allocated by task 1006:
 save_stack+0x1b/0x40 mm/kasan/common.c:48
 set_track mm/kasan/common.c:56 [inline]
 __kasan_kmalloc mm/kasan/common.c:494 [inline]
 __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467
 slab_post_alloc_hook mm/slab.h:586 [inline]
 slab_alloc_node mm/slub.c:2824 [inline]
 slab_alloc mm/slub.c:2832 [inline]
 kmem_cache_alloc+0xcd/0x2e0 mm/slub.c:2837
 __build_skb+0x21/0x60 net/core/skbuff.c:311
 __netdev_alloc_skb+0x1e2/0x360 net/core/skbuff.c:464
 netdev_alloc_skb include/linux/skbuff.h:2810 [inline]
 mlxsw_emad_alloc drivers/net/ethernet/mellanox/mlxsw/core.c:756 [inline]
 mlxsw_emad_reg_access drivers/net/ethernet/mellanox/mlxsw/core.c:787 [inline]
 mlxsw_core_reg_access_emad+0x1ab/0x1420 drivers/net/ethernet/mellanox/mlxsw/core.c:1817
 mlxsw_reg_trans_query+0x39/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:1831
 mlxsw_sp_sb_pm_occ_clear drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:260 [inline]
 mlxsw_sp_sb_occ_max_clear+0xbff/0x10a0 drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:1365
 mlxsw_devlink_sb_occ_max_clear+0x76/0xb0 drivers/net/ethernet/mellanox/mlxsw/core.c:1037
 devlink_nl_cmd_sb_occ_max_clear_doit+0x1ec/0x280 net/core/devlink.c:1765
 genl_family_rcv_msg_doit net/netlink/genetlink.c:669 [inline]
 genl_family_rcv_msg net/netlink/genetlink.c:714 [inline]
 genl_rcv_msg+0x617/0x980 net/netlink/genetlink.c:731
 netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2470
 genl_rcv+0x24/0x40 net/netlink/genetlink.c:742
 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
 netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1330
 netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:651 [inline]
 sock_sendmsg+0x150/0x190 net/socket.c:671
 ____sys_sendmsg+0x6d8/0x840 net/socket.c:2359
 ___sys_sendmsg+0xff/0x170 net/socket.c:2413
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2446
 do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:384
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 73:
 save_stack+0x1b/0x40 mm/kasan/common.c:48
 set_track mm/kasan/common.c:56 [inline]
 kasan_set_free_info mm/kasan/common.c:316 [inline]
 __kasan_slab_free+0x12c/0x170 mm/kasan/common.c:455
 slab_free_hook mm/slub.c:1474 [inline]
 slab_free_freelist_hook mm/slub.c:1507 [inline]
 slab_free mm/slub.c:3072 [inline]
 kmem_cache_free+0xbe/0x380 mm/slub.c:3088
 kfree_skbmem net/core/skbuff.c:622 [inline]
 kfree_skbmem+0xef/0x1b0 net/core/skbuff.c:616
 __kfree_skb net/core/skbuff.c:679 [inline]
 consume_skb net/core/skbuff.c:837 [inline]
 consume_skb+0xe1/0x370 net/core/skbuff.c:831
 mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592
 mlxsw_emad_transmit_retry.isra.0+0x9d/0xc0 drivers/net/ethernet/mellanox/mlxsw/core.c:613
 mlxsw_emad_trans_timeout_work+0x43/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:625
 process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269
 worker_thread+0x9e/0x1050 kernel/workqueue.c:2415
 kthread+0x355/0x470 kernel/kthread.c:291
 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293

The buggy address belongs to the object at ffff88804f5703c0
 which belongs to the cache skbuff_head_cache of size 224
The buggy address is located 212 bytes inside of
 224-byte region [ffff88804f5703c0, ffff88804f5704a0)
The buggy address belongs to the page:
page:ffffea00013d5c00 refcount:1 mapcount:0 mapping:0000000000000000
index:0x0
flags: 0x100000000000200(slab)
raw: 0100000000000200 dead000000000100 dead000000000122 ffff88806c625400
raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88804f570380: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
 ffff88804f570400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88804f570480: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
                         ^
 ffff88804f570500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88804f570580: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc

Fixes: caf7297 ("mlxsw: core: Introduce support for asynchronous EMAD register access")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
mripard pushed a commit to mripard/rpi-linux that referenced this pull request Oct 27, 2021
…tible()

Using wait_event_interruptible() to wait for complete transmission,
but do not check the result of wait_event_interruptible() which can be
interrupted. It will result in TX buffer has multiple accessors and
the later process interferes with the previous process.

Following is one of the problems reported by syzbot.

=============================================================
WARNING: CPU: 0 PID: 0 at net/can/isotp.c:840 isotp_tx_timer_handler+0x2e0/0x4c0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.13.0-rc7+ raspberrypi#68
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
RIP: 0010:isotp_tx_timer_handler+0x2e0/0x4c0
Call Trace:
 <IRQ>
 ? isotp_setsockopt+0x390/0x390
 __hrtimer_run_queues+0xb8/0x610
 hrtimer_run_softirq+0x91/0xd0
 ? rcu_read_lock_sched_held+0x4d/0x80
 __do_softirq+0xe8/0x553
 irq_exit_rcu+0xf8/0x100
 sysvec_apic_timer_interrupt+0x9e/0xc0
 </IRQ>
 asm_sysvec_apic_timer_interrupt+0x12/0x20

Add result check for wait_event_interruptible() in isotp_sendmsg()
to avoid multiple accessers for tx buffer.

Fixes: e057dd3 ("can: add ISO 15765-2:2016 transport protocol")
Link: https://lore.kernel.org/all/10ca695732c9dd267c76a3c30f37aefe1ff7e32f.1633764159.git.william.xuanziyang@huawei.com
Cc: stable@vger.kernel.org
Reported-by: syzbot+78bab6958a614b0c80b9@syzkaller.appspotmail.com
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
popcornmix pushed a commit that referenced this pull request Oct 28, 2021
…tible()

commit 9acf636 upstream.

Using wait_event_interruptible() to wait for complete transmission,
but do not check the result of wait_event_interruptible() which can be
interrupted. It will result in TX buffer has multiple accessors and
the later process interferes with the previous process.

Following is one of the problems reported by syzbot.

=============================================================
WARNING: CPU: 0 PID: 0 at net/can/isotp.c:840 isotp_tx_timer_handler+0x2e0/0x4c0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.13.0-rc7+ #68
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
RIP: 0010:isotp_tx_timer_handler+0x2e0/0x4c0
Call Trace:
 <IRQ>
 ? isotp_setsockopt+0x390/0x390
 __hrtimer_run_queues+0xb8/0x610
 hrtimer_run_softirq+0x91/0xd0
 ? rcu_read_lock_sched_held+0x4d/0x80
 __do_softirq+0xe8/0x553
 irq_exit_rcu+0xf8/0x100
 sysvec_apic_timer_interrupt+0x9e/0xc0
 </IRQ>
 asm_sysvec_apic_timer_interrupt+0x12/0x20

Add result check for wait_event_interruptible() in isotp_sendmsg()
to avoid multiple accessers for tx buffer.

Fixes: e057dd3 ("can: add ISO 15765-2:2016 transport protocol")
Link: https://lore.kernel.org/all/10ca695732c9dd267c76a3c30f37aefe1ff7e32f.1633764159.git.william.xuanziyang@huawei.com
Cc: stable@vger.kernel.org
Reported-by: syzbot+78bab6958a614b0c80b9@syzkaller.appspotmail.com
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
abihf pushed a commit to abihf/linux that referenced this pull request Oct 29, 2021
…tible()

commit 9acf636 upstream.

Using wait_event_interruptible() to wait for complete transmission,
but do not check the result of wait_event_interruptible() which can be
interrupted. It will result in TX buffer has multiple accessors and
the later process interferes with the previous process.

Following is one of the problems reported by syzbot.

=============================================================
WARNING: CPU: 0 PID: 0 at net/can/isotp.c:840 isotp_tx_timer_handler+0x2e0/0x4c0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.13.0-rc7+ raspberrypi#68
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
RIP: 0010:isotp_tx_timer_handler+0x2e0/0x4c0
Call Trace:
 <IRQ>
 ? isotp_setsockopt+0x390/0x390
 __hrtimer_run_queues+0xb8/0x610
 hrtimer_run_softirq+0x91/0xd0
 ? rcu_read_lock_sched_held+0x4d/0x80
 __do_softirq+0xe8/0x553
 irq_exit_rcu+0xf8/0x100
 sysvec_apic_timer_interrupt+0x9e/0xc0
 </IRQ>
 asm_sysvec_apic_timer_interrupt+0x12/0x20

Add result check for wait_event_interruptible() in isotp_sendmsg()
to avoid multiple accessers for tx buffer.

Fixes: e057dd3 ("can: add ISO 15765-2:2016 transport protocol")
Link: https://lore.kernel.org/all/10ca695732c9dd267c76a3c30f37aefe1ff7e32f.1633764159.git.william.xuanziyang@huawei.com
Cc: stable@vger.kernel.org
Reported-by: syzbot+78bab6958a614b0c80b9@syzkaller.appspotmail.com
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
xukuohai pushed a commit to xukuohai/linux-raspberry-pi that referenced this pull request May 9, 2022
Add bpf trampoline support for arm64. Most of the logic is the same as
x86.

fentry before bpf trampoline hooked:
 mov x9, x30
 nop

fentry after bpf trampoline hooked:
 mov x9, x30
 bl  <bpf_trampoline>

Tested on qemu, result:
 raspberrypi#18  bpf_tcp_ca:OK
 raspberrypi#51  dummy_st_ops:OK
 raspberrypi#55  fentry_fexit:OK
 raspberrypi#56  fentry_test:OK
 raspberrypi#57  fexit_bpf2bpf:OK
 raspberrypi#58  fexit_sleep:OK
 raspberrypi#59  fexit_stress:OK
 raspberrypi#60  fexit_test:OK
 raspberrypi#67  get_func_args_test:OK
 raspberrypi#68  get_func_ip_test:OK
 raspberrypi#101 modify_return:OK
 raspberrypi#233 xdp_bpf2bpf:OK

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Acked-by: Song Liu <songliubraving@fb.com>
xukuohai pushed a commit to xukuohai/linux-raspberry-pi that referenced this pull request May 12, 2022
Add bpf trampoline support for arm64. Most of the logic is the same as
x86.

fentry before bpf trampoline hooked:
 mov x9, x30
 nop

fentry after bpf trampoline hooked:
 mov x9, x30
 bl  <bpf_trampoline>

Tested on qemu, result:
 raspberrypi#18  bpf_tcp_ca:OK
 raspberrypi#51  dummy_st_ops:OK
 raspberrypi#55  fentry_fexit:OK
 raspberrypi#56  fentry_test:OK
 raspberrypi#57  fexit_bpf2bpf:OK
 raspberrypi#58  fexit_sleep:OK
 raspberrypi#59  fexit_stress:OK
 raspberrypi#60  fexit_test:OK
 raspberrypi#67  get_func_args_test:OK
 raspberrypi#68  get_func_ip_test:OK
 raspberrypi#101 modify_return:OK
 raspberrypi#233 xdp_bpf2bpf:OK

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Acked-by: Song Liu <songliubraving@fb.com>
popcornmix pushed a commit that referenced this pull request May 30, 2023
When using DMA mode we are facing with Oops:
[  396.458157] Unable to handle kernel access to user memory without uaccess routines at virtual address 000000000000000c
[  396.469374] Oops [#1]
[  396.471839] Modules linked in:
[  396.475144] CPU: 0 PID: 114 Comm: arecord Not tainted 6.0.0-00164-g9a8eccdaf2be-dirty #68
[  396.483619] Hardware name: YMP ELCT FPGA (DT)
[  396.488156] epc : dmaengine_pcm_open+0x1d2/0x342
[  396.493227]  ra : dmaengine_pcm_open+0x1d2/0x342
[  396.498140] epc : ffffffff807fe346 ra : ffffffff807fe346 sp : ffffffc804e138f0
[  396.505602]  gp : ffffffff817bf730 tp : ffffffd8042c8ac0 t0 : 6500000000000000
[  396.513045]  t1 : 0000000000000064 t2 : 656e69676e65616d s0 : ffffffc804e13990
[  396.520477]  s1 : ffffffd801b86a18 a0 : 0000000000000026 a1 : ffffffff816920f8
[  396.527897]  a2 : 0000000000000010 a3 : fffffffffffffffe a4 : 0000000000000000
[  396.535319]  a5 : 0000000000000000 a6 : ffffffd801b87040 a7 : 0000000000000038
[  396.542740]  s2 : ffffffd801b94a00 s3 : 0000000000000000 s4 : ffffffd80427f5e8
[  396.550153]  s5 : ffffffd80427f5e8 s6 : ffffffd801b44410 s7 : fffffffffffffff5
[  396.557569]  s8 : 0000000000000800 s9 : 0000000000000001 s10: ffffffff8066d254
[  396.564978]  s11: ffffffd8059cf768 t3 : ffffffff817d5577 t4 : ffffffff817d5577
[  396.572391]  t5 : ffffffff817d5578 t6 : ffffffc804e136e8
[  396.577876] status: 0000000200000120 badaddr: 000000000000000c cause: 000000000000000d
[  396.586007] [<ffffffff806839f4>] snd_soc_component_open+0x1a/0x68
[  396.592439] [<ffffffff807fdd62>] __soc_pcm_open+0xf0/0x502
[  396.598217] [<ffffffff80685d86>] soc_pcm_open+0x2e/0x4e
[  396.603741] [<ffffffff8066cea4>] snd_pcm_open_substream+0x442/0x68e
[  396.610313] [<ffffffff8066d1ea>] snd_pcm_open+0xfa/0x212
[  396.615868] [<ffffffff8066d39c>] snd_pcm_capture_open+0x3a/0x60
[  396.622048] [<ffffffff8065b35a>] snd_open+0xa8/0x17a
[  396.627421] [<ffffffff801ae036>] chrdev_open+0xa0/0x218
[  396.632893] [<ffffffff801a5a28>] do_dentry_open+0x17c/0x2a6
[  396.638713] [<ffffffff801a6d9a>] vfs_open+0x1e/0x26
[  396.643850] [<ffffffff801b8544>] path_openat+0x96e/0xc96
[  396.649518] [<ffffffff801b9390>] do_filp_open+0x7c/0xf6
[  396.655034] [<ffffffff801a6ff2>] do_sys_openat2+0x8a/0x11e
[  396.660765] [<ffffffff801a735a>] sys_openat+0x50/0x7c
[  396.666068] [<ffffffff80003aca>] ret_from_syscall+0x0/0x2
[  396.674964] ---[ end trace 0000000000000000 ]---

It happens because of play_dma_data/capture_dma_data pointers are NULL.
Current implementation assigns these pointers at snd_soc_dai_driver
startup() callback and reset them back to NULL at shutdown(). But
soc_pcm_open() sequence uses DMA pointers in dmaengine_pcm_open()
before snd_soc_dai_driver startup().
Most generic DMA capable I2S drivers use snd_soc_dai_driver probe()
callback to init DMA pointers only once at probe. So move DMA init
to dw_i2s_dai_probe and drop shutdown() and startup() callbacks.

Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
Link: https://lore.kernel.org/r/20230512110343.66664-1-fido_max@inbox.ru
Signed-off-by: Mark Brown <broonie@kernel.org>
popcornmix pushed a commit that referenced this pull request Jun 22, 2023
[ Upstream commit 011a871 ]

When using DMA mode we are facing with Oops:
[  396.458157] Unable to handle kernel access to user memory without uaccess routines at virtual address 000000000000000c
[  396.469374] Oops [#1]
[  396.471839] Modules linked in:
[  396.475144] CPU: 0 PID: 114 Comm: arecord Not tainted 6.0.0-00164-g9a8eccdaf2be-dirty #68
[  396.483619] Hardware name: YMP ELCT FPGA (DT)
[  396.488156] epc : dmaengine_pcm_open+0x1d2/0x342
[  396.493227]  ra : dmaengine_pcm_open+0x1d2/0x342
[  396.498140] epc : ffffffff807fe346 ra : ffffffff807fe346 sp : ffffffc804e138f0
[  396.505602]  gp : ffffffff817bf730 tp : ffffffd8042c8ac0 t0 : 6500000000000000
[  396.513045]  t1 : 0000000000000064 t2 : 656e69676e65616d s0 : ffffffc804e13990
[  396.520477]  s1 : ffffffd801b86a18 a0 : 0000000000000026 a1 : ffffffff816920f8
[  396.527897]  a2 : 0000000000000010 a3 : fffffffffffffffe a4 : 0000000000000000
[  396.535319]  a5 : 0000000000000000 a6 : ffffffd801b87040 a7 : 0000000000000038
[  396.542740]  s2 : ffffffd801b94a00 s3 : 0000000000000000 s4 : ffffffd80427f5e8
[  396.550153]  s5 : ffffffd80427f5e8 s6 : ffffffd801b44410 s7 : fffffffffffffff5
[  396.557569]  s8 : 0000000000000800 s9 : 0000000000000001 s10: ffffffff8066d254
[  396.564978]  s11: ffffffd8059cf768 t3 : ffffffff817d5577 t4 : ffffffff817d5577
[  396.572391]  t5 : ffffffff817d5578 t6 : ffffffc804e136e8
[  396.577876] status: 0000000200000120 badaddr: 000000000000000c cause: 000000000000000d
[  396.586007] [<ffffffff806839f4>] snd_soc_component_open+0x1a/0x68
[  396.592439] [<ffffffff807fdd62>] __soc_pcm_open+0xf0/0x502
[  396.598217] [<ffffffff80685d86>] soc_pcm_open+0x2e/0x4e
[  396.603741] [<ffffffff8066cea4>] snd_pcm_open_substream+0x442/0x68e
[  396.610313] [<ffffffff8066d1ea>] snd_pcm_open+0xfa/0x212
[  396.615868] [<ffffffff8066d39c>] snd_pcm_capture_open+0x3a/0x60
[  396.622048] [<ffffffff8065b35a>] snd_open+0xa8/0x17a
[  396.627421] [<ffffffff801ae036>] chrdev_open+0xa0/0x218
[  396.632893] [<ffffffff801a5a28>] do_dentry_open+0x17c/0x2a6
[  396.638713] [<ffffffff801a6d9a>] vfs_open+0x1e/0x26
[  396.643850] [<ffffffff801b8544>] path_openat+0x96e/0xc96
[  396.649518] [<ffffffff801b9390>] do_filp_open+0x7c/0xf6
[  396.655034] [<ffffffff801a6ff2>] do_sys_openat2+0x8a/0x11e
[  396.660765] [<ffffffff801a735a>] sys_openat+0x50/0x7c
[  396.666068] [<ffffffff80003aca>] ret_from_syscall+0x0/0x2
[  396.674964] ---[ end trace 0000000000000000 ]---

It happens because of play_dma_data/capture_dma_data pointers are NULL.
Current implementation assigns these pointers at snd_soc_dai_driver
startup() callback and reset them back to NULL at shutdown(). But
soc_pcm_open() sequence uses DMA pointers in dmaengine_pcm_open()
before snd_soc_dai_driver startup().
Most generic DMA capable I2S drivers use snd_soc_dai_driver probe()
callback to init DMA pointers only once at probe. So move DMA init
to dw_i2s_dai_probe and drop shutdown() and startup() callbacks.

Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
Link: https://lore.kernel.org/r/20230512110343.66664-1-fido_max@inbox.ru
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
popcornmix pushed a commit that referenced this pull request Jun 23, 2023
[ Upstream commit 011a871 ]

When using DMA mode we are facing with Oops:
[  396.458157] Unable to handle kernel access to user memory without uaccess routines at virtual address 000000000000000c
[  396.469374] Oops [#1]
[  396.471839] Modules linked in:
[  396.475144] CPU: 0 PID: 114 Comm: arecord Not tainted 6.0.0-00164-g9a8eccdaf2be-dirty #68
[  396.483619] Hardware name: YMP ELCT FPGA (DT)
[  396.488156] epc : dmaengine_pcm_open+0x1d2/0x342
[  396.493227]  ra : dmaengine_pcm_open+0x1d2/0x342
[  396.498140] epc : ffffffff807fe346 ra : ffffffff807fe346 sp : ffffffc804e138f0
[  396.505602]  gp : ffffffff817bf730 tp : ffffffd8042c8ac0 t0 : 6500000000000000
[  396.513045]  t1 : 0000000000000064 t2 : 656e69676e65616d s0 : ffffffc804e13990
[  396.520477]  s1 : ffffffd801b86a18 a0 : 0000000000000026 a1 : ffffffff816920f8
[  396.527897]  a2 : 0000000000000010 a3 : fffffffffffffffe a4 : 0000000000000000
[  396.535319]  a5 : 0000000000000000 a6 : ffffffd801b87040 a7 : 0000000000000038
[  396.542740]  s2 : ffffffd801b94a00 s3 : 0000000000000000 s4 : ffffffd80427f5e8
[  396.550153]  s5 : ffffffd80427f5e8 s6 : ffffffd801b44410 s7 : fffffffffffffff5
[  396.557569]  s8 : 0000000000000800 s9 : 0000000000000001 s10: ffffffff8066d254
[  396.564978]  s11: ffffffd8059cf768 t3 : ffffffff817d5577 t4 : ffffffff817d5577
[  396.572391]  t5 : ffffffff817d5578 t6 : ffffffc804e136e8
[  396.577876] status: 0000000200000120 badaddr: 000000000000000c cause: 000000000000000d
[  396.586007] [<ffffffff806839f4>] snd_soc_component_open+0x1a/0x68
[  396.592439] [<ffffffff807fdd62>] __soc_pcm_open+0xf0/0x502
[  396.598217] [<ffffffff80685d86>] soc_pcm_open+0x2e/0x4e
[  396.603741] [<ffffffff8066cea4>] snd_pcm_open_substream+0x442/0x68e
[  396.610313] [<ffffffff8066d1ea>] snd_pcm_open+0xfa/0x212
[  396.615868] [<ffffffff8066d39c>] snd_pcm_capture_open+0x3a/0x60
[  396.622048] [<ffffffff8065b35a>] snd_open+0xa8/0x17a
[  396.627421] [<ffffffff801ae036>] chrdev_open+0xa0/0x218
[  396.632893] [<ffffffff801a5a28>] do_dentry_open+0x17c/0x2a6
[  396.638713] [<ffffffff801a6d9a>] vfs_open+0x1e/0x26
[  396.643850] [<ffffffff801b8544>] path_openat+0x96e/0xc96
[  396.649518] [<ffffffff801b9390>] do_filp_open+0x7c/0xf6
[  396.655034] [<ffffffff801a6ff2>] do_sys_openat2+0x8a/0x11e
[  396.660765] [<ffffffff801a735a>] sys_openat+0x50/0x7c
[  396.666068] [<ffffffff80003aca>] ret_from_syscall+0x0/0x2
[  396.674964] ---[ end trace 0000000000000000 ]---

It happens because of play_dma_data/capture_dma_data pointers are NULL.
Current implementation assigns these pointers at snd_soc_dai_driver
startup() callback and reset them back to NULL at shutdown(). But
soc_pcm_open() sequence uses DMA pointers in dmaengine_pcm_open()
before snd_soc_dai_driver startup().
Most generic DMA capable I2S drivers use snd_soc_dai_driver probe()
callback to init DMA pointers only once at probe. So move DMA init
to dw_i2s_dai_probe and drop shutdown() and startup() callbacks.

Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
Link: https://lore.kernel.org/r/20230512110343.66664-1-fido_max@inbox.ru
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
0lxb pushed a commit to 0lxb/rpi_linux that referenced this pull request Jan 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants