Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hsu dma optimize #34

Closed
wants to merge 50 commits into from
Closed

Hsu dma optimize #34

wants to merge 50 commits into from

Conversation

htot
Copy link

@htot htot commented May 3, 2021

I cleaned up "arm rx dma.." and hope it is now more clear. Also I enabled on all hsu ports that have dma enabled.
On the transmit side "use linear buffer..." the dma transmit still gets split up when the transfer is longer then 2048 (I see an additional interrupt occurring if I toggle a gpio), but I don't know what causes that. It seems to do no harm.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
This one is found on Intel Elkhart Lake and
Intel Galileo boards (Intel Quark).

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Additionally enables the following drivers:
	I2C_DESIGNWARE_PCI
	GPIO_MERRIFIELD
	INTEL_MID_WATCHDOG
	INTEL_MRFLD_PWRBTN
	INTEL_SOC_PMIC_MRFLD
	USB_CHIPIDEA
	PINCTRL_MERRIFIELD

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
This enables:
	8250_DW
	DW_DMAC, DMATEST, and DMA_API_DEBUG
	PINCTRL_BAYTRAIL, PINCTRL_CHERRYVIEW, PINCTRL_LYNXPOINT
	I2C_DESIGNWARE
	MMC_SDHCI
	SPI_PXA2XX
	PWM_LPSS

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Also enables corresponding pin control drivers

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
	USB_EG20T
	IE6XX_WDT
	GPIO_SCH
	MFD_INTEL_QUARK_I2C_GPIO

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
For my Intel Edison work.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Edison's SoC uses these endpoints for tracing and they shouldn't
be used for normal transfers, we need to skip them.

This hack will be reverted once we have a proper binding for
describing such endpoints.

• 1 High BW Bulk IN (IN#1) (RTIT)
• 1 1KB BW Bulk IN (IN#8)+ 1 1KB BW Bulk OUT (Run Control) (OUT#8)

Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
@htot htot marked this pull request as draft May 3, 2021 19:59
@htot htot marked this pull request as ready for review May 3, 2021 20:00
Copy link
Owner

@andy-shev andy-shev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the commit message capitalize and fix all references accordingly. But the main question, why it's 8250_mid specific change? Shouldn't it be done on 8250_dma level?

drivers/tty/serial/8250/8250_mid.c Outdated Show resolved Hide resolved
drivers/tty/serial/8250/8250_dma.c Outdated Show resolved Hide resolved
drivers/tty/serial/8250/8250_mid.c Outdated Show resolved Hide resolved
Copy link
Owner

@andy-shev andy-shev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides similar questions to the commit message (style related), what you are doing is called bounce buffers. So, the question is, can we rather prepare SG list out of two descriptors (one for tail and another for the beginning of the buffer if needed) and let DMA driver to handle it?

drivers/tty/serial/8250/8250_dma.c Outdated Show resolved Hide resolved
@htot
Copy link
Author

htot commented May 4, 2021

In the commit message capitalize and fix all references accordingly. But the main question, why it's 8250_mid specific change? Shouldn't it be done on 8250_dma level?

In principle this could apply to all uarts with dma support. The idea is to eliminate the uart interrupt at the beginning of a message that sets up dma. Of course dma interrupt at the end of message remains. So, if 2 messages come too quickly dma will not be setup in time for the 2nd. This is why interchar gaps on the transmitting end must be prevented, > 4 char gap will cause dma interrupt, but the handler will be too late to setup for the next (depending on rx fifo size and trig).

But the question is how the dma handles waiting for the first char (it times out after not receiving anything for 4 char times, but only after receiving a first char). There are so many different uarts, I didn't dare to touch code affecting all. But you are more brave then me - and know better what you are doing.

Remember when I wrote this code, it caused kernel to hang, due to the uart (all uarts including the console) hardware locking up. I couldn't find why. Now, I didn't change anything, the same code works. Somewhere some hardware initialized differently? And even now we still have to set /dev/cpu_dma_latency to 0. PM related stuff I don't really understand.

@htot
Copy link
Author

htot commented May 4, 2021

Besides similar questions to the commit message (style related), what you are doing is called bounce buffers. So, the question is, can we rather prepare SG list out of two descriptors (one for tail and another for the beginning of the buffer if needed) and let DMA driver to handle it?

Afaiu SG is implemented by auto setting up dma after each descriptor completes, but I'm not sure if that is done in hardware (i.e. does the dma hold a list of pointers it self, or is it do in software in the dma interrupt handler). As it is now, the whole transmit message is a single string that is send in one dma transfer. There is no interchar gap on transmit.

But, I did see for > 2048 transfers the message did get split into 2 causing an additional dma interrupt and potentially a gap. This is still not clear to me where the 2048 limit comes from.

htot and others added 3 commits May 4, 2021 22:24
8250_dma used the circular xmit->buf as DMA output buffer. This causes messages that wrap around
in the circular buffer to be transmitted using 2 DMA transfers. Depending on baud rate and processor
load this can cause an interchar gap in the middle of the message. On the receiving end the gap
may cause a short receive timeout, possibly long enough to terminate a DMA transfer, but too short
to restart a receive DMA transfer in time thus causing a receive buffer overrun.
Fix this but creating a linear tx_buffer and copying all of xmit->buf into it.

Signed-off-by: Ferry Toth <ftoth@exalondelft.nl>
When the port is receiving huge data flow it would be nice to reduce latency
for RX DMA. Do this by calling serial8250_rx_dma() directly from completion
callback.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ferry Toth <ftoth@exalondelft.nl>
Instead of initiating Rx DMA when the first char arrives in the UART, arm Rx DMA
continously. DMA transfers automatically empty the Rx FIFO, preventing overruns.
As before, transfer terminates when the DMA buffer is full or when a 4 char
interchar gap is received (timeout). After timeout we arm immediately the DMA again.

Signed-off-by: Ferry Toth <ftoth@exalondelft.nl>
@andy-shev
Copy link
Owner

andy-shev commented May 4, 2021

In the commit message capitalize and fix all references accordingly. But the main question, why it's 8250_mid specific change? Shouldn't it be done on 8250_dma level?

In principle this could apply to all uarts with dma support. The idea is to eliminate the uart interrupt at the beginning of a message that sets up dma. Of course dma interrupt at the end of message remains. So, if 2 messages come too quickly dma will not be setup in time for the 2nd. This is why interchar gaps on the transmitting end must be prevented, > 4 char gap will cause dma interrupt, but the handler will be too late to setup for the next (depending on rx fifo size and trig).

But the question is how the dma handles waiting for the first char (it times out after not receiving anything for 4 char times, but only after receiving a first char). There are so many different uarts, I didn't dare to touch code affecting all. But you are more brave then me - and know better what you are doing.

I guess it makes sense to submit this for all 8250 compatible ports that are using a generic 8250_dma (mostly Intel).

Remember when I wrote this code, it caused kernel to hang, due to the uart (all uarts including the console) hardware locking up. I couldn't find why. Now, I didn't change anything, the same code works. Somewhere some hardware initialized differently? And even now we still have to set /dev/cpu_dma_latency to 0. PM related stuff I don't really understand.

Yes, PM stuff is always needed for UART.

@andy-shev
Copy link
Owner

Besides similar questions to the commit message (style related), what you are doing is called bounce buffers. So, the question is, can we rather prepare SG list out of two descriptors (one for tail and another for the beginning of the buffer if needed) and let DMA driver to handle it?

Afaiu SG is implemented by auto setting up dma after each descriptor completes, but I'm not sure if that is done in hardware (i.e. does the dma hold a list of pointers it self, or is it do in software in the dma interrupt handler). As it is now, the whole transmit message is a single string that is send in one dma transfer. There is no interchar gap on transmit.

HSU DMA supports up to 4 hw descriptors, each of up to 128k size. Other DMA are even better in this sense. So, I would rather see this implemented as SG list out of 1 or two elements (depending if we have the buffer turned over or not).

But, I did see for > 2048 transfers the message did get split into 2 causing an additional dma interrupt and potentially a gap. This is still not clear to me where the 2048 limit comes from.

Isn't it abvious? The buffer size is one page,m i.e. 4k. 2k is a half and any size bigger the half has a chance to be turned over the circular buffer. That's why we got a split.

@htot
Copy link
Author

htot commented May 5, 2021

Isn't it abvious? The buffer size is one page,m i.e. 4k. 2k is a half and any size bigger the half has a chance to be turned over the circular buffer. That's why we got a split.

No, I mean after linearizing. So we send 1 larger then 2k DMA transfer, t seems still to be split into 2 parts. As if there is a limit in the DMA subsystem?

Signed-off-by: Ferry Toth <ftoth@exalondelft.nl>
@htot htot marked this pull request as draft May 11, 2021 21:15
@htot htot force-pushed the hsu_dma_optimize branch 3 times, most recently from 90236dd to 174a779 Compare May 14, 2021 13:00
@andy-shev andy-shev force-pushed the eds-acpi branch 2 times, most recently from 1ecc0ad to 04124f6 Compare May 19, 2021 08:20
Signed-off-by: Ferry Toth <ftoth@exalondelft.nl>
@htot htot closed this May 25, 2021
@htot htot deleted the hsu_dma_optimize branch May 25, 2021 14:16
htot pushed a commit to htot/linux that referenced this pull request Feb 3, 2022
The function obtain the next buffer without boundary check.
We should return with I/O error code.

The bug is found by fuzzing and the crash report is attached.
It is an OOB bug although reported as use-after-free.

[    4.804724] BUG: KASAN: use-after-free in aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.805661] Read of size 4 at addr ffff888034fe93a8 by task ksoftirqd/0/9
[    4.806505]
[    4.806703] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G        W         5.6.0 andy-shev#34
[    4.809030] Call Trace:
[    4.809343]  dump_stack+0x76/0xa0
[    4.809755]  print_address_description.constprop.0+0x16/0x200
[    4.810455]  ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.811234]  ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.813183]  __kasan_report.cold+0x37/0x7c
[    4.813715]  ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.814393]  kasan_report+0xe/0x20
[    4.814837]  aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.815499]  ? hw_atl_b0_hw_ring_rx_receive+0x9a5/0xb90 [atlantic]
[    4.816290]  aq_vec_poll+0x179/0x5d0 [atlantic]
[    4.816870]  ? _GLOBAL__sub_I_65535_1_aq_pci_func_init+0x20/0x20 [atlantic]
[    4.817746]  ? __next_timer_interrupt+0xba/0xf0
[    4.818322]  net_rx_action+0x363/0xbd0
[    4.818803]  ? call_timer_fn+0x240/0x240
[    4.819302]  ? __switch_to_asm+0x40/0x70
[    4.819809]  ? napi_busy_loop+0x520/0x520
[    4.820324]  __do_softirq+0x18c/0x634
[    4.820797]  ? takeover_tasklets+0x5f0/0x5f0
[    4.821343]  run_ksoftirqd+0x15/0x20
[    4.821804]  smpboot_thread_fn+0x2f1/0x6b0
[    4.822331]  ? smpboot_unregister_percpu_thread+0x160/0x160
[    4.823041]  ? __kthread_parkme+0x80/0x100
[    4.823571]  ? smpboot_unregister_percpu_thread+0x160/0x160
[    4.824301]  kthread+0x2b5/0x3b0
[    4.824723]  ? kthread_create_on_node+0xd0/0xd0
[    4.825304]  ret_from_fork+0x35/0x40

Signed-off-by: Zekun Shen <bruceshenzk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
andy-shev pushed a commit that referenced this pull request Aug 26, 2024
When l2tp tunnels use a socket provided by userspace, we can hit
lockdep splats like the below when data is transmitted through another
(unrelated) userspace socket which then gets routed over l2tp.

This issue was previously discussed here:
https://lore.kernel.org/netdev/87sfialu2n.fsf@cloudflare.com/

The solution is to have lockdep treat socket locks of l2tp tunnel
sockets separately than those of standard INET sockets. To do so, use
a different lockdep subclass where lock nesting is possible.

  ============================================
  WARNING: possible recursive locking detected
  6.10.0+ #34 Not tainted
  --------------------------------------------
  iperf3/771 is trying to acquire lock:
  ffff8881027601d8 (slock-AF_INET/1){+.-.}-{2:2}, at: l2tp_xmit_skb+0x243/0x9d0

  but task is already holding lock:
  ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(slock-AF_INET/1);
    lock(slock-AF_INET/1);

   *** DEADLOCK ***

   May be due to missing lock nesting notation

  10 locks held by iperf3/771:
   #0: ffff888102650258 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x1a/0x40
   #1: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0
   #2: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130
   #3: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x28b/0x9f0
   #4: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0xf9/0x260
   #5: ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10
   #6: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0
   #7: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130
   #8: ffffffff822ac1e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0xcc/0x1450
   #9: ffff888101f33258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock#2){+...}-{2:2}, at: __dev_queue_xmit+0x513/0x1450

  stack backtrace:
  CPU: 2 UID: 0 PID: 771 Comm: iperf3 Not tainted 6.10.0+ #34
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
  Call Trace:
   <IRQ>
   dump_stack_lvl+0x69/0xa0
   dump_stack+0xc/0x20
   __lock_acquire+0x135d/0x2600
   ? srso_alias_return_thunk+0x5/0xfbef5
   lock_acquire+0xc4/0x2a0
   ? l2tp_xmit_skb+0x243/0x9d0
   ? __skb_checksum+0xa3/0x540
   _raw_spin_lock_nested+0x35/0x50
   ? l2tp_xmit_skb+0x243/0x9d0
   l2tp_xmit_skb+0x243/0x9d0
   l2tp_eth_dev_xmit+0x3c/0xc0
   dev_hard_start_xmit+0x11e/0x420
   sch_direct_xmit+0xc3/0x640
   __dev_queue_xmit+0x61c/0x1450
   ? ip_finish_output2+0xf4c/0x1130
   ip_finish_output2+0x6b6/0x1130
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __ip_finish_output+0x217/0x380
   ? srso_alias_return_thunk+0x5/0xfbef5
   __ip_finish_output+0x217/0x380
   ip_output+0x99/0x120
   __ip_queue_xmit+0xae4/0xbc0
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? tcp_options_write.constprop.0+0xcb/0x3e0
   ip_queue_xmit+0x34/0x40
   __tcp_transmit_skb+0x1625/0x1890
   __tcp_send_ack+0x1b8/0x340
   tcp_send_ack+0x23/0x30
   __tcp_ack_snd_check+0xa8/0x530
   ? srso_alias_return_thunk+0x5/0xfbef5
   tcp_rcv_established+0x412/0xd70
   tcp_v4_do_rcv+0x299/0x420
   tcp_v4_rcv+0x1991/0x1e10
   ip_protocol_deliver_rcu+0x50/0x220
   ip_local_deliver_finish+0x158/0x260
   ip_local_deliver+0xc8/0xe0
   ip_rcv+0xe5/0x1d0
   ? __pfx_ip_rcv+0x10/0x10
   __netif_receive_skb_one_core+0xce/0xe0
   ? process_backlog+0x28b/0x9f0
   __netif_receive_skb+0x34/0xd0
   ? process_backlog+0x28b/0x9f0
   process_backlog+0x2cb/0x9f0
   __napi_poll.constprop.0+0x61/0x280
   net_rx_action+0x332/0x670
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? find_held_lock+0x2b/0x80
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   handle_softirqs+0xda/0x480
   ? __dev_queue_xmit+0xa2c/0x1450
   do_softirq+0xa1/0xd0
   </IRQ>
   <TASK>
   __local_bh_enable_ip+0xc8/0xe0
   ? __dev_queue_xmit+0xa2c/0x1450
   __dev_queue_xmit+0xa48/0x1450
   ? ip_finish_output2+0xf4c/0x1130
   ip_finish_output2+0x6b6/0x1130
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __ip_finish_output+0x217/0x380
   ? srso_alias_return_thunk+0x5/0xfbef5
   __ip_finish_output+0x217/0x380
   ip_output+0x99/0x120
   __ip_queue_xmit+0xae4/0xbc0
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? tcp_options_write.constprop.0+0xcb/0x3e0
   ip_queue_xmit+0x34/0x40
   __tcp_transmit_skb+0x1625/0x1890
   tcp_write_xmit+0x766/0x2fb0
   ? __entry_text_end+0x102ba9/0x102bad
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __might_fault+0x74/0xc0
   ? srso_alias_return_thunk+0x5/0xfbef5
   __tcp_push_pending_frames+0x56/0x190
   tcp_push+0x117/0x310
   tcp_sendmsg_locked+0x14c1/0x1740
   tcp_sendmsg+0x28/0x40
   inet_sendmsg+0x5d/0x90
   sock_write_iter+0x242/0x2b0
   vfs_write+0x68d/0x800
   ? __pfx_sock_write_iter+0x10/0x10
   ksys_write+0xc8/0xf0
   __x64_sys_write+0x3d/0x50
   x64_sys_call+0xfaf/0x1f50
   do_syscall_64+0x6d/0x140
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7f4d143af992
  Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 01 cc ff ff 41 54 b8 02 00 00 0
  RSP: 002b:00007ffd65032058 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f4d143af992
  RDX: 0000000000000025 RSI: 00007f4d143f3bcc RDI: 0000000000000005
  RBP: 00007f4d143f2b28 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4d143f3bcc
  R13: 0000000000000005 R14: 0000000000000000 R15: 00007ffd650323f0
   </TASK>

Fixes: 0b2c597 ("l2tp: close all race conditions in l2tp_tunnel_register()")
Suggested-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+6acef9e0a4d1f46c83d4@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6acef9e0a4d1f46c83d4
CC: gnault@redhat.com
CC: cong.wang@bytedance.com
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Link: https://patch.msgid.link/20240806160626.1248317-1-jchapman@katalix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants