Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kernel oops in spi engine after read timeout #2287

Closed
dlech opened this issue Oct 16, 2023 · 9 comments · Fixed by #2292
Closed

kernel oops in spi engine after read timeout #2287

dlech opened this issue Oct 16, 2023 · 9 comments · Fixed by #2292

Comments

@dlech
Copy link
Collaborator

dlech commented Oct 16, 2023

I'm working on adding some new parts to the ad_pulsar.c driver. There is some issue causing communication to not work so we are hitting the timeout error path in the spi engine.

static void spi_engine_timeout(struct timer_list *t)
{
struct spi_engine *spi_engine = from_timer(spi_engine, t, watchdog_timer);
struct spi_master *master = spi_engine->master;
spin_lock(&spi_engine->lock);
if (spi_engine->msg) {
dev_err(&master->dev, "Timeout occured while waiting for transfer to complete. Hardware is probably broken.\n");
spi_engine_complete_message(master, -ETIMEDOUT);
}
spin_unlock(&spi_engine->lock);
}

This issue can be reproduced by reading the raw analog value twice. The first time, we get the timeout error from the link above. The second time, we get the kernel crash below. So it appears that something is not getting cleaned up properly after the timout error.

Unable to handle kernel paging request at virtual address ffffffac              
[ffffffac] *pgd=1ffbe861, *pte=00000000, *ppte=00000000                         
Internal error: Oops: 37 [#1] PREEMPT SMP ARM                                   
Modules linked in: 8021q ad_pulsar                                              
CPU: 0 PID: 1227 Comm: iio_attr Not tainted 6.1.0-22219-g190b7577f6c9 #5        
Hardware name: Xilinx Zynq Platform                                             
PC is at spi_engine_rx_next+0x1c/0x74                                           
LR is at spi_engine_transfer_one_message+0xfc/0x170                             
pc : [<c065fa6c>]    lr : [<c066033c>]    psr: a0000093                         
sp : e0be1ce0  ip : 00000007  fp : 00400cc0                                     
r10: 7ffff000  r9 : c1fe088c  r8 : c1fe08c8                                     
r7 : 60000013  r6 : e0be1d80  r5 : 00000000  r4 : c1fe0880                      
r3 : ffffffa8  r2 : 00000000  r1 : e0be1d80  r0 : c1fe0880                      
Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none              
Control: 18c5387d  Table: 02dc004a  DAC: 00000051                               
Register r0 information: slab kmalloc-192 start c1fe0840 pointer offset 64 size2
Register r1 information: 2-page vmalloc region starting at 0xe0be0000 allocated8
Register r2 information: NULL pointer                                           
Register r3 information: non-paged memory                                       
Register r4 information: slab kmalloc-192 start c1fe0840 pointer offset 64 size2
Register r5 information: NULL pointer                                           
Register r6 information: 2-page vmalloc region starting at 0xe0be0000 allocated8
Register r7 information: non-paged memory                                       
Register r8 information: slab kmalloc-192 start c1fe0840 pointer offset 136 siz2
Register r9 information: slab kmalloc-192 start c1fe0840 pointer offset 76 size2
Register r10 information: non-paged memory                                      
Register r11 information: non-paged memory                                      
Register r12 information: non-paged memory                                      
Process iio_attr (pid: 1227, stack limit = 0x72d213d0)                          
Stack: (0xe0be1ce0 to 0xe0be2000)                                               
1ce0: 00000006 f84c0bb9 c2486800 00000001 c2486800 00000000 e0be1d80 c2486800   
1d00: d27ff158 c065b2e4 00400cc0 c0655da4 e0be1dd0 c1c03080 c27ed00c 00000000   
1d20: e0be1d80 c2486a00 c2486800 d3e57f54 d27ff158 7ffff000 00400cc0 c065bc54   
1d40: c14baa80 00000000 00000000 e0be1d4c e0be1d4c f84c0bb9 c2486400 e0be1d80   
1d60: e0be1e6c c4a28008 c4aa5200 c065bdcc e0be1dd0 c2486400 e0be1e6c bf000164   
1d80: e0be1e28 e0be1e28 c2486400 00000000 00000000 00000000 00000004 00000000   
1da0: ffffff8d 00000000 00000000 00000000 e0be1db0 e0be1db0 00000000 f84c0bb9   
1dc0: c1803f10 c4a28340 00003049 bf00054c 00000000 c4a28380 00000001 00000000   
1de0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00100040   
1e00: 00000000 00000000 00000000 04c4b3ff 00000000 00000000 00000000 00000000   
1e20: 00000000 00000000 e0be1d80 e0be1d80 00000000 f84c0bb9 c4a28340 e0be1e6c   
1e40: c64b8f40 bf000674 c01002c4 ca65a580 bf000580 d3e0f000 e0be1e6c c08c313c   
1e60: 00000000 c0297224 00000002 00400cc0 ffffffff c025e108 c02cf194 f84c0bb9   
1e80: d27ff140 c48dd540 d3e0f000 c05d6894 d27ff140 c0e5b9e0 d3e0f000 c0329f90   
1ea0: d27ff140 00004000 e0be1f28 00000001 00000001 c02cf038 652dac9d 00000000   
1ec0: e0be1f10 d27ff168 000033df c02a0a0c 0000002f c2e4f000 ffffffff d3d98600   
1ee0: e0be1f80 c0328618 00000001 00004000 00000000 00004004 00000000 c02a0a0c   
1f00: 00000000 00000000 c2e4f0bc dfc139f4 01000006 00000000 00004000 000398f0   
1f20: 00000000 00000000 d3d98600 00000000 00000000 00000000 00000000 00000000   
1f40: 00000000 00004004 00000000 00000000 652dac9d f84c0bb9 00000003 d3d98600   
1f60: d3d98600 00000000 00000000 c01002c4 ca65a580 00000003 000159c4 c02a140c   
1f80: 00000000 00000000 00000000 f84c0bb9 b6e92c18 00000003 b6f260c0 00004000   
1fa0: 00000003 c0100060 00000003 b6f260c0 00000003 000398f0 00004000 00000000   
1fc0: 00000003 b6f260c0 00004000 00000003 00004000 b6e92780 000398f0 000159c4   
1fe0: fbad2488 be8d9538 b6e0dc94 b6e0dcb0 60000010 00000003 00000000 00000000   
 spi_engine_rx_next from spi_engine_transfer_one_message+0xfc/0x170             
 spi_engine_transfer_one_message from __spi_pump_transfer_message+0x31c/0x62c   
 __spi_pump_transfer_message from __spi_sync+0x318/0x464                        
 __spi_sync from spi_sync+0x2c/0x44                                             
 spi_sync from spi_sync_transfer.constprop.0+0x68/0x90 [ad_pulsar]              
 spi_sync_transfer.constprop.0 [ad_pulsar] from ad_pulsar_read_channel+0x8c/0xc]
 ad_pulsar_read_channel [ad_pulsar] from ad_pulsar_read_raw+0xf4/0x1c0 [ad_puls]
 ad_pulsar_read_raw [ad_pulsar] from iio_read_channel_info+0xec/0x100           
 iio_read_channel_info from dev_attr_show+0x20/0x50                             
 dev_attr_show from sysfs_kf_seq_show+0x94/0x124                                
 sysfs_kf_seq_show from seq_read_iter+0x1b0/0x508                               
 seq_read_iter from vfs_read+0x1d0/0x298                                        
 vfs_read from ksys_read+0x68/0xec                                              
 ksys_read from ret_fast_syscall+0x0/0x54                                       
Exception stack(0xe0be1fa8 to 0xe0be1ff0)                                       
1fa0:                   00000003 b6f260c0 00000003 000398f0 00004000 00000000   
1fc0: 00000003 b6f260c0 00004000 00000003 00004000 b6e92780 000398f0 000159c4   
1fe0: fbad2488 be8d9538 b6e0dc94 b6e0dcb0                                       
Code: e5901014 ea000004 e3530000 0a00000a (e5932004)                            
---[ end trace 0000000000000000 ]---                                            
note: iio_attr[1227] exited with preempt_count 1    
@dlech dlech changed the title kernel oops in spi-engine after read timeout kernel oops in spi engine after read timeout Oct 16, 2023
@mhennerich
Copy link
Contributor

Hi David,
Can you try to dig a bit deeper in spi_engine_rx_next() and see what pointer is causing the issue?

@dlech
Copy link
Collaborator Author

dlech commented Oct 17, 2023

Yes, I will do that today.

@dlech
Copy link
Collaborator Author

dlech commented Oct 17, 2023

I enabled ftrace and added noinline to all of the functions in the SPI Engine driver to trace the code path to compare working vs. the crash. I used iio_attr ad7985 -ci voltage0-voltage1 to trigger an SPI transaction.

First, here is what a normal transaction looks like:

root@zed-3:~# cat /sys/kernel/tracing/trace
# tracer: function
#
# entries-in-buffer/entries-written: 46/46   #P:2
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
        iio_attr-1170    [000] ...1.   129.119160: spi_engine_transfer_one_message <-__spi_pump_transfer_message
        iio_attr-1170    [000] ...1.   129.119167: spi_engine_compile_message <-spi_engine_transfer_one_message
        iio_attr-1170    [000] ...1.   129.119168: spi_engine_get_config <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119170: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119172: spi_engine_get_clk_div <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119175: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119177: spi_engine_get_word_length.constprop.0 <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119178: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119180: spi_engine_gen_cs <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119181: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119182: spi_engine_update_xfer_len <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119183: spi_engine_gen_xfer <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119185: spi_engine_program_add_cmd <-spi_engine_gen_xfer
        iio_attr-1170    [000] ...1.   129.119186: spi_engine_gen_sleep.constprop.0 <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119188: spi_engine_gen_cs <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119189: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119191: spi_engine_compile_message <-spi_engine_transfer_one_message
        iio_attr-1170    [000] ...1.   129.119193: spi_engine_get_config <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119194: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119195: spi_engine_get_clk_div <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119197: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119198: spi_engine_get_word_length.constprop.0 <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119199: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119201: spi_engine_gen_cs <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119202: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119203: spi_engine_update_xfer_len <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119204: spi_engine_gen_xfer <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119206: spi_engine_program_add_cmd <-spi_engine_gen_xfer
        iio_attr-1170    [000] ...1.   129.119207: spi_engine_gen_sleep.constprop.0 <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119209: spi_engine_gen_cs <-spi_engine_compile_message
        iio_attr-1170    [000] ...1.   129.119210: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1170    [000] d..2.   129.119211: spi_engine_program_add_cmd <-spi_engine_transfer_one_message
        iio_attr-1170    [000] d..2.   129.119213: spi_engine_write_cmd_fifo <-spi_engine_transfer_one_message
        iio_attr-1170    [000] d..2.   129.119215: spi_engine_tx_next <-spi_engine_transfer_one_message
        iio_attr-1170    [000] d..2.   129.119216: spi_engine_xfer_next <-spi_engine_tx_next
        iio_attr-1170    [000] d..2.   129.119218: spi_engine_xfer_next <-spi_engine_tx_next
        iio_attr-1170    [000] d..2.   129.119219: spi_engine_write_tx_fifo <-spi_engine_transfer_one_message
        iio_attr-1170    [000] d..2.   129.119221: spi_engine_rx_next <-spi_engine_transfer_one_message
        iio_attr-1170    [000] d..2.   129.119222: spi_engine_xfer_next <-spi_engine_rx_next
        iio_attr-1170    [000] d.h1.   129.119229: spi_engine_irq <-__handle_irq_event_percpu
        iio_attr-1170    [000] d.h2.   129.119232: spi_engine_read_rx_fifo <-spi_engine_irq
        iio_attr-1170    [000] d.h2.   129.119233: spi_engine_read_buff <-spi_engine_read_rx_fifo
        iio_attr-1170    [000] d.h2.   129.119235: spi_engine_rx_next <-spi_engine_read_rx_fifo
        iio_attr-1170    [000] d.h2.   129.119236: spi_engine_xfer_next <-spi_engine_rx_next
        iio_attr-1170    [000] d.h2.   129.119238: spi_engine_complete_message <-spi_engine_irq
          <idle>-0       [000] ..s2.   134.152771: spi_engine_timeout <-call_timer_fn

Then here is what a timeout looks like (first call that doesn't crash):
(this can be reproduced by using a wrong BOOT.BIN file)

# tracer: function
#
# entries-in-buffer/entries-written: 41/41   #P:2
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
        iio_attr-1155    [000] ...1.    61.430742: spi_engine_transfer_one_message <-__spi_pump_transfer_message
        iio_attr-1155    [000] ...1.    61.430750: spi_engine_compile_message <-spi_engine_transfer_one_message
        iio_attr-1155    [000] ...1.    61.430752: spi_engine_get_config <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430753: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430755: spi_engine_get_clk_div <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430759: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430760: spi_engine_get_word_length.constprop.0 <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430762: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430763: spi_engine_gen_cs <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430764: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430766: spi_engine_update_xfer_len <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430767: spi_engine_gen_xfer <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430768: spi_engine_program_add_cmd <-spi_engine_gen_xfer
        iio_attr-1155    [000] ...1.    61.430770: spi_engine_gen_sleep.constprop.0 <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430772: spi_engine_gen_cs <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430773: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430776: spi_engine_compile_message <-spi_engine_transfer_one_message
        iio_attr-1155    [000] ...1.    61.430777: spi_engine_get_config <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430779: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430780: spi_engine_get_clk_div <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430781: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430783: spi_engine_get_word_length.constprop.0 <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430784: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430785: spi_engine_gen_cs <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430786: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430788: spi_engine_update_xfer_len <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430789: spi_engine_gen_xfer <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430790: spi_engine_program_add_cmd <-spi_engine_gen_xfer
        iio_attr-1155    [000] ...1.    61.430792: spi_engine_gen_sleep.constprop.0 <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430793: spi_engine_gen_cs <-spi_engine_compile_message
        iio_attr-1155    [000] ...1.    61.430794: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1155    [000] d..2.    61.430796: spi_engine_program_add_cmd <-spi_engine_transfer_one_message
        iio_attr-1155    [000] d..2.    61.430798: spi_engine_write_cmd_fifo <-spi_engine_transfer_one_message
        iio_attr-1155    [000] d..2.    61.430800: spi_engine_tx_next <-spi_engine_transfer_one_message
        iio_attr-1155    [000] d..2.    61.430801: spi_engine_xfer_next <-spi_engine_tx_next
        iio_attr-1155    [000] d..2.    61.430803: spi_engine_xfer_next <-spi_engine_tx_next
        iio_attr-1155    [000] d..2.    61.430804: spi_engine_write_tx_fifo <-spi_engine_transfer_one_message
        iio_attr-1155    [000] d..2.    61.430806: spi_engine_rx_next <-spi_engine_transfer_one_message
        iio_attr-1155    [000] d..2.    61.430807: spi_engine_xfer_next <-spi_engine_rx_next
      PK-Backend-1152    [000] ..s1.    66.472911: spi_engine_timeout <-call_timer_fn
      PK-Backend-1152    [000] ..s2.    66.472946: spi_engine_complete_message <-spi_engine_timeout

Then the same command again after the timeout which triggers the crash:

# tracer: function
#
# entries-in-buffer/entries-written: 80/80   #P:2
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
        iio_attr-1166    [000] ...1.    99.727267: spi_engine_transfer_one_message <-__spi_pump_transfer_message
        iio_attr-1166    [000] ...1.    99.727273: spi_engine_compile_message <-spi_engine_transfer_one_message
        iio_attr-1166    [000] ...1.    99.727275: spi_engine_get_config <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727277: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727278: spi_engine_get_clk_div <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727282: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727283: spi_engine_get_word_length.constprop.0 <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727285: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727286: spi_engine_gen_cs <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727287: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727289: spi_engine_update_xfer_len <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727290: spi_engine_gen_xfer <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727291: spi_engine_program_add_cmd <-spi_engine_gen_xfer
        iio_attr-1166    [000] ...1.    99.727293: spi_engine_gen_sleep.constprop.0 <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727295: spi_engine_gen_cs <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727296: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727298: spi_engine_compile_message <-spi_engine_transfer_one_message
        iio_attr-1166    [000] ...1.    99.727300: spi_engine_get_config <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727301: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727302: spi_engine_get_clk_div <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727304: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727305: spi_engine_get_word_length.constprop.0 <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727306: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727308: spi_engine_gen_cs <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727309: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727310: spi_engine_update_xfer_len <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727311: spi_engine_gen_xfer <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727313: spi_engine_program_add_cmd <-spi_engine_gen_xfer
        iio_attr-1166    [000] ...1.    99.727314: spi_engine_gen_sleep.constprop.0 <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727315: spi_engine_gen_cs <-spi_engine_compile_message
        iio_attr-1166    [000] ...1.    99.727317: spi_engine_program_add_cmd <-spi_engine_compile_message
        iio_attr-1166    [000] d..2.    99.727318: spi_engine_program_add_cmd <-spi_engine_transfer_one_message
        iio_attr-1166    [000] d..2.    99.727320: spi_engine_write_cmd_fifo <-spi_engine_transfer_one_message
        iio_attr-1166    [000] d..2.    99.727322: spi_engine_tx_next <-spi_engine_transfer_one_message
        iio_attr-1166    [000] d..2.    99.727323: spi_engine_xfer_next <-spi_engine_tx_next
        iio_attr-1166    [000] d..2.    99.727324: spi_engine_xfer_next <-spi_engine_tx_next
        iio_attr-1166    [000] d..2.    99.727326: spi_engine_write_tx_fifo <-spi_engine_transfer_one_message
        iio_attr-1166    [000] d..2.    99.727328: spi_engine_rx_next <-spi_engine_transfer_one_message
        iio_attr-1166    [000] d..2.    99.727329: spi_engine_xfer_next <-spi_engine_rx_next

@dlech
Copy link
Collaborator Author

dlech commented Oct 17, 2023

From the traces above, we can see that in the working case, everything is the same up to:

        iio_attr-1170    [000] d.h1.   129.119229: spi_engine_irq <-__handle_irq_event_percpu
        iio_attr-1170    [000] d.h2.   129.119232: spi_engine_read_rx_fifo <-spi_engine_irq
        iio_attr-1170    [000] d.h2.   129.119233: spi_engine_read_buff <-spi_engine_read_rx_fifo
        iio_attr-1170    [000] d.h2.   129.119235: spi_engine_rx_next <-spi_engine_read_rx_fifo
        iio_attr-1170    [000] d.h2.   129.119236: spi_engine_xfer_next <-spi_engine_rx_next
        iio_attr-1170    [000] d.h2.   129.119238: spi_engine_complete_message <-spi_engine_irq
          <idle>-0       [000] ..s2.   134.152771: spi_engine_timeout <-call_timer_fn

In this call tree, spi_engine_rx_next() sets spi_engine->rx_xfer to the next xfer if one is available or NULL if there are no more queued xfers. Then spi_engine_complete_message() frees the previous xfer by calling spi_finalize_current_message().

In the case of the timeout, spi_engine_complete_message() is called but spi_engine->rx_xfer is still pointing to memory that gets freed by spi_engine_complete_message() via the call to spi_engine_complete_message() .

      PK-Backend-1152    [000] ..s1.    66.472911: spi_engine_timeout <-call_timer_fn
      PK-Backend-1152    [000] ..s2.    66.472946: spi_engine_complete_message <-spi_engine_timeout

Then the next time we try to do an SPI xfer, the call to spi_engine_rx_next() (via spi_engine_transfer_one_message()) calls spi_engine_xfer_next() with spi_engine->rx_xfer that is still pointing to memory that was freed during the last spi_engine_complete_message().

        iio_attr-1166    [000] d..2.    99.727328: spi_engine_rx_next <-spi_engine_transfer_one_message
        iio_attr-1166    [000] d..2.    99.727329: spi_engine_xfer_next <-spi_engine_rx_next

spi_engine_xfer_next() accesses the member xfer->transfer_list of the freed memory and causes the crash.

@dlech
Copy link
Collaborator Author

dlech commented Oct 17, 2023

Unrelated, but I noticed another odd thing when everything is working properly (no timeout): the watchdog timer doesn't seem to be canceled on successful completion and still fires 5 seconds afterwards.

        iio_attr-1170    [000] d.h2.   129.119232: spi_engine_read_rx_fifo <-spi_engine_irq
        iio_attr-1170    [000] d.h2.   129.119233: spi_engine_read_buff <-spi_engine_read_rx_fifo
        iio_attr-1170    [000] d.h2.   129.119235: spi_engine_rx_next <-spi_engine_read_rx_fifo
        iio_attr-1170    [000] d.h2.   129.119236: spi_engine_xfer_next <-spi_engine_rx_next
        iio_attr-1170    [000] d.h2.   129.119238: spi_engine_complete_message <-spi_engine_irq
          <idle>-0       [000] ..s2.   134.152771: spi_engine_timeout <-call_timer_fn

del_timer() is called immediately before spi_engine_complete_message() so I would expect it to be canceled and not call the callback.

@mhennerich
Copy link
Contributor

mhennerich commented Oct 18, 2023

This sounds odd. I would expect the timer to be terminated after del_timer().

CONFIG_DEBUG_OBJECTS_TIMERS=y ?

@nunojsa
Copy link
Collaborator

nunojsa commented Oct 18, 2023

del_timer() is called immediately before spi_engine_complete_message() so I would expect it to be canceled and not call the callback.

I guess it might depend on concurrency... del_timer() does not care about the callback being scheduled in another CPU... It just deactivates the timer.

 Return:
 * * %0 - The timer was not pending
 * * %1 - The timer was pending and deactivated

The above are the return codes... I wonder if we get 1 in that trace. I'm thinking del_timer_sync() might give us the expected trace?

@nunojsa
Copy link
Collaborator

nunojsa commented Oct 18, 2023

I'm thinking del_timer_sync() might give us the expected trace?

hmm, actually looking at the code, this could give us a nasty deadlock...

@nunojsa
Copy link
Collaborator

nunojsa commented Oct 20, 2023

Apparently merging #2292 automatically closed this... I looked a bit more and now just realized the callback is being called 5 seconds later (as you said) so there's no concurrent calls. Having a quick look on the timer code, I would not expect this to happen but I might be missing something... So, @dlech if you wanna have a deeper look on it, might actually be interesting stuff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants