kernel oops in spi engine after read timeout #2287
Hi David,
Yes, I will do that today.
I enabled ftrace and added [...]. First, here is what a normal transaction looks like:

root@zed-3:~# cat /sys/kernel/tracing/trace
# tracer: function
#
# entries-in-buffer/entries-written: 46/46 #P:2
#
# _-----=> irqs-off/BH-disabled
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
iio_attr-1170 [000] ...1. 129.119160: spi_engine_transfer_one_message <-__spi_pump_transfer_message
iio_attr-1170 [000] ...1. 129.119167: spi_engine_compile_message <-spi_engine_transfer_one_message
iio_attr-1170 [000] ...1. 129.119168: spi_engine_get_config <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119170: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119172: spi_engine_get_clk_div <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119175: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119177: spi_engine_get_word_length.constprop.0 <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119178: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119180: spi_engine_gen_cs <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119181: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119182: spi_engine_update_xfer_len <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119183: spi_engine_gen_xfer <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119185: spi_engine_program_add_cmd <-spi_engine_gen_xfer
iio_attr-1170 [000] ...1. 129.119186: spi_engine_gen_sleep.constprop.0 <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119188: spi_engine_gen_cs <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119189: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119191: spi_engine_compile_message <-spi_engine_transfer_one_message
iio_attr-1170 [000] ...1. 129.119193: spi_engine_get_config <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119194: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119195: spi_engine_get_clk_div <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119197: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119198: spi_engine_get_word_length.constprop.0 <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119199: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119201: spi_engine_gen_cs <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119202: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119203: spi_engine_update_xfer_len <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119204: spi_engine_gen_xfer <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119206: spi_engine_program_add_cmd <-spi_engine_gen_xfer
iio_attr-1170 [000] ...1. 129.119207: spi_engine_gen_sleep.constprop.0 <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119209: spi_engine_gen_cs <-spi_engine_compile_message
iio_attr-1170 [000] ...1. 129.119210: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1170 [000] d..2. 129.119211: spi_engine_program_add_cmd <-spi_engine_transfer_one_message
iio_attr-1170 [000] d..2. 129.119213: spi_engine_write_cmd_fifo <-spi_engine_transfer_one_message
iio_attr-1170 [000] d..2. 129.119215: spi_engine_tx_next <-spi_engine_transfer_one_message
iio_attr-1170 [000] d..2. 129.119216: spi_engine_xfer_next <-spi_engine_tx_next
iio_attr-1170 [000] d..2. 129.119218: spi_engine_xfer_next <-spi_engine_tx_next
iio_attr-1170 [000] d..2. 129.119219: spi_engine_write_tx_fifo <-spi_engine_transfer_one_message
iio_attr-1170 [000] d..2. 129.119221: spi_engine_rx_next <-spi_engine_transfer_one_message
iio_attr-1170 [000] d..2. 129.119222: spi_engine_xfer_next <-spi_engine_rx_next
iio_attr-1170 [000] d.h1. 129.119229: spi_engine_irq <-__handle_irq_event_percpu
iio_attr-1170 [000] d.h2. 129.119232: spi_engine_read_rx_fifo <-spi_engine_irq
iio_attr-1170 [000] d.h2. 129.119233: spi_engine_read_buff <-spi_engine_read_rx_fifo
iio_attr-1170 [000] d.h2. 129.119235: spi_engine_rx_next <-spi_engine_read_rx_fifo
iio_attr-1170 [000] d.h2. 129.119236: spi_engine_xfer_next <-spi_engine_rx_next
iio_attr-1170 [000] d.h2. 129.119238: spi_engine_complete_message <-spi_engine_irq
<idle>-0 [000] ..s2. 134.152771: spi_engine_timeout <-call_timer_fn

Then here is what a timeout looks like (first call that doesn't crash):

# tracer: function
#
# entries-in-buffer/entries-written: 41/41 #P:2
#
# _-----=> irqs-off/BH-disabled
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
iio_attr-1155 [000] ...1. 61.430742: spi_engine_transfer_one_message <-__spi_pump_transfer_message
iio_attr-1155 [000] ...1. 61.430750: spi_engine_compile_message <-spi_engine_transfer_one_message
iio_attr-1155 [000] ...1. 61.430752: spi_engine_get_config <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430753: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430755: spi_engine_get_clk_div <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430759: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430760: spi_engine_get_word_length.constprop.0 <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430762: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430763: spi_engine_gen_cs <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430764: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430766: spi_engine_update_xfer_len <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430767: spi_engine_gen_xfer <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430768: spi_engine_program_add_cmd <-spi_engine_gen_xfer
iio_attr-1155 [000] ...1. 61.430770: spi_engine_gen_sleep.constprop.0 <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430772: spi_engine_gen_cs <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430773: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430776: spi_engine_compile_message <-spi_engine_transfer_one_message
iio_attr-1155 [000] ...1. 61.430777: spi_engine_get_config <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430779: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430780: spi_engine_get_clk_div <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430781: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430783: spi_engine_get_word_length.constprop.0 <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430784: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430785: spi_engine_gen_cs <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430786: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430788: spi_engine_update_xfer_len <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430789: spi_engine_gen_xfer <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430790: spi_engine_program_add_cmd <-spi_engine_gen_xfer
iio_attr-1155 [000] ...1. 61.430792: spi_engine_gen_sleep.constprop.0 <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430793: spi_engine_gen_cs <-spi_engine_compile_message
iio_attr-1155 [000] ...1. 61.430794: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1155 [000] d..2. 61.430796: spi_engine_program_add_cmd <-spi_engine_transfer_one_message
iio_attr-1155 [000] d..2. 61.430798: spi_engine_write_cmd_fifo <-spi_engine_transfer_one_message
iio_attr-1155 [000] d..2. 61.430800: spi_engine_tx_next <-spi_engine_transfer_one_message
iio_attr-1155 [000] d..2. 61.430801: spi_engine_xfer_next <-spi_engine_tx_next
iio_attr-1155 [000] d..2. 61.430803: spi_engine_xfer_next <-spi_engine_tx_next
iio_attr-1155 [000] d..2. 61.430804: spi_engine_write_tx_fifo <-spi_engine_transfer_one_message
iio_attr-1155 [000] d..2. 61.430806: spi_engine_rx_next <-spi_engine_transfer_one_message
iio_attr-1155 [000] d..2. 61.430807: spi_engine_xfer_next <-spi_engine_rx_next
PK-Backend-1152 [000] ..s1. 66.472911: spi_engine_timeout <-call_timer_fn
PK-Backend-1152 [000] ..s2. 66.472946: spi_engine_complete_message <-spi_engine_timeout

Then the same command again after the timeout, which triggers the crash:

# tracer: function
#
# entries-in-buffer/entries-written: 80/80 #P:2
#
# _-----=> irqs-off/BH-disabled
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
iio_attr-1166 [000] ...1. 99.727267: spi_engine_transfer_one_message <-__spi_pump_transfer_message
iio_attr-1166 [000] ...1. 99.727273: spi_engine_compile_message <-spi_engine_transfer_one_message
iio_attr-1166 [000] ...1. 99.727275: spi_engine_get_config <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727277: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727278: spi_engine_get_clk_div <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727282: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727283: spi_engine_get_word_length.constprop.0 <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727285: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727286: spi_engine_gen_cs <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727287: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727289: spi_engine_update_xfer_len <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727290: spi_engine_gen_xfer <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727291: spi_engine_program_add_cmd <-spi_engine_gen_xfer
iio_attr-1166 [000] ...1. 99.727293: spi_engine_gen_sleep.constprop.0 <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727295: spi_engine_gen_cs <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727296: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727298: spi_engine_compile_message <-spi_engine_transfer_one_message
iio_attr-1166 [000] ...1. 99.727300: spi_engine_get_config <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727301: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727302: spi_engine_get_clk_div <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727304: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727305: spi_engine_get_word_length.constprop.0 <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727306: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727308: spi_engine_gen_cs <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727309: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727310: spi_engine_update_xfer_len <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727311: spi_engine_gen_xfer <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727313: spi_engine_program_add_cmd <-spi_engine_gen_xfer
iio_attr-1166 [000] ...1. 99.727314: spi_engine_gen_sleep.constprop.0 <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727315: spi_engine_gen_cs <-spi_engine_compile_message
iio_attr-1166 [000] ...1. 99.727317: spi_engine_program_add_cmd <-spi_engine_compile_message
iio_attr-1166 [000] d..2. 99.727318: spi_engine_program_add_cmd <-spi_engine_transfer_one_message
iio_attr-1166 [000] d..2. 99.727320: spi_engine_write_cmd_fifo <-spi_engine_transfer_one_message
iio_attr-1166 [000] d..2. 99.727322: spi_engine_tx_next <-spi_engine_transfer_one_message
iio_attr-1166 [000] d..2. 99.727323: spi_engine_xfer_next <-spi_engine_tx_next
iio_attr-1166 [000] d..2. 99.727324: spi_engine_xfer_next <-spi_engine_tx_next
iio_attr-1166 [000] d..2. 99.727326: spi_engine_write_tx_fifo <-spi_engine_transfer_one_message
iio_attr-1166 [000] d..2. 99.727328: spi_engine_rx_next <-spi_engine_transfer_one_message
iio_attr-1166 [000] d..2. 99.727329: spi_engine_xfer_next <-spi_engine_rx_next
From the traces above, we can see that in the working case, everything is the same up to:

iio_attr-1170 [000] d.h1. 129.119229: spi_engine_irq <-__handle_irq_event_percpu
iio_attr-1170 [000] d.h2. 129.119232: spi_engine_read_rx_fifo <-spi_engine_irq
iio_attr-1170 [000] d.h2. 129.119233: spi_engine_read_buff <-spi_engine_read_rx_fifo
iio_attr-1170 [000] d.h2. 129.119235: spi_engine_rx_next <-spi_engine_read_rx_fifo
iio_attr-1170 [000] d.h2. 129.119236: spi_engine_xfer_next <-spi_engine_rx_next
iio_attr-1170 [000] d.h2. 129.119238: spi_engine_complete_message <-spi_engine_irq
<idle>-0 [000] ..s2. 134.152771: spi_engine_timeout <-call_timer_fn

In this call tree, [...] In the case of the timeout, [...]

PK-Backend-1152 [000] ..s1. 66.472911: spi_engine_timeout <-call_timer_fn
PK-Backend-1152 [000] ..s2. 66.472946: spi_engine_complete_message <-spi_engine_timeout

Then the next time we try to do an SPI xfer, the call to [...]

iio_attr-1166 [000] d..2. 99.727328: spi_engine_rx_next <-spi_engine_transfer_one_message
iio_attr-1166 [000] d..2. 99.727329: spi_engine_xfer_next <-spi_engine_rx_next
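
The comparison above was done by eye. As a rough sketch only, the same check can be scripted: parse the function-tracer lines into (function, caller) pairs and report the first index where two traces differ. The names `TRACE_LINE`, `parse_trace` and `first_divergence` are made up for this illustration, and the regex assumes the default ftrace line layout shown in the dumps above.

```python
import re
from itertools import zip_longest

# Matches one line of the ftrace "function" tracer, for example:
#   iio_attr-1170 [000] d.h1. 129.119229: spi_engine_irq <-__handle_irq_event_percpu
# Header lines starting with '#' do not match and are skipped.
TRACE_LINE = re.compile(
    r"^\s*(?P<task>\S+)\s+\[(?P<cpu>\d+)\]\s+(?P<flags>\S+)\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<func>\S+)\s+<-(?P<caller>\S+)"
)


def parse_trace(text):
    """Return a list of (function, caller) pairs in trace order."""
    calls = []
    for line in text.splitlines():
        m = TRACE_LINE.match(line)
        if m:
            calls.append((m.group("func"), m.group("caller")))
    return calls


def first_divergence(a, b):
    """Index of the first call that differs between two traces, or None."""
    for i, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            return i
    return None


# Tails of the working and timeout traces, copied from the dumps above.
working = """
iio_attr-1170 [000] d..2. 129.119221: spi_engine_rx_next <-spi_engine_transfer_one_message
iio_attr-1170 [000] d..2. 129.119222: spi_engine_xfer_next <-spi_engine_rx_next
iio_attr-1170 [000] d.h1. 129.119229: spi_engine_irq <-__handle_irq_event_percpu
"""
timeout = """
iio_attr-1155 [000] d..2. 61.430806: spi_engine_rx_next <-spi_engine_transfer_one_message
iio_attr-1155 [000] d..2. 61.430807: spi_engine_xfer_next <-spi_engine_rx_next
PK-Backend-1152 [000] ..s1. 66.472911: spi_engine_timeout <-call_timer_fn
"""

idx = first_divergence(parse_trace(working), parse_trace(timeout))
```

With these excerpts, `idx` points at the entry where the working trace enters `spi_engine_irq` and the timeout trace enters `spi_engine_timeout` instead. Task names and timestamps are deliberately ignored so that runs from different processes compare equal.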
Unrelated, but I noticed another odd thing when everything is working properly (no timeout): the watchdog timer doesn't seem to be canceled on successful completion and still fires 5 seconds afterwards.

iio_attr-1170 [000] d.h2. 129.119232: spi_engine_read_rx_fifo <-spi_engine_irq
iio_attr-1170 [000] d.h2. 129.119233: spi_engine_read_buff <-spi_engine_read_rx_fifo
iio_attr-1170 [000] d.h2. 129.119235: spi_engine_rx_next <-spi_engine_read_rx_fifo
iio_attr-1170 [000] d.h2. 129.119236: spi_engine_xfer_next <-spi_engine_rx_next
iio_attr-1170 [000] d.h2. 129.119238: spi_engine_complete_message <-spi_engine_irq
<idle>-0 [000] ..s2. 134.152771: spi_engine_timeout <-call_timer_fn
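
To check for this pattern without reading the whole dump, something like the following sketch could be used. It scans a trace and reports every `spi_engine_timeout` that fires after `spi_engine_complete_message` with no new `spi_engine_transfer_one_message` in between. The helper name `stale_timeouts` and the `EVENT` regex are assumptions for this example, not part of any tool.

```python
import re

# Pulls the timestamp and traced function out of a function-tracer line.
EVENT = re.compile(r"\s(?P<ts>\d+\.\d+):\s+(?P<func>\S+)\s+<-")


def stale_timeouts(trace_text):
    """Return the delays in seconds between a completed message and a later
    spi_engine_timeout call, when no new transfer started in between."""
    delays = []
    completed_at = None
    for line in trace_text.splitlines():
        m = EVENT.search(line)
        if not m:
            continue
        ts, func = float(m.group("ts")), m.group("func")
        if func == "spi_engine_transfer_one_message":
            completed_at = None  # a new message starts, forget the old one
        elif func == "spi_engine_complete_message":
            completed_at = ts
        elif func == "spi_engine_timeout" and completed_at is not None:
            delays.append(round(ts - completed_at, 6))
    return delays


# Tail of the working trace, copied from the dump above.
working_tail = """
iio_attr-1170 [000] d.h2. 129.119238: spi_engine_complete_message <-spi_engine_irq
<idle>-0 [000] ..s2. 134.152771: spi_engine_timeout <-call_timer_fn
"""

# Tail of the timeout trace: here the timeout itself completes the message,
# so it is not reported as stale.
timeout_tail = """
PK-Backend-1152 [000] ..s1. 66.472911: spi_engine_timeout <-call_timer_fn
PK-Backend-1152 [000] ..s2. 66.472946: spi_engine_complete_message <-spi_engine_timeout
"""

delays = stale_timeouts(working_tail)
```

For the working trace this gives a single delay of roughly 5.03 seconds, which matches the 5 second watchdog firing after the message had already completed.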
This sounds odd. I would expect the timer to be terminated after del_timer(). Do you have CONFIG_DEBUG_OBJECTS_TIMERS=y?
I guess it might depend on concurrency... del_timer() does not care about the callback being scheduled on another CPU... It just deactivates the timer.
The above are the return codes... I wonder if we get 1 in that trace. I'm thinking del_timer_sync() might give us the expected trace?
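
For readers not familiar with the return-code question: del_timer() is documented to return 1 if it deactivated a pending timer and 0 if the timer was not pending. The snippet below is only a userspace toy model of that contract, not kernel code; `ToyTimer`, `arm`, `delete` and `expire` are invented names standing in for the timer, mod_timer(), del_timer() and the expiry path.

```python
class ToyTimer:
    """Userspace toy model of a one-shot timer; NOT the kernel implementation."""

    def __init__(self, callback):
        self.callback = callback
        self.pending = False

    def arm(self):
        """Stand-in for mod_timer(): mark the timer as pending."""
        self.pending = True

    def delete(self):
        """Stand-in for del_timer(): deactivate the timer.

        Returns 1 if the timer was pending, 0 if it was already inactive.
        """
        was_pending = self.pending
        self.pending = False
        return 1 if was_pending else 0

    def expire(self):
        """Simulate the expiry time being reached.

        The callback only runs if the timer is still pending.
        """
        if not self.pending:
            return False
        self.pending = False
        self.callback()
        return True


fired = []
timer = ToyTimer(lambda: fired.append("timeout"))

timer.arm()
first = timer.delete()   # timer was pending, so it is deactivated
second = timer.delete()  # timer is already inactive
ran = timer.expire()     # nothing is pending, so the callback does not run
```

In this model a delete that returns 1 means the callback cannot run unless the timer is armed again, which is why the return value seen on the completion path would be a useful data point. The model does not cover a callback already running on another CPU, which is the case del_timer_sync() exists for.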
hmm, actually looking at the code, this could give us a nasty deadlock...
Apparently merging #2292 automatically closed this... I looked a bit more and now just realized the callback is being called 5 seconds later (as you said), so there are no concurrent calls. Having a quick look at the timer code, I would not expect this to happen, but I might be missing something... So, @dlech, if you wanna have a deeper look at it, it might actually be interesting stuff.
I'm working on adding some new parts to the ad_pulsar.c driver. There is some issue causing communication to not work, so we are hitting the timeout error path in the spi engine.

linux/drivers/spi/spi-axi-spi-engine.c
Lines 592 to 603 in a834f68

This issue can be reproduced by reading the raw analog value twice. The first time, we get the timeout error from the link above. The second time, we get the kernel crash below. So it appears that something is not getting cleaned up properly after the timeout error.