AR9271 random freezes in Ad-hoc mode when distant client disconnect #145
Comments
Here is another log, slightly different: 7,7330,1070405992,-;ath: phy2: Choose slot: 0, tsf: 7654670338, tsftu: 7475264, intval: 100 |
Hi, I know there is something fishy with ad-hoc mode. Some time ago I made this script to crash the system just by spoofing the ad-hoc interface: |
Just tried your script; it looks like I can use it to reproduce our bug. Simply running ifconfig down; macchanger; ifconfig up can trigger the same bug. Do you think it is a firmware or a driver related problem? |
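For anyone wanting to repeat the down / macchanger / up sequence mentioned above, here is a minimal Python sketch. The interface name "wlan0" and the use of `macchanger -r` (random MAC) are assumptions; the thread does not say which macchanger option was used. By default it only prints the commands, since actually running them needs root and may hang the machine.

```python
import shlex
import subprocess
from typing import List


def build_repro_commands(iface: str) -> List[List[str]]:
    """Return the down / change-MAC / up sequence as argument lists."""
    return [
        ["ifconfig", iface, "down"],
        ["macchanger", "-r", iface],  # -r = random MAC; an assumed choice
        ["ifconfig", iface, "up"],
    ]


def run_repro(iface: str, dry_run: bool = True) -> List[str]:
    """Print each command; execute it only when dry_run is False (needs root)."""
    rendered = []
    for cmd in build_repro_commands(iface):
        rendered.append(shlex.join(cmd))
        print(rendered[-1])
        if not dry_run:
            subprocess.run(cmd, check=True)
    return rendered


if __name__ == "__main__":
    run_repro("wlan0")  # "wlan0" is a placeholder interface name
```

Calling `run_repro("wlan0", dry_run=False)` in a loop on one node, while the other nodes stay in the IBSS, would be one way to exercise the station-removal path repeatedly.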
Hi,
This is a driver bug, not a firmware bug. It's hung inside the driver.
…-a
|
I thought it was freezing because the card was dead and no longer responding. I will try to get a cable to connect to the UART and log the firmware output... Is there anything else I can try or enable in the kernel that could help? |
Hi,
Well if the firmware is freezing then ideally ath9k-htc would also
handle that case without it crashing.
…-adrian
|
My last research on this topic was on 20 Oct 2014: |
Thanks @olerem for this information. It looks like I would need to increase the maximum number of stations supported by the firmware. The idea proposed there is to remove some station information from the firmware and keep it in the kernel driver instead. I am pretty lost looking at the firmware code; can someone point me to where to start looking, or to where this hard limit is enforced in the firmware? Or, if you have an idea, to where we could easily save resources? |
@dagf2101, in your place I would attach a UART and add some prints to the firmware. It will allow you to see how it works. Unfortunately, we have no open source solution available to JTAG this chip. |
Yeah, I will, but I still haven't received the cable... damn bureaucracy. |
I thought each SW/HW developer has a bucket full of cheap uart adapters... |
Looks like the bucket is empty here. |
OK, I received this cable from Digi-Key but can't make it work. I soldered the RX, TX and ground as shown here: https://photos.app.goo.gl/PVC8acPcVmnzPTnq8 I am trying to connect to the USB serial device (19200) but I don't see anything... Maybe I have the wrong cable or I'm missing something? |
I already had an issue with exactly this cable: I purchased the 3V3 version and got the 5V version. Check this with a scope or logic analyzer if you have one. If not, make sure the WiFi is still working after desoldering the cable. |
This is similar to issue #137, but since it is Ad-hoc related and the problem happens when a remote client disconnects (instead of connecting), I have decided to post it in its own thread.
I'm also providing new logs with all ath debugs enabled.
Issue:
On an AWUS036NHA in Ad-Hoc mode on kernel 4.15 / 4.16. We are using a network of around 8 devices connected to NUC PCs in our tests. Every time a device disconnects, the other devices have a chance of crashing. I reproduce it by rebooting one of the PCs; eventually other devices start crashing, and the kernel thread freezes on those crashed nodes.
First the log shows a working sta entry removal, and then later it shows one that crashed the card.
Here is the kernel log:
7,9361,911527345,-;ath: phy2: Choose slot: 0, tsf: 528240644, tsftu: 515860, intval: 100
7,9362,911629831,-;ath: phy2: Choose slot: 0, tsf: 528343045, tsftu: 515960, intval: 100
7,9363,911732122,-;ath: phy2: Choose slot: 0, tsf: 528445445, tsftu: 516060, intval: 100
7,9364,911744161,-;ath: phy2: Set HW Key
4,9365,911745132,-;FDB: ath9k_htc_sta_remove before cancel_work_sync
4,9366,911745139,-;FDB: ath9k_htc_sta_remove after cancel_work_sync
7,9367,911745618,-;ath: phy2: Removed a station entry for: 00:c0:ca:97:3a:cc (idx: 4)
7,9368,911834579,-;ath: phy2: Choose slot: 0, tsf: 528547842, tsftu: 516160, intval: 100
7,9369,911936951,-;ath: phy2: Choose slot: 0, tsf: 528650245, tsftu: 516260, intval: 100
7,9370,912039331,-;ath: phy2: Choose slot: 0, tsf: 528752643, tsftu: 516360, intval: 100
...
7,9439,919102079,-;ath: phy2: Choose slot: 0, tsf: 535818244, tsftu: 523260, intval: 100
7,9440,919204380,-;ath: phy2: Choose slot: 0, tsf: 535920644, tsftu: 523360, intval: 100
7,9441,919306830,-;ath: phy2: Choose slot: 0, tsf: 536023045, tsftu: 523460, intval: 100
4,9442,919392253,-;FDB: ath9k_htc_sta_remove before cancel_work_sync
7,9443,919408963,-;ath: phy2: Choose slot: 0, tsf: 536125445, tsftu: 523560, intval: 100
7,9444,919511254,-;ath: phy2: Choose slot: 0, tsf: 536227843, tsftu: 523660, intval: 100
7,9445,919613631,-;ath: phy2: Choose slot: 0, tsf: 536330244, tsftu: 523760, intval: 100
...
7,9572,932615043,-;ath: phy2: Choose slot: 0, tsf: 549335042, tsftu: 536460, intval: 100
7,9573,932717546,-;ath: phy2: Choose slot: 0, tsf: 549437443, tsftu: 536560, intval: 100
7,9574,932819792,-;ath: phy2: Choose slot: 0, tsf: 549539845, tsftu: 536660, intval: 100
4,9575,932832054,-;asynchronous wait on fence i915:gnome-shell[781]/1:2963 timed out
7,9576,932922171,-;ath: phy2: Choose slot: 0, tsf: 549642243, tsftu: 536760, intval: 100
7,9577,933024543,-;ath: phy2: Choose slot: 0, tsf: 549744645, tsftu: 536860, intval: 100
7,9578,933126926,-;ath: phy2: Choose slot: 0, tsf: 549847043, tsftu: 536960, intval: 100
7,9579,933229292,-;ath: phy2: Choose slot: 0, tsf: 549949442, tsftu: 537060, intval: 100
7,9580,933434044,-;ath: phy2: Resuming beacon xmit after 1 misses
7,9581,933434045,-;ath: phy2: Choose slot: 0, tsf: 550154242, tsftu: 537260, intval: 100
7,9582,933536543,-;ath: phy2: Choose slot: 0, tsf: 550256644, tsftu: 537360, intval: 100
7,9583,933638920,-;ath: phy2: Choose slot: 0, tsf: 550359046, tsftu: 537460, intval: 100
...
7,9647,940190669,-;ath: phy2: Choose slot: 0, tsf: 556912645, tsftu: 543860, intval: 100
7,9648,940293051,-;ath: phy2: Choose slot: 0, tsf: 557015044, tsftu: 543960, intval: 100
7,9649,940395542,-;ath: phy2: Choose slot: 0, tsf: 557117443, tsftu: 544060, intval: 100
3,9650,940404014,-;INFO: rcu_sched self-detected stall on CPU
3,9651,940404019,-;\x091-....: (5249 ticks this GP) idle=cca/1/4611686018427387906 softirq=115167/115167 fqs=2625
3,9652,940404020,-;\x09 (t=5250 jiffies g=75919 c=75918 q=1429)
4,9653,940404022,-;NMI backtrace for cpu 1
4,9654,940404024,-;CPU: 1 PID: 131 Comm: kworker/u8:2 Tainted: G W 4.16.16-custom #1
4,9655,940404025,-;Hardware name: Intel Corporation NUC7i5DNB/NUC7i5DNB, BIOS DNKBLi5v.86A.0040.2018.0315.1451 03/15/2018
4,9656,940404039,-;Workqueue: phy2 ieee80211_iface_work [mac80211]
4,9657,940404040,-;Call Trace:
4,9658,940404043,-;
4,9659,940404046,-; dump_stack+0x5c/0x85
4,9660,940404048,-; nmi_cpu_backtrace+0xbf/0xd0
4,9661,940404050,-; ? lapic_can_unplug_cpu+0xa0/0xa0
4,9662,940404051,-; nmi_trigger_cpumask_backtrace+0xf5/0x130
4,9663,940404053,-; rcu_dump_cpu_stacks+0x9e/0xd7
4,9664,940404055,-; rcu_check_callbacks+0x6c8/0x910
4,9665,940404057,-; ? update_wall_time+0x474/0x6f0
4,9666,940404059,-; ? tick_sched_do_timer+0x40/0x40
4,9667,940404060,-; update_process_times+0x28/0x50
4,9668,940404061,-; tick_sched_handle+0x22/0x70
4,9669,940404062,-; tick_sched_timer+0x34/0x70
4,9670,940404064,-; __hrtimer_run_queues+0x108/0x290
4,9671,940404065,-; hrtimer_interrupt+0xe5/0x240
4,9672,940404067,-; smp_apic_timer_interrupt+0x62/0x120
4,9673,940404068,-; apic_timer_interrupt+0xf/0x20
4,9674,940404069,-;
4,9675,940404071,-;RIP: 0010:try_to_grab_pending+0x118/0x150
4,9676,940404072,-;RSP: 0018:ffffb8ab41ee3cc0 EFLAGS: 00000287 ORIG_RAX: ffffffffffffff12
4,9677,940404073,-;RAX: 00000000fffffffe RBX: ffff8e3ac9c05928 RCX: 0000000000000000
4,9678,940404074,-;RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000286
4,9679,940404074,-;RBP: ffffb8ab41ee3ce8 R08: ffff8e3b20002fe8 R09: ffff8e3b30c211c0
4,9680,940404075,-;R10: 0000000000000000 R11: 0000000000000040 R12: ffff8e3b30c211c0
4,9681,940404075,-;R13: ffffb8ab41ee3d08 R14: ffffffffb4890810 R15: ffff8e3ac9c05000
4,9682,940404077,-; ? get_work_pool+0x40/0x40
4,9683,940404079,-; __cancel_work_timer+0x42/0x1b0
4,9684,940404083,-; ath9k_htc_sta_remove+0x3b/0xa0 [ath9k_htc]
4,9685,940404089,-; drv_sta_state+0x25b/0x3f0 [mac80211]
4,9686,940404096,-; sta_info_move_state+0x181/0x260 [mac80211]
4,9687,940404102,-; __sta_info_destroy_part2+0x54/0x110 [mac80211]
4,9688,940404107,-; __sta_info_destroy+0x27/0x40 [mac80211]
4,9689,940404113,-; ieee80211_ibss_work+0x1be/0x580 [mac80211]
4,9690,940404115,-; ? kmem_cache_free+0x19c/0x1d0
4,9691,940404117,-; ? skb_dequeue+0x52/0x60
4,9692,940404124,-; ? ieee80211_iface_work+0xbe/0x340 [mac80211]
4,9693,940404125,-; process_one_work+0x17b/0x360
4,9694,940404127,-; worker_thread+0x2e/0x390
4,9695,940404129,-; ? process_one_work+0x360/0x360
4,9696,940404130,-; kthread+0x113/0x130
4,9697,940404131,-; ? kthread_create_worker_on_cpu+0x70/0x70
4,9698,940404133,-; ret_from_fork+0x35/0x40
7,9699,940497752,-;ath: phy2: Choose slot: 0, tsf: 557219844, tsftu: 544160, intval: 100
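The pattern described above (a "before cancel_work_sync" debug print that is never followed by its "after" print) can be spotted mechanically. Below is a minimal Python sketch, assuming the log is in the comma-separated /dev/kmsg record layout shown here (priority, sequence number, timestamp in microseconds, flags, then the message after the semicolon). The "FDB:" markers are the reporter's own debug prints, so the match strings are specific to this log; the function names are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class Record(NamedTuple):
    prio: int   # first field of the record
    seq: int    # kernel log sequence number
    usec: int   # timestamp in microseconds
    text: str   # message after the ';'


_RECORD = re.compile(r"^(\d+),(\d+),(\d+),[^;]*;(.*)$")


def parse(lines: Iterable[str]) -> List[Record]:
    """Parse records; lines that do not match (wrapped text, '...') are skipped."""
    records = []
    for line in lines:
        m = _RECORD.match(line.strip())
        if m:
            records.append(Record(int(m[1]), int(m[2]), int(m[3]), m[4]))
    return records


def find_stuck_remove(records: List[Record]) -> Optional[Record]:
    """Return the last 'before cancel_work_sync' that has no later 'after'."""
    pending = None
    for rec in records:
        if "sta_remove before cancel_work_sync" in rec.text:
            pending = rec
        elif "sta_remove after cancel_work_sync" in rec.text:
            pending = None
    return pending


def seconds_until_stall(records: List[Record]) -> Optional[float]:
    """Time from the stuck sta_remove to the first RCU stall report after it."""
    stuck = find_stuck_remove(records)
    if stuck is None:
        return None
    for rec in records:
        if rec.seq > stuck.seq and "self-detected stall" in rec.text:
            return (rec.usec - stuck.usec) / 1e6
    return None


# Four records copied from the log above.
SAMPLE = """\
4,9365,911745132,-;FDB: ath9k_htc_sta_remove before cancel_work_sync
4,9366,911745139,-;FDB: ath9k_htc_sta_remove after cancel_work_sync
4,9442,919392253,-;FDB: ath9k_htc_sta_remove before cancel_work_sync
3,9650,940404014,-;INFO: rcu_sched self-detected stall on CPU
""".splitlines()

if __name__ == "__main__":
    recs = parse(SAMPLE)
    print(find_stuck_remove(recs))
    print(seconds_until_stall(recs))
```

On the sample records, the stuck call is the one with sequence number 9442, and the stall report follows about 21 seconds later, which matches the backtrace showing ath9k_htc_sta_remove still inside __cancel_work_timer.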