Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash in tfw_http_conn_msg_free() #956

Closed
krizhanovsky opened this issue Mar 14, 2018 · 1 comment
Closed

Crash in tfw_http_conn_msg_free() #956

krizhanovsky opened this issue Mar 14, 2018 · 1 comment
Assignees
Milestone

Comments

@krizhanovsky
Copy link
Contributor

Prerequisites

Current Tempesta FW master as of ac6f085 with #926 fixed kernel https://github.com/tempesta-tech/linux-4.9.35-tfw/pull/5 . Run full test suite with default settings.

Crash

All the tests passes until

test_502_resp_fault (regression.test_stress_pipeline.PipelineFaultInjection) ... ERROR
Exception in stoping Tempesta: Remote error: Cannot remove stap from Tempesta., stdout = , stderr = rmmod: ERROR: Module stap_tempesta is not currently loaded

ERROR
test_cache (regression.test_reboot.RebootUnderLoadNoTimeoutTest) ... FAIL
test_proxy (regression.test_reboot.RebootUnderLoadNoTimeoutTest) ... FAIL
test_cache (regression.test_reboot.RebootUnderLoadTest) ... FAIL
test_proxy (regression.test_reboot.RebootUnderLoadTest) ... FAIL
test_204_with_body (regression.test_invalid.TestInvalidResponse) ... expected failure
packet_write_wait: Connection to 192.168.100.4 port 22: Broken pipe

Since regression.test_invalid.TestInvalidResponse seems finished it's unclear which test exactly causes the bug.

Oopses

[ 1719.184485] [tdb] Shutdown Tempesta DB
[ 1719.613603] [tdb] Start Tempesta DB
[ 1719.640209] [tempesta] Initializing Tempesta FW kernel module...
[ 1719.667893] [tempesta] Registering new scheduler: hash
[ 1719.678375] [tempesta] Registering new scheduler: http
[ 1719.718320] [tdb] Opened table /opt/tempesta/db/filter.tdb: size=16777216 rec_size=20 base=ffff98a539c00000
[ 1719.909950] [tdb] Close table 'filter0.tdb'
[ 1720.232417] [tdb] Shutdown Tempesta DB
[ 1720.708343] [tdb] Start Tempesta DB
[ 1720.834648] [tdb] Opened table /opt/tempesta/db/filter.tdb: size=16777216 rec_size=20 base=ffff98a539c00000
[ 1720.860242] ------------[ cut here ]------------
[ 1720.861670] WARNING: CPU: 1 PID: 0 at /root/tempesta/tempesta_fw/http.c:1560 tfw_http_req_destruct+0x96/0xc0 [tempesta_fw]
[ 1720.863915] Modules linked in: tfw_sched_ratio(O) tfw_sched_http(O) tfw_sched_hash(O) tempesta_fw(O) tempesta_db(O) tempesta_tls(O) binfmt_misc crct10dif_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 bochs_drm ttm lrw drm_kms_helper gf128mul glue_helper ablk_helper cryptd ppdev drm fb_sys_fops syscopyarea sysfillrect sg sysimgblt parport_pc parport serio_raw pcspkr button ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto mbcache sr_mod sd_mod cdrom ata_generic ata_piix libata e1000 psmouse scsi_mod i2c_piix4 [last unloaded: tempesta_tls]
[ 1720.876932] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G           O    4.9.35-kdump #144
[ 1720.878458] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[ 1720.880109]  ffff98a53f283648 ffffffffb9afa4e5 0000000000000000 0000000000000000
[ 1720.880337]  ffff98a53f283688 ffffffffb9861d41 000006183f283718 ffff98a48e0fd020
[ 1720.880337]  ffff98a48b912eb0 ffff98a53f283718 ffff98a53f283718 ffff98a53f283718
[ 1720.880337] Call Trace:
[ 1720.880337]  <IRQ> [ 1720.880337]  [<ffffffffb9afa4e5>] dump_stack+0x68/0x93
[ 1720.880337]  [<ffffffffb9861d41>] __warn+0xc1/0xe0
[ 1720.880337]  [<ffffffffb9861e68>] warn_slowpath_null+0x18/0x20
[ 1720.880337]  [<ffffffffc0705706>] tfw_http_req_destruct+0x96/0xc0 [tempesta_fw]
[ 1720.880337]  [<ffffffffc0709da5>] tfw_http_msg_free+0xa5/0xc0 [tempesta_fw]
[ 1720.880337]  [<ffffffffc0704f25>] tfw_http_conn_msg_free+0x55/0x70 [tempesta_fw]
[ 1720.880337]  [<ffffffffc07058b6>] tfw_http_resp_fwd+0x186/0x300 [tempesta_fw]
[ 1720.880337]  [<ffffffffc0705bef>] tfw_http_send_resp+0x1bf/0x1e0 [tempesta_fw]
[ 1720.880337]  [<ffffffffc07071df>] tfw_http_srv_error_resp_and_log.part.32+0x5f/0x100 [tempesta_fw]
[ 1720.880337]  [<ffffffffc0707319>] tfw_http_srv_error_resp_and_log+0x99/0x110 [tempesta_fw]
[ 1720.880337]  [<ffffffffc0707da5>] tfw_http_msg_process+0x595/0xb60 [tempesta_fw]
[ 1720.880337]  [<ffffffffb9d005b0>] ? ip_finish_output+0x1c0/0x320
[ 1720.880337]  [<ffffffffb9c98dad>] ? __alloc_skb+0x3d/0x200
[ 1720.880337]  [<ffffffffc070328f>] __gfsm_fsm_exec+0x4f/0x90 [tempesta_fw]
[ 1720.880337]  [<ffffffffc07034af>] tfw_gfsm_dispatch+0xf/0x20 [tempesta_fw]
[ 1720.880337]  [<ffffffffc07029f9>] tfw_connection_recv+0x49/0x60 [tempesta_fw]
[ 1720.880337]  [<ffffffffc0727183>] ss_tcp_process_data+0x1c3/0x490 [tempesta_fw]
[ 1720.880337]  [<ffffffffc0727afa>] ss_tcp_data_ready+0x5a/0xa0 [tempesta_fw]
[ 1720.880337]  [<ffffffffb9d18558>] tcp_rcv_established+0x478/0x720
[ 1720.880337]  [<ffffffffb9d2534f>] ? tcp_v4_rcv+0xa0f/0xba0
[ 1720.880337]  [<ffffffffb9d23e60>] tcp_v4_do_rcv+0x140/0x200
[ 1720.880337]  [<ffffffffb9d253dd>] tcp_v4_rcv+0xa9d/0xba0
[ 1720.880337]  [<ffffffffb9cfa166>] ip_local_deliver_finish+0xe6/0x380
[ 1720.880337]  [<ffffffffb9cfa0aa>] ? ip_local_deliver_finish+0x2a/0x380
[ 1720.880337]  [<ffffffffb9cfa64f>] ip_local_deliver+0x6f/0x210
[ 1720.880337]  [<ffffffffc0702df6>] ? tfw_ipv4_nf_hook+0xc6/0x150 [tempesta_fw]
[ 1720.880337]  [<ffffffffb9cef0d8>] ? nf_iterate+0x58/0xe0
[ 1720.880337]  [<ffffffffb9cf9bd1>] ip_rcv_finish+0x1d1/0x680
[ 1720.880337]  [<ffffffffb9cfaa8b>] ip_rcv+0x29b/0x500
[ 1720.880337]  [<ffffffffb9cfaac7>] ? ip_rcv+0x2d7/0x500
[ 1720.880337]  [<ffffffffb9cf9a00>] ? inet_del_offload+0x40/0x40
[ 1720.880337]  [<ffffffffb9cacf75>] __netif_receive_skb_core+0x215/0xca0
[ 1720.880337]  [<ffffffffb9cadb29>] ? process_backlog+0xc9/0x220
[ 1720.880337]  [<ffffffffb9cadb29>] ? process_backlog+0xc9/0x220
[ 1720.880337]  [<ffffffffb9cada13>] __netif_receive_skb+0x13/0x60
[ 1720.880337]  [<ffffffffb9cadacd>] process_backlog+0x6d/0x220
[ 1720.880337]  [<ffffffffb9cadb29>] ? process_backlog+0xc9/0x220
[ 1720.880337]  [<ffffffffb9caf481>] net_rx_action+0x211/0x300
[ 1720.880337]  [<ffffffffb986726e>] __do_softirq+0x11e/0x250
[ 1720.880337]  [<ffffffffb986752e>] irq_exit+0xbe/0xd0
[ 1720.880337]  [<ffffffffb98341de>] smp_trace_call_function_single_interrupt+0x2e/0x30
[ 1720.880337]  [<ffffffffb98341e9>] smp_call_function_single_interrupt+0x9/0x10
[ 1720.880337]  [<ffffffffb9dde859>] call_function_single_interrupt+0x89/0x90
[ 1720.880337]  <EOI> [ 1720.880337]  [<ffffffffb9ddcaf9>] ? default_idle+0x9/0x10
[ 1720.880337]  [<ffffffffb9ddcafb>] ? default_idle+0xb/0x10
[ 1720.880337]  [<ffffffffb981e63a>] arch_cpu_idle+0xa/0x10
[ 1720.880337]  [<ffffffffb9ddcc9e>] default_idle_call+0x1e/0x30
[ 1720.880337]  [<ffffffffb98a87c5>] cpu_startup_entry+0x1c5/0x230
[ 1720.880337]  [<ffffffffb9834ac1>] start_secondary+0x141/0x160
[ 1720.957422] ---[ end trace 8e5171475ac85876 ]---
[ 1720.964868] ------------[ cut here ]------------
[ 1720.966257] WARNING: CPU: 1 PID: 0 at /root/tempesta/tempesta_fw/sock.c:408 ss_send+0x251/0x280 [tempesta_fw]
[ 1720.968439] Modules linked in: tfw_sched_ratio(O) tfw_sched_http(O) tfw_sched_hash(O) tempesta_fw(O) tempesta_db(O) tempesta_tls(O) binfmt_misc crct10dif_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 bochs_drm ttm lrw drm_kms_helper gf128mul glue_helper ablk_helper cryptd ppdev drm fb_sys_fops syscopyarea sysfillrect sg sysimgblt parport_pc parport serio_raw pcspkr button ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto mbcache sr_mod sd_mod cdrom ata_generic ata_piix libata e1000 psmouse scsi_mod i2c_piix4 [last unloaded: tempesta_tls]
[ 1720.979755] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W  O    4.9.35-kdump #144
[ 1720.980106] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[ 1720.980106]  ffff98a53f283820 ffffffffb9afa4e5 0000000000000000 0000000000000000
[ 1720.980106]  ffff98a53f283860 ffffffffb9861d41 000001983f283858 ffff98a4b00d4ec0
[ 1720.980106]  ffff98a493d89500 ffff98a48e0fd3f8 ffff98a48e0fd038 ffff98a49881fee0
[ 1720.980106] Call Trace:
[ 1720.980106]  <IRQ> [ 1720.980106]  [<ffffffffb9afa4e5>] dump_stack+0x68/0x93
[ 1720.980106]  [<ffffffffb9861d41>] __warn+0xc1/0xe0
[ 1720.980106]  [<ffffffffb9861e68>] warn_slowpath_null+0x18/0x20
[ 1720.980106]  [<ffffffffc0728211>] ss_send+0x251/0x280 [tempesta_fw]
[ 1720.980106]  [<ffffffffc0703fd7>] tfw_http_conn_send+0x17/0x20 [tempesta_fw]
[ 1720.980106]  [<ffffffffc070299d>] tfw_connection_send+0x1d/0x30 [tempesta_fw]
[ 1720.980106]  [<ffffffffc07046c6>] tfw_http_conn_resend+0x1f6/0x500 [tempesta_fw]
[ 1720.980106]  [<ffffffffc07068ea>] ? tfw_http_conn_repair+0x5a/0x8f0 [tempesta_fw]
[ 1720.980106]  [<ffffffffc070692e>] tfw_http_conn_repair+0x9e/0x8f0 [tempesta_fw]
[ 1720.980106]  [<ffffffffc07028ed>] tfw_connection_repair+0x1d/0x20 [tempesta_fw]
[ 1720.980106]  [<ffffffffc072a71a>] tfw_sock_srv_connect_complete+0xda/0xf0 [tempesta_fw]
[ 1720.980106]  [<ffffffffb98db82a>] ? ktime_get_with_offset+0xaa/0x140
[ 1720.980106]  [<ffffffffb9d5e3f1>] ? bictcp_init+0xa1/0x100
[ 1720.980106]  [<ffffffffc07279c5>] ss_tcp_state_change+0xe5/0x1c0 [tempesta_fw]
[ 1720.980106]  [<ffffffffb9d18a7b>] tcp_finish_connect+0xeb/0x230
[ 1720.980106]  [<ffffffffb9d1922b>] tcp_rcv_state_process+0x66b/0xed0
[ 1720.980106]  [<ffffffffb9d23dcd>] tcp_v4_do_rcv+0xad/0x200
[ 1720.980106]  [<ffffffffb9d253dd>] tcp_v4_rcv+0xa9d/0xba0
[ 1720.980106]  [<ffffffffb9cfa166>] ip_local_deliver_finish+0xe6/0x380
[ 1720.980106]  [<ffffffffb9cfa0aa>] ? ip_local_deliver_finish+0x2a/0x380
[ 1720.980106]  [<ffffffffb9cfa64f>] ip_local_deliver+0x6f/0x210
[ 1720.980106]  [<ffffffffc0702df6>] ? tfw_ipv4_nf_hook+0xc6/0x150 [tempesta_fw]
[ 1720.980106]  [<ffffffffb9cef0d8>] ? nf_iterate+0x58/0xe0
[ 1720.980106]  [<ffffffffb9cf9bd1>] ip_rcv_finish+0x1d1/0x680
[ 1720.980106]  [<ffffffffb9cfaa8b>] ip_rcv+0x29b/0x500
[ 1720.980106]  [<ffffffffb9cfaac7>] ? ip_rcv+0x2d7/0x500
[ 1720.980106]  [<ffffffffb9cf9a00>] ? inet_del_offload+0x40/0x40
[ 1720.980106]  [<ffffffffb9cacf75>] __netif_receive_skb_core+0x215/0xca0
[ 1720.980106]  [<ffffffffb9cadb29>] ? process_backlog+0xc9/0x220
[ 1720.980106]  [<ffffffffb9cadb29>] ? process_backlog+0xc9/0x220
[ 1720.980106]  [<ffffffffb9cada13>] __netif_receive_skb+0x13/0x60
[ 1720.980106]  [<ffffffffb9cadacd>] process_backlog+0x6d/0x220
[ 1720.980106]  [<ffffffffb9cadb29>] ? process_backlog+0xc9/0x220
[ 1720.980106]  [<ffffffffb9caf481>] net_rx_action+0x211/0x300
[ 1720.980106]  [<ffffffffb986726e>] __do_softirq+0x11e/0x250
[ 1720.980106]  [<ffffffffb986752e>] irq_exit+0xbe/0xd0
[ 1720.980106]  [<ffffffffb98341de>] smp_trace_call_function_single_interrupt+0x2e/0x30
[ 1720.980106]  [<ffffffffb98341e9>] smp_call_function_single_interrupt+0x9/0x10
[ 1720.980106]  [<ffffffffb9dde859>] call_function_single_interrupt+0x89/0x90
[ 1720.980106]  <EOI> [ 1720.980106]  [<ffffffffb9ddcaf9>] ? default_idle+0x9/0x10
[ 1720.980106]  [<ffffffffb9ddcafb>] ? default_idle+0xb/0x10
[ 1720.980106]  [<ffffffffb981e63a>] arch_cpu_idle+0xa/0x10
[ 1720.980106]  [<ffffffffb9ddcc9e>] default_idle_call+0x1e/0x30
[ 1720.980106]  [<ffffffffb98a87c5>] cpu_startup_entry+0x1c5/0x230
[ 1720.980106]  [<ffffffffb9834ac1>] start_secondary+0x141/0x160
[ 1721.058398] ---[ end trace 8e5171475ac85877 ]---
[ 1721.162327] BUG: unable to handle kernel paging request at ffff98a48b912f50
[ 1721.164017] IP: [<ffffffffc0704ef2>] tfw_http_conn_msg_free+0x22/0x70 [tempesta_fw]
[ 1721.165615] PGD 515f8067 [ 1721.165987] PUD 13fb06067
PMD 13faa9067 [ 1721.166285] PTE 800000008b912060
[ 1721.166285]
[ 1721.166285] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[ 1721.166285] Modules linked in: tfw_sched_ratio(O) tfw_sched_http(O) tfw_sched_hash(O) tempesta_fw(O) tempesta_db(O) tempesta_tls(O) binfmt_misc crct10dif_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 bochs_drm ttm lrw drm_kms_helper gf128mul glue_helper ablk_helper cryptd ppdev drm fb_sys_fops syscopyarea sysfillrect sg sysimgblt parport_pc parport serio_raw pcspkr button ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto mbcache sr_mod sd_mod cdrom ata_generic ata_piix libata e1000 psmouse scsi_mod i2c_piix4 [last unloaded: tempesta_tls]
[ 1721.166285] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W  O    4.9.35-kdump #144
[ 1721.166285] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[ 1721.166285] task: ffff98a51a3bc300 task.stack: ffffabf240684000
[ 1721.166285] RIP: 0010:[<ffffffffc0704ef2>]  [<ffffffffc0704ef2>] tfw_http_conn_msg_free+0x22/0x70 [tempesta_fw]
[ 1721.166285] RSP: 0018:ffff98a53f283da8  EFLAGS: 00010246
[ 1721.166285] RAX: ffff98a48e0fd020 RBX: ffff98a48e0fd020 RCX: 0000000000000000
[ 1721.166285] RDX: ffff98a48b912eb0 RSI: 0000000000000200 RDI: ffff98a48e0fd020
[ 1721.166285] RBP: ffff98a53f283db8 R08: ffff98a48e0fd3f8 R09: 0000000000000000
[ 1721.166285] R10: 0000000000000000 R11: 0000000000000000 R12: ffff98a53f283dd8
[ 1721.166285] R13: ffff98a53f283dd8 R14: ffff98a53f283a00 R15: ffff98a4b00d4ba8
[ 1721.166285] FS:  0000000000000000(0000) GS:ffff98a53f280000(0000) knlGS:0000000000000000
[ 1721.166285] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1721.166285] CR2: ffff98a48b912f50 CR3: 000000010c80e000 CR4: 00000000003406e0
[ 1721.166285] Stack:
[ 1721.166285]  ffffffffb9ddd200 ffff98a53f283a00 ffff98a53f283e28 ffffffffc070681e
[ 1721.166285]  ffff98a4b00d4fa0 ffff98a4b00d4f90 ffff98a53f283dd8 ffff98a53f283dd8
[ 1721.166285]  0000000000000246 76c76f163c50ec22 0000000000000246 ffff98a4b00d4ec0
[ 1721.166285] Call Trace:
[ 1721.166285]  <IRQ> [ 1721.166285]  [<ffffffffb9ddd200>] ? _raw_spin_unlock_bh+0x30/0x40
[ 1721.166285]  [<ffffffffc070681e>] tfw_http_conn_release+0x1ce/0x240 [tempesta_fw]
[ 1721.166285]  [<ffffffffc0702957>] tfw_connection_release+0x27/0x50 [tempesta_fw]
[ 1721.166285]  [<ffffffffc072b611>] tfw_srv_conn_release+0x11/0x80 [tempesta_fw]
[ 1721.166285]  [<ffffffffc072adc3>] tfw_sock_srv_connect_drop+0x53/0x60 [tempesta_fw]
[ 1721.166285]  [<ffffffffb9ddd1c2>] ? _raw_spin_unlock+0x22/0x30
[ 1721.166285]  [<ffffffffc0727e2b>] ss_tx_action+0x2eb/0x480 [tempesta_fw]
[ 1721.166285]  [<ffffffffc0727b40>] ? ss_tcp_data_ready+0xa0/0xa0 [tempesta_fw]
[ 1721.166285]  [<ffffffffb9cac7bd>] net_tx_action+0xed/0x280
[ 1721.166285]  [<ffffffffb9cac760>] ? net_tx_action+0x90/0x280
[ 1721.166285]  [<ffffffffb986726e>] __do_softirq+0x11e/0x250
[ 1721.166285]  [<ffffffffb986752e>] irq_exit+0xbe/0xd0
[ 1721.166285]  [<ffffffffb98341de>] smp_trace_call_function_single_interrupt+0x2e/0x30
[ 1721.166285]  [<ffffffffb98341e9>] smp_call_function_single_interrupt+0x9/0x10
[ 1721.166285]  [<ffffffffb9dde859>] call_function_single_interrupt+0x89/0x90
[ 1721.166285]  <EOI> [ 1721.166285]  [<ffffffffb9ddcaf9>] ? default_idle+0x9/0x10
[ 1721.166285]  [<ffffffffb9ddcafb>] ? default_idle+0xb/0x10
[ 1721.166285]  [<ffffffffb981e63a>] arch_cpu_idle+0xa/0x10
[ 1721.166285]  [<ffffffffb9ddcc9e>] default_idle_call+0x1e/0x30
[ 1721.166285]  [<ffffffffb98a87c5>] cpu_startup_entry+0x1c5/0x230
[ 1721.166285]  [<ffffffffb9834ac1>] start_secondary+0x141/0x160
[ 1721.166285] Code: 0f 0b 0f 0b 0f 1f 44 00 00 48 85 ff 74 56 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 97 00 03 00 00 48 85 d2 74 30 48 89 f8 31 c9 <f0> 48 0f b1 8a a0 00 00 00 48 8b bf 00 03 00 00 48 85 ff 74 16
[ 1721.166285] RIP  [<ffffffffc0704ef2>] tfw_http_conn_msg_free+0x22/0x70 [tempesta_fw]
[ 1721.166285]  RSP <ffff98a53f283da8>
[ 1721.166285] CR2: ffff98a48b912f50
@krizhanovsky
Copy link
Contributor Author

krizhanovsky commented Mar 15, 2018

The bug is 100% reproducable with

# ./run_tests.py -vvn long_body.test_response_wrong_length.ResponseDuplicateBodyLength

The reason for the bug is the same as in #936: dereferencing freed TfwConn. The dereferencing happens on hm->conn->msg in tfw_http_conn_msg_free(). The first warning WARN_ON_ONCE(!list_empty(&req->fwd_list)); is crucial since it means that the request is freed being in fwd_list, so the list is corrupted and the request can be freed again. The root cause is that there are many places where we just get resp->req now and free the request w/o removing it from fwd_list.

krizhanovsky added a commit that referenced this issue Mar 18, 2018
…ss().

2. #959: print a warning and let the system crash if we can't remove a
   server grooup.
3. Add req->fwd_list and req->nip_list warnings to make sure that the req
   is removed from the list when we go to code where we don't have a pointer
   to the server connection.
4. Replace tfw_http_msg_free() by tfw_http_conn_msg_free() in couple of
   places to put connections on requests freeing to avoid memory leakage
   on connections reference counters.
5. Code leanups all around.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant