Deadlock on Tempesta shutdown #2054
EvgeniiMekhanik added a commit that referenced this issue on Feb 29, 2024:
There was a race between `tfw_sock_srv_disconnect` and `tfw_srv_conn_release` when the latter is called from `ss_conn_drop_guard_exit` while we process a FIN from the remote peer. The connection can be released after we check that the connection refcount is not equal to TFW_CONN_DEATHCNT and before we call `tfw_connection_close`. We then increment the reference counter of an already stopped connection (whose refcount is zero) and put it again, which leads to a second connection release and an extra decrement of the struct server reference counter. We need to call `__tfw_connection_get_if_live` instead of simply checking that the connection reference counter is not equal to TFW_CONN_DEATHCNT before closing the connection. Closes #2047 #2054
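The race window is easier to see in code. Below is a minimal userspace C11 model of the check-then-act pattern the message describes; `TfwConn`, `TFW_CONN_DEATHCNT` and the function names follow the commit message, while the sentinel value and everything else are illustrative assumptions, not the actual Tempesta FW sources:

```c
/* Minimal C11 model of the racy pattern; not the real Tempesta FW code. */
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed sentinel for a dead connection; the real constant may differ. */
#define TFW_CONN_DEATHCNT 0

typedef struct {
	atomic_int refcnt;
} TfwConn;

/*
 * BUGGY: plain check-then-act. Between the load and the increment,
 * another CPU handling a FIN (ss_conn_drop_guard_exit ->
 * tfw_srv_conn_release) can drop the last reference. We then "revive"
 * a dead connection, and the later put() releases it a second time,
 * also over-decrementing the owning struct server's refcount.
 */
static bool
conn_close_racy(TfwConn *conn)
{
	if (atomic_load(&conn->refcnt) == TFW_CONN_DEATHCNT)
		return false;			/* looks dead: skip it */
	/* <-- race window: the connection may die right here */
	atomic_fetch_add(&conn->refcnt, 1);	/* may revive a dead conn */
	/* ... close the connection, then put() the reference ... */
	return true;
}
```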
EvgeniiMekhanik added a commit that referenced this issue on Mar 3, 2024.
EvgeniiMekhanik added a commit that referenced this issue on Mar 5, 2024:
There was a race between `tfw_sock_srv_disconnect` and `tfw_srv_conn_release` when the latter is called from `ss_conn_drop_guard_exit` while we process a FIN from the remote peer. The connection can be released after we check that the connection refcount is not equal to TFW_CONN_DEATHCNT and before we call `tfw_connection_close`. We then increment the reference counter of an already stopped connection (whose refcount is zero) and put it again, which leads to a second connection release and an extra decrement of the struct server reference counter. We need to call `__tfw_connection_get_if_not_death` instead of simply checking that the connection reference counter is not equal to TFW_CONN_DEATHCNT before closing the connection. Closes #2047 #2054
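A compare-and-swap loop closes that window: the reference is taken atomically, and only if the counter has not already reached the death sentinel. A hedged sketch of what a `__tfw_connection_get_if_not_death`-style helper can look like, reusing the `TfwConn` model from the sketch above (again an illustration, not the real implementation):

```c
/*
 * Take a reference only if the connection is not already dead.
 * atomic_compare_exchange_weak() reloads `old` on failure, so the loop
 * either installs old+1 or observes the death sentinel; there is no
 * window in which a dead connection can be revived. Sketch only.
 */
static bool
__conn_get_if_not_death(TfwConn *conn)
{
	int old = atomic_load(&conn->refcnt);

	do {
		if (old == TFW_CONN_DEATHCNT)
			return false;	/* already dead: leave it alone */
	} while (!atomic_compare_exchange_weak(&conn->refcnt, &old, old + 1));

	return true;
}
```

With such a helper, the shutdown path closes a connection only when the guarded get succeeds, e.g. `if (__conn_get_if_not_death(conn)) { /* close, then put */ }`, so a connection that died concurrently is simply skipped instead of being released twice.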
EvgeniiMekhanik added a commit that referenced this issue on Mar 5, 2024.
EvgeniiMekhanik added a commit that referenced this issue on Mar 13, 2024.
EvgeniiMekhanik added a commit that referenced this issue on Apr 17, 2024.
EvgeniiMekhanik added a commit that referenced this issue on Apr 17, 2024.
EvgeniiMekhanik added a commit that referenced this issue on Apr 19, 2024.
Closed by PR #2070
test_reaching_the_limit_2 (t_frang.test_http_resp_code_block.HttpRespCodeBlockOneClientHttp) ...
Exception in stopping process: Can't stop TempestaFW on 192.168.50.115 (TimeoutError: ), type: <class 'helpers.error.Error'>
b"[tdb] Start Tempesta DB\n[tempesta fw] Initializing Tempesta FW kernel module...\n[tempesta fw] Warning: Vhost default doesn't have certificate with matching SAN/CN.\n Maybe that's fine, but it's worth checking the\n config - if there is no relations between the\n names, then host name confusion attack is possible.\n[tempesta fw] Configuration processing is completed.\n[tdb] Opened table /opt/tempesta/db/filter0.tdb: size=16777216 rec_size=20 base=00000000c3a79e81\n[tdb] Opened table /opt/tempesta/db/sessions0.tdb: size=16777216 rec_size=312 base=000000000765bc12\n[tdb] Opened table /opt/tempesta/db/client0.tdb: size=16777216 rec_size=624 base=000000008301b52f\n[tempesta fw] Open listen socket on: 0.0.0.0:443\n[tempesta fw] Open listen socket on: 0.0.0.0:81\n[tempesta fw] Open listen socket on: 0.0.0.0\n[tempesta fw] Tempesta FW is ready\n[tempesta fw] Warning: frang: http_resp_code_block limit exceeded for 192.168.50.188: 6 (lim=5)\n[tempesta fw] Warning: response blocked: filtered out: 192.168.50.188\nwatchdog:
BUG: soft lockup - CPU#0 stuck for 23s! [sysctl:529345]\nModules linked in: tempesta_fw(OE) tempesta_db(OE) tempesta_tls(OE) tempesta_lib(OE) iptable_mangle xt_mark tcp_diag inet_diag xt_nat xt_tcpudp veth sha256_ssse3 sha512_ssse3 xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo iptable_nat xt_addrtype iptable_filter bpfilter br_netfilter bridge stp llc md4 cmac nls_utf8 cifs libarc4 fscache libdes overlay nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua kvm_amd ccp sch_fq_codel kvm binfmt_misc input_leds joydev netconsole serio_raw qemu_fw_cfg mac_hid msr ramoops reed_solomon efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper psmouse bochs_drm\n drm_vram_helper drm_ttm_helper ttm virtio_scsi drm_kms_helper syscopyarea sysfillrect sysimgblt e1000 fb_sys_fops cec i2c_piix4 pata_acpi drm floppy [last unloaded: tempesta_lib]\nCPU: 0 PID: 529345 Comm: sysctl Kdump: loaded Tainted: G OE 5.10.35.tfw-310ffae #1\nHardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014\nRIP: 0010:native_safe_halt+0xe/0x10\nCode: 7b ff ff ff eb bd cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d 36 bf 47 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 26 bf 47 00 fb f4 cc 0f 1f 44 00 00 55 48 89 e5 53 65 8b 15 af 0d 68 7c 0f 1f 44\nRSP: 0018:ffffbfa502273b60 EFLAGS: 00000202\nRAX: 0000000000000003 RBX: 0000000000000246 RCX: 0000000000000008\nRDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff9c037a80c220\nRBP: ffffbfa502273b70 R08: ffff9c03bffcb800 R09: 0000000000000084\nR10: 0000000000000000 R11: 0000000000000001 R12: ffff9c037a80c220\nR13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000100\nFS: 00007f7d11c2a580(0000) GS:ffff9c03b7c00000(0000) knlGS:0000000000000000\nCS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033\nCR2: 000056330c46b618 CR3: 00000001334d8002 CR4: 0000000000770ef0\nPKRU: 55555554\nCall Trace:\n ?
kvm_wait+0x50/0x60\n __pv_queued_spin_lock_slowpath+0x27b/0x2c0\n _raw_spin_lock+0x1e/0x30\n tfw_sock_srv_del_conns+0x83/0x160 [tempesta_fw]\n tfw_server_destroy+0x1f/0x80 [tempesta_fw]\n tfw_sock_srv_disconnect+0xeb/0xf0 [tempesta_fw]\n tfw_sock_srv_disconnect_srv+0x52/0x80 [tempesta_fw]\n ? tfw_sock_srv_stop+0x150/0x150 [tempesta_fw]\n tfw_sg_for_each_srv+0x91/0xf0 [tempesta_fw]\n tfw_sock_srv_stop+0x107/0x150 [tempesta_fw]\n tfw_mods_stop+0x38/0xb0 [tempesta_fw]\n tfw_ctlfn_state_io+0x12b/0x340 [tempesta_fw]\n ? tfw_cleanup+0x30/0x30 [tempesta_fw]\n proc_sys_call_handler+0x13f/0x240\n proc_sys_write+0x13/0x20\n new_sync_write+0x117/0x1b0\n vfs_write+0x185/0x250\n ksys_write+0x67/0xe0\n __x64_sys_write+0x1a/0x20\n do_syscall_64+0x38/0x90\n entry_SYSCALL_64_after_hwframe+0x44/0xa9\nRIP: 0033:0x7f7d11b45297\nCode: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24\nRSP: 002b:00007fffb0d8a4b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001\nRAX: ffffffffffffffda RBX: 000056330c4674a0 RCX: 00007f7d11b45297\nRDX: 0000000000000005 RSI: 000056330c4674e0 RDI: 0000000000000004\nRBP: 000056330c469610 R08: 0000000000000010 R09: 0000000000000005\nR10: 000056330b42616c R11: 0000000000000246 R12: 0000000000000005\nR13: 0000000000000005 R14: 00007f7d11c1fa20 R15: 00007f7d11c1f8a0\n"