ovs-vswitchd crashes when it is restarted many times #154862

Closed
renweichun opened this issue Jan 13, 2022 · 1 comment
Labels
0.kind: bug (Something is broken) · 2.status: stale

Comments

@renweichun

Describe the bug

We are using OVS 2.14.1 and DPDK 20.08 on CentOS Linux release 8.3.2011, with kernel 5.10.44 and glibc 2.28. When ovs-vswitchd is restarted many times, it sometimes crashes. After some debugging I found two distinct OVS crash signatures, both in a DPDK context and both triggered by restarting ovs-vswitchd.

First stack trace:
#0 ofproto_dpif_credit_table_stats (ofproto=0x3629d20, table_id=0 '\000', n_matches=195, n_misses=0)
at ofproto/ofproto-dpif.c:4350
#1 0x00000000010cbcac in xlate_push_stats_entry (entry=0x7fa270030588, stats=0x7fff1ed2cac0, offloaded=<optimized out>)
at ofproto/ofproto-dpif-xlate-cache.c:99
#2 0x00000000010cbe7b in xlate_push_stats (xcache=<optimized out>, stats=stats@entry=0x7fff1ed2cac0,
offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:181
#3 0x00000000010b8e27 in push_dp_ops (udpif=udpif@entry=0x36ace90, ops=ops@entry=0x7fff1ed2cfd0, n_ops=n_ops@entry=1)
at ofproto/ofproto-dpif-upcall.c:2409
#4 0x00000000010b9c0e in push_dp_ops (n_ops=n_ops@entry=1, ops=0x7fff1ed2cfd0, ops@entry=0x7fff1ed2b670,
udpif=udpif@entry=0x36ace90) at ofproto/ofproto-dpif-upcall.c:2441
#5 push_ukey_ops (udpif=udpif@entry=0x36ace90, umap=umap@entry=0x36b2288, ops=ops@entry=0x7fff1ed2cfd0,
n_ops=n_ops@entry=1) at ofproto/ofproto-dpif-upcall.c:2441
#6 0x00000000010b9d8b in dp_purge_cb (aux=0x36ace90, pmd_id=25) at ofproto/ofproto-dpif-upcall.c:2870
#7 0x00000000010eb476 in dp_netdev_del_pmd (dp=dp@entry=0x362b110, pmd=pmd@entry=0x7fab57cd8010) at lib/dpif-netdev.c:6555
#8 0x00000000010edec7 in reconfigure_pmd_threads (dp=0x362b110) at lib/dpif-netdev.c:5175
#9 reconfigure_datapath (dp=dp@entry=0x362b110) at lib/dpif-netdev.c:5266
#10 0x00000000010eedbd in do_del_port (dp=0x362b110, port=0x37798f0) at lib/dpif-netdev.c:2287
#11 0x00000000010ef287 in dpif_netdev_port_del (dpif=<optimized out>, port_no=27) at lib/dpif-netdev.c:2182
#12 0x00000000010f935f in dpif_port_del (dpif=0x325b360, port_no=27, local_delete=local_delete@entry=false)
at lib/dpif.c:631
#13 0x00000000010a71b2 in port_destruct (port_=0x3781520, del=<optimized out>) at ofproto/ofproto-dpif.c:2147
#14 0x00000000010935bb in ofport_destroy (port=0x3781520, del=<optimized out>) at ofproto/ofproto.c:2615
#15 0x000000000109b7c0 in ofproto_destroy (p=0x371f120, del=<optimized out>) at ofproto/ofproto.c:1722
#16 0x0000000001085a0e in bridge_destroy (br=0x327bf80, del=del@entry=false) at vswitchd/bridge.c:3605
#17 0x000000000108a369 in bridge_exit (delete_datapath=<optimized out>) at vswitchd/bridge.c:552
#18 0x0000000000573e29 in main (argc=<optimized out>, argv=<optimized out>) at vswitchd/ovs-vswitchd.c:143

(gdb) info reg
rax 0x0 0
rbx 0x7fa270030588 140335640675720
rcx 0x0 0
rdx 0xc3 195
rsi 0x0 0
rdi 0x3629d20 56794400
rbp 0x7fff1ed2cac0 0x7fff1ed2cac0
rsp 0x7fff1ed2ca48 0x7fff1ed2ca48
r8 0x0 0
r9 0x3803e90 58736272
r10 0x0 0
r11 0x6 6
r12 0x7fff1ed2cac0 140733710518976
r13 0x70030650 1879246416
r14 0x7fff1ed2cab8 140733710518968
r15 0x7fff1ed2cac0 140733710518976
rip 0x10a8818 0x10a8818 <ofproto_dpif_credit_table_stats+56>
eflags 0x10206 [ PF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
k0 0x0 0
k1 0x0 0
k2 0x0 0
k3 0x0 0
k4 0x0 0
k5 0x0 0
k6 0x0 0
k7 0x0 0

(gdb) disassemble 0x10a8818
Dump of assembler code for function ofproto_dpif_credit_table_stats:
0x00000000010a87e0 <+0>: movzbl %sil,%esi
0x00000000010a87e4 <+4>: mov %rsi,%rax
0x00000000010a87e7 <+7>: shl $0x4,%rax
0x00000000010a87eb <+11>: sub %rsi,%rax
0x00000000010a87ee <+14>: shl $0x4,%rax
0x00000000010a87f2 <+18>: add 0x128(%rdi),%rax
0x00000000010a87f9 <+25>: test %rdx,%rdx
0x00000000010a87fc <+28>: jne 0x10a8818 <ofproto_dpif_credit_table_stats+56>
0x00000000010a87fe <+30>: test %rcx,%rcx
0x00000000010a8801 <+33>: jne 0x10a8808 <ofproto_dpif_credit_table_stats+40>
0x00000000010a8803 <+35>: retq
0x00000000010a8804 <+36>: nopl 0x0(%rax)
0x00000000010a8808 <+40>: lock add %rcx,0xe8(%rax)
0x00000000010a8810 <+48>: retq
0x00000000010a8811 <+49>: nopl 0x0(%rax)
=> 0x00000000010a8818 <+56>: lock add %rdx,0xe0(%rax)
0x00000000010a8820 <+64>: jmp 0x10a87fe <ofproto_dpif_credit_table_stats+30>
End of assembler dump.

(gdb) p *ofproto
$3 = {all_ofproto_dpifs_by_name_node = {hash = 0, next = 0x3741ba0}, all_ofproto_dpifs_by_uuid_node = {hash = 57992720, next = 0x1051}, up = {hmap_node = {hash = 1163936137340, next = 0x3f0000003f},
ofproto_class = 0xce2b216a, type = 0x0, name = 0x0, fallback_dpid = 0, datapath_id = 0, forward_bpdu = false, mfr_desc = 0x59fdad4700000006 <error: Cannot access memory at address 0x59fdad4700000006>,
hw_desc = 0x430a8cdc868fc310 <error: Cannot access memory at address 0x430a8cdc868fc310>, sw_desc = 0x0, serial_desc = 0x7fa26009fd20 "", dp_desc = 0x7fa2600d23d0 "", frag_handling = 1611358112, ports = {
buckets = 0x0, one = 0x0, mask = 369490328463343620, n = 2758902708}, port_by_name = {map = {buckets = 0x0, one = 0x7fa2600a4980, mask = 140335372699424, n = 0}}, ofp_requests = {map = {buckets = 0x0,
one = 0x0, mask = 1020897070376026114, n = 0}}, alloc_port_no = 0, max_ports = 0, ofport_usage = {buckets = 0x7fa2600fce40, one = 0x0, mask = 0, n = 0}, change_seq = 0, eviction_group_timer = 0,
tables = 0x0, n_tables = 0, tables_version = 0, cookies = {buckets = 0x0, one = 0x0, mask = 0, n_unique = 0}, learned_cookies = {buckets = 0xa09f639c00000004, one = 0x2b33f99c, mask = 0,
n = 140335372891088}, expirable = {prev = 0x7fa260108cb0, next = 0x0}, meter_features = {max_meters = 0, band_types = 0, capabilities = 0, max_bands = 0 '\000', max_color = 0 '\000'}, meters = {
buckets = 0x4b4c0f7a00000002, one = 0x0, mask = 0, n = 140335372886432}, slowpath_meter_id = 0, controller_meter_id = 0, connmgr = 0x0, min_mtu = 0, groups = {impl = {p = 0x0}}, n_groups = {2,
1415127537, 0, 0}, ogf = {types = 0, capabilities = 0, max_groups = {1612239072, 32674, 0, 0}, ofpacts = {0, 0, 0, 0}}, metadata_tab = {p = 0x0}, vl_mff_map = {cmap = {impl = {p = 0x0}}, mutex = {lock = {
__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0},
where = 0xba27fb500000004 <error: Cannot access memory at address 0xba27fb500000004>}}}, backer = 0x906958e3, uuid = {parts = {0, 0, 1610935792, 32674}}, tables_version = 140335372744048, dump_seq = 0,
miss_rule = 0x0, no_packet_in_rule = 0x0, drop_frags_rule = 0xd79f750d00000006, netflow = 0xf1991026332d2be0, sflow = 0x0, ipfix = 0x7fa260043830, bundles = {buckets = 0x7fa2600a99a0, one = 0x7fa2601fd6b0,
mask = 0, n = 0}, ml = 0xb9971b5200000002, ms = 0x0, has_bonded_bundles = false, lacp_enabled = false, mbridge = 0x7fa260168b50, stats_mutex = {lock = {__data = {__lock = 0, __count = 0, __owner = 0,
__nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0xc2d8714b00000006}}, __size = '\000' <repeats 32 times>, "\006\000\000\000Kq\330", <incomplete sequence \302>,
__align = 0}, where = 0x3bb82d338c6f5ba6 <error: Cannot access memory at address 0x3bb82d338c6f5ba6>}, stats = {rx_packets = 0, tx_packets = 140335373557152, rx_bytes = 140335373226416,
tx_bytes = 140335372593392, rx_errors = 0, tx_errors = 0, rx_dropped = 8378066835995623426, tx_dropped = 0, multicast = 0, collisions = 140335373981728, rx_length_errors = 0, rx_over_errors = 0,
rx_crc_errors = 0, rx_frame_errors = 0, rx_fifo_errors = 12410622260254081028, rx_missed_errors = 1186231435, tx_aborted_errors = 0, tx_carrier_errors = 140335373939168, tx_fifo_errors = 140335373424512,
tx_heartbeat_errors = 0, tx_window_errors = 0, rx_1_to_64_packets = 0, rx_65_to_127_packets = 0, rx_128_to_255_packets = 0, rx_256_to_511_packets = 0, rx_512_to_1023_packets = 0,
rx_1024_to_1522_packets = 0, rx_1523_to_max_packets = 0, tx_1_to_64_packets = 0, tx_65_to_127_packets = 0, tx_128_to_255_packets = 6361799859137150982, tx_256_to_511_packets = 8919369999343897625,
tx_512_to_1023_packets = 0, tx_1024_to_1522_packets = 140335373711424, tx_1523_to_max_packets = 140335373447328, tx_multicast_packets = 140335373316560, rx_broadcast_packets = 0, tx_broadcast_packets = 0,
rx_undersized_errors = 1507772303897788418, rx_oversize_errors = 0, rx_fragmented_errors = 0, rx_jabber_errors = 0}, stp = 0x0, stp_last_tick = 0, rstp = 0x0, rstp_last_tick = 0, ports = {map = {
buckets = 0xf3c911ba00000004, one = 0xe0e96d08, mask = 0, n = 140335373890928}}, ghost_ports = {map = {buckets = 0x7fa26014ed80, one = 0x0, mask = 0, n = 0}}, port_poll_set = {map = {buckets = 0x0,
one = 0x0, mask = 0, n = 0}}, port_poll_errno = 0, change_seq = 0, ams = {mutex = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 4, __spins = -26912, __elision = -18608,
__list = {__prev = 0xbb62aa2c, __next = 0x0}}, __size = '\000' <repeats 16 times>, "\004\000\000\000\340\226P\267,\252b\273", '\000' <repeats 11 times>, __align = 0}, where = 0x7fa260074160 ""},
list = {prev = 0x7fa260125520, next = 0x0}, n = 0}, ams_seq = 0x0, ams_seqno = 3765277242901397506, is_controller_connected = false}
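Reading the first trace together with the register dump and disassembly: the faulting instruction is lock add %rdx,0xe0(%rax) with rax = 0, and the struct dump above shows tables = 0x0, so the per-table pointer computed from ofproto->up.tables is NULL. In other words, the ofproto is already being destroyed on the bridge_exit() path (frames #13-#18) while stats from a dying PMD thread are still being credited to it (frames #0-#6). For orientation, here is an approximate reconstruction of the faulting function, pieced together from the disassembly above and the OVS 2.14 era sources; the exact field names and the atomic_add_relaxed() call are assumptions, not verified code:

```c
/* Approximate reconstruction of ofproto_dpif_credit_table_stats()
 * (ofproto/ofproto-dpif.c:4350, OVS 2.14); names inferred from the
 * disassembly above, not copied from the exact source. */
void
ofproto_dpif_credit_table_stats(struct ofproto_dpif *ofproto, uint8_t table_id,
                                uint64_t n_matches, uint64_t n_misses)
{
    /* The shl $4 / sub / shl $4 sequence is table_id * 240, and
     * 0x128(%rdi) is ofproto->up.tables, so %rax = &up.tables[table_id].
     * In the dump above up.tables is NULL (tables = 0x0), hence the fault. */
    struct oftable *table = &ofproto->up.tables[table_id];
    unsigned long long orig;

    if (n_matches) {
        atomic_add_relaxed(&table->n_matched, n_matches, &orig); /* +0xe0 */
    }
    if (n_misses) {
        atomic_add_relaxed(&table->n_missed, n_misses, &orig);   /* +0xe8 */
    }
}
```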


Second stack trace:
(gdb) bt
#0 0xfffffac0e9000000 in ?? ()
#1 0x00000000010918df in rule_destroy_cb (rule=0x3942430) at ofproto/ofproto.c:2943
#2 0x0000000001175e16 in ovsrcu_call_postponed () at lib/ovs-rcu.c:348
#3 0x0000000001175f04 in ovsrcu_postpone_thread (arg=<optimized out>) at lib/ovs-rcu.c:364
#4 0x000000000117808d in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:383
#5 0x00007fd6f0c5a14a in start_thread () from /lib64/libpthread.so.0
#6 0x00007fd6efeb3f23 in clone () from /lib64/libc.so.6
(gdb) frame 1
#1 0x00000000010918df in rule_destroy_cb (rule=0x3942430) at ofproto/ofproto.c:2943
2943 rule->ofproto->ofproto_class->rule_destruct(rule);
(gdb) info reg
rax 0x408d00 4230400
rbx 0x3942430 60040240
rcx 0xc11 3089
rdx 0x7fd6b0000080 140560052519040
rsi 0x0 0
rdi 0x3942430 60040240
rbp 0x7fd6b0011be0 0x7fd6b0011be0
rsp 0x7fd6ecb36fb0 0x7fd6ecb36fb0
r8 0x7fd6d8000924 140560723609892
r9 0x7 7
r10 0x2d83b96 47725462
r11 0x206 518
r12 0x7fd6ecb36fc0 140561070911424
r13 0x7fd6ed337ebf 140561079303871
r14 0x7fd6ed337f50 140561079304016
r15 0x7fd6ecb37100 140561070911744
rip 0x10918df 0x10918df <rule_destroy_cb+47>
eflags 0x10246 [ PF ZF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
k0 0x0 0
k1 0x0 0
k2 0x0 0
k3 0x0 0
k4 0x0 0
k5 0x0 0
k6 0x0 0
k7 0x0 0
(gdb) disassemble 0x10918df
Dump of assembler code for function rule_destroy_cb:
0x00000000010918b0 <+0>: push %rbx
0x00000000010918b1 <+1>: mov %rdi,%rbx
0x00000000010918b4 <+4>: testb $0x1,0xa0(%rdi)
0x00000000010918bb <+11>: je 0x10918cf <rule_destroy_cb+31>
0x00000000010918bd <+13>: cmpb $0x6,0xaa(%rdi)
0x00000000010918c4 <+20>: je 0x10918cf <rule_destroy_cb+31>
0x00000000010918c6 <+22>: cmpl $0xffff,0x18(%rdi)
0x00000000010918cd <+29>: jle 0x1091918 <rule_destroy_cb+104>
0x00000000010918cf <+31>: mov (%rbx),%rax
0x00000000010918d2 <+34>: mov %rbx,%rdi
0x00000000010918d5 <+37>: mov 0x10(%rax),%rax
0x00000000010918d9 <+41>: callq *0x158(%rax)
=> 0x00000000010918df <+47>: mov (%rbx),%rax
0x00000000010918e2 <+50>: mov 0x118(%rbx),%rsi
0x00000000010918e9 <+57>: lea 0x210(%rax),%rdi
0x00000000010918f0 <+64>: callq 0x1117220 <mf_vl_mff_unref>
0x00000000010918f5 <+69>: mov (%rbx),%rax
0x00000000010918f8 <+72>: mov 0x120(%rbx),%rsi
0x00000000010918ff <+79>: lea 0x210(%rax),%rdi
0x0000000001091906 <+86>: callq 0x1117220 <mf_vl_mff_unref>
0x000000000109190b <+91>: mov %rbx,%rdi
0x000000000109190e <+94>: pop %rbx
0x000000000109190f <+95>: jmpq 0x1090b50 <ofproto_rule_destroy__>
0x0000000001091914 <+100>: nopl 0x0(%rax)
0x0000000001091918 <+104>: callq 0x1091760 <ofproto_rule_send_removed>
0x000000000109191d <+109>: jmp 0x10918cf <rule_destroy_cb+31>
End of assembler dump.
(gdb) p *rule
$1 = {ofproto = 0x37623e0, cr = {node = {prev = 0xcccccccccccccccc, next = {p = 0x3874bc0}}, priority = 0, cls_match = {p = 0x0}, match = {{{flow = 0x38e9de0, mask = 0x38e9df0}, flows = {0x38e9de0,
0x38e9df0}}, tun_md = 0x0}}, table_id = 0 '\000', state = RULE_REMOVED, mutex = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 2, __spins = 0, __elision = 0,
__list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 16 times>, "\002", '\000' <repeats 22 times>, __align = 0}, where = 0x14ab7e8 ""}, ref_count = {count = 0},
flow_cookie = 5638004435696427197, cookie_node = {hash = 3567688855, d = 0x387b8b8, s = 0x0}, flags = (unknown: 0), hard_timeout = 0, idle_timeout = 0, importance = 0, removed_reason = 2 '\002',
eviction_group = 0x0, evg_node = {idx = 0, priority = 0}, actions = 0x3889fa0, meter_list_node = {prev = 0x3942500, next = 0x3942500}, monitor_flags = (unknown: 0), add_seqno = 49, modify_seqno = 49,
expirable = {prev = 0x3942528, next = 0x3942528}, created = 95441005, modified = 95441005, match_tlv_bitmap = 0, ofpacts_tlv_bitmap = 0}
(gdb) p *rule->ofproto->ofproto_class->rule_destruct
Cannot access memory at address 0xfffffac0e9000000
(gdb)
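The second trace is the same teardown race seen from the other side: rule destruction was deferred through the RCU machinery (ovsrcu_call_postponed() in frame #2), and by the time the postpone thread runs rule_destroy_cb(), the exit path has already freed the memory behind rule->ofproto, so the vtable load at ofproto.c:2943 yields the garbage pointer 0xfffffac0e9000000 and the indirect call jumps into unmapped memory. Below is a minimal, self-contained sketch of that hazard pattern; it is not OVS code, and every name in it is hypothetical:

```c
/* Generic sketch of the use-after-free behind the second trace: a deferred
 * callback (as with ovsrcu_postpone()) dereferences an object that the
 * process's exit path has already freed. All names are hypothetical. */
#include <stdlib.h>

struct rule_class { void (*rule_destruct)(void *rule); };
struct owner      { const struct rule_class *cls; };
struct rule       { struct owner *ofproto; };

static void noop_destruct(void *rule) { (void) rule; }
static const struct rule_class CLS = { noop_destruct };

/* Stands in for rule_destroy_cb(), run later on the RCU postpone thread. */
static void
deferred_destroy(struct rule *rule)
{
    /* If rule->ofproto was freed first, this loads a garbage function
     * pointer, like the 0xfffffac0e9000000 pc in frame #0 above. */
    rule->ofproto->cls->rule_destruct(rule);
    free(rule);
}

int
main(void)
{
    struct owner *ofproto = malloc(sizeof *ofproto);
    struct rule *rule = malloc(sizeof *rule);

    ofproto->cls = &CLS;
    rule->ofproto = ofproto;

    free(ofproto);           /* exit path (cf. bridge_exit) frees the owner */
    deferred_destroy(rule);  /* deferred callback then fires: use-after-free */
    return 0;
}
```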

Steps To Reproduce

Steps to reproduce the behavior:

  1. Run systemctl restart openvswitch many times (e.g. in a loop) until ovs-vswitchd crashes.

Expected behavior

ovs-vswitchd should survive repeated restarts without crashing; instead it crashes with the stack traces shown above.


Notify maintainers

ovs-dev@openvswitch.org

@mohe2015 (Contributor)

@renweichun

> CentOS Linux

This is not the bug tracker for CentOS Linux.

@stale stale bot added the 2.status: stale label Jul 31, 2022
@mweinelt closed this as not planned (won't fix, can't repro, duplicate, stale) May 4, 2023