Lacp member down after waiting for urcu2 to quiesce. #345
Comments
Did it recover on its own, or did you have to restart OVS? If it recovered automatically, the issue is likely due to an overloaded system. Check out the following blog for troubleshooting steps:
https://developers.redhat.com/articles/2024/01/31/debugging-open-vswitch-logs-long-poll-interval-blocked-waiting
|
Thank you @chaudron. But I can't understand why LACP hasn't been moved to the monitor thread or a dedicated thread. Sending the LACP packets from the main thread is dangerous; we don't know what might cause the main thread to get blocked. |
Hi @chaudron, the long blocking can recover on its own. |
If your system is overloaded the blocking will recover on its own. If for some reason the blocking continues it's probably a bug. It could also be that there is a lot of rcu backlog work, but spending 2+ seconds doing this sounds like a lot. |
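For readers following along, below is a minimal, self-contained sketch of the deferred-callback ("RCU backlog") pattern being referred to here, assuming a simple mutex-protected queue; it is an illustrative analogue, not OVS's actual lib/ovs-rcu implementation, and the names (postpone, drain_backlog, free_cb) are made up for this sketch.

```c
/*
 * Illustrative analogue of deferred (RCU-style) reclamation; NOT the
 * actual lib/ovs-rcu.c code.  Writers postpone callbacks; a background
 * thread drains the backlog.  A large backlog, or a single slow
 * callback, makes the drain (and anything waiting on it) take a long time.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct deferred_cb {
    void (*func)(void *arg);
    void *arg;
    struct deferred_cb *next;
};

static struct deferred_cb *backlog;            /* Pending callbacks. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer side: defer destruction instead of freeing immediately. */
static void postpone(void (*func)(void *), void *arg)
{
    struct deferred_cb *cb = malloc(sizeof *cb);
    cb->func = func;
    cb->arg = arg;
    pthread_mutex_lock(&lock);
    cb->next = backlog;
    backlog = cb;
    pthread_mutex_unlock(&lock);
}

/* Background thread's job: drain everything postponed so far. */
static void drain_backlog(void)
{
    pthread_mutex_lock(&lock);
    struct deferred_cb *list = backlog;
    backlog = NULL;
    pthread_mutex_unlock(&lock);

    while (list) {
        struct deferred_cb *next = list->next;
        list->func(list->arg);                 /* Slow callbacks stall here. */
        free(list);
        list = next;
    }
}

static void free_cb(void *arg) { free(arg); }

int main(void)
{
    for (int i = 0; i < 100000; i++) {
        postpone(free_cb, malloc(64));         /* Simulated flow teardown. */
    }
    drain_backlog();
    puts("backlog drained");
    return 0;
}
```

The point of the sketch is only that the drain time scales with both the backlog size and the cost of each individual callback, which is why a 2+ second drain suggests something unusual.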
Yes @chaudron. But this issue arises in our production environment, where we use OVN for the OpenStack network. In order to carry larger network traffic from the tenant network to the external network, we enabled OVS-DPDK on the gateway nodes (which are also the network nodes). Although DPDK improves the throughput of elephant flows and the forwarding latency of individual packets, once a large number (~200,000) of small flows (pps < 5) are generated in a burst, it still puts considerable load pressure on the revalidator threads, causing ovs-rcu to spend a lot of time dealing with callbacks. |
How many datapath flows do you have in this scenario, and how busy are the revalidators (`ovs-appctl upcall/show`)? Are you sure enough CPU resources are available for OVS? |
The datapath has ~200,000 flows, and ovs-vswitchd has ~500,000 OpenFlow rules.
|
That does not look like too much. What is your dump duration? It was cut off in your output. |
The dump duration is about ~2000ms, but this is not the root cause of the LACP link down. The revalidator exits the non-quiescent state after each dump batch (50 flows). Would it be more reasonable to implement the sending of LACP messages in a dedicated thread? |
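A hedged sketch of the batching behaviour described above: the revalidator processes the dump in fixed-size batches and reaches a quiescent point between batches, so a long total dump does not keep the grace period open continuously. The function names (fetch_batch, revalidate_flows, quiesce_point) are placeholders for this sketch, not the actual udpif_revalidator code.

```c
#include <stdio.h>

#define BATCH_SIZE 50

/* Placeholder: pull up to 'n' flows from the current datapath dump. */
static int fetch_batch(int n)
{
    static int remaining = 200000;      /* e.g. ~200,000 datapath flows. */
    int got = remaining < n ? remaining : n;
    remaining -= got;
    return got;
}

/* Placeholder for revalidating one batch (touches RCU-protected data). */
static void revalidate_flows(int n)
{
    (void) n;
}

/* Placeholder for the RCU quiescent-state marker between batches. */
static void quiesce_point(void)
{
    /* The grace period may advance while the thread is quiescent. */
}

int main(void)
{
    int total = 0;
    int n;

    while ((n = fetch_batch(BATCH_SIZE)) > 0) {
        revalidate_flows(n);            /* Non-quiescent while processing. */
        total += n;
        quiesce_point();                /* Quiescent between batches. */
    }
    printf("revalidated %d flows in batches of %d\n", total, BATCH_SIZE);
    return 0;
}
```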
Is the ~2000ms from the upcall/show output, or are you basing this on the log messages? 2000ms for the dump duration sounds way too long to me. You might want to figure out why this is. Do you have some kind of rte offloading that takes time (assuming this is the dpdk datapath)? Also, I think you should investigate why the urcu2 thread gets stalled, as callbacks should be quick. |
Yes, we also saw the "Unreasonably long" poll interval message in the log. We wanted to know how long the callback took; after applying the following patch, it shows ~1500ms.
So we suspect that many OpenFlow rules are associated with a ukey, and releasing the atomic reference counts of those rules consumes a lot of time cumulatively. |
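The patch mentioned above is not included in this thread; the following is only a generic, hypothetical illustration of the kind of timing instrumentation described, wrapping a suspected callback with CLOCK_MONOTONIC timestamps. The names suspected_callback and timed_invoke are invented for this sketch.

```c
/* Generic timing instrumentation; NOT the patch referenced above. */
#include <stdio.h>
#include <time.h>

static long long elapsed_ms(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1000LL + (b->tv_nsec - a->tv_nsec) / 1000000LL;
}

/* Placeholder for the suspected slow callback (e.g. ukey/rule cleanup). */
static void suspected_callback(void *arg)
{
    (void) arg;
    /* ... release rule references, free ukey state, ... */
}

/* Wrap the callback with monotonic-clock timestamps and log slow runs. */
static void timed_invoke(void (*cb)(void *), void *arg)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    cb(arg);
    clock_gettime(CLOCK_MONOTONIC, &end);

    long long ms = elapsed_ms(&start, &end);
    if (ms > 100) {
        fprintf(stderr, "callback took %lld ms\n", ms);
    }
}

int main(void)
{
    timed_invoke(suspected_callback, NULL);
    return 0;
}
```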
It would be nice to know exactly what function is causing this so you can further analyze what is taking 1.5 seconds, which seems way too long. You could attach a perf probe to your new function. Locate it with |
Thanks @chaudron. Do you have any suggestions for implementing LACP in a dedicated thread, like the monitor thread? Or what does the community suggest to avoid the main thread getting blocked and affecting the sending of LACP packets? |
Your problem is that the RCU thread is being blocked, not the main thread. Is the main thread blocked waiting on this synchronization? Anyway, I think it's better to find the root cause of the delay and fix this, instead of working around the real issue. |
I understand your point of view, and I will continue to investigate the RCU blocking issue. But with ~550,000 OpenFlow rules, it also takes a lot of time to dump all flow rules via ovs-ofctl, and the OpenFlow rule query messages are also handled in the main thread.
This means that every time we dump OpenFlow rules, it will affect the LACP packet transmission. I don't think that is a good implementation. |
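To illustrate the concern raised here, a minimal sketch of a single run loop where a slow request handler (standing in for a large OpenFlow dump) shares the thread with a periodic transmit (standing in for fast-rate LACP): the periodic send slips by however long the handler runs. The durations, the 1-second interval, and the function names are illustrative only.

```c
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* Stand-in for the periodic LACP transmit. */
static void send_lacpdu(void)
{
    printf("LACPDU sent at %.0f ms\n", now_ms());
}

/* Stand-in for servicing a large OpenFlow dump in the same thread. */
static void handle_flow_dump(void)
{
    sleep(2);                           /* ~2000 ms of blocking work. */
}

int main(void)
{
    double next_tx = now_ms();

    for (int i = 0; i < 3; i++) {
        if (now_ms() >= next_tx) {
            send_lacpdu();
            next_tx += 1000;            /* Fast-rate LACP: every 1 s. */
        }
        handle_flow_dump();             /* Blocks the loop; TX slips. */
    }
    return 0;
}
```

Running it shows the LACPDUs going out roughly every 2 seconds instead of every 1 second, which is the slippage being described.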
I have a system here with 200k datapath flows, and dumping takes around 150ms, with peaks to <500ms when revalidating flows. Dumping (open)flows might take time, but the system/main thread is not blocked during this dump. |
Thank you @chaudron for your reply. Can you show me your CPU info? The following is mine.
|
I have an Intel system:
|
Thank you @chaudron for your reply. I'm very sorry for taking so long to reply to you. I have looked at how LACP is implemented in the kernel, as well as in DPDK. The kernel uses work queues to update the LACP state machine and send LACP messages. DPDK likewise queues the LACP message into a tx-ring of the port during the timer task, waiting for the DPDK application to call the rte_eth_tx_burst API to drain the LACP buffer queue. So is it possible for the OVS community to implement the sending of LACP messages in a timer task? This design would be more suitable for LACP in production environments. |
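A hedged sketch of the design described in this comment, loosely modelled on the DPDK bonding behaviour mentioned: a timer task only builds and enqueues the LACPDU into a per-port ring, and the datapath TX path drains that ring on its next burst. All structures and function names below are illustrative, not actual OVS or DPDK APIs.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RING_SIZE  8
#define LACPDU_LEN 124

struct lacp_ring {
    uint8_t pdus[RING_SIZE][LACPDU_LEN];
    unsigned head, tail;                 /* head == tail -> empty. */
    pthread_mutex_t lock;
};

/* Timer-task side: build the PDU and queue it; never touch the NIC. */
static bool lacp_timer_tick(struct lacp_ring *r)
{
    bool queued = false;

    pthread_mutex_lock(&r->lock);
    if ((r->head + 1) % RING_SIZE != r->tail) {
        memset(r->pdus[r->head], 0, LACPDU_LEN);   /* Build LACPDU here. */
        r->head = (r->head + 1) % RING_SIZE;
        queued = true;
    }
    pthread_mutex_unlock(&r->lock);
    return queued;
}

/* TX-path side: called from the datapath thread's transmit burst. */
static int lacp_tx_drain(struct lacp_ring *r)
{
    int sent = 0;

    pthread_mutex_lock(&r->lock);
    while (r->tail != r->head) {
        /* Transmit r->pdus[r->tail] on the member port here. */
        r->tail = (r->tail + 1) % RING_SIZE;
        sent++;
    }
    pthread_mutex_unlock(&r->lock);
    return sent;
}

int main(void)
{
    struct lacp_ring ring = { .lock = PTHREAD_MUTEX_INITIALIZER };

    lacp_timer_tick(&ring);              /* Periodic timer fires. */
    printf("drained %d LACPDU(s)\n", lacp_tx_drain(&ring));
    return 0;
}
```

The appeal of this split is that the timer side never blocks on the device, and a stall elsewhere only delays the enqueue, not the drain performed by the busy-polling TX thread.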
It doesn’t look like anyone is currently working on this upstream, likely because no one has encountered any issues with it yet. Once your root cause is resolved, you might find that it’s no longer necessary. You could start a discussion on the OVS discuss mailing list to see if others have run into this issue as well. |
Open vSwitch version
We added enp4s0f0np0 and enp2s0f0np0 as an LACP bond in fast mode. When the urcu thread reports an unreasonably long poll interval, the main thread is blocked for more than 2000ms and a member of the LACP bond goes to link state down.