
[Flex Counter] observe race condition while removing a RIF object #14628

Closed
Junchao-Mellanox opened this issue Apr 12, 2023 · 13 comments
Labels: Bug 🐛, NVIDIA, Triaged (this issue has been triaged)

Junchao-Mellanox (Collaborator) commented Apr 12, 2023

Description

In intfsorch.cpp, the function removeRouterIntfs removes the flex counter before removing the RIF object. See the code:

    removeRifFromFlexCounter(id, port.m_alias);

    sai_status_t status = sai_router_intfs_api->remove_router_interface(port.m_rif_id);

However, this does not guarantee that the flex counter is actually removed before the RIF object, because of the asynchronous architecture of syncd. The flow may look like this (a conceptual sketch of the two removal paths follows the list):

  1. Orchagent removes the flex counter entry from Redis (FLEX_COUNTER_DB)
  2. Orchagent removes the RIF object via the SAI Redis API
  3. Syncd selects its selectables and happens to process the RIF object removal first
  4. The flex counter thread tries to collect counters for the removed RIF, and an error occurs
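
To make the two paths concrete, here is a conceptual sketch (assumed names and include paths, not the exact sonic-swss code) of why the two removals travel on different Redis channels and therefore carry no ordering guarantee between them:

    // Conceptual sketch only: the counter removal and the object removal are
    // pushed onto two independent Redis channels.
    #include <string>
    #include <swss/producertable.h>   // assumed install path for sonic-swss-common
    extern "C" {
    #include <sai.h>
    }

    extern sai_router_interface_api_t *sai_router_intfs_api;

    void removeRifAndCounter(swss::ProducerTable &flexCounterTable,  // bound to FLEX_COUNTER_DB
                             sai_object_id_t rifOid,
                             const std::string &rifVidKey)
    {
        // 1) Counter removal: a DEL pushed to FLEX_COUNTER_DB, consumed later
        //    by syncd's m_flexCounter selectable.
        flexCounterTable.del(rifVidKey);

        // 2) Object removal: a sairedis call that lands in ASIC_DB, consumed
        //    by syncd's m_selectableChannel selectable.
        sai_router_intfs_api->remove_router_interface(rifOid);

        // Both calls merely enqueue notifications; nothing forces syncd to
        // drain (1) before (2), which is the race described above.
    }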

Steps to reproduce the issue:

  1. Remove a RIF; there is a small chance to reproduce the issue.

Describe the results you received:

Apr  4 02:47:15.195991 r-lionfish-07 NOTICE swss#orchagent: :- removeRouterIntfs: Remove router interface for port Ethernet0
...
Apr  4 02:47:15.910507 r-lionfish-07 ERR syncd#SDK: [ROUTER.ERR] cntr (0) err: Entry Not Found.
Apr  4 02:47:15.910507 r-lionfish-07 ERR syncd#SDK: [ROUTER.ERR] counter get impl function failed with cmd=31 cntr 0, err=Entry Not Found
Apr  4 02:47:15.910507 r-lionfish-07 ERR syncd#SDK: [SAI_RIF.ERR] mlnx_sai_rif.c[1887]- mlnx_get_router_interface_stats_ext: Failed to READ sx counter 0 - Entry Not Found
Apr  4 02:47:15.910507 r-lionfish-07 ERR syncd#SDK: :- collectData: Failed to get stats of Rif Counter 0x4100000006: -7

Describe the results you expected:

There is no race condition

Output of show version:

SONiC Software Version: SONiC.202211_RC9.2-243f8ce9b_Internal
Distribution: Debian 11.6
Kernel: 5.10.0-18-2-amd64
Build commit: 243f8ce9b
Build date: Mon Apr  3 03:16:12 UTC 2023
Built by: sw-r2d2-bot@r-build-sonic-ci02-244

Platform: x86_64-mlnx_msn3420-r0
HwSKU: ACS-MSN3420
ASIC: mellanox
ASIC Count: 1
Serial Number: MT2012X01822
Model Number: MSN3420-CB2FO
Hardware Revision: A1
Uptime: 02:48:43 up 4 min,  1 user,  load average: 1.16, 1.02, 0.47
Date: Tue 04 Apr 2023 02:48:43

Docker images:
REPOSITORY                                         TAG                               IMAGE ID       SIZE
docker-syncd-mlnx                                  202211_RC9.2-243f8ce9b_Internal   48b149e11b13   938MB
docker-syncd-mlnx                                  latest                            48b149e11b13   938MB
docker-orchagent                                   202211_RC9.2-243f8ce9b_Internal   7facbcd63503   538MB
docker-orchagent                                   latest                            7facbcd63503   538MB
docker-fpm-frr                                     202211_RC9.2-243f8ce9b_Internal   d719af6ce678   549MB
docker-fpm-frr                                     latest                            d719af6ce678   549MB
docker-teamd                                       202211_RC9.2-243f8ce9b_Internal   58949bfb9e91   519MB
docker-teamd                                       latest                            58949bfb9e91   519MB
docker-macsec                                      latest                            dc04bd492752   521MB
docker-nat                                         202211_RC9.2-243f8ce9b_Internal   4bf4d3ee8c31   484MB
docker-nat                                         latest                            4bf4d3ee8c31   484MB
docker-sflow                                       202211_RC9.2-243f8ce9b_Internal   53ab120b818c   482MB
docker-sflow                                       latest                            53ab120b818c   482MB
docker-platform-monitor                            202211_RC9.2-243f8ce9b_Internal   80e59a66e3a2   933MB
docker-platform-monitor                            latest                            80e59a66e3a2   933MB
docker-sonic-telemetry                             202211_RC9.2-243f8ce9b_Internal   f9164b47bc9b   795MB
docker-sonic-telemetry                             latest                            f9164b47bc9b   795MB
docker-snmp                                        202211_RC9.2-243f8ce9b_Internal   0f55b37da14f   541MB
docker-snmp                                        latest                            0f55b37da14f   541MB
docker-dhcp-relay                                  latest                            090cf52aaa8c   506MB
docker-eventd                                      202211_RC9.2-243f8ce9b_Internal   fc0a84e38eed   496MB
docker-eventd                                      latest                            fc0a84e38eed   496MB
docker-router-advertiser                           202211_RC9.2-243f8ce9b_Internal   fb8544bb50d6   496MB
docker-router-advertiser                           latest                            fb8544bb50d6   496MB
docker-lldp                                        202211_RC9.2-243f8ce9b_Internal   835494383385   539MB
docker-lldp                                        latest                            835494383385   539MB
docker-database                                    202211_RC9.2-243f8ce9b_Internal   4777bcd8b94a   496MB
docker-database                                    latest                            4777bcd8b94a   496MB
docker-mux                                         202211_RC9.2-243f8ce9b_Internal   a812afb9fc4d   545MB
docker-mux                                         latest                            a812afb9fc4d   545MB
docker-sonic-p4rt                                  202211_RC9.2-243f8ce9b_Internal   af29201218f7   579MB
docker-sonic-p4rt                                  latest                            af29201218f7   579MB
docker-sonic-mgmt-framework                        202211_RC9.2-243f8ce9b_Internal   d97317a2eeeb   614MB
docker-sonic-mgmt-framework                        latest                            d97317a2eeeb   614MB
urm.nvidia.com/sw-nbu-sws-sonic-docker/sonic-wjh   1.4.0-202211-internal-9           3367487a69e9   323MB
urm.nvidia.com/sw-nbu-sws-sonic-docker/doroce      1.1.0-202211-internal-5           078ba366e374   200MB

Output of show techsupport:


Additional information you deem important (e.g. issue happens only occasionally):

Junchao-Mellanox (Collaborator, Author) commented:

@prsunny @kcudnik, could you please provide your suggestions for this issue?

neethajohn (Contributor) commented:

Does this PR fix the issue - sonic-net/sonic-swss#2488?

neethajohn added the Triaged label Apr 26, 2023
neethajohn (Contributor) commented:

@dgsudharsan to check with Junchao and get back

Junchao-Mellanox (Collaborator, Author) commented Apr 27, 2023

Hi @neethajohn, the PR sonic-net/sonic-swss#2488 does not fix the issue. Here is my analysis; please correct me if anything is wrong.

When removing a RIF, intfsorch.cpp does the following:

removeRifFromFlexCounter(id, port.m_alias);

sai_status_t status = sai_router_intfs_api->remove_router_interface(port.m_rif_id);

So, it removes the flex counter prior to removing the RIF. This is correct. However, it does not guarantee the processing order on the syncd side. In syncd, we have 4 selectables (sockets) to handle Redis DB changes:

        s->addSelectable(m_selectableChannel.get());   // ====> handles RIF removal
        s->addSelectable(m_restartQuery.get());
        s->addSelectable(m_flexCounter.get());         // ====> handles flex counter removal
        s->addSelectable(m_flexCounterGroup.get());

        ...

        while (runMainLoop)
        {
            swss::Selectable *sel = NULL;

            int result = s->select(&sel);
            ...

SONiC's Select is based on epoll. From my understanding, epoll cannot guarantee ordering across different sockets, and SONiC has no mechanism to order events among sockets (a standalone sketch of this follows the example below). So, an example of the issue:

orchagent sends:

redis command 1 (remove flex counter)
redis command 2 (remove rif)

syncd's select is processing m_selectableChannel at this time, so it processes:

redis command 2 (remove rif)

syncd's select continues running and processes m_flexCounter:

redis command 1 (remove flex counter)

In this case, syncd's processing order differs from the order in which orchagent sent the commands. Since the flex counter is polled in a different thread, things go wrong if the following happens on the syncd side:

redis command 2 (remove rif)
flex counter polls counters for the removed rif     ======> causes the issue
redis command 1 (remove flex counter)
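
To make the epoll point above concrete, here is a minimal standalone sketch (plain Linux epoll, not SONiC code; the two pipes merely stand in for the two Redis channels). Even though the writes happen in a known order, epoll_wait() does not define in which order the two ready file descriptors are reported, so the consumer may handle the "remove rif" notification before the "remove flex counter" one:

    // Standalone illustration of unordered readiness reporting across sockets.
    #include <sys/epoll.h>
    #include <unistd.h>
    #include <cstdio>

    int main()
    {
        int flexCounterPipe[2], asicChannelPipe[2];
        pipe(flexCounterPipe);   // stands in for the FLEX_COUNTER_DB selectable
        pipe(asicChannelPipe);   // stands in for the ASIC_DB selectable channel

        int ep = epoll_create1(0);
        epoll_event ev{};
        ev.events = EPOLLIN;
        ev.data.fd = flexCounterPipe[0];
        epoll_ctl(ep, EPOLL_CTL_ADD, flexCounterPipe[0], &ev);
        ev.data.fd = asicChannelPipe[0];
        epoll_ctl(ep, EPOLL_CTL_ADD, asicChannelPipe[0], &ev);

        // "orchagent" writes in the intended order: counter removal first, RIF second.
        write(flexCounterPipe[1], "remove flex counter", 19);
        write(asicChannelPipe[1], "remove rif", 10);

        // "syncd" wakes up with both fds ready; the order of events in the
        // returned array is unspecified, so the RIF removal may be seen first.
        epoll_event events[2];
        int n = epoll_wait(ep, events, 2, -1);
        for (int i = 0; i < n; ++i)
            printf("ready fd: %d\n", events[i].data.fd);

        return 0;
    }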

stephenxs (Collaborator) commented:

This is not only about RIF objects but about all kinds of objects. Recently we observed a similar symptom on buffer pool objects when removing the zero buffer pool: the buffer pool was removed and then the counter, but in syncd the counter was still fetched just after the buffer pool was removed.

stephenxs (Collaborator) commented:

The root cause of the issue is that the removal notifications for an object and for its counters are received in the wrong order, because there are multiple channels between orchagent and sairedis for counter and object operations:

  • the Redis selectable channel or ZMQ for the SAI API, which is responsible for object operations;
  • the Redis ConsumerTable for FLEX_COUNTER_DB, which is responsible for counter operations.

The formal solution should be to use a unified communication channel for both counters and objects.
This can be done by extending the customized SAI attributes of any object that can have a counter, so that the attributes represent counter-create/remove operations.
These are NOT the "official" SAI attributes defined by the SAI community, but customized SAI attributes defined by SONiC in sairedis, which is not a new approach.
Currently, this mechanism is already used by orchagent to notify sairedis of arguments for apply view, communication mode, timeout, recording, saidump, link event damping, etc.
By doing so,

  • orchagent will use the SAI API to create counters instead of writing to FLEX_COUNTER_DB;
  • sairedis becomes responsible for adding the corresponding items to FLEX_COUNTER_DB and for starting and stopping polling of the corresponding counters;
  • OA no longer needs to check the VIDTORID map before creating counters (there is a lot of such checking currently).

@kcudnik @prsunny any concern if we do it in this way?
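
To illustrate the idea, here is a purely hypothetical sketch of what the orchagent side could look like under this proposal. SAI_ROUTER_INTERFACE_ATTR_CUSTOM_FLEX_COUNTER and removeRifWithCounter are made-up names for illustration; no such attribute exists in sairedis today:

    // Hypothetical sketch of the proposed unified-channel approach.
    extern "C" {
    #include <sai.h>
    }

    extern sai_router_interface_api_t *sai_router_intfs_api;

    // Assumed SONiC-defined custom attribute, placed in the SAI custom range.
    static constexpr sai_attr_id_t SAI_ROUTER_INTERFACE_ATTR_CUSTOM_FLEX_COUNTER =
        SAI_ROUTER_INTERFACE_ATTR_CUSTOM_RANGE_START;

    sai_status_t removeRifWithCounter(sai_object_id_t rifOid)
    {
        // Ask sairedis (instead of writing to FLEX_COUNTER_DB) to stop polling
        // counters for this RIF. Because this travels on the same channel as
        // object operations, syncd processes it before the remove below.
        sai_attribute_t attr;
        attr.id = SAI_ROUTER_INTERFACE_ATTR_CUSTOM_FLEX_COUNTER;
        attr.value.booldata = false;
        sai_router_intfs_api->set_router_interface_attribute(rifOid, &attr);

        return sai_router_intfs_api->remove_router_interface(rifOid);
    }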

kcudnik (Contributor) commented Jan 12, 2024

There was an idea that all operations should go via the zna channel. The flex counter channel was added outside my supervision and merged. This race condition has been known for a while now; maybe it just didn't surface earlier. I have mentioned several times that this can happen, and the solution is to merge all sairedis operations into one channel.

stephenxs (Collaborator) commented:

There was an idea that all operations should go via the zna channel. The flex counter channel was added outside my supervision and merged. This race condition has been known for a while now; maybe it just didn't surface earlier. I have mentioned several times that this can happen, and the solution is to merge all sairedis operations into one channel.

Hi, what does zna stand for?

kcudnik (Contributor) commented Jan 12, 2024

zmq, spelling mistake

stephenxs (Collaborator) commented:

zmq, spelling mistake

Thanks for clarifying it.
Whether to use ZeroMQSelectableChannel or RedisSelectableChannel is determined by syncd's CLI options.
So, I believe this is the same as my proposal.

Junchao-Mellanox (Collaborator, Author) commented:

Could you please share the PR when it is available?

