Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Orchagent received notifications in an order different from the order in which other daemons sent them during dynamic buffer test #5157

Closed
stephenxs opened this issue Aug 12, 2020 · 3 comments
Assignees
Labels
Triaged this issue has been triaged

Comments

@stephenxs
Copy link
Collaborator

stephenxs commented Aug 12, 2020

Description
Orchagent received notifications in an order different from the order in which other daemons sent them.
This issue can be reproduced in dynamic buffer calculation test.
This is a new feature which will come soon. In the feature, there is a logic that the buffer profile will be dynamically created when a new tuple of speed, cable length occurs and removed when it isn't referenced any longer.
One of the test cases has the following flow:
- port's speed updated, causing port down and routing entries withdrawn. After that, the cable length of the port has been updated, causing buffer profile on the port updated.
- buffermgrd create a new buffer profile and replaces the port's old profile on lossless PG with the newly created one.
- the old profile is removed
After step 2 being executed, the old profile shouldn't be referenced any more, which means it's safe to remove it. However, we observed error from sairedis saying it's still being referenced. This is because orchagent received two notifications in a reversed order in which they were sent.

I suspect that the orchagent handles notifications in a different order in which buffermgrd sends them is caused by the low-level mechanism of the orchagent, which takes advantage of I/O multiply mechanism to receive notifications from redis-db:

  1. Each orch in orchagent creates an FD for each table it listens to and inserts all the FDs into something like a select vector orchdaemon.cpp#L453.
  2. The main loop is blocked at a select-like interface, orchdaemon.cpp#L453.
    When the system isn't busy everything works well. But when the system is busy, notifications are backlogged in sockets, the orchagent will handle notifications in the order like FD-ascending or FD-descending rather than the order they come.

My suggestion is to add a dependency between different buffer tables. In details, task_need_retry should be returned in the following case:

  • if a BUFFER_POOL entry is referenced by a BUFFER_PROFILE entry when a removing notification is received.
  • if a BUFFER_PROFILE entry is referenced by a BUFFER_PG entry when a removing notification is received.

To be more precise, we need to:

  • record the dependencies among BUFFER_POOL, BUFFER_PROFILE and BUFFER_PG in the orchagent.
    • add a set to the objects. it contains the objects that depend on it. typically they are the objects that reference it.
    • when a new object created, add it to the set of the object it depends on.
      for example, when a new BUFFER_PG is added, it will be added to the set of the BUFFER_POOL object it references.
    • when an object's dependency is updated, remove it from the field of the old depended object and addition it to the new one.
      for example, if the profile in a BUFFER_PG is updated from a to b, we need remove the BUFFER_PG from the set of a and add it to that of b
    • when an object is removed, remove it from the set of the depended object
  • check the dependency ahead of removing entries, return task_need_retry if the set isn't empty.

By doing so, an entry is prevented from being released before all the references removed.
The drawback of the solution is that making it retry will block all the further notification from one table and make orchagent keep retrying. But this won't introduce further risk as long as:

  • there is no loop dependency, which means all the depending entries can be removed within a certain period of time.
  • the notifications are out of order rather than lost, which means the time won't be too long.

Steps to reproduce the issue:
See the above summary.

Describe the results you received:
Detailed flow and log message:

  1. port's speed updated, causing port down and routing entries withdrawn. After that, the cable length of the port has been updated, causing buffer profile on the port updated.
  2. buffermgrd create a new buffer profile and replaces the port's old profile on lossless PG with the newly created one.
Jul 31 06:52:08.235333 mtbc-sonic-03-2700 INFO config: 'interface speed Ethernet112 10000' executing...
Jul 31 06:52:08.346616 mtbc-sonic-03-2700 INFO swss#buffermgrd: :- handlePortTable: Updating BUFFER_PG for port Ethernet112 due to speed or port updated
Jul 31 06:52:08.346616 mtbc-sonic-03-2700 INFO swss#buffermgrd: :- allocateProfile: Allocating new BUFFER_PROFILE pg_lossless_10000_40m_profile
Jul 31 06:52:08.346616 mtbc-sonic-03-2700 NOTICE swss#buffermgrd: :- allocateProfile: Creating new profile 'pg_lossless_10000_40m_profile'
Jul 31 06:52:08.351527 mtbc-sonic-03-2700 INFO swss#orchagent: message repeated 25 times: [ :- doTask: Handling buffer task]
Jul 31 06:52:08.351527 mtbc-sonic-03-2700 INFO swss#orchagent: :- setPortAdminStatus: Set admin status DOWN to port pid:10000000002d6
Jul 31 06:52:08.351527 mtbc-sonic-03-2700 NOTICE swss#buffermgrd: :- allocateProfile: BUFFER_PROFILE pg_lossless_10000_40m_profile has been created successfully
  1. buffermgrd removes the port's lossless PG, causing the newly created buffer profile no longer referenced.
Jul 31 06:52:22.603972 mtbc-sonic-03-2700 INFO ansible-command: Invoked with creates=None executable=None _uses_shell=True strip_empty_ends=True _raw_params=config interface lossless_pg remove Ethernet112 3-4 removes=None argv=None warn=True chdir=None stdin_add_newline=True stdin=None
Jul 31 06:52:23.131873 mtbc-sonic-03-2700 NOTICE swss#buffermgrd: :- doRemovePgTask: Remove BUFFER_PG Ethernet112:3-4 (profile pg_lossless_10000_40m_profile)
The newly created buffer profile is removed as well.
Jul 31 06:52:23.144138 mtbc-sonic-03-2700 NOTICE swss#buffermgrd: :- doRemovePgTask: No lossless PG configured on port Ethernet112 anymore, try removing the original profile pg_lossless_10000_40m_profile
Jul 31 06:52:23.144261 mtbc-sonic-03-2700 NOTICE swss#buffermgrd: :- releaseProfile: BUFFER_PROFILE pg_lossless_10000_40m_profile has been released successfully
  1. bufferorch handles the notification in a reverse order, receiving the notification of buffer profile removing first and failing to remove the profile. The failure is caused by the buffer profile being still referenced at the time
Jul 31 06:52:23.660780 mtbc-sonic-03-2700 INFO swss#orchagent: :- doTask: Handling buffer task
Jul 31 06:52:23.662209 mtbc-sonic-03-2700 ERR swss#orchagent: :- meta_generic_validation_remove: object 0x1900000000074f reference count is 2, can't remove
Jul 31 06:52:23.662372 mtbc-sonic-03-2700 ERR swss#orchagent: :- processBufferProfile: Failed to remove buffer profile pg_lossless_10000_40m_profile with type BUFFER_PROFILE, rv:-5
Jul 31 06:52:23.662503 mtbc-sonic-03-2700 ERR swss#orchagent: :- doTask: Failed to process buffer task, drop it
Jul 31 06:52:23.662724 mtbc-sonic-03-2700 INFO swss#orchagent: :- doTask: Handling buffer task
Jul 31 06:52:23.663667 mtbc-sonic-03-2700 NOTICE swss#orchagent: :- processPriorityGroup: Remove buffer PG Ethernet112:3-4
Jul 31 06:52:23.664102 mtbc-sonic-03-2700 INFO swss#orchagent: :- doTask: Handling buffer task

Describe the results you expected:
No error should be observed.

Additional information you deem important (e.g. issue happens only occasionally):

**Output of `show version`:**

```
(paste your output here)
```

**Attach debug file `sudo generate_dump`:**

```
(paste your output here)
```
@stephenxs stephenxs changed the title Orchagent received notifications in an order different from the order in which other daemons sent them Orchagent received notifications in an order different from the order in which other daemons sent them during dynamic buffer test Aug 12, 2020
@neethajohn
Copy link
Contributor

I am not clear on what is the dependency logic you are planning to add. Is it retry logic (task_need_retry)?

@stephenxs
Copy link
Collaborator Author

stephenxs commented Aug 12, 2020

I am not clear on what is the dependency logic you are planning to add. Is it retry logic (task_need_retry)?

Yes.
To be more precise, we need to:

  • record the dependencies among BUFFER_POOL, BUFFER_PROFILE and BUFFER_PG in the orchagent.
    • add a set to the objects. it contains the objects that depend on it. typically they are the objects that reference it.
    • when a new object created, add it to the set of the object it depends on.
      for example, when a new BUFFER_PG is added, it will be added to the set of the BUFFER_POOL object it references.
    • when an object's dependency is updated, remove it from the field of the old depended object and addition it to the new one.
      for example, if the profile in a BUFFER_PG is updated from a to b, we need remove the BUFFER_PG from the set of a and add it to that of b
    • when an object is removed, remove it from the set of the depended object
  • check the dependency ahead of removing entries, return task_need_retry if the set isn't empty.

@xinliu-seattle xinliu-seattle added the Triaged this issue has been triaged label Aug 19, 2020
daall added a commit that referenced this issue Oct 16, 2020
[swss]
[acl] Replace IP_PROTOCOL with NEXT_HEADER for IPv6 ACL tables (#1458)
[acl] Refactor port OID retrieval into aclorch (#1462)
Fix issue #5157 by identifying the dependency among objects and avoiding releasing an object still being referenced (#1440)
[mock tests] Update MockDBConnector to match new swsscommon interface (#1465)

[swss-common]
netlink: Setting nl_socket buffer size to 3M from 2M (#391)
Added support in Swig file to cast Selectable object to Subscriber Table object (#394)
[warm reboot] Warm Reboot Support for EVPN VXLAN (#350)
Implement DBInterface/SonicV2Connector in C++ (#387)
Fix memory leak if a RedisCommand object were to be reused (#392)

Signed-off-by: Danny Allen <daall@microsoft.com>
@stephenxs
Copy link
Collaborator Author

Fixed by PR #1440

santhosh-kt pushed a commit to santhosh-kt/sonic-buildimage that referenced this issue Feb 25, 2021
[swss]
[acl] Replace IP_PROTOCOL with NEXT_HEADER for IPv6 ACL tables (sonic-net#1458)
[acl] Refactor port OID retrieval into aclorch (sonic-net#1462)
Fix issue sonic-net#5157 by identifying the dependency among objects and avoiding releasing an object still being referenced (sonic-net#1440)
[mock tests] Update MockDBConnector to match new swsscommon interface (sonic-net#1465)

[swss-common]
netlink: Setting nl_socket buffer size to 3M from 2M (sonic-net#391)
Added support in Swig file to cast Selectable object to Subscriber Table object (sonic-net#394)
[warm reboot] Warm Reboot Support for EVPN VXLAN (sonic-net#350)
Implement DBInterface/SonicV2Connector in C++ (sonic-net#387)
Fix memory leak if a RedisCommand object were to be reused (sonic-net#392)

Signed-off-by: Danny Allen <daall@microsoft.com>
theasianpianist pushed a commit to theasianpianist/sonic-buildimage that referenced this issue Feb 5, 2022
…and avoiding releasing an object still being referenced (sonic-net#1440)

* Fix issue sonic-net#5157 by identifying the dependency among objects and avoiding releasing an object still being referenced

The issue is caused by the OA receives notification in an different order in which they were sent.
OA doesn't have any dependency check try notifying sai-redis to release an object which is still being referenced,
which causes sai-redis complain and the object leaks.

The idea is to introduce a mechanism to identify the dependency thus preventing a referenced object from being released.

1. Introduce a new type representing the dependency among variant type of objects, including the following fields:
  - m_objsDependingOnMe, a set representing the objects that references the current object.
    eg. BUFFER_PROFILE.ingress_lossless_profile references BUFFER_POOL.ingress_lossless_pool
  - m_objsReferencingByMe, a map from a field of the current object's to the object name it references.
2. When a field of an object A has been updated with referencing another object B,
  - obj[A.m_objsReferencingByMe[field name]].m_objsDependingOnMe.remove(A)
  - A.m_objsReferencingByMe[field name] = B
3. When a an object A is about to be removed,
  - if obj.m_objsDependingOnMe isn't empty set, return task_need_retry else execute the normal remove flow.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Triaged this issue has been triaged
Projects
None yet
Development

No branches or pull requests

3 participants