Fix flow counter out-of-order issue by notifying counter operations using SelectableChannel (#1362)

What I did

Fix the flow counter out-of-order issue by notifying counter operations using the SelectableChannel.

Signed-off-by: Stephen Sun <stephens@nvidia.com>

Why I did it

Currently, operations on SAI objects and on their counters (if any) are triggered through different channels, which introduces race conditions:
- the creation and destruction of objects are notified using the SelectableChannel;
- counter operations, including starting and stopping counter polling, are notified by listening to the FLEX_COUNTER and FLEX_COUNTER_GROUP tables in the FLEX_COUNTER_DB.

The orchagent always respects the order when starting and stopping counter polling: it starts polling only after creating an object and stops polling before destroying it. However, syncd can receive these events in the wrong order. For example, if it receives the destruction of an object first and the request to stop polling that object's counters afterwards, it can poll counters on an object that no longer exists, which causes errors in the vendor SAI.

The new solution extends the SAI redis attributes on SAI_SWITCH_OBJECT to notify counter polling. As a result, all objects and their counters are notified through a single, unified channel: the SelectableChannel.

How I verified it

- Unit test
- Manual test
- Regression test

Details if related

Two SAI Redis attributes are introduced, as described below. Each attribute has several fields of type const char *; passing nullptr for a field means that field is not changed.

SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP, for counter groups represented by the FLEX_COUNTER_GROUP table in the FLEX_COUNTER_DB, with the following fields:
- counter_group_name: the key of the table, i.e. the group name.
- poll_interval: the POLL_INTERVAL field of an entry, i.e. the polling interval of the group.
- operation: the FLEX_COUNTER_STATUS field of an entry, i.e. whether counter polling is enabled for the group.
- stats_mode: the STATS_MODE field of an entry, either STATS_MODE_READ or STATS_MODE_READ_AND_CLEAR.
- plugins: the Lua plugin(s) associated with the group.
- plugin_name: the name of the plugins field, which differs between groups.

SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER, for per-object counters represented by the FLEX_COUNTER table in the FLEX_COUNTER_DB, with the following fields:
- counter_key: the key of the table, following the naming convention <group-name>:oid:<oid-value>.
- counter_ids: the list of counter IDs to be polled for the object.
- counter_field_name: the name of the counter ID field, which differs between groups.
- stats_mode: the STATS_MODE field of an entry, either STATS_MODE_READ or STATS_MODE_READ_AND_CLEAR.

Both SAI attributes are terminated by the RedisRemoteSaiInterface object in the swss context, which serializes the SAI API call into the SelectableChannel as one of the following commands:
- REDIS_FLEX_COUNTER_COMMAND_SET_COUNTER_GROUP: the SET operation on the FLEX_COUNTER_GROUP table.
- REDIS_FLEX_COUNTER_COMMAND_DEL_COUNTER_GROUP: the DEL operation on the FLEX_COUNTER_GROUP table.
- REDIS_FLEX_COUNTER_COMMAND_START_POLL: the SET operation on the FLEX_COUNTER table.
- REDIS_FLEX_COUNTER_COMMAND_STOP_POLL: the DEL operation on the FLEX_COUNTER table.

On receiving these commands (which together represent both extended SAI attributes), syncd calls the flex counter functions to handle them.

Gearbox flex counter database

When calling the SAI set API to set the extended attributes, pass the PHY OID, which is syntactically the OID of a SAI switch object. This lets the SAI redis objects choose the context in which the SAI API call is invoked, so that the corresponding gearbox syncd container handles it.
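The following minimal C++ sketch makes the field semantics and the four commands concrete. It models the state syncd could keep for them.

The struct, class, and function names (FlexCounterGroupParam, FlexCounterParam, FlexCounterModel, splitCounterKey) are hypothetical and are not the sairedis API. Storing the fields in an in-memory map is also an assumption made only for illustration. The behaviors taken from the description above are:
- the field lists;
- the rule that nullptr means "unchanged";
- the SET/DEL meaning of each command;
- the <group-name>:oid:<oid-value> key convention.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical mirror of the SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP fields.
// A nullptr field means "do not change this field".
struct FlexCounterGroupParam {
    const char* counter_group_name;  // key of the FLEX_COUNTER_GROUP entry (required)
    const char* poll_interval;       // POLL_INTERVAL
    const char* operation;           // FLEX_COUNTER_STATUS
    const char* stats_mode;          // STATS_MODE
    const char* plugin_name;         // name of the plugins field (group-specific)
    const char* plugins;             // Lua plugin(s)
};

// Hypothetical mirror of the SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER fields.
struct FlexCounterParam {
    const char* counter_key;         // "<group-name>:oid:<oid-value>" (required)
    const char* counter_ids;         // counter IDs to poll
    const char* counter_field_name;  // name of the counter ID field (group-specific)
    const char* stats_mode;          // STATS_MODE
};

using Fields = std::map<std::string, std::string>;

// Illustrative in-memory model of the two tables driven by the four commands.
class FlexCounterModel {
public:
    // Models REDIS_FLEX_COUNTER_COMMAND_SET_COUNTER_GROUP (SET on FLEX_COUNTER_GROUP).
    void setCounterGroup(const FlexCounterGroupParam& p) {
        assert(p.counter_group_name != nullptr);
        Fields& g = groups_[p.counter_group_name];
        put(g, "POLL_INTERVAL", p.poll_interval);
        put(g, "FLEX_COUNTER_STATUS", p.operation);
        put(g, "STATS_MODE", p.stats_mode);
        if (p.plugin_name != nullptr) put(g, p.plugin_name, p.plugins);
    }
    // Models REDIS_FLEX_COUNTER_COMMAND_DEL_COUNTER_GROUP (DEL on FLEX_COUNTER_GROUP).
    void delCounterGroup(const std::string& name) { groups_.erase(name); }

    // Models REDIS_FLEX_COUNTER_COMMAND_START_POLL (SET on FLEX_COUNTER).
    void startPoll(const FlexCounterParam& p) {
        assert(p.counter_key != nullptr);
        Fields& c = counters_[p.counter_key];
        if (p.counter_field_name != nullptr) put(c, p.counter_field_name, p.counter_ids);
        put(c, "STATS_MODE", p.stats_mode);
    }
    // Models REDIS_FLEX_COUNTER_COMMAND_STOP_POLL (DEL on FLEX_COUNTER).
    void stopPoll(const std::string& key) { counters_.erase(key); }

    const Fields* group(const std::string& name) const { return lookup(groups_, name); }
    const Fields* counter(const std::string& key) const { return lookup(counters_, key); }

private:
    static void put(Fields& f, const std::string& field, const char* value) {
        if (value != nullptr) f[field] = value;  // nullptr: keep the current value
    }
    static const Fields* lookup(const std::map<std::string, Fields>& m, const std::string& k) {
        auto it = m.find(k);
        return it == m.end() ? nullptr : &it->second;
    }
    std::map<std::string, Fields> groups_, counters_;
};

// Splits "<group-name>:oid:<oid-value>" into its two parts; returns false if malformed.
bool splitCounterKey(const std::string& key, std::string& group, std::string& oid) {
    const std::string sep = ":oid:";
    const auto pos = key.find(sep);
    if (pos == std::string::npos) return false;
    group = key.substr(0, pos);
    oid = key.substr(pos + sep.size());
    return true;
}
```

Consider calling setCounterGroup twice on the same group, the second time with only operation set. In this model, the second call updates FLEX_COUNTER_STATUS and keeps the earlier POLL_INTERVAL, which is one way to read the nullptr convention. The group and field names used in a quick test (for example PORT_STAT_COUNTER) are only illustrative.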
Note on the gearbox: the original gearbox flex counter implementation is buggy.

Context and critical section analysis

This change does not alter the critical section hierarchy.

Performance analysis

In both the old and new solutions, counter operations are handled in the same thread. In swss, counter operations were asynchronous in the old solution and are synchronous now, which can add a little latency. However, because the number of counter operations is small, no performance degradation has been observed.
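The race described under "Why I did it" can be reproduced in a toy model. The C++ sketch below is a hypothetical simulation, not syncd code. Op, Event, SyncdModel, runSingleChannel, and runTwoChannels are invented names.

In the model, syncd is reduced to a set of existing objects and a set of polled objects, and it runs one polling pass after every event. The two-channel run assumes the unlucky interleaving in which the object channel is drained before the counter channel.

```cpp
#include <cassert>
#include <deque>
#include <initializer_list>
#include <set>
#include <string>
#include <vector>

// Hypothetical operations that orchagent emits for one object.
enum class Op { Create, StartPoll, StopPoll, Remove };

struct Event {
    Op op;
    std::string oid;
};

// Toy model of syncd: tracks existing and polled objects and counts
// polls attempted on objects that no longer exist.
class SyncdModel {
public:
    void handle(const Event& e) {
        assert(!e.oid.empty());
        switch (e.op) {
            case Op::Create:    objects_.insert(e.oid); break;
            case Op::Remove:    objects_.erase(e.oid); break;
            case Op::StartPoll: polled_.insert(e.oid); break;
            case Op::StopPoll:  polled_.erase(e.oid); break;
        }
        pollOnce();  // simulate one polling cycle after each event
    }
    int errors() const { return errors_; }

private:
    void pollOnce() {
        for (const auto& oid : polled_)
            if (objects_.count(oid) == 0) ++errors_;  // polling a non-existent object
    }
    std::set<std::string> objects_, polled_;
    int errors_ = 0;
};

// New scheme: every event travels through one FIFO channel, so syncd
// sees exactly the order orchagent produced.
int runSingleChannel(const std::vector<Event>& events) {
    std::deque<Event> channel(events.begin(), events.end());
    SyncdModel syncd;
    while (!channel.empty()) {
        syncd.handle(channel.front());
        channel.pop_front();
    }
    return syncd.errors();
}

// Old scheme: object events and counter events use separate channels;
// here syncd happens to drain the object channel first.
int runTwoChannels(const std::vector<Event>& events) {
    std::deque<Event> objectChannel, counterChannel;
    for (const auto& e : events)
        (e.op == Op::Create || e.op == Op::Remove ? objectChannel : counterChannel).push_back(e);
    SyncdModel syncd;
    for (auto* ch : {&objectChannel, &counterChannel}) {
        while (!ch->empty()) {
            syncd.handle(ch->front());
            ch->pop_front();
        }
    }
    return syncd.errors();
}
```

Consider the sequence create, start poll, stop poll, remove on a single OID. runSingleChannel reports no errors for it. runTwoChannels reports one poll on an object that has already been removed, which is the failure mode the unified SelectableChannel avoids.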