-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix adding flex counter to wrong context #1421
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meet code cover requirements
lib/Sai.cpp
Outdated
if (objectType == SAI_OBJECT_TYPE_SWITCH && objectId != SAI_NULL_OBJECT_ID && | ||
(attr->id == SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER || | ||
attr->id == SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP)) | ||
{ | ||
if (!kvp.second->m_redisSai->containsSwitch(objectId)) | ||
{ | ||
continue; | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will not guarantee that the context is actually gbsyncd or syncd, no what basis this decision is made here ? from my perspective therre should be some other attribute passed to indicated which syncd it is or based on switch created attributes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When setting attribute SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER or SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP using sai_switch_api, the switch Id is set to gSwitchId or gearbox_oid, e.g., https://github.com/sonic-net/sonic-swss/blob/master/orchagent/saihelper.cpp#L878.
Before my change, code around here was looping through all the context using "for (auto& kvp: m_contextMap)" and setting the attributes to all contexts. My change is to check if the switch id passed through could match the context. Assumption is that gearbox_oid is only available in the gbsyncd context, switch chip oid is only available in the syncd context, which are always the case from my understanding.
There is some description in PR #1362:
Gearbox flex counter database
Pass the Phy OID, an OID of a SAI switch object in syntax, when calling the SAI set API to set the extended attributes. By doing so, the SAI redis objects can choose in which context the SAI API call should be invoked and the corresponding gearbox syncd docker container will handle it.
From my understanding, my PR here is to implement "choose in which context the SAI API call should be invoke based on OID of a SAI switch object".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your assumption is not enough, since you are assuming those 2 attributes will be passed only on grbsyncd OID, but if you pass other OID for other switch it will also execute it, sairedis should know internally whether those attributes belong to grsyncd, maybe based on create_switch
anyway, if you know outside the sairedis whether those attrbiutes should be passed or not, just don't pass them for non grbsyncd, and let that api loop throu all oids
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused about the comment. Let me explain the original issue without this change. Could you please take a look and give some idea on where to change to fix the issue if you still think the current location is not appropriate?
There are two contexts: (A) is syncd for the ASIC switch chip, and (B) is gbsyncd for multiple gearbox chips.
When the system is enabling counter for switch chip of context (A), it executes
sai_switch_api->set_switch_attribute(gSwitchId, &attr);
in https://github.com/sonic-net/sonic-swss/blob/master/orchagent/saihelper.cpp#L878
with SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP in the attr.
Then because of
for (auto& kvp: m_contextMap)
in https://github.com/sonic-net/sonic-sairedis/blob/master/lib/Sai.cpp#L232,
it loops through both context (A) and context (B) and runs the set operation on both. The parameters in the original set_switch_attribute is for ASIC switch chip, thus only relevant to context (A). As it also runs on context (B) we saw the errors like below in syslog that we want to get rid of (the port VID is actually in context (A))
ERR gbsyncd#syncd: :- processFlexCounterEvent: port VID oid:0x1000000000002, was not found (probably port was removed/splitted) and will remove from counters now
Then I proposed that instead of running on both, probably it should only run on context (A), and that is through checking the objectId from set_switch_attribute belongs to which context. Also the TODO line in https://github.com/sonic-net/sonic-sairedis/blob/master/lib/Sai.cpp#L227-L228 seems also implying we should run set on only one context under certain situation.
Above example is for enabling counter on ASIC switch chip of context (A). Meanwhile, when system is enabling counters on gearbox chips of context (B), we found similar error from syncd complaining the port VID is not found in syncd (actually it belongs to gbsyncd).
ERR syncd#syncd: :- processFlexCounterEvent: port VID oid:0x1010000000001, was not found (probably port was removed/splitted) and will remove from counters now
For you comments
anyway, if you know outside the sairedis whether those attributes should be passed or not, just don't pass them for non gbsyncd, and let that api loop through all oids
it seems hard for "don't pass them for non gbsyncd" with the existing code, as that executing on all contexts is done by sairedis: https://github.com/sonic-net/sonic-sairedis/blob/master/lib/Sai.cpp#L232. Note the loop is only for redis attribute: https://github.com/sonic-net/sonic-sairedis/blob/master/lib/Sai.cpp#L204
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, i get the problem fully now, and easiest solution for that is this: https://github.com/sonic-net/sonic-sairedis/blob/master/lib/Sai.cpp#L227, apply only for specific context based on switch ID, instead of hard codding specific attributes, and apply to all contexts if switchid is nullobjectid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the latest change, the switch id check has been applied to the set operation of all redis attributes when the switch ID is not null.
8941ffc
to
0266f34
Compare
0266f34
to
28eb780
Compare
The counters for syncd (switch chip) were attempted to be added to gbsyncd (gearbox phys), and vice versa. This issue is introduced by sonic-net#1362 When setting the redis attribute SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP and SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER, the operation is applied to every contexts (both syncd and gbsyncd). However, the counters to initialize could only exist in one context. The fix is to check that the target switch id exists in the context; if not, skip the operation.
Normally, setting SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP and SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER should provide a valid switch id and the setting is only applied to the context with the switch id. Fall back to loop all contexts if SAI_NULL_OBJECT_ID is provided.
Extend the context check to all redis attrs
6aeb709
to
2281261
Compare
Hi @byu343 seems merging master branch is required |
/azp run Azure.sonic-sairedis |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run Azure.sonic-sairedis |
Azure Pipelines successfully started running 1 pipeline(s). |
Hi @kcudnik, are we good to merge this PR? |
Hi @bingwang-ms could you please help to approve the backport request? |
@liushilongbuaa Could you check the auto cherry-pick? Seems it's broken for this repo. Thanks |
The counters for syncd (switch chip) were attempted to be added to gbsyncd (gearbox phys), and vice versa. This issue is introduced by sonic-net#1362 When setting the redis attribute SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP and SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER, the operation is applied to every contexts (both syncd and gbsyncd). However, the counters to initialize could only exist in one context. The fix is to check that the target switch id exists in a specific context, for the set operation of any redis type attribute; if not, skip the set on that context. Before the fix, we see the error below in syslog ERR gbsyncd#syncd: :- processFlexCounterEvent: port VID oid:0x1000000000002, was not found (probably port was removed/splitted) and will remove from counters now After the fix, log like below is printed on info level: INFO swss#orchagent: :- containsSwitch: context phys failed to find switch oid:0x21000000000000 This change is needed by the SONiC release 202405 and later.
The counters for syncd (switch chip) were attempted to be added to gbsyncd (gearbox phys), and vice versa. This issue is introduced by #1362 When setting the redis attribute SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP and SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER, the operation is applied to every contexts (both syncd and gbsyncd). However, the counters to initialize could only exist in one context. The fix is to check that the target switch id exists in a specific context, for the set operation of any redis type attribute; if not, skip the set on that context. Before the fix, we see the error below in syslog ERR gbsyncd#syncd: :- processFlexCounterEvent: port VID oid:0x1000000000002, was not found (probably port was removed/splitted) and will remove from counters now After the fix, log like below is printed on info level: INFO swss#orchagent: :- containsSwitch: context phys failed to find switch oid:0x21000000000000 This change is needed by the SONiC release 202405 and later. Co-authored-by: byu343 <byu@arista.com>
The counters for syncd (switch chip) were attempted to be added to gbsyncd (gearbox phys), and vice versa. This issue is introduced by sonic-net#1362 When setting the redis attribute SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP and SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER, the operation is applied to every contexts (both syncd and gbsyncd). However, the counters to initialize could only exist in one context. The fix is to check that the target switch id exists in a specific context, for the set operation of any redis type attribute; if not, skip the set on that context. Before the fix, we see the error below in syslog ERR gbsyncd#syncd: :- processFlexCounterEvent: port VID oid:0x1000000000002, was not found (probably port was removed/splitted) and will remove from counters now After the fix, log like below is printed on info level: INFO swss#orchagent: :- containsSwitch: context phys failed to find switch oid:0x21000000000000 This change is needed by the SONiC release 202405 and later.
The counters for syncd (switch chip) were attempted to be added to gbsyncd (gearbox phys), and vice versa. This issue is introduced by sonic-net#1362 When setting the redis attribute SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP and SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER, the operation is applied to every contexts (both syncd and gbsyncd). However, the counters to initialize could only exist in one context. The fix is to check that the target switch id exists in a specific context, for the set operation of any redis type attribute; if not, skip the set on that context. Before the fix, we see the error below in syslog ERR gbsyncd#syncd: :- processFlexCounterEvent: port VID oid:0x1000000000002, was not found (probably port was removed/splitted) and will remove from counters now After the fix, log like below is printed on info level: INFO swss#orchagent: :- containsSwitch: context phys failed to find switch oid:0x21000000000000 This change is needed by the SONiC release 202405 and later.
The counters for syncd (switch chip) were attempted to be added to gbsyncd (gearbox phys), and vice versa.
This issue is introduced by #1362 When setting the redis attribute
SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP and
SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER, the operation is applied to every contexts (both syncd and gbsyncd). However, the counters to initialize could only exist in one context.
The fix is to check that the target switch id exists in a specific context, for the set operation of any redis type attribute; if not, skip the set on that context.
Before the fix, we see the error below in syslog
ERR gbsyncd#syncd: :- processFlexCounterEvent: port VID oid:0x1000000000002, was not found (probably port was removed/splitted) and will remove from counters now
After the fix, log like below is printed on info level:
INFO swss#orchagent: :- containsSwitch: context phys failed to find switch oid:0x21000000000000
This change is needed by the SONiC release 202405 and later.