-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Mellanox] set select timeout to no more than 1 sec to make sure fast shutdown #13611
[Mellanox] set select timeout to no more than 1 sec to make sure fast shutdown #13611
Conversation
/azp run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
/azpw run Azure.sonic-buildimage |
/AzurePipelines run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
@Junchao-Mellanox do you still want to keep in draft? |
@keboliu please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approved with comments
@@ -387,26 +387,30 @@ def get_change_event(self, timeout=0): | |||
self.sfp_event.initialize() | |||
|
|||
wait_for_ever = (timeout == 0) | |||
# select timeout should be no more than 1000ms to ensure fast shutdown flow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Junchao-Mellanox may be add more context in the comment why max timeout is kept 1sec?
@@ -387,26 +387,30 @@ def get_change_event(self, timeout=0): | |||
self.sfp_event.initialize() | |||
|
|||
wait_for_ever = (timeout == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if you still want to keep this timeout=0 for future use
Related work items: sonic-net#276, sonic-net#305, sonic-net#332, sonic-net#338, sonic-net#339, sonic-net#1188, sonic-net#1192, sonic-net#1197, sonic-net#1206, sonic-net#1685, sonic-net#1690, sonic-net#1696, sonic-net#1699, sonic-net#1709, sonic-net#1727, sonic-net#1737, sonic-net#1741, sonic-net#1742, sonic-net#2511, sonic-net#2512, sonic-net#2532, sonic-net#2559, sonic-net#2626, sonic-net#2638, sonic-net#2645, sonic-net#2649, sonic-net#2660, sonic-net#2669, sonic-net#2670, sonic-net#2678, sonic-net#10084, sonic-net#11442, sonic-net#11873, sonic-net#12047, sonic-net#12110, sonic-net#12207, sonic-net#12529, sonic-net#12678, sonic-net#13235, sonic-net#13287, sonic-net#13372, sonic-net#13395, sonic-net#13456, sonic-net#13497, sonic-net#13522, sonic-net#13545, sonic-net#13547, sonic-net#13552, sonic-net#13569, sonic-net#13572, sonic-net#13578, sonic-net#13591, sonic-net#13611, sonic-net#13647, sonic-net#13649, sonic-net#13660, sonic-net#13710, sonic-net#13716, sonic-net#13724, sonic-net#13726, sonic-net#13732, sonic-net#13735, sonic-net#13739, sonic-net#13757, sonic-net#13786, sonic-net#13792, sonic-net#13800, sonic-net#13801, sonic-net#13802, sonic-net#13805, sonic-net#13806, sonic-net#13812, sonic-net#13814, sonic-net#13822, sonic-net#13831, sonic-net#13834, sonic-net#13847, sonic-net#13870, sonic-net#13882, sonic-net#13884, sonic-net#13885, sonic-net#13894, sonic-net#13895, sonic-net#13926, sonic-net#13932, sonic-net#13935, sonic-net#13942, sonic-net#13951, sonic-net#13953, sonic-net#13964
This is a fix of issue #13591. The PR sonic-net/sonic-platform-daemons#326 which introduced the issue has been cherry-picked to 202205 and 202211, so we need to cherry-pick this to 202211 and 202205 as well. Hi @StormLiangMS , could you please help cherry-pick to 202211? |
… shutdown (sonic-net#13611) - Why I did it Commit sonic-net/sonic-platform-daemons@153ea47 changed SfpStateUpdateTask from Process to Thread. In this commit, it raises an exception in SfpStateUpdateTask to make shutdown flow fast. But it does not work on Nvidia platform as Nvidia platform is passing timeout parameter of get_change_event to select. Linux select function can not be interrupted by a Python exception. There is no such issue on Nvidia platform before that commit. However, in order to comply with the commit and make shutdown flow fast, we decided to change Nvidia platform API implementation. To fix issue sonic-net#13591. - How I did it The select call in get_change_event should use no more than 1 second as timeout parameter. Outside the select call, add a while loop to make sure timeout parameter of get_change_event work as expected - How to verify it Manual test
… shutdown (sonic-net#13611) - Why I did it Commit sonic-net/sonic-platform-daemons@153ea47 changed SfpStateUpdateTask from Process to Thread. In this commit, it raises an exception in SfpStateUpdateTask to make shutdown flow fast. But it does not work on Nvidia platform as Nvidia platform is passing timeout parameter of get_change_event to select. Linux select function can not be interrupted by a Python exception. There is no such issue on Nvidia platform before that commit. However, in order to comply with the commit and make shutdown flow fast, we decided to change Nvidia platform API implementation. To fix issue sonic-net#13591. - How I did it The select call in get_change_event should use no more than 1 second as timeout parameter. Outside the select call, add a while loop to make sure timeout parameter of get_change_event work as expected - How to verify it Manual test
Cherry-pick PR to 202211: #16448 |
Cherry-pick PR to 202205: #16449 |
… shutdown (#13611) (#16449) - Why I did it Commit sonic-net/sonic-platform-daemons@153ea47 changed SfpStateUpdateTask from Process to Thread. In this commit, it raises an exception in SfpStateUpdateTask to make shutdown flow fast. But it does not work on Nvidia platform as Nvidia platform is passing timeout parameter of get_change_event to select. Linux select function can not be interrupted by a Python exception. There is no such issue on Nvidia platform before that commit. However, in order to comply with the commit and make shutdown flow fast, we decided to change Nvidia platform API implementation. To fix issue #13591. - How I did it The select call in get_change_event should use no more than 1 second as timeout parameter. Outside the select call, add a while loop to make sure timeout parameter of get_change_event work as expected - How to verify it Manual test Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
Why I did it
Commit sonic-net/sonic-platform-daemons@153ea47 changed SfpStateUpdateTask from Process to Thread. In this commit, it raises an exception in SfpStateUpdateTask to make shutdown flow fast. But it does not work on Nvidia platform as Nvidia platform is passing timeout parameter of get_change_event to select. Linux select function can not be interrupted by a Python exception. There is no such issue on Nvidia platform before that commit. However, in order to comply with the commit and make shutdown flow fast, we decided to change Nvidia platform API implementation.
To fix issue #13591.
How I did it
How to verify it
Manual test
Which release branch to backport (provide reason below if selected)
Description for the changelog
Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)