[healthd] Use unix_socket_path instead of loopback ip #14843
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
/azpw run Azure.sonic-buildimage
/AzurePipelines run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
/azpw run Azure.sonic-buildimage
/AzurePipelines run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
@qiluo-msft could you please help to review?
@qiluo-msft can you please help merge it? We need this fix for our 202211 release.
@Junchao-Mellanox could you please help to review the concept?
Trying to understand the issue: is system-health the only service that connects to redis while the loopback interface is flapping? Maybe we should fix this line https://github.com/sonic-net/sonic-buildimage/blob/a738c39328b3e069a83162fe036ae693a8bd3d27/src/system-health/health_checker/sysmonitor.py#LL34C31-L34C42 . Use a unix socket instead of a TCP socket.
Regarding the first proposal, other system-health processes also use the loopback to connect to redis, so the problem can still be seen even if only system ready is updated. There were a few instances of the same kind of issue reported. Using a unix socket apparently causes some permission issues #10179 (comment); @qiluo-msft can clarify on that.
system-health is a service managed by systemd, not a CLI, so I assume the permission issue is not relevant.
Hmm, FYI, here is another instance where this approach is used: c93716a. It seems to be a common practice across SONiC.
The rsyslog service is a different case. It uses UDP, which depends on the LO interface. System health does not rely on UDP/TCP, so I don't see an issue with using a unix socket here.
So you suggest moving all DBconnector initialization in system-health to the unix socket?
This reverts commit 341108e.
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
…dimage into intf_cfg_dep_healthd
@Junchao-Mellanox, updated with your recommendation. Can you review?
This comment might help answer your question. #14843 (comment)
@judyjoseph any other comments?
LGTM. Please also make sure it is validated in a JIT-based environment so that there are no permission issues.
Can you expand on what JIT means? #Resolved
Hi @qiluo-msft, would you please merge this since it's approved?
@StormLiangMS would you please help to cherry-pick?
Cherry-pick PR to 202211: #15249
- Why I did it
The interfaces-config service restarts the networking service, which in turn causes the loopback interface address to be removed and reassigned. If system-health happens to start during that window, exceptions and logs like these are seen:
Apr 15 18:14:49.357869 r-panther-20 ERR healthd: update system status exception:Unable to connect to redis: Cannot assign requested address
Apr 15 18:14:49.429778 r-panther-20 ERR healthd: subscribe_statedb exited- Unable to connect to redis: Cannot assign requested address
Apr 15 18:14:52.218594 r-panther-20 ERR healthd: system_service_Map_base::at
Apr 15 18:14:52.219714 r-panther-20 ERR healthd: system_service_Map_base::at
Apr 15 18:14:55.218636 r-panther-20 ERR healthd: system_service_Map_base::at
Apr 15 18:14:55.218722 r-panther-20 ERR healthd: system_service_Map_base::at
- How I did it
Use the unix socket path.
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
The issue was also seen on 202205, as reported in #15364, so requesting a backport to 202205.
Cherry-pick PR to 202205: #15480
Why I did it
The interfaces-config service restarts the networking service, which in turn causes the loopback interface address to be removed and reassigned.
If system-health happens to start during that window, it logs exceptions such as "Unable to connect to redis: Cannot assign requested address".
How I did it
Use the unix socket path instead of the loopback IP when connecting to redis.
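To illustrate why a unix socket sidesteps the loopback flap, here is a minimal, self-contained sketch using only the Python standard library. It is not the code from this PR and does not use swsscommon or a real redis server: the socket path, the tiny one-shot server, and the function names (`serve_once`, `ping_over_unix_socket`, `demo`) are all hypothetical stand-ins. The point it shows is that an `AF_UNIX` connection is addressed by a filesystem path, so no IP address on `lo` is needed to reach the server.

```python
import os
import socket
import tempfile
import threading


def serve_once(server_sock):
    """Accept one client and answer PING with a Redis-style +PONG reply."""
    conn, _ = server_sock.accept()
    with conn:
        if conn.recv(64).startswith(b"PING"):
            conn.sendall(b"+PONG\r\n")


def ping_over_unix_socket(path):
    """Connect to a Unix domain socket by filesystem path and send PING."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)  # addressed by path; no IP or interface involved
        client.sendall(b"PING\r\n")
        return client.recv(64)


def demo():
    # Hypothetical socket file in a temp dir; it stands in for the redis socket.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)
            server.listen(1)
            worker = threading.Thread(target=serve_once, args=(server,))
            worker.start()
            reply = ping_over_unix_socket(path)
            worker.join()
    return reply


if __name__ == "__main__":
    print(demo())
```

A TCP client pointed at 127.0.0.1 depends on the loopback address being assigned at connect time, which is the window in which the "Cannot assign requested address" errors above were logged. The path-based connection only depends on the socket file being present and on the caller having permission to access it, which is why the reviewers asked about permission issues.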
How to verify it
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)