Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

psu daemon doesn't update PSU FAN information #136

Closed
aravindmani-1 opened this issue Dec 23, 2020 · 12 comments
Closed

psu daemon doesn't update PSU FAN information #136

aravindmani-1 opened this issue Dec 23, 2020 · 12 comments

Comments

@aravindmani-1
Copy link
Contributor

aravindmani-1 commented Dec 23, 2020

Issue:

  • The PSU FAN information is not updated in the redis-db.

Steps to reproduce

  • Load latest master image and check redis-db PSU Fan table.

Logs:
root@sonic:~# redis-cli -n 6 hgetall "FAN_INFO|PSU1 Fan"
1) "led_status"
2) "None"

To fix this issue:

  • Initialize self.presence and other variables in PsuStatus dunder init to False instead of True.
  • Import datetime module.
  • Then, the following values will be seen.

root@sonic:/# redis-cli -n 6 hgetall "FAN_INFO|PSU1 Fan"
1) "presence"
2) "True"
3) "status"
4) "Updating"
5) "direction"
6) "exhaust"
7) "speed"
8) "67"
9) "timestamp"
10) "20201223 01:48:55"
11) "led_status"
12) "None"

The one more issue is that since the PSU fan status values are "Updating/ N/A".
In system health daemon, the expected values for PSU fan status is "True/False".

Thermalctld also updates the PSU Fan status value to "True/False".
So, need your input on whether we can change the PSU Fan status in psu daemon to return "True/False" instead of "Updating/ N/A".
If it is not done, then "PSU Fan is broken" error will be logged in redis-db for system health table.

@aravindmani-1
Copy link
Contributor Author

@jleveque raised this issue since it is common for all vendors.

@jleveque
Copy link
Contributor

@keboliu, @Junchao-Mellanox: Can you please comment on this?

@Junchao-Mellanox
Copy link
Collaborator

Will check.

@Junchao-Mellanox
Copy link
Collaborator

Hi @aravindmani-1, I checked the latest master, it looks ok:

admin@sonic:~$ redis-cli -n 6 hgetall "FAN_INFO|psu_1_fan_1"
 1) "led_status"
 2) "green"

In psud, this function https://github.com/Azure/sonic-platform-daemons/blob/e6c786bd02e6a253fcf67dec895e7c24e940c788/sonic-psud/scripts/psud#L585 is going to set the PSU fan led status, could you check whether this function is called?

For the led status "Updating", we might need to adjust system health code to align with this.

@aravindmani-1
Copy link
Contributor Author

aravindmani-1 commented Dec 28, 2020

Hi @Junchao-Mellanox ,

Currently, it is expected for Dell platforms to return None for psu fan led status.
The issue is with the other fields.
https://github.com/Azure/sonic-platform-daemons/blob/e6c786bd02e6a253fcf67dec895e7c24e940c788/sonic-psud/scripts/psud#L478
_update_psu_fan_data function is never invoked even once.
I hope that the fix that i've mentioned is okay.
In system health daemon for system status LED, currently we're checking True/False for status.
In thermalctld, we set led status to "True" whereas in psud, it is set to "Updating".
Can we set psud led status to True, so that we don't need to change the code in system health?.

@Junchao-Mellanox
Copy link
Collaborator

I am a litter bit confused. System health does not check PSU fan led status, why you need its value to be "True"? And led status should be "green", "red", or "Updating". Maybe you are talking about another field?

And if you want system health to ignore the check for PSU, you can simple change the system health configuration to something like:

{
    "services_to_ignore": [],
    "devices_to_ignore": ["psu"],
    "user_defined_checkers": [],
    "polling_interval": 60,
    "led_color": {
        "fault": "amber",
        "normal": "green",
        "booting": "orange_blink"
    }
}

@aravindmani-1
Copy link
Contributor Author

The system health daemon checks for Fan status in the below piece of code:
https://github.com/Azure/sonic-buildimage/blob/e88c7d11cae12a4973eca4f1ca9f4b4ba9facd71/src/system-health/health_checker/hardware_checker.py#L92

Whereas in psu daemon,we update the fan status to "Updating / N/A".
https://github.com/Azure/sonic-platform-daemons/blob/e6c786bd02e6a253fcf67dec895e7c24e940c788/sonic-psud/scripts/psud#L551
In thermalctl daemon, we update the same fan status to "True/False".
https://github.com/Azure/sonic-platform-daemons/blob/e6c786bd02e6a253fcf67dec895e7c24e940c788/sonic-thermalctld/scripts/thermalctld#L312

The issue will be seen in platforms where thermalctld is not enabled.
Currently , we don't run thermalctld in Dell S6100 and we will enable it soon.

Logs:
root@sonic:/# redis-cli -n 6 hgetall "FAN_INFO|PSU1 Fan"

  1. "presence"
  2. "True"
    **3) "status"
  3. "Updating"**
  4. "direction"
  5. "exhaust"
  6. "speed"
  7. "67"
  8. "timestamp"
  9. "20201223 01:48:55"
  10. "led_status"
  11. "None"

You can see that the status us set to "Updating" and system health daemon reads this field and logs error message(PSU Fan is broken").

@Junchao-Mellanox
Copy link
Collaborator

I see. You are talking about "status", not "led_status". I suppose your approach is good to me: set the presence value to False by default.

@aravindmani-1
Copy link
Contributor Author

aravindmani-1 commented Dec 28, 2020

Thanks @Junchao-Mellanox .
Raised #137 to fix this issue.

jleveque pushed a commit that referenced this issue Dec 28, 2020
- Initialize self.presence and other variables in PsuStatus dunder init to False instead of True.
- Import datetime module.
- Discussions related to this issue can be seen in #136
@jleveque
Copy link
Contributor

@aravindmani-1: Thank you for your help. I have merged #137. Once confirmed that the issue is resolved, please close this issue.

@aravindmani-1
Copy link
Contributor Author

@jleveque : Fix is not merged in master image yet.

@jleveque
Copy link
Contributor

jleveque commented Jan 5, 2021

@jleveque : Fix is not merged in master image yet.

Submodule update here: sonic-net/sonic-buildimage#6352

sujinmkang pushed a commit to sujinmkang/sonic-platform-daemons that referenced this issue Jan 16, 2021
…-net#137)

- Initialize self.presence and other variables in PsuStatus dunder init to False instead of True.
- Import datetime module.
- Discussions related to this issue can be seen in sonic-net#136
vdahiya12 pushed a commit to vdahiya12/sonic-platform-daemons that referenced this issue Apr 4, 2022
…et#136)

sonic-platform-base: Changes to introduce APIs for modular chassis for power-consumption and supplied

HLD: sonic-net/SONiC#646

PSUd APIs for power requirement calculations

get_maximum_supplied_power() - per PSU
get_status_master_led() - get master psu led status. Class method.
set_status_master_led() - set master psu led status. Class method.

get_maximum_consumed_power(self) - per consumer API. Consumers are modules, Fans
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants