[linux-6.6.y] x86/mce/zhaoxin: Enable mcelog to decode PCIE, ZDI/ZPI, and DRAM errors #282

Merged
merged 2 commits into from
Jul 10, 2024

Conversation

leoliu-oc
Contributor

zhaoxin inclusion
category: bugfix


The mcelog cannot decode PCIE, ZDI/ZPI, and DRAM errors in the FFM (Firmware First Mode).
The purpose of this patch is to enable mcelog to decode PCIE, ZDI/ZPI, and DRAM errors that occur on Zhaoxin processors, so that the cause of these errors can be quickly located.

@deepin-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tsic404 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@deepin-ci-robot

Hi @leoliu-oc. Thanks for your PR.

I'm waiting for a deepin-community member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@@ -699,14 +699,21 @@ static bool ghes_do_proc(struct ghes *ghes,

atomic_notifier_call_chain(&ghes_report_chain, sev, mem_err);

-arch_apei_report_mem_error(sev, mem_err);
+arch_apei_report_mem_error(sec_sev, mem_err);
Member

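I'm not familiar with this area. What is the difference between sev = ghes_severity(estatus->error_severity); and sec_sev = ghes_severity(gdata->error_severity);? Why change the former to the latter here, and why does the code added later use the former again?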

Contributor Author

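The kernel's handling here is flawed: even without the Zhaoxin patch code, sec_sev should be used at this call site. sev is the overall severity, that is, the highest severity among the individual errors in the GHES, whereas arch_apei_report_mem_error should act on each specific error.
The later code uses sev appropriately, so it does not need to change.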
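To make the distinction concrete, here is a minimal user-space sketch, not kernel code. It assumes a hypothetical severity scale (`sketch_sev`) in which a larger value means a worse error, and it models a status block as a plain array of per-section severities. The names `overall_severity` and `count_reported` are invented for illustration. It shows that tagging every section with the block-wide maximum overstates the severity of the milder sections.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical severity scale for illustration; a larger value is worse. */
enum sketch_sev {
    SKETCH_SEV_NO = 0,
    SKETCH_SEV_CORRECTED,
    SKETCH_SEV_RECOVERABLE,
    SKETCH_SEV_PANIC
};

/* Block-wide severity: the worst severity among all sections (plays the role of 'sev'). */
static enum sketch_sev overall_severity(const enum sketch_sev *sections, size_t n)
{
    assert(n > 0);
    enum sketch_sev worst = sections[0];
    for (size_t i = 1; i < n; i++) {
        if (sections[i] > worst)
            worst = sections[i];
    }
    return worst;
}

/*
 * Count how many sections get reported at 'level' or worse.
 * use_section_sev == 0: every section is tagged with the block-wide severity.
 * use_section_sev != 0: each section is tagged with its own severity (like 'sec_sev').
 */
static size_t count_reported(const enum sketch_sev *sections, size_t n,
                             enum sketch_sev level, int use_section_sev)
{
    enum sketch_sev block = overall_severity(sections, n);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        enum sketch_sev tag = use_section_sev ? sections[i] : block;
        if (tag >= level)
            count++;
    }
    return count;
}
```

With sections { CORRECTED, RECOVERABLE, CORRECTED }, the block-wide tagging reports all three sections as recoverable, while per-section tagging reports only the one that actually is.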

Collaborator

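> The kernel's handling here is flawed: even without the Zhaoxin patch code, sec_sev should be used at this call site. sev is the overall severity, that is, the highest severity among the individual errors in the GHES, whereas arch_apei_report_mem_error should act on each specific error. The later code uses sev appropriately, so it does not need to change.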

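As a CPU vendor, you are probably more expert in this area than we are. In that case, I suggest splitting this change out into a separate commit; it also looks like something that could be submitted upstream.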

Contributor Author

It has been split out in the new version; please review it. We will consider submitting it to the upstream mainline later.

When calling arch_apei_report_mem_error, replace the previous 'sev' with
'sec_sev'. 'sev' represents the overall severity, which is the highest
severity among the individual errors in GHES. However, the severity
passed to arch_apei_report_mem_error should be specific to each
individual error.

Signed-off-by: leoliu-oc <leoliu-oc@zhaoxin.com>
zhaoxin inclusion
category: bugfix

-------------------

The mcelog cannot decode PCIE, ZDI/ZPI, and DRAM errors in the FFM
(Firmware First Mode).
The purpose of this patch is to enable mcelog to decode PCIE, ZDI/ZPI, and
DRAM errors that occur on Zhaoxin processors, so that the cause of these
errors can be quickly located.

Signed-off-by: leoliu-oc <leoliu-oc@zhaoxin.com>
@leoliu-oc leoliu-oc force-pushed the linux-6.6.y-82-mcelog branch from cf3d0eb to 85ff5df Compare July 10, 2024 08:37
@opsiff opsiff merged commit 81452d9 into deepin-community:linux-6.6.y Jul 10, 2024
3 of 4 checks passed

4 participants