Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NPU disable unused PCIe BAR #225

Closed
wants to merge 5 commits into from
Closed

Conversation

msiddare
Copy link
Contributor

@msiddare msiddare commented Jul 1, 2021

NPU device enumeration was failing because of an issue in the cisco
ASIC. This patch will fix address the issue by disabling unused BAR.

Signed-off-by: Madhava Reddy Siddareddygari msiddare@cisco.com

NPU device enumeration was failing because of an issue in the cisco
ASIC. This patch will fix address the issue by disabling unused BAR.

Signed-off-by: Madhava Reddy Siddareddygari <msiddare@cisco.com>
Copy link
Contributor

@paulmenzel paulmenzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your patches, but reviewing your submission takes too much time because they have so many formal errors. It’d be great, if you got acquainted with the Linux kernel development processes, and followed them.

Comment on lines 1 to 25
From 2bc0b2e84866a7733fc05a34bd8a93e165db99cc Mon Sep 17 00:00:00 2001
From: Madhava Reddy Siddareddygari <msiddare@cisco.com>
Date: Tue, 9 Mar 2021 13:54:28 -0800
Subject: [PATCH] NPU disable other bars

This is a port of IOS XR patch

Spitfire-pacific-disable-all-bars-other-than-0.patch

to Debian kernel. Modifications from IOS XR:

a. Only issue an INFO message for changed resources
b. Include the previous resource data in the INFO message
c. Prepare for the possibility of applying patch to
future NPU revisions.

Original patch message follows:

For Pacific, only bar 0 is valid. Hence disable all other bars as a work
around for spitfire

CDETS: CSCvi15029

Signed-off-by: Gaurav Gupta <gauragup@cisco.com>
Signed-off-by: Subha Keshavaraj <subkesha@cisco.com>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you just copied the output from git show. Please create the patch with git format-patch ….


This is a port of IOS XR patch

Spitfire-pacific-disable-all-bars-other-than-0.patch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide a git commit hash or URL for this patch.

+ */
+static void silicon_one_fixup(struct pci_dev *dev)
+{
+ int i;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not follow the Linux coding style. Please check patches with checkpatch.pl script in the Linux kernel.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, Firefox display issues in the browser.

Signed-off-by: Madhava Reddy Siddareddygari <msiddare@cisco.com>
Signed-off-by: Madhava Reddy Siddareddygari <msiddare@cisco.com>
Further debugging and consultion with Hardware team, issue seems
to be related to ASIC and workaround suggested is to disable
unused PCI BAR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also paste the new log messages from one device to the commit message.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still open.

+ continue;
+
+ dev_info(&dev->dev,
+ "Cisco Silicon One BAR %d %pR fixed up\n", i, r);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

Cisco Silicon One BAR %d %pR disabled due to NPU hardware bug

Signed-off-by: Madhava Reddy Siddareddygari <msiddare@cisco.com>
@msiddare
Copy link
Contributor Author

Review comments incorporated and updated series file as per meeting discussion

Copy link
Contributor

@paulmenzel paulmenzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please get it reviewed by the Linux subsystem maintainers.

Comment on lines +54 to +57
+ * Pacific device was misbehaving during rescan not enumerating
+ * memory for bar. Due to this HW issue, added this workaround
+ * and verified that during rescan memory gets assigned properly
+ * to the device. This is only a temporary fix for pacific ASIC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

Due to HW bug, Pacific device is misbehaving during rescan by not enumerating memory for that BAR. Work around it and verify, that during rescan memory gets assigned properly to the device.

+ * memory for bar. Due to this HW issue, added this workaround
+ * and verified that during rescan memory gets assigned properly
+ * to the device. This is only a temporary fix for pacific ASIC
+ * only
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really temporary?

+ dev->class = PCI_CLASS_MEMORY_OTHER << 8;
+ dev_info(&dev->dev, "Cisco Silicon One class adjusted\n");
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SYNOPSYS, PCI_DEVICE_ID_LEABA_PACIFIC,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why PCI_VENDOR_ID_SYNOPSYS?

+/*
+ * For Pacific A0, only BAR 0 is valid
+ */
+static void silicon_one_fixup(struct pci_dev *dev)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d think, the function name needs to include cisco somehow.

From 0be339b4c8468138dfe1a30de29511e94b846af6 Mon Sep 17 00:00:00 2001
From: Madhava Reddy Siddareddygari <msiddare@cisco.com>
Date: Mon, 16 Aug 2021 12:55:37 -0700
Subject: [PATCH] NPU disable unused PCI BAR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make it an upstreamable commit message summary (How to Write a Git Commit Message).

Further debugging and consultion with Hardware team, issue seems
to be related to ASIC and workaround suggested is to disable
unused PCI BAR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still open.

patch/cisco-npu-disable-other-bars.patch Show resolved Hide resolved
+#define PCI_DEVICE_ID_LEABA_KRYPTON 0xa006
+
+/*
+ * For Pacific A0, only BAR 0 is valid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an errata document name and number for reference.

@msiddare
Copy link
Contributor Author

Closing this PR as issue is observed only on early P0 proto boards.

@msiddare msiddare closed this Sep 21, 2021
@msiddare msiddare deleted the cisco/npu-fix branch September 21, 2021 22:56
msiddare added a commit to msiddare/sonic-linux-kernel that referenced this pull request Nov 19, 2021
For Cisco ASIC only BAR0 is valid. Not disabling other BAR's
was resulting in pci_enable_device function failure in P0
Pacific ASIC's. Further debugging and consultion with Hardware
team, issue seems to be related to only P0 version of ASIC and
workaround suggested is to disable unused PCI BAR.

This is same pull request reviewed ealier,
sonic-net#225

Cisco experimented removing this patch, but identified that
there are boards in the field with Pacific ASIC.

This patch is need for these ASIC's to work, hence re-submitting.

Signed-off-by: Madhava Reddy Siddareddygari <msiddare@cisco.com>
saiarcot895 pushed a commit that referenced this pull request Dec 6, 2021
For Cisco ASIC only BAR0 is valid. Not disabling other BARs
was resulting in pci_enable_device function failure in P0
Pacific ASIC's. Further debugging and consultation with Hardware
team, issue seems to be related to only P0 version of ASIC and
workaround suggested is to disable unused PCI BAR.

This is same pull request reviewed ealier,
#225

Cisco experimented removing this patch, but identified that
there are boards in the field with Pacific ASIC.

This patch is need for these ASIC's to work, hence re-submitting.

Signed-off-by: Madhava Reddy Siddareddygari <msiddare@cisco.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants