[nvidia-6.11-next] mediatek EINT driver for NVIDIA CX7 hotplug management #175
fd87a89 to
8020a11
Compare
|
@abhsahu I took a look at this and have several questions/comments...

cx7_hp_probe():
- Nit: Couldn’t these checks be performed earlier in probe?
- Nit: app_ctx->desc is a pointer, no? Use NULL instead.
- Nit: Flip the conditional, if (!app_ctx->ctx), to avoid indenting the good path.
- Question: Why does err_irq exist?
- Question: Is it valid/safe to continue onward when these fail?
- Suggestion: It might be more straightforward to merge these 2 so there is just one set of conditionals.
- Nit: It would be cleaner to use an enum or #define for these plugin values (I had to look at plugin_store() to understand what the numbers meant).

plugin_store():
- Question: Is the lack of serialization for “state” between this thread and the interrupt kthread simply because this is a debug mechanism?

acpi_gpio_resource_handler():
- Question: Is it valid to return “AE_OK” when ares->type is not ACPI_RESOURCE_TYPE_GPIO?

cx7_hp_ckm_control():
- Nit: Use bool instead of int for the disable parameter, and then true/false at the call sites.
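For readers without the diff at hand, three of the nits above (NULL for pointer checks, flipped conditionals so the good path is not indented, and a bool parameter) can be illustrated with a small user-space sketch. This is not the driver's code: the struct, its fields, and both function names are hypothetical stand-ins chosen only for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-device context; not the driver's struct. */
struct cx7_ctx_sketch {
    void *desc;          /* pointer member: compare against NULL, not 0 */
    bool  link_disabled; /* last state requested via the control helper */
};

/*
 * Hypothetical control helper. Taking 'disable' as a bool lets call sites
 * read as true/false instead of a bare 0/1.
 */
static int cx7_link_control_sketch(struct cx7_ctx_sketch *ctx, bool disable)
{
    if (ctx == NULL)
        return -EINVAL;

    ctx->link_disabled = disable;
    return 0;
}

/*
 * Probe-like function written with guard clauses: each failure returns
 * early, so the successful path stays at the outermost indentation level.
 */
static int cx7_probe_sketch(struct cx7_ctx_sketch *ctx)
{
    if (ctx == NULL)
        return -EINVAL;

    if (ctx->desc == NULL)
        return -ENODEV;

    /* Good path: enable the link (disable == false). */
    return cx7_link_control_sketch(ctx, false);
}
```

The error codes used here are illustrative; the actual driver may return different values on these paths.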
|
I tried this PR on one system, but the echo for hotplug did not go well, so I am waiting for another system at colossus where I can have a console.
|
@abhsahu I'm seeing that PRSNT is 1, so after the CX7s load at boot time they then get removed. This will confuse people.
|
8020a11 to
79106ce
Compare
I have moved this earlier.
I have changed to
I have flipped the conditional.
I have changed kzalloc() to devm_kzalloc() so that we don't have to handle free explicitly.
We are not expecting the probe to be called twice.
I have renamed this to
I have merged these into 1.
I have updated code to use enum.
This is a debug mechanism, so we have not added any serialization currently.
I think it's fine to return AE_OK so that we will continue to traverse the tree.
I have updated this to use bool. |
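The AE_OK reply can be illustrated with a small user-space model of a callback-driven resource walk. It assumes, as the reply implies, that the walk stops at the first non-OK status, so a handler has to return OK for resource types it does not care about. Every type and function name below is a hypothetical stand-in, not an ACPICA definition.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are not the ACPICA types or status codes. */
enum res_type_sketch { RES_IRQ_SKETCH, RES_GPIO_SKETCH, RES_MEMORY_SKETCH };
enum walk_status_sketch { WALK_OK_SKETCH = 0, WALK_ABORT_SKETCH = 1 };

struct resource_sketch {
    enum res_type_sketch type;
    int pin; /* only meaningful for GPIO entries */
};

typedef enum walk_status_sketch (*walk_cb_sketch)(const struct resource_sketch *res,
                                                  void *data);

/* Visits each resource in order and stops at the first non-OK status. */
static enum walk_status_sketch walk_resources_sketch(const struct resource_sketch *res,
                                                     size_t n,
                                                     walk_cb_sketch cb,
                                                     void *data)
{
    for (size_t i = 0; i < n; i++) {
        enum walk_status_sketch st = cb(&res[i], data);

        if (st != WALK_OK_SKETCH)
            return st;
    }
    return WALK_OK_SKETCH;
}

/*
 * Records the pin of a GPIO entry. Other resource types are skipped by
 * returning OK, which keeps the walk going instead of aborting it.
 */
static enum walk_status_sketch gpio_handler_sketch(const struct resource_sketch *res,
                                                   void *data)
{
    int *pin_out = data;

    if (res->type != RES_GPIO_SKETCH)
        return WALK_OK_SKETCH;

    *pin_out = res->pin;
    return WALK_OK_SKETCH;
}
```

If the handler returned an error for the non-GPIO entries instead, this model walk would stop before reaching a GPIO entry that follows them.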
This hardcoding is according to the DGX Spark build.
Was that system a DGX Spark (DIGITS) or some other system?
|
|
|
Thanks for the responses and code updates @abhsahu! Two additional comments on the updated probe():
|
|
79106ce to
6ca3135
Compare
Thanks @clsotog, I have added error prints in all error paths and did a reset of
|
Thanks @abhsahu for addressing these...no further issues for me.
|
|
Hi @abhsahu, sorry, removing my arb. I found out why the system from nbu has a different lspci output: it is because of FW.
6ca3135 to
dc31a94
Compare
|
I have updated the commit to add a check for the CX7 vendor ID and device ID.
dc31a94 to
4883b0c
Compare
|
I missed returning an error in my previous change, so I have fixed that.
4883b0c to
4d4ac87
Compare
|
@abhsahu - Can you provide more context about the changes you just pushed? |
|
|
I tried both Mellanox FWs and the code worked. |
clsotog
left a comment
Acked-by: Carol L Soto <csoto@nvidia.com>
|
In the earlier version, the PCI address was hardcoded.

With the older version of Mellanox firmware:
0000:00:00.0 PCI bridge: NVIDIA Corporation Device 22ce (rev 01)

With the newer version of Mellanox firmware:
0000:00:00.0 PCI bridge: NVIDIA Corporation Device 22ce (rev 01)

The newer firmware version has removed the ConnectX-7 PCIe bridge, so we now see the CX7 devices at PCI addresses 0000:01:00.0 and 0000:01:00.1 instead of 0000:03:00.0 and 0000:03:00.1 for domain 0.

With the latest patch, instead of hardcoding the PCI address (Domain:Bus:Device.Function), the driver checks for devices with the CX7 PCI vendor ID and device ID in domains 0 and 2 and counts them. If this count matches the expected count for the GB10 board (4), probe continues; otherwise probe fails.
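The counting check described above can be sketched in user-space C over a plain array of device records. This is only an illustration of the logic, not the driver's implementation: the struct and function names are hypothetical, and the ID constants (0x15b3 for the Mellanox vendor ID, 0x1021 for ConnectX-7) are assumptions that should be checked against the actual patch. The domains (0 and 2) and the expected count (4) come from the comment above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CX7_VENDOR_ID_SKETCH 0x15b3u /* assumption: Mellanox PCI vendor ID */
#define CX7_DEVICE_ID_SKETCH 0x1021u /* assumption: ConnectX-7 device ID */
#define CX7_EXPECTED_COUNT   4u      /* from the thread: GB10 board count */

/* Hypothetical record for one enumerated PCI function. */
struct pci_dev_sketch {
    uint16_t domain;
    uint16_t vendor;
    uint16_t device;
};

/* Only domains 0 and 2 are considered, as described in the thread. */
static bool cx7_domain_is_scanned(uint16_t domain)
{
    return domain == 0 || domain == 2;
}

/* Counts functions that match the CX7 vendor/device ID in scanned domains. */
static size_t cx7_count_devices(const struct pci_dev_sketch *devs, size_t n)
{
    size_t count = 0;

    assert(devs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (cx7_domain_is_scanned(devs[i].domain) &&
            devs[i].vendor == CX7_VENDOR_ID_SKETCH &&
            devs[i].device == CX7_DEVICE_ID_SKETCH)
            count++;
    }
    return count;
}

/* Probe would continue only when the count matches the expected value. */
static bool cx7_probe_allowed(const struct pci_dev_sketch *devs, size_t n)
{
    return cx7_count_devices(devs, n) == CX7_EXPECTED_COUNT;
}
```

Matching on IDs instead of bus addresses is what makes the check independent of whether the firmware exposes the extra bridge, since the bus number no longer matters.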
Got it, thanks for clarifying! |
This driver is used to manage PCIe link for NVIDIA ConnectX-7 (CX7) hot-plug/unplug on DGX Spark. We need to disable PCIe link when CX7 cable plug out happens and enable pcie link when CX7 cable plug in happens.

It also creates a sysfs entry to emulate cable plug in/out behavior as below:
plug in - echo 1 > /sys/devices/platform/MTKP0001\:00/cx7_dbg/plugin
plug out - echo 0 > /sys/devices/platform/MTKP0001\:00/cx7_dbg/plugin

We also implement uevent to notify user-space applications when a cable is plugged in or removed. Below are the details of our process:
* cable plug-in:
  1. report plug-in uevent (driver)
  2. enable pcie link (application)
  3. rescan devices (application)
* cable removal:
  1. report removal uevent (driver)
  2. remove devices (application)
  3. disable pcie link (application)

Signed-off-by: Jerry.Guo <jerry.guo@mediatek.com>
Signed-off-by: Yenchia Chen <yenchia.chen@mediatek.com>
Signed-off-by: Shubhi Garg <shgarg@nvidia.com>
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
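The plugin values and the two step sequences in the commit message above can be modeled in a short user-space sketch. It assumes that 1 means plug in and 0 means plug out, as the echo commands show. The enum, function, and array names are hypothetical, and the parsing uses strtol purely for illustration; it does not reproduce the driver's plugin_store().

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names for the values written to the plugin attribute. */
enum cx7_plug_sketch {
    CX7_PLUG_OUT_SKETCH = 0, /* echo 0 > .../cx7_dbg/plugin */
    CX7_PLUG_IN_SKETCH  = 1, /* echo 1 > .../cx7_dbg/plugin */
};

/*
 * Parses the text written to a plugin-style attribute. Accepts "0" or "1",
 * optionally followed by the newline that echo appends. Returns 0 and
 * stores the state on success, or -EINVAL on any other input.
 */
static int parse_plugin_sketch(const char *buf, enum cx7_plug_sketch *out)
{
    char *end;
    long val;

    if (buf == NULL || out == NULL)
        return -EINVAL;

    errno = 0;
    val = strtol(buf, &end, 10);
    if (end == buf || errno != 0)
        return -EINVAL;
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -EINVAL;
    if (val != 0 && val != 1)
        return -EINVAL;

    *out = (val == 1) ? CX7_PLUG_IN_SKETCH : CX7_PLUG_OUT_SKETCH;
    return 0;
}

/* Step order taken from the commit message. */
static const char *const cx7_plug_in_steps_sketch[] = {
    "report plug-in uevent (driver)",
    "enable pcie link (application)",
    "rescan devices (application)",
};

static const char *const cx7_plug_out_steps_sketch[] = {
    "report removal uevent (driver)",
    "remove devices (application)",
    "disable pcie link (application)",
};

/* Returns the three-step sequence for the given state. */
static const char *const *cx7_steps_sketch(enum cx7_plug_sketch state)
{
    return state == CX7_PLUG_IN_SKETCH ? cx7_plug_in_steps_sketch
                                       : cx7_plug_out_steps_sketch;
}
```

Note that the two sequences are mirror images: the link is enabled before devices are rescanned on plug-in, and devices are removed before the link is disabled on removal.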
…T_MTK_HP Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
4d4ac87 to
31209a6
Compare
|
|
Confirmed this was the only change in the latest push. |
|
|
|
Closing this PR. I do not think this will get in, based on comments from PR 220.
This PR adds a new driver for managing the PCIe link when NVIDIA CX7 hot plug/unplug happens.