mstlink cannot query the temperature of a specific network card on the server (it works with the network card name, but not with the LID) #972

Open
xiaolongzhou123 opened this issue Jul 22, 2024 · 7 comments
@xiaolongzhou123

mlxlink -d lid-0x1cb -p 1 works.
mstlink (the mlxlink equivalent) compiled from https://github.com/Mellanox/mstflint does not work.

image

@HarelKarni
Contributor

Please provide additional information, such as:
Which device are you using for the repro? Which mstflint version are you using?

@xiaolongzhou123
Author

I downloaded and compiled from: https://github.com/Mellanox/mstflint
No matter which tag version I use, or the latest version, it doesn't work.

  1. ./autogen.sh
  2. ./configure --prefix=/opt/jl --enable-adb-generic-tools
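
For anyone trying to reproduce this, the two steps above can be wrapped in a small script. This is only a sketch: the prefix and the configure flag are taken from the report, while the `make` and `make install` steps and the helper names `run` and `build_mstflint` are assumptions that do not appear in the thread. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the build sequence described above.
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.

run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# build_mstflint is a hypothetical helper name.
# $1: install prefix (defaults to /opt/jl, the prefix used in the report).
build_mstflint() {
    prefix="${1:-/opt/jl}"
    run ./autogen.sh &&
    run ./configure --prefix="$prefix" --enable-adb-generic-tools &&
    run make &&          # assumed step, not stated in the thread
    run make install     # assumed step, not stated in the thread
}
```

To actually build, source the script from the root of the mstflint checkout and call `DRY_RUN=0 build_mstflint /opt/jl`.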

root@p-jn-sz-cw-h1-su2-gpu31-402-22c-12u-208-65:/opt/jl/bin# ./mstlink -d lid-0x01e5 -p 10/2 -m
Segmentation fault (core dumped)
root@p-jn-sz-cw-h1-su2-gpu31-402-22c-12u-208-65:/opt/jl/bin# ./mstlink -v
mstlink, mstflint 4.28.0, Git SHA Hash: cc30ec

Executing the command using mlxlink works without errors:

mlxlink -d lid-0x01e5 -p 10/2 -m

root@p-jn-sz-cw-h1-su2-gpu31-402-22c-12u-208-65:~/c/mstflint# ofed_info -s
MLNX_OFED_LINUX-23.10-2.1.3.1:

root@p-jn-sz-cw-h1-su2-gpu31-402-22c-12u-208-65:~/c/mstflint# mlxlink -v
mlxlink, mft 4.26.1-3, built on Nov 27 2023, 15:26:06. Git SHA Hash: N/A

image
image

@xiaolongzhou123
Author

image

@xiaolongzhou123
Author

@HarelKarni

In MlxRegLib::isAccessRegisterSupported, the call status = get_icmd_query_cap(mf, &icmd_cap); directly causes the segmentation fault (core dumped).

The GDB debugging information is as follows:
image

Could an expert please take a look? Should mstlink run as-is after compilation, or are other software packages or libraries required? Is this use case supported?
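
Since the GDB output above was attached as a screenshot, a text backtrace may be easier for maintainers to work with. The following is a minimal sketch, assuming gdb is installed and mstlink was built as described earlier. The helper name `gdb_bt_cmd` is hypothetical; it only prints the gdb command line so it can be reviewed before running.

```shell
#!/bin/sh
# Prints a gdb batch command that runs the given program until it
# crashes and then prints a backtrace ("bt").
# gdb_bt_cmd is a hypothetical helper name.
# $1: path to the binary; remaining arguments: the binary's own arguments.
gdb_bt_cmd() {
    bin="$1"
    shift
    echo "gdb -batch -ex run -ex bt --args $bin $*"
}

# Command line taken from the failing invocation in this report.
gdb_bt_cmd ./mstlink -d lid-0x01e5 -p 10/2 -m
```

Running the printed command from /opt/jl/bin should stop at the segmentation fault and dump the call stack as plain text. A build with debug symbols generally gives more readable frames.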

@HarelKarni
Contributor

@xiaolongzhou123
As I understand from our infrastructure team, this issue was resolved in the newest version (master_devel).
Can you check please and let me know?

@xiaolongzhou123
Author

xiaolongzhou123 commented Jul 25, 2024

@HarelKarni
The master_devel branch still gives an error. It doesn't work.

Could it be that I need to install some dependencies first?
I have tested builds of multiple versions, and all of them give the same error. mlxlink, however, gives no error when run against the LID.

However, when I run my compiled mstlink with -d lid-xxx, it always gives an error, and I have no idea why.

So, I suspect it might be a configuration issue... I'm about to cry.

@xiaolongzhou123
Author

image
