Topology XML file #86

Closed
guntrogu opened this issue May 26, 2021 · 2 comments

Comments

@guntrogu

Hi,

I would like to set NCCL_TOPO_FILE to a valid value before running the all_reduce_perf script.
Where should this file be located, or should I generate it myself?

Thanks!

@sjeaugey
Member

Why would you like to set this file?

The NCCL_TOPO_FILE is usually generated by the VM provider so you should not need to generate it yourself, and the VM provider should document where to find this file.

Now if you're running your own VM, then indeed you may want to dump the XML topology on the bare-metal system (NCCL_TOPO_DUMP_FILE=system.xml), keep only the <cpu> and <pci> tags (remove the <gpu> and <nic> tags), then inject it inside the VM using NCCL_TOPO_FILE. This works as-is provided the PCI IDs are the same inside the VM and on bare metal; otherwise you'll have to adjust the PCI bus IDs to match what's inside the VM.
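
The filtering step described above (keep <cpu> and <pci>, drop <gpu> and <nic>) could be sketched roughly as follows. This is only an illustration, not an official NCCL tool: the helper name `strip_device_tags` and the embedded sample XML are assumptions made up for the example, and a real dump will carry more attributes and deeper nesting than shown here.

```python
import xml.etree.ElementTree as ET


def strip_device_tags(root, tags=("gpu", "nic")):
    """Remove every element whose tag is in `tags`, together with its subtree.

    Hypothetical helper for illustration; returns the number of elements removed.
    """
    removed = 0
    # Materialize the element list first so we do not mutate while iterating.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in tags:
                parent.remove(child)
                removed += 1
    return removed


# Simplified, made-up stand-in for a file written via NCCL_TOPO_DUMP_FILE.
SAMPLE = """<system version="1">
  <cpu numaid="0">
    <pci busid="0000:00:01.0">
      <pci busid="0000:01:00.0">
        <gpu dev="0"/>
      </pci>
      <pci busid="0000:02:00.0">
        <nic><net name="eth0"/></nic>
      </pci>
    </pci>
  </cpu>
</system>"""

root = ET.fromstring(SAMPLE)
removed = strip_device_tags(root)
remaining_tags = sorted({el.tag for el in root.iter()})
print(removed, remaining_tags)
```

For a real dump, the same helper could presumably be applied to `ET.parse("system.xml").getroot()` and the result written to a new file for NCCL_TOPO_FILE to point at. Any PCI bus ID differences between bare metal and the VM would still need to be adjusted by hand.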

@AddyLaddy
Collaborator

No further feedback so I'll close this issue. Please open another one if you are still seeing issues.


3 participants