Performance issues of NCCL Allreduce #29

Closed
czkkkkkk opened this issue Nov 19, 2019 · 4 comments

czkkkkkk commented Nov 19, 2019

Hi. I am trying to reproduce the allreduce performance reported in the official documentation, which shows a bandwidth of about 40 GB/s. I ran the test in a similar environment. Here is the setup:

  1. 4 servers, each with 8 V100 16GB GPUs.
  2. The GPUs within a server are connected by two NVLinks.
  3. The servers are interconnected by 4x EDR 100 Gbps InfiniBand.

The NCCL version is 2.4.7 and the CUDA version is 10.0.

The command I used to run the test:

mpirun -np 32 -H server1:8,server2:8,server3:8,server4:8 \
-x NCCL_DEBUG=INFO \
./build/all_reduce_perf -b 8 -e 512M -f 2

But the results are far from what I expected: the busbw is about 6 GB/s. I think the only difference between my setup and a DGX-1 is that I only have 2 NVLinks between GPUs within a server, which should not be the bottleneck.
[screenshot: all_reduce_perf results showing ~6 GB/s busbw]

Here is what confuses me:

  1. Why is NCCL so slow in my environment? I think it should reach at least 12 GB/s.
  2. How can the bandwidth on a DGX-1 reach 40 GB/s? From my understanding, one EDR lane provides about 3 GB/s, so the upper bound for 4x EDR is 12 GB/s, which would then become the bottleneck of the allreduce operation. Is there anything special that makes the connection faster on a DGX-1?
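
As a sanity check on my numbers, here is a small sketch of the bandwidth arithmetic (the busbw formula is the one documented by nccl-tests; the per-NIC figure is an assumption: one 4x EDR port is 100 Gbps ≈ 12.5 GB/s raw):

```python
# Sketch of the nccl-tests bandwidth arithmetic for allreduce.
# nccl-tests defines: busbw = algbw * 2*(n-1)/n for n ranks.

def busbw(size_bytes: float, time_s: float, nranks: int) -> float:
    """Bus bandwidth as reported by all_reduce_perf, in GB/s."""
    algbw = size_bytes / time_s / 1e9
    return algbw * 2 * (nranks - 1) / nranks

# Assumption: one 4x EDR HCA per node, 100 Gbps ~= 12.5 GB/s raw.
# With a single NIC per node, inter-node ring traffic is capped near
# the NIC bandwidth, so busbw should top out around 12.5 GB/s rather
# than 40 GB/s, unless several NICs are used in parallel (a DGX-1
# carries 4 EDR HCAs).
nic_cap_gbs = 100 / 8  # ~12.5 GB/s

# Hypothetical example: a 512 MB allreduce across 32 ranks taking 80 ms.
print(round(busbw(512e6, 80e-3, 32), 2))  # → 12.4
```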

Thank you for your attention.


sjeaugey commented Nov 19, 2019

Getting the full log would help figure out why the performance is not good.

Without more details, I can only give a list of things to check:

  • Make sure GPU Direct RDMA is installed and enabled.
  • Verify that ACS is disabled (lspci -vvv | grep ACSCtl).
  • Check whether you are running inside a VM.
  • Try NCCL 2.5.
  • Share the PCI and NVLink topology (nvidia-smi topo -m).

Sylvain


czkkkkkk commented Dec 6, 2019

Thank you, Sylvain.

Sorry for the late response. Following your suggestions:

  • I enabled GPU Direct RDMA and the performance doubled, which is great.
  • I believe ACS is disabled, judging from the output of lspci -vvv | grep ACSCtl.
  • It is not running inside a VM.
  • I cannot switch to NCCL 2.5 since I run on a cluster shared by many users. Would upgrading to NCCL 2.5 help significantly?
  • PCI and NVLink topology: the output of nvidia-smi topo -m is:
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    mlx5_0  mlx5_1  CPU Affinity
GPU0     X      NV2     NV2     NV1     NV1     SYS     SYS     SYS     NODE    NODE    0-23,48-71
GPU1    NV2      X      NV1     NV2     SYS     NV1     SYS     SYS     NODE    NODE    0-23,48-71
GPU2    NV2     NV1      X      NV1     SYS     SYS     NV2     SYS     PIX     PIX     0-23,48-71
GPU3    NV1     NV2     NV1      X      SYS     SYS     SYS     NV2     PIX     PIX     0-23,48-71
GPU4    NV1     SYS     SYS     SYS      X      NV2     NV2     NV1     SYS     SYS     24-47,72-95
GPU5    SYS     NV1     SYS     SYS     NV2      X      NV1     NV2     SYS     SYS     24-47,72-95
GPU6    SYS     SYS     NV2     SYS     NV2     NV1      X      NV1     SYS     SYS     24-47,72-95
GPU7    SYS     SYS     SYS     NV2     NV1     NV2     NV1      X      SYS     SYS     24-47,72-95
mlx5_0  NODE    NODE    PIX     PIX     SYS     SYS     SYS     SYS      X      PIX
mlx5_1  NODE    NODE    PIX     PIX     SYS     SYS     SYS     SYS     PIX      X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing a single PCIe switch
  NV#  = Connection traversing a bonded set of # NVLinks
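
As an aside on the ACS check above: in lspci -vvv output, ACS is disabled when every ACSCtl flag carries a trailing minus (e.g. SrcValid-). A tiny sketch of that check (the sample line is illustrative, not copied from this machine):

```python
# Sketch: given `lspci -vvv` output, report whether ACS is enabled anywhere.
# The ACSCtl line format assumed here is the typical lspci rendering:
#   "ACSCtl: SrcValid+ TransBlk- ReqRedir- ..." when source validation is on.
def acs_enabled(lspci_output: str) -> bool:
    return any("ACSCtl" in line and "SrcValid+" in line
               for line in lspci_output.splitlines())

sample = "ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir-"
print(acs_enabled(sample))  # → False: all flags are '-', so ACS is off here
```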

About the PCI connections, I am not very sure. It seems that every two GPUs are attached to the same PCIe bus. The output of lspci -tv looks like:

-+-[0000:d7]-+-00.0-[d8-ea]----00.0-[d9-ea]--+-04.0-[da-dd]--
 |           |                               +-08.0-[de]----00.0  PLX Technology, Inc. PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch
 |           |                               +-0c.0-[df-e2]----00.0  NVIDIA Corporation Device 1db1
 |           |                               +-10.0-[e3-e6]--
 |           |                               \-14.0-[e7-ea]----00.0  NVIDIA Corporation Device 1db1
 +-[0000:ae]-+-00.0-[af-c1]----00.0-[b0-c1]--+-04.0-[b1-b4]--
 |           |                               +-08.0-[b5-b8]----00.0  NVIDIA Corporation Device 1db1
 |           |                               +-0c.0-[b9]----00.0  PLX Technology, Inc. PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch
 |           |                               +-10.0-[ba-bd]--
 |           |                               \-14.0-[be-c1]----00.0  NVIDIA Corporation Device 1db1
 +-[0000:53]-+-00.0-[54-66]----00.0-[55-66]--+-04.0-[56-59]--+-00.0  Mellanox Technologies MT28800 Family [ConnectX-5]
 |           |                               |               \-00.1  Mellanox Technologies MT28800 Family [ConnectX-5]
 |           |                               +-08.0-[5a]----00.0  PLX Technology, Inc. PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch
 |           |                               +-0c.0-[5b-5e]----00.0  NVIDIA Corporation Device 1db1
 |           |                               +-10.0-[5f-62]----00.0  NVIDIA Corporation Device 1db1
 |           |                               \-14.0-[63-66]--
 +-[0000:26]-+-00.0-[27-3a]----00.0-[28-3a]--+-04.0-[29-2c]--+-00.0  Intel Corporation Ethernet Controller 10-Gigabit X540-AT2
 |           |                               |               \-00.1  Intel Corporation Ethernet Controller 10-Gigabit X540-AT2
 |           |                               +-08.0-[2d-30]----00.0  NVIDIA Corporation Device 1db1
 |           |                               +-0c.0-[31]----00.0  PLX Technology, Inc. PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch
 |           |                               +-10.0-[32-35]----00.0  NVIDIA Corporation Device 1db1
 |           |                               +-14.0-[36-39]--
 |           |                               \-15.0-[3a]--
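
To connect that tree with the topology matrix above, a rough sketch can classify each GPU by whether its PCI bus falls under the same root complex as the ConnectX-5 (bus numbers are copied from the lspci tree and device list; the classification is a simplification of what NCCL actually computes):

```python
# Simplified sketch: classify GPU<->NIC distance by whether their PCI bus
# numbers fall under the same root-port secondary bus range.
# Bus ranges below are taken from the lspci -tv output above.
root_ranges = {
    "0000:d7": range(0xd8, 0xea + 1),
    "0000:ae": range(0xaf, 0xc1 + 1),
    "0000:53": range(0x54, 0x66 + 1),  # this tree holds the ConnectX-5
    "0000:26": range(0x27, 0x3a + 1),
}

def root_of(bus: int) -> str:
    """Return the root complex whose secondary bus range covers `bus`."""
    return next(r for r, rng in root_ranges.items() if bus in rng)

nic_bus = 0x56  # Mellanox ConnectX-5
gpu_buses = {0: 0x2d, 1: 0x32, 2: 0x5b, 3: 0x5f,
             4: 0xb5, 5: 0xbe, 6: 0xdf, 7: 0xe7}

for gpu, bus in gpu_buses.items():
    same_tree = root_of(bus) == root_of(nic_bus)
    # GPUs 2 and 3 share the NIC's PCIe tree (hence PIX in the matrix);
    # the others must cross host bridges (NODE) or NUMA nodes (SYS).
    print(gpu, "PIX-like" if same_tree else "remote")
```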

The full log after I installed GPU Direct RDMA is:

# nThread 1 nGpus 1 minBytes 8 maxBytes 536870912 step: 2(factor) warmup iters: 5 iters: 20 validation: 1
#
# Using devices
#   Rank  0 Pid 117944 on    server1 device  0 [0x2d] Tesla V100-SXM2-16GB
#   Rank  1 Pid 117945 on    server1 device  1 [0x32] Tesla V100-SXM2-16GB
#   Rank  2 Pid 117946 on    server1 device  2 [0x5b] Tesla V100-SXM2-16GB
#   Rank  3 Pid 117947 on    server1 device  3 [0x5f] Tesla V100-SXM2-16GB
#   Rank  4 Pid 117948 on    server1 device  4 [0xb5] Tesla V100-SXM2-16GB
#   Rank  5 Pid 117949 on    server1 device  5 [0xbe] Tesla V100-SXM2-16GB
#   Rank  6 Pid 117950 on    server1 device  6 [0xdf] Tesla V100-SXM2-16GB
#   Rank  7 Pid 117953 on    server1 device  7 [0xe7] Tesla V100-SXM2-16GB
#   Rank  8 Pid  23913 on    server2 device  0 [0x2d] Tesla V100-SXM2-16GB
#   Rank  9 Pid  23914 on    server2 device  1 [0x32] Tesla V100-SXM2-16GB
#   Rank 10 Pid  23915 on    server2 device  2 [0x5b] Tesla V100-SXM2-16GB
#   Rank 11 Pid  23916 on    server2 device  3 [0x5f] Tesla V100-SXM2-16GB
#   Rank 12 Pid  23917 on    server2 device  4 [0xb5] Tesla V100-SXM2-16GB
#   Rank 13 Pid  23918 on    server2 device  5 [0xbe] Tesla V100-SXM2-16GB
#   Rank 14 Pid  23919 on    server2 device  6 [0xdf] Tesla V100-SXM2-16GB
#   Rank 15 Pid  23920 on    server2 device  7 [0xe7] Tesla V100-SXM2-16GB
#   Rank 16 Pid  15216 on    server3 device  0 [0x2d] Tesla V100-SXM2-16GB
#   Rank 17 Pid  15217 on    server3 device  1 [0x32] Tesla V100-SXM2-16GB
#   Rank 18 Pid  15218 on    server3 device  2 [0x5b] Tesla V100-SXM2-16GB
#   Rank 19 Pid  15219 on    server3 device  3 [0x5f] Tesla V100-SXM2-16GB
#   Rank 20 Pid  15220 on    server3 device  4 [0xb5] Tesla V100-SXM2-16GB
#   Rank 21 Pid  15221 on    server3 device  5 [0xbe] Tesla V100-SXM2-16GB
#   Rank 22 Pid  15222 on    server3 device  6 [0xdf] Tesla V100-SXM2-16GB
#   Rank 23 Pid  15223 on    server3 device  7 [0xe7] Tesla V100-SXM2-16GB
#   Rank 24 Pid  45886 on    server4 device  0 [0x2d] Tesla V100-SXM2-16GB
#   Rank 25 Pid  45887 on    server4 device  1 [0x32] Tesla V100-SXM2-16GB
#   Rank 26 Pid  45888 on    server4 device  2 [0x5b] Tesla V100-SXM2-16GB
#   Rank 27 Pid  45889 on    server4 device  3 [0x5f] Tesla V100-SXM2-16GB
#   Rank 28 Pid  45890 on    server4 device  4 [0xb5] Tesla V100-SXM2-16GB
#   Rank 29 Pid  45891 on    server4 device  5 [0xbe] Tesla V100-SXM2-16GB
#   Rank 30 Pid  45892 on    server4 device  6 [0xdf] Tesla V100-SXM2-16GB
#   Rank 31 Pid  45893 on    server4 device  7 [0xe7] Tesla V100-SXM2-16GB
server1:117944:117944 [0] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server1:117944:117944 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server1:117944:117944 [0] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server2:23917:23917 [4] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server2:23917:23917 [4] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server2:23920:23920 [7] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server2:23920:23920 [7] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server2:23917:23917 [4] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server2:23920:23920 [7] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server2:23915:23915 [2] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server2:23915:23915 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server2:23915:23915 [2] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server2:23913:23913 [0] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server2:23913:23913 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server2:23914:23914 [1] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server2:23919:23919 [6] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server2:23914:23914 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server2:23919:23919 [6] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server2:23918:23918 [5] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server2:23918:23918 [5] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server2:23916:23916 [3] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server2:23916:23916 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server2:23913:23913 [0] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server2:23914:23914 [1] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server2:23918:23918 [5] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server2:23916:23916 [3] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server2:23919:23919 [6] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server2:23917:24012 [4] NCCL INFO Setting affinity for GPU 4 to ffffff00,0000ffff,ff000000
server2:23917:24012 [4] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server2:23920:24013 [7] NCCL INFO Setting affinity for GPU 7 to ffffff00,0000ffff,ff000000
server2:23920:24013 [7] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
NCCL version 2.4.7+cuda10.0
server2:23915:24014 [2] NCCL INFO Setting affinity for GPU 2 to ff,ffff0000,00ffffff
server2:23915:24014 [2] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server3:15217:15217 [1] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0> [1]virbr0:<0>
server3:15217:15217 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server3:15217:15217 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB enp41s0f1:<0>
server2:23913:24015 [0] NCCL INFO Setting affinity for GPU 0 to ff,ffff0000,00ffffff
server2:23913:24015 [0] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server2:23914:24016 [1] NCCL INFO Setting affinity for GPU 1 to ff,ffff0000,00ffffff
server2:23914:24016 [1] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server2:23916:24017 [3] NCCL INFO Setting affinity for GPU 3 to ff,ffff0000,00ffffff
server2:23916:24017 [3] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server2:23918:24018 [5] NCCL INFO Setting affinity for GPU 5 to ffffff00,0000ffff,ff000000
server2:23918:24018 [5] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server2:23919:24019 [6] NCCL INFO Setting affinity for GPU 6 to ffffff00,0000ffff,ff000000
server2:23919:24019 [6] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server3:15218:15218 [2] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0> [1]virbr0:<0>
server3:15218:15218 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server3:15220:15220 [4] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0> [1]virbr0:<0>
server3:15220:15220 [4] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server3:15221:15221 [5] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0> [1]virbr0:<0>
server3:15221:15221 [5] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server3:15218:15218 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB enp41s0f1:<0>
server3:15222:15222 [6] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0> [1]virbr0:<0>
server3:15222:15222 [6] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server3:15220:15220 [4] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB enp41s0f1:<0>
server3:15223:15223 [7] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0> [1]virbr0:<0>
server3:15223:15223 [7] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server3:15216:15216 [0] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0> [1]virbr0:<0>
server3:15216:15216 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server3:15219:15219 [3] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0> [1]virbr0:<0>
server3:15221:15221 [5] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB enp41s0f1:<0>
server3:15219:15219 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server3:15222:15222 [6] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB enp41s0f1:<0>
server1:117953:117953 [7] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server1:117953:117953 [7] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server3:15223:15223 [7] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB enp41s0f1:<0>
server3:15219:15219 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB enp41s0f1:<0>
server3:15216:15216 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB enp41s0f1:<0>
server1:117945:117945 [1] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server1:117945:117945 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server1:117953:117953 [7] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server1:117945:117945 [1] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server1:117949:117949 [5] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server1:117949:117949 [5] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server1:117949:117949 [5] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server1:117947:117947 [3] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server1:117950:117950 [6] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server1:117947:117947 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server1:117948:117948 [4] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server1:117950:117950 [6] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server1:117946:117946 [2] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server1:117948:117948 [4] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server1:117946:117946 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server1:117947:117947 [3] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server1:117946:117946 [2] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server1:117950:117950 [6] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server1:117948:117948 [4] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server1:117944:118307 [0] NCCL INFO Setting affinity for GPU 0 to ff,ffff0000,00ffffff
server1:117944:118307 [0] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server3:15217:15314 [1] NCCL INFO Setting affinity for GPU 1 to ff,ffff0000,00ffffff
server3:15217:15314 [1] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server4:45893:45893 [7] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server4:45893:45893 [7] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server3:15218:15315 [2] NCCL INFO Setting affinity for GPU 2 to ff,ffff0000,00ffffff
server3:15218:15315 [2] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server4:45893:45893 [7] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server1:117953:118322 [7] NCCL INFO Setting affinity for GPU 7 to ffffff00,0000ffff,ff000000
server1:117953:118322 [7] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server4:45892:45892 [6] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server4:45886:45886 [0] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server4:45892:45892 [6] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server4:45886:45886 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server4:45888:45888 [2] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server4:45888:45888 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server4:45890:45890 [4] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server4:45890:45890 [4] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server3:15222:15316 [6] NCCL INFO Setting affinity for GPU 6 to ffffff00,0000ffff,ff000000
server4:45891:45891 [5] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server4:45887:45887 [1] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server4:45891:45891 [5] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server4:45889:45889 [3] NCCL INFO NET/Socket : Using [0]enp41s0f1:<0>
server4:45887:45887 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server4:45889:45889 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
server3:15222:15316 [6] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server1:117945:118323 [1] NCCL INFO Setting affinity for GPU 1 to ff,ffff0000,00ffffff
server1:117945:118323 [1] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server4:45886:45886 [0] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server4:45888:45888 [2] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server4:45892:45892 [6] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server4:45890:45890 [4] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server3:15221:15317 [5] NCCL INFO Setting affinity for GPU 5 to ffffff00,0000ffff,ff000000
server3:15221:15317 [5] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server3:15223:15319 [7] NCCL INFO Setting affinity for GPU 7 to ffffff00,0000ffff,ff000000
server4:45889:45889 [3] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server3:15223:15319 [7] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server4:45891:45891 [5] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server4:45887:45887 [1] NCCL INFO NET/IB : Using [0]mlx5_1:1/IB ; OOB enp41s0f1:<0>
server3:15220:15320 [4] NCCL INFO Setting affinity for GPU 4 to ffffff00,0000ffff,ff000000
server3:15220:15320 [4] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server1:117949:118326 [5] NCCL INFO Setting affinity for GPU 5 to ffffff00,0000ffff,ff000000
server1:117949:118326 [5] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server3:15219:15321 [3] NCCL INFO Setting affinity for GPU 3 to ff,ffff0000,00ffffff
server3:15219:15321 [3] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server3:15216:15318 [0] NCCL INFO Setting affinity for GPU 0 to ff,ffff0000,00ffffff
server3:15216:15318 [0] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server1:117947:118331 [3] NCCL INFO Setting affinity for GPU 3 to ff,ffff0000,00ffffff
server1:117947:118331 [3] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server1:117946:118332 [2] NCCL INFO Setting affinity for GPU 2 to ff,ffff0000,00ffffff
server1:117946:118332 [2] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server1:117948:118333 [4] NCCL INFO Setting affinity for GPU 4 to ffffff00,0000ffff,ff000000
server1:117948:118333 [4] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server1:117950:118334 [6] NCCL INFO Setting affinity for GPU 6 to ffffff00,0000ffff,ff000000
server1:117950:118334 [6] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server4:45893:45987 [7] NCCL INFO Setting affinity for GPU 7 to ffffff00,0000ffff,ff000000
server4:45893:45987 [7] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server4:45886:45988 [0] NCCL INFO Setting affinity for GPU 0 to ff,ffff0000,00ffffff
server4:45886:45988 [0] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server4:45888:45989 [2] NCCL INFO Setting affinity for GPU 2 to ff,ffff0000,00ffffff
server4:45888:45989 [2] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server4:45890:45991 [4] NCCL INFO Setting affinity for GPU 4 to ffffff00,0000ffff,ff000000
server4:45892:45990 [6] NCCL INFO Setting affinity for GPU 6 to ffffff00,0000ffff,ff000000
server4:45890:45991 [4] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server4:45892:45990 [6] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server4:45889:45992 [3] NCCL INFO Setting affinity for GPU 3 to ff,ffff0000,00ffffff
server4:45889:45992 [3] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server4:45891:45993 [5] NCCL INFO Setting affinity for GPU 5 to ffffff00,0000ffff,ff000000
server4:45891:45993 [5] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server4:45887:45994 [1] NCCL INFO Setting affinity for GPU 1 to ff,ffff0000,00ffffff
server4:45887:45994 [1] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
server4:45886:45988 [0] NCCL INFO CUDA Dev 0[0], IB NIC distance :  NODE
server3:15222:15316 [6] NCCL INFO CUDA Dev 6[6], IB NIC distance :  SYS
server4:45889:45992 [3] NCCL INFO CUDA Dev 3[3], IB NIC distance :  PIX
server3:15223:15319 [7] NCCL INFO CUDA Dev 7[7], IB NIC distance :  SYS
server2:23920:24013 [7] NCCL INFO CUDA Dev 7[7], IB NIC distance :  SYS
server3:15221:15317 [5] NCCL INFO CUDA Dev 5[5], IB NIC distance :  SYS
server2:23919:24019 [6] NCCL INFO CUDA Dev 6[6], IB NIC distance :  SYS
server4:45887:45994 [1] NCCL INFO CUDA Dev 1[1], IB NIC distance :  NODE
server3:15220:15320 [4] NCCL INFO CUDA Dev 4[4], IB NIC distance :  SYS
server4:45888:45989 [2] NCCL INFO CUDA Dev 2[2], IB NIC distance :  PIX
server2:23917:24012 [4] NCCL INFO CUDA Dev 4[4], IB NIC distance :  SYS
server3:15219:15321 [3] NCCL INFO CUDA Dev 3[3], IB NIC distance :  PIX
server2:23918:24018 [5] NCCL INFO CUDA Dev 5[5], IB NIC distance :  SYS
server3:15216:15318 [0] NCCL INFO CUDA Dev 0[0], IB NIC distance :  NODE
server4:45890:45991 [4] NCCL INFO CUDA Dev 4[4], IB NIC distance :  SYS
server3:15218:15315 [2] NCCL INFO CUDA Dev 2[2], IB NIC distance :  PIX
server2:23916:24017 [3] NCCL INFO CUDA Dev 3[3], IB NIC distance :  PIX
server4:45891:45993 [5] NCCL INFO CUDA Dev 5[5], IB NIC distance :  SYS
server3:15217:15314 [1] NCCL INFO CUDA Dev 1[1], IB NIC distance :  NODE
server2:23913:24015 [0] NCCL INFO CUDA Dev 0[0], IB NIC distance :  NODE
server2:23915:24014 [2] NCCL INFO CUDA Dev 2[2], IB NIC distance :  PIX
server4:45893:45987 [7] NCCL INFO CUDA Dev 7[7], IB NIC distance :  SYS
server4:45892:45990 [6] NCCL INFO CUDA Dev 6[6], IB NIC distance :  SYS
server2:23914:24016 [1] NCCL INFO CUDA Dev 1[1], IB NIC distance :  NODE
server1:117945:118323 [1] NCCL INFO CUDA Dev 1[1], IB NIC distance :  NODE
server1:117944:118307 [0] NCCL INFO CUDA Dev 0[0], IB NIC distance :  NODE
server1:117946:118332 [2] NCCL INFO CUDA Dev 2[2], IB NIC distance :  PIX
server1:117953:118322 [7] NCCL INFO CUDA Dev 7[7], IB NIC distance :  SYS
server1:117947:118331 [3] NCCL INFO CUDA Dev 3[3], IB NIC distance :  PIX
server1:117950:118334 [6] NCCL INFO CUDA Dev 6[6], IB NIC distance :  SYS
server1:117948:118333 [4] NCCL INFO CUDA Dev 4[4], IB NIC distance :  SYS
server1:117949:118326 [5] NCCL INFO CUDA Dev 5[5], IB NIC distance :  SYS
server1:117944:118307 [0] NCCL INFO Channel 00 :    0   1   5   4   6   7   3  10   8   9  13  12  14  15  11  18  16  17  21  20
server1:117944:118307 [0] NCCL INFO Channel 01 :    0   1   5   4   6   7   3  10   8   9  13  12  14  15  11  18  16  17  21  20
server3:15218:15315 [2] NCCL INFO Ring 00 : 11 -> 18 [receive] via NET/IB/0/GDRDMA
server2:23915:24014 [2] NCCL INFO Ring 00 : 3 -> 10 [receive] via NET/IB/0/GDRDMA
server1:117946:118332 [2] NCCL INFO Ring 00 : 27 -> 2 [receive] via NET/IB/0/GDRDMA
server4:45888:45989 [2] NCCL INFO Ring 00 : 19 -> 26 [receive] via NET/IB/0/GDRDMA
server3:15218:15315 [2] NCCL INFO Ring 00 : 18[2] -> 16[0] via P2P/IPC
server2:23915:24014 [2] NCCL INFO Ring 00 : 10[2] -> 8[0] via P2P/IPC
server1:117946:118332 [2] NCCL INFO Ring 00 : 2[2] -> 0[0] via P2P/IPC
server4:45888:45989 [2] NCCL INFO Ring 00 : 26[2] -> 24[0] via P2P/IPC
server3:15223:15319 [7] NCCL INFO Ring 00 : 23[7] -> 19[3] via P2P/IPC
server3:15217:15314 [1] NCCL INFO Ring 00 : 17[1] -> 21[5] via P2P/IPC
server2:23913:24015 [0] NCCL INFO Ring 00 : 8[0] -> 9[1] via P2P/IPC
server4:45893:45987 [7] NCCL INFO Ring 00 : 31[7] -> 27[3] via P2P/IPC
server1:117953:118322 [7] NCCL INFO Ring 00 : 7[7] -> 3[3] via P2P/IPC
server3:15222:15316 [6] NCCL INFO Ring 00 : 22[6] -> 23[7] via P2P/IPC
server3:15221:15317 [5] NCCL INFO Ring 00 : 21[5] -> 20[4] via P2P/IPC
server4:45892:45990 [6] NCCL INFO Ring 00 : 30[6] -> 31[7] via P2P/IPC
server2:23917:24012 [4] NCCL INFO Ring 00 : 12[4] -> 14[6] via P2P/IPC
server3:15216:15318 [0] NCCL INFO Ring 00 : 16[0] -> 17[1] via P2P/IPC
server1:117945:118323 [1] NCCL INFO Ring 00 : 1[1] -> 5[5] via P2P/IPC
server3:15220:15320 [4] NCCL INFO Ring 00 : 20[4] -> 22[6] via P2P/IPC
server2:23919:24019 [6] NCCL INFO Ring 00 : 14[6] -> 15[7] via P2P/IPC
server1:117950:118334 [6] NCCL INFO Ring 00 : 6[6] -> 7[7] via P2P/IPC
server2:23918:24018 [5] NCCL INFO Ring 00 : 13[5] -> 12[4] via P2P/IPC
server2:23920:24013 [7] NCCL INFO Ring 00 : 15[7] -> 11[3] via P2P/IPC
server1:117948:118333 [4] NCCL INFO Ring 00 : 4[4] -> 6[6] via P2P/IPC
server4:45886:45988 [0] NCCL INFO Ring 00 : 24[0] -> 25[1] via P2P/IPC
server2:23914:24016 [1] NCCL INFO Ring 00 : 9[1] -> 13[5] via P2P/IPC
server1:117944:118307 [0] NCCL INFO Ring 00 : 0[0] -> 1[1] via P2P/IPC
server1:117949:118326 [5] NCCL INFO Ring 00 : 5[5] -> 4[4] via P2P/IPC
server4:45887:45994 [1] NCCL INFO Ring 00 : 25[1] -> 29[5] via P2P/IPC
server4:45891:45993 [5] NCCL INFO Ring 00 : 29[5] -> 28[4] via P2P/IPC
server4:45890:45991 [4] NCCL INFO Ring 00 : 28[4] -> 30[6] via P2P/IPC
server3:15219:15321 [3] NCCL INFO Ring 00 : 19 -> 26 [send] via NET/IB/0/GDRDMA
server2:23916:24017 [3] NCCL INFO Ring 00 : 11 -> 18 [send] via NET/IB/0/GDRDMA
server4:45889:45992 [3] NCCL INFO Ring 00 : 27 -> 2 [send] via NET/IB/0/GDRDMA
server1:117947:118331 [3] NCCL INFO Ring 00 : 3 -> 10 [send] via NET/IB/0/GDRDMA
server4:45888:45989 [2] NCCL INFO Ring 01 : 19 -> 26 [receive] via NET/IB/0/GDRDMA
server2:23915:24014 [2] NCCL INFO Ring 01 : 3 -> 10 [receive] via NET/IB/0/GDRDMA
server3:15218:15315 [2] NCCL INFO Ring 01 : 11 -> 18 [receive] via NET/IB/0/GDRDMA
server1:117946:118332 [2] NCCL INFO Ring 01 : 27 -> 2 [receive] via NET/IB/0/GDRDMA
server2:23913:24015 [0] NCCL INFO Ring 01 : 8[0] -> 9[1] via P2P/IPC
server1:117953:118322 [7] NCCL INFO Ring 01 : 7[7] -> 3[3] via P2P/IPC
server4:45893:45987 [7] NCCL INFO Ring 01 : 31[7] -> 27[3] via P2P/IPC
server2:23917:24012 [4] NCCL INFO Ring 01 : 12[4] -> 14[6] via P2P/IPC
server2:23919:24019 [6] NCCL INFO Ring 01 : 14[6] -> 15[7] via P2P/IPC
server2:23918:24018 [5] NCCL INFO Ring 01 : 13[5] -> 12[4] via P2P/IPC
server4:45892:45990 [6] NCCL INFO Ring 01 : 30[6] -> 31[7] via P2P/IPC
server3:15223:15319 [7] NCCL INFO Ring 01 : 23[7] -> 19[3] via P2P/IPC
server1:117945:118323 [1] NCCL INFO Ring 01 : 1[1] -> 5[5] via P2P/IPC
server3:15217:15314 [1] NCCL INFO Ring 01 : 17[1] -> 21[5] via P2P/IPC
server2:23920:24013 [7] NCCL INFO Ring 01 : 15[7] -> 11[3] via P2P/IPC
server2:23914:24016 [1] NCCL INFO Ring 01 : 9[1] -> 13[5] via P2P/IPC
server3:15221:15317 [5] NCCL INFO Ring 01 : 21[5] -> 20[4] via P2P/IPC
server1:117950:118334 [6] NCCL INFO Ring 01 : 6[6] -> 7[7] via P2P/IPC
server1:117948:118333 [4] NCCL INFO Ring 01 : 4[4] -> 6[6] via P2P/IPC
server3:15222:15316 [6] NCCL INFO Ring 01 : 22[6] -> 23[7] via P2P/IPC
server3:15216:15318 [0] NCCL INFO Ring 01 : 16[0] -> 17[1] via P2P/IPC
server1:117944:118307 [0] NCCL INFO Ring 01 : 0[0] -> 1[1] via P2P/IPC
server3:15220:15320 [4] NCCL INFO Ring 01 : 20[4] -> 22[6] via P2P/IPC
server1:117949:118326 [5] NCCL INFO Ring 01 : 5[5] -> 4[4] via P2P/IPC
server4:45886:45988 [0] NCCL INFO Ring 01 : 24[0] -> 25[1] via P2P/IPC
server4:45887:45994 [1] NCCL INFO Ring 01 : 25[1] -> 29[5] via P2P/IPC
server4:45891:45993 [5] NCCL INFO Ring 01 : 29[5] -> 28[4] via P2P/IPC
server4:45890:45991 [4] NCCL INFO Ring 01 : 28[4] -> 30[6] via P2P/IPC
server4:45888:45989 [2] NCCL INFO Ring 01 : 26[2] -> 24[0] via P2P/IPC
server2:23915:24014 [2] NCCL INFO Ring 01 : 10[2] -> 8[0] via P2P/IPC
server3:15218:15315 [2] NCCL INFO Ring 01 : 18[2] -> 16[0] via P2P/IPC
server1:117946:118332 [2] NCCL INFO Ring 01 : 2[2] -> 0[0] via P2P/IPC
server1:117944:118307 [0] NCCL INFO Using 256 threads, Min Comp Cap 7, Trees disabled
server2:23917:24012 [4] NCCL INFO comm 0x7f04f8001ca0 rank 12 nranks 32 cudaDev 4 nvmlDev 4 - Init COMPLETE
server2:23919:24019 [6] NCCL INFO comm 0x7f4fc0001ca0 rank 14 nranks 32 cudaDev 6 nvmlDev 6 - Init COMPLETE
server1:117953:118322 [7] NCCL INFO comm 0x7f62f0001ca0 rank 7 nranks 32 cudaDev 7 nvmlDev 7 - Init COMPLETE
server4:45893:45987 [7] NCCL INFO comm 0x7f79b8001ca0 rank 31 nranks 32 cudaDev 7 nvmlDev 7 - Init COMPLETE
server2:23918:24018 [5] NCCL INFO comm 0x7f1208001ca0 rank 13 nranks 32 cudaDev 5 nvmlDev 5 - Init COMPLETE
server3:15223:15319 [7] NCCL INFO comm 0x7f99ec001ca0 rank 23 nranks 32 cudaDev 7 nvmlDev 7 - Init COMPLETE
server1:117945:118323 [1] NCCL INFO comm 0x7f3ba0001ca0 rank 1 nranks 32 cudaDev 1 nvmlDev 1 - Init COMPLETE
server2:23916:24017 [3] NCCL INFO Ring 01 : 11 -> 18 [send] via NET/IB/0/GDRDMA
server1:117947:118331 [3] NCCL INFO Ring 01 : 3 -> 10 [send] via NET/IB/0/GDRDMA
server3:15219:15321 [3] NCCL INFO Ring 01 : 19 -> 26 [send] via NET/IB/0/GDRDMA
server3:15221:15317 [5] NCCL INFO comm 0x7fe020001ca0 rank 21 nranks 32 cudaDev 5 nvmlDev 5 - Init COMPLETE
server4:45892:45990 [6] NCCL INFO comm 0x7f4c6c001ca0 rank 30 nranks 32 cudaDev 6 nvmlDev 6 - Init COMPLETE
server4:45889:45992 [3] NCCL INFO Ring 01 : 27 -> 2 [send] via NET/IB/0/GDRDMA
server2:23920:24013 [7] NCCL INFO comm 0x7f5780001ca0 rank 15 nranks 32 cudaDev 7 nvmlDev 7 - Init COMPLETE
server3:15217:15314 [1] NCCL INFO comm 0x7fd178001ca0 rank 17 nranks 32 cudaDev 1 nvmlDev 1 - Init COMPLETE
server2:23914:24016 [1] NCCL INFO comm 0x7f7e7c001ca0 rank 9 nranks 32 cudaDev 1 nvmlDev 1 - Init COMPLETE
server3:15220:15320 [4] NCCL INFO comm 0x7f7e7c001ca0 rank 20 nranks 32 cudaDev 4 nvmlDev 4 - Init COMPLETE
server1:117950:118334 [6] NCCL INFO comm 0x7f2240001ca0 rank 6 nranks 32 cudaDev 6 nvmlDev 6 - Init COMPLETE
server4:45886:45988 [0] NCCL INFO comm 0x7fcf50001ca0 rank 24 nranks 32 cudaDev 0 nvmlDev 0 - Init COMPLETE
server4:45887:45994 [1] NCCL INFO comm 0x7f14bc001ca0 rank 25 nranks 32 cudaDev 1 nvmlDev 1 - Init COMPLETE
server2:23913:24015 [0] NCCL INFO comm 0x7f23d0001ca0 rank 8 nranks 32 cudaDev 0 nvmlDev 0 - Init COMPLETE
server1:117948:118333 [4] NCCL INFO comm 0x7fd1f8001ca0 rank 4 nranks 32 cudaDev 4 nvmlDev 4 - Init COMPLETE
server3:15222:15316 [6] NCCL INFO comm 0x7fbeac001ca0 rank 22 nranks 32 cudaDev 6 nvmlDev 6 - Init COMPLETE
server1:117949:118326 [5] NCCL INFO comm 0x7f5e40001ca0 rank 5 nranks 32 cudaDev 5 nvmlDev 5 - Init COMPLETE
server4:45891:45993 [5] NCCL INFO comm 0x7f24d0001ca0 rank 29 nranks 32 cudaDev 5 nvmlDev 5 - Init COMPLETE
server4:45890:45991 [4] NCCL INFO comm 0x7f7650001ca0 rank 28 nranks 32 cudaDev 4 nvmlDev 4 - Init COMPLETE
server3:15216:15318 [0] NCCL INFO comm 0x7f6860001ca0 rank 16 nranks 32 cudaDev 0 nvmlDev 0 - Init COMPLETE
server1:117944:118307 [0] NCCL INFO comm 0x7f7054001ca0 rank 0 nranks 32 cudaDev 0 nvmlDev 0 - Init COMPLETE
server2:23916:24017 [3] NCCL INFO comm 0x7f0cd8001ca0 rank 11 nranks 32 cudaDev 3 nvmlDev 3 - Init COMPLETE
#
#                                                     out-of-place                       in-place
#       size         count    type   redop     time   algbw   busbw  error     time   algbw   busbw  error
#        (B)    (elements)                     (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
server1:117944:117944 [0] NCCL INFO Launch mode Parallel
server1:117947:118331 [3] NCCL INFO comm 0x7f9448001ca0 rank 3 nranks 32 cudaDev 3 nvmlDev 3 - Init COMPLETE
server4:45889:45992 [3] NCCL INFO comm 0x7f8a70001ca0 rank 27 nranks 32 cudaDev 3 nvmlDev 3 - Init COMPLETE
server3:15219:15321 [3] NCCL INFO comm 0x7f07f8001ca0 rank 19 nranks 32 cudaDev 3 nvmlDev 3 - Init COMPLETE
server2:23915:24014 [2] NCCL INFO comm 0x7ff6e8001ca0 rank 10 nranks 32 cudaDev 2 nvmlDev 2 - Init COMPLETE
server3:15218:15315 [2] NCCL INFO comm 0x7fe1e8001ca0 rank 18 nranks 32 cudaDev 2 nvmlDev 2 - Init COMPLETE
server4:45888:45989 [2] NCCL INFO comm 0x7f6bfc001ca0 rank 26 nranks 32 cudaDev 2 nvmlDev 2 - Init COMPLETE
server1:117946:118332 [2] NCCL INFO comm 0x7f8420001ca0 rank 2 nranks 32 cudaDev 2 nvmlDev 2 - Init COMPLETE
           8             2   float     sum    160.5    0.00    0.00  2e-07    57.13    0.00    0.00  2e-07
          16             4   float     sum    56.12    0.00    0.00  1e-07    55.54    0.00    0.00  1e-07
          32             8   float     sum    55.86    0.00    0.00  2e-07    57.09    0.00    0.00  2e-07
          64            16   float     sum    56.36    0.00    0.00  2e-07    55.67    0.00    0.00  2e-07
         128            32   float     sum    56.39    0.00    0.00  2e-07    55.59    0.00    0.00  2e-07
         256            64   float     sum    57.62    0.00    0.01  2e-07    56.94    0.00    0.01  2e-07
         512           128   float     sum    59.15    0.01    0.02  2e-07    58.30    0.01    0.02  1e-07
        1024           256   float     sum    60.31    0.02    0.03  7e-07    60.14    0.02    0.03  7e-07
        2048           512   float     sum    62.23    0.03    0.06  7e-07    61.86    0.03    0.06  7e-07
        4096          1024   float     sum    65.93    0.06    0.12  7e-07    64.73    0.06    0.12  7e-07
        8192          2048   float     sum    72.30    0.11    0.22  7e-07    70.99    0.12    0.22  7e-07
       16384          4096   float     sum    80.94    0.20    0.39  1e-06    79.77    0.21    0.40  1e-06
       32768          8192   float     sum    88.43    0.37    0.72  1e-06    86.18    0.38    0.74  1e-06
       65536         16384   float     sum    104.8    0.63    1.21  1e-06    103.2    0.63    1.23  1e-06
      131072         32768   float     sum    122.4    1.07    2.07  1e-06    118.7    1.10    2.14  1e-06
      262144         65536   float     sum    165.0    1.59    3.08  1e-06    164.9    1.59    3.08  1e-06
      524288        131072   float     sum    260.3    2.01    3.90  1e-06    260.2    2.01    3.90  1e-06
     1048576        262144   float     sum    443.6    2.36    4.58  1e-06    442.3    2.37    4.59  1e-06
     2097152        524288   float     sum    860.2    2.44    4.72  1e-06    852.0    2.46    4.77  1e-06
     4194304       1048576   float     sum   1163.6    3.60    6.98  1e-06   1166.2    3.60    6.97  1e-06
     8388608       2097152   float     sum   1505.1    5.57   10.80  1e-06   1487.8    5.64   10.92  1e-06
    16777216       4194304   float     sum   2708.9    6.19   12.00  1e-06   2714.1    6.18   11.98  1e-06
    33554432       8388608   float     sum   5407.7    6.20   12.02  1e-06   5379.8    6.24   12.08  1e-06
    67108864      16777216   float     sum    10730    6.25   12.12  1e-06    10718    6.26   12.13  1e-06
   134217728      33554432   float     sum    21347    6.29   12.18  1e-06    21347    6.29   12.18  1e-06
   268435456      67108864   float     sum    42677    6.29   12.19  1e-06    42702    6.29   12.18  1e-06
   536870912     134217728   float     sum    85342    6.29   12.19  1e-06    85291    6.29   12.20  1e-06
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 4.14076
#
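(For reference: the `busbw` column above is not an independent measurement; nccl-tests derives it from `algbw` by applying the ring-allreduce traffic factor `2*(n-1)/n`, which is why the 6.29 GB/s algbw rows show ~12.19 GB/s busbw at `n = 32`. A minimal sketch of that relationship, using the formula documented in nccl-tests:)

```python
def allreduce_busbw(size_bytes, time_us, nranks):
    """Reproduce nccl-tests' reported bandwidths for allreduce.

    algbw = bytes / time (GB = 1e9 bytes); busbw scales it by
    2*(n-1)/n, the per-rank traffic factor of a ring allreduce.
    """
    algbw = size_bytes / time_us / 1e3  # bytes/us -> GB/s
    busbw = algbw * 2 * (nranks - 1) / nranks
    return algbw, busbw

# Last row of the table above: 512 MB in 85342 us across 32 ranks.
algbw, busbw = allreduce_busbw(536870912, 85342, 32)
print(round(algbw, 2), round(busbw, 2))  # 6.29 12.19
```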

@sjeaugey
Member

sjeaugey commented Dec 6, 2019

OK so performance is good now. 12.2 GB/s (for a 100Gbps card) is as good as you can get.

DGX-1 can get 48 GB/s because it has 4 IB adapters instead of 1 in your case. I was confused by the "4xEDR 100Gbps InfiniBand"... the 4x here is just the InfiniBand link width (4 lanes, which translates to 100Gbps), not the number of IB adapters.
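(A back-of-envelope check of the numbers above; the 25 Gb/s-per-lane figure is the EDR signaling rate, everything else follows from the comment:)

```python
# "4x" is the link width (lane count), not the adapter count.
EDR_LANE_GBPS = 25                 # EDR signals at 25 Gb/s per lane
LANES = 4                          # 4x link width

link_gbps = EDR_LANE_GBPS * LANES  # 100 Gb/s per adapter
link_gbs = link_gbps / 8           # 12.5 GB/s theoretical per adapter

print(link_gbs)                    # 12.5 -> the measured 12.2 GB/s is near line rate
print(link_gbs * 4)                # 50.0 -> 4 adapters explains DGX-1's ~48 GB/s
```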

@czkkkkkk
Author

czkkkkkk commented Dec 9, 2019

Oh, I see. Thank you so much.

@czkkkkkk czkkkkkk closed this as completed Dec 9, 2019