Description
On the master branch I am seeing strange behavior. I suspect that openib is using too large a hammer for NUMA memory binding, possibly setting the wrong memory binding policy for the vader and sm shared-memory segments. I reached this conclusion only empirically, from performance numbers.
For example, I have a RHEL 6.5 node with a single Mellanox Technologies MT25204 [InfiniHost III Lx HCA] ConnectX-3 card with a single port active.
Bad latency, single-host run:
$ mpirun -host "mpi03" -np 4 --bind-to core --report-bindings --mca btl openib,vader,self ./ping_pong_ring.x2
[mpi03:12941] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..]
[mpi03:12941] MCW rank 1 bound to socket 1[core 8[hwt 0-1]]: [../../../../../../../..][BB/../../../../../../..]
[mpi03:12941] MCW rank 2 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..]
[mpi03:12941] MCW rank 3 bound to socket 1[core 9[hwt 0-1]]: [../../../../../../../..][../BB/../../../../../..]
[0:mpi03] ping-pong 0 bytes ...
0 bytes: 7.11 usec/msg
[1:mpi03] ping-pong 0 bytes ...
0 bytes: 7.10 usec/msg
[2:mpi03] ping-pong 0 bytes ...
0 bytes: 7.15 usec/msg
[3:mpi03] ping-pong 0 bytes ...
0 bytes: 7.17 usec/msg
Similar behavior with sm:
$ mpirun -host "mpi03" -np 4 --bind-to core --report-bindings --mca btl openib,sm,self ./ping_pong_ring.x2
[mpi03:14928] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..]
[mpi03:14928] MCW rank 1 bound to socket 1[core 8[hwt 0-1]]: [../../../../../../../..][BB/../../../../../../..]
[mpi03:14928] MCW rank 2 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..]
[mpi03:14928] MCW rank 3 bound to socket 1[core 9[hwt 0-1]]: [../../../../../../../..][../BB/../../../../../..]
[0:mpi03] ping-pong 0 bytes ...
0 bytes: 7.45 usec/msg
[1:mpi03] ping-pong 0 bytes ...
0 bytes: 7.38 usec/msg
[2:mpi03] ping-pong 0 bytes ...
0 bytes: 7.35 usec/msg
[3:mpi03] ping-pong 0 bytes ...
0 bytes: 7.38 usec/msg
When I remove openib, the results look much better:
$ mpirun -host "mpi03" -np 4 --bind-to core --report-bindings --mca btl vader,self ./ping_pong_ring.x2
[mpi03:15819] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..]
[mpi03:15819] MCW rank 1 bound to socket 1[core 8[hwt 0-1]]: [../../../../../../../..][BB/../../../../../../..]
[mpi03:15819] MCW rank 2 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..]
[mpi03:15819] MCW rank 3 bound to socket 1[core 9[hwt 0-1]]: [../../../../../../../..][../BB/../../../../../..]
[0:mpi03] ping-pong 0 bytes ...
0 bytes: 0.50 usec/msg
[1:mpi03] ping-pong 0 bytes ...
0 bytes: 0.50 usec/msg
[2:mpi03] ping-pong 0 bytes ...
0 bytes: 0.49 usec/msg
[3:mpi03] ping-pong 0 bytes ...
0 bytes: 0.51 usec/msg
Similar behavior with sm (though it's half as fast as vader):
$ mpirun -host "mpi03" -np 4 --bind-to core --report-bindings --mca btl sm,self ./ping_pong_ring.x2
[mpi03:16608] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..]
[mpi03:16608] MCW rank 1 bound to socket 1[core 8[hwt 0-1]]: [../../../../../../../..][BB/../../../../../../..]
[mpi03:16608] MCW rank 2 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..]
[mpi03:16608] MCW rank 3 bound to socket 1[core 9[hwt 0-1]]: [../../../../../../../..][../BB/../../../../../..]
[0:mpi03] ping-pong 0 bytes ...
0 bytes: 0.98 usec/msg
[1:mpi03] ping-pong 0 bytes ...
0 bytes: 1.00 usec/msg
[2:mpi03] ping-pong 0 bytes ...
0 bytes: 0.95 usec/msg
[3:mpi03] ping-pong 0 bytes ...
0 bytes: 0.93 usec/msg
If I disable binding explicitly with --bind-to none, I see the expected results even when specifying openib (with either vader or sm, though sm is now the same speed as vader... weird):
$ mpirun -host "mpi03" -np 4 --bind-to none --report-bindings --mca btl openib,vader,self ./ping_pong_ring.x2
[mpi03:20206] MCW rank 1 is not bound (or bound to all available processors)
[mpi03:20205] MCW rank 0 is not bound (or bound to all available processors)
[mpi03:20207] MCW rank 2 is not bound (or bound to all available processors)
[mpi03:20208] MCW rank 3 is not bound (or bound to all available processors)
[0:mpi03] ping-pong 0 bytes ...
0 bytes: 0.50 usec/msg
[1:mpi03] ping-pong 0 bytes ...
0 bytes: 0.50 usec/msg
[2:mpi03] ping-pong 0 bytes ...
0 bytes: 0.50 usec/msg
[3:mpi03] ping-pong 0 bytes ...
0 bytes: 0.49 usec/msg
$ mpirun -host "mpi03" -np 4 --bind-to none --report-bindings --mca btl openib,sm,self ./ping_pong_ring.x2
[mpi03:21058] MCW rank 0 is not bound (or bound to all available processors)
[mpi03:21059] MCW rank 1 is not bound (or bound to all available processors)
[mpi03:21060] MCW rank 2 is not bound (or bound to all available processors)
[mpi03:21061] MCW rank 3 is not bound (or bound to all available processors)
[0:mpi03] ping-pong 0 bytes ...
0 bytes: 0.50 usec/msg
[1:mpi03] ping-pong 0 bytes ...
0 bytes: 0.51 usec/msg
[2:mpi03] ping-pong 0 bytes ...
0 bytes: 0.51 usec/msg
[3:mpi03] ping-pong 0 bytes ...
0 bytes: 0.49 usec/msg
Finally, just for completeness: the best 0-byte ping-pong ring times I could get were with --bind-to core --map-by core:
$ mpirun -host "mpi03" -np 4 --bind-to core --map-by core --report-bindings --mca btl vader,self ./ping_pong_ring.x2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
[mpi03:32149] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..]
[mpi03:32149] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..]
[mpi03:32149] MCW rank 2 bound to socket 0[core 2[hwt 0-1]]: [../../BB/../../../../..][../../../../../../../..]
[mpi03:32149] MCW rank 3 bound to socket 0[core 3[hwt 0-1]]: [../../../BB/../../../..][../../../../../../../..]
[0:mpi03] ping-pong 0 bytes ...
0 bytes: 0.37 usec/msg
[1:mpi03] ping-pong 0 bytes ...
0 bytes: 0.37 usec/msg
[2:mpi03] ping-pong 0 bytes ...
0 bytes: 0.38 usec/msg
[3:mpi03] ping-pong 0 bytes ...
0 bytes: 0.38 usec/msg
I've attached my source for ping_pong_ring.c: