Improve performance of network I/O in comparison to MPI #12133

Open
amitmurthy opened this issue Jul 13, 2015 · 8 comments
Labels: io (Involving the I/O subsystem: libuv, read, write, etc.), parallelism (Parallel or distributed computation), performance (Must go faster)

Comments

@amitmurthy
Contributor

I benchmarked the time taken to transfer Float64 arrays of lengths 1, 10, 100, 1000, 10,000 and 100,000, with 10,000 transfers at each size.

Julia MPI code is here: https://gist.github.com/amitmurthy/6a3dda483f2008e2a4b7. In each iteration, the processes asynchronously send/recv data and then wait for that step to complete before the next iteration.
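
For reference, the core loop looks roughly like this (a minimal sketch against the MPI.jl nonblocking API of the time; `n` and `iters` are illustrative parameters, not the gist's exact code):

```julia
# Sketch of the MPI exchange benchmark: both ranks post a nonblocking
# receive and send each iteration, then wait on both requests.
using MPI

MPI.Init()
comm  = MPI.COMM_WORLD
rank  = MPI.Comm_rank(comm)
other = 1 - rank                # assumes exactly two ranks, 0 and 1

n, iters = 1000, 10000
sendbuf = rand(n)
recvbuf = Array(Float64, n)     # 0.3/0.4-era array constructor

tic()
for i in 1:iters
    rreq = MPI.Irecv!(recvbuf, other, 0, comm)
    sreq = MPI.Isend(sendbuf, other, 0, comm)
    MPI.Waitall!([rreq, sreq])
end
rank == 0 && println("$n floats: $(toq()) seconds")

MPI.Finalize()
```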

Julia put!/take! code using RemoteRefs is here: https://gist.github.com/amitmurthy/50aaa18bb65487773fa4
In each iteration, the put!s are asynchronous, but the processes synchronize on the take!s given the single-value store nature of RemoteRefs.
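
The shape of that benchmark is roughly the following (a sketch against the 0.3/0.4-era API, i.e. RemoteRef and remotecall(pid, f, args...); the echo helper and variable names are illustrative, not the gist's):

```julia
# Sketch of the RemoteRef benchmark: the master put!s into a ref owned
# by the worker, the worker echoes into a ref owned by the master, and
# each side synchronizes on take! (a RemoteRef holds a single value).
addprocs(1)

@everywhere function echo(rr_in, rr_out, iters)
    for i in 1:iters
        put!(rr_out, take!(rr_in))
    end
end

n, iters = 1000, 10000
data = rand(n)

rr_to_worker = RemoteRef(2)     # value stored on worker 2
rr_to_master = RemoteRef(1)     # value stored on the master

remotecall(2, echo, rr_to_worker, rr_to_master, iters)

tic()
for i in 1:iters
    put!(rr_to_worker, data)    # blocks only if the previous value hasn't been taken
    take!(rr_to_master)         # synchronize on the echoed value
end
println("$n floats: $(toq()) seconds")
```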

Julia direct socket read/write code is here - https://gist.github.com/amitmurthy/c583f4dbf19e02498e61
The server just echoes the data sent, so we need to halve the timings to get the overhead of just one-sided data transfer.
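
That benchmark boils down to roughly this (a sketch against the Base socket API of that era; for brevity the echo server is shown as an @async task in the same process, whereas the gist may run it as a separate process, and the port number is illustrative):

```julia
# Sketch of the echo benchmark over a local TCP socket: the client
# writes n Float64s and reads the echo back, so the timings cover a
# full round trip (halve them for one-way transfer).
n, iters = 1000, 10000

server = listen(8000)           # illustrative port
@async begin
    sock = accept(server)
    buf = Array(Float64, n)
    for i in 1:iters
        read!(sock, buf)        # read n floats...
        write(sock, buf)        # ...and echo them straight back
    end
end

client  = connect(8000)
data    = rand(n)
recvbuf = Array(Float64, n)

tic()
for i in 1:iters
    write(client, data)
    read!(client, recvbuf)
end
println("$n floats (round trip): $(toq()) seconds")
```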

Timings (seconds for 10,000 iterations of send/recv):

| number of floats | MPI (mpich) | put!/take! | Direct socket read/write |
| --- | --- | --- | --- |
| 1 | 0.062286847 | 1.194750299 | 0.193145016 |
| 10 | 0.024415091 | 0.699608227 | 0.205704565 |
| 100 | 0.029516504 | 0.735959668 | 0.329878817 |
| 1000 | 0.063592197 | 0.893945718 | 0.361115851 |
| 10000 | 0.378325158 | 1.119156809 | 0.88611087 |
| 100000 | 4.200537498 | 4.078035043 | 4.382292916 |

As can be seen, the data transport overhead in Julia is high even in the direct socket case, i.e. without the overhead of serialization or the RemoteRef implementation. The fixed overhead also seems quite high, as can be seen for transfers of up to 1000 floats.

While this has been tangentially noted in other issues, I hope this simple benchmark against MPI will help us narrow down the causes of this slowness, especially at the network I/O / libuv layer.

@amitmurthy added the performance (Must go faster) and io (Involving the I/O subsystem: libuv, read, write, etc.) labels Jul 13, 2015
@malmaud
Contributor

malmaud commented Jul 13, 2015

To get at the effect of libuv vs julia, what about a benchmark in C that uses libuv to implement the direct socket test?

@jakebolewski
Member

Did you send these between two computers over TCP? You need to do some work on the MPI side to make sure that you are doing an apples to apples comparison if you are on a single physical node.

@amitmurthy
Contributor Author

@malmaud that is a good idea - I'll do that.

@jakebolewski No, it is all local. What do you mean by "work on the MPI side"? The benchmark mimics the put!/take! code in the sense that both sides do a non-blocking Irecv! and Isend before doing a Waitall! on those two calls.

@jakebolewski
Member

MPI implementations will often optimize interprocess communication on a physical node by communicating over shared memory segments. You need to make sure that you force the implementation to use the network layer if you are benchmarking communication on the same node. You can usually do this through command line flags.

@jakebolewski
Member

Although same node interprocess communication is important to optimize, these benchmarks should really be run in a true distributed setting.

@amitmurthy
Contributor Author

This benchmark is just to get an idea of the Julia overhead and where it can be optimized. Until we have true multithreading, a lot of distributed Julia will continue to be run on many-core machines via the multi-process model we have today.

Yeah, the MPI code may be using shmem, I'll re-run forcing TCP and update the numbers.

@JeffBezanson
Member

Same as #9992?

@amitmurthy
Contributor Author

Similar, but I wanted to narrow the focus in this issue to just the socket layer in the stack.

@ViralBShah added the parallelism (Parallel or distributed computation) label Apr 19, 2020