Skip to content

Latest commit

 

History

History
16 lines (11 loc) · 893 Bytes

README.md

File metadata and controls

16 lines (11 loc) · 893 Bytes

Naive parallel Floyd-Warshall algorithm (all-pairs shortest path on dense weighted graphs) using OpenMP.

Performance

Plot above is time (us) vs number of threads (on a machine with 4 cores as you can guess), for a 1000x1000 matrix of random floats. The multi-threaded version is up to 3.6x faster than single-threaded on 4 cores, which is pretty good for a memory-bound algorithm. However, it's still a O(n^3) algorithm.

Additionally enabling auto-vectorization in gcc 5.4.0 (-march=native -ftree-vectorize) results in a total of up to 14x speedup on a 4-core Intel i5-3570 (sse4.2), see plot below:

Vectorized performance