I've been using version 0.5.0 and have observed performance inconsistencies across the nodes in my Spark cluster. Some nodes execute tasks far more slowly than others, with execution times on the slow nodes ranging from tens to thousands of times longer.
Given this situation, I'm curious to know if there are any CPU-specific optimizations made during the compilation of this library. For instance, are there optimizations that favor Intel CPUs over AMD CPUs, which might explain the observed performance disparity?
Any insights or suggestions on this matter would be greatly appreciated.
TensorFlow will optimize things based on the available CPU instructions, so if you have Intel Xeons with AVX-512 and older AMD EPYCs without AVX-512, then you'll get much faster matrix multiplies and convolution operations on the Intel CPUs. I think we compile against AVX 1, but it pulls in MKL for matrix operations and that has fast paths for more complicated vector instructions. As MKL is made by Intel it might also favour their CPUs in other ways, but we don't have much control over that.
Tens to thousands of times slower doesn't sound right, though. Typically I'd expect AVX-512 to result in at most a 2x speedup over AVX2. Are there other differences between these nodes beyond the CPU?
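One way to compare the nodes is to check which vector instruction sets each one reports. The following is a minimal sketch, not something shipped with the library: it assumes Linux workers that expose `/proc/cpuinfo`, and the helper names (`parse_cpu_flags`, `simd_report`) and the list of flags to check are my own choices for illustration.

```python
import platform
import socket
from pathlib import Path

# Flags as they appear in /proc/cpuinfo on x86 Linux (avx512f is the AVX-512 foundation set).
SIMD_FLAGS = ("avx", "avx2", "avx512f", "fma")


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # x86 kernels use "flags"; ARM kernels use "Features".
        if key.strip() in ("flags", "Features"):
            return set(value.split())
    return set()


def simd_report(cpuinfo_text):
    """Map each flag of interest to whether this CPU reports it."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: name in flags for name in SIMD_FLAGS}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    # Fall back to empty text on systems without /proc/cpuinfo (all flags report False).
    text = path.read_text() if path.exists() else ""
    print(socket.gethostname(), platform.machine(), simd_report(text))
```

Running this on one fast node and one slow node shows whether they differ in AVX support at all. If the reports match, the slowdown more likely comes from something other than the instruction set, such as memory, thread counts, or how many tasks share each node.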