This repository has been archived by the owner on Oct 23, 2020. It is now read-only.
I am working with @Larofeticus at LBNL, who is interested in performance with threading on KNL. He had some questions that some of you could help me with.
The OpenMP pragmas are all around nCells and nEdges loops. Why don't we have them around block loops instead? Was that tried in the past? It seems like it would reduce OpenMP overhead by cutting down the number of OMP calls and the associated logic.
I have never used multiple blocks per rank. Has anyone tested it lately? I assume each block is a single graph partition, but that each rank only has a halo around the combined blocks, not a halo around each block. Thanks!
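For readers unfamiliar with the distinction, the two pragma placements being asked about can be sketched roughly as follows. This is only an illustration in C, not the model's actual Fortran code: the `Block` struct, the `tend` field, and both function names are hypothetical stand-ins, and the blocks are held in a plain array here rather than the linked list the real code uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one block owned by an MPI rank. */
typedef struct {
    int     n_cells;
    double *tend;    /* one value per cell in this block */
} Block;

/* Loop-level threading: a parallel region is opened for every block,
 * and the threads split the cells of that block among themselves. */
void update_loop_level(Block *blocks, int n_blocks, double dt)
{
    assert(blocks != NULL && n_blocks > 0);
    for (int b = 0; b < n_blocks; ++b) {
        double *tend = blocks[b].tend;
        int n = blocks[b].n_cells;
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) {
            tend[i] += dt;
        }
    }
}

/* Block-level threading: a single parallel region, and the threads
 * split the blocks; each block's cell loop runs on one thread. */
void update_block_level(Block *blocks, int n_blocks, double dt)
{
    assert(blocks != NULL && n_blocks > 0);
    #pragma omp parallel for schedule(dynamic)
    for (int b = 0; b < n_blocks; ++b) {
        for (int i = 0; i < blocks[b].n_cells; ++i) {
            blocks[b].tend[i] += dt;
        }
    }
}
```

Both functions produce the same result; the difference is how many parallel regions are entered. The block-level form only keeps every thread busy when a rank owns at least as many blocks as it has threads.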
@mark-petersen and @Larofeticus: Doug and Abhinav tried this earlier and found little performance difference either way (threading around blocks or around loops), though there is some suspicion that the memory issues @amametjanov encounters might be reduced under a block version if done correctly. But now, because no one uses multiple blocks, support for multiple blocks per rank is actually broken in many places. We did discuss this on the recent threading call, and most agreed that the current block architecture needs to be rewritten or abandoned, since the linked-list structure gets in the way of optimizations in halo exchange, among other things. Most of us believe we need to move either to a set of dedicated index ranges (both @Larofeticus and the atmosphere groups had options for that) or to a block index on arrays. I've had a design doc on my to-do list for a while to lay out these options...
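As a rough illustration of what "dedicated index ranges" could mean, here is a minimal C sketch in which each thread (or block) owns a contiguous range of one flat array, so there is no linked list to walk. This is one possible reading of the idea, not the design that @Larofeticus or the atmosphere groups proposed; `IndexRange`, `index_range`, and `scale_cells` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical: half-open range [start, end) of cell indices. */
typedef struct {
    int start;
    int end;
} IndexRange;

/* Split n_cells as evenly as possible into n_parts contiguous ranges;
 * the first (n_cells % n_parts) ranges get one extra cell each. */
IndexRange index_range(int n_cells, int n_parts, int part)
{
    assert(n_cells >= 0 && n_parts > 0 && part >= 0 && part < n_parts);
    int base  = n_cells / n_parts;
    int extra = n_cells % n_parts;
    IndexRange r;
    r.start = part * base + (part < extra ? part : extra);
    r.end   = r.start + base + (part < extra ? 1 : 0);
    return r;
}

/* Each part updates only its own range of a single flat array. */
void scale_cells(double *field, int n_cells, int n_parts, double factor)
{
    #pragma omp parallel for
    for (int p = 0; p < n_parts; ++p) {
        IndexRange r = index_range(n_cells, n_parts, p);
        for (int i = r.start; i < r.end; ++i) {
            field[i] *= factor;
        }
    }
}
```

With 10 cells and 3 parts, the ranges come out as [0, 4), [4, 7), and [7, 10). Because the ranges are computed from indices alone, the same arithmetic could in principle be reused to pick out halo regions, though that is beyond this sketch.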