Hybrid parallel coloring fallback strategies (better strong scaling and user friendliness) #908
Merged
37 commits
b1f667a
Merge branch 'small_blas_docs_update' into feature_hybrid_parallel_an…
pcarruscag 3611c10
Merge remote-tracking branch 'upstream/develop' into feature_hybrid_p…
pcarruscag e428ef0
Merge branch 'small_blas_docs_update' into feature_hybrid_parallel_an…
pcarruscag 66f4ac5
Merge remote-tracking branch 'upstream/develop' into feature_hybrid_p…
pcarruscag 148d948
prevent restarted FGMRES from going into infinite loop when RHS is zero
pcarruscag 4dddf6d
fix for "old compiler" compatibility and legacy build system
pcarruscag a4c1cf7
Merge remote-tracking branch 'upstream/develop' into feature_hybrid_p…
pcarruscag 9344f88
update old build system
pcarruscag f4e8185
unnecessary initialization of stiffness matrix in CMeshSolver
pcarruscag 101b52d
fix OpenMP bug in SetMesh_Stiffness, allow upper bound on element sti…
pcarruscag aba265f
potential fix for potential cause of observed deadlock
pcarruscag 998777d
add dummy locks and functions to omp_structure
pcarruscag 7e25a93
bad coloring fallback strategy for CFEASolver
pcarruscag 32ef194
use the "reducer strategy" as a fallback for when grid coloring is ba…
pcarruscag 14ed268
build diagonal and transpose map in parallel
pcarruscag 9cd52b6
Merge remote-tracking branch 'upstream/feature_update_elasticity_outp…
pcarruscag d7bb841
use a more expressive function to round up to next multiple
pcarruscag 2fc6f30
use reducer strategy for turbulence solvers
pcarruscag c88ea7a
polish up the reducer strategy, avoid unnecessary resets of CSysMatrix
pcarruscag 12c12b1
Merge remote-tracking branch 'upstream/develop' into feature_hybrid_p…
pcarruscag bcf6af4
small regression changes
pcarruscag 19f2fe5
write Undivided Laplacian as point loop
pcarruscag 13e9572
write Centered_Dissipation_Sensor as a point loop
pcarruscag 733aef1
allow forcing of "reducer strategy" without warnings, fix some indent…
pcarruscag debb952
update fixedcl regression
pcarruscag b406fa3
methods to set natural colorings
pcarruscag fb500f2
fix FEA solver lock strategy
pcarruscag db8d771
make overhead of reducer strategy due to bad coloring same as when co…
pcarruscag f4dd41a
fuse "JST dissipation" loops, cleanup Euler/NS preprocessing
pcarruscag 775980e
fix virtual bug in SetPrimitiveVariables
pcarruscag 45e3cbc
fix some indentation in CDriver
pcarruscag 82e2d6d
Merge remote-tracking branch 'upstream/develop' into feature_hybrid_p…
pcarruscag 7fb78f8
add hybrid options to config_template
pcarruscag 2aaf89f
convert moving mesh part of turb solver's dual time residual to point…
pcarruscag fea625e
revise logic for flow variable reconstruction with MUSCL turbulence
pcarruscag cdf6b2c
Merge remote-tracking branch 'upstream/develop' into hybrid_parallel_…
pcarruscag b6361f1
update some testcases
pcarruscag
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -84,11 +84,6 @@ class CAkimaInterpolation final: public C1DInterpolation{ | |
| SetSpline(X,Data); | ||
| } | ||
|
|
||
| /*! | ||
| * \brief Destructor of the CAkimaInterpolation class. | ||
| */ | ||
| ~CAkimaInterpolation(){} | ||
|
|
||
|
Comment on lines -87 to -91 (Member, Author): This was removed for old compiler compatibility.
||
| /*! | ||
| * \brief for setting the cofficients for the Akima spline. | ||
| * \param[in] X - the x values. | ||
|
|
@@ -119,11 +114,6 @@ class CLinearInterpolation final: public C1DInterpolation{ | |
| SetSpline(X,Data); | ||
| } | ||
|
|
||
| /*! | ||
| * \brief Destructor of the CInletInterpolation class. | ||
| */ | ||
| ~CLinearInterpolation(){} | ||
|
|
||
| /*! | ||
| * \brief for setting the cofficients for Linear 'spline'. | ||
| * \param[in] X - the x values. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -28,6 +28,7 @@ | |
| #pragma once | ||
|
|
||
| #include "C2DContainer.hpp" | ||
| #include "../omp_structure.hpp" | ||
|
|
||
| #include <set> | ||
| #include <vector> | ||
|
|
@@ -59,6 +60,7 @@ class CCompressedSparsePattern { | |
| su2vector<Index_t> m_outerPtr; /*!< \brief Start positions of the inner indices for each outer index. */ | ||
| su2vector<Index_t> m_innerIdx; /*!< \brief Inner indices of the non zero entries. */ | ||
| su2vector<Index_t> m_diagPtr; /*!< \brief Position of the diagonal entry. */ | ||
| su2vector<Index_t> m_innerIdxTransp; /*!< \brief Position of the transpose non zero entries, requires symmetry. */ | ||
|
|
||
| public: | ||
| using IndexType = Index_t; | ||
|
|
@@ -107,10 +109,30 @@ class CCompressedSparsePattern { | |
| if(!m_diagPtr.empty()) return; | ||
|
|
||
| m_diagPtr.resize(getOuterSize()); | ||
|
|
||
| SU2_OMP_PARALLEL_(for schedule(static,roundUpDiv(getOuterSize(),omp_get_max_threads()))) | ||
| for(Index_t k = 0; k < getOuterSize(); ++k) | ||
| m_diagPtr(k) = findInnerIdx(k,k); | ||
| } | ||
|
|
||
| /*! | ||
| * \brief Build a list of pointers to the transpose entries of the pattern, requires symmetry. | ||
| */ | ||
| void buildTransposePtr() { | ||
| if(!m_innerIdxTransp.empty()) return; | ||
|
|
||
| m_innerIdxTransp.resize(getNumNonZeros()); | ||
|
|
||
| SU2_OMP_PARALLEL_(for schedule(static,roundUpDiv(getOuterSize(),omp_get_max_threads()))) | ||
| for(Index_t i = 0; i < getOuterSize(); ++i) { | ||
| for(Index_t k = m_outerPtr(i); k < m_outerPtr(i+1); ++k) { | ||
| auto j = m_innerIdx(k); | ||
| m_innerIdxTransp(k) = findInnerIdx(j,i); | ||
| assert(m_innerIdxTransp(k) != m_innerIdx.size() && "The pattern is not symmetric."); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /*! | ||
| * \return True if the pattern is empty, i.e. has not been built yet. | ||
| */ | ||
|
|
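The `buildTransposePtr` hunk above stores, for every non-zero at position `k` holding entry (i, j), the position of the mirrored entry (j, i). A minimal standalone sketch of that idea follows. It assumes `std::vector` in place of `su2vector`, a hypothetical `CsrPattern` struct in place of `CCompressedSparsePattern`, and that a failed lookup returns the number of non-zeros (as the assert in the diff suggests). The OpenMP scheduling used in the PR is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CCompressedSparsePattern (not the SU2 class).
struct CsrPattern {
  std::vector<std::size_t> outerPtr;  // start of each row in innerIdx, size nRows+1
  std::vector<std::size_t> innerIdx;  // column index of each non-zero

  std::size_t numRows() const { return outerPtr.size() - 1; }

  // Position of entry (row, col) in innerIdx, or innerIdx.size() if it is absent
  // (assumed to match the behaviour of findInnerIdx in the diff).
  std::size_t find(std::size_t row, std::size_t col) const {
    for (std::size_t k = outerPtr[row]; k < outerPtr[row + 1]; ++k)
      if (innerIdx[k] == col) return k;
    return innerIdx.size();
  }
};

// For each non-zero k = (i, j), record the position of (j, i).
// Requires a structurally symmetric pattern.
std::vector<std::size_t> buildTransposeMap(const CsrPattern& p) {
  std::vector<std::size_t> transp(p.innerIdx.size());
  for (std::size_t i = 0; i < p.numRows(); ++i) {
    for (std::size_t k = p.outerPtr[i]; k < p.outerPtr[i + 1]; ++k) {
      const std::size_t j = p.innerIdx[k];
      transp[k] = p.find(j, i);
      assert(transp[k] != p.innerIdx.size() && "The pattern is not symmetric.");
    }
  }
  return transp;
}
```

For a 3x3 tridiagonal pattern (`outerPtr = {0,2,5,7}`, `innerIdx = {0,1,0,1,2,1,2}`) diagonal entries map to themselves and each off-diagonal entry maps to its mirror, so the map is an involution.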
@@ -224,6 +246,14 @@ class CCompressedSparsePattern { | |
| return m_diagPtr.data(); | ||
| } | ||
|
|
||
| /*! | ||
| * \return Raw pointer to the transpose pointer vector. | ||
| */ | ||
| inline const su2vector<Index_t>& transposePtr() const { | ||
| assert(!m_innerIdxTransp.empty() && "Transpose map has not been built."); | ||
| return m_innerIdxTransp; | ||
| } | ||
|
|
||
| /*! | ||
| * \return The minimum inner index. | ||
| */ | ||
|
|
@@ -384,6 +414,30 @@ CEdgeToNonZeroMap<Index_t> mapEdgesToSparsePattern(Geometry_t& geometry, | |
| } | ||
|
|
||
|
|
||
| /*! | ||
| * \brief Create the natural coloring (equivalent to the normal sequential loop | ||
| * order) for a given number of inner indexes. | ||
| * \note This is to reduce overhead in "OpenMP-ready" code when only 1 thread is used. | ||
| * \param[in] numInnerIndexes - Number of indexes that are to be colored. | ||
| * \return Natural (sequential) coloring of the inner indices. | ||
| */ | ||
| template<class T = CCompressedSparsePatternUL, | ||
| class Index_t = typename T::IndexType> | ||
| T createNaturalColoring(Index_t numInnerIndexes) | ||
| { | ||
| /*--- One color. ---*/ | ||
| su2vector<Index_t> outerPtr(2); | ||
| outerPtr(0) = 0; | ||
| outerPtr(1) = numInnerIndexes; | ||
|
|
||
| /*--- Containing all indexes in ascending order. ---*/ | ||
| su2vector<Index_t> innerIdx(numInnerIndexes); | ||
| std::iota(innerIdx.data(), innerIdx.data()+numInnerIndexes, 0); | ||
|
|
||
| return T(std::move(outerPtr), std::move(innerIdx)); | ||
| } | ||
|
|
||
|
|
||
| /*! | ||
| * \brief Color contiguous groups of outer indices of a sparse pattern such that | ||
| * within each color, any two groups do not have inner indices in common. | ||
|
|
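The `createNaturalColoring` hunk above builds a coloring with a single color that lists every index in ascending order, so a color-by-color loop degenerates to the ordinary sequential loop. The sketch below illustrates this with plain `std::vector`; the `Coloring` struct and `forEachColored` helper are hypothetical names used only for this example, not SU2 types.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical compressed layout: color c owns innerIdx[outerPtr[c] .. outerPtr[c+1]).
struct Coloring {
  std::vector<std::size_t> outerPtr;
  std::vector<std::size_t> innerIdx;
  std::size_t numColors() const { return outerPtr.size() - 1; }
};

// One color containing indices 0..n-1 in ascending order.
Coloring naturalColoring(std::size_t n) {
  Coloring c;
  c.outerPtr = {0, n};
  c.innerIdx.resize(n);
  std::iota(c.innerIdx.begin(), c.innerIdx.end(), std::size_t{0});
  return c;
}

// Visit all indices color by color; colors are processed one after another,
// and only the inner loop would be shared between threads.
template <class F>
void forEachColored(const Coloring& c, F&& visit) {
  for (std::size_t color = 0; color < c.numColors(); ++color)
    for (std::size_t k = c.outerPtr[color]; k < c.outerPtr[color + 1]; ++k)
      visit(c.innerIdx[k]);
}
```

With the natural coloring, `forEachColored` visits 0, 1, ..., n-1 in order, which is why the diff returns it for the trivial case `groupSize >= nOuter`.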
@@ -404,7 +458,7 @@ CEdgeToNonZeroMap<Index_t> mapEdgesToSparsePattern(Geometry_t& geometry, | |
| * \param[out] indexColor - Optional, vector with colors given to the outer indices. | ||
| * \return Coloring in the same type of the input pattern. | ||
| */ | ||
| template<class T, typename Color_t = char, size_t MaxColors = 64, size_t MaxMB = 128> | ||
| template<class T, typename Color_t = char, size_t MaxColors = 32, size_t MaxMB = 128> | ||
| T colorSparsePattern(const T& pattern, size_t groupSize = 1, bool balanceColors = false, | ||
| std::vector<Color_t>* indexColor = nullptr) | ||
| { | ||
|
|
@@ -415,6 +469,10 @@ T colorSparsePattern(const T& pattern, size_t groupSize = 1, bool balanceColors | |
|
|
||
| const Index_t grpSz = groupSize; | ||
| const Index_t nOuter = pattern.getOuterSize(); | ||
|
|
||
| /*--- Trivial case. ---*/ | ||
| if(groupSize >= nOuter) return createNaturalColoring(nOuter); | ||
|
|
||
| const Index_t minIdx = pattern.getMinInnerIdx(); | ||
| const Index_t nInner = pattern.getMaxInnerIdx()+1-minIdx; | ||
|
|
||
|
|
@@ -520,30 +578,6 @@ T colorSparsePattern(const T& pattern, size_t groupSize = 1, bool balanceColors | |
| } | ||
|
|
||
|
|
||
| /*! | ||
| * \brief Create the natural coloring (equivalent to the normal sequential loop | ||
| * order) for a given number of inner indexes. | ||
| * \note This is to reduce overhead in "OpenMP-ready" code when only 1 thread is used. | ||
| * \param[in] numInnerIndexes - Number of indexes that are to be colored. | ||
| * \return Natural (sequential) coloring of the inner indices. | ||
| */ | ||
| template<class T = CCompressedSparsePatternUL, | ||
| class Index_t = typename T::IndexType> | ||
| T createNaturalColoring(Index_t numInnerIndexes) | ||
| { | ||
| /*--- One color. ---*/ | ||
| su2vector<Index_t> outerPtr(2); | ||
| outerPtr(0) = 0; | ||
| outerPtr(1) = numInnerIndexes; | ||
|
|
||
| /*--- Containing all indexes in ascending order. ---*/ | ||
| su2vector<Index_t> innerIdx(numInnerIndexes); | ||
| std::iota(innerIdx.data(), innerIdx.data()+numInnerIndexes, 0); | ||
|
|
||
| return T(std::move(outerPtr), std::move(innerIdx)); | ||
| } | ||
|
|
||
|
|
||
| /*! | ||
| * \brief A way to represent one grid color that allows range-for syntax. | ||
| */ | ||
|
|
@@ -553,9 +587,11 @@ struct GridColor | |
| static_assert(std::is_integral<T>::value,""); | ||
|
|
||
| const T size; | ||
| T groupSize; | ||
| const T* const indices; | ||
|
|
||
| GridColor(const T* idx = nullptr, T sz = 0) : size(sz), indices(idx) { } | ||
| GridColor(const T* idx = nullptr, T sz = 0, T grp = 0) : | ||
| size(sz), groupSize(grp), indices(idx) { } | ||
|
|
||
| inline const T* begin() const {return indices;} | ||
| inline const T* end() const {return indices+size;} | ||
|
|
@@ -592,3 +628,23 @@ struct DummyGridColor | |
| inline IteratorLikeInt begin() const {return IteratorLikeInt(0);} | ||
| inline IteratorLikeInt end() const {return IteratorLikeInt(size);} | ||
| }; | ||
|
|
||
|
|
||
| /*! | ||
| * \brief Computes the efficiency of a grid coloring for given number of threads and chunk size. | ||
| */ | ||
| template<class SparsePattern> | ||
| su2double coloringEfficiency(const SparsePattern& coloring, int numThreads, int chunkSize) | ||
| { | ||
| using Index_t = typename SparsePattern::IndexType; | ||
|
|
||
| /*--- Ideally compute time is proportional to total work over number of threads. ---*/ | ||
| su2double ideal = coloring.getNumNonZeros() / su2double(numThreads); | ||
|
|
||
| /*--- In practice the total work is quantized first by colors and then by chunks. ---*/ | ||
| Index_t real = 0; | ||
| for(Index_t color = 0; color < coloring.getOuterSize(); ++color) | ||
| real += chunkSize * roundUpDiv(roundUpDiv(coloring.getNumNonZeros(color), chunkSize), numThreads); | ||
|
Comment on lines +641 to +647 (Member, Author): The computation of coloring efficiency is described here; it is just a simple heuristic.
||
|
|
||
| return ideal / real; | ||
| } | ||
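To make the heuristic in `coloringEfficiency` concrete, here is a standalone sketch that takes the number of items per color directly instead of a sparse pattern. It assumes `roundUpDiv` is ceiling division (its definition is not shown in this diff) and adds a guard for an empty coloring that the original does not have.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed semantics of roundUpDiv: integer division rounded up.
inline std::size_t roundUpDiv(std::size_t n, std::size_t d) { return (n + d - 1) / d; }

// colorSizes[c] is the number of items (e.g. edges) in color c.
double coloringEfficiency(const std::vector<std::size_t>& colorSizes,
                          std::size_t numThreads, std::size_t chunkSize) {
  assert(numThreads > 0 && chunkSize > 0);

  std::size_t total = 0;  // total work over all colors
  std::size_t real = 0;   // work on the critical path, in items
  for (std::size_t n : colorSizes) {
    total += n;
    // Each color is split into chunks, chunks are dealt out to threads, and
    // the slowest thread processes ceil(numChunks / numThreads) full chunks.
    real += chunkSize * roundUpDiv(roundUpDiv(n, chunkSize), numThreads);
  }
  if (real == 0) return 1.0;  // guard added for this sketch only

  // Ideal time is total work divided evenly over the threads.
  const double ideal = double(total) / double(numThreads);
  return ideal / double(real);
}
```

Two balanced colors of 1000 items with 4 threads and chunks of 250 give an efficiency of 1.0. Replacing one color with only 10 items still costs a full chunk on the critical path, so the efficiency drops to about 0.505.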
As usual, we define "do-nothing" types and functions when compiling without hybrid parallel support, so that the code compiles either way without scattering #ifdefs everywhere.
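The "do-nothing" approach can be sketched as follows. This is an illustration of the pattern, not the contents of `omp_structure.hpp`: the fallback definitions reuse the names of the standard OpenMP runtime routines, while `PARALLEL_FOR` and `lockedSum` are hypothetical names introduced for the example.

```cpp
#include <cassert>

#ifdef _OPENMP
#include <omp.h>
#define PARALLEL_FOR _Pragma("omp parallel for")
#else
// Fallbacks used when OpenMP is disabled: same names, no effect.
using omp_lock_t = int;
inline int omp_get_max_threads() { return 1; }
inline int omp_get_thread_num() { return 0; }
inline void omp_init_lock(omp_lock_t*) {}
inline void omp_set_lock(omp_lock_t*) {}
inline void omp_unset_lock(omp_lock_t*) {}
inline void omp_destroy_lock(omp_lock_t*) {}
#define PARALLEL_FOR
#endif

// Calling code is written once and contains no #ifdef.
inline long lockedSum(const int* v, int n) {
  omp_lock_t lock;
  omp_init_lock(&lock);
  long sum = 0;
  PARALLEL_FOR
  for (int i = 0; i < n; ++i) {
    omp_set_lock(&lock);    // real lock with OpenMP, no-op otherwise
    sum += v[i];
    omp_unset_lock(&lock);
  }
  omp_destroy_lock(&lock);
  return sum;
}
```

The same source builds with or without `-fopenmp`; in the serial build the lock calls are empty inline functions that the compiler can remove entirely.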