-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dist2 initial improvements #623
Dist2 initial improvements #623
Conversation
* Filtering remote columns, making MueLu aggregation actually work on distributed problems (apart for a nondeterministic bug in MueLu UncoupledAggregation_kokkos) * Removed old distance-2 serial algorithm from distance-1 coloring handle * Implemented a much faster (>= 18x) net-based serial algorithm to replace the old one * Implemented a modestly faster (2x) net-based parallel algorithm (COLORING_D2_VB_DYNAMIC) - more improvement should be possible in future
meaning net-based, with bitset forbidden.
* Don't require a number after "--serial" * Do require a number after "--cuda" (the device ID) * If --flag expects another arg after it, but at the end of argc/argv, print a message saying that instead of crashign
Let me understand this, we still have the d2 bug when diagonal entries are not present. Otherwise, things are ok ? |
@srajama1 Correct. |
Ok, reviewing it now. Can we file two issues, one for these changes. one for the missing diagonal giving wrong results ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brian-kelley : Thanks for the work on cleaning this up.
Two minor comments. No need to rerun spot check if you end up fixing the typo.
const_lno_nnz_view_t Ccolinds; | ||
const nnz_lno_type numVerts; | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to write this algorithm up in latex.
Thanks, @brian-kelley ! The issues will help @ndellingwood track these for the release. We also need to update the documentation. |
COLORING_D2_NB_BIT
). It gives a 2x speedup over VB and VB_BIT on GPU, but can still be improved.Also, @srajama1 and I just found out about a serious bug in all D2 algorithms where D1 conflicts ignored (the algorithms assume that diagonal edges exist, but this is wrong). The D2 coloring verification has the same bug, so it was never caught in testing. This PR is still a step in the right direction (doesn't introduce bugs) so I will just submit it as-is and work on the next fix on top of these changes.
RIDE spot check:
#######################################################
PASSED TESTS
#######################################################
cuda-10.1.105-Cuda_OpenMP-release build_time=619 run_time=416
cuda-10.1.105-Cuda_Serial-release build_time=498 run_time=518
cuda-9.2.88-Cuda_OpenMP-release build_time=550 run_time=418
cuda-9.2.88-Cuda_Serial-release build_time=521 run_time=520
gcc-6.4.0-OpenMP_Serial-release build_time=215 run_time=393
gcc-7.2.0-OpenMP-release build_time=122 run_time=128
gcc-7.2.0-OpenMP_Serial-release build_time=194 run_time=360
gcc-7.2.0-Serial-release build_time=120 run_time=226
ibm-16.1.0-Serial-release build_time=490 run_time=392
Bowman spot check:
#######################################################
PASSED TESTS
#######################################################
intel-16.4.258-Pthread-release build_time=768 run_time=1033
intel-16.4.258-Pthread_Serial-release build_time=1112 run_time=1910
intel-16.4.258-Serial-release build_time=723 run_time=820
intel-17.2.174-OpenMP-release build_time=958 run_time=560
intel-17.2.174-OpenMP_Serial-release build_time=1364 run_time=1456
intel-17.2.174-Pthread-release build_time=870 run_time=896
intel-17.2.174-Pthread_Serial-release build_time=1298 run_time=1781
intel-17.2.174-Serial-release build_time=827 run_time=883