parallelize hot spots #2438
The Grid::GetRange() method would be a good "first issue" for parallelization: it is simple, can be expensive with larger data, and is invoked every time a new variable/time-step is accessed. A good example showing how to use OpenMP with Grid iterators is the test_get_value() function inside vapor/test_apps/datamgr/test_datamgr.cpp. vapor/test_apps/grid_iter/test_grid_iter.cpp might serve as a starting point for a unit test.
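For concreteness, here is a minimal sketch of what an OpenMP range computation could look like. This is not VAPOR's actual implementation: it operates on a flat std::vector rather than a Grid and its iterators, and the function name and missing-value handling are illustrative only.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch of a parallel min/max range computation.
// The real Grid::GetRange() would iterate a Grid, not a vector.
void GetRangeSketch(const std::vector<float> &data, float range[2]) {
    float vmin = std::numeric_limits<float>::max();
    float vmax = std::numeric_limits<float>::lowest();

    // OpenMP 3.1+ supports min/max reductions; each thread keeps a
    // private min/max that is combined after the loop.
    #pragma omp parallel for reduction(min : vmin) reduction(max : vmax)
    for (size_t i = 0; i < data.size(); i++) {
        float v = data[i];
        if (!std::isnan(v)) {  // skip missing values, here encoded as NaN
            if (v < vmin) vmin = v;
            if (v > vmax) vmax = v;
        }
    }
    range[0] = vmin;
    range[1] = vmax;
}
```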
@clyne John, I feel parallelization should not be a priority right now, because the existing codebase has too many issues. I'm afraid that once we move on to parallelization, both of the following scenarios will come true:
I'm particularly concerned about the existing memory issues. To be clear, even serial code that has no memory issues typically needs some extra care when it is parallelized. One thing you definitely don't want before moving to parallelization is serial code that is already showing many memory issues. To illustrate my point, I opened VAPOR, loaded a CF dataset, closed the dataset, and closed VAPOR, all monitored by Valgrind, which reported the following issues:
On the one hand, a lot of these reported errors come from Qt and Python, so only a fraction of them are caused by VAPOR itself. On the other hand, my test only involved opening and closing a dataset; no real rendering job was performed, so the actual number of issues is likely much bigger. It is true that many memory issues won't stop a program from being parallelized, but the sheer number of detected issues is very worrisome. For reference, and not to brag or anything, my compression code (much less complex than VAPOR, but still fairly complex) has zero memory errors, so it is very doable. So my suggestion is to address code quality first (at minimum, eliminate most of the memory errors, and possibly take other measures) before we deploy parallelization. Compared to an ad hoc bug-busting approach, this will save a huge amount of overall development time.
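To illustrate why serial code needs care when parallelized even if it is memory-clean, consider this made-up example (not VAPOR code): state that is perfectly valid in serial use becomes a data race once the loop runs on multiple threads, and Memcheck alone won't catch it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration (not VAPOR code): a shared scratch variable
// that is harmless serially becomes a data race under OpenMP.
static float lastValue = 0.0f;  // fine in serial code, racy in parallel

float SumDeltas(const std::vector<float> &v) {
    float sum = 0.0f;
    #pragma omp parallel for reduction(+ : sum)
    for (size_t i = 0; i < v.size(); i++) {
        // Reading and writing 'lastValue' from many threads is a race:
        // Valgrind's Memcheck sees nothing wrong in the serial build,
        // but the parallel results are silently corrupted. A thread
        // checker such as Helgrind or DRD is needed for this bug class.
        sum += v[i] - lastValue;
        lastValue = v[i];
    }
    return sum;
}
```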
Fair enough. Then I'd like to task you with taking a memory sanitation pass for 3.6. You might want to chat with @StasJ. At one point he had some scripts to remove the Qt memory errors from the Valgrind report.
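As a side note, Valgrind's built-in suppression mechanism can serve the same purpose as such scripts: running with --gen-suppressions=all prints ready-made suppression entries, which can then be fed back via --suppressions=<file>. A hypothetical entry for hiding Qt-internal leaks might look like this (the entry name and library pattern are illustrative, not taken from an actual VAPOR report):

```
{
   ignore_qt_internal_leaks
   Memcheck:Leak
   match-leak-kinds: all
   ...
   obj:*libQt5Core*
}
```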
I can take on this task, and I don't think a script is needed, since Valgrind provides a detailed report on where each memory error comes from. I'm thinking of extracting individual memory errors from the Valgrind report, which describes the nature of each error and where it occurs, and then assigning each error to the individuals responsible for that file. Does this sound like a good approach to you @clyne ?
Well, I'll have >5000 lines of Valgrind report to analyze (just from opening and closing VAPOR) in order to pinpoint individual errors. I think I'll provide assistance, but it'll be up to the real owner to actually fix each error. P.S. It's a good learning process for a programmer to look back at their code and realize that some old habits are error-prone. Update: this issue is dependent on issue #2870
Fixed by #3091
We've held off on parallelizing much of VAPOR because Apple has not adopted support for OpenMP in their version of the Clang compiler. It's becoming pretty apparent that Apple may never support OpenMP. However, the generic version of Clang can easily be installed on the Mac via Homebrew or MacPorts. We should explore using these compilers and parallelizing many of the hot spots in the code. Preliminary investigation indicates that the Grid class iterators are thread safe and can easily be used in parallel.
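As a sketch of that pattern, the loop below partitions an iterator range across OpenMP threads so that each thread advances a private copy of the iterator over a disjoint chunk; it relies only on the thread-safety property noted above. The template and names are illustrative stand-ins, not the VAPOR Grid API.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <omp.h>

// Hypothetical pattern for using thread-safe iterators in parallel:
// each thread owns a private iterator positioned at a disjoint chunk.
template <typename Iter>
double ParallelSum(Iter first, size_t n) {
    double sum = 0.0;
    #pragma omp parallel reduction(+ : sum)
    {
        size_t nthreads = (size_t)omp_get_num_threads();
        size_t tid = (size_t)omp_get_thread_num();
        size_t chunk = (n + nthreads - 1) / nthreads;
        size_t begin = std::min(n, tid * chunk);
        size_t end = std::min(n, begin + chunk);

        Iter it = first;            // private copy per thread
        std::advance(it, begin);    // position at this thread's chunk
        for (size_t i = begin; i < end; ++i, ++it) sum += *it;
    }
    return sum;
}

// Usage with any forward iterator, e.g. a std::vector<float> v:
//   double s = ParallelSum(v.begin(), v.size());
```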