Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parallelize hot spots #2438

Closed
clyne opened this issue Nov 11, 2020 · 7 comments
Closed

parallelize hot spots #2438

clyne opened this issue Nov 11, 2020 · 7 comments
Assignees
Milestone

Comments

@clyne
Copy link
Collaborator

clyne commented Nov 11, 2020

We've held off on parallelizing much of vapor because Apple has not adopted support for OpenMP in their version of the clang compiler. It's getting pretty appear that apple may never support OMP. However, the generic version of clang can easily be installed on the Mac via homebrew or macports. We should explore using these compilers and parallelizing many of the hot spots in the code. Preliminary investigation indicates that the Grid class iterators are thread safe and can easily be used in parallel.

@clyne
Copy link
Collaborator Author

clyne commented Sep 30, 2021

The Grid::GetRange() method would be a good "first issue" for parallelization: it is simple, can be expensive with larger data, and is invoked every time a new variable/time-step is accessed.

A good example showing how to use OpenMP with Grid Iterators is the test_get_value() function inside of vapor/test_apps/./datamgr/test_datamgr.cpp

vapor/test_apps/grid_iter/test_grid_iter.cpp might serve as a starting point for a unit test

@clyne clyne added the High label Sep 30, 2021
@clyne clyne added this to the 3_6_0 Release milestone Sep 30, 2021
@shaomeng
Copy link
Contributor

shaomeng commented Sep 30, 2021

@clyne John, I feel parallelization should not be a priority right now, and the reason is that existing codebase has too many issues. I'm afraid that once we move on to parallelization, both of the following scenarios will come true:

  1. The majority of the development time will be spent on debugging existing code, and on an ad hoc basis;
  2. There will still be many bugs not found during development and only show up when a user starts to use VAPOR with 64 threads and doing heavy lifting work.

I'm particularly concerned about the existing memory issues. To make it clear, even serial code that has no memory issues would likely need some taking care of when it is parallelized. One thing you definitely don't want to have before moving to parallelization is that the serial code is already showing many memory issues.

To explain my point, I opened VAPOR, loaded a CF dataset, closed the dataset, and closed VAPOR. This process is monitored by Valgrind and it reports the following issues:

  • Conditional jump or move depends on uninitialised value(s) 40 occurrences.
  • Invalid read of size xxx 119 occurrences.
  • Use of uninitialised value of size xxx 30 occurences.
  • invalid file descriptor 6 occurences.
  • definitely lost memory blocks: 222.

On the one hand, a lot of these reported errors are from QT and Python, so only a fraction of these numbers are caused by VAPOR. On the other hand, my test only involves open and close a dataset; there isn't even a real rendering job performed, so the actual number of issues is likely much bigger. It is true that many memory issues won't stop a program from being parallelized, but the sheer number of detected issues is very worrisome.

To provide a reference here and not to brag or anything, my compression code (much less complex than VAPOR, but still fairly complex) has zero memory error, so it is something very doable.

So my suggestion is to address the code quality first (eliminate much of memory errors at least, possibly take other measures) before we deploy parallelization. This approach will save overall development time by a huge amount, compared to an ad hoc bug busting approach.

@clyne
Copy link
Collaborator Author

clyne commented Oct 1, 2021

Fair enough. Then I'd like to task you with taking a memory sanitation pass for 3.6. You might want to chat with @StasJ. At one point he had some scripts to remove the Qt memory errors from the valgrind report.

@shaomeng
Copy link
Contributor

shaomeng commented Oct 1, 2021

I can take on this task, and I don't think a script is needed, since Valgrind provides detailed report on where the memory error is from.

I'm thinking of extracting individual memory errors from the Valgrind report which contains the nature of the error and where they are, and then assign each error to individuals who're responsible for that file. Does this sound like a good approach to you @clyne ?

@clyne
Copy link
Collaborator Author

clyne commented Oct 4, 2021

Great! I'd ask to please try and address as many of these yourself as possible, but feel free to engage the "code owner" if something is not straightforward. @StasJ and @sgpearse already have plenty on their plate for 3.6.

@shaomeng
Copy link
Contributor

shaomeng commented Oct 4, 2021

Well, I'll have >5000 lines of Valgrind report to analyze (just from opening and closing VAPOR) so that I can achieve pinpointing individual errors. I think I'll provide assistance but it'll be up to the real owner to actually fix the error.

P.S. It's a good learning process for a programmer to look back their code and realize that some of the old habits are error-prone.

Update: this issue is dependent on issue #2870

@clyne clyne removed this from the 3_6_0 Release milestone Oct 19, 2021
@clyne clyne modified the milestones: 3_6_0 Release, 3_7_0_Release Mar 17, 2022
@clyne
Copy link
Collaborator Author

clyne commented Jul 27, 2022

Fixed by #3091

@clyne clyne closed this as completed Jul 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants