Enable MPI to run on arbitrary number of baselines #16
This way we're not having some ranks sitting idle.
I think this all looks okay. The only change I'm not sure we want to introduce with the current implementation is printing the output on all ranks. I think it might get visually messy if all ranks are printing to stdout simultaneously? It would be useful to have the verbose output from different ranks in the future, though. I think we could maybe do this via:
What do you think?
Agreed - it is messy, but it was useful for checking progress. I guess it depends on what the verbose output is used for? For me it was to check that the code was running, so all I really need is the rank, iteration, and timestamp to know that something is still happening for each baseline. How would you typically use the verbose output?
Adding it as an issue (https://github.com/UoMResearchIT/hydra-mpi-issues/issues/21) sounds like a good idea. The verbose output is very useful for diagnosing how the code is running, so we'll definitely want to keep that information somewhere. It provides real-time information about the execution time and performance of the linear system solve step. For example, if the chi-squared value is large, that tells us the model is a bad approximation to the data and is a good indicator that we should change the model. Or, if the info column is non-zero, there was a numerical error in the linear system solve step. I guess in the meantime, we can just revert that commit?
Ok great. I'll revert that commit and we'll just output on rank 0 for now. I'll add these thoughts to the other issue so they're captured and can be tackled together as a unit of work.
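For reference, a minimal sketch of what rank-gated verbose output could look like, covering both the rank-0-only default agreed above and the rank/iteration/timestamp heartbeat mentioned earlier. This is only an illustration and not the code in this PR: `verbose_line` and `verbose_ranks` are hypothetical names, and `rank` is assumed to come from something like `comm.Get_rank()` in mpi4py.

```python
import time


def verbose_line(rank, iteration, message, verbose_ranks=(0,)):
    """Format a progress line, or return None if this rank should stay quiet.

    Hypothetical helper: `rank` would normally come from the MPI
    communicator (e.g. comm.Get_rank() in mpi4py); it is passed in
    explicitly here so the sketch runs without MPI.
    """
    if rank not in verbose_ranks:
        return None
    stamp = time.strftime("%H:%M:%S")
    return f"[rank {rank} | iter {iteration} | {stamp}] {message}"


if __name__ == "__main__":
    # Pretend we are four ranks; only rank 0 prints by default.
    for rank in range(4):
        line = verbose_line(rank, iteration=7, message="linear solve done")
        if line is not None:
            print(line)
```

Passing a wider `verbose_ranks` (for example `range(n_ranks)`) would turn the per-rank heartbeat back on without touching the call sites.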
ce293b3 to db481c7
The code should now run even when the number of baselines doesn't match the number of ranks.
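As an illustration of the idea, here is a minimal sketch of one way to split an arbitrary number of baselines across ranks so that per-rank counts differ by at most one. It is an assumption about the approach, not necessarily what this PR implements: `baseline_slice` is a hypothetical name, and `rank` and `n_ranks` are assumed to come from `comm.Get_rank()` and `comm.Get_size()` in mpi4py.

```python
def baseline_slice(n_baselines, rank, n_ranks):
    """Return the contiguous range of baseline indices owned by `rank`.

    Hypothetical helper: the first (n_baselines % n_ranks) ranks each
    take one extra baseline, so the load is as even as possible.
    """
    base, extra = divmod(n_baselines, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


if __name__ == "__main__":
    # 10 baselines over 4 ranks: sizes 3, 3, 2, 2.
    for rank in range(4):
        print(rank, list(baseline_slice(10, rank, 4)))
```

With this scheme, if there are fewer baselines than ranks, the trailing ranks receive an empty range and should skip the per-baseline work rather than fail.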