Enable MPI to run on arbitrary number of baselines #16

gcapes · 2024-04-18T13:54:16Z

The code should now run even when the number of baselines doesn't match the number of ranks.

This way we're not having some ranks sitting idle.
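For readers following along, here is a minimal sketch of one way to split baselines over a number of ranks that doesn't divide them evenly. It is not necessarily what the commits in this PR do. The `split_baselines` helper and the example antenna pairs are hypothetical, and `rank` and `size` are passed in as plain integers so the snippet runs without MPI. In an mpi4py program they would typically come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`.

```python
def split_baselines(baselines, rank, size):
    """Return the contiguous chunk of `baselines` assigned to `rank`.

    Hypothetical helper: `rank` and `size` stand in for the values an MPI
    communicator would report.
    """
    n_baselines = len(baselines)
    base, extra = divmod(n_baselines, size)
    # The first `extra` ranks each take one additional baseline, so chunk
    # sizes differ by at most one.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return baselines[start:stop]


# Illustrative antenna pairs only: 7 baselines shared between 3 ranks.
baselines = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (2, 4)]
assignment = [split_baselines(baselines, rank, 3) for rank in range(3)]
```

With 7 baselines and 3 ranks, the chunks hold 3, 2 and 2 baselines, and every baseline is assigned exactly once.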

jburba · 2024-04-23T09:39:28Z

I think this all looks okay. The only change I don't know if we want to introduce with the current implementation is printing the output on all ranks. I think it might get visually messy if all ranks are printing simultaneously to stdout?

It would be useful to have the verbose outputs on different ranks in the future, though. I think we could maybe do this via:

- Log files for each baseline that get written / appended to in the output directory for each rank
- Adding a column to the verbose output with the baseline antenna pair so we know which numbers correspond to what baseline/rank
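As a rough sketch of how those two ideas might fit together, each rank could append to its own log file and put the antenna pair in the first column. Everything here is an assumption made for illustration: the file naming, the tab-separated layout, and the `iteration`, `chi2` and `info` columns are placeholders, not the code's actual verbose format.

```python
import os
import tempfile


def rank_log_path(output_dir, rank):
    """Hypothetical naming scheme: one log file per rank."""
    return os.path.join(output_dir, f"verbose_rank{rank:04d}.log")


def append_verbose_line(output_dir, rank, baseline, iteration, chi2, info):
    """Append one tab-separated line, led by the baseline antenna pair."""
    ant1, ant2 = baseline
    line = f"{ant1}-{ant2}\t{iteration}\t{chi2:.3e}\t{info}\n"
    # Append mode, so repeated calls build up the log instead of overwriting it.
    with open(rank_log_path(output_dir, rank), "a") as log_file:
        log_file.write(line)
    return line


# Example with made-up values, written to a temporary directory.
output_dir = tempfile.mkdtemp()
append_verbose_line(output_dir, rank=1, baseline=(0, 2), iteration=0, chi2=1.25, info=0)
append_verbose_line(output_dir, rank=1, baseline=(0, 2), iteration=1, chi2=1.10, info=0)
with open(rank_log_path(output_dir, 1)) as log_file:
    logged = log_file.read().splitlines()
```

Because each rank writes only to its own file, nothing is interleaved on stdout, and the leading column still identifies the baseline if a rank handles more than one.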

What do you think?

gcapes · 2024-04-23T10:04:00Z

Agreed - it is messy, but was useful to check progress.
Perhaps in the short term it would be best if I revert that particular change, and add the log file idea to https://github.com/UoMResearchIT/hydra-mpi-issues/issues/21?

I guess it depends what the verbose output is useful for? For me it was to check the code was running, so all I need really is rank/iteration/time stamp to know something is still happening for each baseline. How would you typically use the verbose output?

jburba · 2024-04-23T10:13:47Z

Adding it as an issue https://github.com/UoMResearchIT/hydra-mpi-issues/issues/21 sounds like a good idea. The verbose output is very useful for diagnosing how the code is running, so we'll definitely want to keep that information somewhere. It provides real-time information about the execution time and performance of the linear system solve step. For example, if the chi-squared value is large, that tells us the model is a bad approximation for the data, which is a good indicator that we should change the model. Or, if the info column is non-zero, there was a numerical error in the linear system solve step.

I guess in the meantime, we can just revert that commit?

gcapes · 2024-04-23T10:21:24Z

Ok great. I'll revert that commit and we'll just output on rank 0 for now, and will add the thoughts to the other issue so they're captured and can be tackled together as a unit of work.
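For reference, restricting output to rank 0 usually comes down to a small guard like the sketch below. The `vprint` name and its `verbose` flag are hypothetical, and `rank` is passed in as a plain integer so the snippet runs without MPI. In an mpi4py program it would typically come from `MPI.COMM_WORLD.Get_rank()`.

```python
def vprint(message, rank, verbose=True):
    """Print `message` only on rank 0; return whether anything was printed."""
    if verbose and rank == 0:
        print(message)
        return True
    return False


# Only the first call prints; the other ranks stay silent.
printed = [vprint(f"iteration 0 on rank {rank}", rank) for rank in range(4)]
```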

gcapes added 5 commits April 18, 2024 10:03

Remove unused imports

2e21b0a

Update project dependencies

394e711

Ignore results, test-data and pycharm directories

0d1a59a

Split baselines over arbitrary number of ranks

1b3d78c

Fix UoMResearchIT/hydra-mpi-issues#11

Abort if number of ranks > number of baselines

08deca7

This way we're not having some ranks sitting idle.

gcapes requested a review from jburba April 18, 2024 13:57

Give number of ranks/baselines in error message

db481c7
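A minimal sketch of the kind of check the last two commits describe, assuming a plain exception is acceptable. The `check_rank_count` name and the message wording are hypothetical. A real MPI program might instead print the message and call the communicator's `Abort()` method so that every rank stops.

```python
def check_rank_count(n_ranks, n_baselines):
    """Raise if there are more ranks than baselines, since some would sit idle."""
    if n_ranks > n_baselines:
        raise ValueError(
            f"Number of MPI ranks ({n_ranks}) exceeds number of baselines "
            f"({n_baselines}); rerun with at most {n_baselines} ranks."
        )


# Example: 8 ranks for 6 baselines triggers the error.
try:
    check_rank_count(8, 6)
    error_message = None
except ValueError as err:
    error_message = str(err)
```

Putting both counts in the message tells the user which number to change without having to rerun with extra debugging.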

gcapes force-pushed the arbitrary_n_baselines branch from ce293b3 to db481c7 Compare April 23, 2024 10:27

gcapes merged commit 3ef322d into main Apr 25, 2024

gcapes deleted the arbitrary_n_baselines branch May 22, 2024 08:24
