Enable MPI to run on arbitrary number of baselines #16

Merged
gcapes merged 6 commits into main from arbitrary_n_baselines
Apr 25, 2024


gcapes
Collaborator

@gcapes gcapes commented Apr 18, 2024

The code should now run even when the number of baselines doesn't match the number of ranks.
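For readers without the diff to hand, one common way to decouple the baseline count from the rank count is a round-robin assignment. The sketch below is only an illustration of that idea, not the scheme this PR implements; `baselines_for_rank` is a hypothetical helper, and `rank` and `size` are passed in as plain integers (with mpi4py they would typically come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`).

```python
def baselines_for_rank(n_baselines: int, rank: int, size: int) -> list[int]:
    """Return the baseline indices handled by `rank` out of `size` ranks.

    Hypothetical helper: a round-robin split, assumed for illustration only.
    """
    if size < 1 or not 0 <= rank < size:
        raise ValueError("rank must satisfy 0 <= rank < size")
    # Rank r takes baselines r, r + size, r + 2*size, ...
    # A rank with no work (more ranks than baselines) gets an empty list.
    return list(range(rank, n_baselines, size))
```

With 5 baselines on 2 ranks this gives rank 0 three baselines and rank 1 two; with 2 baselines on 4 ranks, ranks 2 and 3 simply receive an empty list and can skip the work loop.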

@gcapes gcapes requested a review from jburba April 18, 2024 13:57
@jburba
Contributor

jburba commented Apr 23, 2024

I think this all looks okay. The only change I don't know if we want to introduce with the current implementation is printing the output on all ranks. I think it might get visually messy if all ranks are printing simultaneously to stdout?

It would be useful to have the verbose outputs on different ranks in the future, though. I think we could maybe do this via:

  1. Log files for each baseline that get written / appended to in the output directory for each rank
  2. Adding a column to the verbose output with the baseline antenna pair so we know which numbers correspond to what baseline/rank
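The two ideas in the list above could be combined roughly as follows. This is a minimal sketch using only the standard `logging` module; the file name pattern `rank_<rank>.log`, the logger name, the column layout, and both helper functions are assumptions for illustration, not anything that exists in the code base.

```python
import logging
from pathlib import Path


def make_rank_logger(out_dir, rank: int) -> logging.Logger:
    """Create a logger that appends to <out_dir>/rank_<rank>.log (hypothetical layout)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(f"hydra_sketch.rank{rank}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep per-rank rows out of the root logger / stdout
    if not logger.handlers:  # avoid adding a second handler on repeated calls
        handler = logging.FileHandler(out_dir / f"rank_{rank}.log", mode="a")
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger


def format_row(baseline, iteration, chi2, info) -> str:
    """Format one verbose row, prefixed with the baseline antenna pair."""
    ant1, ant2 = baseline
    return f"bl=({ant1},{ant2}) iter={iteration} chi2={chi2:.3e} info={info}"
```

Each rank would call `make_rank_logger(out_dir, rank)` once and then `logger.info(format_row(...))` per iteration, so the time stamp, baseline, and solver columns all end up in a file that belongs to that rank alone.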

What do you think?

@gcapes
Collaborator Author

gcapes commented Apr 23, 2024

Agreed - it is messy, but was useful to check progress.
Perhaps in the short term it would be best if I revert that particular change, and add the log file idea to https://github.com/UoMResearchIT/hydra-mpi-issues/issues/21?

I guess it depends what the verbose output is useful for? For me it was to check the code was running, so all I need really is rank/iteration/time stamp to know something is still happening for each baseline. How would you typically use the verbose output?

@jburba
Contributor

jburba commented Apr 23, 2024

Adding it as an issue https://github.com/UoMResearchIT/hydra-mpi-issues/issues/21 sounds like a good idea. The verbose output is very useful for diagnosing how the code is running, so we'll definitely want to keep that information somewhere. It provides real-time information about the execution time and performance of the linear system solve step. For example, if the chi-squared value is large, that tells us the model is a bad approximation for the data, which is a good indicator that we should change the model. Or, if the info column is non-zero, there was a numerical error in the linear system solve step.

I guess in the meantime, we can just revert that commit?

@gcapes
Collaborator Author

gcapes commented Apr 23, 2024

Ok great. I'll revert that commit and we'll just output on rank 0 for now, and will add the thoughts to the other issue so they're captured and can be tackled together as a unit of work.
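For reference, restricting verbose output to rank 0 usually amounts to a small guard like the one sketched here. `rank0_print` is a hypothetical helper, not a function from this repository, and `rank` is passed in explicitly (with mpi4py it would typically be `MPI.COMM_WORLD.Get_rank()`).

```python
def rank0_print(rank: int, *args, **kwargs) -> bool:
    """Print only on rank 0; return whether anything was printed."""
    if rank != 0:
        return False  # other ranks stay silent so stdout is not interleaved
    print(*args, **kwargs)
    return True
```

The trade-off discussed above still applies: progress on the other ranks is invisible until something like per-rank log files is added.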

@gcapes gcapes force-pushed the arbitrary_n_baselines branch from ce293b3 to db481c7 Compare April 23, 2024 10:27
@gcapes gcapes merged commit 3ef322d into main Apr 25, 2024
@gcapes gcapes deleted the arbitrary_n_baselines branch May 22, 2024 08:24