bug fix to have save point weight file be different name #1357


JessicaMeixner-NOAA
Collaborator

Pull Request Summary

A bug fix for #1350

Description

On some machines, for unstructured grid cases such as:
./bin/run_cmake_test -b slurm -o all -S -T -s MPI -s PDLIB -w work_pdlib -g pdlib -f -p srun -n 24 ../model ww3_tp2.6
processor 1 ran so far ahead of the other processors that the NetCDF file for writing out the point output already existed for some processors but not for others. This caused the model to hang. We did not see this on every machine.
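
To make the suspected race easier to picture, here is a toy Python model. It is not WW3 code: the function names, the timings, and the "root decides" variant are all hypothetical and are not what this PR changes. It only assumes that each processor checks for the file at its own moment, so processors that look before the file is created see something different from those that look after.

```python
# Toy model of the suspected race (hypothetical, not WW3 code).
# Each "rank" checks whether the file exists at its own point in time.

def file_exists_at(check_time, created_at):
    """Return True if the file had been created when the rank looked."""
    return check_time >= created_at


def per_rank_decisions(check_times, created_at):
    """Every rank checks independently, so the answers can differ."""
    return [file_exists_at(t, created_at) for t in check_times]


def root_decides(check_times, created_at, root=0):
    """Only the root rank checks; all ranks reuse that single answer,
    which is what a broadcast of the result would achieve."""
    answer = file_exists_at(check_times[root], created_at)
    return [answer for _ in check_times]


# Assumed timings: the file appears at t=2.0, and four ranks look at
# different times. Two look too early, two look after it exists.
check_times = [0.5, 1.0, 2.5, 3.0]
independent = per_rank_decisions(check_times, created_at=2.0)
shared = root_decides(check_times, created_at=2.0)

print("independent checks agree:", len(set(independent)) == 1)
print("root-decided checks agree:", len(set(shared)) == 1)
```

If the ranks then branch on their own answer and only one branch reaches a collective operation, the remaining ranks would wait forever, which would look like the hang described above.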

To fix this issue, I have renamed the output file. On Hercules with Intel, this fixed the issue. Additional testing is needed to confirm that it fixes the issue for everyone.

Issue(s) addressed

Commit Message

bug fix to have save point weight file be different name

Check list

Testing

  • How were these changes tested?

So far I have run only one test, on Hercules with Intel; additional testing to follow.

  • Are the changes covered by regression tests? (If not, why? Do new tests need to be added?)
  • Have the matrix regression tests been run (if yes, please note HPC and compiler)?
  • Please indicate the expected changes in the regression test output, (Note the list of known non-identical tests.)
  • Please provide the summary output of matrix.comp (matrix.Diff.txt, matrixCompFull.txt and matrixCompSummary.txt):

@JessicaMeixner-NOAA
Collaborator Author

@thesser1 - Can you try this bugfix on your machine?

I should have more info and test results on my end tomorrow.

@JessicaMeixner-NOAA
Collaborator Author

@thesser1 - I was incorrect about this bug fix. It worked once, but not after that. I'm closing this PR; I don't think it's worth trying. I'll keep you posted.

@thesser1
Collaborator

thesser1 commented Jan 27, 2025 via email

Development

Successfully merging this pull request may close these issues.

ww3_tp2.6 regression test hanging