
Urban time-varying data doesn't broadcast the mapping algorithm #2533

Open · briandobbins opened this issue May 9, 2024 · 0 comments

Brief summary of bug

This is a very simple bug that is likely never encountered in practice, but we hit it while investigating an issue scaling CTSM out to large processor counts while using native (ultra-high-resolution) grids for the map algorithm of this stream file. The hope was that switching to a native grid would avoid some of the communication patterns in the nearest-neighbor mapping, but when testing that we noticed that the 'urbantvmapalgo' namelist variable is read but not broadcast to the other ranks. As a result, the other ranks got the default 'nn' method while the main rank did not, leading to a hang in communication on the MPI communicator.
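To make the failure mode concrete, here is a hypothetical sketch of the read pattern involved (the routine and file names are illustrative, not the actual CTSM source; only 'urbantvmapalgo' comes from the real namelist):

```fortran
! Hypothetical sketch (illustrative names): the namelist is read on
! the master task only, and the broadcast of urbantvmapalgo to the
! other tasks is missing.
subroutine urbantv_readnl(nlfilename, mpicom)
   use spmdMod, only : masterproc
   implicit none
   character(len=*), intent(in) :: nlfilename
   integer         , intent(in) :: mpicom

   character(len=16) :: urbantvmapalgo
   integer :: unitn, ierr
   namelist /urbantv_streams/ urbantvmapalgo

   urbantvmapalgo = 'nn'   ! default, set on every rank

   if (masterproc) then
      open(newunit=unitn, file=nlfilename, status='old', action='read')
      read(unitn, nml=urbantv_streams, iostat=ierr)
      close(unitn)
   end if
   ! BUG: no broadcast here -- only the master task sees the value
   ! from the namelist; every other task keeps the 'nn' default.
end subroutine urbantv_readnl
```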

General bug information

CTSM version you are using:

ctsm5.2.003

Does this bug cause significantly incorrect results in the model's science?

No, but it causes a hang when changing the map algorithm from the default.

Configurations affected:

Details of bug

Simply put, a namelist variable is read but not broadcast to the other tasks, so those tasks use the default map algorithm instead of the specified one. This is likely immaterial to 99.99% of users, but it's an easy fix, hence the upcoming one-line PR.
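The fix follows the usual read-then-broadcast idiom; here is a sketch of the missing line, assuming the generic shr_mpi_bcast interface from the CESM share code (the exact names in the real source may differ):

```fortran
use shr_mpi_mod, only : shr_mpi_bcast

if (masterproc) then
   ! ... read the urbantv_streams namelist as before ...
end if
call shr_mpi_bcast(urbantvmapalgo, mpicom)   ! the missing broadcast
```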

Important details of your setup / configuration so we can reproduce the bug

Admittedly I haven't tested too many configurations, but the fix is a logical one-line change that has been confirmed to work in at least two cases.

Important output or errors that show the problem

No output, since it leads to a hang: N-1 processors call an MPI_Allreduce (because they are still set to 'nn') and the main rank does not.
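For anyone who wants to see that hang pattern in isolation, here is a minimal standalone MPI program (not CTSM code) in which every rank except rank 0 enters the collective:

```fortran
! Minimal reproducer of the hang pattern: ranks 1..N-1 enter an
! MPI_Allreduce that rank 0 never does, so the collective cannot
! complete and the job hangs.
program collective_mismatch
   use mpi
   implicit none
   integer :: rank, ierr, local, global

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

   local = rank
   if (rank /= 0) then
      ! Ranks 1..N-1 behave as if the map algorithm were 'nn'
      ! and call the collective ...
      call MPI_Allreduce(local, global, 1, MPI_INTEGER, MPI_SUM, &
                         MPI_COMM_WORLD, ierr)
   end if
   ! ... while rank 0 (with the non-default algorithm) skips it
   ! and heads straight for MPI_Finalize.

   call MPI_Finalize(ierr)
end program collective_mismatch
```

Run with, e.g., `mpirun -np 4` and it never returns: rank 0 reaches MPI_Finalize while the remaining ranks wait in the Allreduce forever.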
