Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix NetCDF file close error running MPI with NetCDF forcing #819

Merged
merged 13 commits into from
May 30, 2024

Conversation

stcui007
Copy link
Contributor

@stcui007 stcui007 commented May 10, 2024

This PR provides a fix for running ngen in MPI with NetCDF forcing where a NetCDF file close error is generated at end of run for each MPI process. Fixes #749, #818.

Additions and Changes

  1. DataProvider::finalize() interface for end-of-run cleanup activities that need to be sequenced more specifically than destructor-call order
    1. NetCDFPerFeatureProvider::finalize() implementation of DataProvider interface to close files
    2. Make no-op nature of WrappedDataProvider::finalize() explicit with explanatory comment
    3. CsvPerFeatureForcingProvider: delete unused private members
  2. Move HY_CatchmentRealization::forcing member down to Catchment_Formulation, and remove constructor argument pass-through from HY_CatchmentArea accordingly.
  3. Catchment_Formulation::finalize() to call finalize() on its forcing DataProvider
  4. Formulation_Manager::finalize to call finalize() on all Catchment_Formulation-descendant formulations that it owns
  5. Formulation_Manger call to clean up cache of shared NetCDFPerFeatureProvider instances moved from its destructor to its new finalize()
  6. Main code calls Formulation_Manager::finalize(), to get calls down the stack

Testing

Run ngen with mpirun. Tests show that the fix removes the NetCDF file close error at the end of the mpirun using NetCDF forcing.

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Target Environment support

  • Linux
  • macOS

Copy link
Contributor

@PhilMiller PhilMiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more change for me to make manually, in base DataProvider

src/NGen.cpp Outdated Show resolved Hide resolved
include/realizations/catchment/Formulation_Manager.hpp Outdated Show resolved Hide resolved
include/core/catchment/HY_CatchmentRealization.hpp Outdated Show resolved Hide resolved
include/forcing/GenericDataProvider.hpp Outdated Show resolved Hide resolved
include/forcing/GenericDataProvider.hpp Outdated Show resolved Hide resolved
include/forcing/GenericDataProvider.hpp Outdated Show resolved Hide resolved
include/forcing/NetCDFPerFeatureDataProvider.hpp Outdated Show resolved Hide resolved
src/forcing/NetCDFPerFeatureDataProvider.cpp Outdated Show resolved Hide resolved
@PhilMiller PhilMiller force-pushed the fix_netcdf_close branch 4 times, most recently from 42afc96 to bc27fd3 Compare May 10, 2024 20:37
@PhilMiller
Copy link
Contributor

@hellkite500 @donaldwj I rewrote the code changes, so I probably shouldn't also review it when it's ready.

@PhilMiller
Copy link
Contributor

(@stcui007 I just rebased and force-pushed)

@stcui007
Copy link
Contributor Author

The rebased and revised version is tested to work.

@PhilMiller PhilMiller self-assigned this May 15, 2024
@PhilMiller PhilMiller marked this pull request as draft May 15, 2024 19:34
@PhilMiller PhilMiller requested a review from robertbartel May 15, 2024 19:34
@PhilMiller
Copy link
Contributor

Per discussion with @robertbartel, WrappedDataProvider can probably rely on whatever other object owns the provider instance to which it points to call finalize() on it. Thus, I'm adding a no-op implementation there.

@PhilMiller
Copy link
Contributor

In reviewing this PR, I think the hard thing will not be deciding whether the present additions and changes are correct, but whether there are related things that are missing.

@PhilMiller PhilMiller marked this pull request as ready for review May 15, 2024 22:07
@PhilMiller
Copy link
Contributor

I would try testing on macOS, except my Homebrew-installed NetCDF presents linking errors.

@PhilMiller
Copy link
Contributor

Ok, I didn't try to reproduce the original failure, but I can at least get a clean build and test run on macOS, by brew uninstall netcdf-cxx and using our built-in NetCDF C++ bindings instead.

@robertbartel
Copy link
Contributor

@PhilMiller, I'm curious why you didn't make finalize a pure virtual in DataProvider.

@PhilMiller
Copy link
Contributor

I gave a concrete empty implementation of DataProvider::finalize() because I expect that most implementations of DataProvider will not need to do anything.

Note that the multiple inheritance of class Catchment_Formulation : public Formulation, public HY_CatchmentArea is a bit of a nuisance here - Formulation inherits DataProvider, while HY_CatchmentArea inherits HY_CatchmentRealization, which owns a provider instance that potentially need to be finalized. So, really, the void finalize() override implementations are in a few places ambiguous which one it's actually supposed to cover. I'll see if I can make that explicit, either in syntax, or failing that, in comment.

@PhilMiller
Copy link
Contributor

@stcui007 Could you test this code with a realization config that uses the NetCDF forcings, MPI, and a BMI Multi realization? I think that may still fail, since I don't think the implementation of Bmi_Multi_Formulation::finalize() I wrote will actually get called.

@stcui007
Copy link
Contributor Author

stcui007 commented May 28, 2024 via email

donaldwj
donaldwj previously approved these changes May 28, 2024
@PhilMiller PhilMiller marked this pull request as draft May 28, 2024 15:11
@PhilMiller
Copy link
Contributor

Setting to draft mode until that further testing is complete, so that it doesn't get merged inadvertently.

@stcui007
Copy link
Contributor Author

The code has been tested again with MPI and NetCDF forcing and BMI realization config (all from ngen/data/ directory) and it still runs to completion without error. Prior to this PR, this combination produced a NetCDF file close error.

@PhilMiller PhilMiller marked this pull request as ready for review May 30, 2024 00:10
@PhilMiller PhilMiller requested a review from donaldwj May 30, 2024 00:10
@PhilMiller PhilMiller marked this pull request as draft May 30, 2024 00:30
Copy link
Contributor

@donaldwj donaldwj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I got a review request I assume this is complete. It looks like there is an undocumented [on the PR home page] change where forcings are moved from CatchmentArea to Catchment_Formulation. Could you include this in the changes section and set and remove the draft status.

@PhilMiller PhilMiller marked this pull request as ready for review May 30, 2024 17:12
@PhilMiller PhilMiller requested a review from donaldwj May 30, 2024 17:12
@PhilMiller
Copy link
Contributor

Donald, thanks for reminding me to update the PR description. That's done, and PR is once again 'Ready'

@donaldwj donaldwj merged commit 94ce654 into NOAA-OWP:master May 30, 2024
19 checks passed
@donaldwj
Copy link
Contributor

donaldwj commented May 30, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants