Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix output_root #855

Merged
merged 15 commits into from
Jul 30, 2024
Merged

Fix output_root #855

merged 15 commits into from
Jul 30, 2024

Conversation

stcui007
Copy link
Contributor

@stcui007 stcui007 commented Jul 17, 2024

This PR fixes the problem of directory that output_root pointed to does not exist. If this is the case, the code creates a sub-directory based in the name given by the output_root in the realization. The code defaults to writing the outputs to the project directory if output_root does not exist as in the current realization.

This is an attempt to solve the problem raised in issues #857 and #848, where large number of output files are written. This is a common problem for running ngen on any large basin.

Additions

Removals

Changes

get_output_root() function and unit test.
Two realizations with output_root for demonstration. One realization in its original form, i.e., w/o output_root for demonstration

Testing

  1. Local run ngen in serial and parallel modes, and unit test

Screenshots

Notes

Todos

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist (automated report can be put here)

Target Environment support

  • Linux

const auto output_root = this->tree.get_optional<std::string>("output_root");
std::string str;
Copy link
Member

@aaraney aaraney Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stcui007, I think this should fix your issue.

Suggested change
std::string str;
std::string str = "./";

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the failing tests or something else. I don't want to write to "./" though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fairly certain the call to stat will fail if you provide it the empty string like you are right now in the case when output_root is not present. I think that is why the Build_and_Run_Model action is failing.

I think you should return "./" if output_root is not present (boost::none) like in the previous code. We can always assume that the directory ngen is run from, ./ , exists.

One solution is:

                if (output_root != boost::none && *output_root != "") {
                    // Check if the path ends with a trailing slash,
                    // otherwise add it.
                    output_root->back() == '/'
                           ? *output_root
                           : *output_root + "/";

                     //use C system function to check if there is a dir match that defined in realization
                    struct stat sb;
                    if (stat(output_root->c_str(), &sb) == 0 && S_ISDIR(sb.st_mode))
                        return *output_root;
                    else {
                        throw std::runtime_error("output_root directory does not exist, please create one matching that in realization");
                    }
                }
                return "./";

Aside, I would think we would want to check that the calculated output_root can be written to by the current user. That is out of the scope of this PR and should be handled elsewhere if deemed necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1) It looks like this code would return "./" if output_root directory does not exist.
(2) This code would seem not get a chance to throw based on the outer if, which is not what we want.
(3) This block of code is a bit tricky:

                    output_root->back() == '/'
                           ? *output_root
                           : *output_root + "/";

where does the value land?
(4) The failed test looks like come from this line in Example_model_run.yml:
./ngen data/catchment_data.geojson '' data/nexus_data.geojson '' data/example_realization_config.json
It takes realization file from the master, not from the PR, right?
(5) In the latest realization file, I set output_root to "./", so it automatically exists.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code handles the output root for ngen, so if there is one in a realization config file under the key, output_root, that one will be used, otherwise, the default is to write to the directory where ngen was called, "./".

I don't think you pushed an updated data/example_realization_config.json file. Looks like you pushed up data/example_bmi_multi_realization_config.json and data/example_bmi_multi_realization_config_w_noah_pet_cfe.json.

@program--
Copy link
Contributor

program-- commented Jul 18, 2024

Based on the comments, I think this PR is partially solving #848 but needs a bit more discussion, since it feels like an XY problem.

There's 2 separate issues here:

(1) output_root fails to write outputs if the specified path is invalid (doesn't exist, isn't writable, etc.) as outlined in #848 -- There are two approaches to solving this I think: either try to create the directory path (i.e. emulate mkdir -p) if possible, or always throw an error and have the user create the directory. This PR currently performs the latter via stat.

(2) If no output_root is specified, potentially lots of files will be outputted into the project directory since output_root defaults to ./, as outlined in #857. I don't think this is actually an issue as the solution is to just specify an output_root in the config file.

I think the question still unanswered for this PR is: do we want output_root to be optional or required? Or rather, if output_root has a non-existent path, should we try to make the directories or should we always throw an error?

@stcui007
Copy link
Contributor Author

stcui007 commented Jul 18, 2024 via email

@aaraney
Copy link
Member

aaraney commented Jul 18, 2024

Great, summary @program--. I share the same feelings.

I think the real question for this PR is: do we want output_root to be optional or required? Or rather, if output_root has a non-existent path, should we try to make the directories or should we always throw an error?

My preference would be that output_root stayed optional. For instance in calibration, we take advantage of the absence of output_root so that the output is written to where ever we call ngen from. Right, of course we could add functionality to account for this, but this is just my personal preference.

Likewise, if possible, emulating mkdir -p is desirable IMO. I would expect that a program would create the output directory I specified if it didn't already exist. For sure a matter of option, so I am going to ping @PhilMiller, @hellkite500, and @donaldwj to get their thoughts as well.

From a practicality stance, we are already including sys/stat.h, so we have access to mkdir but there are likely other issues that I might not be considering.

@stcui007
Copy link
Contributor Author

stcui007 commented Jul 18, 2024 via email

@aaraney
Copy link
Member

aaraney commented Jul 18, 2024

The issue is, if we can still call it an issue, if someone runs a big job and forgot to create the directory or entered wrong format in realization, then when the job finishes, he ends up having a lot of problems. This PR tries to stop that happening at the start.

Yeah, that's a good point. Like you have started to do in this PR, transitioning our examples in data to use output_root I think is a great middle ground. Thoughts?

@aaraney
Copy link
Member

aaraney commented Jul 18, 2024

@stcui007, no worries!

@stcui007
Copy link
Contributor Author

stcui007 commented Jul 18, 2024 via email

@stcui007 stcui007 marked this pull request as ready for review July 22, 2024 17:03
include/realizations/catchment/Formulation_Manager.hpp Outdated Show resolved Hide resolved
include/realizations/catchment/Formulation_Manager.hpp Outdated Show resolved Hide resolved
include/realizations/catchment/Formulation_Manager.hpp Outdated Show resolved Hide resolved
@stcui007
Copy link
Contributor Author

stcui007 commented Jul 23, 2024 via email

@stcui007
Copy link
Contributor Author

stcui007 commented Jul 23, 2024 via email

@stcui007
Copy link
Contributor Author

stcui007 commented Jul 23, 2024 via email

Copy link
Contributor

@program-- program-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaraney If you could do a final review just to see if this satisfies #848 that'd be great! Otherwise, this looks just about done

@stcui007
Copy link
Contributor Author

stcui007 commented Jul 23, 2024 via email

@stcui007
Copy link
Contributor Author

Changed the included header.

@PhilMiller
Copy link
Contributor

The logic looks fine to me.

@program-- program-- dismissed their stale review July 24, 2024 18:57

my comments were resolved

@hellkite500
Copy link
Member

Looks like this adds some reasonable QoL logic to help with output management. Thanks!

@PhilMiller PhilMiller merged commit f15b0b1 into NOAA-OWP:master Jul 30, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants