Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safety and simplification updates for devclean.sh #12

Conversation

mkavulich
Copy link

@mkavulich mkavulich commented Jul 12, 2024

These are the changes I am proposing for devclean.sh to make the script safe to use. The new behavior is:

  • Will only delete the build artifacts at their default locations. If users built in custom paths they need to delete them manually
  • Before deleting anything, puts the proposed deletions in a list and prompts user to confirm the deletion of the directories on screen
    • Does include a --force option for the bold, but that will be on them.
  • User should specify what they want to be deleted with optional flags; if no flags are specified then nothing gets deleted.

Copy link

@MichaelLueken MichaelLueken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @mkavulich, for showing the nature of the changes that you were hoping to see with issue #1073.

I agree with the removal of bin_dir, build_dir, and conda_dir and only using the default naming for CMake build artifacts. I don't see the reason to add an additional check for users to remove these. The purpose of the script is to remove directories. So long as the default artifacts are the only entries being removed, that should be sufficient. If a user didn't want to do it, then it will only set them back thirty minutes to rebuild.

Ultimately, I don't see the reason to add these additional checks before removing the directories. Removing the build artifacts won't set a user back, outside of the time to rebuild. If the user doesn't want to delete the build directories, it is on them not to run a script that is designed to remove directories.

@mkavulich
Copy link
Author

@MichaelLueken If users want to be bold and skip the check, they can include the new --force flag. But if the consensus is for the default to be removal, then I could convert this to a --dry-run flag, to allow users to see which files/directories would be deleted when executed if they want to be cautious.

@@ -37,19 +34,10 @@ settings () {
cat << EOF_SETTINGS
Settings:

SRW_DIR=${SRW_DIR}
BUILD_DIR=${BUILD_DIR}
BIN_DIR=${BIN_DIR}
Copy link
Owner

@natalie-perlin natalie-perlin Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to have BIN_DIR option to have set up by the user. This is required for working with a singularity container, when the binaries are built in a non-standard location. A standard location is used to have links to a container wrapper script.

Copy link
Owner

@natalie-perlin natalie-perlin Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(UPDATED/removed my comment about SRW_DIR, as it is set earlier in the script)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what context would a user be cleaning/rebuilding in an existing Singularity container? This feels like a very niche use case that only an advanced user would be using.

If absolutely necessary, I could include a --singularity flag that would delete the location that is used in a singularity, but I don't understand the reason why this needs to be in a different directory, and why instructions can't just be to "remove this directory for a clean build". Also, this use case is also not covered in the users guide or documentation that I can find, so users wouldn't know how to do this anyway!

Again, I'll emphasize my arguments against including custom directories to feed into this script to be deleted: If a user wants to delete a directory in a custom location, they already need to know that location exists and must be deleted. In this case, it is extra work for them to provide flags to this clean script rather than directly deleting that directory that they know they need to remove! So including these custom directory flags is 1. extra work for the user, and 2. extra work for developers

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkavulich @MichaelLueken
The current way the software container was set to work was to launch a wrapper script in place of the executable that starts singularity container first. That requires binary directory to be different from a default, because the default directory, i.e., where the model expects to find executables, needs to contain a wrapper script for the executables that starts singularity container first. Sharing a presentation that is viewable for all NOAA accounts, and also for some additional accounts:
https://docs.google.com/presentation/d/1yMZG5b4FHTvzf4cwNSAjmbye9-LybWMoukLONw9eR_E/edit?usp=sharing

Slides 40,42 demonstrate the way this is approached.
The way the links in a standard binary location (./exec/) are created is a bit tricky and requires doing very precise steps. If model cleaning is required, these links created in ./exec/ directory need to remain; BIN_DIR with the actual binaries needs to be cleaned.

As for now, singularity container is the only way we offer current SRW use for the public at this point and conduct trainings. Using spack-stack to build the libraries for the SRW is neither very accessible for the public nor could be easily built on users machines.

devclean.sh Outdated
--sub-modules
remove sub-module directories. They need to be checked out again by sourcing "\${SRW_DIR}/manage_externals/checkout_externals" before attempting subsequent builds
--force
removes files and directories without confirmation. Use with caution!
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to list specifically which directories are removed

@natalie-perlin
Copy link
Owner

@mkavulich -
With the changes you suggest, an option "-a" or "--all" now removes everything including conda and submodules.
There is no option to remove the the artifacts except for conda and submodules. I'm not quite clear what does option "--force" do - does it removes everything, including conda and submodules, without asking for a confirmation?

@natalie-perlin
Copy link
Owner

natalie-perlin commented Jul 15, 2024

(UPDATED)
@mkavulich @MichaelLueken - While I agree with some of the suggestions, others do not work so well, such as removal of BIN_DIR, and not explicitly checking for the directory existence before removing it (which was one of the concerns of ufs-community#1073 as well).

If you do not mind, let me try to include some of the changes your suggested. Feel free to continue with this PR#12 as well!

@mkavulich
Copy link
Author

@natalie-perlin @MichaelLueken I have made some changes that will hopefully help us converge. The script now has four basic use cases:

  1. No arguments provided. This will print the usage/help and exit
  2. --build flag; this will remove the build artifacts
  3. --conda flag; this will remove the conda directory and location file
  4. --sub-modules flag; this will remove the submodules

Additionally, there is an --all flag that is the same as including all of the above flags.

@mkavulich - With the changes you suggest, an option "-a" or "--all" now removes everything including conda and submodules. There is no option to remove the the artifacts except for conda and submodules.

I tried to make the argument list simpler with this new script. There were many confusing/unintuitive options in the old script:

  • There was a "remove" flag, that only deleted the build directory. This is confusing, since all other options "remove" more things.
  • Previously the "all" option only removed some things (the build directory and bin/lib/inc/share), which is very confusing!
  • There was also a "--clean" option, which did exactly the same thing as "all".

I'm not quite clear what does option "--force" do - does it removes everything, including conda and submodules, without asking for a confirmation?

My proposed changes include a confirmation step (the user must type "Yes" after the list of to-be-deleted files/directories is printed). The --force option is included in case the user wants to skip the confirmation step to delete everything. This is standard UNIX verbiage for overriding a safety check, and much safer than deleting things by default, as the previous script did.

This is an example what the output now looks like:

$ ./devclean.sh -a
The following files/directories will be deleted:

/scratch1/BMC/hmtb/kavulich/UFS/workdir/test_develop/2024-07-13/intel/ufs-srweather-app/build
/scratch1/BMC/hmtb/kavulich/UFS/workdir/test_develop/2024-07-13/intel/ufs-srweather-app/exec
/scratch1/BMC/hmtb/kavulich/UFS/workdir/test_develop/2024-07-13/intel/ufs-srweather-app/share
/scratch1/BMC/hmtb/kavulich/UFS/workdir/test_develop/2024-07-13/intel/ufs-srweather-app/include
/scratch1/BMC/hmtb/kavulich/UFS/workdir/test_develop/2024-07-13/intel/ufs-srweather-app/lib
/scratch1/BMC/hmtb/kavulich/UFS/workdir/test_develop/2024-07-13/intel/ufs-srweather-app/lib64
./sorc/AQM-utils
./sorc/arl_nexus
./sorc/UFS_UTILS
./sorc/ufs-weather-model
./sorc/UPP
/scratch1/BMC/hmtb/kavulich/UFS/workdir/test_develop/2024-07-13/intel/ufs-srweather-app/conda_loc
/scratch1/BMC/hmtb/kavulich/UFS/workdir/test_develop/2024-07-13/intel/ufs-srweather-app/conda

Confirm that you want to delete these files/directories! (Yes/No): nope
User chose not to delete, exiting...

@MichaelLueken Given that the rebuilding penalty can be very substantial on some platforms (nearly 2 hours on jet, for example), I think it is prudent to include a confirmation by default. If there are contexts where interactive input like this is not desired or possible (for example, testing scripts) they can always use the --force option.

not explicitly checking for the directory existence before removing it (which was one of the concerns of ufs-community#1073 as well).

This seems to be something that you both have misunderstood from my original issue, sorry if this was unclear. My concern is about the existence of variables, not files/paths. If the variables passed to rm do not exist/are not set, then an rm -rf command can be executed on an unpredictable input. However, there is no harm done attempting to delete a non-existent file/directory; it will just be a no-op.

@natalie-perlin
Copy link
Owner

@mkavulich @MichaelLueken

@natalie-perlin @MichaelLueken I have made some changes that will hopefully help us converge. The script now has four basic use cases:

  1. No arguments provided. This will print the usage/help and exit
  2. --build flag; this will remove the build artifacts
  3. --conda flag; this will remove the conda directory and location file
  4. --sub-modules flag; this will remove the submodules

Additionally, there is an --all flag that is the same as including all of the above flags.

@MichaelLueken Given that the rebuilding penalty can be very substantial on some platforms (nearly 2 hours on jet, for example), I think it is prudent to include a confirmation by default. If there are contexts where interactive input like this is not desired or possible (for example, testing scripts) they can always use the --force option.

Thank you for the updates and clarifications!

@mkavulich @MichaelLueken - suggesting not to remove the sub-modules when the flag "--all" is given. Conda is installed during the build phase, and could be OK to remove when we need to recompile/rebuild the SRW. Sub-modules/checked out repositories, however, are not product of a build. The flag "--sub-modules" exists as an addition to cleaning the SRW builds.

Could this change of keeping submodules unless explicitly provided "--sub-modules" option be made (with more details on each option in help session what directories are removed)?
Also, as noted #12 (comment) , the need to keep the option to specify BIN_DIR by users for our SRW trainings, when SRW is built and run using containerized approach.

not explicitly checking for the directory existence before removing it (which was one of the concerns of ufs-community#1073 as well).

This seems to be something that you both have misunderstood from my original issue, sorry if this was unclear. My concern is about the existence of variables, not files/paths. If the variables passed to rm do not exist/are not set, then an rm -rf command can be executed on an unpredictable input. However, there is no harm done attempting to delete a non-existent file/directory; it will just be a no-op.

@mkavulich - Oh, OK, thank you for clarifying the confusion! The variables in the original devbuild.sh script were all initialized and had their default values if not specified explicitly by the user.

@mkavulich
Copy link
Author

@natalie-perlin I added a "--container" flag that will delete the "container-bin" instead of "exec" as the binary directory for use with containers. It wasn't clear to me if "exec" should also be deleted here, let me know if that is needed as well.

@natalie-perlin
Copy link
Owner

@natalie-perlin I added a "--container" flag that will delete the "container-bin" instead of "exec" as the binary directory for use with containers. It wasn't clear to me if "exec" should also be deleted here, let me know if that is needed as well.

Thank you, Michael, for a great suggestion. This may indeed solve it, if the users use container-bin when building SRW binaries. It is not required to be named this way (could be bin or container-exec), but our further directions could suggest using that directory name for a containerized SRW build. When the option --all is used, both container-bin and exec are better be cleaned.

 - Format, remove old comment
 - Only remove conda if the location in conda_loc is the same as the
default conda install
 - Default case (no flags provided) now does nothing except print usage
and exit
 - Include "--build" flag to remove build artifacts
 - Include short aliases for all the removal options
@mkavulich mkavulich force-pushed the scripts_plots_updates branch from c4aa3e0 to 8b7a79a Compare August 16, 2024 13:16
Updates and simplifications of the devclean.sh script
Copy link
Owner

@natalie-perlin natalie-perlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only update was to alphabetize the list of OPTIONS.
Thank you for your the thoughtful edits!

@natalie-perlin natalie-perlin merged commit 9459da5 into natalie-perlin:scripts_plots_updates Aug 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants