Python script to load distributed snapshots and Cosmology Dark Matter Only 64^3 test #383

Open
wants to merge 7 commits into dev

Conversation

bvillasen (Collaborator)

Added tools to load a distributed snapshot without merging the snapshot files into a single file. This is especially useful when dealing with large snapshots, where duplicating the memory footprint becomes very inconvenient.
The script also has the option to load only a sub-volume of the grid, in which case only the set of files covering that sub-volume is read; this is useful, for example, when plotting a slice of a very large grid that wouldn't fit in memory if the full volume were loaded.
An example and some description can be found in python_scripts/load_cholla_snapshot_distributed.py.
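
As a rough illustration, a usage sketch for loading a sub-volume is shown below. The function name load_cholla_snapshot_distributed is taken from this PR, but the import path, argument names, and call signature here (data type, field list, subgrid ranges) are assumptions, not necessarily the script's actual interface:

# Hypothetical usage sketch; the import path and call signature are assumptions.
from io_tools import load_cholla_snapshot_distributed

snapshot_dir = 'snapshot_files/'   # directory containing the distributed snapshot files
n_snapshot = 10                    # snapshot number to load

# Index ranges of the sub-volume to load; only the files that overlap this
# region are read, so a thin slab fits in memory even for very large grids.
subgrid = [[0, 256], [0, 256], [0, 4]]

data = load_cholla_snapshot_distributed('hydro', ['density'], snapshot_dir,
                                        n_snapshot, subgrid=subgrid)
density_slice = data['density'][:, :, 0]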

Added a test for a cosmological dark-matter-only run (particles only) with 64^3 particles on a single GPU.
To run, go to tests/cosmology/dark_matter_only/64_N1
and run:
sh run_test.sh

The script will:

  1. compile Cholla with the correct configuration for the dark matter only run
  2. download the initial conditions at redshift 100
  3. run the simulation to redshift 0
  4. download the reference result (redshift = 0)
  5. run a Python script to compare the dark matter density from the simulation output against the reference snapshot (a rough sketch of such a comparison follows this list)
  6. the script returns 0 if the test passes and 1 if it fails.
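
A minimal sketch of what the comparison step (5-6) might look like is shown below; the file names, the HDF5 dataset name for the dark matter density, and the tolerance are placeholders for illustration, not the exact contents of the script in this PR:

import sys
import h5py
import numpy as np

# Placeholder file names; the test script points these at the simulation
# output and the downloaded reference snapshot.
file_test = 'simulation_output.h5'
file_ref = 'reference_snapshot.h5'

with h5py.File(file_test, 'r') as f_test, h5py.File(file_ref, 'r') as f_ref:
    dens_test = f_test['density'][...]
    dens_ref = f_ref['density'][...]

# Maximum fractional difference of the dark matter density field.
diff = np.max(np.abs(dens_test - dens_ref) / np.abs(dens_ref))
tolerance = 1.0e-6   # placeholder tolerance

# Exit code 0 if the test passes, 1 if it fails (matching step 6 above).
sys.exit(0 if diff < tolerance else 1)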

@evaneschneider, let me know if this format for testing is OK, and I can add other tests for the cosmological hydrodynamics simulations (adiabatic and with H+He chemistry).

@bvillasen (Collaborator, Author) commented Apr 15, 2024

I had a conversation with @bcaddy about closing this PR on the grounds that there might not be value in it. I don't agree with that, but I wanted to hear other opinions from the team.

Besides the test, I think some of the tools could be useful to others. I have added scripts to load snapshots without having to write the snapshot into a single file, which is a big problem when dealing with large simulations. The script also has the option to load a sub-volume without loading the entire grid. From what I see, the concatenation scripts only paste all the snapshot parts into a single file. Is that right? @mabruzzo

The test script offers a simple way to add tests, one that can be more flexible than the current testing system, which seems to have some limitations that, according to Bob, would require a big change to the code to accommodate.

If @evaneschneider and others don't think it's appropriate to merge this PR, then I'll close it, but I wanted to hear more voices before doing so.

@bcaddy (Collaborator) commented Apr 15, 2024

I haven't completed my code review yet, but to add some clarification:

I think being able to load sub-volumes in a performant way is great. I know that's something that @mabruzzo and @evaneschneider have been working on/thinking about, so looping you in seemed like a good idea. I'm not sure what the best method is, but integrating with the current concatenation scripts seems like a more maintainable option to me than adding new scripts to do a similar thing.

The only new feature the testing scripts in this PR provide is the potential ability to run tests with non-default builds. Currently they can only do that with a single, hard-coded build, and generalizing and integrating them would require significant reworking of the scripts, the testing framework, and the CI pipelines. They also rely on unversioned data hosted on Dropbox instead of versioned data in our testing data repo. IMO, if we want to test a specific build with the current system, we can instead just add it to the list of defaults with a new build type file for it.

The floating point comparison in our current system is also more robust, stable, and accurate, and it offers multiple comparison methods depending on what works best for a specific test. Also, these scripts duplicate much of the code in run_tests.sh and the SystemTestRunner class, and they make our testing scheme less clear for newcomers by adding additional code that may not be used.
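
For context, a combined absolute/relative tolerance check is the general kind of comparison being referred to; this is a generic sketch, not the actual implementation in the SystemTestRunner class:

import numpy as np

# Generic example: fields agree if |a - b| <= abs_tol + rel_tol * |b| element-wise.
def fields_match(a, b, abs_tol=1.0e-14, rel_tol=1.0e-7):
    return np.allclose(a, b, atol=abs_tol, rtol=rel_tol)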

The build files for Lockhart are great, always happy to see native support for more machines.

@evaneschneider (Collaborator)

Thank you both for working on this! I really appreciate the effort on both your parts.

To respond to Bruno's questions - I think there is significant value in this PR (after all, I am the one who asked for a system test of the cosmology build, and this one seems like it will work great!), and I asked @bcaddy to take a look at it to figure out what exactly we need to change to integrate it with our current testing system. I do think this test will be more effective in the long run if we can incorporate it into our automated build / test system, and the work that remains to be done in this PR is sorting out how to do that.

My understanding is that adding the build would be simple (we just have to add it to the matrix), and the main thing that needs to be changed is to put the initial conditions and reference data into our GitHub tests subdirectory, and add the cosmology system test to the list of automated tests. I have asked Bob to help with that, since he set up the testing system and as far as I know this would be the first test that requires reading in the initial conditions, so there is probably not a bread-and-butter example to follow. If I am off base about how difficult it will be to incorporate this kind of test into our existing system, then we need to take a look at accommodating testing some other way, perhaps through a script like the one included in this PR, because we do need that kind of flexibility.

I am also fine with the additional python scripts, since adding those is consistent with our past practice. That said, I think there is significantly more likelihood that others will use them if the wiki is updated to document them.

If it would be helpful to set up a Zoom call to discuss strategy once Bob has finished his code review, I am happy to do that. Again, I very much appreciate the effort on everyone's part.

@bvillasen (Collaborator, Author)

I'm always happy to join a call to talk about Cholla ;)
I'm normally free at 8 am Pacific time (except for Thursdays); some weekday afternoons would also work in case 8 am PT doesn't.
Happy to talk about this further.

@bcaddy (Collaborator) commented Apr 15, 2024

The test itself is a separate topic that will be in an upcoming PR. Bruno is working on one that works with the default cosmology build; I think that's a better choice since it will presumably also test the coupling of the particles to the fluid rather than just the particles.

Collaborator

I think this was added to help with the testing scripts. I don't see a clear use case for it on its own that isn't met by the existing testing tools or h5diff. Unless you have a strong case for this, I would like it to be removed.

Collaborator

Do we currently have a single-file Python script that can be used to compare snapshots like this? If so, I am not aware of it, and this seems easy to use. I see no downside to adding it to this directory, although I agree it would be nice if it had some documentation.

Collaborator

Ok, I'm not terribly opposed to it. I don't see an immediate use case but we can always change/remove it later if needed.

Collaborator

This looks like its primary purpose is to be documentation for the contents of io_tools.py. I'd prefer to see that documentation alongside the code it documents, in io_tools.py and in the wiki, and this file removed.

Collaborator

Same here...

Collaborator

I disagree. To me it looks like a thin wrapper around load_cholla_snapshot_distributed with all the choices hard-coded, and it doesn't actually do anything besides call load_cholla_snapshot_distributed and provide some documentation for the contents of io_tools.py.

If it had all the choices as CLI arguments and was importable into another script, that might be different, but then it would just be a second, identical interface to load_cholla_snapshot_distributed.
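
For reference, the kind of interface being described here (CLI arguments plus an importable entry point) might look something like the sketch below; the argument names and the call signature of load_cholla_snapshot_distributed are assumptions for illustration:

import argparse
from io_tools import load_cholla_snapshot_distributed  # assumed import path

def main(argv=None):
    # Expose the hard-coded choices as CLI arguments instead.
    parser = argparse.ArgumentParser(description='Load a distributed Cholla snapshot.')
    parser.add_argument('snapshot_dir', help='directory containing the snapshot files')
    parser.add_argument('n_snapshot', type=int, help='snapshot number to load')
    parser.add_argument('--field', default='density', help='field to load')
    args = parser.parse_args(argv)
    # The call signature below is assumed for illustration.
    return load_cholla_snapshot_distributed('hydro', [args.field],
                                            args.snapshot_dir, args.n_snapshot)

if __name__ == '__main__':
    main()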

Collaborator

Right, but some of us LIKE having example files to work off of versus having everything be a CLI choice with documentation elsewhere. In a sense, this script provides the usage documentation that you are requesting; it's just a different style. I think it's good for us to accommodate different styles, so I'd like this to stay.

Collaborator

Ok, including it is fine with me then.

Collaborator

This needs to be documented in the wiki, ideally along with some documentation in the code as well.

Collaborator

Same here - I think documentation will make this much more useful, but right now I do not think it is duplicating functionality that we have...

Collaborator

Superseded by PR #390, please remove.

Collaborator

Superseded by PR #390, please remove.

Collaborator

Superseded by PR #390, please remove.

Collaborator

Superseded by PR #390, please remove.

@bvillasen (Collaborator, Author)

Closing this for now. I guess I'll resubmit at some point with some documentation... we'll see how that goes.

bvillasen closed this Apr 16, 2024
@evaneschneider (Collaborator)

Apologies that I was not able to get to this immediately after Bob's code review, but I think there is still a lot of value in this PR, even if not everything is documented yet. In particular, right now there is no simple way in the public repo to load distributed data and compare it, something that the python scripts in this PR would allow. They look easy to use, and it was my intention to keep the bar for adding scripts to this directory low, so that people can share tools quickly even if they are not as polished as code entering the main source directory. I think if we summarily dismiss PRs like this, it will only lead to people sharing scripts as files over email/slack instead, which I do not think is preferable.
