Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganizing subset_data.py to follow SE recommendations #1461

Merged
merged 193 commits into from
Feb 18, 2022

Conversation

negin513
Copy link
Contributor

@negin513 negin513 commented Aug 18, 2021

Description of changes

This PR is based on a meeting we had with @billsacks @ekluzek and @slevisconsulting. They suggested to add a top level skeleton code called (subset_data instead of subset_data.py) and move the subset_data.py to CTSM python folder. Next, I created a seperate module file for each class.

Specific notes

Contributors other than yourself, if any: @adrifoster

CTSM Issues Fixed (include github issue #):
Partially addresses #1441
Fixes #1436
Fixes #1437
Fixes #1594
Fixes #1606

Are answers expected to change (and if so in what way)?

Any User Interface Changes (namelist or namelist defaults changes)?

Testing performed, if any: tools testing python testing
I checked the subset_data for all neon sites and the results were identical (bit-for-bit) except for the two crop sites (i.e. KONA and STER sites). The crop sites previously had an issue based on #1606.

Verified

This commit was signed with the committer’s verified signature.

Verified

This commit was signed with the committer’s verified signature.

Verified

This commit was signed with the committer’s verified signature.
Copy link
Member

@billsacks billsacks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for moving forward with these initial steps, @negin513 . I have some minor comments below. In addition:

  • It seems like you haven't deleted the original subset_data.py; please delete that file.

Sorry, something went wrong.

@negin513
Copy link
Contributor Author

negin513 commented Sep 7, 2021

Thanks @billsacks for all your comments.

I'd answer all the above comments but I did not mark them as resolved. Please mark them as resolved if you think my answers and my fixes were satisfactory.

Besides your comments, I have some comments on my code and I am listing them here not to forget about them:
These are either based on our discussions or my thoughts:

  • Make a top-level skeleton code in tools/site_and_regional and relevant modules under python/ctsm-- add instructions...
  • Classes should be sorted separately in different module under python/ctsm/site_and_regional.
  • Make sure PEP-8 complaint. Run through black.
  • identify for possible areas for testing
  • Logging update
  • output_to_file in utils.py
  • Adding --input-directory as an option...

Sorry, something went wrong.

@ekluzek
Copy link
Collaborator

ekluzek commented Sep 8, 2021

@negin513 I'd like this PR to fix #1436 if possible. We just talked about it and it sounds reasonably straightforward.

@ekluzek
Copy link
Collaborator

ekluzek commented Sep 8, 2021

This works on #1441 for these specific tools.

@ekluzek
Copy link
Collaborator

ekluzek commented Sep 17, 2021

@negin513 we talked about the need to have subset_data and the modify_data python tools to create a user-mod directory that can then be pointed to for create_newcase, which provides a nice workflow. It also makes the workflow for NEON very similar to the workflow for a generic site which helps. Do you want to do that as part of this PR or make that a future change?

@billsacks
Copy link
Member

Do you want to do that as part of this PR or make that a future change?

I'd suggest doing that separately, so this PR can get merged and @slevisconsulting can build off of it.

@ekluzek
Copy link
Collaborator

ekluzek commented Sep 17, 2021

OK. I think in general for these tools to create a user-mod directory that can then be easily used to create a case is a good thing. I think most of the tools should do that.

So the question is should @slevisconsulting scripts create user-mod directories or not? If yes perhaps you should include it here, but if not then we should do it as a future PR. I think his scripts are just modifying a single surface dataset, so perhaps it shouldn't create a user-mod directory. If they are doing more than that though, perhaps they should.

@negin513
Copy link
Contributor Author

negin513 commented Feb 11, 2022

@negin513 and I had a nice discussion going over this and were able to complete a few of the final things together. And we started a project of pulling out a tricky bit of code to it's own method so that it can be easily unit-tested. @negin513 should be able to complete this next week, and it can come in one of the very next CTSM tags.

Thanks to @ekluzek for testing tricky part of the code (modifying surface dataset based on the user flags combination): we separated this section into a single class method called modify_surfdata_atpoint.
I added 24 unit tests for this specific method (in test_unit_singlept_data_surfdata.py).
For these tests, instead of reading/writing files, we are creating an xarray datasets (similar to the surface dataset files) on the fly with the same data structure and attributes but fill it with random numbers...
Then the tests checks to make sure different variables from this "dummy" dataset is set correctly to the specific value (for example if we expect the outcome of PCT_CROP to be all zeros or one). This way we can make sure that the correct part of the code is triggered and the variables are set accordingly.

Here we are creating two "dummy" xarray dataset: one mimicking 16 pft dataset and the other is for the 78 pft dataset to check the code behavior for both crop and no crop dataset.

In this single PR we added 41 python unit tests which increased the number of CTSM total python unit tests by 80%...

@negin513
Copy link
Contributor Author

negin513 commented Feb 11, 2022

@ekluzek With the addition of these random xarray dataset testing, I think I have addressed all the comments on this PR. Please let me know if there is anything else remaining.


  • pylint output is clean on this code:
--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

  • The 3 python system tests are running without problems.
  • I ran clm_pymod tests and the tests show PASS.
  • All 113 python unit tests passed without any issues.

Thanks again @ekluzek for your help with this PR.

@ekluzek
Copy link
Collaborator

ekluzek commented Feb 17, 2022

I've marked off and made sure all of the check boxes have been addressed. And the query tool shows they are all addressed.

gh-pr-query -p 1461 -t -r ESCOMP/CTSM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code health improving internal code structure to make easier to maintain (sustainability) enhancement new capability or improved behavior of existing capability
Projects
None yet
9 participants