Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge Split algorithm added #136

Merged
merged 37 commits into from
Nov 16, 2022
Merged

Conversation

kelcyno
Copy link
Collaborator

@kelcyno kelcyno commented Jun 6, 2022

The addition in this pull request is a post processing algorithm for the tobac tracking. This file contains the merge_split function to address merging/splitting cells in the tobac track, as well as a quick compression function, and a function to standardize the track dataset in order to have all tracking, feature, and segmentation mask datasets in one xarray dataset.
An example of these functions is given below:

d = merge_split(Track)
ds = standardize_track_dataset(Track, refl_mask, data['ProjectionCoordinateSystem'])
both_ds = xr.merge([ds, d],compat ='override')
both_ds = compress_all(both_ds)
both_ds.to_netcdf(os.path.join(savedir,'Track_features_merges.nc'))

This file is a post processing step to the tobac tracking step. This is a first iteration in addressing split/merged cells.
@kelcyno
Copy link
Collaborator Author

kelcyno commented Jun 6, 2022

@freemansw1 Nothing was going right on my end so I just started fresh. Let me know if there's anything wonky.

@JuliaKukulies
Copy link
Member

JuliaKukulies commented Jun 8, 2022

@kelcyno Great thank you, I will also start to have a look your PR now! Is it maybe possible to add tests for your additions (or at least one for the merge_split function)?

Copy link
Member

@JuliaKukulies JuliaKukulies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for this PR @kelcyno! I am still figuring out some details in your code, because I am not sure I completely understand what the merge_split() function does at all steps. But overall it looks good and I have tested your functions with some of my data which worked fine (except for the data standardization). Please have a look at my comments and questions below which we can solve together before I approve this!

tobac/merge_split.py Outdated Show resolved Hide resolved
tobac/merge_split.py Outdated Show resolved Hide resolved
tobac/merge_split.py Outdated Show resolved Hide resolved
tobac/merge_split.py Outdated Show resolved Hide resolved
tobac/merge_split.py Outdated Show resolved Hide resolved
tobac/merge_split.py Outdated Show resolved Hide resolved
tobac/merge_split.py Outdated Show resolved Hide resolved
@freemansw1 freemansw1 added this to the Version 1.5 milestone Jun 10, 2022
@freemansw1 freemansw1 added the enhancement Addition of new features, or improved functionality of existing features label Jun 21, 2022
@freemansw1
Copy link
Member

Sorry for my delay on this, I should be able to review next week after my defense. @kelcyno do you want me to review ASAP or are there changes to be made from @JuliaKukulies's review?

@kelcyno
Copy link
Collaborator Author

kelcyno commented Jun 22, 2022 via email

@freemansw1
Copy link
Member

OK, I'll wait to review this until you have made the changes @kelcyno . When you do, we can re-request reviews from both @JuliaKukulies and myself.

One note, with only having glanced at the PR, if we need to add new requirements for this, we should either make them optional (add a check to see if the package can be imported when trying to use this function; if not throw an error and quit) or add them to the various requirements files.

Updated changes to merge/split based on suggestions for lat/lon convection, and documentation.
@kelcyno
Copy link
Collaborator Author

kelcyno commented Jul 3, 2022

@freemansw1 @JuliaKukulies I updated the merge_split.py to comply with documentation and Julia's previous comments. Looking forward to more input/comments.

@codecov
Copy link

codecov bot commented Jul 3, 2022

Codecov Report

Base: 36.22% // Head: 38.97% // Increases project coverage by +2.74% 🎉

Coverage data is based on head (800fdfe) compared to base (720ef18).
Patch coverage: 85.60% of modified lines in pull request are covered.

Additional details and impacted files
@@              Coverage Diff              @@
##           RC_v1.4.0     #136      +/-   ##
=============================================
+ Coverage      36.22%   38.97%   +2.74%     
=============================================
  Files             10       11       +1     
  Lines           2120     2245     +125     
=============================================
+ Hits             768      875     +107     
- Misses          1352     1370      +18     
Flag Coverage Δ
unittests 38.97% <85.60%> (+2.74%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
tobac/utils.py 48.28% <13.33%> (-2.12%) ⬇️
tobac/merge_split.py 95.41% <95.41%> (ø)
tobac/__init__.py 92.30% <100.00%> (+0.30%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@freemansw1
Copy link
Member

Thanks @kelcyno , I'll take a look at this today or tomorrow. Just so you know- the reason your PR is failing checks is that you need to run the black formatter on your new file.

One question before I begin review- do you have documentation handy for this?

kelcyno added 2 commits July 6, 2022 19:10
Formatted using the Black python package.
Allowed optional import of additional packages: geopy, and networkx. Now a message will print if the package is not available to import.
@freemansw1
Copy link
Member

Just noting that I know you are still waiting on a review from me. It's been very busy and I haven't quite had the time to give reviewing this the attention it deserves. I should have time early next week, though. I do have some ideas both on the documentation front and the testing front though, and I'll post those when I post my review.

Copy link
Member

@freemansw1 freemansw1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these really important changes, @kelcyno. I've made a number of comments throughout, but as I stated today, I'm happy to help you work on any of them.

I think the biggest thing for me is documentation. It would be great to have more comments throughout the code and an example or two on how to use it. @fsenf do you think the better place for usage examples are here or in the tutorials repo?

tobac/merge_split.py Outdated Show resolved Hide resolved
tobac/merge_split.py Outdated Show resolved Hide resolved
tobac/merge_split.py Outdated Show resolved Hide resolved
tobac/merge_split.py Outdated Show resolved Hide resolved
tobac/merge_split.py Outdated Show resolved Hide resolved
tobac/merge_split.py Outdated Show resolved Hide resolved
tobac/merge_split.py Outdated Show resolved Hide resolved
tobac/merge_split.py Outdated Show resolved Hide resolved
tobac/merge_split.py Outdated Show resolved Hide resolved
tobac/merge_split.py Outdated Show resolved Hide resolved
@freemansw1
Copy link
Member

I think given the number of things left to resolve, I would be inclined to push this to v1.5? That would give us documentation and spectral filtering in v1.4. What do you think @JuliaKukulies @w-k-jones @fsenf

@JuliaKukulies
Copy link
Member

That sounds like a good idea if @kelcyno is fine with that as well. Thanks for the additional comments @freemansw1!

Updated to tobac v1.3.2, and made suggested changes. Merge_split.py now ingests a pandas dataframe, ancillary functions have been moved to utils.py.
@freemansw1
Copy link
Member

Thanks for pushing on this @kelcyno! Just trying to get the code done and paper in before I start my new position :). I'll be pushing on #127 here soon with the same goal in mind. I'm also working on a new PR for converting between grids between tracking and segmentation.

This update also includes a fix to an existing error in the merge_split algorithm that incorrectly multiplied the dimension and not values.
@kelcyno
Copy link
Collaborator Author

kelcyno commented Nov 5, 2022

@freemansw1 @JuliaKukulies, I've updated Sean's PR, our numbering for Tracks begins at 0. I updated the dimension names to be consistent, and similarly in the utils file the standardize_track_dataset function. I commented out the assert tests at the bottom of the merge_split function, but can delete them entirely if needed. I rather like keeping them for future use if needed.
What will need some attention quickly is how I've used a default value of None in the merge distance, but internally if there is a 'None' then 25*dxy is used for the distance.

I think it's nearly ready for 1.4 - and all checks have passed (for the first time!). Let me know what you think and if there's changes to be made on the minimum distance value part. Thanks all!

Copy link
Member

@JuliaKukulies JuliaKukulies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kelcyno I have had a look again at your latest changes and the way you handle distance now is fine for me. I do have a few remaining remarks that should be quick fixes and two questions. But otherwise, I am really fine to approve this PR from my side. Have you found the reason why merge_split does not occur in the auto-generated API reference @freemansw1 ?

tobac/merge_split.py Outdated Show resolved Hide resolved
tobac/merge_split.py Outdated Show resolved Hide resolved
doc/merge_split_out_vars.csv Outdated Show resolved Hide resolved
This submodule will label merged/split cells with a TRACK number in addition to its CELL number.

Features, cells, and tracks are combined using parent/child nomenclature.
(quick note on terms; “feature” is a detected object at a single time step. “cell” is a series of features linked together over multiple timesteps. "track" may be an individual cell or series of cells which have merged and/or split.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(quick note on terms; “feature” is a detected object at a single time step. “cell” is a series of features linked together over multiple timesteps. "track" may be an individual cell or series of cells which have merged and/or split.)
(quick note on terms; “feature” is a detected object at a single time step (see :doc:`feature_detection_overview`). “cell” is a series of features linked together over multiple timesteps (see :doc:`linking`). "track" may be an individual cell or series of cells which have merged and/or split.)


Features, cells, and tracks are combined using parent/child nomenclature.
(quick note on terms; “feature” is a detected object at a single time step. “cell” is a series of features linked together over multiple timesteps. "track" may be an individual cell or series of cells which have merged and/or split.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on this comment - but I guess we save this for another PR @freemansw1 or what do you think?

tobac/merge_split.py Show resolved Hide resolved
tobac/merge_split.py Outdated Show resolved Hide resolved
tobac/merge_split.py Show resolved Hide resolved
tobac/merge_split.py Show resolved Hide resolved
@freemansw1
Copy link
Member

Good catch on the documentation, @JuliaKukulies! I resolved it by adding merge_split to the tobac.rst file.

@JuliaKukulies
Copy link
Member

Perfect @freemansw1 ! I was looking for a api.rst and missed it :)

@kelcyno
Copy link
Collaborator Author

kelcyno commented Nov 8, 2022

@freemansw1 I'm checking an error Eric found, it's running on real data now, but we need to hold off to merge until we're satisfied it wasn't just human error.

@freemansw1
Copy link
Member

Glad to be finding these now! If it is a bug, once you get it fixed, let's add a test for it to make sure we don't accidentally break it again in the future.

@freemansw1
Copy link
Member

@kelcyno one more general comment... can we rename the merge_split_cells method to something a little more specific that is appropriate to describe the method? As we move forward with tobac, I think it would be good for us to more generally name things specifically and allow for multiple methods to be added to do the various steps, and this is a good place to start.

@kelcyno
Copy link
Collaborator Author

kelcyno commented Nov 8, 2022

@freemansw1 Do you have something in mind? the method I'm using here is the minimum euclidian spanning tree. It's a weighted graph using the Kruskal algorithm. Merge_split_spanningtree or merge_split_MEST might be more specific, but I'm not sure if that's what you had in mind.

@freemansw1
Copy link
Member

@freemansw1 Do you have something in mind? the method I'm using here is the minimum euclidian spanning tree. It's a weighted graph using the Kruskal algorithm. Merge_split_spanningtree or merge_split_MEST might be more specific, but I'm not sure if that's what you had in mind.

I will defer to you, but maybe something just like merge_split_MEST or merge_split_min_span_tree? I'm just trying to keep options open in the future for other merge/split algorithms, especially if we add one that includes storm tracks.

…mbering error which incorrectly numbered tracks for merged cells.
@kelcyno
Copy link
Collaborator Author

kelcyno commented Nov 8, 2022

@freemansw1 I changed the function name to merge_split_MEST, and updated the documentation. The error Eric found is in how the track ids are numbered after merging. I simplified how the IDs are numbered using list.index, and used the same methods for cell_parent_track_id, feature_parent_track_id etc. This passed the tobac merge_split test, and also passes our tests using the TRACER data.

Copy link
Member

@freemansw1 freemansw1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now happy with these changes, once the final comment from @JuliaKukulies is addressed.

@freemansw1
Copy link
Member

Once @JuliaKukulies has a chance to re-review (and approve), we can get this merged and v1.4.0 out. I'm hoping we can do this this week; it would be great to go into the (American) holiday with this finished.

Copy link
Member

@JuliaKukulies JuliaKukulies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also happy with the new changes! Thanks a lot for your hard work on this, @kelcyno and @freemansw1. How great if we can merge this week!

@kelcyno it would be great if you can quickly clarify the code snippet I was wondering about in merge_split.py before @freemansw1 presses the button. I did not really understand what is done here, but it could be that I just misunderstood the code.

@freemansw1
Copy link
Member

With #199 now merged, I will release v1.4.0 when this PR is merged.

@JuliaKukulies
Copy link
Member

Thanks @kelcyno for the explanation, that makes sense! I agree with you that this should be kept as it is then. It looks like we are ready to merge @freemansw1 !

@freemansw1 freemansw1 merged commit 3052b7f into tobac-project:RC_v1.4.0 Nov 16, 2022
@freemansw1
Copy link
Member

Thanks to @kelcyno and @deeplycloudy for their hard work on getting this amazing new capability into tobac. Thanks also to @JuliaKukulies for her reviews and to @snilsn for helping with testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Addition of new features, or improved functionality of existing features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants