-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge Split algorithm added #136
Conversation
This file is a post processing step to the tobac tracking step. This is a first iteration in addressing split/merged cells.
@freemansw1 Nothing was going right on my end so I just started fresh. Let me know if there's anything wonky. |
@kelcyno Great thank you, I will also start to have a look your PR now! Is it maybe possible to add tests for your additions (or at least one for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for this PR @kelcyno! I am still figuring out some details in your code, because I am not sure I completely understand what the merge_split()
function does at all steps. But overall it looks good and I have tested your functions with some of my data which worked fine (except for the data standardization). Please have a look at my comments and questions below which we can solve together before I approve this!
Sorry for my delay on this, I should be able to review next week after my defense. @kelcyno do you want me to review ASAP or are there changes to be made from @JuliaKukulies's review? |
Sean, I still have some edits from Julia to work on. It can wait until after your defense.
|
OK, I'll wait to review this until you have made the changes @kelcyno . When you do, we can re-request reviews from both @JuliaKukulies and myself. One note, with only having glanced at the PR, if we need to add new requirements for this, we should either make them optional (add a check to see if the package can be imported when trying to use this function; if not throw an error and quit) or add them to the various requirements files. |
Updated changes to merge/split based on suggestions for lat/lon convection, and documentation.
@freemansw1 @JuliaKukulies I updated the merge_split.py to comply with documentation and Julia's previous comments. Looking forward to more input/comments. |
Codecov ReportBase: 36.22% // Head: 38.97% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## RC_v1.4.0 #136 +/- ##
=============================================
+ Coverage 36.22% 38.97% +2.74%
=============================================
Files 10 11 +1
Lines 2120 2245 +125
=============================================
+ Hits 768 875 +107
- Misses 1352 1370 +18
Flags with carried forward coverage won't be shown. Click here to find out more.
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
Thanks @kelcyno , I'll take a look at this today or tomorrow. Just so you know- the reason your PR is failing checks is that you need to run the One question before I begin review- do you have documentation handy for this? |
Formatted using the Black python package.
Allowed optional import of additional packages: geopy, and networkx. Now a message will print if the package is not available to import.
Just noting that I know you are still waiting on a review from me. It's been very busy and I haven't quite had the time to give reviewing this the attention it deserves. I should have time early next week, though. I do have some ideas both on the documentation front and the testing front though, and I'll post those when I post my review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for these really important changes, @kelcyno. I've made a number of comments throughout, but as I stated today, I'm happy to help you work on any of them.
I think the biggest thing for me is documentation. It would be great to have more comments throughout the code and an example or two on how to use it. @fsenf do you think the better place for usage examples are here or in the tutorials repo?
I think given the number of things left to resolve, I would be inclined to push this to v1.5? That would give us documentation and spectral filtering in v1.4. What do you think @JuliaKukulies @w-k-jones @fsenf |
That sounds like a good idea if @kelcyno is fine with that as well. Thanks for the additional comments @freemansw1! |
Updated to tobac v1.3.2, and made suggested changes. Merge_split.py now ingests a pandas dataframe, ancillary functions have been moved to utils.py.
Added merge/split tests
This update also includes a fix to an existing error in the merge_split algorithm that incorrectly multiplied the dimension and not values.
@freemansw1 @JuliaKukulies, I've updated Sean's PR, our numbering for Tracks begins at 0. I updated the dimension names to be consistent, and similarly in the utils file the standardize_track_dataset function. I commented out the assert tests at the bottom of the merge_split function, but can delete them entirely if needed. I rather like keeping them for future use if needed. I think it's nearly ready for 1.4 - and all checks have passed (for the first time!). Let me know what you think and if there's changes to be made on the minimum distance value part. Thanks all! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kelcyno I have had a look again at your latest changes and the way you handle distance
now is fine for me. I do have a few remaining remarks that should be quick fixes and two questions. But otherwise, I am really fine to approve this PR from my side. Have you found the reason why merge_split
does not occur in the auto-generated API reference @freemansw1 ?
doc/merge_split.rst
Outdated
This submodule will label merged/split cells with a TRACK number in addition to its CELL number. | ||
|
||
Features, cells, and tracks are combined using parent/child nomenclature. | ||
(quick note on terms; “feature” is a detected object at a single time step. “cell” is a series of features linked together over multiple timesteps. "track" may be an individual cell or series of cells which have merged and/or split.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(quick note on terms; “feature” is a detected object at a single time step. “cell” is a series of features linked together over multiple timesteps. "track" may be an individual cell or series of cells which have merged and/or split.) | |
(quick note on terms; “feature” is a detected object at a single time step (see :doc:`feature_detection_overview`). “cell” is a series of features linked together over multiple timesteps (see :doc:`linking`). "track" may be an individual cell or series of cells which have merged and/or split.) |
doc/merge_split.rst
Outdated
|
||
Features, cells, and tracks are combined using parent/child nomenclature. | ||
(quick note on terms; “feature” is a detected object at a single time step. “cell” is a series of features linked together over multiple timesteps. "track" may be an individual cell or series of cells which have merged and/or split.) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree on this comment - but I guess we save this for another PR @freemansw1 or what do you think?
Good catch on the documentation, @JuliaKukulies! I resolved it by adding |
Perfect @freemansw1 ! I was looking for a |
@freemansw1 I'm checking an error Eric found, it's running on real data now, but we need to hold off to merge until we're satisfied it wasn't just human error. |
Glad to be finding these now! If it is a bug, once you get it fixed, let's add a test for it to make sure we don't accidentally break it again in the future. |
@kelcyno one more general comment... can we rename the |
@freemansw1 Do you have something in mind? the method I'm using here is the minimum euclidian spanning tree. It's a weighted graph using the Kruskal algorithm. Merge_split_spanningtree or merge_split_MEST might be more specific, but I'm not sure if that's what you had in mind. |
I will defer to you, but maybe something just like |
…mbering error which incorrectly numbered tracks for merged cells.
@freemansw1 I changed the function name to merge_split_MEST, and updated the documentation. The error Eric found is in how the track ids are numbered after merging. I simplified how the IDs are numbered using list.index, and used the same methods for cell_parent_track_id, feature_parent_track_id etc. This passed the tobac merge_split test, and also passes our tests using the TRACER data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm now happy with these changes, once the final comment from @JuliaKukulies is addressed.
Once @JuliaKukulies has a chance to re-review (and approve), we can get this merged and v1.4.0 out. I'm hoping we can do this this week; it would be great to go into the (American) holiday with this finished. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also happy with the new changes! Thanks a lot for your hard work on this, @kelcyno and @freemansw1. How great if we can merge this week!
@kelcyno it would be great if you can quickly clarify the code snippet I was wondering about in merge_split.py
before @freemansw1 presses the button. I did not really understand what is done here, but it could be that I just misunderstood the code.
With #199 now merged, I will release v1.4.0 when this PR is merged. |
Thanks @kelcyno for the explanation, that makes sense! I agree with you that this should be kept as it is then. It looks like we are ready to merge @freemansw1 ! |
Thanks to @kelcyno and @deeplycloudy for their hard work on getting this amazing new capability into tobac. Thanks also to @JuliaKukulies for her reviews and to @snilsn for helping with testing. |
The addition in this pull request is a post processing algorithm for the tobac tracking. This file contains the merge_split function to address merging/splitting cells in the tobac track, as well as a quick compression function, and a function to standardize the track dataset in order to have all tracking, feature, and segmentation mask datasets in one xarray dataset.
An example of these functions is given below:
d = merge_split(Track)
ds = standardize_track_dataset(Track, refl_mask, data['ProjectionCoordinateSystem'])
both_ds = xr.merge([ds, d],compat ='override')
both_ds = compress_all(both_ds)
both_ds.to_netcdf(os.path.join(savedir,'Track_features_merges.nc'))