-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor pipeline to use grain crop dictionaries #1022
base: main
Are you sure you want to change the base?
Conversation
…ti class & subgrains
…mask required bool. Tested in debugger.
…2] >= 2, shape[1]==shape[2]
…rainCrops. Locally debugged working
Proposed solution to the data frame issue
|
Its that or split into separate files. I'm ambivalent as to the preferred solution as I don't use the output but consideration for end users should be given. Whilst data management, manipulation, summarisation and plotting are, in my view, core skills for researchers these days experience levels vary widely and I don't know what would be easiest. |
Are we aiming to include this refactoring in |
Co-authored-by: Neil Shephard <n.shephard@sheffield.ac.uk>
I think this was addressed in #995. Its down to Do you by chance have |
Okay I have ticked all the boxes in the PR guide (that is such a good feature) I believe. I'm absolutely sure I'll have missed big things, but I'll submit for review now. Reviewers, please do just flag stuff as it comes, don't assume I've done something right and waste time trying to comprehend my silly ways of doing things, I don't want to take up more of your time than is absolutely necessary 👍 Also @ns-rse, IIRC you like to go through a PR commit-by-commit. If so, I would strongly advise against this because I developed this PR in a very broken state to begin with, the early commits won't make sense / reflect the end PR. I am very sorry for how awful this PR is 😅 |
… validation during construction using instance property 'padding'
- Spotted a few `print()` statements from debugging. - Explicitly test the number of grains below that are returned. - switching to a dictionary in the parameterisaion of `test_merge_classes()` instead of multiple individual options with comments/labels. The dictionaries are expressive about what the values are since the keys are the configuration options themselves. This in turn means we can just use `**vet_grains_conf` to unpack the dictionary of options when calling the `vet_grains()` function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EPIC amount of excellent work @SylviaWhittle I think making our objects Classes is a really good decision long term and this gives us a good basis on which to build on.
I think you've seen #1092 which made some minor suggestions. I've looked through most things and think I have a general feel for things (many of the changes are names and/or where a functions options have changed removing those options).
Various in-line suggestions have been made.
tests/resources/process_scan_expected_above_height_profiles.pickle
Outdated
Show resolved
Hide resolved
tests/resources/process_scan_expected_below_height_profiles.pickle
Outdated
Show resolved
Hide resolved
tests/_regtest_outputs/test_grainstats_minicircle.test_grainstats_regression.out
Outdated
Show resolved
Hide resolved
tests/_regtest_outputs/test_processing.test_process_scan_above.out
Outdated
Show resolved
Hide resolved
if plotting_config["image_set"] == "all": | ||
grain_out_path_direction.mkdir(parents=True, exist_ok=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not checked in detail but are we sure the grain_out_path_direction
will exist and this line is no longer needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -1,3 +1,3 @@ | |||
,image,grain_number,num_crossings,avg_crossing_confidence,min_crossing_confidence | |||
grain_0,test_image,0,4,0.4013589828832889,0.2129989376767838 | |||
grain_1,test_image,1,4,0.3441057054647598,0.17063184531586506 | |||
grain_0,test_image,0,4,0.21426594097881008,0.001258249874731443 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we happy with these changes? They seem quite large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was suspicious of these changes too, however I looked at the traces and the traces are fine, though they slightly different (due to randomness Max and I believe). the differences in the traces are not significant, but they do result in some quite different results as you can see. The crossings in question for that image are very difficult and a slight change in trace does produce a large change in results, however upon manual review, we believe these changes do reflect the slightly altered (but not problematic trace)
so TLDR: yes, caused by a small change in the trace due to randomness, turns out the tracing for that image is actually slightly better now (but that's just luck)
Co-authored-by: Neil Shephard <n.shephard@sheffield.ac.uk>
Co-authored-by: Neil Shephard <n.shephard@sheffield.ac.uk>
This is a draft PR and documentation / tests will be added before full PR is made
This PR is huge (sorry)
Main things
This PR is designed to improve how we handle grains in the processing stages of TopoStats, starting at the grain finding stage, up to the disordered tracing stage. In future, this might be extended through the disordered tracing stage and beyond, however I've restricted the scope of this PR for the sake of everyone's sanity. The reason for stopping at disordered tracing is that once disordered tracing returns, all the data is wrapped up in neatly structured dictionaries, by grain and molecule, similar to what I've implemented, so I deemed this similar enough to not bother changing it yet.
The way this PR tries to standardize how we handle grains, is using DataClasses:
ImageGrainCrops
above
andbelow
, each holding aDirectionGrainCrops
object for that direction's grain cropsGrainCropsDirection
crops
andfull_mask_tensor
crops
stores dictionaries ofGrainCrop
objects ([int, GrainCrop]
)full_mask_tensor
stores a full sized mask for the image, size is NxNxC where C is the number of classes. This is NOT automatically updated when thecrops
property is edited, this is because we don't want to update things during a loop. This can be discussed if this is an incorrect decision!GrainCrop
This has the benefit of standardizing how we handle grains going forward, as we had previously been rather discordant in the types of data structures that we use in various parts of the codebase.
It also adds a helpful (I hope!!) layer of abstraction to processing functions, for example the
run_grainstats
function in processing no longer needs to takeimage
,grain_masks
,pixel_to_nm_scaling
, it now takes justimage_grain_crops
which contains all the data for each crop.This of course does come at the cost of increased memory usage as there are duplication of parts of images in the data structures as well as repeatedly listing the
pixel_to_nm_scaling
factor etc, however I personally find that the benefits here far outweigh the negatives. When working on the harbo-rings project, I found myself naturally extracting all the grains and storing them in a dictionary rather than keeping track of full image masks, I know Max also does this based on how he's handled the tracing code.disordered_tracing.py
prep_arrays
. Prep arrays no longer needed, since it made a dictionary of grain crops, but we already have these now with the refactor.TopoStats Pull Requests
Please provide a descriptive summary of the changes your Pull Request introduces.
The Software Development section of
the Contributing Guidelines may be useful if you are unfamiliar with linting, pre-commit, docstrings and testing.
NB - This header should be replaced with the description but please complete the below checklist or a short
description of why a particular item is not relevant.
Before submitting a Pull Request please check the following.
docs/configuration.md
docs/usage.md
docs/data_dictionary.md
docs/advanced.md
and new pages it should link to.Optional
topostats/default_config.yaml
If adding options to
topostats/default_config.yaml
please ensure.topostats/validation.py
to ensure entries are valid.topostats/entry_point.py
.