-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preprocessing #8
Conversation
…e flip is redundant.
…excessive values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor adjustments to normalize tomograms and return pixel size from the header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found a small error in the augmentation script: if "prob_to_one" (i.e. maximum augmentations) is specified, it should apply the flipping with probability 0.5 instead of 1.0. Otherwise the flipping always happens and is redundant.
(Probably I should have added this in another branch, but I hope it's also okay here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all good here for sure!
src/membrain_seg/networks/unet.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another small training adjustment (sorry for placing it here).
This multiplication by the number of elements leads to the correct computation of training loss for logging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copied from the DeePict repository. I'm not sure how to properly credit them. I copied these comments to the top of the file and hope that is okay. Is it?
The script opens the tomogram, normalizes it, extracts the rotationally averaged Fourier spectrum, and stores it in a csv file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the script for matching the pixel sizes.
It first computes the output shape, and then, depending on whether the output pixel size is smaller than the input pixel size, it will choose to perform either Fourier cropping (i.e. crop Fourier components to the shape of the output tomogram) or Fourier extension (Padding the Fourier space with zeros to receive the desired shape).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the interface script for matching the spectrum of two tomograms. It is also adjusted from the DeePict repository.
It first loads the tomogram and the target spectrum (csv) and then matches the input tomogram spectrum to the target spectrum.
I adjusted the script to additionally accept arguments "--shrink_excessive_value" to avoid too large matching vector components and "--almost_zero_cutoff", which cuts off the matching vector at the first Fourier coefficient that is below 0.1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copied from the DeePict repository.
Some utilities to deal with the spectrum vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Utils for the pixel size matching:
- Fourier cropping
- Fourier extension
- ellipsoid mask (maximally fit into tomogram shape with certain border)
- smooth cosine decay
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also mainly copied from DeePict, but adjusted in some cases to avoid artifacts.
almost_zero_cutoff_value = np.maximum( | ||
np.minimum(np.min(almost_zeros_input) - 4, np.min(almost_zeros_target) - 4), 0 | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cuts off the FFT values from the first value that is below 0.1 in the input spectrum. The target spectrum will be divided by the input spectrum, so low values in the input spectrum can be problematic.
try: | ||
equal_v[cutoff:] = 0 | ||
except IndexError: | ||
warnings.warn("Flat cutoff is higher than maximum frequency") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I adjusted this smoothing also: It is normally used for a sigmoid decay to zero that starts at the "cutoff" value. However, I think it's more intuitive if after cutoff, all values are zeros. So I shifted the sigmoid to smaller values and set all values above cutoff to 0.
|
||
if shrink_excessive_value: | ||
equal_v[equal_v > shrink_excessive_value] = shrink_excessive_value | ||
# Create the equalization kernel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another security check to avoid too large factors (by default, it's limited by shrink_excessive_value=50).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @LorenzLamm
First - this is really awesome, I'm glad we have this functionality in here. The PR is too big for comments on everything individually to not be a slow, frustrating experience to iterate on - instead I suggest we merge this and I will provide some overarching comments that can guide improving things as we move forward
-
x/y/z ordering of images is a little strange, could you explain what's going on there? :)
-
matching utils should probably be a subpackage of the preprocessing package, if writing this myself I would probably the following organisation
-
preprocessing
-
- pixel_size_matching
-
-
- _cli.py
-
-
-
- match_pixel_size.py
-
-
- amplitude_spectrum_matching
-
-
- _cli.py
-
-
-
- match_amplitude_spectrum.py
-
Rather than .py scripts which have to be located/added to path/executed, you can install scripts during package installation automatically with the project.scripts
block in the pyproject.toml
file - here is a PR to a different project where I template/discuss this for someone else bbarad/ETSegTools#1
Discussed in the PR above, I really like using Typer
for turning a simple type annotated function into a script which can be executed from the command line - worth trying!
In general it would be great to have more explicit function names e.g. radial_average_3d
rather than rad_avg
Does the spectrum matching take a while because of the large fft it computes or is it no big deal? If so we might consider calculating the sum of spectra over a number of smaller 3D patches to do the estimation - this can also be used to increase signal by taking overlapping patches
def load_tomogram(filename, return_header=False, normalize_data=False): | ||
def load_tomogram( | ||
filename, return_pixel_size=False, return_header=False, normalize_data=False | ||
): | ||
""" | ||
Loads data and transposes s.t. we have data in the form x,y,z. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Loads data and transposes s.t. we have data in the form x,y,z. | |
Loads data and transposes s.t. we have data in the form x,y,z. |
I hadn't noticed this before - could you explain why the transpose is necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mrcfile package loads the tomograms by default in format (z, x, y) if I remember correctly. It's not really necessary to transpose the axes. I just find it more intuitive to have the axes ordered (x, y, z). Removing this transpose should not matter for any functionalities, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all good here for sure!
src/membrain_seg/networks/unet.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all good!
to be clear - the path forward here is to merge and iterate! 🙂 |
Simplify redundant condition Co-authored-by: alisterburt <alisterburt@gmail.com>
Thanks a lot for your suggestions. I'll definitely try to implement them in the next iteration. The project.scripts block and the Typer function for the command line interface sounds like they can make the whole package much more convenient to use! :) Regarding the timing of the FFT: It does indeed take a while to compute the FFTs for the large tomograms. So you would propose a sliding window approach for extraction / matching of the frequencies? Since FFT scales with O(n*log(n)), it should be more efficient to compute a lot of smaller FFTs than computing the FFt of the entire volume, right? |
I wouldn't match on a window directly, I would average the FFTs over the sliding windows then do my spectrum matching on that average spectrum - dealing with the smaller spectrum should be a little easier and if needed the FFTs of the windows could be evaluated in parallel. A quick look at complexity suggests batching should be quicker (n*log(n/8)), actual tests don't seem to show a huge benefit there... a = torch.rand((256, 256, 256))
b = torch.rand((8, 128, 128, 128))
%timeit torch.fft.fftn(a, dim=(-3, -2, -1))
%timeit torch.fft.fftn(b, dim=(-3, -2, -1))
148 ms ± 1.62 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
117 ms ± 583 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) |
This branch implements the
For both cropping and extension, an ellipsoid mask with cosine decay to zero is applied to avoid artifacts.
Let me know what you think :)