
Preprocessing #8

Merged: 11 commits into main on May 18, 2023
Conversation

LorenzLamm (Collaborator):

This branch implements the following:

  1. Pixel size matching: Fourier cropping / Fourier extension to achieve the specified tomogram pixel size.
    For both cropping and extension, an ellipsoid mask with cosine decay to zero is applied to avoid artifacts.
  2. Spectral matching: adapted from the implementation of DeePiCt (https://github.com/ZauggGroup/DeePiCt/tree/main/spectrum_filter). I adjusted some details to avoid artifacts and division by values close to zero.

Let me know what you think :)

LorenzLamm (author):

Minor adjustments to normalize tomograms and return pixel size from the header.

LorenzLamm (author):

Found a small error in the augmentation script: if "prob_to_one" (i.e. maximum augmentations) is specified, it should apply the flipping with probability 0.5 instead of 1.0. Otherwise the flipping always happens and is redundant.
(Probably I should have added this in another branch, but I hope it's also okay here)
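A minimal sketch of the corrected behaviour (function and argument names here are illustrative, not the actual augmentation code): in maximum-augmentation mode, each flip is sampled with probability 0.5, since a flip that always happens adds no augmentation diversity.

```python
import numpy as np


def sample_flip_axes(prob_to_one: bool, p: float = 0.3):
    """Decide which of the three axes to flip for one augmentation call.

    With prob_to_one (maximum-strength augmentation), each flip is applied
    with probability 0.5 rather than 1.0: always flipping is deterministic
    and therefore redundant as an augmentation.
    """
    flip_p = 0.5 if prob_to_one else p
    return [axis for axis in range(3) if np.random.rand() < flip_p]
```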

alisterburt (Contributor):

all good here for sure!

LorenzLamm (author):

Another small training adjustment (sorry for placing it here).
This multiplication by the number of elements leads to the correct computation of training loss for logging.
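The logging fix can be illustrated with a small sketch (hypothetical helper, not the actual training code): a per-batch mean loss must be re-weighted by its element count before averaging over the epoch, otherwise batches of different sizes are weighted incorrectly.

```python
def epoch_loss(batch_losses, batch_sizes):
    """Average per-element loss over an epoch.

    Each batch loss is a mean over its elements, so it is re-weighted by the
    number of elements before summing; dividing by the total element count
    then gives the true epoch mean.
    """
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)
```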

alisterburt (Contributor):

all good!

LorenzLamm (author):

This is copied from the DeePict repository. I'm not sure how to properly credit them. I copied these comments to the top of the file and hope that is okay. Is it?

The script opens the tomogram, normalizes it, extracts the rotationally averaged Fourier spectrum, and stores it in a csv file.
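The rotational averaging step can be sketched like this (hypothetical NumPy code, not the DeePiCt implementation): amplitudes of the centered FFT are binned into integer-radius shells and averaged per shell.

```python
import numpy as np


def radial_average_3d(volume: np.ndarray) -> np.ndarray:
    """Rotationally averaged amplitude spectrum of a 3D volume."""
    amplitudes = np.abs(np.fft.fftshift(np.fft.fftn(volume)))
    center = np.array(volume.shape) // 2
    zz, yy, xx = np.indices(volume.shape)
    radii = np.sqrt(
        (zz - center[0]) ** 2 + (yy - center[1]) ** 2 + (xx - center[2]) ** 2
    ).astype(int)
    # Mean amplitude in each integer-radius shell, up to the Nyquist radius.
    sums = np.bincount(radii.ravel(), weights=amplitudes.ravel())
    counts = np.maximum(np.bincount(radii.ravel()), 1)
    nyquist = min(volume.shape) // 2
    return (sums / counts)[: nyquist + 1]
```

The resulting 1D vector is what would be written to the csv file.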

LorenzLamm (author):

This is the script for matching pixel sizes.
It first computes the output shape. Then, depending on whether the output pixel size is smaller or larger than the input pixel size, it performs either Fourier cropping (cropping Fourier components to the shape of the output tomogram) or Fourier extension (padding the Fourier space with zeros to reach the desired shape).
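The crop-or-extend decision can be sketched as follows (hypothetical NumPy code; a mean-preserving rescale is included, while the ellipsoid cosine mask used in the PR to suppress artifacts is omitted for brevity):

```python
import numpy as np


def match_pixel_size(volume, pixel_size_in, pixel_size_out):
    """Resample by centered cropping or zero-padding of the Fourier transform."""
    out_shape = tuple(
        int(round(s * pixel_size_in / pixel_size_out)) for s in volume.shape
    )
    ft = np.fft.fftshift(np.fft.fftn(volume))
    resized = np.zeros(out_shape, dtype=ft.dtype)
    # Overlap region: centered crop (output smaller) or centered pad (larger).
    src_slices, dst_slices = [], []
    for s_in, s_out in zip(ft.shape, out_shape):
        n = min(s_in, s_out)
        src_start = (s_in - n) // 2
        dst_start = (s_out - n) // 2
        src_slices.append(slice(src_start, src_start + n))
        dst_slices.append(slice(dst_start, dst_start + n))
    resized[tuple(dst_slices)] = ft[tuple(src_slices)]
    out = np.fft.ifftn(np.fft.ifftshift(resized)).real
    # Rescale so mean intensity is preserved after changing the element count.
    return out * (np.prod(out_shape) / np.prod(volume.shape))
```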

LorenzLamm (author):

This is the interface script for matching the spectrum of two tomograms. It is also adjusted from the DeePict repository.

It first loads the tomogram and the target spectrum (csv) and then matches the input tomogram spectrum to the target spectrum.

I adjusted the script to additionally accept arguments "--shrink_excessive_value" to avoid too large matching vector components and "--almost_zero_cutoff", which cuts off the matching vector at the first Fourier coefficient that is below 0.1.
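A simplified sketch of how those two safeguards could act on a 1D equalization vector (illustrative code, not the adjusted DeePiCt script; the epsilon guarding the division is an assumption):

```python
import numpy as np


def match_spectrum_vector(input_spec, target_spec,
                          almost_zero_cutoff=0.1, shrink_excessive_value=50.0):
    """Per-frequency equalization factors mapping input to target spectrum."""
    equal_v = target_spec / np.maximum(input_spec, 1e-12)
    # Zero everything from the first input coefficient below the cutoff:
    # dividing by near-zero input values would otherwise explode the factors.
    low = np.nonzero(input_spec < almost_zero_cutoff)[0]
    if low.size:
        equal_v[low[0]:] = 0.0
    # Clamp any remaining excessively large factors.
    equal_v[equal_v > shrink_excessive_value] = shrink_excessive_value
    return equal_v
```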

LorenzLamm (author):

Copied from the DeePict repository.
Some utilities to deal with the spectrum vector.

LorenzLamm (author):

Utils for the pixel size matching:

  • Fourier cropping
  • Fourier extension
  • ellipsoid mask (maximally fit into tomogram shape with certain border)
  • smooth cosine decay
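A hypothetical sketch of such an ellipsoid mask with cosine decay (illustrative, not the PR's utils): values are 1 inside the largest ellipsoid that fits within the given border, fall to 0 over the border width with a half-cosine, and stay 0 outside.

```python
import numpy as np


def ellipsoid_cosine_mask(shape, border=4):
    """Soft ellipsoid mask maximally fitting the volume inside a border."""
    center = (np.array(shape) - 1) / 2
    radii = np.array(shape) / 2 - border
    grids = np.indices(shape)
    # Normalized ellipsoid radius: 1.0 on the inner ellipsoid surface.
    r = np.sqrt(sum(((g - c) / rad) ** 2
                    for g, c, rad in zip(grids, center, radii)))
    # Map radius 1 .. 1+decay to a smooth cosine falloff from 1 to 0.
    decay = border / radii.min()
    t = np.clip((r - 1.0) / decay, 0.0, 1.0)
    return 0.5 * (1.0 + np.cos(np.pi * t))
```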

LorenzLamm (author):

This is also mainly copied from DeePict, but adjusted in some cases to avoid artifacts.

almost_zero_cutoff_value = np.maximum(
    np.minimum(np.min(almost_zeros_input) - 4, np.min(almost_zeros_target) - 4), 0
)

LorenzLamm (author):

This cuts off the FFT values from the first value that is below 0.1 in the input spectrum. The target spectrum will be divided by the input spectrum, so low values in the input spectrum can be problematic.

try:
    equal_v[cutoff:] = 0
except IndexError:
    warnings.warn("Flat cutoff is higher than maximum frequency")

LorenzLamm (author):

I adjusted this smoothing as well: it is normally used for a sigmoid decay to zero starting at the "cutoff" value. However, I think it is more intuitive if all values after the cutoff are zero, so I shifted the sigmoid to smaller values and set all values above the cutoff to 0.


if shrink_excessive_value:
    equal_v[equal_v > shrink_excessive_value] = shrink_excessive_value
# Create the equalization kernel

LorenzLamm (author):

This is another safety check to avoid excessively large factors (by default, limited to shrink_excessive_value=50).

LorenzLamm marked this pull request as ready for review on May 16, 2023, 13:45.
alisterburt (Contributor) left a comment:

hey @LorenzLamm

First: this is really awesome, I'm glad we have this functionality in here. The PR is too big to comment on everything individually without iteration becoming slow and frustrating, so instead I suggest we merge this and I will provide some overarching comments that can guide improvements as we move forward.

  • x/y/z ordering of images is a little strange, could you explain what's going on there? :)

  • matching utils should probably be a subpackage of the preprocessing package; if writing this myself, I would probably use the following organisation:

  • preprocessing

    • pixel_size_matching
      • _cli.py
      • match_pixel_size.py
    • amplitude_spectrum_matching
      • _cli.py
      • match_amplitude_spectrum.py

Rather than .py scripts which have to be located/added to the path/executed, you can install scripts during package installation automatically with the project.scripts block in the pyproject.toml file; here is a PR to a different project where I template/discuss this for someone else: bbarad/ETSegTools#1

Discussed in the PR above, I really like using Typer for turning a simple type annotated function into a script which can be executed from the command line - worth trying!
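As a sketch of that pattern (hypothetical module, function, and parameter names, not the actual membrain-seg code): a type-annotated function plus typer.run gives a command line interface, and a [project.scripts] entry in pyproject.toml installs it as a console command.

```python
# Hypothetical sketch of a Typer-based CLI entry point; the function and
# parameter names are illustrative, not the actual membrain-seg code.
import typer


def match_pixel_size_cli(
    input_tomogram: str,
    output_tomogram: str,
    pixel_size_out: float = 10.0,
) -> None:
    """Fourier-crop or -extend a tomogram to the target pixel size."""
    # Real work would happen here; this sketch only echoes the parsed arguments.
    typer.echo(f"{input_tomogram} -> {output_tomogram} @ {pixel_size_out} A/px")


if __name__ == "__main__":
    # Turns the annotated function into a command line interface.
    typer.run(match_pixel_size_cli)
```

With a pyproject.toml entry such as match_pixel_size = "membrain_seg.preprocessing.pixel_size_matching._cli:match_pixel_size_cli" (path hypothetical), pip install then exposes the command directly.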

In general it would be great to have more explicit function names e.g. radial_average_3d rather than rad_avg

Does the spectrum matching take a while because of the large fft it computes or is it no big deal? If so we might consider calculating the sum of spectra over a number of smaller 3D patches to do the estimation - this can also be used to increase signal by taking overlapping patches

Before:

def load_tomogram(filename, return_header=False, normalize_data=False):

After:

def load_tomogram(
    filename, return_pixel_size=False, return_header=False, normalize_data=False
):
    """
    Loads data and transposes s.t. we have data in the form x,y,z.

alisterburt (Contributor):


I hadn't noticed this before - could you explain why the transpose is necessary?

LorenzLamm (author):

The mrcfile package loads tomograms by default in (z, y, x) order if I remember correctly. It's not really necessary to transpose the axes; I just find it more intuitive to have them ordered (x, y, z). Removing this transpose should not affect any functionality, though.

alisterburt (Contributor):

all good here for sure!

On src/membrain_seg/dataloading/memseg_augmentation.py (outdated):
alisterburt (Contributor):

all good!

alisterburt (Contributor):

to be clear - the path forward here is to merge and iterate! 🙂

Simplify redundant condition

Co-authored-by: alisterburt <alisterburt@gmail.com>
LorenzLamm (author):


Thanks a lot for your suggestions. I'll definitely try to implement them in the next iteration. The project.scripts block and the Typer command line interface sound like they can make the whole package much more convenient to use! :)

Regarding the timing of the FFT: it does indeed take a while to compute the FFTs for the large tomograms. So you would propose a sliding window approach for extraction / matching of the frequencies? Since FFT scales with O(n*log(n)), it should be more efficient to compute many smaller FFTs than the FFT of the entire volume, right?
But is the transform then still roughly equivalent?
I guess I'll do some experiments on this! :)

@alisterburt
Copy link
Contributor

Regarding the timing of the FFT: it does indeed take a while to compute the FFTs for the large tomograms. So you would propose a sliding window approach for extraction / matching of the frequencies? Since FFT scales with O(n*log(n)), it should be more efficient to compute many smaller FFTs than the FFT of the entire volume, right?
But is the transform then still roughly equivalent?

I wouldn't match on a window directly, I would average the FFTs over the sliding windows then do my spectrum matching on that average spectrum - dealing with the smaller spectrum should be a little easier and if needed the FFTs of the windows could be evaluated in parallel.
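A sketch of that patch-averaging idea (hypothetical NumPy code using non-overlapping patches for simplicity, though overlapping patches would add signal as noted above):

```python
import numpy as np


def patch_averaged_spectrum(volume, patch=64):
    """Average the amplitude spectra of non-overlapping cubic patches.

    A stand-in for one huge FFT: each patch spectrum is smaller and noisier,
    but their mean gives a stable estimate to match against, and the patch
    FFTs could be evaluated in parallel.
    """
    acc = np.zeros((patch, patch, patch))
    count = 0
    zs, ys, xs = (s // patch for s in volume.shape)
    for i in range(zs):
        for j in range(ys):
            for k in range(xs):
                block = volume[i * patch:(i + 1) * patch,
                               j * patch:(j + 1) * patch,
                               k * patch:(k + 1) * patch]
                acc += np.abs(np.fft.fftn(block))
                count += 1
    return acc / count
```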

A quick look at complexity suggests batching should be quicker (n*log(n/8) vs n*log(n)), but actual tests don't seem to show a huge benefit:

a = torch.rand((256, 256, 256))
b = torch.rand((8, 128, 128, 128))
%timeit torch.fft.fftn(a, dim=(-3, -2, -1))
%timeit torch.fft.fftn(b, dim=(-3, -2, -1))
148 ms ± 1.62 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
117 ms ± 583 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

LorenzLamm merged commit 09571cb into main on May 18, 2023.