Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update normalization #32

Merged
merged 4 commits into from
Nov 3, 2020
Merged

update normalization #32

merged 4 commits into from
Nov 3, 2020

Conversation

FynnBe
Copy link
Member

@FynnBe FynnBe commented Oct 23, 2020

README.md will be updated shortly...

@FynnBe FynnBe requested a review from oeway October 23, 2020 10:14
@FynnBe FynnBe changed the title [WIP] update normalization update normalization Oct 23, 2020
@FynnBe
Copy link
Member Author

FynnBe commented Oct 23, 2020

We forgot about the need to specify a list of means/std values for mode 'fixed' in case axes are specified. Other than that everything is exactly as discussed in today's bioimage-io meeting.

- `data_type` data type (e.g. float32)
- `data_range` tuple of (minimum, maximum)
- `axes` string of axes identifying characters from: btczyx
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a place to give detailed definition for the axes and the meaning? Are we allowed to give a custom letter to it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be as restrictive with these letters as possible to keep them meaningful/useful from a consumer software perspective, e.g restrict them to btczyx. We can have a (separate) discussion on the axes keys (use HW, instead of xy, etc...). I'd prefer to delay that for 0.3.1+, for now in a given input the description field could add specific meaning in the model context for humans? I feel an axes_description field would go a bit too far anyway, but again, let's leave that for future discussion if necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree that for now, it is too ambitious to get this kind of specifications. I think it was already discussed at some point but outputs might also be better described with rows and columns. While in NumPy arrays those could be understood as HW, displaying them as tables could need a different kind of description.

Copy link
Contributor

@oeway oeway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@esgomezm
Copy link
Contributor

OK for the normalization as well.

PD: In a quick search, I found some nice comments about normalization/standardization/stretching:
https://stackoverflow.com/questions/33610825/normalization-in-image-processing/33611556#33611556
there is also centering like mean/median centering for just fixing the mean/median to 0 value.

@FynnBe
Copy link
Member Author

FynnBe commented Oct 26, 2020

https://stackoverflow.com/questions/33610825/normalization-in-image-processing/33611556#33611556

  • Data standarization is another way of normalizing the data

So normalizing data is not the same as normalization I suppose... I see the ambiguity. So how about we change the name of the normalization field to preprocessing (What, wasn't that what it's called before?? Yes, it was, we had our reasons then and maybe should have listened to our past us... Anyhow, the big difference would be that this is a subfield of an input now. And it is not attached to any .transformation.yaml files, but instead we agree on a strict set of valid transformation names and their kwargs).

All in favor of renaming our newly introduced normalization to preprocessing say 👍

@oeway
Copy link
Contributor

oeway commented Oct 26, 2020

preprocessing looks good, it make sense that we support only a very minimal set of preprocessing ops.

- `axes` subset of input `axes` to normalize independently (e.g. 'c')
- `mean` mean to normalize with (only applies for `mode` 'fixed'). This may be a (nested) list depending on `axes`, e.g. for `axes` 'c' a list of means for each channel; or for axes: 'cz' a list for each channel c of a nested list of means for each z position of that channel.
- `std` standard deviation to normalize with (only applies for `mode` 'fixed'). This may be a (nested) list depending on `axes` analogously to `mean`.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We like explicit normalization descriptions in this way, but would like to talk about supported normalization schemes rather sooner than later. Maybe one of the next meetings?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put it on the agenda 👍

@FynnBe
Copy link
Member Author

FynnBe commented Oct 30, 2020

We decided in today's bioimage.io meeting:

  • the preprocessing key is associated to an input in inputs. More complicated preprocessing schemes will have to be covered by model source code for the time being.
  • make preprocessing a list of dicts with a name and a kwargs key. Order of application is the order in this list.
  • name needs to be a valid preprocessing name
  • kwargs need to contain valid key word arguments for the preprocessing specified by name

valid preprocessings we start with:

  • zero_mean_unit_variance: with kwargs:
    • mode: (fixed/per_dataset/per_sample)
    • axes: xy # subset of axes to normalize jointly, batch ('b') is not a valid axis key here!
    • mean: [1.1, 2.2, 3.3] # mean if mode == fixed. Here it is a list (because in this example we assume a channel dimension of length c=3)
    • std: [0.1, 0.2, 0.3] # standard deviation if mode == fixed analogously to mean

todos:

  • min_max normalization with fixed min, max (from training)
  • percentile normalization
  • clipping

@FynnBe
Copy link
Member Author

FynnBe commented Nov 3, 2020

open todos moved to #37

@FynnBe FynnBe merged commit 1d12ee6 into master Nov 3, 2020
@FynnBe FynnBe deleted the update_normalization branch November 3, 2020 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants