update normalization #32

Merged: 4 commits, Nov 3, 2020
12 changes: 10 additions & 2 deletions README.md
@@ -59,13 +59,21 @@ Must be a list of *tensor specification keys*.

*tensor specification keys*:
- `name` tensor name
- `axes` string of axes identifiers, e.g. btczyx
- `data_type` data type (e.g. float32)
- `data_range` tuple of (minimum, maximum)
- `axes` string of axes identifying characters from: btczyx
Contributor

Do we have a place to give a detailed definition of the axes and their meaning? Are we allowed to use custom letters?

Member Author

I think we should be as restrictive with these letters as possible to keep them meaningful/useful from a consumer software perspective, e.g. restrict them to btczyx. We can have a (separate) discussion on the axes keys (use HW instead of xy, etc.). I'd prefer to delay that for 0.3.1+. For now, the description field of a given input could add specific meaning in the model context for humans. I feel an axes_description field would go a bit too far anyway, but again, let's leave that for future discussion if necessary.

Contributor

I agree that, for now, this kind of specification is too ambitious. I think it was already discussed at some point, but outputs might also be better described with rows and columns. While in NumPy arrays those could be understood as HW, displaying them as tables could need a different kind of description.
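
The restriction discussed in this thread could be enforced by consumer software with a check along these lines. This is only a sketch: `ALLOWED_AXES` and `validate_axes` are hypothetical names, and rejecting repeated letters is an assumption that the README text does not state.

```python
ALLOWED_AXES = "btczyx"  # the letters named in the README diff


def validate_axes(axes: str) -> str:
    """Return `axes` unchanged if it only uses allowed letters.

    Assumption (not in the spec text): each letter may appear at most once.
    """
    unknown = [letter for letter in axes if letter not in ALLOWED_AXES]
    if unknown:
        raise ValueError(f"unknown axis letters {unknown}; allowed: {ALLOWED_AXES}")
    if len(set(axes)) != len(axes):
        raise ValueError(f"repeated axis letter in {axes!r}")
    return axes
```

With this check, `validate_axes("bcyx")` passes, while a custom string such as `"bchw"` raises a `ValueError`.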

- `shape` specification of tensor shape\
Either as *exact shape with same length as `axes`*\
or as {`min` *minimum shape with same length as `axes`*, `step` *minimum shape change with same length as `axes`*}

- `preprocessing` optional description of how this input should be preprocessed
- `name` name of preprocessing (currently only 'zero_mean_unit_variance' is supported)
- `kwargs` key word arguments for `preprocessing`\
for 'zero_mean_unit_variance' these are:
- `mode`: either 'fixed', 'per_dataset', or 'per_sample'
- `axes`: subset of axes to normalize jointly, e.g. 'xy', batch ('b') is not a valid axis key here!
- `mean`: mean if mode == fixed, e.g. (with channel dimension of length c=3, and all axes 'cxy') [1.1, 2.2, 3.3]
- `std`: standard deviation if mode == fixed analogously to mean

We like explicit normalization descriptions of this kind, but would like to talk about supported normalization schemes sooner rather than later. Maybe at one of the next meetings?

Member Author

Let's put it on the agenda 👍

- `outputs`
Describes the output tensors from this model.
Must be a list of *tensor specification*.
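
The two forms of `shape` described in this hunk (an exact shape, or `min` plus `step`) could be checked roughly as follows. This is a sketch under assumptions: `shape_is_valid` is a hypothetical helper, and reading `step` as "valid sizes are `min + n * step` for a non-negative integer `n`, with a step of 0 meaning the axis is fixed" is one interpretation of "minimum shape change", not something the README states.

```python
from collections.abc import Mapping, Sequence


def shape_is_valid(shape: Sequence[int], axes: str, spec) -> bool:
    """Check a concrete tensor shape against a `shape` specification.

    `spec` is either an exact shape or a mapping with `min` and `step`,
    each with the same length as `axes`.
    """
    if len(shape) != len(axes):
        return False
    if isinstance(spec, Mapping):
        minimum, step = spec["min"], spec["step"]
        if len(minimum) != len(axes) or len(step) != len(axes):
            raise ValueError("`min` and `step` must have the same length as `axes`")
        for size, low, increment in zip(shape, minimum, step):
            if size < low:
                return False
            if increment == 0:
                # Assumed meaning: a step of 0 pins the axis to its minimum.
                if size != low:
                    return False
            elif (size - low) % increment != 0:
                return False
        return True
    # Exact shape: every axis must match.
    return list(shape) == list(spec)
```

For example, with `{"min": [1, 1, 64, 64], "step": [0, 0, 32, 32]}` and axes `bcyx`, a shape of `[1, 1, 128, 96]` is accepted under this reading and `[1, 1, 80, 64]` is not.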
19 changes: 12 additions & 7 deletions models/UNet2dExample.model.yaml
@@ -1,4 +1,3 @@
# TODO physical scale of the data
# TODO discuss the dependencies more closely

### keys shared with website rdf: https://github.com/bioimage-io/bioimage.io/blob/master/docs/resource-description-file.md
@@ -21,21 +20,27 @@ tags: [unet2d, pytorch, nucleus-segmentation]
license: MIT

documentation: ./unet2d.md
covers: [] # todo: covers in root or model?
covers: []
attachments: {}



### model spec specific keys:

inputs: # needs to become ordered dict (same for output) or list of dicts should contain names to
# discussion about NCHWD versus bxyzt this also has implications on the location of the origin and the
# TODO physical scale of the data
inputs:
- name: input
axes: bcyx #btczyx
data_type: float32
data_range: [-inf, inf]
shape: [1, 1, 512, 512]
normalization: {name: zero_mean_unit_var, kwargs: {mean: 129, std: 23}}
axes: bcyx # letters in axes in btczyx # todo: discussion about NTCHWD versus btczyx
shape: [1, 2, 512, 512]
preprocessing: # optional description of the input's preprocessing
name: zero_mean_unit_variance # name of preprocessing. Currently only zero_mean_unit_variance is supported
kwargs: # example kwargs for zero_mean_unit_variance
mode: fixed # mode in [fixed, per_dataset, per_sample]
axes: xy # subset of axes to normalize jointly, batch ('b') is not a valid axis key here!
mean: [1.1, 2.2, 3.3] # mean if mode == fixed. Here it is a list (because in this example we assume a channel dimension of length c=2)
std: [0.1, 0.2, 0.3] # standard deviation if mode == fixed analogously to mean
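
A minimal pure-Python sketch of how a consumer might apply `zero_mean_unit_variance` with `axes: xy` for the `fixed` and `per_sample` modes. The function signature, the nested-list `cyx` layout (batch axis already removed), the `eps` guard against division by zero, and the use of population variance are all assumptions for illustration, not part of the spec. `per_dataset` is left out because it needs statistics computed over a whole dataset. Note that, as rendered here, the example lists three values for `mean` and `std` while the shape `[1, 2, 512, 512]` and the comment indicate c=2; the length check below would reject that combination.

```python
import math


def zero_mean_unit_variance(sample, mode, mean=None, std=None, eps=1e-6):
    """Normalize a nested-list tensor with axes 'cyx', jointly over 'xy'.

    One mean/std pair is used per channel. `eps` is an assumed safeguard.
    """
    n_channels = len(sample)
    if mode == "fixed":
        if mean is None or std is None:
            raise ValueError("mode 'fixed' requires `mean` and `std`")
        if len(mean) != n_channels or len(std) != n_channels:
            raise ValueError(
                f"expected {n_channels} mean/std values, "
                f"got {len(mean)} and {len(std)}"
            )
    elif mode == "per_sample":
        # Compute statistics from this sample, one pair per channel.
        mean, std = [], []
        for channel in sample:
            values = [value for row in channel for value in row]
            channel_mean = sum(values) / len(values)
            variance = sum((v - channel_mean) ** 2 for v in values) / len(values)
            mean.append(channel_mean)
            std.append(math.sqrt(variance))
    else:
        raise NotImplementedError(f"mode {mode!r} is not covered by this sketch")

    return [
        [[(value - m) / (s + eps) for value in row] for row in channel]
        for channel, m, s in zip(sample, mean, std)
    ]
```

Calling it with `mode="fixed"`, two channels, and the three-element `mean` from the example raises a `ValueError`, which is the kind of mismatch a validator for these files could catch early.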


outputs: