Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

write_multiscales_metadata no_transformations #162

Merged

Conversation

will-moore
Copy link
Member

@will-moore will-moore commented Jan 25, 2022

As discussed at ome/omero-cli-zarr#93 (comment)
this allows the 'pathsargument ofwrite_multiscales_metadata()to take a list ofdatasets` dicts.

EDIT: Also updated to validate latest 0.4 spec with coordinateTransformations of scale and translation.

@codecov
Copy link

codecov bot commented Jan 25, 2022

Codecov Report

Merging #162 (b081342) into master (437b8b0) will increase coverage by 0.41%.
The diff coverage is 93.02%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #162      +/-   ##
==========================================
+ Coverage   83.11%   83.53%   +0.41%     
==========================================
  Files          12       12              
  Lines        1285     1354      +69     
==========================================
+ Hits         1068     1131      +63     
- Misses        217      223       +6     
Impacted Files Coverage Δ
ome_zarr/writer.py 96.31% <92.30%> (-1.51%) ⬇️
ome_zarr/format.py 96.81% <93.33%> (-1.42%) ⬇️
ome_zarr/reader.py 85.20% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 437b8b0...b081342. Read the comment docs.

@will-moore will-moore changed the title remote transformations from write_multiscales_metadata() write_multiscales_metadata no_transformations Jan 26, 2022
Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good and inline with the plate.wells handling. Can a few unit tests be added to test_writer to cover datasets as dictionaries as well as the None,[] scenarios?

I suspect the next step will be to add some code to validate optional transformations elements on write. This can be captured as an issue for now.

@sbesson
Copy link
Member

sbesson commented Jan 31, 2022

With transformations being proposed to be renamed as coordinateTransforms and to become a requirement at the datasets level, I think supporting datasets to be passed as dictionaries is becoming extremely timely.

@will-moore
Copy link
Member Author

@sbesson The test_writer() does write_image(...transformations) and then reads the result to check transformations are there. The test didn't change in this PR yet because the API for write_image() didn't change.

But I can also add more tests for write_multiscales_metadata directly.
I think I'll include the coordinateTransforms renaming in this PR. It will be slightly out of step with omero-cli-zarr but it is probably simpler to keep the number of PRs smaller?

Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few API questions primarily related to the questions around defining/validating transformations

ome_zarr/writer.py Show resolved Hide resolved

if coordinateTransformations is not None:
for dataset, transform in zip(datasets, coordinateTransformations):
dataset["coordinateTransformations"] = transform
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also discussion in https://github.com/ome/ngff/pull/85/files#r794486756 alongside the ability to define global vs dataset/level coordinate transformations. If both forms are allowed, I suspect we will need to have a way for the API to handle both variants?

ome_zarr/writer.py Outdated Show resolved Hide resolved
raise ValueError("Empty datasets list")
for dataset in datasets:
if isinstance(dataset, str):
validated_datasets.append({"path": dataset})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the question below, should a scale transformations be auto-inferred if only lists of strings are passed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the default scale automatically include the aspect ratios between the different multi-scale datasets? In this case, this is not an information we will be able to derive from a list of paths. Should this API only support list of dictionaries as an input?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In omero-cli-zarr I use this for generating a set of transformations for a pyramid:

def marshal_transformations(image, levels=1, multiscales_zoom=2.0)

But I realise that this is quite omero-cli-zarr specific (e.g. no down-sampling in Z). So yes, I guess this is required now.

Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions


def validate_coordinate_transformations(
self,
shapes: List[tuple],
Copy link
Member

@sbesson sbesson Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels strange to have this API depending on shapes especially as only the shapes length is used in the rest. Given the known assumptions on datasets, I wonder if len(axes) would be sufficient ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that was left-over from earlier when the validate_coordiante_transformations() included the functionality from generate_coordinate_transformations(shapes).
However, it still validates that len(scale) == len(shape) for each level, so we at least need a [len(data.shape) for data in pyramid].
I don't think we have any validation that len(shape) is the same for all datasets in a pyramid. In fact, I don't even see that this is required by the spec?!
Also we don't have any validation that the first dataset is the largest, which we could do with shapes, but I guess that won't happen in validate_coordinate_transformations() so we don't really need shapes there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct there is no explicit statement about the lengths of the dimension arrays is not explicit. Probably the closest is this sentence about axes in multiscales "The number of values MUST be the same as the number of dimensions of the arrays corresponding to this image."

I am implicitly reading that that all the arrays for each dataset must have identical dimensions. Am I over interpreting anything @joshmoore @constantinpape ?

ome_zarr/format.py Show resolved Hide resolved
if datasets is None or len(datasets) == 0:
raise ValueError("Empty datasets list")
for dataset in datasets:
if isinstance(dataset, str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this block can be completely removed now and datasets == str should be handled by the ValueError below

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That didn't quite work because we still support List[str] list of paths for < 0.4. Lots of tests failed when I removed it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this API was introduced in #149 and never made available in a public release, I would be personally fine with completely dropping this form and updating all the failing tests to use the new datasets form. The alternative would be to add some form of check on the format version here and fail hard if fmt.version is not 0.3 or lower.

I think the biggest issue I have reading the code is that it looks like there is a path for using this API and creating a 0.4 version of the multiscales with datasets only composed of path which is invalid. I can look into that in a follow-up PR.

ome_zarr/writer.py Show resolved Hide resolved
Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. One minor question around the redundant call to the transformations validation.

I think the unit tests should cover most of the validation logic for the transforms. I propose to review the code coverage as well as #162 (comment) in a follow-up PR so that we can start testing this API.

Two general API thoughts which are probably outside the scope of this PR:

  • I can foresee the need to add support for writing of bot datasets:coordinateTransformations and multiscales:coordinateTransformations. A fairly crude idea would be to use the current API but support coordinate_transformations lists of dimensions len(datasets) + 1 and use the first or last item as the shared transformation
  • the additional logic in write_multiscales_metadata points at the fact that axes and datasets are now more tightly coupled to each other. Moving forward, I wonder whether _validate_datasets should simply become _validate_multiscales and internalize all the internal checks between fields. Longer-term, I would hope this API cab=n be simply replaced by the usage of generic validation framework/constraints.

In the interim, proposing to merge this so that we can test it together with omero-cli-zarr. Is it worth cutting a new pre-release?

shapes = [data.shape for data in pyramid]
if coordinate_transformations is None:
coordinate_transformations = fmt.generate_coordinate_transformations(shapes)
fmt.validate_coordinate_transformations(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this does not hurt, is that call now redundant with the one that is called later during the validation of the datasets element?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite! If you pass in a list of transformations that is longer than the number of pyramid levels, this will fail validation at this point. BUT, if we remove the validation here it won't fail later because the transformations are zipped onto the Datasets, so the lengths will match (and the extra transformations will be ignored).
It is probably better to fail than to ignore the mismatch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Semi-related I discovered that Python 3.10 introduced a strict argument to zip that will eventually allow similar code to fail on mismatching list lengths rather than truncating. Let's keep things as they are for now but maybe add an inline comment explaining this logic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in b081342

@sbesson
Copy link
Member

sbesson commented Feb 7, 2022

@will-moore https://pypi.org/project/ome-zarr/0.3a2/ is now available as a pre-release with this API. Do you want to bump the requirement in napari-ome-zarr and omero-cli-zarr?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants