write_multiscales_metadata no_transformations #162
Conversation
Codecov Report

|          | master | #162   | +/-    |
|----------|--------|--------|--------|
| Coverage | 83.11% | 83.53% | +0.41% |
| Files    | 12     | 12     |        |
| Lines    | 1285   | 1354   | +69    |
| Hits     | 1068   | 1131   | +63    |
| Misses   | 217    | 223    | +6     |

Continue to review the full report at Codecov.
Overall looks good and in line with the `plate.wells` handling. Can a few unit tests be added to `test_writer` to cover `datasets` as dictionaries, as well as the `None` and `[]` scenarios?

I suspect the next step will be to add some code to validate optional `transformations` elements on write. This can be captured as an issue for now.
With `transformations` being proposed to be renamed as …
@sbesson The … But I can also add more tests for …
A few API questions, primarily related to defining and validating transformations.
```python
if coordinateTransformations is not None:
    for dataset, transform in zip(datasets, coordinateTransformations):
        dataset["coordinateTransformations"] = transform
```
See also the discussion in https://github.com/ome/ngff/pull/85/files#r794486756 alongside the ability to define global vs dataset/level coordinate transformations. If both forms are allowed, I suspect we will need a way for the API to handle both variants?
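For context, a rough illustration of the two forms under discussion in the 0.4 draft: per-dataset `coordinateTransformations` inside each `datasets` entry, plus an optional top-level `multiscales` transformation applied to every level. The values below are made up for illustration only:

```python
multiscales_entry = {
    "axes": [{"name": "y", "type": "space"}, {"name": "x", "type": "space"}],
    "datasets": [
        {"path": "0", "coordinateTransformations": [{"type": "scale", "scale": [1.0, 1.0]}]},
        {"path": "1", "coordinateTransformations": [{"type": "scale", "scale": [2.0, 2.0]}]},
    ],
    # optional global transformation, applied after each dataset's own transformations
    "coordinateTransformations": [{"type": "scale", "scale": [0.5, 0.5]}],
}
```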
```python
    raise ValueError("Empty datasets list")
for dataset in datasets:
    if isinstance(dataset, str):
        validated_datasets.append({"path": dataset})
```
Similar to the question below, should a `scale` transformation be auto-inferred if only lists of strings are passed?
Should the default `scale` automatically include the aspect ratios between the different multi-scale datasets? In that case, this is not information we will be able to derive from a list of paths. Should this API only support a list of dictionaries as input?
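As a hedged sketch of what deriving those aspect ratios could look like (the helper name is illustrative, not part of the ome-zarr API), the array shapes rather than just the paths would be needed:

```python
def scales_from_shapes(shapes):
    """Per-axis scale factors of each level relative to the full-resolution dataset,
    e.g. [(1, 512, 512), (1, 256, 256)] -> [[1.0, 1.0, 1.0], [1.0, 2.0, 2.0]]."""
    base = shapes[0]
    return [[full / level for full, level in zip(base, shape)] for shape in shapes]
```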
In omero-cli-zarr I use this for generating a set of transformations for a pyramid: `def marshal_transformations(image, levels=1, multiscales_zoom=2.0)`. But I realise that this is quite omero-cli-zarr specific (e.g. no down-sampling in Z). So yes, I guess this is required now.
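For reference, a minimal sketch of what a helper along those lines might produce; only the signature above comes from the comment, the body here is an assumption (down-sampling X and Y only, by `multiscales_zoom` per level):

```python
from typing import Dict, List


def marshal_transformations_sketch(
    ndim: int = 5, levels: int = 1, multiscales_zoom: float = 2.0
) -> List[List[Dict]]:
    """One 'scale' transformation per pyramid level, leaving all axes except
    the last two (Y and X) unscaled."""
    transformations = []
    for level in range(levels):
        factor = multiscales_zoom ** level
        scale = [1.0] * (ndim - 2) + [factor, factor]
        transformations.append([{"type": "scale", "scale": scale}])
    return transformations
```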
A few questions
ome_zarr/format.py (outdated)
```python
def validate_coordinate_transformations(
    self,
    shapes: List[tuple],
```
Feels strange to have this API depend on shapes, especially as only the length of `shapes` is used in the rest. Given the known assumptions on datasets, I wonder if `len(axes)` would be sufficient?
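A minimal sketch of what such an axes-based check could look like, assuming the validator only needs the dimensionality and the number of levels (function and argument names here are hypothetical):

```python
def validate_transformations_sketch(ndim, nlevels, coordinate_transformations):
    """Check one transformation list per level and that each 'scale' has ndim values."""
    if len(coordinate_transformations) != nlevels:
        raise ValueError("Need one list of transformations per dataset")
    for transformations in coordinate_transformations:
        for transformation in transformations:
            if transformation.get("type") == "scale" and len(transformation["scale"]) != ndim:
                raise ValueError("'scale' length must match the number of axes")
```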
Yes, I think that was left over from earlier, when `validate_coordinate_transformations()` included the functionality from `generate_coordinate_transformations(shapes)`.

However, it still validates that `len(scale) == len(shape)` for each level, so we at least need a `[len(data.shape) for data in pyramid]`.

I don't think we have any validation that `len(shape)` is the same for all datasets in a pyramid. In fact, I don't even see that this is required by the spec?!

Also, we don't have any validation that the first dataset is the largest, which we could do with `shapes`, but I guess that won't happen in `validate_coordinate_transformations()`, so we don't really need `shapes` there.
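A sketch of the two missing checks mentioned above, kept outside `validate_coordinate_transformations()` (the helper name is hypothetical):

```python
def check_pyramid_shapes(shapes):
    """Validate that all levels share a dimensionality and that level 0 is the largest."""
    if len({len(shape) for shape in shapes}) != 1:
        raise ValueError("All datasets in a pyramid must have the same number of dimensions")
    largest = shapes[0]
    for shape in shapes[1:]:
        if any(dim > full for dim, full in zip(shape, largest)):
            raise ValueError("The first dataset must be the full-resolution (largest) one")
```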
You're correct, there is no explicit statement about the lengths of the dimension arrays. Probably the closest is this sentence about `axes` in `multiscales`: "The number of values MUST be the same as the number of dimensions of the arrays corresponding to this image." I am implicitly reading that as requiring all the arrays for each `dataset` to have identical dimensionality. Am I over-interpreting anything @joshmoore @constantinpape?
```python
if datasets is None or len(datasets) == 0:
    raise ValueError("Empty datasets list")
for dataset in datasets:
    if isinstance(dataset, str):
```
I assume this block can be completely removed now and the `datasets == str` case should be handled by the `ValueError` below.
That didn't quite work, because we still support a `List[str]` list of paths for < 0.4. Lots of tests failed when I removed it.
Given that this API was introduced in #149 and never made available in a public release, I would personally be fine with completely dropping this form and updating all the failing tests to use the new `datasets` form. The alternative would be to add some form of check on the format version here and fail hard if `fmt.version` is not 0.3 or lower.

I think the biggest issue I have reading the code is that it looks like there is a path for using this API and creating a 0.4 version of the `multiscales` with `datasets` only composed of `path`, which is invalid. I can look into that in a follow-up PR.
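A sketch of the suggested version guard, assuming `fmt.version` is available at this point; the helper name and the exact version tuple are illustrative:

```python
def check_paths_allowed(datasets, fmt):
    """Fail hard if a plain list of path strings is used with a 0.4+ format."""
    if datasets and isinstance(datasets[0], str):
        if fmt.version not in ("0.1", "0.2", "0.3"):
            raise ValueError(
                "Lists of path strings are only supported for format versions <= 0.3"
            )
```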
Overall looks good. One minor question around the redundant call to the transformations validation.

I think the unit tests should cover most of the validation logic for the transforms. I propose to review the code coverage, as well as #162 (comment), in a follow-up PR so that we can start testing this API.

Two general API thoughts which are probably outside the scope of this PR:

- I can foresee the need to add support for writing both `datasets:coordinateTransformations` and `multiscales:coordinateTransformations`. A fairly crude idea would be to use the current API but support `coordinate_transformations` lists of length `len(datasets) + 1` and use the first or last item as the shared transformation (a sketch of this idea follows below).
- The additional logic in `write_multiscales_metadata` points at the fact that `axes` and `datasets` are now more tightly coupled to each other. Moving forward, I wonder whether `_validate_datasets` should simply become `_validate_multiscales` and internalize all the checks between fields. Longer term, I would hope this API can simply be replaced by the usage of a generic validation framework/constraints.

In the interim, proposing to merge this so that we can test it together with omero-cli-zarr. Is it worth cutting a new pre-release?
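A crude sketch of the first idea above (hypothetical, not implemented in this PR): accept `len(datasets) + 1` transformation lists and treat the extra entry as the shared `multiscales`-level transformation:

```python
def split_transformations(coordinate_transformations, datasets):
    """Hypothetical: if one extra entry is given, treat the last one as the shared
    multiscales-level transformation; otherwise everything is per-dataset."""
    if len(coordinate_transformations) == len(datasets) + 1:
        *per_dataset, shared = coordinate_transformations
        return per_dataset, shared
    return coordinate_transformations, None
```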
```python
shapes = [data.shape for data in pyramid]
if coordinate_transformations is None:
    coordinate_transformations = fmt.generate_coordinate_transformations(shapes)
fmt.validate_coordinate_transformations(
```
Although this does not hurt, is that call now redundant with the one made later during the validation of the `datasets` element?
Not quite! If you pass in a list of transformations that is longer than the number of pyramid levels, this will fail validation at this point. BUT, if we remove the validation here it won't fail later because the transformations are zipped onto the Datasets, so the lengths will match (and the extra transformations will be ignored).
It is probably better to fail than to ignore the mismatch.
👍 Semi-related: I discovered that Python 3.10 introduced a `strict` argument to `zip` that will eventually allow similar code to fail on mismatching list lengths rather than truncating. Let's keep things as they are for now, but maybe add an inline comment explaining this logic?
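For illustration, the Python 3.10+ behaviour referred to above (values are made up):

```python
levels = ["0", "1"]
transforms = [[{"type": "scale"}], [{"type": "scale"}], [{"type": "scale"}]]  # one too many

list(zip(levels, transforms))  # silently drops the extra transform
try:
    list(zip(levels, transforms, strict=True))  # Python 3.10+
except ValueError as err:
    print(err)  # zip() argument 2 is longer than argument 1
```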
Done in b081342
@will-moore https://pypi.org/project/ome-zarr/0.3a2/ is now available as a pre-release with this API. Do you want to bump the requirement in …
As discussed at ome/omero-cli-zarr#93 (comment), this allows the `paths` argument of `write_multiscales_metadata()` to take a list of `datasets` dicts.

EDIT: Also updated to validate the latest 0.4 spec with `coordinateTransformations` of `scale` and `translation`.
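For completeness, a minimal usage sketch of the form described above; the exact call signature, the store setup, and the values here are assumptions and should be checked against the released API:

```python
import zarr
from ome_zarr.io import parse_url
from ome_zarr.writer import write_multiscales_metadata

store = parse_url("image.zarr", mode="w").store
group = zarr.group(store=store)

# One dict per pyramid level: a "path" plus its coordinateTransformations
# ("scale" is required by the 0.4 spec, "translation" is optional).
datasets = [
    {"path": "0", "coordinateTransformations": [{"type": "scale", "scale": [1.0, 1.0, 1.0]}]},
    {"path": "1", "coordinateTransformations": [{"type": "scale", "scale": [1.0, 2.0, 2.0]}]},
]

write_multiscales_metadata(group, datasets, axes="zyx")
```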