Templating and deterministic maps #7

Closed
rabernat opened this issue Nov 19, 2020 · 4 comments

@rabernat
Contributor

In yesterday's meeting, we discussed the widespread desire for this spec to handle deterministic, evenly spaced chunks within a file, rather than explicitly enumerating each chunk and its offset. We would need some sort of support for templates and symbolic expressions. For a 1D file, it could be something like

"{i}", ["s3://bucket/path/file.nc", "3200 * {i}", 3200]

or

"{i}", ["s3://bucket/path/file.nc", "{itemsize} * {i}", 3200]

where {itemsize} would be filled in with 3200.
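A minimal sketch of how a reader might expand such a 1D template back into explicit keys. It assumes Python-style `{name}` placeholders and an expression grammar limited to integer literals, `+`, `-`, `*`, and parentheses (the issue does not define a grammar). `eval_offset`, `expand_1d`, and the `n_chunks` parameter are hypothetical. Note that the template alone does not say how many chunks exist, so the count has to come from elsewhere, such as the array metadata.

```python
import ast
import operator

# Operators allowed in offset expressions (an assumption, not part of any spec).
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def eval_offset(expr: str, variables: dict) -> int:
    """Fill {name} placeholders, then evaluate the arithmetic without eval()."""
    text = expr.format(**variables)  # e.g. "{itemsize} * {i}" -> "3200 * 2"

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {text!r}")

    return walk(ast.parse(text, mode="eval"))


def expand_1d(key_tmpl, url, offset_tmpl, size, n_chunks, **params):
    """Expand one templated entry into explicit {key: [url, offset, size]}."""
    refs = {}
    for i in range(n_chunks):
        variables = {"i": i, **params}
        key = key_tmpl.format(**variables)
        refs[key] = [url, eval_offset(offset_tmpl, variables), size]
    return refs


# Hypothetical usage: three chunks of 3200 bytes each.
refs = expand_1d("{i}", "s3://bucket/path/file.nc", "{itemsize} * {i}", 3200,
                 n_chunks=3, itemsize=3200)
```

Here `refs["2"]` comes out as `["s3://bucket/path/file.nc", 6400, 3200]`. Parsing with `ast` rather than calling `eval` keeps arbitrary code out of the reference file.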

For N-D it would be harder, because you need to know the array shape (here Nx is the number of chunks along the i dimension):

"{j}.{i}", ["s3://bucket/path/file.nc", "{itemsize} * ({j} * Nx + {i})", 3200]

I can't think of a clever way around that.
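One possible way around it, sketched under assumptions: `Nx` can be derived from the array shape and chunk shape (which a Zarr `.zarray` carries as `shape` and `chunks`), so a reader could compute offsets for any number of dimensions without `Nx` appearing in the template. This assumes every chunk is stored full-size, back to back in C (row-major) order starting at byte 0; the function names are hypothetical.

```python
import itertools
import math


def chunk_grid(shape, chunks):
    """Chunks per axis, e.g. Nx = ceil(nx / cx) for the last axis."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


def nd_offset(index, grid, chunk_nbytes):
    """Byte offset of a chunk, assuming full-size chunks stored in C order."""
    linear = 0
    for idx, n in zip(index, grid):
        # Row-major linearisation; for 2D this reduces to j * Nx + i.
        linear = linear * n + idx
    return chunk_nbytes * linear


def expand_nd(url, shape, chunks, chunk_nbytes):
    """Explicit {"j.i": [url, offset, size]} references for every chunk."""
    grid = chunk_grid(shape, chunks)
    refs = {}
    for index in itertools.product(*(range(n) for n in grid)):
        key = ".".join(str(k) for k in index)
        refs[key] = [url, nd_offset(index, grid, chunk_nbytes), chunk_nbytes]
    return refs


# Hypothetical 40 x 60 float64 array in 20 x 20 chunks (3200 bytes each): grid is (2, 3).
refs = expand_nd("s3://bucket/path/file.nc", (40, 60), (20, 20), 3200)
```

With this layout, chunk `"1.2"` sits at `3200 * (1 * 3 + 2) = 16000`. Partial edge chunks, per-chunk headers, or Fortran ordering would all break the formula, which is part of why a general template is hard.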

Ideas? @manzt? @joshmoore?

@martindurant
Member

In sketching this out, we may also want to consider:

  • key names that are regular expressions, which are well defined
  • expressions that are Jinja-like, including support for functions
  • expressions that are simple, like "itemsize * i" (actually, if we know itemsize, I would replace it before saving)
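A sketch combining the first and third bullets: keys given as regular expressions with named groups, and simple offset expressions over those group names, with `itemsize` already replaced by 3200 before saving. The `TEMPLATES` layout, `lookup`, and `eval_simple` are hypothetical, and `Nx = 4` is hard-coded into the expression, which is the same shape dependence raised in the original post.

```python
import ast
import operator
import re

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def eval_simple(expr, names):
    """Evaluate +, -, * over integer literals and bare names such as "3200 * i"."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.Name) and node.id in names:
            return names[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))


# Hypothetical templated references: regex key -> [url, offset expression, size].
TEMPLATES = {
    r"(?P<j>\d+)\.(?P<i>\d+)": ["s3://bucket/path/file.nc", "3200 * (j * 4 + i)", 3200],
}


def lookup(key):
    """Resolve a concrete key such as "2.1" by matching it against each pattern."""
    for pattern, (url, offset_expr, size) in TEMPLATES.items():
        m = re.fullmatch(pattern, key)
        if m:
            names = {k: int(v) for k, v in m.groupdict().items()}
            return [url, eval_simple(offset_expr, names), size]
    raise KeyError(key)
```

For example, `lookup("2.1")` resolves to offset `3200 * (2 * 4 + 1) = 28800`. The regex itself does not bound the indices, so a reader would still need the array shape to reject out-of-range keys.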

I wonder: is there overlap with the discussion in xarray about specifying coordinates programmatically?

To what extent does the putative option of binary storage for offsets (i.e., not JSON, perhaps Zarr itself) mitigate the problem, thanks to efficient compression of integers?
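To get a feel for the compression question, here is a rough stdlib-only experiment: 10,000 evenly spaced offsets stored as a JSON list versus delta-encoded, packed as int64, and compressed with zlib. It is an illustration, not a benchmark. A Zarr array with a delta filter and a compressor would be the analogous binary option, but nothing here uses Zarr itself.

```python
import array
import itertools
import json
import zlib

# Illustrative: explicit offsets for 10,000 evenly spaced 3200-byte chunks.
n_chunks, chunk_nbytes = 10_000, 3200
offsets = [chunk_nbytes * i for i in range(n_chunks)]

# Option 1: a plain JSON list of integers, as in a spec with explicit keys.
json_size = len(json.dumps(offsets).encode())

# Option 2: delta-encode, pack as native int64 ("q"), then compress with zlib.
# Evenly spaced offsets become a run of identical deltas, which compresses very well.
deltas = [offsets[0]] + [b - a for a, b in zip(offsets, offsets[1:])]
blob = zlib.compress(array.array("q", deltas).tobytes())
binary_size = len(blob)

# Decoding is lossless: decompress, unpack, and take the running sum.
restored = list(itertools.accumulate(array.array("q", zlib.decompress(blob))))

print(f"JSON: {json_size} bytes, delta+zlib: {binary_size} bytes")
```

For perfectly regular spacing the binary form collapses to almost nothing; irregular offsets (e.g. variable-size compressed chunks) would compress less dramatically.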

Sorry if that's too many thoughts for the limits of this issue.

@rabernat
Contributor Author

Given the complexity, we might also consider punting on this and releasing an initial spec that requires explicit keys.

@martindurant
Member

A spec is easier to add to than to remove from ...

@joshmoore

Martin suggested I add here the feedback I received from the @bioformats team. I asked in a general way whether the spec needed anything to maximize the number of formats we could support. There's hope that some files in some of the proprietary formats will be readable, but even for the most prevalent format (TIFF) there is a large number of edge cases that would require some form of callback function. Hence a fourth argument:

(name, templated-size, templated-offset, processing-function)

since data that is interleaved, reversed, sample-packed, etc. might need reversing, striding, etc. Details can be found in TiffParser.getSamples. Obviously, this is in no way a MUST (I'm all for KISS), but it would be interesting to hear whether a minimal library of such functions would help in other domains as well.
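A minimal sketch of what a small registry of processing functions could look like, assuming the fourth element of a reference is a `[name, kwargs]` pair appended to the `[url, offset, size]` value used earlier in this issue. The registry, the `reverse` and `deinterleave` functions, and the in-memory `fetch` are all hypothetical and far simpler than what `TiffParser.getSamples` handles; they only show the shape of the idea.

```python
from typing import Callable, Dict

# Hypothetical registry mapping processing-function names to implementations.
PROCESSORS: Dict[str, Callable[..., bytes]] = {}


def processor(name: str):
    """Decorator that registers a function under `name`."""
    def register(fn):
        PROCESSORS[name] = fn
        return fn
    return register


@processor("reverse")
def reverse(buf: bytes) -> bytes:
    """Reverse the order of the bytes in the block."""
    return buf[::-1]


@processor("deinterleave")
def deinterleave(buf: bytes, samples: int) -> bytes:
    """Turn interleaved 1-byte samples (RGBRGB...) into planes (RR..GG..BB..)."""
    return b"".join(buf[s::samples] for s in range(samples))


def read_chunk(ref, fetch):
    """Fetch [url, offset, size] and apply the optional 4th element [name, kwargs]."""
    url, offset, size, *rest = ref
    raw = fetch(url, offset, size)
    if not rest:
        return raw
    name, kwargs = rest[0]
    return PROCESSORS[name](raw, **kwargs)


# In-memory stand-in for a remote file: two header bytes, then RGB pixels.
store = {"mem://image.tif": b"HHRGBRGBRGB"}


def fetch(url, offset, size):
    return store[url][offset:offset + size]


planar = read_chunk(["mem://image.tif", 2, 9, ["deinterleave", {"samples": 3}]], fetch)
```

Here `planar` is `b"RRRGGGBBB"`. Keeping the functions behind names, rather than embedding code in the reference file, would let each implementation decide which processors it supports.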

cc: @melissalinkert @dgault @sbesson @cgohlke @manzt
