Support reductions in slice notation, inspired by uhi #32
Should we support this? It keeps coming up. I worry about adding much in the way of data processing because it is a slippery slope. Things like log-scaling an image can be done in the front end or by a separate microservice. But downsampling specifically is helpful to do “close to” the data because it can save so much space and time. My concerns are mostly practical:
In addition to downsampling over N pixels with By supporting mean but not sum we ensure that we can coerce to the original dtype (with rounding, if integer). The Central Limit Theorem removes concerns about overflow.
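The dtype argument above can be illustrated with a minimal NumPy sketch. The helper name `block_mean_1d` is hypothetical and is not part of this project's API; it assumes the array length is already a multiple of the factor.

```python
import numpy as np


def block_mean_1d(a, n):
    """Average consecutive groups of n elements and coerce back to a.dtype.

    Hypothetical helper for illustration only. Assumes len(a) % n == 0.
    """
    a = np.asarray(a)
    blocks = a.reshape(-1, n)
    # Accumulate in float64 so the intermediate sum cannot wrap around.
    means = blocks.mean(axis=1, dtype=np.float64)
    if np.issubdtype(a.dtype, np.integer):
        # Round before casting, since astype() alone would truncate.
        means = np.rint(means)
    # Each mean lies between the block's min and max, so it fits the dtype.
    return means.astype(a.dtype)


pixels = np.array([250, 255, 255, 255, 254, 255], dtype=np.uint8)
downsampled = block_mean_1d(pixels, 3)
```

A sum over the same blocks would exceed the `uint8` range, which is why only the mean can be returned safely in the original dtype.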
I guess I’m leaning: “Let’s do it but mark it as experimental and reserve the right to revisit moving it into a data reduction/processing microservice, once we actually have one.” The enhancement wouldn’t add any new query parameters, and while the syntax is a bit “clever” it is backed by a documented standard (linked in my first post above) used by the formidable IRIS-HEP group.
Summarizing a suggestion from @EliotGann addressing the question of how to handle boundary conditions:
That is, if the user asks for a downsampling factor that does not divide the axis length evenly, we can raise an error explaining that they need to do the trimming themselves. In other words, if you want fancy behavior, you need to do a tiny bit of math to prove that you understand you are trimming data. We won’t silently trim it for you, for fear that you may not realize we are doing it.
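A minimal sketch of that check, assuming a hypothetical helper `require_even_division` (not an existing function in this project) that is called with the length of the sliced axis and the requested factor:

```python
def require_even_division(length, factor):
    """Return the downsampled length, or raise if factor does not divide length.

    Hypothetical helper for illustration only.
    """
    remainder = length % factor
    if remainder:
        trimmed = length - remainder
        # Refuse to trim silently; tell the user what explicit trim would work.
        raise ValueError(
            f"Axis of length {length} is not divisible by downsampling "
            f"factor {factor}. Trim the slice explicitly, for example to "
            f"the first {trimmed} elements, and try again."
        )
    return length // factor
```

The error message does the arithmetic for the user but leaves the decision to trim with them.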
In [1]: import numpy
In [2]: a = numpy.arange(10)
In [3]: a[::3]
Out[3]: array([0, 3, 6, 9])
In [4]: import toolz
In [5]: toolz.partition(3, a)
Out[5]: <zip at 0x7f27d213e780>
In [6]: list(toolz.partition(3, a))
Out[6]: [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
In [7]: map(numpy.mean, toolz.partition(3, a))
Out[7]: <map at 0x7f27d111c220>
In [8]: list(map(numpy.mean, toolz.partition(3, a)))
Out[8]: [1.0, 4.0, 7.0]

I like the idea of dashing off
If we feel confident we'll stick with
Summarizing the discussion above:
For example, given an image time series, i.e. a 3D array with dimensions (time, x, y):
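As a rough illustration of what a mean reduction over the two spatial axes would compute for such an array, here is a NumPy sketch. The function name `downsample_xy` is hypothetical, and the sketch does not show the request syntax, only the arithmetic on the array.

```python
import numpy as np


def downsample_xy(stack, factor):
    """Block-average the x and y axes of a (time, x, y) array by factor.

    Hypothetical helper for illustration only. The time axis is untouched.
    """
    t, nx, ny = stack.shape
    if nx % factor or ny % factor:
        # Mirror the proposed behavior: never trim silently.
        raise ValueError("x and y lengths must be multiples of the factor")
    # Split each spatial axis into (number of blocks, block size) ...
    blocks = stack.reshape(t, nx // factor, factor, ny // factor, factor)
    # ... and average over the two block-size axes.
    return blocks.mean(axis=(2, 4))


stack = np.arange(2 * 4 * 6, dtype=np.float64).reshape(2, 4, 6)
small = downsample_xy(stack, 2)  # shape (2, 2, 3)
```

Each output pixel is the mean of a `factor` by `factor` block of the corresponding frame, so the time dimension keeps its full length.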
https://uhi.readthedocs.io/en/latest/indexing.html