Towards reductions along a subset of axes #85

Open
davidhassell opened this issue Jun 2, 2023 · 3 comments

Labels
enhancement New feature or request

Comments

@davidhassell
Collaborator

@bnlawrence and I have just been chatting about the issues involved in using active storage to create reductions along a subset of axes (e.g. calculating the temporal mean at each point in X-Y space, which creates a logically 2-D array).

This might not make too much sense (:)), but it is an attempt to capture what we said before we forget it.

The gist was that reducing over a subset of axes is a problem for PyActiveStorage, not for StackHPC. We would tell PyActiveStorage (PAS) which subset of axes we wanted to reduce over; PAS would translate that into whatever slices of the server-side storage chunks are needed to deliver it, and would then combine the results into the appropriate N-d array, ready to be passed back to Dask/cf-python.
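A minimal sketch of that split of responsibilities, under loose assumptions: the array is a toy nested list rather than a netCDF variable, and `make_chunks`, `chunk_partial` and `temporal_mean` are hypothetical names, not PyActiveStorage API. `chunk_partial` stands in for one server-side request that returns partial sums and counts over T for each (y, x) in a chunk, together with the chunk's offset in the output; the client then adds the partial results from chunks that share the same (y, x) before dividing.

```python
from itertools import product


def make_chunks(shape, chunk_shape):
    """Split a (T, Y, X) array into storage chunks, each a tuple of three slices."""
    per_axis = [
        [slice(start, min(start + step, n)) for start in range(0, n, step)]
        for n, step in zip(shape, chunk_shape)
    ]
    return list(product(*per_axis))


def chunk_partial(data, t_s, y_s, x_s):
    """Hypothetical stand-in for one server-side request: partial sums and counts
    over T for every (y, x) in the chunk, plus the chunk's offset in the 2-D output."""
    sums = [
        [sum(data[t][y][x] for t in range(t_s.start, t_s.stop))
         for x in range(x_s.start, x_s.stop)]
        for y in range(y_s.start, y_s.stop)
    ]
    counts = [[t_s.stop - t_s.start] * len(row) for row in sums]
    return (y_s.start, x_s.start), sums, counts


def temporal_mean(data, chunks):
    """Client-side combination: accumulate partial sums and counts from every
    chunk into the logically 2-D (Y, X) result, then divide."""
    ny, nx = len(data[0]), len(data[0][0])
    total = [[0.0] * nx for _ in range(ny)]
    n = [[0] * nx for _ in range(ny)]
    for t_s, y_s, x_s in chunks:
        (y0, x0), sums, counts = chunk_partial(data, t_s, y_s, x_s)
        for j, (s_row, c_row) in enumerate(zip(sums, counts)):
            for i, (s, c) in enumerate(zip(s_row, c_row)):
                total[y0 + j][x0 + i] += s
                n[y0 + j][x0 + i] += c
    return [[total[y][x] / n[y][x] for x in range(nx)] for y in range(ny)]


# Usage: a toy (T=4, Y=2, X=3) array stored in (2, 1, 2) chunks
data = [[[100 * t + 10 * y + x for x in range(3)] for y in range(2)] for t in range(4)]
print(temporal_mean(data, make_chunks((4, 2, 3), (2, 1, 2))))
# [[150.0, 151.0, 152.0], [160.0, 161.0, 162.0]]
```

Returning sums and counts rather than means keeps the partial results additive, so chunks that split the T axis can be combined exactly on the client.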

Potential pitfalls that occur to me:

  • Performance: when taking the T average of a (T, Y, X) array, there would be X × Y requests (one per output point), rather than one request per storage chunk.
  • StackHPC would have to pass back the result's location, as well as the data, sample size, etc.
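To put rough numbers on the performance pitfall, here is a small arithmetic sketch. The array and chunk shapes are made up for illustration, and the function names are hypothetical. It assumes the per-point strategy issues one request per output point for each storage chunk that point's column crosses along the reduced axes, which reduces to X × Y when T sits in a single chunk.

```python
from math import ceil, prod


def n_storage_chunks(shape, chunk_shape):
    """One request per storage chunk: the number of chunks covering the array."""
    return prod(ceil(n / c) for n, c in zip(shape, chunk_shape))


def n_point_requests(shape, chunk_shape, reduce_axes):
    """One request per output point, per storage chunk that point's column
    crosses along the reduced axes (equal to X * Y when T is a single chunk)."""
    kept = prod(n for ax, n in enumerate(shape) if ax not in reduce_axes)
    split = prod(ceil(shape[ax] / chunk_shape[ax]) for ax in reduce_axes)
    return kept * split


# Hypothetical (T, Y, X) = (120, 180, 360) array, T mean over axis 0
print(n_storage_chunks((120, 180, 360), (120, 90, 90)))        # 8
print(n_point_requests((120, 180, 360), (120, 90, 90), (0,)))  # 64800
```

If T were split into 10 chunks of 12 steps, the per-point count would grow tenfold, while the per-chunk count grows only to 80.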

Brain dump over. Hopefully this will all make sense whenever we read this next (probably not before September 2023!).

@markgoddard

Is this a case of it being too complicated to offload to S3 active storage initially, or do you see this as being an operation where S3 active storage cannot add value?

I'm just wondering whether we could accept a list of selections upon which to take a sum/count, and return a list of results to be combined as necessary by PyActiveStorage.
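One way that list-based interface could look, as a sketch only: `reduce_selections` is a hypothetical batched call (not an existing S3 active storage or PyActiveStorage endpoint) that takes a list of hyperslabs, each a tuple of slices, and returns one (sum, count) pair per selection in the same order. The client builds one selection per (y, x) column and places each result by its position in the list.

```python
from itertools import product


def element(data, index):
    """Index a nested-list array with a tuple of integers."""
    for i in index:
        data = data[i]
    return data


def sum_count(data, selection):
    """Stand-in for the server: sum and count of one hyperslab (a tuple of slices)."""
    ranges = [range(s.start, s.stop) for s in selection]
    values = [element(data, idx) for idx in product(*ranges)]
    return sum(values), len(values)


def reduce_selections(data, selections):
    """Hypothetical batched request: one (sum, count) per selection, in order."""
    return [sum_count(data, sel) for sel in selections]


def mean_over_t(data):
    """Client side: one selection per (y, x) column, sent as a single list;
    each result is placed by its position in that list."""
    nt, ny, nx = len(data), len(data[0]), len(data[0][0])
    points = list(product(range(ny), range(nx)))
    selections = [(slice(0, nt), slice(y, y + 1), slice(x, x + 1)) for y, x in points]
    out = [[None] * nx for _ in range(ny)]
    for (y, x), (s, n) in zip(points, reduce_selections(data, selections)):
        out[y][x] = s / n
    return out


data = [[[100 * t + 10 * y + x for x in range(3)] for y in range(2)] for t in range(4)]
print(mean_over_t(data))  # [[150.0, 151.0, 152.0], [160.0, 161.0, 162.0]]
```

Because results come back in request order, the server would not need to return each result's location, and the X × Y selections travel in one round trip rather than X × Y separate requests.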

@sd109
Collaborator

sd109 commented Jul 5, 2023

Just while I remember: in the wider Excalidata meeting last Friday, the DDN team mentioned a technical limitation in their active storage implementation that would make it difficult to return the result of a single reduction if that result was larger than about 4 kB (I may be misremembering the exact number). This is obviously not a problem for a simple reduction but, as @bnlawrence and I discussed in that meeting, it might become an issue if we want to return reductions along subsets of axes as described here, since the results could then become arbitrarily large. This means that, if the S3 and POSIX implementations are to have matching functionality, it might only be possible to handle such reductions on the PyActiveStorage side.
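A quick size check shows why a partial reduction hits such a limit where a full one does not. This sketch assumes a 4096-byte limit (the exact figure was not confirmed), 8-byte values, and that both a sum and a count are returned for every surviving element; the shape and function names are hypothetical.

```python
from math import prod

# The exact limit was not confirmed ("something like 4kB"); 4096 bytes is an assumption.
ASSUMED_LIMIT_BYTES = 4096


def result_nbytes(shape, reduce_axes, itemsize=8, values_per_point=2):
    """Bytes needed to return (sum, count) for every element left after reducing
    over reduce_axes, assuming both are stored as 8-byte values."""
    kept = prod(n for ax, n in enumerate(shape) if ax not in reduce_axes)
    return kept * itemsize * values_per_point


def fits_in_one_response(shape, reduce_axes, limit=ASSUMED_LIMIT_BYTES):
    return result_nbytes(shape, reduce_axes) <= limit


shape = (120, 180, 360)  # hypothetical (T, Y, X)
print(result_nbytes(shape, (0, 1, 2)), fits_in_one_response(shape, (0, 1, 2)))  # 16 True
print(result_nbytes(shape, (0,)), fits_in_one_response(shape, (0,)))            # 1036800 False
```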

I guess, more generally, we should probably keep a shared list of the feature enhancement ideas we have discussed, so that we can run them past the DDN team before we stray too far from what's possible on their end.

@bnlawrence bnlawrence added the enhancement New feature or request label Jul 5, 2023
@markgoddard

Perhaps, where these limits apply to a storage backend, we could slice the request into sufficiently small batches and then aggregate the results in PyActiveStorage?
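A sketch of that batching idea, assuming a list-of-selections request like the one suggested earlier in the thread. `send` is a hypothetical callable standing in for one backend request, and the 4096-byte limit and 16 bytes per (sum, count) result are assumptions. Selections are split so that no single response can exceed the limit, and results are concatenated in order so the caller can aggregate them as if they came from one request.

```python
def batches(items, max_per_batch):
    """Yield consecutive slices of items, each at most max_per_batch long."""
    for start in range(0, len(items), max_per_batch):
        yield items[start:start + max_per_batch]


def batched_reduce(selections, send, limit_bytes=4096, bytes_per_result=16):
    """Split selections so that no single response exceeds limit_bytes, send each
    batch with the hypothetical send callable, and concatenate results in order."""
    max_per_batch = limit_bytes // bytes_per_result
    if max_per_batch < 1:
        raise ValueError("a single result is larger than the response limit")
    results, n_requests = [], 0
    for batch in batches(selections, max_per_batch):
        results.extend(send(batch))
        n_requests += 1
    return results, n_requests


def fake_send(batch):
    """Fake backend: returns one (sum, count) pair per selection."""
    return [(float(sel), 1) for sel in batch]


results, n_requests = batched_reduce(list(range(1000)), fake_send)
print(len(results), n_requests)  # 1000 4
```

With these assumed sizes each batch holds 256 results, so 1000 selections need four requests instead of 1000.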

4 participants