@bnlawrence and I have just been chatting about issues around using active storage to create reductions along a subset of axes (e.g. calculating the temporal mean at each point in X-Y space, which creates a logically 2-d array).
This might not make too much sense (:)), but is an attempt to capture what we said before we forget it.
The gist was that reducing over a subset of axes is a problem for PyActiveStorage, and not StackHPC. We would tell PyActiveStorage (PAS) that we wanted a reduction over a subset of axes; PAS would translate that into whatever server-side storage-chunk slices are needed to deliver it, and would then combine the per-slice results into the appropriate N-d array ready to be passed back to Dask/cf-python.
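To make that concrete, here is a minimal sketch of the PAS-side logic, assuming one "sum + count" request per Y-X point per storage chunk; `reduce_chunk_slice()` is a stand-in for a single active-storage request and operates on an in-memory array only so the example is runnable (none of these names are the real PyActiveStorage API):

```python
import numpy as np

def reduce_chunk_slice(data, sl, op="sum"):
    """Stand-in for one active-storage request: the "server" reduces the
    selected slice of a storage chunk and returns (value, sample_count)."""
    selected = data[sl]
    return getattr(np, op)(selected), selected.size

def mean_over_T(data, t_chunk):
    """Temporal mean of a (T, Y, X) array, assembled from per-point requests.

    Hypothetical PAS-side logic: one request per (y, x) point per storage
    chunk along T; PAS accumulates the partial sums and counts into a
    logically 2-d result before handing it back to Dask/cf-python.
    """
    T, Y, X = data.shape
    sums = np.zeros((Y, X))
    counts = np.zeros((Y, X), dtype=int)
    for y in range(Y):
        for x in range(X):
            for t0 in range(0, T, t_chunk):
                sl = (slice(t0, min(t0 + t_chunk, T)), y, x)
                value, n = reduce_chunk_slice(data, sl, op="sum")
                sums[y, x] += value
                counts[y, x] += n
    return sums / counts

data = np.random.rand(24, 4, 5)   # a small (T, Y, X) array for checking
assert np.allclose(mean_over_T(data, t_chunk=6), data.mean(axis=0))
```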
Potential pitfalls that occur to me:
Performance: when taking the T average of the (T, Y, X) array, there will be X times Y requests, rather than one request per storage chunk
StackHPC would have to pass back the result's location (i.e. where it sits in the combined output array), as well as the data and sample size, etc. (see the sketch after this list)
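On that second pitfall, a hedged sketch of the kind of per-result payload that would be needed; the field names are made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class PartialResult:
    """Hypothetical payload for one returned reduction result.

    Without `location` (the target index in the reduced Y-X array), PAS
    would have no way of knowing where to put the partial sum it gets back.
    """
    location: tuple[int, int]   # (y, x) position in the reduced result
    value: float                # partial sum over the requested T slice
    count: int                  # sample size contributing to `value`
```

PAS would then place each partial with something like `sums[p.location] += p.value` and `counts[p.location] += p.count`, as in the sketch above.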
Brain dump over. Hopefully this will all make sense whenever we read this next (probably not before September 2023!).
Is this a case of it being too complicated to offload to S3 active storage initially, or do you see this as being an operation where S3 active storage cannot add value?
I'm just wondering whether we could accept a list of selections upon which to take a sum/count, and return a list of results to be combined as necessary by PyActiveStorage.
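Something like the following request/response shape, perhaps? (This is only a sketch of the idea, not the actual active-storage API; every field name here is an assumption.)

```python
# Hypothetical batched request: one call carrying many selections, and one
# (sum, count) pair returned per selection, in the same order.
request = {
    "source": "s3://bucket/data.nc",
    "dtype": "float64",
    "operation": "sum",   # the sample count comes back alongside each sum
    "selections": [
        # one selection per (y, x) point, spanning the full T axis,
        # expressed as [start, stop, stride] per dimension
        [[0, 1200, 1], [y, y + 1, 1], [x, x + 1, 1]]
        for y in range(180) for x in range(360)
    ],
}

# response = {"results": [{"sum": 3.14, "count": 1200}, ...]}
```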
Just while I remember: in the wider Excalidata meeting last Friday the DDN team mentioned a technical difficulty in their active storage implementation that would make it hard to return the result of a single reduction if that result were larger than something like 4 kB (I may be mis-remembering the exact number). This is obviously not a problem for a simple reduction to a scalar but, as @bnlawrence and I discussed in that meeting, it could become an issue if we want to return reductions along subsets of axes as described here, since those results can become arbitrarily large. If so, and if we want matching functionality between the S3 and POSIX implementations, it might only be possible to handle this on the PyActiveStorage side.
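For a sense of scale (the grid size below is chosen purely for illustration): the result of a T-mean over a (T, Y, X) field has Y × X elements, so it blows past a few kB very quickly.

```python
import numpy as np

# Grid size chosen purely for illustration.
Y, X = 180, 360
itemsize = np.dtype("float64").itemsize          # 8 bytes per value

result_bytes = Y * X * itemsize
print(result_bytes)    # 518400 bytes, i.e. ~506 KiB -- far above a ~4 kB cap
```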
I guess more generally, we should probably have some shared list of discussed feature enhancement ideas that we can run past the DDN team too before we stray too far from what's possible on their end.
Perhaps, where these limits apply for a storage backend, we could slice the request up into sufficiently small batches and then aggregate the partial results in PyActiveStorage?
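A rough sketch of that batching idea, assuming a per-response size cap and a `send_request` stand-in for the active-storage call (none of this is the real API):

```python
import numpy as np

MAX_RESPONSE_BYTES = 4096    # assumed per-response limit for the backend
BYTES_PER_RESULT = 16        # assumed size of one (sum, count) pair

def batched(selections):
    """Split the selections into batches whose responses should each fit
    under the assumed size limit."""
    per_batch = max(1, MAX_RESPONSE_BYTES // BYTES_PER_RESULT)
    for i in range(0, len(selections), per_batch):
        yield selections[i:i + per_batch]

def reduce_in_batches(selections, send_request, result_shape):
    """Issue one request per batch and aggregate the partials in PAS.

    `send_request` is a stand-in for the active-storage call: it takes a
    list of selections and returns (location, sum, count) per selection.
    """
    sums = np.zeros(result_shape)
    counts = np.zeros(result_shape, dtype=int)
    for batch in batched(selections):
        for location, s, n in send_request(batch):
            sums[location] += s
            counts[location] += n
    return sums / counts
```

That keeps each individual response small, at the cost of more round trips, which circles back to the performance pitfall noted above.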