RFC: add `materialize` to materialize lazy arrays #839
Comments
A question is whether it's appropriate for an array API consuming library to materialize a lazy graph "behind the user's back", as it were. Such an operation could be quite expensive in general, and a user might be surprised to find that a seemingly innocuous function from something like scipy is doing this. On the other hand, if an algorithm fundamentally depends on array values in a loop, there's no way it can be implemented without something like this. So maybe the answer is we should just provide guidance that functions that use
If I understand https://data-apis.org/array-api/draft/design_topics/lazy_eager.html correctly, the primary APIs that require materialization are
So another option here would be to add a
A couple of thoughts:
Thank you for starting this discussion @lucascolley, and thanks for tagging me!
Note that the signature here should probably be more like
Adding some more here:
I don't think this is the primary API at all; it's just an interesting special case where the return type is out of our hands. The primary API is, as @lucascolley says, a
I agree - in Xarray we very rarely use
I also agree with the other 2 points @hameerabbasi just made. Xarray has a new abstraction over dask, cubed (and maybe soon JAX) called a
Thanks all for the comments! Just tagging @jakevdp also who can maybe shed some light on JAX.
I don't think
I'm +1 on an API that allows simultaneous materialisation of multiple arrays, although I'd spell it slightly differently.
With this in mind, the signature I'd propose is
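Setting the exact spelling aside, here is a minimal sketch of what a multi-array API modelled on `dask.compute` could look like (illustrative only, not necessarily the spelling being proposed in this comment):

```python
def materialize(*arrays):
    """Materialize one or more possibly-lazy arrays in a single pass.

    Accepting several arrays at once lets an implementation execute a
    shared graph only once; an eager library would simply return the
    inputs unchanged, as a no-op.
    """
    ...
```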
One thing I'm unclear on: what is the difference between materialized and non-materialized arrays in terms of the array API? What array API operations can you do on one, but not on the other?
I feel this is more driven by use-cases and performance. Some of these are outlined in #748 (comment) and #728.
One we bumped into in SciPy is
As I mentioned above,
Additionally, the APIs that have data-dependent shapes are
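For illustration, representative examples of the two categories being discussed (value-dependent dunder methods and data-dependent output shapes), using array-api-strict as a stand-in namespace; these examples are illustrative rather than an exhaustive list:

```python
import array_api_strict as xp

x = xp.asarray([3, 0, 1, 0])

# Value-dependent: bool() needs actual values, so a lazy library must
# either compute here or raise.
if xp.any(x > 0):
    pass

# Data-dependent output shapes: the result's size depends on the values,
# so it cannot be known without computing.
idx = xp.nonzero(x)         # number of nonzero entries unknown up front
vals = xp.unique_values(x)  # number of unique values unknown up front
kept = x[x > 0]             # boolean masking has a data-dependent shape
```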
OK, thanks for the clarification. In that case,
I'd like to add a different perspective, based on execution models. I think we have fundamentally three kinds:
(1) Eager execution model

Examples of implementations:

Any "execute or materialize now" API would be a no-op.

(2) Fully lazy execution model

Examples of implementations:

Any "execute or materialize now" API would need to raise an exception.

(3) Hybrid lazy/eager execution model

Examples of implementations:

This is the only mode where an "execute or materialize now" API may be needed. This is not a given though, which is clear from PyTorch not having any such

As pointed out by @asmeurer above, there are only very few APIs that cannot be kept lazy (

For PyTorch, the way things work in hybrid mode is that if actual values are needed, the computation is done automatically. No syntax is needed for this. And there doesn't seem to be much of a downside to this. EDIT: see https://pytorch.org/docs/stable/export.html#existing-frameworks for a short summary of various PyTorch execution models.

MLX is in the middle: it does have syntax to trigger evaluation (

For Dask, it chooses to require

There is another important difference between PyTorch (and fully lazy libraries like JAX/ndonnx as well) vs. Dask I think:
My current assessment is:
Now we obviously do have an issue with Dask/Xarray/Cubed that we need to understand better and find a solution for. It's a hard puzzle. That seems to require more thought, and perhaps a higher-bandwidth conversation soon. The ad-hoc-ness is (as far as I understand it - I could well be missing something of course) going to remain a fundamental problem for any attempt at standardization. I'd be curious to hear from @TomNicholas or anyone else with more knowledge about Dask why something like a user opt-in to auto-trigger compute whenever possible isn't a good solution.
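To make the execution models above concrete, here is a minimal sketch contrasting an eager library (NumPy) with a library that needs an explicit trigger (Dask); MLX's `mx.eval()` plays the same role as `.compute()` here, while a fully lazy library would have no equivalent to call:

```python
import numpy as np
import dask.array as da

# (1) Eager: values exist as soon as the expression is evaluated.
x = np.ones((1000, 1000))
s = x.sum()                  # a concrete scalar, ready to use

# (3) Hybrid/lazy with an explicit trigger: Dask only builds a graph here.
y = da.ones((1000, 1000), chunks=(250, 250))
t = y.sum()                  # still a lazy dask array, nothing computed yet
value = t.compute()          # the graph is executed at this point
```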
@lithomas1 asked this in dask/dask#11356 and the response from @phofl was
@fjetter said in dask/dask#11298 (comment)
Thanks for the pointers @lucascolley. So that seems to be a fairly conclusive "we have some experience and won't do that" - which is fair enough. A few thoughts on those discussions:
```python
def compute(x):
    if is_dask_array(x):   # assumed helper that detects dask arrays
        x = x.compute()    # note: dask's .compute() returns a new object
    return x

def some_func(x):
    if compute(x).shape[0] > 5:
        # we couldn't avoid the `if` conditional in this logic
        ...
```
Thanks Ralf, that makes sense. I'm pretty convinced that we don't want to add
As @asmeurer mentioned previously, we still need to decide in SciPy whether we are comfortable with doing
Thanks Lucas. I'll reopen this for now to signal we're not done with this discussion. I've given my input, but at least @hameerabbasi and @TomNicholas seem to have needs that perhaps aren't met yet. We may also want to improve the documentation around this topic.
This aligns with the point I was trying to make above (#839 (comment)), which is that a library like scipy calling
So I think that if scipy encounters this situation in one of its functions, it should either do nothing, i.e., require the user to materialize the array themselves before calling the function, or register the function itself as a lazy function (but how that would work would be array-library dependent).
I think it should be possible to use the introspection API to add different modes, where we raise errors by default but a user can opt-in to allowing us to force computation. The same can be said for device transfers via DLPack.
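A rough sketch of that opt-in idea from the consuming library's side; `allow_materialize` and the error message are hypothetical, while `__array_namespace_info__` and `capabilities()` come from the standard's inspection API (the capability key is used here only as a proxy for "concrete values may not be available"):

```python
def some_func(x, *, allow_materialize=False):
    xp = x.__array_namespace__()
    caps = xp.__array_namespace_info__().capabilities()
    if not caps["data-dependent shapes"] and not allow_materialize:
        raise TypeError(
            "this function needs concrete values; pass allow_materialize=True "
            "to opt in to forcing computation (which may be expensive)"
        )
    ...
```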
It seems to me that the only library that cannot implement this is JAX, and there's a fairly straightforward workaround for that: instead of a method, make this a decorator. JAX already has a decorator, so does PyTorch, and Dask could easily add one. Something like:

```python
import functools

import dask


def materialize(f):
    @functools.wraps(f)
    def wrapped(*a, **kw):
        # dask.compute returns a tuple, so unwrap the single result
        (out,) = dask.compute(f(*a, **kw))
        return out
    return wrapped
```

So the problem becomes one of aliasing.
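A quick usage sketch of the Dask-backed decorator above; the function name and shapes are illustrative:

```python
import dask.array as da

@materialize
def column_means(x):
    return x.mean(axis=0)

x = da.random.random((1000, 10), chunks=(100, 10))
means = column_means(x)  # the graph is executed here; `means` is concrete
```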
I'd agree strongly with this. Maybe for dense arrays it is possible to find an optimal solution; but for sparse arrays the scheduling problem becomes intractable if the graph gets too large, as one needs to actually look at the positions of the stored elements (please correct me if I'm wrong, @willow-ahrens or @kylebd99).

In addition, one very obvious case where I can see scientific libraries needing it (rather than end-users who wanted to, e.g., plot something) is iterative algorithms. In this case, it'd be highly recommended to materialize at least at the end of every iteration of the algorithm, and detecting that one was running an iterative algorithm from a graph alone could be a significant lift, or the "graph break" may occur in sub-optimal places.

```python
@materialize
def one_iteration(*some_state):
    ...

@materialize
def terminating_condition(*some_state) -> bool:
    ...

for i in range(iterations):
    if terminating_condition(*some_state):
        break
    some_state = one_iteration(*some_state)
```
I'll note that this seems again very implementation-specific. Manually breaking up a graph by inserting
The
Right; that may be sub-optimal in many cases, as there's a big-O difference depending on where one breaks it up. The big-O difference is more severe for sparse arrays but also exists for dense arrays.
To be clear -- calls to
Therefore I'm not proposing a call to
Another example I can think of is incremental materialization:

```python
# `x` is a 1-D integer array
l = []
# Only way I can think of to materialize the whole thing right now
for i in range(x.shape[0]):
    l.append(int(x[i]))
```

Different algorithms may be selected depending on how much of the array needs to be materialized; and it is hard to know before the code runs how much will be materialized, or in what pattern. Such a pattern may be sub-optimal unless one does group materialization of some sort.

As to the practical use of such a pattern: I can think of plotting libraries, which might require e.g. a materialized dense array of some sort.

Another use-case:
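One concrete spelling of the group materialization mentioned above, assuming `x` is a Dask array; `dask.compute` accepts many collections and executes the shared graph once, and a standard `materialize(*arrays)` could play the same role:

```python
import dask

# one graph execution instead of x.shape[0] separate ones
elements = dask.compute(*(x[i] for i in range(x.shape[0])))
l = [int(e) for e in elements]
```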
It's true that where graph breaks are optimal will depend on the exact compilation pipeline, and this may seem like a case of trying to "beat the compiler", which humans haven't historically been excellent at. But I'd consider some hints to be better than none in a field as intractable as this one; that's aside from library needs for arrays that exist in memory.
I'm not sure what you're trying to say here with "between iterations". The end of an iteration isn't challenging; iterations are invariably a loop conditioned by an
This is one of the very few other cases similar to
They'll need
Adding implementation-specific heuristics cannot be the right design approach for code written against a standard. It's just going to be library-specific. Basically there are two places where one must materialize:
Everything else seems to be a case of "it may be kept lazy, or it may benefit from library-specific hints/directives".
This may materialize the bool resulting from those arrays, but not the arrays themselves; perhaps parts of the arrays may be optimised into another form such that they cannot be back-filled by the compilation process.
In this case, maybe
Ah okay, that's the same concern as for Dask. My understanding is that it may indeed be suboptimal and that there is no way to make it more optimal that is general across libraries - or even different versions of the same library, since your scheduler may become smarter over time. I think the right answer here from an API standard design perspective is to live with that scenario being potentially suboptimal, and if there are severe cases that turn up in practice, the code author can reach for the library-specific primitive (e.g.,
Is there perhaps a compile-time/runtime reason to want materialization? i.e. the user would like to avoid an expensive computation later in e.g. an interactive section of the program?
Hameer is correct, the PyData Sparse jit runs long on long inputs. That's not to say we can't make improvements to make the system scale, or heuristically break things up into smaller chunks, but users can help the compiler out a lot by giving good hints.
There may be, perhaps, but if such cases arise it's again a matter of that being array-library specific. Something that may be helpful or even a hard necessity for usage with one library may be detrimental for usage with another library. PyData Sparse should add a
Preface
I do not think that I am the best person to champion this effort, as I am far from the most informed person here on lazy arrays. I'm probably missing important things, but I would like to start this discussion as I think that it is an important topic.
The problem
The problem of mixing computation requiring data-dependent properties with lazy execution is discussed in detail elsewhere:
A possible solution
Add the function `materialize(x: Array)` to the top level of the API.
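A sketch of how a consuming library might use such a function; `materialize` here is the proposed (not yet existing) API, and the surrounding function is purely illustrative:

```python
def fit(x):
    xp = x.__array_namespace__()
    # Proposed API: a no-op for eager libraries, forces execution for lazy ones.
    x = xp.materialize(x)
    # Value-dependent control flow is now safe.
    if bool(xp.any(xp.isnan(x))):
        raise ValueError("NaNs are not supported")
    ...
```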
Behaviour:

Prior art
Concerns
`device` kwargs in NumPy), but perhaps this is too obtrusive?

Alternatives
`compute*`, or a method on the array object. Maybe with options for partial materialization (if that's a thing)?

cc @TomNicholas @hameerabbasi @rgommers