idempotency optimization on initialized outputs #353
Okay, I took a look. The first thing to say is that a similar issue arises in this non-reducing kernel. Ultimately, these behaviors are the result of deciding not to reason, ever, about the current value held in a tensor: we don't do constant propagation through arrays, and we don't even try to reason about whether something is the first write to a tensor after initialization.
A similar issue arises during reduction: it's difficult to apply idempotent simplifications if the idempotent reduction is not in the innermost loop. Consider the reducing kernel:
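A sketch of the kind of kernel in question, in approximate Finch.jl surface syntax; the tensor names `x` and `y`, the use of `max`, and the constructors are illustrative assumptions rather than the exact code from the discussion:

```julia
using Finch

x = fsprand(10, 10, 0.3)              # a sparse input matrix (illustrative)
y = Tensor(SparseList(Element(0.0)))  # a sparse output vector (illustrative)

@finch begin
    y .= 0
    for j = _, i = _
        y[i] <<max>>= x[i, j]   # the reduced index j is the *outer* loop
    end
end
```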
The above code will always densify the output, because the compiler doesn't know whether a negative number has previously been written to the output entry. Also, we can only apply an idempotent simplification if the reduction is in the innermost loop. For example, if we do:
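(Again approximate syntax, reusing the illustrative `x` from above; `z` is a new sparse output indexed by `j`.)

```julia
z = Tensor(SparseList(Element(0.0)))

@finch begin
    z .= 0
    for j = _, i = _
        z[j] <<max>>= x[i, j]   # the reduced index i is now the innermost loop,
                                # so repeated maxes into z[j] are contiguous
    end
end
```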
then we can eliminate contiguous maxes with 0 in the innermost loop. However, that is not possible when the reduction sits in an outer loop.
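(The same illustrative kernel as the first sketch, repeated to show the loop structure.)

```julia
@finch begin
    y .= 0
    for j = _, i = _
        y[i] <<max>>= x[i, j]   # reduction over the outer index j;
                                # the inner loop over i visits a different y[i] each time
    end
end
```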
Here, each update in the inner loop modifies a single, separate entry of the output. I think there is likely no way to get Finch to apply the idempotent simplification unless the reduction is the innermost loop. However, I do think you could keep from densifying the output by defining a similar custom operation.
As far as resolving this issue goes, what do you think we need to do here? Some options: (1) teach the rewriter to reason about how outputs are initialized so the idempotent simplification can still apply, (2) warn when a kernel unexpectedly densifies a sparse output, and (3) document this behavior.
On the second two points, I think we should just ask the user whether they intended the kernel to be sparse. However, I'm open to suggestions about how to detect that it became dense @wraith1995 @nullplay.
Thanks for looking into this and for the detailed explanation; the behavior makes a lot more sense now. This may be a bit of a pain or not feasible in the rewrite infrastructure, but I think that reasoning about the initialization value of x would be enough to capture some of the big cases. If x is initialized with default(y) and f is abelian (which seems like a fairly common case), then the idempotency optimization should apply to arbitrary loop nests. Under these assumptions, I think a kernel whose output is initialized to default(y) is equivalent to one in which every output entry has already been combined with the default value.
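A sketch of the claimed equivalence, assuming `max` stands in for the abelian `f` and `0` for `default(y)`; following this comment's naming, `x` is the output and `y` the input, both illustrative:

```julia
using Finch

y = fsprand(10, 10, 0.3)              # input (sparse, default 0.0)
x = Tensor(SparseList(Element(0.0)))  # output, initialized with default(y)

# Kernel as written:
@finch begin
    x .= 0
    for j = _, i = _
        x[i] <<max>>= y[i, j]
    end
end

# ...computes the same result as a version where the default has already been
# folded into every contribution:
@finch begin
    x .= 0
    for j = _, i = _
        x[i] <<max>>= max(y[i, j], 0)   # default(y) combined into each update
    end
end
# In the second form, an update coming from the default region of y is exactly
# `x[i] <<max>>= 0`, which by idempotence can never change x[i] (x[i] already
# includes a max with 0), so those updates can be dropped without tracking the
# current value stored in x[i].
```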
So we can always assume that x[i] has been "maxed with the default value". This requires analyzing initialization values, so I'm not sure whether it's worth it per se, but I think it avoids reasoning about "current values"?

On warnings, I lean towards more warnings whenever the performance is big-O worse than might reasonably be expected, especially because the user can always turn on fast mode to remove them. So whatever version of these warnings is feasible would be helpful. To avoid over-warning for dense kernels, you could just detect when the inputs are all dense and screen warnings out in that case.

On the docs front, this seems like a good addition to the performance section. In general, a small section on operations and situations that result in densification might be useful, similar to the section on concordant iteration. Thanks again!
Two cents on dense detection: if a sparse level is used, we could look at all the indices used to write to it and then ask whether they correspond to loops in the generated code that have work at least equal to the size of that level. We could probably write a pretty naive way of computing the work that assigns while loops O(1) and for loops O(n).
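A toy rendering of that cost model, using a made-up loop tree rather than Finch's actual IR or API:

```julia
# Hypothetical mini loop IR, for illustration only.
abstract type LoopNode end
struct ForLoop   <: LoopNode; extent::Int; body::Vector{LoopNode}; end
struct WhileLoop <: LoopNode; body::Vector{LoopNode}; end
struct WriteOp   <: LoopNode; end   # a write into some sparse output level

body_work(body) = isempty(body) ? 1 : sum(work, body)
work(n::ForLoop)   = n.extent * body_work(n.body)  # for loops multiply work by their extent
work(n::WhileLoop) = body_work(n.body)             # while loops are charged as O(1) passes
work(n::WriteOp)   = 1

# A dense for loop of extent 1000 around a sparse while-loop traversal that
# writes into a level of size 1000: the work reaching the write equals the
# level size, so the write would be flagged as possibly densifying.
tree = ForLoop(1000, [WhileLoop([WriteOp()])])
level_size = 1000
work(tree) >= level_size   # true
```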
At the very least, if all the inputs are sparse, we could replace them with their default and double-check that the code is a no-op. We might also run the same check when any input is sparse, rather than only when they all are. What happens when we put two statements in a loop, and one of the statements is dense while the other is sparse?
In my project, I'm using the instance AST to programmatically build a Finch kernel, but I've been running into an issue where it sometimes needlessly densifies fibers. The following code should reproduce the issue: `output_fiber` is fully dense even though it should just be a copy of `X`, which is sparse. I'm probably just not building the AST correctly, but it seems like an issue of it not recognizing the idempotency of `overwrite`.
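The reproducer was built through the instance AST; as a hedged stand-in, here is a copy kernel with the same intent written with the `@finch` macro instead (the constructors and the `fsprand` call assume a recent Finch.jl API and are illustrative, not the original code):

```julia
using Finch

X = fsprand(10, 10, 0.3)                                # sparse input
output_fiber = Tensor(Dense(SparseList(Element(0.0))))  # output that should stay sparse

@finch begin
    output_fiber .= 0
    for j = _, i = _
        output_fiber[i, j] = X[i, j]   # plain overwrite: a straight copy of X
    end
end
```

If the overwrite were recognized as skippable over the default region of `X`, `output_fiber` would remain sparse rather than being fully materialized.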