Replies: 30 comments
-
Hi @Bike, For older or more complex rules we also have https://github.com/EnzymeAD/Enzyme/blob/main/enzyme/Enzyme/AdjointGenerator.h, but they might be a lot harder to understand. |
Beta Was this translation helpful? Give feedback.
-
If I recall correctly that is there for correctness around edge points. Incidentally we may now have a better way of handling this now (the checkeddiv inst), which is conditionally activated upon a flag saying that is necessary. As Manuel mentioned our derivative rules for things like this are simply specified in that file, which if you'd like to get into Enzyme dev, wouldn't be hard to open a PR and update! |
Beta Was this translation helpful? Give feedback.
-
Also that test which you mention is actually the derivative generated by tapenade, another AD tool, which also has similar handling of these points. |
Beta Was this translation helpful? Give feedback.
-
@Bike some of the reasoning behind this is outlined in this paper: https://proceedings.neurips.cc/paper/2020/file/4aaa76178f8567e05c8e8295c96171d8-Paper.pdf |
Beta Was this translation helpful? Give feedback.
-
@tgymnich Thank you, I will read. I suspect that the considerations for ML are not so relevant for us (we're doing physics simulation) but I'm glad to know the ideas. @ZuseZ4 @wsmoses Thanks for the tip. I patched this code to see how things would go without the zero test, and after optimization it ends up just as fast as my manual version. I assume Enzyme would not accept a PR to remove the test as that would result in bad behavior on the edge. As a more configurable solution for us, is it perhaps possible to manually specify the derivative of a function? Then we could have our code use |
Beta Was this translation helpful? Give feedback.
-
@Bike This functionality could be introduced behind some kind of flag. Given that the changes are not too invasive and some notion of completeness (what about FDiv and so on). |
Beta Was this translation helpful? Give feedback.
-
@tgymnich we already have a flag already within the checkedMul/checkedDiv, which I presume we could just use here. |
Beta Was this translation helpful? Give feedback.
-
Since the problem with sqrt is also due to division by zero, could the sqrt pattern just be changed to use CheckedDiv, rather than using the flag separately? There is a slight difference in that the existing sqrt pattern uses a ueq comparison rather than CheckedDiv's oeq - I don't know if that's important, depends on what a NaN input should do. I don't know how to enable or disable this EnzymeStrongZero flag, but I could try patching this and submitting a PR. |
Beta Was this translation helpful? Give feedback.
-
@Bike you can enable the flag depending on the pass manager you use like so:
|
Beta Was this translation helpful? Give feedback.
-
Just a note that turning on trapping floating point is an important debugging tool so I'd prefer not to see fp exceptions for code that is considered unexceptional. I don't know if this case is, but the way it's being talked about up above suggests maybe. |
Beta Was this translation helpful? Give feedback.
-
As a side note that you may be interested in @jedbrown a few weeks back we added support for running custom code on all the intermediate derivative values. For example, performing a nan-check (and then getting a backtrace to what triggered then nan). |
Beta Was this translation helpful? Give feedback.
-
Cool, is that for forward or reverse or both? |
Beta Was this translation helpful? Give feedback.
-
Both! Here's the (undocumented) Julia flag for enabling the nan checker as an example: https://github.com/EnzymeAD/Enzyme.jl/blob/02715a8bbac185342fe427f0090a8893d3a8af1d/src/compiler.jl#L5854 If you have other use cases and/or want to play with (from whatever input language, lmk) |
Beta Was this translation helpful? Give feedback.
-
@tgymnich Oh, thanks for the tip. I'd love to be able to use the new pass manager. But I'm a bit confused - the FAQ linked in that other issue (https://enzyme.mit.edu/getting_started/Faq/#opt-cant-find--enzyme-option) says the opposite, that |
Beta Was this translation helpful? Give feedback.
-
@Bike uou should use ClangEnzyme not LLVMEnzyme for loading into clang |
Beta Was this translation helpful? Give feedback.
-
@Bike my bad I mixed up the pass managers. It should be the other way around. |
Beta Was this translation helpful? Give feedback.
-
Thank you, I got it working. The |
Beta Was this translation helpful? Give feedback.
-
I must not be understanding CheckedDiv. I have InstructionDerivatives set up so that sqrt does a CheckedDiv instead of an FDiv (or manual FCmp). I thought that if I passed the |
Beta Was this translation helpful? Give feedback.
-
Okay - figured it out - checkedDiv does |
Beta Was this translation helpful? Give feedback.
-
@Bike |
Beta Was this translation helpful? Give feedback.
-
So the reason checkedmul/checkeddiv operate as they do is for a distinct reason: For example, res = x * infinity would lead to dres = dx * infinity. If dx was zero as we didn't want its derivative (or other activity reasons), we would instead sadly compute the total derivative of nan, instead of the intended derivative of zero. Sqrt(x) is a weird point, and at this point I don't remember the origin of the current behavior. Tapenade has the same behavior so I looked at their source code to find:
The checkedMul/Div have strong motivations to me, and both the numerator and divisor checks resolve the nan issue (albeit in different ways) at the indeterminite point. I don't recall the historical justification for the divisor condition, so I'll tag @martinjm97 and @jhueckelheim @jedbrown if they have thoughts that may be helpful. |
Beta Was this translation helpful? Give feedback.
-
Talking with @martinjm97 so one case that the old behavior solves is:
Clearly this is just equivalent to 0 [say via finite diff/etc], but even with the checkedMul (using forward mode):
|
Beta Was this translation helpful? Give feedback.
-
Alright, I misunderstood what y'all meant about the checkedDiv flag then. ah well. The paper linked earlier mentions sqrt but only to say that PyTorch and TensorFlow treat it conservatively (meaning the derivative at zero is inf). But I don't see it mention why you'd want it to be zero instead. |
Beta Was this translation helpful? Give feedback.
-
Yeah @martinjm97 just had a long conversation about this (he'll post shortly), but the tl;dr is that it is required to get the correct behavior when differentiating |
Beta Was this translation helpful? Give feedback.
-
Hi @Bike, Happy to hop in and give some thoughts. So I believe three options under consideration are: I think different approaches are reasonable depending on programmer intent: I think there are cases where each of these make sense: I guess this isn't much of an answer, but more of a choose your own adventure, but I don't yet see an ideal resolution. I'm personally partial (pun intended) to (3). I think of |
Beta Was this translation helpful? Give feedback.
-
I feel a bit uncomfortable about this from the NeurIPS paper shared above: So what happens if we apply this new rule to |
Beta Was this translation helpful? Give feedback.
-
The theory appears solid, its FP math that is lacking. Under floating point math e.g. without |
Beta Was this translation helpful? Give feedback.
-
Nothing puts @simonbyrne at ease like code that is only correct with |
Beta Was this translation helpful? Give feedback.
-
@martinjm97 brought me into this thread, and I would like to make small clarifications about the NeurIPS'20 paper discussed above, which I co-authored. I don't fully understand the context of this thread, but I hope my clarifications could be helpful to the entire discussion. Please let me know if you have any other questions or need any further clarifications :) The main messages of the paper are the following:
The following are clarifications on some of the above discussion.
This program represents the mathematical function
I would like to note that it is unavoidable to prevent AD from being silently wrong, regardless of the "derivatives" that AD uses for non-differentiable functions (e.g., sqrt or relu). A representative example is Here I am not arguing that AD should use an intensional derivative for each primitive function (such as |
Beta Was this translation helpful? Give feedback.
-
Thanks @wonyeol for your informative comment. |
Beta Was this translation helpful? Give feedback.
-
My group is getting started trying out Enzyme, and I noticed some unintuitive behavior on square roots: the derivative of
sqrt
at zero seems to be specially computed as zero.__enzyme_autodiff(my_sqrt, x)
ends up with an fcmp and select to essentially do(x = 0.0) ? 0.0 : 1/(sqrt(2.0*x))
, instead of what I expected,1/(sqrt(2.0*x))
.This is, unless I'm missing something, mathematically incorrect - the derivative at zero should be undefined as the function is not continuous there (at least on reals), or positive infinity if you take the limit from above. But I'm less concerned about that and more about the decrease in efficiency from the branch. It seems to slow things around 40% in a very quick benchmark, versus the manual
1/sqrt(2.0*x)
. Here's the explorer showing a ucomisd -> je sequence.I am not sure that this is a bug since it appears to be a deliberate design decision, with similar logic apparent e.g. in this benchmark. But I am at least interested in knowing why it was designed this way, and whether there is any way to configure this away. I couldn't find any mention of this behavior on the website, the arxiv preprint, or the code. We would like to use Enzyme rather than the symbolic differentiation we are doing now but this is a noticeable decrease in performance for us.
Beta Was this translation helpful? Give feedback.
All reactions