-
-
Notifications
You must be signed in to change notification settings - Fork 190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ode gradients with respect to t0 (Issue #1833) #1834
Conversation
Thanks for catching this! I can review this later. |
@wds15, @yizhang-yiz, if the tests don't just pass on the first try, we've probably missed this release, but if you get a chance to review this today that'd be nice. |
This is a bugfix so it doesnt fall under the feature freeze. Still better to have it in the release candidate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we need to sort out the sign thing. In any case, many thanks for catching this. My bad.
@@ -133,6 +141,10 @@ struct coupled_ode_observer { | |||
} | |||
} | |||
|
|||
if (!is_constant_all<T_t0>::value) { | |||
ops_partials.edge3_.partials_[0] = -f_y0_t0_[j]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would this be equal to minus f_y0_t0_[j]? As I understand, the initial time-point is no different than the remaining time-points such that the derivative wrt to t0 is just f_y0_t0. I mean, in this version we would have at t0 the gradient equal to -f_y0_t0_[j] and just an epsilon later the sign would flip if the user would request at t0+epsilon a solution of the ODE.
So please change to f_y0_t0_[j] unless I am overlooking something here.
I am also not too happy with the variable name as this is called dy_dt usually. Maybe initial_dy_dt_ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would this be equal to minus f_y0_t0_[j]?
Imagine an ode solve with only an initial point and a final point. As an integral, if the final point moves right, the value of the integral gets larger. If the initial point moves right, that integral gets smaller (or the thing here: https://en.wikipedia.org/wiki/Leibniz_integral_rule).
as this is called dy_dt usually
Yeah I was working through making the odes variadic I never found naming that I was super happy with.
dy_dt is just f, by definition, so it's annoying to type the fully dy_dt. I kinda liked f_y0_t0 cause it's evaluated at the initial points (still really painful to type).
I'm happy to go initial_dy_dt. Doesn't matter that much to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but at t0+epsilon the gradient would flip sign. That is not right. Either we have a bigger problem (and the gradients are wrong for the other time-points as well) or this PR is wrong, but I don't think that both are right.
As you are suggesting that your version is the right one, I would need to to dive into it a bit.
Let's keep this as a bugfix which we can merge during this week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but at t0+epsilon the gradient would flip sign.
The derivative with respect to the lower limit of the integral will still have that negative sign. Write out the integral for something like cos(t)
from t = l
to t = r
and then differentiate that integral with respect to l
and r
and see the difference.
There's finite difference code here if you want to check with that: #1833 . I wrote this out to make sure I wasn't too off track.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure... I just need a calm moment tomorrow to write this down. It's against my intuition, so I won't approve for just now.
If someone else is confident this is right, then I withdraw my review.
(I guess you are right, I just want to follow along).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah no it's fine it's confusing lol. Tomorrow is good for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Confusing, but you are right. The odd thing is that the initial is constant here while the initial time varies.
Can you add the test catching the size mis match thing below? Then this is fine to go in.
@@ -221,7 +221,7 @@ TEST_F(StanRevOde, observe_states_ddvd) { | |||
EXPECT_FLOAT_EQ(ys_coupled[t][n], y[t][n].val()); | |||
for (size_t n = 0; n < 2; n++) { | |||
y[t][n].grad(); | |||
EXPECT_FLOAT_EQ(0.0, t0.adj()); | |||
EXPECT_FLOAT_EQ(-y0[n], t0.adj()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here as well, the test expectation is y0[n], I think.
if (!is_constant_all<T_t0>::value) { | ||
f_y0_t0_ | ||
= f(value_of(t0), value_of(y0), value_of(theta_), x_, x_int_, msgs_); | ||
check_size_match("coupled_ode_observer", "dy_dt", f_y0_t0_.size(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch - we need the check here. Is this check triggered somewhere in the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but I can add that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checks added!
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
Fixes #1833
Tests
There were tests in place, but either:
forced_harm_osc_ode_fun
which actually had zero gradientsSo not setting the gradients was accidentally making these things pass. I changed the tests to use
harm_osc_ode_fun
where one of the gradients isn't zero, and I updated the checks to look for that.Release notes
Fixed problem where ode solvers were not computing the gradients of the output with respect to the initial integration time. This affected the rk45, adams, and bdf solvers.
Checklist
Math issue ODE gradients with respect to t0 not working correctly #1833
Copyright holder: Columbia University
The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
the basic tests are passing
./runTests.py test/unit
)make test-headers
)make test-math-dependencies
)make doxygen
)make cpplint
)the code is written in idiomatic C++ and changes are documented in the doxygen
the new changes are tested