Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ode gradients with respect to t0 (Issue #1833) #1834

Merged
merged 6 commits into from
Apr 15, 2020

Conversation

bbbales2
Copy link
Member

Fixes #1833

Tests

There were tests in place, but either:

  1. They were testing against forced_harm_osc_ode_fun which actually had zero gradients
  2. Or they were testing that the gradients were zero

So not setting the gradients was accidentally making these things pass. I changed the tests to use harm_osc_ode_fun where one of the gradients isn't zero, and I updated the checks to look for that.

Release notes

Fixed problem where ode solvers were not computing the gradients of the output with respect to the initial integration time. This affected the rk45, adams, and bdf solvers.

Checklist

  • Math issue ODE gradients with respect to t0 not working correctly #1833

  • Copyright holder: Columbia University

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@wds15
Copy link
Contributor

wds15 commented Apr 13, 2020

Thanks for catching this! I can review this later.

@bbbales2
Copy link
Member Author

@wds15, @yizhang-yiz, if the tests don't just pass on the first try, we've probably missed this release, but if you get a chance to review this today that'd be nice.

@rok-cesnovar
Copy link
Member

This is a bugfix so it doesnt fall under the feature freeze. Still better to have it in the release candidate.

Copy link
Contributor

@wds15 wds15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we need to sort out the sign thing. In any case, many thanks for catching this. My bad.

@@ -133,6 +141,10 @@ struct coupled_ode_observer {
}
}

if (!is_constant_all<T_t0>::value) {
ops_partials.edge3_.partials_[0] = -f_y0_t0_[j];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this be equal to minus f_y0_t0_[j]? As I understand, the initial time-point is no different than the remaining time-points such that the derivative wrt to t0 is just f_y0_t0. I mean, in this version we would have at t0 the gradient equal to -f_y0_t0_[j] and just an epsilon later the sign would flip if the user would request at t0+epsilon a solution of the ODE.

So please change to f_y0_t0_[j] unless I am overlooking something here.

I am also not too happy with the variable name as this is called dy_dt usually. Maybe initial_dy_dt_ ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this be equal to minus f_y0_t0_[j]?

Imagine an ode solve with only an initial point and a final point. As an integral, if the final point moves right, the value of the integral gets larger. If the initial point moves right, that integral gets smaller (or the thing here: https://en.wikipedia.org/wiki/Leibniz_integral_rule).

as this is called dy_dt usually

Yeah I was working through making the odes variadic I never found naming that I was super happy with.

dy_dt is just f, by definition, so it's annoying to type the fully dy_dt. I kinda liked f_y0_t0 cause it's evaluated at the initial points (still really painful to type).

I'm happy to go initial_dy_dt. Doesn't matter that much to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but at t0+epsilon the gradient would flip sign. That is not right. Either we have a bigger problem (and the gradients are wrong for the other time-points as well) or this PR is wrong, but I don't think that both are right.

As you are suggesting that your version is the right one, I would need to to dive into it a bit.

Let's keep this as a bugfix which we can merge during this week.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but at t0+epsilon the gradient would flip sign.

The derivative with respect to the lower limit of the integral will still have that negative sign. Write out the integral for something like cos(t) from t = l to t = r and then differentiate that integral with respect to l and r and see the difference.

There's finite difference code here if you want to check with that: #1833 . I wrote this out to make sure I wasn't too off track.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure... I just need a calm moment tomorrow to write this down. It's against my intuition, so I won't approve for just now.

If someone else is confident this is right, then I withdraw my review.

(I guess you are right, I just want to follow along).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah no it's fine it's confusing lol. Tomorrow is good for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Confusing, but you are right. The odd thing is that the initial is constant here while the initial time varies.

Can you add the test catching the size mis match thing below? Then this is fine to go in.

@@ -221,7 +221,7 @@ TEST_F(StanRevOde, observe_states_ddvd) {
EXPECT_FLOAT_EQ(ys_coupled[t][n], y[t][n].val());
for (size_t n = 0; n < 2; n++) {
y[t][n].grad();
EXPECT_FLOAT_EQ(0.0, t0.adj());
EXPECT_FLOAT_EQ(-y0[n], t0.adj());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here as well, the test expectation is y0[n], I think.

if (!is_constant_all<T_t0>::value) {
f_y0_t0_
= f(value_of(t0), value_of(y0), value_of(theta_), x_, x_int_, msgs_);
check_size_match("coupled_ode_observer", "dy_dt", f_y0_t0_.size(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch - we need the check here. Is this check triggered somewhere in the tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but I can add that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checks added!

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 4.8 4.84 0.99 -0.78% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.98 -2.27% slower
eight_schools/eight_schools.stan 0.09 0.09 1.0 0.48% faster
gp_regr/gp_regr.stan 0.22 0.22 1.0 -0.18% slower
irt_2pl/irt_2pl.stan 6.47 6.48 1.0 -0.14% slower
performance.compilation 89.48 87.17 1.03 2.58% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.52 7.53 1.0 -0.13% slower
pkpd/one_comp_mm_elim_abs.stan 21.33 20.33 1.05 4.68% faster
sir/sir.stan 93.68 94.34 0.99 -0.7% slower
gp_regr/gen_gp_data.stan 0.05 0.05 0.99 -1.23% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 2.95 2.95 1.0 -0.03% slower
pkpd/sim_one_comp_mm_elim_abs.stan 0.31 0.3 1.01 0.93% faster
arK/arK.stan 1.74 1.74 1.0 0.11% faster
arma/arma.stan 0.66 0.65 1.01 1.39% faster
garch/garch.stan 0.51 0.51 1.01 0.57% faster
Mean result: 1.00378766533

Jenkins Console Log
Blue Ocean
Commit hash: e92cc41


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@wds15 wds15 merged commit cbcc81d into develop Apr 15, 2020
@bbbales2 bbbales2 mentioned this pull request May 8, 2021
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ODE gradients with respect to t0 not working correctly
4 participants