-
-
Notifications
You must be signed in to change notification settings - Fork 371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
If model evaluates at initial conditions without autodiff but fails with autodiff then the error messages are not great #3034
Comments
Yea, so it seems that the current code assumes that if computing log probability succeeds then also computing the gradient will succeed. With ODEs however, it is possible that log_prob succeeds for a given max_num_steps, but to compute the gradient, an extended system has to be solved and it fails. There could be also other models that have this sort of behaviour. |
Yea, so it seems that the current code assumes that if computing log probability succeeds then also computing the gradient will succeed. With ODEs however, it is possible that log_prob succeeds for a given max_num_steps, but to compute the gradient, an extended system has to be solved and it fails. There could be also other models that have this sort of behaviour.
The code does not make this assumption.
As detailed in https://github.com/stan-dev/stan/blob/develop/src/stan/services/util/initialize.hpp the initialization begins by checking the double evaluation of log_prob (line 128). The "Error evaluating the log probability at the initial value” message is passed after the double evaluation but before the gradient is considered, which means that the double evaluation alone is failing in @jtimonen ’s model (I’m guessing the replication of the message is due to four chains all dumping the same error). The autodiff evaluation of log_prob is then checked (line 166) in the same loop, and if the evaluation fails then the message "Gradient evaluated at the initial value is not finite.” is passed.
Again because "Error evaluating the log probability at the initial value” is passed to the interface here the initialization attempt is stopped _before the gradient is even considered_. The gradient is always checked but only after a double evaluation returns successfully.
|
It is throwing an error in log_prob_grad. Seems like it is printing the error twice. Not sure why that's happening. I modified the initialize.hpp code and added some stuff around line 160: std::cout << "meow" << std::endl << std::flush;
std::stringstream log_prob_msg;
std::vector<double> gradient;
auto start = std::chrono::steady_clock::now();
try {
// we evaluate this with propto=true since we're
// evaluating with autodiff variables
log_prob = stan::model::log_prob_grad<true, Jacobian>(
model, unconstrained, disc_vector, gradient, &log_prob_msg);
} catch (const std::exception& e) {
if (log_prob_msg.str().length() > 0)
logger.info(log_prob_msg);
logger.info(e.what());
throw;
}
std::cout << "woof" << std::endl << std::flush; Output is:
|
You need to separate out loop iterations, for example by printing "num_init_tries". When "Error evaluating the log probability at the initial value." is passed to the logger the gradient evaluation isn't even attempted due to the "continue" statement on line 140. My guess is that the first attempt fails with a bad function evaluation and then on the second loop iteration the function evolution passes but then the gradient evolution fails which triggers the throw on line 172. If this is the case then all we need to do is add a logger message before 171 to clarify the failure of
|
We could throw a more informative error but the code will behave more robustly if we just initialize where gradients also work so it's possible to start running HMC. I believe that is the intention of this code (just find anywhere to start), and the difficulty of doing the ODE solves causes this problem. |
I believe that is the intention of this code (just find anywhere to start), and the difficulty of doing the ODE solves causes this problem.
The code is very clearly laid out and is symmetric between double and var evaluations. If the evaluation completes successfully and returns well-defined values then the initialization terminates successful. if it completes successfully and returns an ill-defined value then it repeats. If it throws an error (other than a domain error which only needs to be checked once since double and var evaluations have the same domain) then the initialization terminates unsuccessfully.
CVODES throws a non-domain exception when the integration of the sensitivity states fails, triggering the initialization to terminate as expected from the current implementation.
Changing the initialization logic to retry on var-evaluation exceptions requires a more careful discussion. I recommend closing this issue and opening a new issue on the proper topic.
|
I don't know what symmetry means here. If you mean it's the same checks between the tries on line 124 and 163, they are not. The try on 163 does not catch domain errors and continue running. Any exception it catches it re-throws.
It's a domain exception here. You can change the code to: auto start = std::chrono::steady_clock::now();
try {
// we evaluate this with propto=true since we're
// evaluating with autodiff variables
log_prob = stan::model::log_prob_grad<true, Jacobian>(
model, unconstrained, disc_vector, gradient, &log_prob_msg);
} catch (const std::domain_error& e) {
std::cout << "It's a domain error" << std::endl;
if (log_prob_msg.str().length() > 0)
logger.info(log_prob_msg);
logger.info(e.what());
throw;
} catch (const std::exception& e) {
if (log_prob_msg.str().length() > 0)
logger.info(log_prob_msg);
logger.info(e.what());
throw;
}
auto end = std::chrono::steady_clock::now(); And you'll see in the output when this problem comes up:
Anyway, this is the problem that I want to fix. These are recoverable errors and we should be recovering from them in the same way as we do already. We should pick an initialization for the sampler where it is possible to both evaluate the log density and the gradients. Right now we do not check that we are able to evaluate the gradients. I do not think that is correct considering we need the gradients to do the sampling. That is the change I want to make over here. |
Bump. @betanalpha this looks all good to me. The intent to error out when the chosen initial leads to a nan for the log-likelihood or it's gradient is sensible, since the sampler needs both in order to move. I am happy to review the PR for this so that we get this into the release. |
The intent to error out when the chosen initial leads to a nan for the log-likelihood or it's gradient is sensible, since the sampler needs both in order to move.
Yes, it is, which is why the code _has always checked both_.
Right now we do not check that we are able to evaluate the gradients. I do not think that is correct considering we need the gradients to do the sampling.
Yes, we do.
That is the change I want to make over here.
The description of this PR is incorrect.
Let me try to run through the code flow one more time:
For _each_ of the 100 initialization attempts we check:
1. That the initial values are consistent with the parameter constraints.
If the failure is due to a domain error on the init var_context then a new attempt is made (L 149).
Otherwise the initialization fails immediately.
2. That we can evaluate log_prob for doubles at the initial value.
If the failure is due to a domain error then a new attempt is made (L 140).
Otherwise the initialization fails immediately.
3. That the returned log_prob is finite.
If it’s not finite then a new attempt is made (L 158).
4. That we can evaluate log_prob_grad for vars at the initial values.
If there is any failure in the evaluation of log_prob_grad then the initialization fails immediately.
5. If the Euclidean norm of the gradient is finite.
If it’s not finite then the initialization fails when it reaches the end of the loop body.
6. If all checks pass then the function returns before the end of the loop body (L 211).
Initialization returns successfully _only_ if the both the value and gradient of the log target density can be evaluated.
All of this happens in one loop and, again, the double and var evaluations are checked for _each_ attempt.
There are certainly a few changes that can be considered.
1. Adding a logger message when the gradient evaluation throws an exception beyond the log_prob message (L 170) and the exception message (L 171). Pull #3035 does not address this but instead tries to completely change the initialization evaluation logic.
2. Attempting new initializations under some gradient evaluation failures, namely domain errors. As I said before given the unforeseen behavior of CVODES errors when the initialization code was written this is reasonable but goes beyond the scope of this issue.
|
Summary:
For instance, with an ODE model @jtimonen found output like this:
We would expect an error message like:
Here is a model that exhibits this behavior at certain seeds (model below):
Json data:
Current Version:
v2.26.1
The text was updated successfully, but these errors were encountered: