-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Serious bug in HillClimbingEstimator? Causing only first iteration to matter #13
Comments
Hi there, thanks for the report. There's definitely something fishy going on with the estimator. It seems like I was trying to implement some kind of early stopping: theta values are checked from left to right and, when the log-likelihood of these values went up, but then started going down, the next candidate theta values didn't need to be checked. However, the implementation seemed to be wrong. Also, the new intervals I generated in subsequent rounds were also wrong, as you pointed out. All of this made the estimator not be as precise as it can be. I decided to remove the early stopping idea and implemented something a little more straightforward. After each iteration, the new candidate theta values to be checked are generated around the current best theta, but in smaller intervals. I added some comments in the code and used more descriptive variable names, since it has been a long time since I last worked in this code. I committed a possible fix to the testing branch. Take a look if it makes sense to you. The lines you were interested in are now here. An example verbose output from the hill climbing estimator can be seen below. Values with larger log-likelihood are printed and we can now see that subsequent passes do generate better thetas.
|
Thanks for your quick response! I think the way you did it previously, going...
seemed like a good, slightly more efficient strategy, given that the likelihood is concave (at least for 2PL model, off-the-cuff I think this is the case for 3 and 4PL? Could be very wrong about this?). Once the next candidate has a lower likelihood, then the likelihood has started going down, so the rest will also be lower (and don't need to be checked). As long as the best theta is reset each round it should be okay (since it will pass over the previously best theta with increasing granularity). |
I see what you mean. If I reset
The thing is, I never assumed this condition. If you have observed it, it was there by coincidence. I also didn't want to reset I decided to just explicitly keep the value of the current log-likelihood ( The implementation of early stopping is here. I can think of ways to do it better, like search once to the left and once to the right of the current best theta, in order to know which side we actually have to search. This can cut the search space in half and it's not hard to implement. |
I'll close this as it appears the issue has been fixed. If you find other anomalies, please report them and I'll try to fix them as quickly as possible. We can continue discussion on this bug here, nevertheless. |
Thanks for the fix! That's fair that there could be some small improvements. I guess knowing whether the crnt best theta is to the left or right of the best theta out of the crnt round's values contains a lot of information for limiting the search range. As is, is super helpful--thanks again for being so responsive! |
Hey, I think that oftentimes the HillClimbingEstimator only does 1 round of estimation (rather than 10, iteratively more precise rounds). The issue is...
I need to patch within the next couple days on DataCamp/catsim, so can submit a PR to fix!
Solution
Code here:
https://github.com/douglasrizzo/catsim/blob/master/catsim/estimation.py#L138-L139
seems like the solution is that max_ll needs to be reset each round?
The text was updated successfully, but these errors were encountered: