Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

file_refit = "on_change" will work weirdly when renaming factor levels #1128

Closed
martinmodrak opened this issue Mar 22, 2021 · 7 comments
Closed
Labels
Milestone

Comments

@martinmodrak
Copy link
Contributor

This is a minor issue I realized about the current implementation of file_refit = "on_change", not sure what to do about it, but posting here for discussion/future reference. If the only change I do to my data is to rename factor levels (while keeping the order the same),the data as passed to Stan will not change, neither will the model code change. So this means that an old fit will be loaded. However, this old fit could break subsequent code as the names of the parameters exposed by the fit will have changed.

@paul-buerkner
Copy link
Owner

I see. Could it make sense to also compare the processed data which each other, that is, fit$data with the new output of validate_data?

@paul-buerkner paul-buerkner added this to the brms 2.15.0++ milestone Mar 24, 2021
@paul-buerkner
Copy link
Owner

Currently, we check the Stan data via all.equal(sdata, cached_sdata, check.attributes = FALSE). I think setting check.attributes = TRUE should fix your problem. Is there, in your opinion, a downside of also checking attributes?

@hsbadr
Copy link
Contributor

hsbadr commented Jun 13, 2021

Is there, in your opinion, a downside of also checking attributes?

Depending on the data type and how it's created (and even the environment), checking the attributes can unnecessarily trigger a refit. For example, reading a data from CSV file as a data frame, data table, or tibble could change the attributes; same for how the factors are created (if using a different function that store extra attributes). Probably it's safer to add checks for the factor variables (levels and reference level).

@paul-buerkner
Copy link
Owner

Good points, thank you!

@martinmodrak
Copy link
Contributor Author

@hsbadr is right. Actually, I did add the check.attributes = FALSE right after I saw some unnecessary refits, I don't remember exactly what was the cause. I currently think the problem is that maybe we are caching at the wrong level of abstraction (the underlying stan fit is the same, but we store all sorts of stuff that could change). But we probably don't want to change the format how we store the fits... Maybe the correct solution is to have backup the original parameter names when doing rename_pars, so that one could load the fit, map the names back using the stored version and then map them again....

@hsbadr
Copy link
Contributor

hsbadr commented Jun 18, 2021

Take a look at waldo::compare(), which is inspired by all.equal() and generate actionable insights by:

  • Ordering the differences from most important to least important.
  • Displaying the values of atomic vectors that are actually different.
  • Carefully using colour to emphasise changes (while still being readable when colour isn’t available).
  • Using R code (not a text description) to show where differences arise.
  • Where possible, comparing elements by name, rather than by position.
  • Erring on the side of producing too much output, rather than too little.

@paul-buerkner
Copy link
Owner

Factor levels should now be checked as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants