[DISCUSSION] -- Change defaults of draws and tune in pm.sample? #3854

Closed
AlexAndorra opened this issue Mar 25, 2020 · 21 comments

Comments

@AlexAndorra
Contributor

Hi guys!
From what I understand, tuning samples tend to matter more than draws during the development stage of a Bayesian workflow. This is not reflected in the current defaults of draws and tune in pm.sample -- both default to 500.

So, why don't we change these to draws=1000 and tune=2000? These are usually good defaults that @fonnesbeck often advises, and it would drive home the point that, at least in development, one should care more about tuning samples than draws.

If you think this would be a sensible change, I would be happy to make a PR.
Stay home and stay healthy,
PyMCheers ✌️

@fonnesbeck
Member

These seem like more sensible defaults for a wider range of models.

@ColCarroll
Member

+1

My one misgiving is that one of pymc3's strengths is "time to first sample", especially with smaller models, and this hurts that: it'll make everything take ~3x as long. But most small models take ~1s to fit, and there is a lot more we can do with 1k samples (and 2k warmup steps!)
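The "~3x" figure can be checked with a quick back-of-the-envelope sketch. It assumes that every iteration costs about the same whether it is a tuning step or a retained draw, which is a simplification, and it uses the 2-chain default mentioned later in the thread. The helper name `total_iterations` is made up for this illustration and is not part of PyMC3.

```python
def total_iterations(draws, tune, chains=2):
    """Total sampler iterations: each chain runs `tune` + `draws` steps."""
    return chains * (draws + tune)


# Current defaults discussed in this issue: draws=500, tune=500.
current = total_iterations(draws=500, tune=500)
# Proposed defaults from the opening post: draws=1000, tune=2000.
proposed = total_iterations(draws=1000, tune=2000)

# Relative cost under the equal-cost-per-iteration assumption.
ratio = proposed / current
print(f"current={current}, proposed={proposed}, ratio={ratio:.1f}x")
```

The ratio does not depend on the number of chains, since it cancels out.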

@ColCarroll
Member

As a corollary, I think we should use some of the work @eigenfoo and @dfm did for mass matrix adaptation, and at least use an expanding window to estimate the posterior variance. I have some experiments I was doing with 1000 tuning steps to figure out a good set of default parameters, and I will see if I can dig that up...

@junpenglao
Member

I think that's sensible, but we would probably need to set these parameters explicitly in lots of tests (otherwise they might time out).

@AlexAndorra
Contributor Author

Good to hear! I will start working on the PR right away. Changing the defaults seems quite easy. Playing with mass matrix adaptation and adapting tests will be harder but I'm happy to work on that under Colin's and Junpeng's supervision.

I agree with @ColCarroll that this increases "time to first sample", but:

  • Simple models won't even notice it, as you said, because they usually sample in a matter of seconds.
  • This could be a problem for bigger models, but I'd argue that it's precisely for these models that you need more draws and tuning steps (even more than 1000 and 2000).
  • Currently, I hardly ever see pm.sample called with the default parameters, which is probably an indication that they should evolve.

@twiecki
Member

twiecki commented Mar 27, 2020

Many models are totally fine with 500 tuning steps and give reasonable posterior estimates with 500 samples. Going from 500 to 2000 tuning steps seems like quite a dramatic shift and makes the biggest downside of Bayes (slowness) even worse. I could see an argument for going to 1000 tuning and 1000 samples.

@AlexAndorra
Contributor Author

  • I understand what you're saying, but even more models are totally fine with 2000 tuning steps and give better posterior estimates with 1000 samples. IMO changing the defaults would raise the threshold on the number of models you can run with just pm.sample(), which makes using PyMC easier -- on the Discourse I hardly ever see anyone using the defaults; but you often see pm.sample(10_000, tune=1000).
  • I think this change would have been dramatic 3-4 years ago, but modern samplers and computers seem so much faster now that you can sample most simple models in <3 seconds with parallel chains. Getting 12_000 samples (4*(1000+2000)) in under 3 seconds is, I'd say, an argument against Bayes being slow, not for it.

@twiecki
Member

twiecki commented Mar 27, 2020

@AlexAndorra No model I ever built samples that quickly. However, you do point to a valid argument:

Slow and complex models almost always require more tuning. Fast and simple models might be OK with 500 tuning steps, but in that case you don't really care if you're sampling 2000 instead because it's fast.

However, even for slow and complex models I rarely need more than 1000 tuning steps.

Would like to hear @aseyboldt's perspective too.

@junpenglao
Member

I am in favor of increasing tuning to 1000 and keeping samples at 500, or even tuning at 1500 and samples at 500.
My personal opinion is that a good default is short chains and more parallel chains (i.e., if you want more samples, increase the number of chains instead of the number of samples per chain).

@AlexAndorra
Contributor Author

AlexAndorra commented Mar 27, 2020

Ha ha yeah I agree @twiecki: none of my real models sample that quickly either. For those, I always have to use more tuning steps and more chains. But that's my point: defaults don't really matter for these models -- they do for simple models however, and you perfectly summed up my argument :)
@junpenglao's suggestion (500 draws / 1500 tune) sounds good to me -- achieves the same goals as my proposal but requires fewer samples 👌
Once we reach a consensus I'll be happy to update the PR.

@aseyboldt
Member

aseyboldt commented Mar 27, 2020

I always felt that both the tuning and draws defaults are a bit on the low side. One problem with only a few draws is that the quality control only really works well if the number of draws isn't too small. It's quite possible, for example, to miss divergences when you only have 500 samples and two chains. If you have too few tuning steps, the sampler should tell you that your model did not converge. If you have too few draws, the sampler might not tell you that, and we should care about that case much more.
I think a 1000/1000 default or maybe 500 tune / 1000 sample would be ok.
And maybe we should also default to at least 4 chains instead of the current 2.
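The point about missing divergences with few draws can be illustrated with a toy probability sketch. It assumes, purely for illustration, that each retained draw diverges independently with a fixed small probability. Real divergences tend to cluster in problematic regions of the posterior, so this is not how a sampler actually behaves. The rate of 0.001 and the helper `prob_no_divergence` are hypothetical.

```python
def prob_no_divergence(rate, draws, chains):
    """P(zero divergences observed), assuming each retained draw
    diverges independently with probability `rate` (toy assumption)."""
    return (1.0 - rate) ** (draws * chains)


rate = 0.001  # hypothetical: roughly 1 divergent transition per 1000 draws

# Compare the then-current default with two configurations from the thread.
for draws, chains in [(500, 2), (1000, 2), (1000, 4)]:
    p = prob_no_divergence(rate, draws, chains)
    print(f"draws={draws}, chains={chains}: P(no divergence seen) = {p:.3f}")
```

Under this toy model, the chance of a problem going completely unnoticed shrinks quickly as the total number of retained draws grows, which is the intuition behind caring more about too few draws than too few tuning steps.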

@AlexAndorra
Contributor Author

  • Interesting, thanks @aseyboldt! From the different comments, I think 1000 draws / 1000 tune or 1000 / 1500 would be appropriate defaults.
  • I have a preference for the second option because my assumption is that the overwhelming majority of beginners don't read / understand the sampler's warnings. So, we kind of have to be "libertarian paternalists" to nudge them -- and 1500 tune would preemptively solve more issues than 1000, without dragging down speed.
  • I agree on the 4 chains default.

@aseyboldt
Member

aseyboldt commented Mar 27, 2020 via email

@fonnesbeck
Member

I don't think this is a terribly important decision because I would never ever recommend that users (new or experienced) call sample without explicitly passing both of these arguments.
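A minimal sketch of what passing both arguments explicitly might look like. The toy model, the function name `fit_toy_model`, and the specific values in `SAMPLE_KWARGS` are illustrative assumptions taken from the numbers floated in this thread, not a recommendation. `draws`, `tune`, and `chains` are keyword arguments of `pm.sample` in PyMC3.

```python
# Illustrative values only; pick them per model rather than copying these.
SAMPLE_KWARGS = dict(draws=1000, tune=1000, chains=4)


def fit_toy_model(observed):
    """Fit a toy normal-mean model, passing draws and tune explicitly."""
    # Imported lazily so the snippet can be loaded without PyMC3 installed.
    import pymc3 as pm

    with pm.Model():
        # Weakly informative prior on the mean (hypothetical choice).
        mu = pm.Normal("mu", mu=0.0, sigma=10.0)
        pm.Normal("y", mu=mu, sigma=1.0, observed=observed)
        # Nothing is left to the library defaults under discussion.
        return pm.sample(**SAMPLE_KWARGS)
```

Calling `fit_toy_model([0.1, -0.3, 0.5])` would then run 4 chains of 1000 tuning steps and 1000 retained draws each, regardless of what the library defaults are.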

@AlexAndorra
Contributor Author

AlexAndorra commented Mar 27, 2020 via email

@twiecki
Member

twiecki commented Mar 28, 2020

@fonnesbeck But that's the inference button (TM) ;).

I think 1000 tuning and samples at 500 is reasonable. With 2 cores you get 1000 samples in total, and with 4 cores you get 2000, which "should be enough for everyone".
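The totals quoted above follow from multiplying draws per chain by the number of chains. The sketch below assumes one chain per core, as the comment implies. `retained_draws` is a hypothetical helper for this illustration, not a PyMC3 function.

```python
def retained_draws(draws, chains):
    """Posterior draws kept across all chains (tuning steps are discarded)."""
    return draws * chains


# One chain per core is assumed here.
for cores in (2, 4):
    total = retained_draws(draws=500, chains=cores)
    print(f"{cores} cores -> {total} retained draws")
```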

@ColCarroll
Member

After reading this, I'm changing my vote to be 1000 tune/1000 sample. I realized our warmup does ok after 500, and usually stops improving before 1000. (I'd like to change that, but we aren't there yet!)

I think spending more draws tuning than sampling feels too unintuitive to be a default.

@fonnesbeck
Member

If we really want it to be an inference button, then it should be dynamic, making the choice of tuning according to the complexity of the model and the amount of data. Having a fixed value gives the impression that there is a rule of thumb for the minimum number of draws, which of course there isn't.

@AlexAndorra
Contributor Author

Well that would indeed be awesome! Is that even possible, in theory? Are there papers laying out this possibility?

Regarding the defaults, we've entered territory where I'm no longer qualified to give my opinion. It seems like 500 draws / 1000 tune or 1000 / 1000 are the most popular. Once you reach a consensus I'll implement it.

And what about @aseyboldt's proposal to default to 4 chains instead of 2?

@twiecki
Member

twiecki commented Mar 29, 2020

Let's do 1000 tune / 1000 samples with 2 chains.

@AlexAndorra
Contributor Author

Noted, I'll amend the PR 👌

@twiecki twiecki closed this as completed Mar 30, 2020
6 participants