Optional profiling #1137

Open · adrian-lison opened this issue Jan 16, 2023 · 5 comments

@adrian-lison commented Jan 16, 2023

Profiling is a great tool for debugging and performance optimization, but it also adds some overhead to the execution of the Stan program. It would therefore be nice to skip the profiling statements whenever you don't need them and want the model to run as fast as possible, without having to manually remove them from the Stan program or to keep two versions of the same program (one with and one without profiling statements).

Ideal solution
Ideally, there would be an additional profile = FALSE/TRUE argument to the sample function of a compiled Stan model that would be passed on to CmdStan.

Workaround
As a workaround, one can automatically create a temporary copy of the Stan program with all profiling statements removed, compile this temporary copy, and sample from this model without profiling. This requires no changes to CmdStan. The disadvantage, of course, is that it requires a second compilation of the Stan model.

I have implemented this workaround as a wrapper of cmdstan_model in this repo. We have been using it during development work on epinowcast, and it has been quite robust. It also supports included files. It would also be possible to use this workaround to implement a profile = FALSE/TRUE argument in the sample function.
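
For illustration, here is a minimal C++ sketch of the core stripping idea (the real wrapper linked above is an R function; everything here is illustrative, not the actual implementation). It relies on the fact that deleting only the profile("label") prefix leaves a bare { ... } block behind, which is still a valid Stan statement, so no brace matching is needed:

#include <fstream>
#include <iostream>
#include <regex>
#include <sstream>
#include <string>

// Remove profile("...") prefixes from Stan source code. The braces of
// each profiled section are kept: a bare { ... } is a valid Stan block
// statement, so no brace matching is required. (Sketch only: it would,
// for example, also rewrite a literal "profile(...)" inside a print().)
std::string strip_profiling(const std::string& stan_code) {
  static const std::regex profile_re{R"(profile\s*\(\s*"[^"]*"\s*\))"};
  return std::regex_replace(stan_code, profile_re, "");
}

int main(int argc, char** argv) {
  if (argc != 3) {
    std::cerr << "usage: strip_profiling <model.stan> <stripped.stan>\n";
    return 1;
  }
  std::ifstream in(argv[1]);
  std::stringstream buffer;
  buffer << in.rdbuf();  // read the whole program into memory
  std::ofstream out(argv[2]);
  out << strip_profiling(buffer.str());  // write the profiling-free copy
  return 0;
}

The stripped copy is then compiled and sampled from as usual, at the cost of the second compilation mentioned above.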

Context
Regarding the ideal solution of integrating this functionality with CmdStan, I lack the expertise to judge how difficult that would be and probably could not do it myself. If people think a workaround like the one above is acceptable, I'm happy to help implement it for cmdstanr, but would need some guidance (and a few details still need to be sorted out).

@andrjohns (Contributor) commented
As you've identified, this is a change that would have to be made in CmdStan (but I also see the benefit!). I'll transfer this issue to the cmdstan repo so that you can get feedback from the right people!

@andrjohns transferred this issue from stan-dev/cmdstanr on Jan 17, 2023
@bob-carpenter (Contributor) commented
Hi, @adrian-lison. Is there a performance hit in Stan runs that don't include profiling statements, and if so, how much? If it's beyond a couple percent, then I'd be in favor of only conditionally including anything to do with profiling.

@WardBrian (Member) commented
@bob-carpenter profiling is done with an object that uses its constructor and destructor to register some callbacks on the autodiff stack for timing, so it should be zero cost if you don't use it (minus the cost of creating an empty std::map at model class initialization).
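
For readers less familiar with the pattern, here is a simplified, self-contained C++ sketch of that RAII idea (illustrative names only, not the actual stan::math types): construction starts a timer, destruction stops it and records the elapsed time, so a model without profile blocks never executes any of this.

#include <chrono>
#include <iostream>
#include <map>
#include <string>

// Accumulated timings per profile label; the real implementation keeps
// a similar map on the model object (empty if profiling is unused).
std::map<std::string, double> profile_totals;

// RAII guard: construction starts the clock, destruction stops it and
// records the elapsed time, so a profile("x") { ... } block costs
// exactly one constructor/destructor pair per execution.
class profile_guard {
  std::string label_;
  std::chrono::steady_clock::time_point start_;
 public:
  explicit profile_guard(std::string label)
      : label_(std::move(label)), start_(std::chrono::steady_clock::now()) {}
  ~profile_guard() {
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start_;
    profile_totals[label_] += elapsed.count();
  }
};

int main() {
  for (int i = 0; i < 1000; ++i) {
    profile_guard g("predictor");  // enters scope: clock starts
    volatile double x = i * 2.0;   // stand-in for the profiled work
    (void)x;
  }                                // leaves scope: clock stops, map updated
  std::cout << "predictor: " << profile_totals["predictor"] << " s\n";
}

This also hints at why a profile block inside a tight loop is not free: every iteration pays for two timestamps plus a map update, and the real implementation additionally registers callbacks on the autodiff stack.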

@adrian-lison do you have an example model where profiling itself adds non-trivial overhead compared to the evaluation of the density itself? Glancing at the code, it seems like it should be very cheap, but I'm happy to be proven wrong and look into this feature request more.

@adrian-lison (Author) commented
Hey @WardBrian, thanks for your reply. I would agree that in many cases, the overhead is probably negligible compared to the density evaluation. It should only become an issue when lots of profiling callbacks are made. A very contrived example would be the following linear regression model:

data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
transformed parameters {
  vector[N] mu;
  for (i in 1:N) {
    profile("predictor") { // comment out for no profiling
      mu[i] = alpha + beta * x[i];
    }
  }
}
model {
  alpha ~ std_normal();
  beta ~ std_normal();
  sigma ~ std_normal(); // half-normal by constraint
  y ~ normal(mu, sigma);
}

With this model, I get an approx. 5-6x runtime increase per chain when profiling is enabled (tested with 1000, 2500, and 5000 data points). Obviously, this model implementation is very stupid. The question is whether there are many realistic scenarios in which people would be forced to use a lengthy loop and want to profile a specific subsection of that loop. That would be a relevant case where the profiling overhead starts to matter.

One potential example from my domain is infectious disease modeling, where we often require loops to model infection and ascertainment processes, e.g. here, and if the individual computations within the loop become more complicated, it might be interesting to profile them (we haven't specifically assessed profiling overhead there, however). I also wonder how profiling within map(-reduce) function calls would play out.

@WardBrian (Member) commented
I see two potential solutions to this:

  1. Wrap the body of the constructor of the profile class, and the entire destructor, in an #ifdef (a sketch follows at the end of this comment).
  2. Add a flag to the Stan compiler to skip emitting the profile objects entirely.

The first seems more in line with how we currently do things (e.g., STAN_NO_RANGE_CHECKS); however, this also necessitates some makefile hackery to get the file to re-build when you change the setting, and it does not always work. You can see this today with something like STAN_THREADS:

$ make clean 
$ make examples/bernoulli/bernoulli 
# builds
$ make STAN_THREADS=true examples/bernoulli/bernoulli 
# triggers a re-build
$ make examples/bernoulli/bernoulli
make: 'examples/bernoulli/bernoulli' is up to date.
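# no re-build: the flag change is not detected, so the binary is still the STAN_THREADS=true build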

A compiler option would skip the object creation entirely and would in theory avoid this re-compilation issue (which can be even more subtle than the above if make clean isn't run first), but in practice stanc is not re-run when the stancflags change.
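
To make option 1 concrete, here is a rough sketch of the #ifdef approach, reusing the illustrative guard class from the earlier comment (STAN_NO_PROFILING is a hypothetical flag name in the spirit of STAN_NO_RANGE_CHECKS, not an existing option):

#include <chrono>
#include <map>
#include <string>

std::map<std::string, double> profile_totals;

// Option 1 from above: the constructor body and the destructor body are
// compiled out when the (hypothetical) flag STAN_NO_PROFILING is defined,
// e.g. via CXXFLAGS += -DSTAN_NO_PROFILING.
class profile_guard {
  std::string label_;
  std::chrono::steady_clock::time_point start_;
 public:
  explicit profile_guard(std::string label) : label_(std::move(label)) {
#ifndef STAN_NO_PROFILING
    start_ = std::chrono::steady_clock::now();
    // the real class would also register its autodiff-stack callbacks here
#endif
  }
  ~profile_guard() {
#ifndef STAN_NO_PROFILING
    profile_totals[label_] += std::chrono::duration<double>(
        std::chrono::steady_clock::now() - start_).count();
#endif
  }
};

int main() {
  profile_guard g("predictor");  // an (almost) no-op shell when the flag is defined
  return 0;
}

The makefile caveat above applies directly: flipping -DSTAN_NO_PROFILING changes the behavior without touching any source file, so the model binary has to be forced to re-build.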
