-
-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding back in rvalue checks on 1d indexes #849
Conversation
Looks like @rok-cesnovar has a reproducible example here: #656 (comment) Let me start a couple cmdstan tarball builds and I'll compare:
|
Alright the timings are (for N = 10000 in the model above and using the same seed and verifying the last samples are the same):
And then:
So this change looks like less of a performance hit now, though it looks like everything overall is slower but that's a different issue. The binaries I'm using are: |
Less of performance hit? This PR is the slowest of all. Can we make the checks optional with some compiler or stanc3 flag? |
It's 0.3s slower instead of 0.6s slower, so in terms of performance hit this is less. As far as future performance stuff goes, there are probably a few options, but those things can come in the future. This segfault behavior is not good and we can fix it now easily. Any performance optimization stuff can stack onto this in the future (as more flags, patches to Math, etc. etc). |
I'm 50/50 on this. Why don't we wait for @rok-cesnovar 's cleanup of rvalue and assign before doing this? @wds15 I think we should just use the |
Does the rvalue cleanup fix this? I was looking at the generated code and it doesn't seem like it to me (for instance here). If it does we can wait, but otherwise better to just get this issue out of the way and then the focus can be on speeding things up and there's only one thing to worry about (rather than having performance and a segfault hanging over out heads -- just performance, which is sort of business as usual). If there's a change needed in the rvalue refactor pull, it will presumably be like one or two lines (there's only 5 lines of new code here and the rest is changed generated code). |
No, just cleans up thing a bit. I think the order in which we merge does not make a difference. I agree that the baseline should be no-segfault, then worry about performance. This same check was present in stanc2. I propose merging this and working on the no-debug stuff. @SteveBronder this would have to be a stancflag or do you propose avoiding the range check based on C++ compiler flags? I am fine with the C++ flag since its already "a thing" in other cases. I was debating this with @t4c1 at the time of the revert and I think the conclusion was there is no performant way of doing this for std::vector. |
Yeah something that comes in at the interface level could address this in the future. Like |
Cool cool we can add the NDEBUG thing |
Also fixes #731 |
Just a user comment, the no-debug flag to recover the performance is critical to our use cases as Stan users. We have profiled a ~15% slowdown due to these bounds checks in the older version (in models that otherwise are fully performance optimized). Having access to the "production" flag to turns these checks off is very important to us. |
Maybe open an issue for that? |
Release notes
This is a revert of #656 to put the fix in #521 back in to fixes #489, fixes #776, fixes #847, fixes #731.
The revert in #656 was cause there was a substantial performance hit (like 30%) of including this fix and we were in code freeze and the simplest thing was revert.
I think it would be good to put this fix back in even without an immediate fix to the performance thing. We can re-evaluate the performance thing (lots of stuff has changed between error checks and indexing) and fix that later, but at least then the segfaults are out of the way.
Copyright and Licensing
By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the BSD 3-clause license (https://opensource.org/licenses/BSD-3-Clause)