Shared Memory Curve #846
As of 2a8f983, this is looking on track to resolve the register use in CUDA 11.3+. CUDA 11.6, SM_86,
I've set the circles benchmark going on a 3080 + Titan V in CUDA 11.2, so we can see the performance loss in that benchmark from using global curve only as an intermediate step (it will be a while before the runs complete).
This has resulted in an
I've pushed to the
Chucking this through Narrowed this down to Patching this resolves the Titan V memory issue, but just moves the problem to the invalid configuration error in
3d2e30f to 69a6a39
Have updated I feel to enforce this consistency further, I will need to apply the same change to
85d5e33 to 17a302e
As of fb6c322, I have built and run the test suite in all 3 main configs on Windows (Debug + Release with/without seatbelts), with USE_GLM enabled too. In each case all tests pass.
d7469d7 to c790a59
Final 2 commits (denoted PR's commits are structured such that it shouldn't be squashed.
CUDA 11.4 A100 circles benchmark runs going for the last 3 commits (plus an alt version of the smem commit). Should finish overnight. Will show:
A100 data for the various subcommits, with alpha.2 V100 data for reference. Shows it fixes the 11.3+ issue, and offers a good performance advantage. V100 data showing the < 11.3 performance relative to the alpha.2 base case. Global was a degradation for brute, but the smem 512 version is an improvement.
I believe include/flamegpu/util/StringUint32Pair.h can now be deleted, as you've removed all use of it (I think).
Otherwise looks good, but I've left a few comments on lines that stood out. Most of them don't need anything doing, but they might be nice to address if not too much faff, and for the @todo's, promoting them to issues might be worthwhile.
I went through relatively quickly though, so could have missed some bits.
Once done + a little history cleaning it'll be good to go IMO
@ptheywood Have applied your changes. You should in particular look at 7de1c16, as this was a substantial change due to your notes. I have then run the tests (Windows/Release) and all still pass. I will still need to rebase and clean up the commits if you're happy with all this.
Tests all pass under Linux (SEATBELTS=OFF) and profiling still works, so the device resetting is still solid.
The other changes all look good. I'll let you merge in case you want to tidy history (the individual commits did make it very easy to review the changes though, so that's appreciated)
The only real caveat with device resetting is that I've removed tracking of which devices So if someone has 2 independent Likewise, if they have a
Tests pass (Windows, Release, Seatbelts=ON, 1038 Pass, 5 Disabled) Note, Environment cache is still in constant memory.
Tests pass (Windows, Release, Seatbelts=ON, 1038 pass, 5 Disabled)
This involves some big changes to Curve and Environment Manager. Curve is now split into 3 classes Curve, DeviceCurve, HostCurve, refactored to remove features redundant to new use-case. EnvironmentManager has also been refactored to remove features redundant to new use-case. 1028 Tests pass, Debug, Windows, Seatbelts=ON 548 Python Tests pass, Release, Windows, Seatbelts=Off, 10 skip.
and update how cudaDeviceReset() is automatically triggered. Purge should no longer be required as device-wide singletons were removed in the previous commit.
…nal template parameter N. Passing 0 (by default) does no length checking, passing any other value is tested against the length for parity with device API agent variable methods.
Closes #560
Closes #571
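The optional template parameter N mentioned in the commit note above (0 skips the length check, any other value is compared against the stored length) could be sketched roughly as follows. This is a minimal illustration only: `ArrayStore`, `set` and `get` are hypothetical names invented for the example and are not the FLAMEGPU API, and throwing `std::length_error` is an assumed way of reporting the mismatch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a host-side store of array variables.
// Not the FLAMEGPU API; it only illustrates the N = 0 convention.
class ArrayStore {
 public:
    void set(const std::string &name, std::vector<int> values) {
        data_[name] = std::move(values);
    }

    // N == 0 (the default) performs no length checking.
    // Any other N must equal the stored array length, otherwise an
    // exception is thrown (the exception type is an assumption).
    template <std::size_t N = 0>
    int get(const std::string &name, std::size_t index) const {
        const std::vector<int> &values = data_.at(name);
        if (N != 0 && values.size() != N) {
            throw std::length_error("array length does not match template argument N");
        }
        return values.at(index);  // bounds-checked element access
    }

 private:
    std::unordered_map<std::string, std::vector<int>> data_;
};
```

With an array of length 3 stored under "x", `store.get("x", 1)` and `store.get<3>("x", 1)` both return the element, while `store.get<4>("x", 1)` throws because 4 does not match the stored length.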