extern/sector-storage: fix GPU usage overwrite bug #4627
Merged
GPU tracking is currently a bit broken on master because each subsequently scheduled task blindly overwrites the GPU-in-use flag, irrespective of its current value. E.g.:
The second C2 should not have been allowed onto the same GPU, but since PC1 blindly overwrote the flag, a second C2 became schedulable. Repeat until the GPU is overloaded and starts going OOM (PC2 cannot recover from it).
The fix is that tasks should only set the flag if they themselves requested the GPU (and were granted it). If they didn't request the GPU, they should most definitely not mark the GPU as unused.
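To make the difference concrete, here is a minimal, self-contained sketch of the overwrite bug next to the guarded update. It is not the actual scheduler code: the `task` and `worker` types, the `needGPU` and `gpuUsed` fields, and all method names are hypothetical stand-ins, and it assumes one GPU per worker, with PC1 not requesting the GPU while C2 does.

```go
package main

import "fmt"

// task is a hypothetical stand-in for a scheduled job's resource request.
type task struct {
	name    string
	needGPU bool
}

// worker tracks whether its single (assumed) GPU is currently claimed.
type worker struct {
	gpuUsed bool
}

// canRun reports whether the task may be scheduled given the GPU flag.
func (w *worker) canRun(t task) bool {
	return !(t.needGPU && w.gpuUsed)
}

// addBuggy mirrors the described bug: every task overwrites the flag,
// so a non-GPU task silently marks the GPU as free.
func (w *worker) addBuggy(t task) {
	w.gpuUsed = t.needGPU
}

// addFixed sets the flag only when the task itself requested the GPU.
func (w *worker) addFixed(t task) {
	if t.needGPU {
		w.gpuUsed = true
	}
}

// freeFixed clears the flag only when the finishing task held the GPU.
func (w *worker) freeFixed(t task) {
	if t.needGPU {
		w.gpuUsed = false
	}
}

func main() {
	c2 := task{name: "C2", needGPU: true}
	pc1 := task{name: "PC1", needGPU: false}

	buggy := &worker{}
	buggy.addBuggy(c2)  // C2 claims the GPU
	buggy.addBuggy(pc1) // PC1 overwrites the flag with false
	fmt.Println("buggy: second C2 schedulable:", buggy.canRun(c2))

	fixed := &worker{}
	fixed.addFixed(c2)  // C2 claims the GPU
	fixed.addFixed(pc1) // PC1 leaves the flag untouched
	fmt.Println("fixed: second C2 schedulable:", fixed.canRun(c2))
}
```

In this sketch the buggy path reports the second C2 as schedulable, while the guarded path refuses it until the first C2 releases the GPU through `freeFixed`.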
Funky caveat: On my system at least, overlapping GPU allocation results in better resource use because I can run multiple PC2 or PC2+C2 on the same GPU. This fix will actually prevent that, but IMHO it's nonetheless necessary because a better GPU scheduler can't be written as long as tasks can exceed the available video RAM and crash.