-
Notifications
You must be signed in to change notification settings - Fork 569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add passes to eliminate dead output stores #4970
Conversation
This is the second attempt to add this capability, the first being #4909. I have started a new PR because the restructuring of the first PR was so extensive I thought it best to start anew. I will be closing #4909 shortly. The core capability of #4909 has been retained. I believe I have addressed all of @s-perron 's concerns in that PR, including the restructuring which was requested. That restructuring was the movement of the analysis code into a true SPIRV-Tools analysis component, as well as splitting the analysis and dead code removal passes into two distinct SPIRV-Tools passes. This PR is slightly more capable than #4909 as it also analyzes and removes dead stores of builtin output variables. But the general approach remains the same. |
778939f
to
ff70fd1
Compare
@dnovillo and other reviewers: I am adding this so that you understand the full context of these changes and changes to follow. This may duplicate information already given in this PR. This was adapted from similar text in #4909. The optimization passes in this PR are part of a larger set of optimization passes which can be used to accomplish dead code elimination across shaders through analysis of loads and stores through input and output interface variables. Possibly the most unique aspect of this optimization sequence is that it is performed on two shader modules instead of one. A downstream shader is analyzed for liveness and an upstream shader is optimized based on the liveness data. Since SPIRV-Tools only (currently) works on one module at a time, this optimization sequence will involve two invocations of SPIRV-Tools, with one analysis invocation creating live data to be used by the second optimizing invocation. Following are descriptions of the different passes used in this optimization: You may already be familiar with EliminateDeadInputComponents. This is mostly checked in already. It eliminates trailing components of input variables that are not used by a shader. Currently, EliminateDeadInputComponents can only be safely applied to a Vertex shader. If it is applied to other shaders, additional optimization will be needed on preceding shaders so that the interfaces match. A fix to EliminateDeadInputComponents is coming which will assure safe default behavior as well as allow it to be applied to other shaders as part of a larger optimization through a special option. This PR adds two passes, AnalyzeLiveInput and EliminateDeadOutputStores. The analyze phase creates a vector of the input locations of a shader which are live. The elimination phase is called on the preceding shader and eliminates stores to output variables and output components whose locations are not in the vector of live locations created by the analyze phase on the subsequent shader. I will then be adding a new phase called EliminateDeadOutputComponents, very similar to EliminateDeadInputComponents but for output variables. This can only be applied to a shader as part of the larger cross-shader optimization where EliminateDeadInputComponents was applied to the following shader. Finally, I will be adding an option to EliminateDeadCodeAggressive which permits elimination of dead output variables through a special option. This can only be applied to a shader as part of the larger cross-shader optimization where EliminateDeadCodeAggressive was applied to the following shader. Altogether, these passes will allow dead code elimination to propagate across shaders. As currently implemented, these passes will only be available through the API. They will not be available through the command line. This is partly because the client is only interested in access through the API. If someone else wishes to come up with a mechanism for them to work through the command line, they are welcome to do so. The user will have to exercise caution to not create incompatible interfaces between shaders These passes will work on vert, tese, tesc, geom and frag shaders only. So, dead code elimination across vertex shader V and frag shader F can be accomplished by performing the following passes: EliminateDeadCodeAggressive ( F ) Optimization of larger pipelines would be accomplished by extending this recipe logically. |
30e4c92
to
af6e362
Compare
@dnovillo I believe the smoketest failures are not related to this PR; several other SPIRV-Tools PRs are failing smoke test. Could be related to glslang changing the generator number this last release. It is part of the version string and we had to update nearly all glslang tests. |
@dnovillo I believe this is ready for your review now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello!
Trying to jump in! There are a few questions here and there.
I believe I understand the overall principle, and seems fine, but don't quite follow why some cases are done or some other not. (See comments).
#ifndef SOURCE_OPT_LIVENESS_H_ | ||
#define SOURCE_OPT_LIVENESS_H_ | ||
|
||
#include <unordered_set> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you might want to add #include <cstdint>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compilation error on my end:
$ cmake -H. -Bbuild -GNinja
$ ninja -C build
[...]
In file included from /usr/local/google/home/nathangauer/projects/SPIRV-Tools/source/opt/liveness.cpp:16:
/usr/local/google/home/nathangauer/projects/SPIRV-Tools/source/opt/liveness.h:37:39: error: ‘uint32_t’ was not declared in this scope
37 | void GetLiveness(std::unordered_set<uint32_t>* live_locs,
| ^~~~~~~~
/usr/local/google/home/nathangauer/projects/SPIRV-Tools/source/opt/liveness.h:20:1: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
19 | #include <unordered_set>
+++ |+#include <cstdint>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
const uint32_t kDecorationLocationInIdx = 2; | ||
const uint32_t kOpDecorateMemberMemberInIdx = 1; | ||
const uint32_t kOpDecorateBuiltInLiteralInIdx = 2; | ||
const uint32_t kOpDecorateMemberBuiltInLiteralInIdx = 3; | ||
const uint32_t kOpAccessChainIdx0InIdx = 1; | ||
const uint32_t kOpConstantValueInIdx = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: constexpr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is possible, but I wonder if doing so would make people think these will be used in a larger expression when they never are. What would the benefit be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK: const just says "during runtime, value doesn't change". Constexpr says "no matter what your toolchain is supposed to be, this is constant at build-time, no runtime initializer will be called to set it"
Aside from that, it doesn't bring drawbacks, and yes, I do believe it's a good point to enable people to use them in constexpr expressions 😀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is consistent with similar usage throughout SPIRV_Tools. I think inconsistency is confusing and I would prefer not to inject it. I would also prefer not to make a such a global change in this PR. If you wish to change all of such uses in separate PR, you are welcome.
kill_list_.clear(); | ||
} | ||
|
||
bool EliminateDeadOutputStoresPass::IsLiveBuiltin(uint32_t bi) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: replace bi with id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why. bi is a value of enumerated type SpvBuiltin_. It is not an id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, bi
stands for builtin
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep
OpMemberDecorate %gl_PerVertex 0 BuiltIn Position | ||
OpMemberDecorate %gl_PerVertex 1 BuiltIn PointSize | ||
OpMemberDecorate %gl_PerVertex 2 BuiltIn ClipDistance | ||
OpMemberDecorate %gl_PerVertex 3 BuiltIn CullDistance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the shader has an output struct with 4 builtins. The comment above only shows a write to the gl_Position builtin. Is the comment up to date?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The frontend generates all the members of gl_PerVertex whether they are stored to or not. I will be adding a pass that will eliminate these shortly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are you using to generate this? Used DXC to compile those and it doesn't generate this struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
glslangValidator from GLSL.
Do you have access to the build/test output? (but yes, failure is due to glslang version, checking that out) |
Yes. It is big. If you really need it, let me know and we can figure out a way to get it to you. |
ff1dae2
to
c490015
Compare
@Keenuts I think I have addressed all your concerns so far. Please take another look. |
No, I have it, was wondering if it was public 🙂 Will look into how to update it so we can have a passing CI |
Thanks Greg! Replies or resolved comments above.
|
Float8 is not supported for input/output in Vulkan. Float16 is treated the same as Float32. I have added logic to GetLocSize() and a test to handle Float16. |
Yes. This is part of an optimization sequence that is unlike any other we have created in that it involves two shader modules: we analyze the downstream shader D and optimize the upstream shader U. But because SPIRV-Tools only handles one shader module at a time, this needs to be done over two invocations of SPIRV-Tools instead of one. We invoke SPIRV-Tools on shader D to get the analysis into the live sets, and then we invoke SPIRV-Tools again on shader U to do the optimization based on the live sets. That is why we must externalize the Liveness analysis into external sets, because the Liveness analysis does not survive across invocations of SPIRV-Tools. When we are doing EliminateDeadOutputPass on shader U, we do not have access to shader D to do the analysis. Maybe at some point we will improve SPIRV_Tools to operate on multiple modules at a time. But that is more than I wanted to support today. |
I am not quite sure what you are asking for here. No doubt my code can be more elegant than it is today. If there is a section that is incorrect or too verbose or inefficient or complex for the normal developer to understand, I would be happy to try to fix it. |
2efa15b
to
9566ad0
Compare
This adds two passes to accomplish this: one pass to analyze a shader to determine the input slots that are live. The second pass is run on the preceding shader to eliminate any stores to output slots that are not consumed by the following shader. These passes support vert, tesc, tese, geom, and frag shaders. These passes are currently only available through the API. These passes together with dead code elimination, and elimination of dead input and output components and variables (WIP), will allow users to do dead code elimination across shader boundaries.
@Keenuts I believe I have responded as best I can to your latest concerns. Please take another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Greg, sorry for the delay, been OOO.
Thanks for the fixes!
This adds two passes to accomplish this: one pass to analyze a shader to determine the input slots that are live. The second pass is run on the preceding shader to eliminate any stores to output slots that are not consumed by the following shader.
These passes support vert, tesc, tese, geom, and frag shaders.
These passes are currently only available through the API.
These passes together with dead code elimination, and elimination of dead input and output components and variables (WIP), will allow users to do dead code elimination across shader boundaries.