Add passes to eliminate dead output stores #4970

greg-lunarg · 2022-10-21T18:18:22Z

This adds two passes to accomplish this: one pass to analyze a shader to determine the input slots that are live. The second pass is run on the preceding shader to eliminate any stores to output slots that are not consumed by the following shader.

These passes support vert, tesc, tese, geom, and frag shaders.

These passes are currently only available through the API.

These passes together with dead code elimination, and elimination of dead input and output components and variables (WIP), will allow users to do dead code elimination across shader boundaries.

greg-lunarg · 2022-10-21T18:33:05Z

This is the second attempt to add this capability, the first being #4909. I have started a new PR because the restructuring of the first PR was so extensive I thought it best to start anew. I will be closing #4909 shortly.

The core capability of #4909 has been retained. I believe I have addressed all of @s-perron 's concerns in that PR, including the restructuring which was requested. That restructuring was the movement of the analysis code into a true SPIRV-Tools analysis component, as well as splitting the analysis and dead code removal passes into two distinct SPIRV-Tools passes.

This PR is slightly more capable than #4909 as it also analyzes and removes dead stores of builtin output variables. But the general approach remains the same.

greg-lunarg · 2022-10-21T19:59:16Z

@dnovillo and other reviewers: I am adding this so that you understand the full context of these changes and changes to follow. This may duplicate information already given in this PR. This was adapted from similar text in #4909.

The optimization passes in this PR are part of a larger set of optimization passes which can be used to accomplish dead code elimination across shaders through analysis of loads and stores through input and output interface variables.

Possibly the most unique aspect of this optimization sequence is that it is performed on two shader modules instead of one. A downstream shader is analyzed for liveness and an upstream shader is optimized based on the liveness data. Since SPIRV-Tools only (currently) works on one module at a time, this optimization sequence will involve two invocations of SPIRV-Tools, with one analysis invocation creating live data to be used by the second optimizing invocation.

Following are descriptions of the different passes used in this optimization:

You may already be familiar with EliminateDeadInputComponents. This is mostly checked in already. It eliminates trailing components of input variables that are not used by a shader. Currently, EliminateDeadInputComponents can only be safely applied to a Vertex shader. If it is applied to other shaders, additional optimization will be needed on preceding shaders so that the interfaces match. A fix to EliminateDeadInputComponents is coming which will assure safe default behavior as well as allow it to be applied to other shaders as part of a larger optimization through a special option.

This PR adds two passes, AnalyzeLiveInput and EliminateDeadOutputStores. The analyze phase creates a vector of the input locations of a shader which are live. The elimination phase is called on the preceding shader and eliminates stores to output variables and output components whose locations are not in the vector of live locations created by the analyze phase on the subsequent shader.

I will then be adding a new phase called EliminateDeadOutputComponents, very similar to EliminateDeadInputComponents but for output variables. This can only be applied to a shader as part of the larger cross-shader optimization where EliminateDeadInputComponents was applied to the following shader.

Finally, I will be adding an option to EliminateDeadCodeAggressive which permits elimination of dead output variables through a special option. This can only be applied to a shader as part of the larger cross-shader optimization where EliminateDeadCodeAggressive was applied to the following shader.

Altogether, these passes will allow dead code elimination to propagate across shaders.

As currently implemented, these passes will only be available through the API. They will not be available through the command line. This is partly because the client is only interested in access through the API. If someone else wishes to come up with a mechanism for them to work through the command line, they are welcome to do so.

The user will have to exercise caution to not create incompatible interfaces between shaders
with these passes. These phases and options are not meant for stand-alone use like many previous passes. The user must be sure they have eliminated dead inputs in a shader before they eliminate the corresponding outputs in the preceding shader. Warnings will be added to these passes to this effect. These passes will not be added to any of the existing recipes.

These passes will work on vert, tese, tesc, geom and frag shaders only.

So, dead code elimination across vertex shader V and frag shader F can be accomplished by performing the following passes:

EliminateDeadCodeAggressive ( F )
EliminateDeadInputComponents ( F )
LiveInputSetsOfF = AnalyzeLiveInput ( F )
EliminateDeadOutputStores ( V, LiveInputSetsOfF )
EliminateDeadOutputComponents ( V )
EliminateDeadCodeAggressive ( V )
EliminateDeadInputComponents ( V )

Optimization of larger pipelines would be accomplished by extending this recipe logically.

greg-lunarg · 2022-10-21T22:51:53Z

@dnovillo I believe the smoketest failures are not related to this PR; several other SPIRV-Tools PRs are failing smoke test.

Could be related to glslang changing the generator number this last release. It is part of the version string and we had to update nearly all glslang tests.

greg-lunarg · 2022-10-24T16:02:47Z

@dnovillo I believe this is ready for your review now.

Keenuts

Hello!
Trying to jump in! There are a few questions here and there.
I believe I understand the overall principle, and seems fine, but don't quite follow why some cases are done or some other not. (See comments).

Keenuts · 2022-10-25T15:47:49Z

source/opt/liveness.h

+#ifndef SOURCE_OPT_LIVENESS_H_
+#define SOURCE_OPT_LIVENESS_H_
+
+#include <unordered_set>


you might want to add #include <cstdint>

Curious why?

Compilation error on my end:

$ cmake -H. -Bbuild -GNinja $ ninja -C build [...] In file included from /usr/local/google/home/nathangauer/projects/SPIRV-Tools/source/opt/liveness.cpp:16: /usr/local/google/home/nathangauer/projects/SPIRV-Tools/source/opt/liveness.h:37:39: error: ‘uint32_t’ was not declared in this scope 37 | void GetLiveness(std::unordered_set<uint32_t>* live_locs, | ^~~~~~~~ /usr/local/google/home/nathangauer/projects/SPIRV-Tools/source/opt/liveness.h:20:1: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’? 19 | #include <unordered_set> +++ |+#include <cstdint>

Keenuts · 2022-10-25T16:00:02Z

source/opt/eliminate_dead_output_stores_pass.cpp

+const uint32_t kDecorationLocationInIdx = 2;
+const uint32_t kOpDecorateMemberMemberInIdx = 1;
+const uint32_t kOpDecorateBuiltInLiteralInIdx = 2;
+const uint32_t kOpDecorateMemberBuiltInLiteralInIdx = 3;
+const uint32_t kOpAccessChainIdx0InIdx = 1;
+const uint32_t kOpConstantValueInIdx = 0;


nit: constexpr

I guess this is possible, but I wonder if doing so would make people think these will be used in a larger expression when they never are. What would the benefit be?

AFAIK: const just says "during runtime, value doesn't change". Constexpr says "no matter what your toolchain is supposed to be, this is constant at build-time, no runtime initializer will be called to set it"

Aside from that, it doesn't bring drawbacks, and yes, I do believe it's a good point to enable people to use them in constexpr expressions 😀

This code is consistent with similar usage throughout SPIRV_Tools. I think inconsistency is confusing and I would prefer not to inject it. I would also prefer not to make a such a global change in this PR. If you wish to change all of such uses in separate PR, you are welcome.

Keenuts · 2022-10-25T16:05:49Z

source/opt/eliminate_dead_output_stores_pass.cpp

+  kill_list_.clear();
+}
+
+bool EliminateDeadOutputStoresPass::IsLiveBuiltin(uint32_t bi) {


nit: replace bi with id

Not sure why. bi is a value of enumerated type SpvBuiltin_. It is not an id.

oh, bi stands for builtin?

source/opt/eliminate_dead_output_stores_pass.cpp

Keenuts · 2022-10-26T15:11:51Z

test/opt/eliminate_dead_output_stores_test.cpp

+               OpMemberDecorate %gl_PerVertex 0 BuiltIn Position
+               OpMemberDecorate %gl_PerVertex 1 BuiltIn PointSize
+               OpMemberDecorate %gl_PerVertex 2 BuiltIn ClipDistance
+               OpMemberDecorate %gl_PerVertex 3 BuiltIn CullDistance


It seems like the shader has an output struct with 4 builtins. The comment above only shows a write to the gl_Position builtin. Is the comment up to date?

Yes. The frontend generates all the members of gl_PerVertex whether they are stored to or not. I will be adding a pass that will eliminate these shortly.

What are you using to generate this? Used DXC to compile those and it doesn't generate this struct.

glslangValidator from GLSL.

source/opt/eliminate_dead_output_stores_pass.cpp

source/opt/liveness.cpp

Keenuts · 2022-10-26T16:48:05Z

@dnovillo I believe the smoketest failures are not related to this PR; several other SPIRV-Tools PRs are failing smoke test.

Do you have access to the build/test output? (but yes, failure is due to glslang version, checking that out)

greg-lunarg · 2022-10-26T18:50:33Z

Do you have access to the build/test output? (but yes, failure is due to glslang version, checking that out)

Yes. It is big. If you really need it, let me know and we can figure out a way to get it to you.

greg-lunarg · 2022-10-27T00:56:34Z

@Keenuts I think I have addressed all your concerns so far. Please take another look.

Keenuts · 2022-10-27T09:20:33Z

If you really need it, let me know and we can figure out a way to get it to you.

No, I have it, was wondering if it was public 🙂 Will look into how to update it so we can have a passing CI

Keenuts · 2022-10-27T10:25:47Z

Thanks Greg!

Replies or resolved comments above.
So overall this seems OK, there are just a few points I am sad about, or not sure (@dnovillo for a more experiences opinion)

location size computation
The size of a location is determined by walking the type hierarchy. It does seem to work, and it's probably what the compiler does to generate the correct location indices in the first place. Just wondering how robust is that walk at our level. Example, we handle float32 and float64, but what about float8?
Another way would be to check at what is the next location's offset, but that doesn't work for the last element, so maybe this solution is the only one.
Analysis format
I was bitten a bit by the manager pattern which can modify the code when doing a get, so not a fan of it. But it seems like this is the canonical way to have analysis.
In this PR, there is a liveness manager, which can return live locs and live built-ins. But somehow, you don't use it like other managers. You added an additional pass (AnalyzeLiveInputPass), which calls the manager, and forward results to the next pass. By storing liveness data outside of the pass, you disassociate it from the analysis validity state in the ir_context, meaning you are now responsible to make sure it's still up-to-date.
Is there a reason not to call this manager in EliminateDeadOutputPass?
type walk
At several places, you have this kind of AsArray() -> element_type, AsStruct() -> do something on the inner element
Could this be refactored? Maybe a function taking a lambda and applying on the struct if the top-level type has one?

greg-lunarg · 2022-10-27T17:01:05Z

Just wondering how robust is that walk at our level. Example, we handle float32 and float64, but what about float8?

Float8 is not supported for input/output in Vulkan. Float16 is treated the same as Float32. I have added logic to GetLocSize() and a test to handle Float16.

greg-lunarg · 2022-10-27T18:31:21Z

Is there a reason not to call this manager in EliminateDeadOutputPass?

Yes. This is part of an optimization sequence that is unlike any other we have created in that it involves two shader modules: we analyze the downstream shader D and optimize the upstream shader U. But because SPIRV-Tools only handles one shader module at a time, this needs to be done over two invocations of SPIRV-Tools instead of one. We invoke SPIRV-Tools on shader D to get the analysis into the live sets, and then we invoke SPIRV-Tools again on shader U to do the optimization based on the live sets. That is why we must externalize the Liveness analysis into external sets, because the Liveness analysis does not survive across invocations of SPIRV-Tools. When we are doing EliminateDeadOutputPass on shader U, we do not have access to shader D to do the analysis.

Maybe at some point we will improve SPIRV_Tools to operate on multiple modules at a time. But that is more than I wanted to support today.

greg-lunarg · 2022-10-27T18:36:04Z

At several places, you have this kind of AsArray() -> element_type, AsStruct() -> do something on the inner element
Could this be refactored? Maybe a function taking a lambda and applying on the struct if the top-level type has one?

I am not quite sure what you are asking for here. No doubt my code can be more elegant than it is today. If there is a section that is incorrect or too verbose or inefficient or complex for the normal developer to understand, I would be happy to try to fix it.

This adds two passes to accomplish this: one pass to analyze a shader to determine the input slots that are live. The second pass is run on the preceding shader to eliminate any stores to output slots that are not consumed by the following shader. These passes support vert, tesc, tese, geom, and frag shaders. These passes are currently only available through the API. These passes together with dead code elimination, and elimination of dead input and output components and variables (WIP), will allow users to do dead code elimination across shader boundaries.

greg-lunarg · 2022-10-31T16:36:16Z

@Keenuts I believe I have responded as best I can to your latest concerns. Please take another look.

Keenuts

Hi Greg, sorry for the delay, been OOO.
Thanks for the fixes!

greg-lunarg force-pushed the edos4 branch from 5b85fd0 to 1b6f693 Compare October 21, 2022 18:22

greg-lunarg force-pushed the edos4 branch 2 times, most recently from 778939f to ff70fd1 Compare October 21, 2022 19:46

greg-lunarg mentioned this pull request Oct 21, 2022

Add passes to eliminate dead output stores #4909

Closed

greg-lunarg force-pushed the edos4 branch 8 times, most recently from 30e4c92 to af6e362 Compare October 21, 2022 22:15

greg-lunarg requested a review from dnovillo October 21, 2022 22:49

Keenuts self-assigned this Oct 25, 2022

Keenuts reviewed Oct 26, 2022

View reviewed changes

greg-lunarg force-pushed the edos4 branch 2 times, most recently from ff1dae2 to c490015 Compare October 27, 2022 00:55

greg-lunarg force-pushed the edos4 branch from c490015 to a3875ea Compare October 27, 2022 16:55

greg-lunarg force-pushed the edos4 branch 4 times, most recently from 2efa15b to 9566ad0 Compare October 28, 2022 00:24

Keenuts mentioned this pull request Oct 28, 2022

tests: update tests' glslangvalidator version google/shaderc#1275

Merged

greg-lunarg force-pushed the edos4 branch from 9566ad0 to 2c93c1d Compare October 28, 2022 19:40

greg-lunarg force-pushed the edos4 branch from 2c93c1d to 8f9fee1 Compare October 28, 2022 23:12

Keenuts approved these changes Nov 2, 2022

View reviewed changes

greg-lunarg merged commit c8e1588 into KhronosGroup:master Nov 2, 2022

s-perron mentioned this pull request Jun 16, 2023

Support pipeline level optimization #2016

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add passes to eliminate dead output stores #4970

Add passes to eliminate dead output stores #4970

greg-lunarg commented Oct 21, 2022

greg-lunarg commented Oct 21, 2022 •

edited

Loading

greg-lunarg commented Oct 21, 2022 •

edited

Loading

greg-lunarg commented Oct 21, 2022

greg-lunarg commented Oct 24, 2022

Keenuts left a comment

Keenuts Oct 25, 2022

greg-lunarg Oct 26, 2022

Keenuts Oct 27, 2022

greg-lunarg Oct 27, 2022

Keenuts Oct 25, 2022

greg-lunarg Oct 26, 2022

Keenuts Oct 27, 2022

greg-lunarg Oct 27, 2022

Keenuts Oct 25, 2022

greg-lunarg Oct 26, 2022

Keenuts Oct 27, 2022

greg-lunarg Oct 27, 2022

Keenuts Oct 26, 2022

greg-lunarg Oct 26, 2022 •

edited

Loading

Keenuts Oct 27, 2022

greg-lunarg Oct 27, 2022

Keenuts commented Oct 26, 2022 •

edited

Loading

greg-lunarg commented Oct 26, 2022 •

edited

Loading

greg-lunarg commented Oct 27, 2022

Keenuts commented Oct 27, 2022

Keenuts commented Oct 27, 2022

greg-lunarg commented Oct 27, 2022

greg-lunarg commented Oct 27, 2022

greg-lunarg commented Oct 27, 2022

greg-lunarg commented Oct 31, 2022

Keenuts left a comment

Add passes to eliminate dead output stores #4970

Add passes to eliminate dead output stores #4970

Conversation

greg-lunarg commented Oct 21, 2022

greg-lunarg commented Oct 21, 2022 • edited Loading

greg-lunarg commented Oct 21, 2022 • edited Loading

greg-lunarg commented Oct 21, 2022

greg-lunarg commented Oct 24, 2022

Keenuts left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

greg-lunarg Oct 26, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Keenuts commented Oct 26, 2022 • edited Loading

greg-lunarg commented Oct 26, 2022 • edited Loading

greg-lunarg commented Oct 27, 2022

Keenuts commented Oct 27, 2022

Keenuts commented Oct 27, 2022

greg-lunarg commented Oct 27, 2022

greg-lunarg commented Oct 27, 2022

greg-lunarg commented Oct 27, 2022

greg-lunarg commented Oct 31, 2022

Keenuts left a comment

Choose a reason for hiding this comment

greg-lunarg commented Oct 21, 2022 •

edited

Loading

greg-lunarg commented Oct 21, 2022 •

edited

Loading

greg-lunarg Oct 26, 2022 •

edited

Loading

Keenuts commented Oct 26, 2022 •

edited

Loading

greg-lunarg commented Oct 26, 2022 •

edited

Loading