Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add passes to eliminate dead output stores #4970

Merged
merged 1 commit into from
Nov 2, 2022

Conversation

greg-lunarg
Copy link
Contributor

This adds two passes to accomplish this: one pass to analyze a shader to determine the input slots that are live. The second pass is run on the preceding shader to eliminate any stores to output slots that are not consumed by the following shader.

These passes support vert, tesc, tese, geom, and frag shaders.

These passes are currently only available through the API.

These passes together with dead code elimination, and elimination of dead input and output components and variables (WIP), will allow users to do dead code elimination across shader boundaries.

@greg-lunarg
Copy link
Contributor Author

greg-lunarg commented Oct 21, 2022

This is the second attempt to add this capability, the first being #4909. I have started a new PR because the restructuring of the first PR was so extensive I thought it best to start anew. I will be closing #4909 shortly.

The core capability of #4909 has been retained. I believe I have addressed all of @s-perron 's concerns in that PR, including the restructuring which was requested. That restructuring was the movement of the analysis code into a true SPIRV-Tools analysis component, as well as splitting the analysis and dead code removal passes into two distinct SPIRV-Tools passes.

This PR is slightly more capable than #4909 as it also analyzes and removes dead stores of builtin output variables. But the general approach remains the same.

@greg-lunarg
Copy link
Contributor Author

greg-lunarg commented Oct 21, 2022

@dnovillo and other reviewers: I am adding this so that you understand the full context of these changes and changes to follow. This may duplicate information already given in this PR. This was adapted from similar text in #4909.

The optimization passes in this PR are part of a larger set of optimization passes which can be used to accomplish dead code elimination across shaders through analysis of loads and stores through input and output interface variables.

Possibly the most unique aspect of this optimization sequence is that it is performed on two shader modules instead of one. A downstream shader is analyzed for liveness and an upstream shader is optimized based on the liveness data. Since SPIRV-Tools only (currently) works on one module at a time, this optimization sequence will involve two invocations of SPIRV-Tools, with one analysis invocation creating live data to be used by the second optimizing invocation.

Following are descriptions of the different passes used in this optimization:

You may already be familiar with EliminateDeadInputComponents. This is mostly checked in already. It eliminates trailing components of input variables that are not used by a shader. Currently, EliminateDeadInputComponents can only be safely applied to a Vertex shader. If it is applied to other shaders, additional optimization will be needed on preceding shaders so that the interfaces match. A fix to EliminateDeadInputComponents is coming which will assure safe default behavior as well as allow it to be applied to other shaders as part of a larger optimization through a special option.

This PR adds two passes, AnalyzeLiveInput and EliminateDeadOutputStores. The analyze phase creates a vector of the input locations of a shader which are live. The elimination phase is called on the preceding shader and eliminates stores to output variables and output components whose locations are not in the vector of live locations created by the analyze phase on the subsequent shader.

I will then be adding a new phase called EliminateDeadOutputComponents, very similar to EliminateDeadInputComponents but for output variables. This can only be applied to a shader as part of the larger cross-shader optimization where EliminateDeadInputComponents was applied to the following shader.

Finally, I will be adding an option to EliminateDeadCodeAggressive which permits elimination of dead output variables through a special option. This can only be applied to a shader as part of the larger cross-shader optimization where EliminateDeadCodeAggressive was applied to the following shader.

Altogether, these passes will allow dead code elimination to propagate across shaders.

As currently implemented, these passes will only be available through the API. They will not be available through the command line. This is partly because the client is only interested in access through the API. If someone else wishes to come up with a mechanism for them to work through the command line, they are welcome to do so.

The user will have to exercise caution to not create incompatible interfaces between shaders
with these passes. These phases and options are not meant for stand-alone use like many previous passes. The user must be sure they have eliminated dead inputs in a shader before they eliminate the corresponding outputs in the preceding shader. Warnings will be added to these passes to this effect. These passes will not be added to any of the existing recipes.

These passes will work on vert, tese, tesc, geom and frag shaders only.

So, dead code elimination across vertex shader V and frag shader F can be accomplished by performing the following passes:

EliminateDeadCodeAggressive ( F )
EliminateDeadInputComponents ( F )
LiveInputSetsOfF = AnalyzeLiveInput ( F )
EliminateDeadOutputStores ( V, LiveInputSetsOfF )
EliminateDeadOutputComponents ( V )
EliminateDeadCodeAggressive ( V )
EliminateDeadInputComponents ( V )

Optimization of larger pipelines would be accomplished by extending this recipe logically.

@greg-lunarg greg-lunarg force-pushed the edos4 branch 8 times, most recently from 30e4c92 to af6e362 Compare October 21, 2022 22:15
@greg-lunarg greg-lunarg requested a review from dnovillo October 21, 2022 22:49
@greg-lunarg
Copy link
Contributor Author

@dnovillo I believe the smoketest failures are not related to this PR; several other SPIRV-Tools PRs are failing smoke test.

Could be related to glslang changing the generator number this last release. It is part of the version string and we had to update nearly all glslang tests.

@greg-lunarg
Copy link
Contributor Author

@dnovillo I believe this is ready for your review now.

@Keenuts Keenuts self-assigned this Oct 25, 2022
Copy link
Contributor

@Keenuts Keenuts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello!
Trying to jump in! There are a few questions here and there.
I believe I understand the overall principle, and seems fine, but don't quite follow why some cases are done or some other not. (See comments).

#ifndef SOURCE_OPT_LIVENESS_H_
#define SOURCE_OPT_LIVENESS_H_

#include <unordered_set>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might want to add #include <cstdint>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compilation error on my end:

$ cmake -H. -Bbuild -GNinja
$ ninja -C build
[...]
In file included from /usr/local/google/home/nathangauer/projects/SPIRV-Tools/source/opt/liveness.cpp:16:
/usr/local/google/home/nathangauer/projects/SPIRV-Tools/source/opt/liveness.h:37:39: error: ‘uint32_t’ was not declared in this scope
   37 |   void GetLiveness(std::unordered_set<uint32_t>* live_locs,
      |                                       ^~~~~~~~
/usr/local/google/home/nathangauer/projects/SPIRV-Tools/source/opt/liveness.h:20:1: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
   19 | #include <unordered_set>
  +++ |+#include <cstdint>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Comment on lines +23 to +28
const uint32_t kDecorationLocationInIdx = 2;
const uint32_t kOpDecorateMemberMemberInIdx = 1;
const uint32_t kOpDecorateBuiltInLiteralInIdx = 2;
const uint32_t kOpDecorateMemberBuiltInLiteralInIdx = 3;
const uint32_t kOpAccessChainIdx0InIdx = 1;
const uint32_t kOpConstantValueInIdx = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: constexpr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is possible, but I wonder if doing so would make people think these will be used in a larger expression when they never are. What would the benefit be?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK: const just says "during runtime, value doesn't change". Constexpr says "no matter what your toolchain is supposed to be, this is constant at build-time, no runtime initializer will be called to set it"

Aside from that, it doesn't bring drawbacks, and yes, I do believe it's a good point to enable people to use them in constexpr expressions 😀

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is consistent with similar usage throughout SPIRV_Tools. I think inconsistency is confusing and I would prefer not to inject it. I would also prefer not to make a such a global change in this PR. If you wish to change all of such uses in separate PR, you are welcome.

kill_list_.clear();
}

bool EliminateDeadOutputStoresPass::IsLiveBuiltin(uint32_t bi) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: replace bi with id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why. bi is a value of enumerated type SpvBuiltin_. It is not an id.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, bi stands for builtin?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

source/opt/eliminate_dead_output_stores_pass.cpp Outdated Show resolved Hide resolved
source/opt/eliminate_dead_output_stores_pass.cpp Outdated Show resolved Hide resolved
OpMemberDecorate %gl_PerVertex 0 BuiltIn Position
OpMemberDecorate %gl_PerVertex 1 BuiltIn PointSize
OpMemberDecorate %gl_PerVertex 2 BuiltIn ClipDistance
OpMemberDecorate %gl_PerVertex 3 BuiltIn CullDistance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the shader has an output struct with 4 builtins. The comment above only shows a write to the gl_Position builtin. Is the comment up to date?

Copy link
Contributor Author

@greg-lunarg greg-lunarg Oct 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The frontend generates all the members of gl_PerVertex whether they are stored to or not. I will be adding a pass that will eliminate these shortly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are you using to generate this? Used DXC to compile those and it doesn't generate this struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

glslangValidator from GLSL.

source/opt/eliminate_dead_output_stores_pass.cpp Outdated Show resolved Hide resolved
source/opt/liveness.cpp Outdated Show resolved Hide resolved
source/opt/liveness.cpp Show resolved Hide resolved
@Keenuts
Copy link
Contributor

Keenuts commented Oct 26, 2022

@dnovillo I believe the smoketest failures are not related to this PR; several other SPIRV-Tools PRs are failing smoke test.

Do you have access to the build/test output? (but yes, failure is due to glslang version, checking that out)

@greg-lunarg
Copy link
Contributor Author

greg-lunarg commented Oct 26, 2022

Do you have access to the build/test output? (but yes, failure is due to glslang version, checking that out)

Yes. It is big. If you really need it, let me know and we can figure out a way to get it to you.

@greg-lunarg greg-lunarg force-pushed the edos4 branch 2 times, most recently from ff1dae2 to c490015 Compare October 27, 2022 00:55
@greg-lunarg
Copy link
Contributor Author

@Keenuts I think I have addressed all your concerns so far. Please take another look.

@Keenuts
Copy link
Contributor

Keenuts commented Oct 27, 2022

If you really need it, let me know and we can figure out a way to get it to you.

No, I have it, was wondering if it was public 🙂 Will look into how to update it so we can have a passing CI

@Keenuts
Copy link
Contributor

Keenuts commented Oct 27, 2022

Thanks Greg!

Replies or resolved comments above.
So overall this seems OK, there are just a few points I am sad about, or not sure (@dnovillo for a more experiences opinion)

  1. location size computation
    The size of a location is determined by walking the type hierarchy. It does seem to work, and it's probably what the compiler does to generate the correct location indices in the first place. Just wondering how robust is that walk at our level. Example, we handle float32 and float64, but what about float8?
    Another way would be to check at what is the next location's offset, but that doesn't work for the last element, so maybe this solution is the only one.

  2. Analysis format
    I was bitten a bit by the manager pattern which can modify the code when doing a get, so not a fan of it. But it seems like this is the canonical way to have analysis.
    In this PR, there is a liveness manager, which can return live locs and live built-ins. But somehow, you don't use it like other managers. You added an additional pass (AnalyzeLiveInputPass), which calls the manager, and forward results to the next pass. By storing liveness data outside of the pass, you disassociate it from the analysis validity state in the ir_context, meaning you are now responsible to make sure it's still up-to-date.
    Is there a reason not to call this manager in EliminateDeadOutputPass?

  3. type walk
    At several places, you have this kind of AsArray() -> element_type, AsStruct() -> do something on the inner element
    Could this be refactored? Maybe a function taking a lambda and applying on the struct if the top-level type has one?

@greg-lunarg
Copy link
Contributor Author

Just wondering how robust is that walk at our level. Example, we handle float32 and float64, but what about float8?

Float8 is not supported for input/output in Vulkan. Float16 is treated the same as Float32. I have added logic to GetLocSize() and a test to handle Float16.

@greg-lunarg
Copy link
Contributor Author

Is there a reason not to call this manager in EliminateDeadOutputPass?

Yes. This is part of an optimization sequence that is unlike any other we have created in that it involves two shader modules: we analyze the downstream shader D and optimize the upstream shader U. But because SPIRV-Tools only handles one shader module at a time, this needs to be done over two invocations of SPIRV-Tools instead of one. We invoke SPIRV-Tools on shader D to get the analysis into the live sets, and then we invoke SPIRV-Tools again on shader U to do the optimization based on the live sets. That is why we must externalize the Liveness analysis into external sets, because the Liveness analysis does not survive across invocations of SPIRV-Tools. When we are doing EliminateDeadOutputPass on shader U, we do not have access to shader D to do the analysis.

Maybe at some point we will improve SPIRV_Tools to operate on multiple modules at a time. But that is more than I wanted to support today.

@greg-lunarg
Copy link
Contributor Author

At several places, you have this kind of AsArray() -> element_type, AsStruct() -> do something on the inner element
Could this be refactored? Maybe a function taking a lambda and applying on the struct if the top-level type has one?

I am not quite sure what you are asking for here. No doubt my code can be more elegant than it is today. If there is a section that is incorrect or too verbose or inefficient or complex for the normal developer to understand, I would be happy to try to fix it.

This adds two passes to accomplish this: one pass to analyze a shader
to determine the input slots that are live. The second pass is run on
the preceding shader to eliminate any stores to output slots that are
not consumed by the following shader.

These passes support vert, tesc, tese, geom, and frag shaders.

These passes are currently only available through the API.

These passes together with dead code elimination, and elimination of
dead input and output components and variables (WIP), will allow users
to do dead code elimination across shader boundaries.
@greg-lunarg
Copy link
Contributor Author

@Keenuts I believe I have responded as best I can to your latest concerns. Please take another look.

Copy link
Contributor

@Keenuts Keenuts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Greg, sorry for the delay, been OOO.
Thanks for the fixes!

@greg-lunarg greg-lunarg merged commit c8e1588 into KhronosGroup:master Nov 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants