-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce function size in fast & dfast #2863
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
terrelln
added a commit
to terrelln/zstd
that referenced
this pull request
Nov 16, 2021
Improve the codegen for zstd fast & double fast by shrinking the function size, which also improves the stack usage on some compilers. See PR facebook#2863 [0] for details. [0] facebook#2863
Looks good to me! Seems there is one last preprocessor issue to take care of... |
terrelln
force-pushed
the
fast-dfast-size
branch
from
November 16, 2021 02:58
908951a
to
802ea88
Compare
Take the same approach as in PR facebook#2828 [0] to remove functions that force inline many function bodies and `switch`. Instead, create one function per "template" combination, and then switch between these functions. This allows the compiler to break the large function into many small functions, which generally helps codegen. Also, in the `extDict` modes when there is no ext-dict, call the top level function instead of the force inlined one, to save on code size. I'm specifically doing this because gcc on the parisc architecture doesn't handle the large function body well, and ends up using a lot of excess stack space. Outlining these functions fixes it.
Cyan4973
approved these changes
Nov 16, 2021
terrelln
added a commit
to terrelln/linux
that referenced
this pull request
Nov 16, 2021
Backport of upstream PR #2863 [0]. Large functions with excessive force inlining can cause trouble for compilers, and can sometimes take excess stack space because the compiler isn't able to fully analyze the function. This commit splits functions that have multiple copies of the same body into multiple smaller functions, which can help the compiler. This was specifically causing issues on the parisc architecture [1]. In this configuration, especially with UBSAN enabled, these functions stack usage could get quite large. This is because the compiler was doing a poor job handling the extremely large function which had multiple copies of the function body inlined into it. After this commit we see: [0] facebook/zstd#2863 [1] https://lkml.org/lkml/2021/11/15/710 Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Nick Terrell <terrelln@fb.com>
terrelln
added a commit
to terrelln/zstd
that referenced
this pull request
Nov 16, 2021
Improve the codegen for zstd fast & double fast by shrinking the function size, which also improves the stack usage on some compilers. See PR facebook#2863 [0] for details. [0] facebook#2863
buzzcut-s
pushed a commit
to buzzcut-s/kernel-x86-5.15
that referenced
this pull request
Nov 18, 2021
Backport of upstream PR #2863 [0]. Large functions with excessive force inlining can cause trouble for compilers, and can sometimes take excess stack space because the compiler isn't able to fully analyze the function. This commit splits functions that have multiple copies of the same body into multiple smaller functions, which can help the compiler. This was specifically causing issues on the parisc architecture [1]. In this configuration, especially with UBSAN enabled, these functions stack usage could get quite large. This is because the compiler was doing a poor job handling the extremely large function which had multiple copies of the function body inlined into it. After this commit we see: [0] facebook/zstd#2863 [1] https://lkml.org/lkml/2021/11/15/710 Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Nick Terrell <terrelln@fb.com>
buzzcut-s
pushed a commit
to buzzcut-s/kernel-x86-5.17
that referenced
this pull request
Mar 26, 2022
Backport of upstream PR #2863 [0]. Large functions with excessive force inlining can cause trouble for compilers, and can sometimes take excess stack space because the compiler isn't able to fully analyze the function. This commit splits functions that have multiple copies of the same body into multiple smaller functions, which can help the compiler. This was specifically causing issues on the parisc architecture [1]. In this configuration, especially with UBSAN enabled, these functions stack usage could get quite large. This is because the compiler was doing a poor job handling the extremely large function which had multiple copies of the function body inlined into it. After this commit we see: [0] facebook/zstd#2863 [1] https://lkml.org/lkml/2021/11/15/710 Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Nick Terrell <terrelln@fb.com>
ptr1337
pushed a commit
to CachyOS/linux
that referenced
this pull request
Jun 4, 2022
Backport of upstream PR #2863 [0]. Large functions with excessive force inlining can cause trouble for compilers, and can sometimes take excess stack space because the compiler isn't able to fully analyze the function. This commit splits functions that have multiple copies of the same body into multiple smaller functions, which can help the compiler. This was specifically causing issues on the parisc architecture [1]. In this configuration, especially with UBSAN enabled, these functions stack usage could get quite large. This is because the compiler was doing a poor job handling the extremely large function which had multiple copies of the function body inlined into it. After this commit we see: [0] facebook/zstd#2863 [1] https://lkml.org/lkml/2021/11/15/710 Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Nick Terrell <terrelln@fb.com>
Dark-Matter7232
pushed a commit
to Dark-Matter7232/kernel_x86_laptop
that referenced
this pull request
Oct 2, 2022
Backport of upstream PR #2863 [0]. Large functions with excessive force inlining can cause trouble for compilers, and can sometimes take excess stack space because the compiler isn't able to fully analyze the function. This commit splits functions that have multiple copies of the same body into multiple smaller functions, which can help the compiler. This was specifically causing issues on the parisc architecture [1]. In this configuration, especially with UBSAN enabled, these functions stack usage could get quite large. This is because the compiler was doing a poor job handling the extremely large function which had multiple copies of the function body inlined into it. After this commit we see: [0] facebook/zstd#2863 [1] https://lkml.org/lkml/2021/11/15/710 Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Nick Terrell <terrelln@fb.com>
Dark-Matter7232
pushed a commit
to Dark-Matter7232/kernel_x86_laptop
that referenced
this pull request
Oct 2, 2022
Backport of upstream PR #2863 [0]. Large functions with excessive force inlining can cause trouble for compilers, and can sometimes take excess stack space because the compiler isn't able to fully analyze the function. This commit splits functions that have multiple copies of the same body into multiple smaller functions, which can help the compiler. This was specifically causing issues on the parisc architecture [1]. In this configuration, especially with UBSAN enabled, these functions stack usage could get quite large. This is because the compiler was doing a poor job handling the extremely large function which had multiple copies of the function body inlined into it. After this commit we see: [0] facebook/zstd#2863 [1] https://lkml.org/lkml/2021/11/15/710 Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Nick Terrell <terrelln@fb.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Take the same approach as in PR #2828 [0] to remove functions that force
inline many function bodies and
switch
. Instead, create one function per"template" combination, and then switch between these functions. This
allows the compiler to break the large function into many small
functions, which generally helps codegen.
Also, in the
extDict
modes when there is no ext-dict, call the toplevel function instead of the force inlined one, to save on code size.
I'm specifically doing this because gcc on the parisc architecture doesn't
handle the large function body well, and ends up using a lot of excess
stack space. Outlining these functions fixes it.
I've measured neutral performance on gcc & clang, with maybe a very
slight speed win on gcc. I've also measured a 3 KB binary size win
on gcc, and 12 KB binary size win on clang from calling the outlined
ZSTD_compressBlock_{doubleF,f}ast()
in the extDict variant.