Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce function size in fast & dfast #2863

Merged
merged 1 commit into from
Nov 16, 2021
Merged

Conversation

terrelln
Copy link
Contributor

@terrelln terrelln commented Nov 16, 2021

Take the same approach as in PR #2828 [0] to remove functions that force
inline many function bodies and switch. Instead, create one function per
"template" combination, and then switch between these functions. This
allows the compiler to break the large function into many small
functions, which generally helps codegen.

Also, in the extDict modes when there is no ext-dict, call the top
level function instead of the force inlined one, to save on code size.

I'm specifically doing this because gcc on the parisc architecture doesn't
handle the large function body well, and ends up using a lot of excess
stack space. Outlining these functions fixes it.

I've measured neutral performance on gcc & clang, with maybe a very
slight speed win on gcc. I've also measured a 3 KB binary size win
on gcc, and 12 KB binary size win on clang from calling the outlined
ZSTD_compressBlock_{doubleF,f}ast() in the extDict variant.

terrelln added a commit to terrelln/zstd that referenced this pull request Nov 16, 2021
Improve the codegen for zstd fast & double fast by shrinking the
function size, which also improves the stack usage on some compilers.
See PR facebook#2863 [0] for details.

[0] facebook#2863
@Cyan4973
Copy link
Contributor

Looks good to me!

Seems there is one last preprocessor issue to take care of...

Take the same approach as in PR facebook#2828 [0] to remove functions that force
inline many function bodies and `switch`. Instead, create one function per
"template" combination, and then switch between these functions. This
allows the compiler to break the large function into many small
functions, which generally helps codegen.

Also, in the `extDict` modes when there is no ext-dict, call the top
level function instead of the force inlined one, to save on code size.

I'm specifically doing this because gcc on the parisc architecture doesn't
handle the large function body well, and ends up using a lot of excess
stack space. Outlining these functions fixes it.
@terrelln terrelln merged commit 8dba88c into facebook:dev Nov 16, 2021
terrelln added a commit to terrelln/linux that referenced this pull request Nov 16, 2021
Backport of upstream PR #2863 [0].

Large functions with excessive force inlining can cause trouble for
compilers, and can sometimes take excess stack space because the
compiler isn't able to fully analyze the function. This commit splits
functions that have multiple copies of the same body into multiple
smaller functions, which can help the compiler.

This was specifically causing issues on the parisc architecture [1].
In this configuration, especially with UBSAN enabled, these functions
stack usage could get quite large. This is because the compiler was
doing a poor job handling the extremely large function which had
multiple copies of the function body inlined into it. After this commit
we see:

[0] facebook/zstd#2863
[1] https://lkml.org/lkml/2021/11/15/710

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Nick Terrell <terrelln@fb.com>
terrelln added a commit to terrelln/zstd that referenced this pull request Nov 16, 2021
Improve the codegen for zstd fast & double fast by shrinking the
function size, which also improves the stack usage on some compilers.
See PR facebook#2863 [0] for details.

[0] facebook#2863
buzzcut-s pushed a commit to buzzcut-s/kernel-x86-5.15 that referenced this pull request Nov 18, 2021
Backport of upstream PR #2863 [0].

Large functions with excessive force inlining can cause trouble for
compilers, and can sometimes take excess stack space because the
compiler isn't able to fully analyze the function. This commit splits
functions that have multiple copies of the same body into multiple
smaller functions, which can help the compiler.

This was specifically causing issues on the parisc architecture [1].
In this configuration, especially with UBSAN enabled, these functions
stack usage could get quite large. This is because the compiler was
doing a poor job handling the extremely large function which had
multiple copies of the function body inlined into it. After this commit
we see:

[0] facebook/zstd#2863
[1] https://lkml.org/lkml/2021/11/15/710

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Nick Terrell <terrelln@fb.com>
buzzcut-s pushed a commit to buzzcut-s/kernel-x86-5.17 that referenced this pull request Mar 26, 2022
Backport of upstream PR #2863 [0].

Large functions with excessive force inlining can cause trouble for
compilers, and can sometimes take excess stack space because the
compiler isn't able to fully analyze the function. This commit splits
functions that have multiple copies of the same body into multiple
smaller functions, which can help the compiler.

This was specifically causing issues on the parisc architecture [1].
In this configuration, especially with UBSAN enabled, these functions
stack usage could get quite large. This is because the compiler was
doing a poor job handling the extremely large function which had
multiple copies of the function body inlined into it. After this commit
we see:

[0] facebook/zstd#2863
[1] https://lkml.org/lkml/2021/11/15/710

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Nick Terrell <terrelln@fb.com>
ptr1337 pushed a commit to CachyOS/linux that referenced this pull request Jun 4, 2022
Backport of upstream PR #2863 [0].

Large functions with excessive force inlining can cause trouble for
compilers, and can sometimes take excess stack space because the
compiler isn't able to fully analyze the function. This commit splits
functions that have multiple copies of the same body into multiple
smaller functions, which can help the compiler.

This was specifically causing issues on the parisc architecture [1].
In this configuration, especially with UBSAN enabled, these functions
stack usage could get quite large. This is because the compiler was
doing a poor job handling the extremely large function which had
multiple copies of the function body inlined into it. After this commit
we see:

[0] facebook/zstd#2863
[1] https://lkml.org/lkml/2021/11/15/710

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Nick Terrell <terrelln@fb.com>
Dark-Matter7232 pushed a commit to Dark-Matter7232/kernel_x86_laptop that referenced this pull request Oct 2, 2022
Backport of upstream PR #2863 [0].

Large functions with excessive force inlining can cause trouble for
compilers, and can sometimes take excess stack space because the
compiler isn't able to fully analyze the function. This commit splits
functions that have multiple copies of the same body into multiple
smaller functions, which can help the compiler.

This was specifically causing issues on the parisc architecture [1].
In this configuration, especially with UBSAN enabled, these functions
stack usage could get quite large. This is because the compiler was
doing a poor job handling the extremely large function which had
multiple copies of the function body inlined into it. After this commit
we see:

[0] facebook/zstd#2863
[1] https://lkml.org/lkml/2021/11/15/710

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Nick Terrell <terrelln@fb.com>
Dark-Matter7232 pushed a commit to Dark-Matter7232/kernel_x86_laptop that referenced this pull request Oct 2, 2022
Backport of upstream PR #2863 [0].

Large functions with excessive force inlining can cause trouble for
compilers, and can sometimes take excess stack space because the
compiler isn't able to fully analyze the function. This commit splits
functions that have multiple copies of the same body into multiple
smaller functions, which can help the compiler.

This was specifically causing issues on the parisc architecture [1].
In this configuration, especially with UBSAN enabled, these functions
stack usage could get quite large. This is because the compiler was
doing a poor job handling the extremely large function which had
multiple copies of the function body inlined into it. After this commit
we see:

[0] facebook/zstd#2863
[1] https://lkml.org/lkml/2021/11/15/710

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Nick Terrell <terrelln@fb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants