Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

basis_universal: Unbundle jpgd to fix symbol conflict, use our newer copy with SSE2 support #88508

Merged
merged 1 commit into from
Feb 20, 2024

Conversation

akien-mga
Copy link
Member

Should fix the basis_universal LTO warning from #88504:

thirdparty/basis_universal/encoder/jpgd.h:123:15: warning: type 'struct jpeg_decoder' violates the C++ One Definition Rule [-Wodr]
  123 |         class jpeg_decoder
      |               ^
thirdparty/jpeg-compressor/jpgd.h:124: note: a different type is defined in another translation unit
  124 |         class jpeg_decoder
      | 
thirdparty/jpeg-compressor/jpgd.h:276: note: the first difference of corresponding definitions is field 'm_has_sse2'
  276 |                 bool m_has_sse2;
      | 
thirdparty/jpeg-compressor/jpgd.h:124: note: a type with different number of fields is defined in another translation unit
  124 |         class jpeg_decoder
      |

Our thirdparty/jpeg-compressor copy is the latest from https://github.com/richgel999/jpeg-compressor, which has some additional commits not present in the basisu version. I'll see if Rich is interested in updating this dep upstream too.

CC @BlueCube3310

@akien-mga akien-mga added enhancement topic:thirdparty cherrypick:4.2 Considered for cherry-picking into a future 4.2.x release labels Feb 18, 2024
@akien-mga akien-mga added this to the 4.3 milestone Feb 18, 2024
@akien-mga akien-mga requested review from a team as code owners February 18, 2024 17:52
@akien-mga akien-mga force-pushed the basisu_unbundled_jpgd branch from 75ba36d to 48ed047 Compare February 18, 2024 17:55
Copy link
Contributor

@BlueCube3310 BlueCube3310 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I don't think Godot even uses BasisU's file loading feature, so nothing should break.

@Calinou
Copy link
Member

Calinou commented Feb 18, 2024

This seems to make the binary slightly larger for some reason (Linux x86_64 editor build with optimize=speed lto=full and stripped symbols):

107,091,040  godot.linuxbsd.editor.x86_64  <-- This PR
107,086,912  godot.linuxbsd.editor.x86_64.master

I'd have expected the binary to be slightly smaller instead.

@Riteo
Copy link
Contributor

Riteo commented Feb 18, 2024

@Calinou it's probably because this version has more changes and thus has probably some new code additions which translate into the binary size.

@akien-mga
Copy link
Member Author

akien-mga commented Feb 18, 2024

That's pretty weird, but I can reproduce the size difference too with scons production=yes builds on Fedora 39:

$ ls -l bin/godot.linuxbsd.editor.x86_64*
-rwxr-xr-x. 1 akien akien 100693632 Feb 18 22:10 bin/godot.linuxbsd.editor.x86_64.master
-rwxr-xr-x. 1 akien akien 100701824 Feb 18 22:11 bin/godot.linuxbsd.editor.x86_64.pr88508

It's a bit counterintuitive...

@akien-mga
Copy link
Member Author

So I think the size difference can be explained by the SSE2 support code present in thirdparty/jpeg-compressor which didn't exist in the basisu copy.

I made a test build of this PR + the SSE2 support disabled, and the size exactly the same as master:

-rwxr-xr-x. 1 akien akien 100693632 Feb 18 22:19 bin/godot.linuxbsd.editor.x86_64.master
-rwxr-xr-x. 1 akien akien 100701824 Feb 18 22:11 bin/godot.linuxbsd.editor.x86_64.pr88508
-rwxr-xr-x. 1 akien akien 100693632 Feb 18 22:25 bin/godot.linuxbsd.editor.x86_64.pr88508+no_sse2

So this means this LTO warning wasn't benign, but actually a sign that the basisu copy of jpgd was taking precedence over the copy in jpeg-compressor, and thus we were losing the (enabled by default) SSE2 support.

So this should fix a bug for regular JPEG decompression, which may become faster? The size increase comes from the added SSE2 code.

@akien-mga akien-mga added bug and removed enhancement labels Feb 18, 2024
@akien-mga akien-mga changed the title basis_universal: Unbundle jpgd, use our newer copy basis_universal: Unbundle jpgd, use our newer copy with SSE2 support Feb 18, 2024
@akien-mga akien-mga changed the title basis_universal: Unbundle jpgd, use our newer copy with SSE2 support basis_universal: Unbundle jpgd to fix symbol conflict, use our newer copy with SSE2 support Feb 18, 2024
@akien-mga
Copy link
Member Author

Ran a test import 500 JPGs with master, this PR, and this PR + JPG_USE_SSE2=0 which should be equivalent to master...

Running on a Linux laptop with balanced CPU profile, on AC. The results aren't exactly clear.

$ hyperfine -iw3 -p "rm -rf .godot" "~/Godot/godot/bin/godot.linuxbsd.editor.x86_64.master -e" "~/Godot/godot/bin/godot.linuxbsd.editor.x86_64.pr88508 -e" "~/Godot/godot/bin/godot.linuxbsd.editor.x86_64.pr88508+no_sse2 -e"
Benchmark 1: ~/Godot/godot/bin/godot.linuxbsd.editor.x86_64.master -e
  Time (mean ± σ):     20.479 s ±  0.232 s    [User: 175.239 s, System: 8.655 s]
  Range (min … max):   20.079 s … 20.864 s    10 runs
 
Benchmark 2: ~/Godot/godot/bin/godot.linuxbsd.editor.x86_64.pr88508 -e
  Time (mean ± σ):     19.680 s ±  0.384 s    [User: 167.595 s, System: 8.489 s]
  Range (min … max):   18.658 s … 20.009 s    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs.
 
Benchmark 3: ~/Godot/godot/bin/godot.linuxbsd.editor.x86_64.pr88508+no_sse2 -e
  Time (mean ± σ):     19.678 s ±  0.203 s    [User: 168.102 s, System: 8.064 s]
  Range (min … max):   19.456 s … 20.125 s    10 runs

Summary
  ~/Godot/godot/bin/godot.linuxbsd.editor.x86_64.pr88508+no_sse2 -e ran
    1.00 ± 0.02 times faster than ~/Godot/godot/bin/godot.linuxbsd.editor.x86_64.pr88508 -e
    1.04 ± 0.02 times faster than ~/Godot/godot/bin/godot.linuxbsd.editor.x86_64.master -e

Same benchmark with performance CPU profile:

$ hyperfine -iw2 -r5 -p "rm -rf .godot" "~/Godot/godot/bin/godot.linuxbsd.editor.x86_64.master -e" "~/Godot/godot/bin/godot.linuxbsd.editor.x86_64.pr88508 -e" "~/Godot/godot/bin/godot.linuxbsd.editor.x86_64.pr88508+no_sse2 -e"
Benchmark 1: ~/Godot/godot/bin/godot.linuxbsd.editor.x86_64.master -e
  Time (mean ± σ):     16.587 s ±  0.103 s    [User: 143.993 s, System: 6.877 s]
  Range (min … max):   16.523 s … 16.766 s    5 runs
 
Benchmark 2: ~/Godot/godot/bin/godot.linuxbsd.editor.x86_64.pr88508 -e
  Time (mean ± σ):     16.158 s ±  0.145 s    [User: 138.665 s, System: 6.785 s]
  Range (min … max):   15.975 s … 16.380 s    5 runs
 
Benchmark 3: ~/Godot/godot/bin/godot.linuxbsd.editor.x86_64.pr88508+no_sse2 -e
  Time (mean ± σ):     16.084 s ±  0.333 s    [User: 138.148 s, System: 7.016 s]
  Range (min … max):   15.512 s … 16.323 s    5 runs
 
Summary
  ~/Godot/godot/bin/godot.linuxbsd.editor.x86_64.pr88508+no_sse2 -e ran
    1.00 ± 0.02 times faster than ~/Godot/godot/bin/godot.linuxbsd.editor.x86_64.pr88508 -e
    1.03 ± 0.02 times faster than ~/Godot/godot/bin/godot.linuxbsd.editor.x86_64.master -e

So there seems to be a significant JPG import performance gain between this PR (with or without SSE2) and master, though I can't really explain it. The difference between pr88508 and pr88508+no_sse2 doesn't seem meaningful, but then it seems to mean the SSE2 code doesn't bring significant performance improvements for me (on x86_64).

@akien-mga akien-mga merged commit b167113 into godotengine:master Feb 20, 2024
16 checks passed
@akien-mga akien-mga deleted the basisu_unbundled_jpgd branch February 20, 2024 09:36
@akien-mga
Copy link
Member Author

Cherry-picked for 4.2.2.

@akien-mga akien-mga removed the cherrypick:4.2 Considered for cherry-picking into a future 4.2.x release label Mar 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants