-
Notifications
You must be signed in to change notification settings - Fork 555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix huge inline bloat msvc sv inline h #22667
Open
bulk88
wants to merge
3
commits into
Perl:blead
Choose a base branch
from
bulk88:fix_huge_inline_bloat_msvc_sv_inline_h
base: blead
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
+3,539
−36
Commits on Oct 14, 2024
-
Configuration menu - View commit details
-
Copy full SHA for c593dad - Browse repository at this point
Copy the full SHA c593dadView commit details -
Configuration menu - View commit details
-
Copy full SHA for a98b094 - Browse repository at this point
Copy the full SHA a98b094View commit details
Commits on Oct 18, 2024
-
wip#3 success!!!, new rule, "struct body_details bodies_by_type[]" on…
…ly in sv.c -svfix.pl is quick throwaway garbage done in 20 mins and probably doesnt regen sv_inline.h properly and copy pasting/replace regexps were was used anyway to fix up the code, it can be rewritten correctly tho and put in as a official regen.pl script how to finish this fix, options set and unset sv_type as a CPP macro then -#include a header 17 times that contains ONLY only Perl_newSV_type() and its mortal() sister creating 17*2 static inline fns (basically what I did here), code is stepable, extra ms'es of I/O build times perf degrade debate may or may not come up, I hope the CC has a sane in memory cache for .h files and doesn't go back to the kernel or put the entire Perl_newSV_type() fnc in a #define "#define PETS blah(foo(myarg)) + \ cat(dog(fur)) + \ laser(ball(toy)) " then execute that macro 17 times or write 17 Perl_newSV_type() copies into sv_inline.h with /regen.pl infrastructure (fastest for build speed core and build speed CPAN and code is c dbg stepable) OR is a dedicated "sv_newg.h" for regen.pl needed? does the master Perl_newSV_type() template live in a .pl or a .h? i dont have an opinion or against concept of sv_inline.h just have 5-10 hand written versions sv type specific of Perl_newSV_type(), its a cheap gimick fix to keep all 17 types together mashed with if/else/switch in 1 func and expecting bug free perfection from LTO engines of various C compilers, and expecting perfection from an single vendor LTO engine is very against the spirit of portable code -todo ideas, turn those super long #define ==?:==?:==?: into char array/struct initializers, stored in macros, one faux-string per each column of struct body_details, use that macro as c auto stk rw array initializer, then do the U32 len [3] = "\x01\x02\x03"[sv_type]; or U8 sizes [3] = {1,2,3}; U32 len = sizes[sv_type]; which in perl core would look U32 arena_size = SVDB_AR_SZ_DECL; U32 len = arena_size[sv_type]; maybe VC will optimize those since no global memory is used. Only Perl_newSV_typeX() needs this. in this commit static inline Perl_newSV_typeX(pTHX_ const svtype type) which is the ONLY Perl_newSV_type*() variant that take an arbitrary svtype arg, this is the fallback for gv_pvn_add_by() since I couldn't "const" that call to newSV_type() cuz gv_pvn_add_by() is only place in the whole core that takes a random SV type number. Internals of Perl_newSV_typeX() are trashy, here is an example, MSVC DID not turn this into a jump table but instead 17 test/cond_jump ops. v5 = (char *)S_new_body(v2); v6 = 40i64; if ( v2 == 15 ) v6 = 136i64; if ( v2 == 14 ) v6 = 104i64; if ( v2 == 13 ) v6 = 104i64; if ( v2 == 12 ) v6 = 32i64; if ( v2 == 11 ) v6 = 40i64; if ( v2 == 10 ) v6 = 80i64; if ( v2 == 9 ) v6 = 48i64; if ( v2 == 8 ) v6 = 224i64; if ( v2 == 7 ) v6 = 48i64; if ( v2 == 6 ) v6 = 32i64; if ( v2 == 5 ) v6 = 24i64; if ( v2 == 4 ) v6 = 40i64; if ( v2 == 3 ) v6 = 16i64; if ( v2 == 2 ) v6 = 0i64; if ( v2 == 1 ) v6 = 0i64; memset(v5, 0, v6 & -(signed __int64)(v2 != 0)); Solution is move Perl_newSV_typeX() to sv.c, and let it be struct body_details driven. Cuz it only purpose is when newSV_type() absolutly CAN NOT be constant folded (random number input). it only has 1 caller in core. S_new_body() properly const folded away in 99% of cases except for TWO callers Perl_newSV_typeX() and Perl_make_trie(). Perl_make_trie() failure to inline is bizzare, since Perl_make_trie() internally does "v9 = S_new_body(SVt_PVAV);" and DID inline Perl_newSV_typeSVt_PVAV() !!! and therefore Perl_make_trie() has the AV field initing/nulling code. Here is the "optimized" contents of S_new_body(), its junk performance/design wise (but runtime correct/no bugs) void **__fastcall S_new_body(svtype sv_type) { svtype v1; // er9 __int64 v2; // rbx void **result; // rax signed int v4; // ecx signed __int64 v5; // rax v1 = sv_type; v2 = sv_type; result = (void **)PL_body_roots[sv_type]; if ( !result ) { v4 = 4080; if ( v1 == 15 ) v4 = 3264; if ( v1 == 14 ) v4 = 2080; if ( v1 == 13 ) v4 = 4056; if ( v1 == 12 ) v4 = 4064; if ( v1 == 11 ) v4 = 4080; if ( v1 == 10 ) v4 = 4080; if ( v1 == 9 ) v4 = 4080; if ( v1 == 8 ) v4 = 4032; if ( v1 == 7 ) v4 = 4080; if ( v1 == 6 ) v4 = 3296; if ( v1 == 5 ) v4 = 3424; if ( v1 == 4 ) v4 = 3120; if ( v1 == 3 ) v4 = 3536; if ( v1 == 2 ) v4 = 0; if ( v1 == 1 ) v4 = 0; v5 = 40i64; if ( v1 == 15 ) v5 = 136i64; if ( v1 == 14 ) v5 = 104i64; if ( v1 == 13 ) v5 = 104i64; if ( v1 == 12 ) v5 = 32i64; if ( v1 == 11 ) v5 = 40i64; if ( v1 == 10 ) v5 = 80i64; if ( v1 == 9 ) v5 = 48i64; if ( v1 == 8 ) v5 = 224i64; if ( v1 == 7 ) v5 = 48i64; if ( v1 == 6 ) v5 = 32i64; if ( v1 == 5 ) v5 = 24i64; if ( v1 == 4 ) v5 = 40i64; if ( v1 == 3 ) v5 = 16i64; if ( v1 == 2 ) v5 = 0i64; if ( v1 == 1 ) v5 = 0i64; result = (void **)Perl_more_bodies(v1, v5 & -(signed __int64)(v1 != 0), v4 & (unsigned int)-(v1 != 0)); } PL_body_roots[v2] = *result; return result; } ------------ disassembly view of S_new_body() ------------ cmp r9d, 0Fh lea edi, [rbp+28h] mov r8d, 0FF0h lea r11d, [rbp+20h] mov edx, 0CC0h lea r10d, [rbp+30h] mov ecx, r8d mov eax, r9d cmovz ecx, edx cmp r9d, 0Eh mov edx, 820h cmovz ecx, edx cmp r9d, 0Dh lea edx, [r8-18h] cmovz ecx, edx cmp r9d, 0Ch lea edx, [r8-10h] cmovz ecx, edx cmp r9d, 0Bh lea edx, [r8-30h] cmovz ecx, r8d cmp r9d, 0Ah cmovz ecx, r8d cmp r9d, 9 cmovz ecx, r8d cmp r9d, 8 cmovz ecx, edx cmp r9d, 7 mov edx, 0CE0h cmovz ecx, r8d cmp r9d, 6 cmovz ecx, edx cmp r9d, 5 mov edx, 0D60h cmovz ecx, edx cmp r9d, 4 mov edx, 0C30h cmovz ecx, edx cmp r9d, 3 mov edx, 0DD0h cmovz ecx, edx cmp r9d, 2 lea edx, [rdi+60h] cmovz ecx, ebp cmp r9d, 1 cmovz ecx, ebp neg eax sbb eax, eax and eax, ecx mov ecx, r9d mov r8d, eax cmp r9d, 0Fh mov eax, edi cmovz eax, edx cmp r9d, 0Eh lea edx, [rbp+68h] cmovz eax, edx cmp r9d, 0Dh cmovz eax, edx cmp r9d, 0Ch lea edx, [rbp+50h] cmovz eax, r11d cmp r9d, 0Bh cmovz eax, edi cmp r9d, 0Ah cmovz eax, edx cmp r9d, 9 mov edx, 0E0h cmovz eax, r10d cmp r9d, 8 cmovz eax, edx cmp r9d, 7 lea edx, [rbp+18h] cmovz eax, r10d cmp r9d, 6 cmovz eax, r11d cmp r9d, 5 cmovz eax, edx cmp r9d, 4 lea edx, [rbp+10h] cmovz eax, edi cmp r9d, 3 cmovz eax, edx cmp r9d, 2 cmovz eax, ebp cmp r9d, 1 cmovz eax, ebp neg ecx mov ecx, r9d sbb rdx, rdx and rdx, rax call Perl_more_bodies ---------------------- 17 test ops and 17 conditional_move_constant_8_bits ops solution, turn S_new_body() back into a macro so no CC ever tries to ref-inline it. It was a macro before sv_inline.h branch was merged TODO add XSApitest.xs that worlds longest macros are identical to the master correct copy (struct body_details). byte size drops from before these 3 commits to this "success commit" mp.exe 0x1241AC-0x1224EC=7360 0x19D3D8-0x19B8E8=6896 p541.dll 0x154886-0x1532A6=5600 0x1AA19E-0x1A862E=7024 BEFORE Dump of file ..\miniperl.exe SECTION HEADER Perl#1 .text name 1241AC virtual size SECTION HEADER Perl#2 .rdata name 19D3D8 virtual size Dump of file ..\perl541.dll SECTION HEADER Perl#1 .text name 154886 virtual size SECTION HEADER Perl#2 .rdata name 1AA19E virtual size AFTER Dump of file ..\perl541.dll SECTION HEADER Perl#1 .text name 1532A6 virtual size SECTION HEADER Perl#2 .rdata name 1A862E virtual size Dump of file ..\miniperl.exe SECTION HEADER Perl#1 .text name 1224EC virtual size SECTION HEADER Perl#2 .rdata name 19B8E8 virtual size
Configuration menu - View commit details
-
Copy full SHA for afe1551 - Browse repository at this point
Copy the full SHA afe1551View commit details
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.