Perl_sv_clear: unrolled SvREFCNT_dec and sv_free2....isn't that?! #22798
Comments
@iabyn, it appears that much of this section of code entered the codebase back in 2010 in this commit:
Can you take a look? |
I've never liked SvREFCNT_dec's or sv_free2()'s prototype. It's been on my mind since forever that it looks sketchy, and it's putting too much logic and bloat in its one billion caller fns. Why can't it read the refcount field itself? Why do all callers have to push that refcount arg on the reg-stack? Yes, by chance AMD64 register R8 sneaks into the CPU's Win64 prototype and passes thru to the callee frame without a dedicated mov op, but that's just one cpu and one os, and doesn't apply for Win32 X86. Not going to git dig it, but I think that badly designed 3rd arg is only used for process-exiting, yes proc termination. Why do all the callers need that 3rd arg to be inlined into them? |
Another thought, perl does
Intel is playing catch up adding 3 part operators to x64 in 2023 https://en.wikipedia.org/wiki/X86#APX_(Advanced_Performance_Extensions) So perl's sv_ref_dec() should really be doing current blead
better not perfect
perfection would be just like MS COM ABI does its ref counting since x86/x64 is CISC and includes a free store to RAM, inside
If sv_free2() needs to ++ it for perl api global memory sanity, oh well, just ++ again; don't burden the one billion callers that inline sv_ref_dec(). The ++ is meaningless for perf if you're about to call the fat on any OS libc You're not saving anything perf-wise trying to skip the assignment to RAM. The cache line is already sitting in secret registers, and the cache line addr in DDR ram is already SMP lock-held in the "northbridge". Writes from a core to DDR have already been async for decades. |
The interp has an addiction to malloc mem and mortal stack when C autos are appropriate. Examples: https://github.com/Perl/perl5/blob/blead/malloc.c#L2127 [ Why on earth is this not # ifdef rmved on Win32 when Perl's Unix putenv just calls Perl Win32's P5P controlled backend putenv at https://github.com/Perl/perl5/blob/blead/win32/win32.c#L2415 ? Why does the backend win32 P5P putenv malloc and copy the string before searching it, not even knowing if it will ever use the modified 2nd copy !!! Why does the perl api act like perl's front end p5p putenv() must accept RO stored const strings when we know it's RW memory and trappable perl exceptions can't happen in syscalls? Why does Win32 Perl even bother converting from Microsoft-ese PP Native putenv API (
Perl's malloc addiction is made exponentially worse by khw's semi-recent fixes to serialize and de-race condition Unix and Win32 libc locale API vs interp's locale API vs OS getcwd() and friends. Example: https://github.com/Perl/perl5/blob/blead/locale.c#L5206 khw's fixes are constantly making new malloc buffers nested as layers of strings process tools/fn calls get applied to get the final correct behavior needed. All these new malloc buffers are obviously added to save stack or mortal, then tossed/libc free() within a dozen microseconds. Note khw's code is implementing Unix
On my Win32 blead perl, if I look through a process memory dump of
Another really big perl XS C api design problem is, perl's keywords are correctly POSIX analogs, but over the last 25 years perl is gaining more and more bug fixes and new features to the PP keywords. Perl's middle layer C code, and XS author facing code/API/func call prototypes, keep adding 1970 Unix, C prototype arguments in new no-CPAN not exported functions, and sometimes in the CPAN-approved API. 1970 Unix API mandates 2 horrible API requirements:
#1 all incoming char *s are assumed to be immutable RO C strings owned by the fn's caller.
#2 it's sacrilegious desecration of a holy book to spend a precious 1, 2 or 4 bytes to record a string's length in RAM or spinning rust. Pascal is hellbent on genocide of the unix people, it's a struggle for survival.
Over the last 25 years Perl's middle C layer keeps implementing/adding more and more code, using 1970 Unix C strings, instead of moving around SV *s, which SOLVE all those 1970 problems. It's just bad to keep extending this, instead of leaving creation of the
Perl C API does need some more thinking tho on SV* API for "RO" caller owned SV* arg-mode vs ownership takeover SV* arg-mode. ISO C
I've profiled gmake as spending 15% of all CPU usage in libc strlen(), and 24% of all CPU usage in its string hash algorithm loop. gmake's code will never write
I'd advise all P5 core devs to once in a while get a hot cup of coffee, start your C debugger, disable profiling/stepping of libc.so in your C debugger, put a rock on key F11, and watch what perl C code flashes by for a good 3 or 5 minutes while sipping something. And think: are those lines of C code "justified" or not? You never know where that train will take you in perl C core. That's how I write all my misc core PRs. |
I really want this issue to just be about the code following To achieve changes on those other topics, focussed Issues or Pull Requests (or topics on the mailing list) seem like better vehicles to me.
It's not immediately obvious that this is safe. From looking at |
I think that section of code made sense at the time. Changes to |
FWIW This builds and the test harness passes, but it might not be the best way to update this call site:
|
+1
For me this passed on an unthreaded build on Linux using both Tail of
|
Those compiler errors both indicate that 2 arguments were passed where 3 were expected. Basically, Perl_sv_free2(aTHX_ sv, 0); |
On Fri, Nov 29, 2024 at 05:02:30PM -0800, Richard Leach wrote:
**Description**
Towards the bottom of `Perl_sv_clear`, which is frequently a very hot function, there is the following comment:
`/* unrolled SvREFCNT_dec and sv_free2 follows: */`
Sounds good, makes sense that it would unroll `SvREFCNT_dec` and `sv_free2`.
But that's not actually what it does:
```
/* unrolled SvREFCNT_dec and sv_free2 follows: */
if (!sv)
continue;
if (!SvREFCNT(sv)) {
sv_free(sv);
continue;
}
.....
```
At least nowadays, `sv_free` means calling `Perl_sv_free`:
```
void
Perl_sv_free(pTHX_ SV *const sv)
{
SvREFCNT_dec(sv);
}
```
`SvREFCNT_dec` is an inline function in `sv_inline.h`:
```
PERL_STATIC_INLINE void
Perl_SvREFCNT_dec(pTHX_ SV *sv)
{
if (LIKELY(sv != NULL)) {
U32 rc = SvREFCNT(sv);
if (LIKELY(rc > 1))
SvREFCNT(sv) = rc - 1;
else
Perl_sv_free2(aTHX_ sv, rc);
}
}
```
So instead of unrolling `SvREFCNT_dec` and `sv_free2`, it calls a function (probably inlined)
to call `SvREFCNT_dec` which likely will call `sv_free2`. That's not unrolled at all!
The likely impact of this is some SV freeing is taking longer than it strictly has to. Looking at
gcov coverage when running the test harness, there are 35099459 calls to `Perl_sv_free` and
797851111 calls to `Perl_sv_free2`, so maybe ~4% of cleared SVs.
It does in fact partially unroll it. Let me explain...
Normally when you do SvREFCNT_dec(sv), you expect SvREFCNT(sv) to be >= 1.
If the ref count is zero at that point it usually implies either that
something's gone horribly wrong, or that you're in global cleanup and have
artificially lowered the refcnt (and set SVf_BREAK) to force the freeing
of even things which are in reference loops etc.
So the commonly-used SvREFCNT_dec() macro (and its variants) have two
levels of "handle the common thing" optimisation. Rather than (as was the
case long ago) just being a wrapper around a call to sv_free(), they check
for the reasonably common case of RC > 1, and if so, update the RC without
calling a function. For the two cases of RC == 1 and RC == 0, they call
sv_free2(), which was written specifically as a helper function for
SvREFCNT_dec(), and which expects to be called only for the 0 and 1 cases.
sv_free2() does the second level of "the common thing" optimisation: it
assumes that it's very likely that RC==1. In this case it does the minimum
handling needed: resurrect immortals, handle SvTEMP being on, etc, then
clears the SV directly with sv_clear() and del_SV().
Only for the exceptional case of RC == 0 does it check for lots more
special cases, such as SVf_BREAK, PL_in_clean_all etc.
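The two levels described above can be sketched with a toy model. This is only an illustration, not the Perl core code: `ToySV`, `toy_refcnt_dec`, `toy_free2` and the counters are hypothetical names invented here, and the toy "free" merely sets a flag instead of doing the immortal/SvTEMP handling or calling sv_clear() and del_SV().

```c
#include <stddef.h>

/* Hypothetical toy type: models only a reference count, not a real SV. */
typedef struct ToySV {
    unsigned refcnt;
    int freed;              /* set once the toy "free" path has run */
} ToySV;

/* Counters so the path taken by each decrement can be observed. */
static unsigned long fast_path_hits, common_free_hits, exceptional_hits;

/* Second level: expected to be called only with rc == 0 or rc == 1. */
static void toy_free2(ToySV *sv, unsigned rc)
{
    if (rc == 1) {
        /* Common case: the last reference is dropped, free directly. */
        common_free_hits++;
        sv->refcnt = 0;
        sv->freed = 1;
        return;
    }
    /* Exceptional case (rc == 0): real code would examine many special
     * conditions here; this toy only records that it happened. */
    exceptional_hits++;
}

/* First level: handle rc > 1 inline, without calling a function. */
static inline void toy_refcnt_dec(ToySV *sv)
{
    if (sv != NULL) {
        unsigned rc = sv->refcnt;
        if (rc > 1) {
            fast_path_hits++;
            sv->refcnt = rc - 1;
        }
        else
            toy_free2(sv, rc);
    }
}
```

Starting from a count of 3, the first two decrements stay on the inline path, the third reaches the RC == 1 branch of `toy_free2`, and any further decrement lands in the RC == 0 branch.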
Now, at some point I heavily tweaked sv_clear() to make it free aggregates
such as arrays and hashes in an iterative rather than recursive manner.
The recursive approach used to mean that deeply-nested perl data structures
could blow the C stack when being freed.
Part of that effort was to replace a call to SvREFCNT_dec() - which freed
the elements of an array/hash which itself was being freed - with the main
body of sv_free2(). The code comment "unrolled SvREFCNT_dec and sv_free2
follows" actually refers to the next 22 or so lines, not to just the next
6 or so lines you quoted. I.e it's referring to inlining the common checks
for immortal, SvTEMP etc, and doing a recursive call to sv_free() only for
the *exceptional* case of RC==0. For the normal RC==1 case, it breaks,
returning control to the outer loop of sv_clear(), which clears that sv.
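To illustrate the general idea of freeing nested aggregates with a loop rather than recursion, here is a minimal sketch. It is an assumption-laden toy, not how sv_clear() is written: `ToyNode`, `toy_new`, `toy_release`, `toy_chain` and the `next_pending` link are hypothetical, and the toy treats RC == 0 the same as RC == 1 instead of handling it as an exceptional case.

```c
#include <stdlib.h>

/* Hypothetical toy node: a refcounted container holding child pointers. */
typedef struct ToyNode {
    unsigned refcnt;
    size_t nkids;
    struct ToyNode **kids;
    struct ToyNode *next_pending;   /* links nodes waiting to be cleared */
} ToyNode;

static unsigned long toy_nodes_freed;

static ToyNode *toy_new(size_t nkids)
{
    ToyNode *n = calloc(1, sizeof *n);
    if (!n) abort();
    n->refcnt = 1;
    n->nkids = nkids;
    if (nkids) {
        n->kids = calloc(nkids, sizeof *n->kids);
        if (!n->kids) abort();
    }
    return n;
}

/* Drop one reference to root. If that was the last one, free root and
 * everything it solely owns, using a pending list instead of recursion. */
static void toy_release(ToyNode *root)
{
    ToyNode *pending;
    if (!root) return;
    if (root->refcnt > 1) { root->refcnt--; return; }
    root->next_pending = NULL;
    pending = root;
    while (pending) {
        ToyNode *n = pending;
        pending = n->next_pending;
        for (size_t i = 0; i < n->nkids; i++) {
            ToyNode *kid = n->kids[i];
            if (!kid) continue;
            if (kid->refcnt > 1) { kid->refcnt--; continue; } /* shared */
            kid->next_pending = pending;  /* last reference: queue it */
            pending = kid;
        }
        free(n->kids);
        free(n);
        toy_nodes_freed++;
    }
}

/* Build a singly nested chain of depth nodes (depth >= 1). */
static ToyNode *toy_chain(size_t depth)
{
    ToyNode *head = toy_new(0);
    for (size_t i = 1; i < depth; i++) {
        ToyNode *parent = toy_new(1);
        parent->kids[0] = head;
        head = parent;
    }
    return head;
}
```

Because the work list is threaded through the nodes themselves, releasing a chain built with `toy_chain(100000)` uses constant C stack regardless of nesting depth.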
Perhaps that code comment should be changed to something like:
/* Do the equivalent of SvREFCNT_dec(sv), except:
- for the case of RC==1, inline the actions normally taken
by sv_free2() prior to it calling sv_clear(), and handle the
sv_clear() actions ourselves (without needing to
recurse).
- For the exceptional case of RC==0, do a traditional
recursive free.
*/
Arguably the sv_free() could be replaced with SvREFCNT_dec_NN() or even
sv_free2(sv, 0).
…--
The Enterprise is captured by a vastly superior alien intelligence which
does not put them on trial.
-- Things That Never Happen in "Star Trek" #10
|
**Expected behavior**
I haven't sat down to figure this out. It seems like the status quo in `Perl_sv_clear` must be
wrong though and either the comment should be amended, (more likely) the call to `sv_free`
is intended to be a recursive call back to `sv_free2`, or some other code change should happen.
Besides `Perl_sv_clear`, only ext/Opcode/Opcode.xs and dist/Storable/Storable.xs seem
to call `sv_free` directly. Probably they should be using `SvREFCNT_dec` or calling `sv_free2` instead.
**Perl configuration**
blead