-
-
Notifications
You must be signed in to change notification settings - Fork 746
Added micro-optimizations to std.stdio.LockingTextWriter.put #6091
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added micro-optimizations to std.stdio.LockingTextWriter.put #6091
Conversation
|
Thanks for your pull request, @JackStouffer! Bugzilla referencesYour PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog. |
std/stdio.d
Outdated
| char[4] buf = void; | ||
| immutable len = encode(buf, c); | ||
| foreach (i ; 0 .. len) | ||
| immutable len = encode!(UseReplacementDchar.yes)(buf, c); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, I changed this for two reasons,
- to be in-line with the lines above which don't throw
- DMD can forgo adding the exception-handling code if the function is marked as nothrow, which encode is when using replacement dchars
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should never be silently accepting/converting garbage! UseReplacementDchar.yes needs to be used with a lot of caution. If this does what I think it does (replace any kind of bad UTF with a single replacement character), then it's a bad idea. See https://issues.dlang.org/show_bug.cgi?id=14519
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I see you're just copying the above, which you've added in #5083 to replace toUtf8 which did the same thing, but I think we need to change this. Silently ignoring errors and throwing away information is never a good idea.
544bb7b to
0b1f00a
Compare
std/stdio.d
Outdated
| char[4] buf; | ||
| immutable size = encode!(UseReplacementDchar.yes)(buf, c); | ||
| foreach (i ; 0 .. size) | ||
| for (int i; i < size; ++i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is for really faster than foreach here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
foreach makes three extra ints
int size = 5;
foreach (i; 0 .. size)
{
}
for (int j = 0; j < size; ++j)
{
}becomes
int size = 5;
{
int __key46 = 0;
int __limit47 = size;
for (; __key46 < __limit47; __key46 += 1)
{
int i = __key46;
}
}
{
int j = 0;
for (; j < size; j += 1)
{
{
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that actually affect generated code, though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My knowledge of ASM is limited, but it sure looks like it does
foreach
.text._Dmain segment
assume CS:.text._Dmain
_Dmain:
push RBP
mov RBP,RSP
sub RSP,010h
mov EAX,5
mov -010h[RBP],EAX
mov dword ptr -0Ch[RBP],0
mov -8[RBP],EAX
L1A: mov ECX,-0Ch[RBP]
cmp ECX,-8[RBP]
jge L2D
mov EDX,-0Ch[RBP]
mov -4[RBP],EDX
inc dword ptr -0Ch[RBP]
jmp short L1A
L2D: xor EAX,EAX
leave
ret
add [RAX],AL
add [RAX],AL
.text._Dmain endsfor
.text._Dmain segment
assume CS:.text._Dmain
_Dmain:
push RBP
mov RBP,RSP
sub RSP,010h
mov dword ptr -8[RBP],5
mov dword ptr -4[RBP],0
L16: mov EAX,-4[RBP]
cmp EAX,-8[RBP]
jge L23
inc dword ptr -4[RBP]
jmp short L16
L23: xor EAX,EAX
leave
ret
add 0[RCX],AL
.text._Dmain endsThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the number of int declarations may not have any correlation with the actual number of stack variables allocated by the asm. Given the simplicity of the loops above, I'm expecting that even dmd would enregister the loop variables, so the argument about the number of int declarations isn't even relevant. Again, please show the asm to prove that the for-loop is better than the foreach loop. It's a step backwards in readability/maintainability, and I don't think we should do it unless it's a clear win across the board for dmd/gdc/ldc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@quickfur , Jack posted some asm above, do you not see his comment? #6091 (comment)
@JackStouffer I'm guessing that's without -O? Phobos is built with -O -release, so we should be comparing disassemblies from code built with those flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I must have been looking at a stale page when I wrote my comment.
But yeah, it does look like that's without -O. For performance comparisons dmd -O is really the bare minimum; I would actually prefer to see what gdc/ldc do with the same code, because they have better loop optimizations than dmd. (Though in the latter case the challenge becomes how to prevent overly-aggressive optimizations like in ldc, where the optimizer just runs the whole thing at compile-time and substitutes the answer directly. :-P)
std/stdio.d
Outdated
| char[4] buf = void; | ||
| immutable len = encode(buf, c); | ||
| foreach (i ; 0 .. len) | ||
| immutable len = encode!(UseReplacementDchar.yes)(buf, c); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should never be silently accepting/converting garbage! UseReplacementDchar.yes needs to be used with a lot of caution. If this does what I think it does (replace any kind of bad UTF with a single replacement character), then it's a bad idea. See https://issues.dlang.org/show_bug.cgi?id=14519
0b1f00a to
9a0e34a
Compare
|
@CyberShadow Removed the encoding change from this PR |
|
Thanks! I did not object to it considering the circumstances, but I do think it's not a direction we should be moving towards. |
| for (; !writeme.empty; writeme.popFront) | ||
| { | ||
| put(c); | ||
| put(writeme.front); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure this actually makes a difference in the generated asm? I'd like to see the asm output before approving this.
Using raw for-loops instead of foreach IMO is a step backward, and should not be done unless there's a very good reason for it. My impression is that a properly-optimizing compiler would eliminate the supposedly-redundant loads easily. (And if not, it's the compiler that should be fixed; the user code should not be uglified to work around a compiler deficiency.) Don't be so quick to believe intermediate pseudo-source code representations of the compiled code; always check the asm to see what's actually being executed at runtime.
Also, I'd like to see what gdc/ldc outputs for both the foreach loop and the raw for loop. Dmd's optimizer isn't the best when it comes to loop optimizations.
|
@quickfur @CyberShadow DMD properly optimizes the However, with ranges DMD can't get rid of the copy because it would change the behavior of the code With this code import std.utf;
void main()
{
int res;
auto m = "test".byCodeUnit;
for (; !m.empty; m.popFront)
{
res += m.front;
}
}This is produced .text._Dmain segment
assume CS:.text._Dmain
_Dmain:
push RBP
mov RBP,RSP
sub RSP,010h
lea RDX,_TMP0@PC32[RIP]
mov EDI,4
mov RSI,RDX
call pure nothrow @nogc @safe std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[]).ByCodeUnitImpl std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[])@PLT32
mov -010h[RBP],RAX
mov -8[RBP],RDX
lea RDI,-010h[RBP]
call const pure nothrow @property @nogc @safe bool std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[]).ByCodeUnitImpl.empty()@PLT32
xor AL,1
je L50
L31: lea RDI,-010h[RBP]
call inout pure nothrow ref @property @nogc @safe immutable(char) std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[]).ByCodeUnitImpl.front()@PLT32
lea RDI,-010h[RBP]
call pure nothrow @nogc @safe void std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[]).ByCodeUnitImpl.popFront()@PLT32
lea RDI,-010h[RBP]
call const pure nothrow @property @nogc @safe bool std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[]).ByCodeUnitImpl.empty()@PLT32
xor AL,1
jne L31
L50: xor EAX,EAX
mov RSP,RBP
pop RBP
ret
add 0[RCX],AH
.text._Dmain endsbut this code import std.utf;
void main()
{
int res;
auto m = "test".byCodeUnit;
foreach (c; m)
{
res += c;
}
}produces this .text._Dmain segment
assume CS:.text._Dmain
_Dmain:
push RBP
mov RBP,RSP
sub RSP,020h
lea RDX,_TMP0@PC32[RIP]
mov EDI,4
mov RSI,RDX
call pure nothrow @nogc @safe std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[]).ByCodeUnitImpl std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[])@PLT32
mov -010h[RBP],RAX
mov -8[RBP],RDX
mov RAX,-010h[RBP]
mov -020h[RBP],RAX
mov RCX,-8[RBP]
mov -018h[RBP],RCX
lea RDI,-020h[RBP]
call const pure nothrow @property @nogc @safe bool std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[]).ByCodeUnitImpl.empty()@PLT32
xor AL,1
je L60
L41: lea RDI,-020h[RBP]
call inout pure nothrow ref @property @nogc @safe immutable(char) std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[]).ByCodeUnitImpl.front()@PLT32
lea RDI,-020h[RBP]
call pure nothrow @nogc @safe void std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[]).ByCodeUnitImpl.popFront()@PLT32
lea RDI,-020h[RBP]
call const pure nothrow @property @nogc @safe bool std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[]).ByCodeUnitImpl.empty()@PLT32
xor AL,1
jne L41
L60: xor EAX,EAX
mov RSP,RBP
pop RBP
ret
add 0[RCX],AH
.text._Dmain ends |
That's less unexpected; my question was about the - foreach (i ; 0 .. size)
+ for (int i; i < size; ++i) change, which seemed excessive. |
9a0e34a to
93129f5
Compare
|
You know, I just noticed that loop does auto-decoding on all wide strings. Initially I thought that it was only operating on ranges. I'm going to close this and open a new PR which fixes both this issue and the wide string issue. Thanks @CyberShadow and @quickfur for the review. |
Using the AST viewer on run.dlang.io, I noticed that foreach loops do a lot of extra copying that isn't always necessary. E.g. this
becomes
This PR changes some loops to be for loops without extra copying.