Added micro-optimizations to std.stdio.LockingTextWriter.put #6091

JackStouffer · 2018-01-29T16:05:09Z

Using the AST viewer on run.dlang.io, I noticed that foreach loops do a lot of extra copying that isn't always necessary. E.g. this

auto m = "test".takeExactly(3);
foreach (c; m)
{
}

immutable size = 5;
foreach (i; 0 .. size)
{
}

becomes

Result m = takeExactly("test", 3LU)
{
	Result __r71 = m;
	for (; !__r71.empty(); __r71.popFront())
	{
		dchar c = __r71.front();
	}
}

immutable immutable(int) size = 5;
{
	int __key72 = 0;
	int __limit73 = 5;
	for (; __key72 < __limit73; __key72 += 1)
	{
		int i = __key72;
	}
}

This PR changes some loops to be for loops without extra copying.

dlang-bot · 2018-01-29T16:05:09Z

Thanks for your pull request, @JackStouffer!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

JackStouffer · 2018-01-29T16:06:21Z

std/stdio.d

                        char[4] buf = void;
-                        immutable len = encode(buf, c);
-                        foreach (i ; 0 .. len)
+                        immutable len = encode!(UseReplacementDchar.yes)(buf, c);


Here, I changed this for two reasons,

to be in-line with the lines above which don't throw

DMD can forgo adding the exception-handling code if the function is marked as nothrow, which encode is when using replacement dchars

We should never be silently accepting/converting garbage! UseReplacementDchar.yes needs to be used with a lot of caution. If this does what I think it does (replace any kind of bad UTF with a single replacement character), then it's a bad idea. See https://issues.dlang.org/show_bug.cgi?id=14519

Well, I see you're just copying the above, which you've added in #5083 to replace toUtf8 which did the same thing, but I think we need to change this. Silently ignoring errors and throwing away information is never a good idea.

CyberShadow · 2018-01-29T16:39:33Z

std/stdio.d

                        char[4] buf;
                        immutable size = encode!(UseReplacementDchar.yes)(buf, c);
-                        foreach (i ; 0 .. size)
+                        for (int i; i < size; ++i)


Is for really faster than foreach here?

foreach makes three extra ints

int size = 5; foreach (i; 0 .. size) { } for (int j = 0; j < size; ++j) { }

becomes

int size = 5; { int __key46 = 0; int __limit47 = size; for (; __key46 < __limit47; __key46 += 1) { int i = __key46; } } { int j = 0; for (; j < size; j += 1) { { } } }

Does that actually affect generated code, though?

My knowledge of ASM is limited, but it sure looks like it does

foreach

.text._Dmain segment assume CS:.text._Dmain _Dmain: push RBP mov RBP,RSP sub RSP,010h mov EAX,5 mov -010h[RBP],EAX mov dword ptr -0Ch[RBP],0 mov -8[RBP],EAX L1A: mov ECX,-0Ch[RBP] cmp ECX,-8[RBP] jge L2D mov EDX,-0Ch[RBP] mov -4[RBP],EDX inc dword ptr -0Ch[RBP] jmp short L1A L2D: xor EAX,EAX leave ret add [RAX],AL add [RAX],AL .text._Dmain ends

for

.text._Dmain segment assume CS:.text._Dmain _Dmain: push RBP mov RBP,RSP sub RSP,010h mov dword ptr -8[RBP],5 mov dword ptr -4[RBP],0 L16: mov EAX,-4[RBP] cmp EAX,-8[RBP] jge L23 inc dword ptr -4[RBP] jmp short L16 L23: xor EAX,EAX leave ret add 0[RCX],AL .text._Dmain ends

Yeah, the number of int declarations may not have any correlation with the actual number of stack variables allocated by the asm. Given the simplicity of the loops above, I'm expecting that even dmd would enregister the loop variables, so the argument about the number of int declarations isn't even relevant. Again, please show the asm to prove that the for-loop is better than the foreach loop. It's a step backwards in readability/maintainability, and I don't think we should do it unless it's a clear win across the board for dmd/gdc/ldc.

@quickfur , Jack posted some asm above, do you not see his comment? #6091 (comment)

@JackStouffer I'm guessing that's without -O? Phobos is built with -O -release, so we should be comparing disassemblies from code built with those flags.

Sorry, I must have been looking at a stale page when I wrote my comment.

But yeah, it does look like that's without -O. For performance comparisons dmd -O is really the bare minimum; I would actually prefer to see what gdc/ldc do with the same code, because they have better loop optimizations than dmd. (Though in the latter case the challenge becomes how to prevent overly-aggressive optimizations like in ldc, where the optimizer just runs the whole thing at compile-time and substitutes the answer directly. :-P)

CyberShadow · 2018-01-29T16:42:52Z

std/stdio.d

                        char[4] buf = void;
-                        immutable len = encode(buf, c);
-                        foreach (i ; 0 .. len)
+                        immutable len = encode!(UseReplacementDchar.yes)(buf, c);


We should never be silently accepting/converting garbage! UseReplacementDchar.yes needs to be used with a lot of caution. If this does what I think it does (replace any kind of bad UTF with a single replacement character), then it's a bad idea. See https://issues.dlang.org/show_bug.cgi?id=14519

JackStouffer · 2018-01-29T17:20:37Z

@CyberShadow Removed the encoding change from this PR

CyberShadow · 2018-01-29T17:22:02Z

Thanks! I did not object to it considering the circumstances, but I do think it's not a direction we should be moving towards.

quickfur · 2018-01-29T17:29:33Z

std/stdio.d

+            for (; !writeme.empty; writeme.popFront)
            {
-                put(c);
+                put(writeme.front);


Are you sure this actually makes a difference in the generated asm? I'd like to see the asm output before approving this.

Using raw for-loops instead of foreach IMO is a step backward, and should not be done unless there's a very good reason for it. My impression is that a properly-optimizing compiler would eliminate the supposedly-redundant loads easily. (And if not, it's the compiler that should be fixed; the user code should not be uglified to work around a compiler deficiency.) Don't be so quick to believe intermediate pseudo-source code representations of the compiled code; always check the asm to see what's actually being executed at runtime.

Also, I'd like to see what gdc/ldc outputs for both the foreach loop and the raw for loop. Dmd's optimizer isn't the best when it comes to loop optimizations.

JackStouffer · 2018-01-29T17:54:52Z

@quickfur @CyberShadow DMD properly optimizes the foreach (i; 0 .. 5) case. There's no difference in the ASM of it and the for loop.

However, with ranges DMD can't get rid of the copy because it would change the behavior of the code

With this code

import std.utf;
void main()
{
    int res;
    auto m = "test".byCodeUnit;
    for (; !m.empty; m.popFront)
    {
        res += m.front;
    }
}

This is produced

.text._Dmain    segment
    assume  CS:.text._Dmain
_Dmain:
        push    RBP
        mov RBP,RSP
        sub RSP,010h
        lea RDX,_TMP0@PC32[RIP]
        mov EDI,4
        mov RSI,RDX
        call      pure nothrow @nogc @safe std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[]).ByCodeUnitImpl std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[])@PLT32
        mov -010h[RBP],RAX
        mov -8[RBP],RDX
        lea RDI,-010h[RBP]
        call      const pure nothrow @property @nogc @safe bool std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[]).ByCodeUnitImpl.empty()@PLT32
        xor AL,1
        je  L50
L31:        lea RDI,-010h[RBP]
        call      inout pure nothrow ref @property @nogc @safe immutable(char) std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[]).ByCodeUnitImpl.front()@PLT32
        lea RDI,-010h[RBP]
        call      pure nothrow @nogc @safe void std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[]).ByCodeUnitImpl.popFront()@PLT32
        lea RDI,-010h[RBP]
        call      const pure nothrow @property @nogc @safe bool std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[]).ByCodeUnitImpl.empty()@PLT32
        xor AL,1
        jne L31
L50:        xor EAX,EAX
        mov RSP,RBP
        pop RBP
        ret
        add 0[RCX],AH
.text._Dmain    ends

but this code

import std.utf;
void main()
{
    int res;
    auto m = "test".byCodeUnit;
    foreach (c; m)
    {
        res += c;
    }
}

produces this

.text._Dmain    segment
    assume  CS:.text._Dmain
_Dmain:
        push    RBP
        mov RBP,RSP
        sub RSP,020h
        lea RDX,_TMP0@PC32[RIP]
        mov EDI,4
        mov RSI,RDX
        call      pure nothrow @nogc @safe std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[]).ByCodeUnitImpl std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[])@PLT32
        mov -010h[RBP],RAX
        mov -8[RBP],RDX
        mov RAX,-010h[RBP]
        mov -020h[RBP],RAX
        mov RCX,-8[RBP]
        mov -018h[RBP],RCX
        lea RDI,-020h[RBP]
        call      const pure nothrow @property @nogc @safe bool std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[]).ByCodeUnitImpl.empty()@PLT32
        xor AL,1
        je  L60
L41:        lea RDI,-020h[RBP]
        call      inout pure nothrow ref @property @nogc @safe immutable(char) std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[]).ByCodeUnitImpl.front()@PLT32
        lea RDI,-020h[RBP]
        call      pure nothrow @nogc @safe void std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[]).ByCodeUnitImpl.popFront()@PLT32
        lea RDI,-020h[RBP]
        call      const pure nothrow @property @nogc @safe bool std.utf.byCodeUnit!(immutable(char)[]).byCodeUnit(immutable(char)[]).ByCodeUnitImpl.empty()@PLT32
        xor AL,1
        jne L41
L60:        xor EAX,EAX
        mov RSP,RBP
        pop RBP
        ret
        add 0[RCX],AH
.text._Dmain    ends

CyberShadow · 2018-01-29T17:56:47Z

However, with ranges DMD can't get rid of the copy because it would change the behavior of the code

That's less unexpected; my question was about the

-                        foreach (i ; 0 .. size)
+                        for (int i; i < size; ++i)

change, which seemed excessive.

JackStouffer · 2018-01-29T18:02:19Z

You know, I just noticed that loop does auto-decoding on all wide strings. Initially I thought that it was only operating on ranges. I'm going to close this and open a new PR which fixes both this issue and the wide string issue.

Thanks @CyberShadow and @quickfur for the review.

JackStouffer requested review from CyberShadow and schveiguy as code owners January 29, 2018 16:05

JackStouffer commented Jan 29, 2018

View reviewed changes

JackStouffer force-pushed the lockingtextwriter-micro-optimization branch from 544bb7b to 0b1f00a Compare January 29, 2018 16:09

CyberShadow previously requested changes Jan 29, 2018

View reviewed changes

JackStouffer force-pushed the lockingtextwriter-micro-optimization branch from 0b1f00a to 9a0e34a Compare January 29, 2018 17:20

CyberShadow approved these changes Jan 29, 2018

View reviewed changes

JackStouffer added Severity:Optimization Merge:auto-merge labels Jan 29, 2018

quickfur reviewed Jan 29, 2018

View reviewed changes

CyberShadow removed the Merge:auto-merge label Jan 29, 2018

Added micro-optimizations to std.stdio.LockingTextWriter.put

93129f5

JackStouffer force-pushed the lockingtextwriter-micro-optimization branch from 9a0e34a to 93129f5 Compare January 29, 2018 17:58

JackStouffer closed this Jan 29, 2018

JackStouffer deleted the lockingtextwriter-micro-optimization branch January 29, 2018 18:02

JackStouffer mentioned this pull request Jan 29, 2018

Modified the behavior of std.stdio.LockingTextWriter.put #6092

Merged

Uh oh!

Added micro-optimizations to std.stdio.LockingTextWriter.put #6091

Added micro-optimizations to std.stdio.LockingTextWriter.put #6091

Uh oh!

Conversation

JackStouffer commented Jan 29, 2018

Uh oh!

dlang-bot commented Jan 29, 2018

Bugzilla references

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

quickfur Jan 29, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

CyberShadow Jan 29, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JackStouffer commented Jan 29, 2018

Uh oh!

CyberShadow commented Jan 29, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JackStouffer commented Jan 29, 2018

Uh oh!

CyberShadow commented Jan 29, 2018

Uh oh!

JackStouffer commented Jan 29, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

quickfur Jan 29, 2018 •

edited

Loading

CyberShadow Jan 29, 2018 •

edited

Loading