Removed auto-decoding from std.conv.parse #4597

JackStouffer · 2016-07-11T17:29:58Z

Like the title says, I removed auto-decoding from parse, which only checked for things in the ASCII range anyway.

The reason why this is so messy is parse takes its range by ref, so byCodeUnit cannot be used without using refRange, which requires using &r, which is unsafe. So the only solution left is to do it manually with a bunch of static ifs.

I tried putting the refRange.byCodeUnit in a @trusted block, but that resulted in a speed pessimization (I'm guessing due to the inability to inline at that point).

By my measurements, this is anywhere between 2.5-5x as fast as the original implementation.

UPDATE (Aug 8 2016): Changed benchmark at the behest of @9il

# using LDC 1.1.0-beta2

# New
$ ldc2 -O5 -release test.d && ./test
Total time: 30 ms, 598 μs, and 7 hnsecs

# Old
$ ldc2 -O5 -release test.d && ./test
Total time: 95 ms, 716 μs, and 6 hnsecs

Here's the code I used to benchmark

import std.stdio;
import std.range;
import std.algorithm;
import std.traits;
import std.container;
import std.meta;
import std.conv;
import std.datetime;

pragma(inline, false)
Target parse1(Target, Source)(ref Source s)
    if (isSomeChar!(ElementType!Source) &&
        isIntegral!Target && !is(Target == enum))
{
    static if (Target.sizeof < int.sizeof)
    {
        // smaller types are handled like integers
        auto v = .parse!(Select!(Target.min < 0, int, uint))(s);
        auto result = ()@trusted{ return cast(Target) v; }();
        if (result == v)
            return result;
        throw new ConvOverflowException("Overflow in integral conversion");
    }
    else
    {
        // int or larger types

        static if (Target.min < 0)
            bool sign = 0;
        else
            enum bool sign = 0;

        enum char maxLastDigit = Target.min < 0 ? 7 : 5;
        Unqual!(typeof(s.front)) c;

        if (s.empty)
            goto Lerr;

        static if (isAutodecodableString!Source)
            c = s[0];
        else
            c = s.front;

        static if (Target.min < 0)
        {
            switch (c)
            {
                case '-':
                    sign = true;
                    goto case '+';
                case '+':
                    static if (isAutodecodableString!Source)
                        s = s[1 .. $];
                    else
                        s.popFront();

                    if (s.empty)
                        goto Lerr;

                    static if (isAutodecodableString!Source)
                        c = s[0];
                    else
                        c = s.front;

                    break;

                default:
                    break;
            }
        }
        c -= '0';
        if (c <= 9)
        {
            Target v = cast(Target)c;

            static if (isAutodecodableString!Source)
                s = s[1 .. $];
            else
                s.popFront();

            while (!s.empty)
            {
                static if (isAutodecodableString!Source)
                    c = cast(typeof(c)) (s[0] - '0');
                else
                    c = cast(typeof(c)) (s.front - '0');

                if (c > 9)
                    break;

                if (v >= 0 && (v < Target.max/10 ||
                    (v == Target.max/10 && c <= maxLastDigit + sign)))
                {
                    // Note: `v` can become negative here in case of parsing
                    // the most negative value:
                    v = cast(Target) (v * 10 + c);

                    static if (isAutodecodableString!Source)
                        s = s[1 .. $];
                    else
                        s.popFront();
                }
                else
                    throw new ConvOverflowException("Overflow in integral conversion");
            }

            if (sign)
                v = -v;
            return v;
        }
Lerr:
        throw new Exception("a");
    }
}

__gshared char[] test = ['1', '2', '3', '4', '5', '6', '7', '8', '9'];

void main()
{
    enum n = 50_000_000;

    int result;
    StopWatch sw;
    Duration sum;
    TickDuration last = TickDuration.from!"seconds"(0);

    foreach(i; 0 .. n)
    {
        sw.start();

        //result = parse!int(test);
        result = parse1!int(test);

        sw.stop();

        test = ['1', '2', '3', '4', '5', '6', '7', '8', '9'];
        auto time = sw.peek() - last;
        last = sw.peek();

        sum += time.to!Duration();
    }

    writeln("Total time: ", sum);
}

JackStouffer · 2016-07-11T18:12:37Z

One benefit I forgot to mention is that this will also speed up std.conv.to for strings as it just forwards the string to parse.

This will give D a leg-up in a lot of benchmarks because to!int is an extremely common call I see in those programs.

JackStouffer · 2016-07-18T17:35:57Z

Ping.

This is a significant speed increase for a very common function.

9il · 2016-07-22T16:50:09Z

Thanks! Please add or refer in PR description a unittest that cover UTF (also note about UTF). Also please replace Unqual!(typeof(s.front)) c; with uint c, this will speed up code for some rare cases

9il · 2016-07-22T16:50:30Z

We need more such optimisations

dnadlinger · 2016-07-22T23:45:42Z

please replace Unqual!(typeof(s.front)) c; with uint c, this will speed up code for some rare cases

Shouldn't this usually be covered pretty well by peephole optimization/register allocation?

JackStouffer · 2016-07-24T02:27:03Z

Please add or refer in PR description a unittest that cover UTF (also note about UTF).

What exactly are you looking for? Parse already ignores everything that's not a number.

JackStouffer · 2016-07-27T21:07:19Z

Ping @9il

9il · 2016-07-28T06:19:10Z

Auto-merge toggled on

wilzbach · 2016-08-31T09:15:21Z

@JackStouffer FYI attached issue 16451 to this PR ;-)

9il self-assigned this Jul 22, 2016

Removed auto-decoding from std.conv.parse

01010c5

JackStouffer force-pushed the parse-decoding branch from ee630b7 to 01010c5 Compare July 24, 2016 02:27

9il merged commit ca1d015 into dlang:master Jul 28, 2016

JackStouffer mentioned this pull request Aug 8, 2016

Removed auto-decoding from std.conv.parse (part 2) #4674

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Removed auto-decoding from std.conv.parse #4597

Removed auto-decoding from std.conv.parse #4597

Uh oh!

JackStouffer commented Jul 11, 2016 •

edited

Loading

Uh oh!

JackStouffer commented Jul 11, 2016 •

edited

Loading

Uh oh!

JackStouffer commented Jul 18, 2016

Uh oh!

9il commented Jul 22, 2016

Uh oh!

9il commented Jul 22, 2016

Uh oh!

dnadlinger commented Jul 22, 2016

Uh oh!

JackStouffer commented Jul 24, 2016

Uh oh!

JackStouffer commented Jul 27, 2016

Uh oh!

9il commented Jul 28, 2016

Uh oh!

wilzbach commented Aug 31, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Removed auto-decoding from std.conv.parse #4597

Removed auto-decoding from std.conv.parse #4597

Uh oh!

Conversation

JackStouffer commented Jul 11, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JackStouffer commented Jul 11, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JackStouffer commented Jul 18, 2016

Uh oh!

9il commented Jul 22, 2016

Uh oh!

9il commented Jul 22, 2016

Uh oh!

dnadlinger commented Jul 22, 2016

Uh oh!

JackStouffer commented Jul 24, 2016

Uh oh!

JackStouffer commented Jul 27, 2016

Uh oh!

9il commented Jul 28, 2016

Uh oh!

wilzbach commented Aug 31, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

JackStouffer commented Jul 11, 2016 •

edited

Loading

JackStouffer commented Jul 11, 2016 •

edited

Loading