Skip to content

Conversation

@JackStouffer
Copy link
Contributor

@JackStouffer JackStouffer commented Jul 11, 2016

Like the title says, I removed auto-decoding from parse, which only checked for things in the ASCII range anyway.

The reason why this is so messy is parse takes its range by ref, so byCodeUnit cannot be used without using refRange, which requires using &r, which is unsafe. So the only solution left is to do it manually with a bunch of static ifs.

I tried putting the refRange.byCodeUnit in a @trusted block, but that resulted in a speed pessimization (I'm guessing due to the inability to inline at that point).

By my measurements, this is anywhere between 2.5-5x as fast as the original implementation.

UPDATE (Aug 8 2016): Changed benchmark at the behest of @9il

# using LDC 1.1.0-beta2

# New
$ ldc2 -O5 -release test.d && ./test
Total time: 30 ms, 598 μs, and 7 hnsecs

# Old
$ ldc2 -O5 -release test.d && ./test
Total time: 95 ms, 716 μs, and 6 hnsecs

Here's the code I used to benchmark

import std.stdio;
import std.range;
import std.algorithm;
import std.traits;
import std.container;
import std.meta;
import std.conv;
import std.datetime;

pragma(inline, false)
Target parse1(Target, Source)(ref Source s)
    if (isSomeChar!(ElementType!Source) &&
        isIntegral!Target && !is(Target == enum))
{
    static if (Target.sizeof < int.sizeof)
    {
        // smaller types are handled like integers
        auto v = .parse!(Select!(Target.min < 0, int, uint))(s);
        auto result = ()@trusted{ return cast(Target) v; }();
        if (result == v)
            return result;
        throw new ConvOverflowException("Overflow in integral conversion");
    }
    else
    {
        // int or larger types

        static if (Target.min < 0)
            bool sign = 0;
        else
            enum bool sign = 0;

        enum char maxLastDigit = Target.min < 0 ? 7 : 5;
        Unqual!(typeof(s.front)) c;

        if (s.empty)
            goto Lerr;

        static if (isAutodecodableString!Source)
            c = s[0];
        else
            c = s.front;

        static if (Target.min < 0)
        {
            switch (c)
            {
                case '-':
                    sign = true;
                    goto case '+';
                case '+':
                    static if (isAutodecodableString!Source)
                        s = s[1 .. $];
                    else
                        s.popFront();

                    if (s.empty)
                        goto Lerr;

                    static if (isAutodecodableString!Source)
                        c = s[0];
                    else
                        c = s.front;

                    break;

                default:
                    break;
            }
        }
        c -= '0';
        if (c <= 9)
        {
            Target v = cast(Target)c;

            static if (isAutodecodableString!Source)
                s = s[1 .. $];
            else
                s.popFront();

            while (!s.empty)
            {
                static if (isAutodecodableString!Source)
                    c = cast(typeof(c)) (s[0] - '0');
                else
                    c = cast(typeof(c)) (s.front - '0');

                if (c > 9)
                    break;

                if (v >= 0 && (v < Target.max/10 ||
                    (v == Target.max/10 && c <= maxLastDigit + sign)))
                {
                    // Note: `v` can become negative here in case of parsing
                    // the most negative value:
                    v = cast(Target) (v * 10 + c);

                    static if (isAutodecodableString!Source)
                        s = s[1 .. $];
                    else
                        s.popFront();
                }
                else
                    throw new ConvOverflowException("Overflow in integral conversion");
            }

            if (sign)
                v = -v;
            return v;
        }
Lerr:
        throw new Exception("a");
    }
}

__gshared char[] test = ['1', '2', '3', '4', '5', '6', '7', '8', '9'];

void main()
{
    enum n = 50_000_000;

    int result;
    StopWatch sw;
    Duration sum;
    TickDuration last = TickDuration.from!"seconds"(0);

    foreach(i; 0 .. n)
    {
        sw.start();

        //result = parse!int(test);
        result = parse1!int(test);

        sw.stop();

        test = ['1', '2', '3', '4', '5', '6', '7', '8', '9'];
        auto time = sw.peek() - last;
        last = sw.peek();

        sum += time.to!Duration();
    }

    writeln("Total time: ", sum);
}

@JackStouffer
Copy link
Contributor Author

JackStouffer commented Jul 11, 2016

One benefit I forgot to mention is that this will also speed up std.conv.to for strings as it just forwards the string to parse.

This will give D a leg-up in a lot of benchmarks because to!int is an extremely common call I see in those programs.

@JackStouffer
Copy link
Contributor Author

Ping.

This is a significant speed increase for a very common function.

@9il
Copy link
Member

9il commented Jul 22, 2016

Thanks! Please add or refer in PR description a unittest that cover UTF (also note about UTF). Also please replace Unqual!(typeof(s.front)) c; with uint c, this will speed up code for some rare cases

@9il
Copy link
Member

9il commented Jul 22, 2016

We need more such optimisations

@9il 9il self-assigned this Jul 22, 2016
@dnadlinger
Copy link
Contributor

please replace Unqual!(typeof(s.front)) c; with uint c, this will speed up code for some rare cases

Shouldn't this usually be covered pretty well by peephole optimization/register allocation?

@JackStouffer
Copy link
Contributor Author

Please add or refer in PR description a unittest that cover UTF (also note about UTF).

What exactly are you looking for? Parse already ignores everything that's not a number.

@JackStouffer
Copy link
Contributor Author

Ping @9il

@9il
Copy link
Member

9il commented Jul 28, 2016

Auto-merge toggled on

@9il 9il merged commit ca1d015 into dlang:master Jul 28, 2016
@wilzbach
Copy link
Contributor

@JackStouffer FYI attached issue 16451 to this PR ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants