Use alias assignment in staticMap by andralex · Pull Request #8039 · dlang/phobos

andralex · 2021-05-03T22:23:45Z

There is no speedup in unittesting, but memory is reduced.

dlang-bot · 2021-05-03T22:23:46Z

Thanks for your pull request, @andralex!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#8039"

adamdruppe · 2021-05-04T20:47:53Z

I'm a little bit more skeptical here because that old staticMap represented a lot of rounds of profiling and optimization. Did you test this with long items (like staticMap over 1000 items?) or with a variety of items (like 1000 different lengths)?

The old code is ugly but it did a good job in all these cases.

maxhaton · 2021-05-04T21:36:27Z

There is no speedup in unittesting, but memory is reduced.

@andralex #7756

See also the # of elements vs time graphs I link to in the thread.

adamdruppe · 2021-05-04T21:43:11Z

Those are different implementations to this one though. Do you still have the test rig so we can run it again this implementation?

andralex · 2021-05-13T18:20:52Z

@adamdruppe just tested on a large build (20 GB, 70 seconds) and saw no measurable change in time and 0.5% memory consumption reduction.

adamdruppe · 2021-05-13T18:46:51Z

Very interesting. Well, 0.5% of 20 GB isn't nothing... not bad.

maxhaton · 2021-07-21T02:04:02Z

@andralex @atilaneves AliasAssign is in the spec now, merge?

atilaneves · 2021-08-02T17:36:30Z

@andralex @atilaneves AliasAssign is in the spec now, merge?

Red CI.

RazvanN7 · 2021-08-27T10:07:36Z

The style checker now passes, however, there were other red projects in buildkite. I rebased this in the hope that it would be sufficient.

RazvanN7 · 2021-08-27T11:58:45Z

It seems that this PR breaks buildkite.

andralex · 2021-10-18T12:05:13Z

Any idea what could be happening with the buildkite failures?

adamdruppe · 2021-10-18T12:44:14Z

Add this unittest

template id(alias what) {
        enum id = __traits(identifier, what);
}

enum A { a }

static assert(staticMap!(id, A.a) == AliasSeq!("a"));

and it will catch it.

fix is to use index like this in the loop:

template staticMap(alias fun, args...)
{
    alias staticMap = AliasSeq!();
    static foreach (idx, arg; args)
        staticMap = AliasSeq!(staticMap, fun!(args[idx])); // notice the index use
}

It is because going over values is different than going over variables. enums are weird in that they're a little of both. (btw if i was dictator im not sure i'd support this case. like either make it one or the other, require user explictness somehow. but meh idk)

schveiguy · 2021-10-18T15:51:22Z

I tested with the cases that I was working with on #7884 and it was 3.85 seconds vs. 0.6 seconds with existing Phobos. I don't think this is faster.

adamdruppe · 2021-10-18T15:53:10Z

That's a significant performance pessimization... we should confirm that...

schveiguy · 2021-10-18T15:56:33Z

I removed the auto-merge to allow exploration of the performance. Here is the code I used to generate the test cases: #7884 (comment)

maxhaton · 2021-10-18T16:01:10Z

@schveiguy @adamdruppe @andralex

When I profiled AliasAssign way back when, it became clear that there were memory and runtime savings for a medium sized N, but detonated at high N. As such we should probably have both but branch at some suitable N.

i.e. if we have Time = a(N - b)^2 + c, AliasAssign has a lower c but a higher a

schveiguy · 2021-10-18T16:12:23Z

For comparison, here is the performance on my build box with the PR's implementation:

steves@homebuild:~/staticmaptest$ /usr/bin/time dmd testgenstr.d testsm.d -version=fast
4.03user 0.81system 0:04.84elapsed 99%CPU (0avgtext+0avgdata 2613176maxresident)k
0inputs+3704outputs (0major+665674minor)pagefaults 0swaps

schveiguy · 2021-10-18T16:15:47Z

Oh wait! I was misreading that memory usage. It's actually 10x more memory (2_613_176 vs. 278_796) not slightly less as I first thought. So the PR is not good on this test case.

andralex · 2021-10-18T16:27:50Z

Thanks for the detective work, @schveiguy! 10x is "good" because it's egregious enough to presumably have an easily identifiable source. I'll take a look at the test case.

andralex · 2021-10-18T19:43:03Z

Upon investigating, it seems that this idiom of growing an AliasSeq exhibits likely quadratic memory consumption and speed:

template staticMap(alias fun, args...)
{
    alias staticMap = AliasSeq!();
    static foreach (i; 0 .. args.length)
        staticMap = AliasSeq!(staticMap, fun!(args[i]));
}

I hypothesized that because I tried two related versions. one that iterates at the beginning of the creation process, and one that iterates at the end of the creation process:

template staticMap(alias fun, args...)
{
    static if (args.length <= 1)
    {
        static if (args.length == 0)
            alias staticMap = AliasSeq!();
        else
            alias staticMap = fun!args[0];
    }
    else
    {
        alias staticMap = AliasSeq!();
        static foreach (i; 0 .. args.length / 2)
            staticMap = AliasSeq!(staticMap, fun!(args[i]));
        staticMap = AliasSeq!(staticMap, .staticMap!(fun, args[$ / 2 .. $]));
    }
}

The version above creates the first half of the result iteratively and then recurses and concatenates. Contrast with the following version, which first recurses to create the first half of the result, and then iteratively adds to it:

template staticMap(alias fun, args...)
{
    static if (args.length <= 1)
    {
        static if (args.length == 0)
            alias staticMap = AliasSeq!();
        else
            alias staticMap = fun!args[0];
    }
    else
    {
        alias staticMap = .staticMap!(fun, args[0 .. $ / 2]);
        static foreach (i; args.length / 2 .. args.length)
            staticMap = AliasSeq!(staticMap, fun!(args[i]));
    }
}

The second version is TWICE as slow as the first, and takes TWICE as much memory. This indicates quadratic behavior.

andralex · 2021-10-18T19:45:57Z

The version currently proposed is reminiscent of quickSort, which uses a quadratic insertion algorithm for small arrays and otherwise recurses:

    template staticMap(alias fun, args...)
    {
        static if (args.length <= 8)
        {
            alias staticMap = AliasSeq!();
            static foreach (i; 0 .. args.length)
                staticMap = AliasSeq!(staticMap, fun!(args[i]));
        }
        else
        {
            alias staticMap = AliasSeq!(staticMap!(fun, args[0 .. $ / 2]), staticMap!(fun, args[$ / 2 .. $]));
        }
    }

This version beats the current version on @schveiguy's toughest test as follows. Existing version:

2.16user 0.29system 0:03.04elapsed 80%CPU (0avgtext+0avgdata 406452maxresident)k
45120inputs+4760outputs (821major+104640minor)pagefaults 0swaps

Proposed version:

2.12user 0.35system 0:03.04elapsed 81%CPU (0avgtext+0avgdata 345428maxresident)k
45120inputs+4760outputs (821major+89439minor)pagefaults 0swaps

andralex · 2021-10-18T19:47:25Z

Credit for the benchmarking and implementation effort goes to @schveiguy and @adr. Thank you!

schveiguy · 2021-10-18T20:17:10Z

Hate to say this, but isn't this the same as it was before the current version? That is, 215a494

We should find out why it was changed away from that in the first place. Maybe there's a case that we're not considering. Ping @John-Colvin

nordlow · 2021-10-18T20:18:51Z

Could the reason for the quadratic behavior be inadvertent syntaxCopying of the arguments to the AliasSeq-instance?

andralex · 2021-10-18T20:37:28Z

@schveiguy it's quite different innit? Unrolling is done manually, there are multiple tests etc. I wouldn't call the implementations comparable.

andralex · 2021-10-18T20:39:16Z

Could the reason for the quadratic behavior be inadvertent syntaxCopying of the arguments to the AliasSeq-instance?

I wouldn't know, but sounds plausible. Copying the entire list at every append step clearly leads to quadratic performance.

schveiguy · 2021-10-18T20:40:59Z

Well, what I mean is that the recursion threshold is the same as it was. If that has anything to do with why the current code was introduced in the first place, we should make sure we don't cause a regression. John changed the threshold for a reason.

andralex · 2021-10-18T22:31:00Z

@schveiguy so to make sure I understand there are three implementations being discussed:

The existing implementation, using string mixins and changes to divide & conquer past a threshold of 150
The implementation before that at 215a494, which uses a cascade of static if to implement manual unrolling up to 8 and divide & conquer beyond that
The implementation proposed in this PR, which uses static foreach for up to 8 elements and divide & conquer past that.

These are distinct and have different tradeoffs, so it seems we'd need to use benchmarking to assess them as opposed to qualitative similarity assessments.

Here's the code I tested the current PR with:

import std.stdio;
import std.conv;
import std.algorithm;
import std.range;

void main(string[] args)
{
    writeln("import testStaticMap;");
    writeln("void main() {}");
    writeln("alias pred(T) = const(T);");
    foreach(i; 0 .. args[1].to!int)
    {
        writefln("struct S%s {int v;}", i);
        version(triangular)
            writefln("void foo%s(staticMap!(pred, %-(%s,%))) {}", i, iota(i+1).map!(v => text("S", v)));
        else // rectangular
            writefln("void foo%s(staticMap!(pred, %-(%s,%))) {}", i, iota(500).map!(v => text("S", i)));
    }
}

With threshold 8, the code works about as good for triangular and better for rectangular. If I choose 4, it gets less good. If I choose 16, it becomes less good. That's why I chose 8.

andralex · 2021-10-18T22:36:36Z

Spoke to @WalterBright just now, he acknowledged that the growth pattern x = AliasSeq!(x, one_more_element) is quadratic. He has an idea on how to detect that pattern and optimize it so as to avoid copying and use exponential growth. However, he's busy with the C compiler so @nordlow if this is something you could look at that'd be positively awesome.

I propose we use this in the interim as an intermediate solution that is simple and effective.

andralex · 2021-10-18T23:16:13Z

Oh one more thing: it seems std.meta.Reverse can't be helped - it's quadratic right now and it will stay quadratic. We need to think of an implementation a la quicksort - reverse small lists by hand, larger by divide & conquer.

schveiguy · 2021-10-19T01:35:22Z

These are distinct and have different tradeoffs, so it seems we'd need to use benchmarking to assess them as opposed to qualitative similarity assessments.

Absolutely. My worry is that we don't have the right benchmark that moved us to the 150 threshold in the first place. I'd like to see how the new version does on that.

nordlow · 2021-10-19T07:02:12Z

Spoke to @WalterBright just now, he acknowledged that the growth pattern x = AliasSeq!(x, one_more_element) is quadratic. He has an idea on how to detect that pattern and optimize it so as to avoid copying and use exponential growth. However, he's busy with the C compiler so @nordlow if this is something you could look at that'd be positively awesome.

I propose we use this in the interim as an intermediate solution that is simple and effective.

Idea: Instead of eagerly creating a new sub-node-array of elements in the TemplateInstance.tiargs we should instead store it as a tree a la Lisp and have that tree be if needed be converted lazily to an array storage when the result of AliasSeq is actually read/iterated over.

Moreover, we could also make that iteration be realized using a depth-first tree traversal member function of named like AliasSeq.byArgument. I believe there's now a golden opportunity to specialize the storage of AliasSeq to a sub-class of a new Instance that TemplateInstance also inherits from. Instead of relying on the storage of the far too general TemplateInstance which an AliasSeq-instance is initially represented as. FYI, @maxhaton.

RazvanN7 · 2021-10-22T14:58:48Z

@andralex should we merge this?

andralex · 2021-10-22T15:41:03Z

Oh one more thing: it seems std.meta.Reverse can't be helped - it's quadratic right now and it will stay quadratic. We need to think of an implementation a la quicksort - reverse small lists by hand, larger by divide & conquer.

BTW I was wrong here. Reverse iterates its parameter backwards and builds the result by appending. So improving appending will improve Reverse as well. See #8301

andralex · 2021-10-22T15:45:24Z

@andralex should we merge this?

This should be up to @atilaneves. My take:

we get rid of the mixin trickery
we have an implementation that is good in practice and also theoretically (quasilinear, O(n log n) copies)
we have versioned code with breadcrumbs to follow for subsequent work

Seems like a step in the right direction.

schveiguy · 2021-10-26T21:23:41Z

should we merge this?

If we don't get an answer from @John-Colvin (who is likely really busy), I'd say merge it. He'll come back and figure it out if it really affects him. This doesn't change the API in any way, so we can always fix performance regressions later if needed.

std/meta.d

nordlow · 2022-08-07T08:58:08Z

Now that dlang/dmd#14332 has been merged it might be worth reevaluating the benchmarks and maybe switching to an alternative implementation of staticMap such as the one activated by version(__staticMap_simple_but_slow). Also note that underlying issue associated with version(__staticMap_simplest_but_buggy) has been fixed.

Does anybody have a reference to the benchmarks so I can rerun them?

Current definition of staticMap is at https://github.com/dlang/phobos/blob/master/std/meta.d#L628.

Update: I guess we can utilize the benchmark at the top of dlang/dmd#14332.

andralex requested a review from MetaLang as a code owner May 3, 2021 22:23

n8sh approved these changes May 4, 2021

View reviewed changes

RazvanN7 mentioned this pull request May 6, 2021

Rewrite std.meta.staticMap using AliasAssign #7756

Closed

maxhaton added the Review:Atila Neves label Jul 21, 2021

RazvanN7 force-pushed the staticMap branch from 582c617 to 377f671 Compare August 27, 2021 10:06

RazvanN7 added the Merge:auto-merge label Aug 27, 2021

RazvanN7 removed the Merge:auto-merge label Aug 27, 2021

pbackus mentioned this pull request Oct 4, 2021

Refactor std.traits.EnumMembers #8260

Merged

andralex force-pushed the staticMap branch from 377f671 to 47d4d66 Compare October 18, 2021 04:27

andralex force-pushed the staticMap branch 2 times, most recently from ce65ed5 to deddb72 Compare October 18, 2021 14:43

RazvanN7 approved these changes Oct 18, 2021

View reviewed changes

RazvanN7 added the Merge:auto-merge label Oct 18, 2021

schveiguy removed the Merge:auto-merge label Oct 18, 2021

andralex reopened this Oct 18, 2021

andralex force-pushed the staticMap branch 2 times, most recently from 4546342 to 7880c10 Compare October 18, 2021 19:38

Use alias assignment in staticMap

7880c10

andralex mentioned this pull request Oct 21, 2021

Replace quadratic Reverse with quasilinear #8301

Closed

schveiguy reviewed Oct 26, 2021

View reviewed changes

std/meta.d Outdated Show resolved Hide resolved

typo

e04ece5

andralex added the Merge:auto-merge-squash label Oct 26, 2021

dlang-bot merged commit b44c0c6 into dlang:master Oct 26, 2021

Uh oh!

Conversation

andralex commented May 3, 2021

Uh oh!

dlang-bot commented May 3, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bugzilla references

Testing this PR locally

Uh oh!

adamdruppe commented May 4, 2021

Uh oh!

maxhaton commented May 4, 2021

Uh oh!

adamdruppe commented May 4, 2021

Uh oh!

andralex commented May 13, 2021

Uh oh!

adamdruppe commented May 13, 2021

Uh oh!

maxhaton commented Jul 21, 2021

Uh oh!

atilaneves commented Aug 2, 2021

Uh oh!

RazvanN7 commented Aug 27, 2021

Uh oh!

RazvanN7 commented Aug 27, 2021

Uh oh!

andralex commented Oct 18, 2021

Uh oh!

adamdruppe commented Oct 18, 2021

Uh oh!

schveiguy commented Oct 18, 2021

Uh oh!

adamdruppe commented Oct 18, 2021

Uh oh!

schveiguy commented Oct 18, 2021

Uh oh!

maxhaton commented Oct 18, 2021

Uh oh!

schveiguy commented Oct 18, 2021

Uh oh!

schveiguy commented Oct 18, 2021

Uh oh!

andralex commented Oct 18, 2021

Uh oh!

andralex commented Oct 18, 2021

Uh oh!

andralex commented Oct 18, 2021

Uh oh!

andralex commented Oct 18, 2021

Uh oh!

schveiguy commented Oct 18, 2021

Uh oh!

nordlow commented Oct 18, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

andralex commented Oct 18, 2021

Uh oh!

andralex commented Oct 18, 2021

Uh oh!

schveiguy commented Oct 18, 2021

Uh oh!

andralex commented Oct 18, 2021

Uh oh!

andralex commented Oct 18, 2021

Uh oh!

andralex commented Oct 18, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

schveiguy commented Oct 19, 2021

Uh oh!

nordlow commented Oct 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RazvanN7 commented Oct 22, 2021

Uh oh!

andralex commented Oct 22, 2021

Uh oh!

andralex commented Oct 22, 2021

Uh oh!

dlang-bot commented May 3, 2021 •

edited

Loading

nordlow commented Oct 18, 2021 •

edited

Loading

andralex commented Oct 18, 2021 •

edited

Loading

nordlow commented Oct 19, 2021 •

edited

Loading

nordlow commented Aug 7, 2022 •

edited

Loading