Skip to content

Use alias assignment in staticMap#8039

Merged
dlang-bot merged 2 commits intodlang:masterfrom
andralex:staticMap
Oct 26, 2021
Merged

Use alias assignment in staticMap#8039
dlang-bot merged 2 commits intodlang:masterfrom
andralex:staticMap

Conversation

@andralex
Copy link
Member

@andralex andralex commented May 3, 2021

There is no speedup in unittesting, but memory is reduced.

@andralex andralex requested a review from MetaLang as a code owner May 3, 2021 22:23
@dlang-bot
Copy link
Contributor

dlang-bot commented May 3, 2021

Thanks for your pull request, @andralex!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#8039"

@adamdruppe
Copy link
Contributor

I'm a little bit more skeptical here because that old staticMap represented a lot of rounds of profiling and optimization. Did you test this with long items (like staticMap over 1000 items?) or with a variety of items (like 1000 different lengths)?

The old code is ugly but it did a good job in all these cases.

@maxhaton
Copy link
Member

maxhaton commented May 4, 2021

There is no speedup in unittesting, but memory is reduced.

@andralex #7756

See also the # of elements vs time graphs I link to in the thread.

@adamdruppe
Copy link
Contributor

Those are different implementations to this one though. Do you still have the test rig so we can run it again this implementation?

@andralex
Copy link
Member Author

@adamdruppe just tested on a large build (20 GB, 70 seconds) and saw no measurable change in time and 0.5% memory consumption reduction.

@adamdruppe
Copy link
Contributor

Very interesting. Well, 0.5% of 20 GB isn't nothing... not bad.

@maxhaton
Copy link
Member

@andralex @atilaneves AliasAssign is in the spec now, merge?

@atilaneves
Copy link
Contributor

@andralex @atilaneves AliasAssign is in the spec now, merge?

Red CI.

@RazvanN7
Copy link
Collaborator

The style checker now passes, however, there were other red projects in buildkite. I rebased this in the hope that it would be sufficient.

@RazvanN7
Copy link
Collaborator

It seems that this PR breaks buildkite.

@andralex
Copy link
Member Author

Any idea what could be happening with the buildkite failures?

@adamdruppe
Copy link
Contributor

Add this unittest

template id(alias what) {
        enum id = __traits(identifier, what);
}

enum A { a }

static assert(staticMap!(id, A.a) == AliasSeq!("a"));

and it will catch it.

fix is to use index like this in the loop:

template staticMap(alias fun, args...)
{
    alias staticMap = AliasSeq!();
    static foreach (idx, arg; args)
        staticMap = AliasSeq!(staticMap, fun!(args[idx])); // notice the index use
}

It is because going over values is different than going over variables. enums are weird in that they're a little of both. (btw if i was dictator im not sure i'd support this case. like either make it one or the other, require user explictness somehow. but meh idk)

@andralex andralex force-pushed the staticMap branch 2 times, most recently from ce65ed5 to deddb72 Compare October 18, 2021 14:43
@schveiguy
Copy link
Member

I tested with the cases that I was working with on #7884 and it was 3.85 seconds vs. 0.6 seconds with existing Phobos. I don't think this is faster.

@adamdruppe
Copy link
Contributor

That's a significant performance pessimization... we should confirm that...

@schveiguy
Copy link
Member

I removed the auto-merge to allow exploration of the performance. Here is the code I used to generate the test cases: #7884 (comment)

@maxhaton
Copy link
Member

@schveiguy @adamdruppe @andralex

When I profiled AliasAssign way back when, it became clear that there were memory and runtime savings for a medium sized N, but detonated at high N. As such we should probably have both but branch at some suitable N.

i.e. if we have Time = a(N - b)^2 + c, AliasAssign has a lower c but a higher a

@andralex andralex reopened this Oct 18, 2021
@schveiguy
Copy link
Member

For comparison, here is the performance on my build box with the PR's implementation:

steves@homebuild:~/staticmaptest$ /usr/bin/time dmd testgenstr.d testsm.d -version=fast
4.03user 0.81system 0:04.84elapsed 99%CPU (0avgtext+0avgdata 2613176maxresident)k
0inputs+3704outputs (0major+665674minor)pagefaults 0swaps

@schveiguy
Copy link
Member

Oh wait! I was misreading that memory usage. It's actually 10x more memory (2_613_176 vs. 278_796) not slightly less as I first thought. So the PR is not good on this test case.

@andralex
Copy link
Member Author

Thanks for the detective work, @schveiguy! 10x is "good" because it's egregious enough to presumably have an easily identifiable source. I'll take a look at the test case.

@andralex andralex force-pushed the staticMap branch 2 times, most recently from 4546342 to 7880c10 Compare October 18, 2021 19:38
@andralex
Copy link
Member Author

Upon investigating, it seems that this idiom of growing an AliasSeq exhibits likely quadratic memory consumption and speed:

template staticMap(alias fun, args...)
{
    alias staticMap = AliasSeq!();
    static foreach (i; 0 .. args.length)
        staticMap = AliasSeq!(staticMap, fun!(args[i]));
}

I hypothesized that because I tried two related versions. one that iterates at the beginning of the creation process, and one that iterates at the end of the creation process:

template staticMap(alias fun, args...)
{
    static if (args.length <= 1)
    {
        static if (args.length == 0)
            alias staticMap = AliasSeq!();
        else
            alias staticMap = fun!args[0];
    }
    else
    {
        alias staticMap = AliasSeq!();
        static foreach (i; 0 .. args.length / 2)
            staticMap = AliasSeq!(staticMap, fun!(args[i]));
        staticMap = AliasSeq!(staticMap, .staticMap!(fun, args[$ / 2 .. $]));
    }
}

The version above creates the first half of the result iteratively and then recurses and concatenates. Contrast with the following version, which first recurses to create the first half of the result, and then iteratively adds to it:

template staticMap(alias fun, args...)
{
    static if (args.length <= 1)
    {
        static if (args.length == 0)
            alias staticMap = AliasSeq!();
        else
            alias staticMap = fun!args[0];
    }
    else
    {
        alias staticMap = .staticMap!(fun, args[0 .. $ / 2]);
        static foreach (i; args.length / 2 .. args.length)
            staticMap = AliasSeq!(staticMap, fun!(args[i]));
    }
}

The second version is TWICE as slow as the first, and takes TWICE as much memory. This indicates quadratic behavior.

@andralex
Copy link
Member Author

The version currently proposed is reminiscent of quickSort, which uses a quadratic insertion algorithm for small arrays and otherwise recurses:

    template staticMap(alias fun, args...)
    {
        static if (args.length <= 8)
        {
            alias staticMap = AliasSeq!();
            static foreach (i; 0 .. args.length)
                staticMap = AliasSeq!(staticMap, fun!(args[i]));
        }
        else
        {
            alias staticMap = AliasSeq!(staticMap!(fun, args[0 .. $ / 2]), staticMap!(fun, args[$ / 2 .. $]));
        }
    }

This version beats the current version on @schveiguy's toughest test as follows. Existing version:

2.16user 0.29system 0:03.04elapsed 80%CPU (0avgtext+0avgdata 406452maxresident)k
45120inputs+4760outputs (821major+104640minor)pagefaults 0swaps

Proposed version:

2.12user 0.35system 0:03.04elapsed 81%CPU (0avgtext+0avgdata 345428maxresident)k
45120inputs+4760outputs (821major+89439minor)pagefaults 0swaps

@andralex
Copy link
Member Author

Credit for the benchmarking and implementation effort goes to @schveiguy and @adr. Thank you!

@schveiguy
Copy link
Member

Hate to say this, but isn't this the same as it was before the current version? That is, 215a494

We should find out why it was changed away from that in the first place. Maybe there's a case that we're not considering. Ping @John-Colvin

@nordlow
Copy link
Contributor

nordlow commented Oct 18, 2021

Could the reason for the quadratic behavior be inadvertent syntaxCopying of the arguments to the AliasSeq-instance?

@andralex
Copy link
Member Author

@schveiguy it's quite different innit? Unrolling is done manually, there are multiple tests etc. I wouldn't call the implementations comparable.

@andralex
Copy link
Member Author

Could the reason for the quadratic behavior be inadvertent syntaxCopying of the arguments to the AliasSeq-instance?

I wouldn't know, but sounds plausible. Copying the entire list at every append step clearly leads to quadratic performance.

@schveiguy
Copy link
Member

Well, what I mean is that the recursion threshold is the same as it was. If that has anything to do with why the current code was introduced in the first place, we should make sure we don't cause a regression. John changed the threshold for a reason.

@andralex
Copy link
Member Author

@schveiguy so to make sure I understand there are three implementations being discussed:

  • The existing implementation, using string mixins and changes to divide & conquer past a threshold of 150
  • The implementation before that at 215a494, which uses a cascade of static if to implement manual unrolling up to 8 and divide & conquer beyond that
  • The implementation proposed in this PR, which uses static foreach for up to 8 elements and divide & conquer past that.

These are distinct and have different tradeoffs, so it seems we'd need to use benchmarking to assess them as opposed to qualitative similarity assessments.

Here's the code I tested the current PR with:

import std.stdio;
import std.conv;
import std.algorithm;
import std.range;

void main(string[] args)
{
    writeln("import testStaticMap;");
    writeln("void main() {}");
    writeln("alias pred(T) = const(T);");
    foreach(i; 0 .. args[1].to!int)
    {
        writefln("struct S%s {int v;}", i);
        version(triangular)
            writefln("void foo%s(staticMap!(pred, %-(%s,%))) {}", i, iota(i+1).map!(v => text("S", v)));
        else // rectangular
            writefln("void foo%s(staticMap!(pred, %-(%s,%))) {}", i, iota(500).map!(v => text("S", i)));
    }
}

With threshold 8, the code works about as good for triangular and better for rectangular. If I choose 4, it gets less good. If I choose 16, it becomes less good. That's why I chose 8.

@andralex
Copy link
Member Author

Spoke to @WalterBright just now, he acknowledged that the growth pattern x = AliasSeq!(x, one_more_element) is quadratic. He has an idea on how to detect that pattern and optimize it so as to avoid copying and use exponential growth. However, he's busy with the C compiler so @nordlow if this is something you could look at that'd be positively awesome.

I propose we use this in the interim as an intermediate solution that is simple and effective.

@andralex
Copy link
Member Author

andralex commented Oct 18, 2021

Oh one more thing: it seems std.meta.Reverse can't be helped - it's quadratic right now and it will stay quadratic. We need to think of an implementation a la quicksort - reverse small lists by hand, larger by divide & conquer.

@schveiguy
Copy link
Member

These are distinct and have different tradeoffs, so it seems we'd need to use benchmarking to assess them as opposed to qualitative similarity assessments.

Absolutely. My worry is that we don't have the right benchmark that moved us to the 150 threshold in the first place. I'd like to see how the new version does on that.

@nordlow
Copy link
Contributor

nordlow commented Oct 19, 2021

Spoke to @WalterBright just now, he acknowledged that the growth pattern x = AliasSeq!(x, one_more_element) is quadratic. He has an idea on how to detect that pattern and optimize it so as to avoid copying and use exponential growth. However, he's busy with the C compiler so @nordlow if this is something you could look at that'd be positively awesome.

I propose we use this in the interim as an intermediate solution that is simple and effective.

Idea: Instead of eagerly creating a new sub-node-array of elements in the TemplateInstance.tiargs we should instead store it as a tree a la Lisp and have that tree be if needed be converted lazily to an array storage when the result of AliasSeq is actually read/iterated over.

Moreover, we could also make that iteration be realized using a depth-first tree traversal member function of named like AliasSeq.byArgument. I believe there's now a golden opportunity to specialize the storage of AliasSeq to a sub-class of a new Instance that TemplateInstance also inherits from. Instead of relying on the storage of the far too general TemplateInstance which an AliasSeq-instance is initially represented as. FYI, @maxhaton.

@RazvanN7
Copy link
Collaborator

@andralex should we merge this?

@andralex
Copy link
Member Author

Oh one more thing: it seems std.meta.Reverse can't be helped - it's quadratic right now and it will stay quadratic. We need to think of an implementation a la quicksort - reverse small lists by hand, larger by divide & conquer.

BTW I was wrong here. Reverse iterates its parameter backwards and builds the result by appending. So improving appending will improve Reverse as well. See #8301

@andralex
Copy link
Member Author

@andralex should we merge this?

This should be up to @atilaneves. My take:

  • we get rid of the mixin trickery
  • we have an implementation that is good in practice and also theoretically (quasilinear, O(n log n) copies)
  • we have versioned code with breadcrumbs to follow for subsequent work

Seems like a step in the right direction.

@schveiguy
Copy link
Member

should we merge this?

If we don't get an answer from @John-Colvin (who is likely really busy), I'd say merge it. He'll come back and figure it out if it really affects him. This doesn't change the API in any way, so we can always fix performance regressions later if needed.

@dlang-bot dlang-bot merged commit b44c0c6 into dlang:master Oct 26, 2021
@nordlow
Copy link
Contributor

nordlow commented Aug 7, 2022

Now that dlang/dmd#14332 has been merged it might be worth reevaluating the benchmarks and maybe switching to an alternative implementation of staticMap such as the one activated by version(__staticMap_simple_but_slow). Also note that underlying issue associated with version(__staticMap_simplest_but_buggy) has been fixed.

Does anybody have a reference to the benchmarks so I can rerun them?

Current definition of staticMap is at https://github.com/dlang/phobos/blob/master/std/meta.d#L628.

Update: I guess we can utilize the benchmark at the top of dlang/dmd#14332.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants

Comments