Use alias assignment in staticMap #8039
Thanks for your pull request, @andralex!
Bugzilla references
Your PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.
Testing this PR locally
If you don't have a local development environment set up, you can use Digger to test this PR:
dub run digger -- build "master + phobos#8039"
I'm a little more skeptical here, because the old staticMap represents many rounds of profiling and optimization. Did you test this with long argument lists (like staticMap over 1000 items) or with a variety of lengths (like 1000 different lengths)? The old code is ugly, but it did a good job in all these cases.
Those are different implementations from this one, though. Do you still have the test rig so we can run it against this implementation?
@adamdruppe just tested on a large build (20 GB, 70 seconds) and saw no measurable change in time and a 0.5% reduction in memory consumption.
Very interesting. Well, 0.5% of 20 GB isn't nothing... not bad.
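For scale, taking the 20 GB build figure at face value, the 0.5% reduction works out to roughly:

```latex
0.005 \times 20\ \text{GB} = 0.1\ \text{GB} \approx 100\ \text{MB}
```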
@andralex @atilaneves
Red CI.
The style checker now passes; however, there were other red projects in Buildkite. I rebased this in the hope that that would be sufficient.
It seems that this PR breaks Buildkite.
Any idea what could be happening with the Buildkite failures?
Add this unittest and it will catch it. The fix is to use an index like this in the loop: It is because going over values is different from going over variables. Enums are weird in that they're a little of both. (BTW, if I were dictator I'm not sure I'd support this case. I'd either make it one or the other, or require user explicitness somehow. But meh, idk.)
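A minimal sketch of the index-based loop described above, assuming dmd 2.095 or later (for alias assignment). The template name `staticMapByIndex`, the enum `E`, and the predicate `isE` are hypothetical, and the check at the end is a stand-in, not the unittest referred to in the comment:

```d
import std.meta : AliasSeq;

// Hypothetical name, chosen to avoid clashing with std.meta.staticMap.
// Requires alias assignment (dmd 2.095 or later).
template staticMapByIndex(alias fun, args...)
{
    alias staticMapByIndex = AliasSeq!();
    // Loop over indices and pass args[i] straight to fun, instead of
    // binding each element to a loop symbol with `static foreach (arg; args)`.
    static foreach (i; 0 .. args.length)
        staticMapByIndex = AliasSeq!(staticMapByIndex, fun!(args[i]));
}

// Hypothetical check: an enum type and a predicate over its members.
enum E { a, b }
enum isE(alias x) = is(typeof(x) == E);

// Maps isE over two enum members and an int literal.
enum bool[] results = [staticMapByIndex!(isE, E.a, E.b, 1)];
```

Per the comment above, iterating over the elements themselves treats values and variables differently, with enums somewhere in between; indexing with `args[i]` forwards each element to `fun` as it appears in the argument list.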
I tested with the cases I was working with on #7884, and it took 3.85 seconds vs. 0.6 seconds with existing Phobos. I don't think this is faster.
That's a significant performance pessimization... we should confirm that...
I removed the auto-merge to allow exploration of the performance. Here is the code I used to generate the test cases: #7884 (comment)
@schveiguy @adamdruppe @andralex When I profiled AliasAssign way back when, it became clear that it gave memory and runtime savings for medium-sized N but detonated at high N. As such, we should probably have both and branch at some suitable N. That is, if we model Time = a(N - b)^2 + c, AliasAssign has a lower c but a higher a.
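Under that model, and assuming for illustration only that both implementations share the same offset b, the branch point is where the two cost curves cross. Writing a_A, c_A for AliasAssign and a_E, c_E for the existing implementation, with a_A > a_E and c_A < c_E:

```latex
a_A (N - b)^2 + c_A = a_E (N - b)^2 + c_E
\;\Longrightarrow\;
N^{*} = b + \sqrt{\frac{c_E - c_A}{a_A - a_E}}
```

For b <= N < N*, AliasAssign is cheaper; above N*, the existing implementation wins.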
For comparison, here is the performance on my build box with the PR's implementation:
```
steves@homebuild:~/staticmaptest$ /usr/bin/time dmd testgenstr.d testsm.d -version=fast
4.03user 0.81system 0:04.84elapsed 99%CPU (0avgtext+0avgdata 2613176maxresident)k
0inputs+3704outputs (0major+665674minor)pagefaults 0swaps
```
Oh wait! I was misreading that memory usage. It's actually about 10x more memory (2_613_176 vs. 278_796 KB maxresident), not slightly less as I first thought. So the PR is not good on this test case.
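GNU `/usr/bin/time` reports maxresident in kilobytes; the 278_796 figure comes from a comparison run not reproduced here. The ratio behind the "10x" is:

```latex
\frac{2\,613\,176\ \text{KB}}{278\,796\ \text{KB}} \approx 9.37
```

That is roughly 9.4x, which rounds to the 10x quoted.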
Thanks for the detective work, @schveiguy! 10x is "good" because it's egregious enough to presumably have an easily identifiable source. I'll take a look at the test case.
Upon investigating, it seems that this idiom of growing an
```d
template staticMap(alias fun, args...)
{
    // Append one mapped element per iteration via alias assignment.
    alias staticMap = AliasSeq!();
    static foreach (i; 0 .. args.length)
        staticMap = AliasSeq!(staticMap, fun!(args[i]));
}
```
I hypothesized that because I tried two related versions: one that iterates at the beginning of the creation process, and one that iterates at the end of the creation process:
```d
template staticMap(alias fun, args...)
{
    static if (args.length <= 1)
    {
        static if (args.length == 0)
            alias staticMap = AliasSeq!();
        else
            alias staticMap = fun!(args[0]); // not fun!args[0], which parses as (fun!args)[0]
    }
    else
    {
        // Build the first half iteratively, then recurse on the second half.
        alias staticMap = AliasSeq!();
        static foreach (i; 0 .. args.length / 2)
            staticMap = AliasSeq!(staticMap, fun!(args[i]));
        staticMap = AliasSeq!(staticMap, .staticMap!(fun, args[$ / 2 .. $]));
    }
}
```
The version above creates the first half of the result iteratively and then recurses and concatenates. Contrast with the following version, which first recurses to create the first half of the result and then iteratively appends to it:
```d
template staticMap(alias fun, args...)
{
    static if (args.length <= 1)
    {
        static if (args.length == 0)
            alias staticMap = AliasSeq!();
        else
            alias staticMap = fun!(args[0]); // not fun!args[0], which parses as (fun!args)[0]
    }
    else
    {
        // Recurse on the first half, then append the second half one element at a time.
        alias staticMap = .staticMap!(fun, args[0 .. $ / 2]);
        static foreach (i; args.length / 2 .. args.length)
            staticMap = AliasSeq!(staticMap, fun!(args[i]));
    }
}
```
The second version is TWICE as slow as the first and takes TWICE as much memory. This indicates quadratic behavior.
The version currently proposed is reminiscent of
```d
template staticMap(alias fun, args...)
{
    static if (args.length <= 8)
    {
        // Short lists: grow the result with alias assignment.
        alias staticMap = AliasSeq!();
        static foreach (i; 0 .. args.length)
            staticMap = AliasSeq!(staticMap, fun!(args[i]));
    }
    else
    {
        // Long lists: split in half and recurse on each half.
        alias staticMap = AliasSeq!(
            staticMap!(fun, args[0 .. $ / 2]),
            staticMap!(fun, args[$ / 2 .. $]));
    }
}
```
This version beats the current version on @schveiguy's toughest test as follows. Existing version: Proposed version:
Credit for the benchmarking and implementation effort goes to @schveiguy and @adr. Thank you!
Hate to say this, but isn't this the same as it was before the current version? That is, 215a494. We should find out why it was changed away from that in the first place; maybe there's a case that we're not considering. Ping @John-Colvin
Could the reason for the quadratic behavior be inadvertent
@schveiguy it's quite different, innit? Unrolling is done manually, there are multiple tests, etc. I wouldn't call the implementations comparable.
I wouldn't know, but it sounds plausible. Copying the entire list at every append step clearly leads to quadratic performance.
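To spell out that step: if each append builds a new sequence by copying the k elements accumulated so far, then growing a result to N elements one element at a time copies, in total,

```latex
\sum_{k=0}^{N-1} k = \frac{N(N-1)}{2} = O(N^2)
```

elements, so doubling N roughly quadruples the copying work.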
Well, what I mean is that the recursion threshold is the same as it was. If that has anything to do with why the current code was introduced in the first place, we should make sure we don't cause a regression. John changed the threshold for a reason.
@schveiguy so, to make sure I understand, there are three implementations being discussed:
These are distinct and have different tradeoffs, so it seems we'd need to benchmark them rather than rely on qualitative similarity assessments. Here's the code I tested the current PR with:
```d
// Test-case generator: prints a D module to stdout.
// The first command-line argument is the number of structs/functions to emit.
import std.stdio;
import std.conv;
import std.algorithm;
import std.range;

void main(string[] args)
{
    writeln("import testStaticMap;");
    writeln("void main() {}");
    writeln("alias pred(T) = const(T);");
    foreach (i; 0 .. args[1].to!int)
    {
        writefln("struct S%s {int v;}", i);
        version (triangular)
            // foo<i> maps pred over S0 .. S<i>, so list lengths grow from 1 to N.
            writefln("void foo%s(staticMap!(pred, %-(%s,%))) {}", i, iota(i + 1).map!(v => text("S", v)));
        else // rectangular
            // Every foo<i> maps pred over 500 copies of S<i>.
            writefln("void foo%s(staticMap!(pred, %-(%s,%))) {}", i, iota(500).map!(v => text("S", i)));
    }
}
```
With a threshold of 8, the code works about as well for triangular and better for rectangular. With 4 it gets worse, and with 16 it also gets worse. That's why I chose 8.
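To rerun the 4/8/16 comparison without editing the template each time, the cutoff could be lifted into a template parameter. This is only a sketch: `staticMapThreshold` is a hypothetical name, it mirrors the threshold-8 version above rather than any code in this PR, and it assumes dmd 2.095 or later for alias assignment.

```d
import std.meta : AliasSeq;

// Hypothetical variant of the threshold-based staticMap with the cutoff
// exposed as a template parameter. Requires threshold > 0; otherwise a
// one-element list would be split into [] and [x] forever.
template staticMapThreshold(size_t threshold, alias fun, args...)
if (threshold > 0)
{
    static if (args.length <= threshold)
    {
        // Short lists: grow the result with alias assignment.
        alias staticMapThreshold = AliasSeq!();
        static foreach (i; 0 .. args.length)
            staticMapThreshold = AliasSeq!(staticMapThreshold, fun!(args[i]));
    }
    else
    {
        // Long lists: split in half and recurse, so no alias-assignment
        // chain grows beyond `threshold` elements.
        alias staticMapThreshold = AliasSeq!(
            .staticMapThreshold!(threshold, fun, args[0 .. $ / 2]),
            .staticMapThreshold!(threshold, fun, args[$ / 2 .. $]));
    }
}

// Same predicate as the generated test modules.
alias pred(T) = const(T);

// Only the first argument changes between benchmark runs.
alias R4 = staticMapThreshold!(4, pred, int, char, long, short, byte);
alias R16 = staticMapThreshold!(16, pred, int, char, long, short, byte);
```

With 5 arguments, `R4` takes the split path and `R16` the alias-assignment path, yet both yield the same sequence; only compile-time cost should differ.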
Spoke to @WalterBright just now; he acknowledged that the growth pattern I propose we use this in the interim as an intermediate solution that is simple and effective.
Oh one more thing: it seems |
Absolutely. My worry is that we don't have the right benchmark that moved us to the 150 threshold in the first place. I'd like to see how the new version does on that. |
Idea: Instead of eagerly creating a new sub-node-array of elements in the Moreover, we could also realize that iteration using a depth-first tree traversal member function of named like
@andralex should we merge this?
BTW I was wrong here. |
This should be up to @atilaneves. My take:
Seems like a step in the right direction.
If we don't get an answer from @John-Colvin (who is likely really busy), I'd say merge it. He'll come back and figure it out if it really affects him. This doesn't change the API in any way, so we can always fix performance regressions later if needed.
Now that dlang/dmd#14332 has been merged, it might be worth re-evaluating the benchmarks and maybe switching to an alternative implementation of Does anybody have a reference to the benchmarks so I can rerun them? Current definition of Update: I guess we can utilize the benchmark at the top of dlang/dmd#14332.
There is no speedup in unittesting, but memory is reduced.