Skip to content

Improve static map precalculated basic cases.#7884

Closed
schveiguy wants to merge 1 commit intodlang:masterfrom
schveiguy:betterstaticmap
Closed

Improve static map precalculated basic cases.#7884
schveiguy wants to merge 1 commit intodlang:masterfrom
schveiguy:betterstaticmap

Conversation

@schveiguy
Copy link
Member

Instead of storing individual mixins for all the cases from 0 to n, store a single string with all the Args from 0 to n, and then just slice off the ones needed.

This allows storing more elements with less space.

I tested with just upping the threshold to 5008 by default instead of 150, and I got some weird results. On code that did subsequent staticMap on 1 to 500 items, the compiler used 50% more memory and took slightly longer to use strictly the mixin instead of recursion. So I left the threshold of 150 by default, and people can use the other overload if they want to experiment with using less/no recursion. I also did not forward from one to the other to avoid adding another defaulted compile-time parameter to existing uses.

@schveiguy schveiguy requested a review from MetaLang as a code owner March 18, 2021 16:06
@dlang-bot
Copy link
Contributor

Thanks for your pull request, @schveiguy!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#7884"

@schveiguy
Copy link
Member Author

ping @John-Colvin since Adam told me you were the one who found the 150 threshold to be superior. Maybe this gives some possibility of improvement for your test case? My test case was pretty artificial.

and less memory, we can store 50x as many linear expansions for
staticMap. Provide a mechanism to set the threshold for avoiding
recursion.
@n8sh
Copy link
Member

n8sh commented Mar 21, 2021

I don't think an explicit expandFactor parameter is the way to go for a public-facing version of staticMap. It seems like the only reason to set a custom expandFactor is if one is developing Phobos itself and searching for a good default setting.

@schveiguy
Copy link
Member Author

I don't think an explicit expandFactor parameter is the way to go for a public-facing version of staticMap. It seems like the only reason to set a custom expandFactor is if one is developing Phobos itself and searching for a good default setting.

I don't exactly agree. Even with a "good default" there may be cases where the default isn't as good as you can get.

Honestly, when I started doing this, I expected to just set the default to 5000 and have good performance. But the testing I did shows that there are cases where it consumes way more memory (because of the caching system in DMD). It seems a waste to be able to store 5000 elements in the same space as 150 elements, and not provide a mechanism to use it.

So I put it in there as a mechanism to allow people to play with it, maybe to find a case where it makes sense.

Perhaps it needs to be behind a version? This is all static parameters, so we can do that reasonably easy.

@maxhaton
Copy link
Member

I think the solution is to have a dumb (some sensible default) staticMap which is an instantiation of a encapsulated parameterized implementation, then expose both properly but separately - the latter could probably have a long enough name to suggest it's not meant to be thrown around without caution?.

@schveiguy
Copy link
Member Author

staticMap is pretty simple. The actual implementation is only 4 lines because of the requirement to recurse.

Maybe I can turn the staticMapBasicCases into a public string, and then people can build what they want with that?

@RazvanN7
Copy link
Collaborator

RazvanN7 commented Apr 7, 2021

I don't think an explicit expandFactor parameter is the way to go for a public-facing version of staticMap. It seems like the only reason to set a custom expandFactor is if one is developing Phobos itself and searching for a good default setting.

Why not? You can choose to not use it and go with the default.

Maybe I can turn the staticMapBasicCases into a public string, and then people can build what they want with that?

I think that we can merge this as is and add this if folks require it.

@RazvanN7
Copy link
Collaborator

Does anyone have any objections to merging this?

@RazvanN7 RazvanN7 added the Merge:72h no objection -> merge The PR will be merged if there are no objections raised. label Apr 15, 2021
@RazvanN7 RazvanN7 closed this Apr 16, 2021
@RazvanN7 RazvanN7 reopened this Apr 16, 2021
@RazvanN7
Copy link
Collaborator

RazvanN7 commented Apr 16, 2021

@CyberShadow any way in which we can restart the documentation tester when it timeouts (besides re-pushing or closing and reopening the PR)?

@n8sh
Copy link
Member

n8sh commented Apr 16, 2021

I don't think an explicit expandFactor parameter is the way to go for a public-facing version of staticMap. It seems like the only reason to set a custom expandFactor is if one is developing Phobos itself and searching for a good default setting.

Why not? You can choose to not use it and go with the default.

My objections to adding a marginally useful or non-useful template parameter are:

  • Once it has been added it will take years to remove (if ever).
  • It increases the cognitive load of using the API.

The reasons I am not yet convinced this template parameter is useful are:

  • It does not affect the result, only evaluation time and/or memory usage of compilation.
  • No guidance is provided as to when a user might benefit from setting the new parameter to a non-default value.
  • No places where the tuning parameter could be profitably used were identified in uses of staticMap in Phobos.
  • No benchmarks were shared demonstrating that it is ever beneficial to adjust the parameter or if so by how much.

@RazvanN7 RazvanN7 removed the Merge:72h no objection -> merge The PR will be merged if there are no objections raised. label Apr 16, 2021
@RazvanN7
Copy link
Collaborator

@n8sh Those are all valid points. @schveiguy Can you provide any benchmarks or places in phobos that would benefit from this addition?

@RazvanN7
Copy link
Collaborator

ping @schveiguy

@schveiguy
Copy link
Member Author

Sorry, this got lost in my inbox. My code to generate the test code for my use cases was:

import std.stdio;
import std.conv;
import std.algorithm;
import std.range;

void main(string[] args)
{
    writeln("import std.meta;");
    writeln("void main() {}");
    writeln("alias pred(T) = const(T);");
    foreach(i; 0 .. args[1].to!int)
    {
        writefln("struct S%s {int v;}", i);
        writefln("void foo%s(staticMap!(pred, %-(%s,%))) {}", i, iota(i+1).map!(v => text("S", v)));
    }
}

This should generate a given number of static map cases which should trigger a lot of caching (when recursion is used). In my tests, the version in this PR used slightly less compile-time than the current phobos code, and less memory. As I said, upping the threshold resulted in slower compile times, probably due to the lack of caching (but I did not prove this out). In real-world cases, I don't know how much this would affect compile times. I would expect less memory usage, but I'm not 100% sure.

@schveiguy
Copy link
Member Author

The reasons I am not yet convinced this template parameter is useful are:

  • It does not affect the result, only evaluation time and/or memory usage of compilation.

True, but that is a significant problem to solve with these kinds of meta templates. Compile times matter to people. In fact, the current code was used to significantly speed up compile times, and so was considered a useful change, even though it didn't change the result.

  • No guidance is provided as to when a user might benefit from setting the new parameter to a non-default value.

My original intention was that "always" would be the answer. But testing shows that it's not always the case. My original intention was not to even have the parameter, as I assumed less CTFE = faster/better. As it stands, I don't want to make things slower, but I also don't want to limit the possibility of people finding places where code can be sped up.

An entirely reasonable compromise would be to limit the cases to 150 without a parameter, and just avoid storing so much string data in the compiler.

  • No places where the tuning parameter could be profitably used were identified in uses of staticMap in Phobos.

staticMap with >150 elements is probably not common anyway.

  • No benchmarks were shared demonstrating that it is ever beneficial to adjust the parameter or if so by how much.

This is true, I don't know of cases where it would speed up compilation. As I said, the original expectation I have was that it would always be faster/better. There are a lot of factors interacting together. My hope was that providing the tuning parameter at least gives a possibility of playing with optimization when it might be important.

If it's determined that the parameter shouldn't be provided, perhaps just removing the parameterized version is needed. Let me know what I should do.

@RazvanN7
Copy link
Collaborator

cc @atilaneves

@atilaneves
Copy link
Contributor

I read the whole thing again and still don't know if this speeds up the general case or not.

@RazvanN7
Copy link
Collaborator

@atilaneves I think that the point @schveiguy was trying to make was that we don't know if it speeds up the general case, but we offer a possibility to the user to play with the expand factor and choose the best configuration for his scenario.

@schveiguy
Copy link
Member Author

schveiguy commented Apr 29, 2021

What I do know is that we use enough compile-time memory to store 5000 elements that can be sliced, to store the same repeated string + 1 more element for 150 elements. So the new code uses no more memory than the original, and the CTFE cost is comparable.

What isn't clear is why this doesn't speed up larger static maps. Probably it has to do with the test cases, and caching. But at 150 elements, the new code is comparable, while giving the possibility of trying larger static map mixins.

It is difficult to test this because my firm belief is that a straight-up mixin is more performant on an individual case than recursing down to 150 elements, but that when you give the cache a lot of instances to work with (because you have populated it with the 1 to n-1 instances), then more instantiations can be faster than processing a large mixin. However, this is really hard to test because you can't repeat the same thing over and over, it will just hit the cache.

Hm... this gives me an idea, maybe instead of testing things that would be cached, I should test things that wouldn't be (like instead of 500 different structs, do a list of 500 of the same structs, but do that over and over for 100 different structs).

I'll get back to you. If that doesn't perform better, then I think this probably, while intuitively making sense as being faster, doesn't fit with the current compiler architecture.

@schveiguy
Copy link
Member Author

Running that test, it's still worse (and noticeably so). I'll close this, as it seems not to be an improvement.

I still feel like the compiler internals are having some sort of pathological case with this, but I don't know enough about the compiler to understand why. The tradeoffs don't make sense when you look at the results.

Someone mentioned perhaps mixin(a, b, c) is really just doing a CTFE concatenation, if that is the case, then this might explain why there is an issue.

@schveiguy schveiguy closed this Apr 29, 2021
@schveiguy
Copy link
Member Author

Looking at dmd source, I think it is running a CTFE engine for each parameter to mixin to evaluate the expression and append the stringified result to an OutBuffer. So I think, ironically, concatenation is actually faster (only one invocation of the CTFE engine), but I don't know 100%.

It's still a mystery to me why this is less performant than the mixin of a single string (but which still needs CTFE to evaluate the index). I have a feeling a better CTFE engine would make the differences between these two solutions negligible. Even the recursive version that stops at 150 elements -- it doesn't take too many steps to get down to 150, even from a really large number (4 levels would need a static map of 2400 elements). So perhaps the 150 is a reasonable compromise. Maybe I can adjust this to just generate 150 elements, and see what happens.

@RazvanN7
Copy link
Collaborator

cc @UplinkCoder

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants

Comments