[Proposal] Inline delegate invocation #11578
Comments
You might want to ask that in the coreclr repo, as method inlining is a concern for the JIT compiler, not the C# compiler.
@hypeartist Do you mean things like …
@hypeartist Some of your pain might go away with local functions, which are (presumably) able to be inlined.

```csharp
string MyFunc()
{
    // Lambda/delegate (cannot be inlined, allocates memory)
    Func<string, int> getLenLambda = str => str.Length;  // renamed so it doesn't clash with the local function

    // Local function using arrow syntax (potential new feature)
    int getLen(string str) => str.Length;

    return getLen("abc").ToString();  // return added so the sketch compiles
}
```
@orthoxerox

```csharp
public override void Check(Func<int> fn, out string result, out string expected)
{
    result = fn().ToString();
    expected = (1 + 2 + 3 + 4).ToString();
}
```

Make `fn`'s body be inlined (if it meets the inlining rules).
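To make the request concrete, here is a hypothetical call site and the specialized body the proposal is asking for; the `checker` instance and the `Check_Specialized` name are illustrative assumptions, not from the original issue:

```csharp
// Hypothetical call site:
checker.Check(() => 1 + 2 + 3 + 4, out string result, out string expected);

// In effect, the proposal asks for a specialization in which fn() has been
// replaced by the lambda's body:
public void Check_Specialized(out string result, out string expected)
{
    result = (1 + 2 + 3 + 4).ToString();   // fn() substituted with its body
    expected = (1 + 2 + 3 + 4).ToString();
}
```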
I don't think this is possible at all. Here is why (forgive me if I tell you things you already know). Inlining is only done at the JIT (or native compiler) level today and will probably always be done there. The JIT compiles the method `Check` only once. The only way to take advantage of inlining is with data that is known at compile time (technically JIT time). So if the target of `fn` is only known at runtime, it cannot be inlined. Hypothetically we could make this work if you are willing to re-JIT `Check` for every distinct `fn`, or to interpret it. Interpreting is almost always slower in general than JITing and might not be worthwhile in this case.
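A minimal illustration of that constraint, with hypothetical call sites: the JIT produces a single machine-code body for `Check`, yet every call can pass a different `fn`, so there is no single delegate body to inline.

```csharp
// One compiled Check body must serve both of these calls:
checker.Check(() => 1 + 2 + 3 + 4, out var r1, out var e1);  // fn target #1
checker.Check(() => 42, out var r2, out var e2);             // fn target #2
```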
Does this answer your question?
@AlgorithmsAreCool
I think this is possible, yes. I believe it is called data-flow analysis, and it is a powerful optimization technique typically seen in C++ compilers. Let's look at an optimizable case:

```csharp
void Main()
{
    var data = new[] { 1.0, 2.0, 5.0 /* ... */ };
    Func<double, double> scale = Math.Tanh;
    ScaleData(data, scale);
}

void ScaleData(double[] data, Func<double, double> scale)
{
    for (int i = 0; i < data.Length; i++)
        data[i] = scale(data[i]);
}
```

In this example the human eye can see that `scale` can only ever be `Math.Tanh`, so the compiler could specialize the code like this:
```csharp
void Main()
{
    var data = new[] { 1.0, 2.0, 5.0 /* ... */ };
    // Compiler removed:
    // Func<double, double> scale = Math.Tanh;
    // ScaleData(data, scale);
    // Compiler-generated call:
    ScaleData_tanh(data);
}

// Compiler-generated specialization:
void ScaleData_tanh(double[] data)
{
    for (int i = 0; i < data.Length; i++)
    {
        double x = data[i];
        data[i] = (Math.Exp(x) - Math.Exp(-x)) / (Math.Exp(x) + Math.Exp(-x));
    }
}

void ScaleData(double[] data, Func<double, double> scale)
{
    for (int i = 0; i < data.Length; i++)
        data[i] = scale(data[i]);
}
```

This would be a win because we have saved a call in a 'hot' loop. It could be done by Roslyn or by the JIT. But this optimization has a few limits and drawbacks.
So yes, it is possible to do this. But the C#/VB team typically doesn't like putting heavy optimizations into Roslyn, and the JIT probably doesn't have the time to do this complex flow analysis. The JIT has to do quick optimizations that only take a few milliseconds, to keep the program starting up quickly. This optimization case would be better suited to the CoreRT or .NET Native projects, where the compiler can take several minutes to study and tightly optimize the code. Did this answer your question?
Either that, or tiered JIT.
This optimization is being done routinely by the HotSpot JVM; the .NET JITs are just very far behind. .NET developers are not used to vcalls being optimized away effectively. From an optimization standpoint, the techniques to devirtualize vcalls, interface calls, and delegate calls are the same. The JVM monitors what classes are actually loaded; that way it often knows that a vcall can have only one possible target. The CLR could do the same thing. The JVM also re-JITs and profiles the actual call targets. It then generates a guarded pattern like the one sketched below.
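For reference, a minimal C# sketch of that guarded pattern, assuming the profiler observed `Math.Tanh` as the only target. `KnownTanh` and `Apply` are illustrative names, not real CLR machinery, and a real JIT would compare raw code pointers rather than use reflection:

```csharp
using System;
using System.Reflection;

static class GuardedCallSketch
{
    // Profiled expectation; Math.Tanh has a single overload, so GetMethod is unambiguous.
    static readonly MethodInfo KnownTanh = typeof(Math).GetMethod(nameof(Math.Tanh));

    static double Apply(Func<double, double> scale, double x)
    {
        if (scale.Method == KnownTanh)  // cheap guard against the profiled target
            return Math.Tanh(x);        // speculated fast path: direct, inlinable call
        return scale(x);                // fallback: ordinary indirect delegate invocation
    }
}
```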
The check can often be pushed outside loops (a sketch follows below). Profiling is very easy with a tiered JIT: the first tier profiles, the second tier does not. The main drawback is that a profile that is created once never changes, but that is rarely a problem, and this simple scheme already results in high gains. The JVM has a very sophisticated tiering scheme with multiple tiers and complicated rules for upgrading and downgrading the tier. I don't think that is necessary to realize gains; rather, the JVM designers must have realized how important tiering is, so they invested a lot of time to squeeze the last points of performance out of it. .NET can get started with a simpler scheme.
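Here is a minimal sketch of hoisting that guard out of the loop for the earlier `ScaleData` example; as before, `KnownTanh` and the reflection-based comparison are illustrative stand-ins for the raw code-pointer check a real JIT would emit.

```csharp
using System;
using System.Reflection;

static class HoistedGuardSketch
{
    static readonly MethodInfo KnownTanh = typeof(Math).GetMethod(nameof(Math.Tanh));

    static void ScaleData(double[] data, Func<double, double> scale)
    {
        if (scale.Method == KnownTanh)
        {
            // Guard checked once per call instead of once per element.
            for (int i = 0; i < data.Length; i++)
                data[i] = Math.Tanh(data[i]);   // devirtualized, inlinable body
        }
        else
        {
            for (int i = 0; i < data.Length; i++)
                data[i] = scale(data[i]);       // unprofiled targets take the generic loop
        }
    }
}
```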
Good point, HotSpot is a wonderful JIT indeed. Another thought: Chakra (the IE JIT) is also tiered and performs profile-guided monomorphic call optimization at runtime. Chakra can even replace an optimization profile if the execution flow changes and breaks earlier optimizations. In fact, that blog post has several optimizations that I wish .NET had.
It boggles my mind how a platform that runs on hundreds of millions of servers has such an under-invested JIT. Isn't .NET the platform that is 2nd in the number of servers it drives (where Java is 1st)?! The amount of money that is wasted on, say, 10% more servers needed due to the sub-par code quality with .NET is unfathomably high; it's equal to the TCO of more than 10 million servers. Just the amount that Microsoft themselves waste on unneeded servers for Bing and Office 365 must be more than the entire JIT team's budget. It is known how to make the JIT far better. This is not research work (to a large extent; excluding the machine-learning inlining-decisions effort that is underway right now).
Looking into it a bit, I wonder if the departure of Kevin Frei to Facebook in July 2014 has slowed the advancement of RyuJIT. He was the codegen dev lead for .NET, and RyuJIT might have been his design. Edit: see his Twitter.
RyuJIT is based on some pre-existing JIT codebase, according to Andrew Ayers in his LLVM talk. He says "it's a baroque codebase that nobody feels quite confident in". I may be slightly misremembering this wording; the actual phrasing might not have been as strong, but it was certainly in this direction. It's certainly not entirely new code. The tree-based IR design feels wrong; it's not what state-of-the-art compilers (LLVM, HotSpot) use. I don't think anyone would use such a design if writing a new JIT. It would be entirely SSA-based.
Fascinating! I'll have to find that talk.
Yeah... I so don't understand the strategy. Not sure why a new JIT was created that's hardly better than the old one. Maybe it's a stop-gap thing to be able to release CoreCLR? Maybe the old JIT was not suitable for that (licensing and patent fears?). The team has stated that they want a tiered JIT, so the stop-gap theory makes sense, where some powerful code generator is the 2nd tier (maybe LLVM). Once the 2nd tier is there, the 1st tier can be an interpreter, which is very cheap to implement. The video is here: http://llvm.org/devmtg/2015-04/ (at 2:10: "baroque internal architecture ... even the people who work on it are kind of nervous about aspects of the code").
If only Mike Pall wasn't so tired of writing JIT compilers...
This is exactly what I was thinking of! (I just lacked the right words :) )
This issue is misplaced in the Roslyn repository. The Roslyn compilers are not intended to perform optimizations, but rather to translate source code into IL. The runtime compiler is where optimizations occur (or do not occur, as in this case). You probably want to report this in the coreclr repository.
Is it possible to make this happen? (Maybe following the same rules as applied to ordinary methods.)