Writing performance-critical code often leads to code duplication.
Let's say we want a method that applies an effect to an image. In our case we want to apply a greyscale conversion and an optional invert. The code could look like this:
    using System;
    using System.Drawing;
    using System.Drawing.Imaging;

    public class Effect
    {
        public static void Apply(Bitmap bmp, GreyscaleMethod grayscaleMethod, bool invert)
        {
            // check the format before locking, so a failure does not leave the bitmap locked
            if (bmp.PixelFormat != PixelFormat.Format32bppArgb)
                throw new InvalidOperationException($"Unsupported pixel format: {bmp.PixelFormat}");

            // read bitmap data
            int w = bmp.Width, h = bmp.Height;
            var data = bmp.LockBits(new Rectangle(0, 0, w, h), ImageLockMode.ReadWrite, bmp.PixelFormat);
            var s = data.Stride;

            unsafe
            {
                var ptr = (byte*)data.Scan0;
                for (int y = 0; y < h; y++)
                {
                    for (int x = 0; x < w; x++)
                    {
                        // read RGB (not quite optimized, but that's not the point)
                        // Format32bppArgb uses 4 bytes per pixel, stored as B, G, R, A
                        int offset = y * s + x * 4;
                        int b = ptr[offset];
                        int g = ptr[offset + 1];
                        int r = ptr[offset + 2];

                        // apply effects per pixel
                        if (grayscaleMethod == GreyscaleMethod.Average)
                        {
                            r = g = b = (r + g + b) / 3;
                        }
                        else if (grayscaleMethod == GreyscaleMethod.Luminance)
                        {
                            r = g = b = (int)(r * 0.2126 + g * 0.7152 + b * 0.0722);
                        }
                        if (invert)
                        {
                            r = 255 - r;
                            g = 255 - g;
                            b = 255 - b;
                        }

                        // write RGB
                        ptr[offset] = (byte)b;
                        ptr[offset + 1] = (byte)g;
                        ptr[offset + 2] = (byte)r;
                    }
                }
            }
            bmp.UnlockBits(data);
        }
    }

    public enum GreyscaleMethod
    {
        None,
        Average,
        Luminance,
    }
However, if we expect the invert to be used only rarely, that code is slower than it needs to be because of the constant if (invert) check inside the performance-critical inner loop. We could of course create another method that gets called when invert is false, but that leads to code duplication, is harder to maintain, etc.
To get both optimal performance and code reuse, we would need a way to make the compiler generate two methods at compile time, one for each value of invert. Without any new syntax the code might look like this:
    public class Effect
    {
        private static void Apply<invert>(Bitmap bmp, GreyscaleMethod grayscaleMethod)
            where invert : Bool
        {
            // [...] read bitmap data

            unsafe
            {
                var ptr = (byte*)data.Scan0;
                for (int y = 0; y < h; y++)
                {
                    for (int x = 0; x < w; x++)
                    {
                        // [...] read RGB

                        // apply effects per pixel
                        if (grayscaleMethod == GreyscaleMethod.Average)
                        {
                            r = g = b = (r + g + b) / 3;
                        }
                        else if (grayscaleMethod == GreyscaleMethod.Luminance)
                        {
                            r = g = b = (int)(r * 0.2126 + g * 0.7152 + b * 0.0722);
                        }
                        if (typeof(invert) == typeof(True)) // type check
                        {
                            r = 255 - r;
                            g = 255 - g;
                            b = 255 - b;
                        }

                        // [...] write RGB
                    }
                }
            }
            bmp.UnlockBits(data);
        }
    }

    public class False : Bool { }
    public class True : Bool { }
    public class Bool { }
Now that check is a compile-time constant, so the compiler could remove both the type condition and its block when invert is False, and remove the type condition but keep its block when invert is True. That would give optimal code in both cases without code duplication.
However, does the compiler (or even the JIT) actually do that? According to this Stack Overflow answer, it currently does not.
This is a proposal to improve the compiler (or JIT) to do that sort of code inlining (through method duplication) for compile-time constant checks.
If this were implemented, we could optimize the code even further by doing the same with the grayscaleMethod parameter:
    public class Effect
    {
        private static void Apply<invert, greyscaleMethod>(Bitmap bmp)
            where invert : Bool
            where greyscaleMethod : GreyscaleMethodEnum
        {
            // [...] read bitmap data

            unsafe
            {
                var ptr = (byte*)data.Scan0;
                for (int y = 0; y < h; y++)
                {
                    for (int x = 0; x < w; x++)
                    {
                        // [...] read RGB

                        // apply effects per pixel
                        if (typeof(greyscaleMethod) == typeof(GreyscaleMethod_Average))
                        {
                            r = g = b = (r + g + b) / 3;
                        }
                        else if (typeof(greyscaleMethod) == typeof(GreyscaleMethod_Luminance))
                        {
                            r = g = b = (int)(r * 0.2126 + g * 0.7152 + b * 0.0722);
                        }
                        if (typeof(invert) == typeof(True))
                        {
                            r = 255 - r;
                            g = 255 - g;
                            b = 255 - b;
                        }

                        // [...] write RGB
                    }
                }
            }
            bmp.UnlockBits(data);
        }
    }

    public class GreyscaleMethod_None : GreyscaleMethodEnum { }
    public class GreyscaleMethod_Average : GreyscaleMethodEnum { }
    public class GreyscaleMethod_Luminance : GreyscaleMethodEnum { }
    public class GreyscaleMethodEnum { }
Doing the same optimization through code duplication would require 6 methods (3 greyscale methods times 2 invert values), and the number would increase exponentially with the number of parameters. However, the compiler would know to generate only the methods that are actually used in the code.
When generic arguments are value types, the JIT has to generate specialized code for each value type. That enables some optimizations, including recognizing that typeof(invert) == typeof(True) is always true when invert = True.
Though a bug was recently introduced that prevented this optimization from working. It's fixed now in the latest CoreCLR builds, but it's still present in some .NET Framework builds (e.g. the one that comes with the current Win 10 Preview).
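To illustrate the value-type point, here is a minimal sketch of the proposal's pattern with the marker types changed from classes to structs. The names IBool, PixelEffect and the byte[] BGRA buffer are assumptions made for illustration (the original code works on a Bitmap), and whether the branch is actually removed depends on the runtime build in use.

```csharp
using System;

// Hypothetical marker types. They are structs, so each generic
// instantiation gets its own JIT-compiled method body.
public interface IBool { }
public struct True : IBool { }
public struct False : IBool { }

public static class PixelEffect
{
    // Assumption: pixels are passed as a raw BGRA buffer (4 bytes per pixel)
    // instead of a Bitmap, to keep the sketch self-contained.
    public static void Apply<TInvert>(byte[] bgra) where TInvert : struct, IBool
    {
        for (int i = 0; i + 3 < bgra.Length; i += 4)
        {
            int b = bgra[i], g = bgra[i + 1], r = bgra[i + 2];

            // average greyscale
            r = g = b = (r + g + b) / 3;

            // For a value-type TInvert this comparison has a fixed result per
            // instantiation, so the JIT is able to fold it and drop the dead branch.
            if (typeof(TInvert) == typeof(True))
            {
                r = 255 - r;
                g = 255 - g;
                b = 255 - b;
            }

            bgra[i] = (byte)b;
            bgra[i + 1] = (byte)g;
            bgra[i + 2] = (byte)r;
        }
    }
}
```

Calling PixelEffect.Apply<False>(buffer) and PixelEffect.Apply<True>(buffer) then selects the variant at the call site rather than inside the loop.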
> Doing the same optimization through code duplication would require 6 methods, and the number would increase exponentially with the number of parameters.

That's why the code is shared between instantiations when reference types are used.

> However the compiler would know to only generate the methods which are actually used in the code.

Well, if you call all variants it will still have to generate code for all of them. It is what it is, a trade-off between code size and performance.
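The trade-off can be sketched with a dispatcher that maps the runtime parameters onto generic instantiations. Everything below is a hypothetical illustration: the marker structs, EffectDispatch, and the byte[] BGRA buffer are assumed names, not part of the original code. Because the dispatcher references all six combinations, specialized code can end up being generated for every one that is reached at run time.

```csharp
using System;

public enum GreyscaleMethod { None, Average, Luminance }

// Hypothetical marker structs (value types, so instantiations are not shared).
public interface IBool { }
public struct True : IBool { }
public struct False : IBool { }
public interface IGreyscale { }
public struct GreyNone : IGreyscale { }
public struct GreyAverage : IGreyscale { }
public struct GreyLuminance : IGreyscale { }

public static class EffectDispatch
{
    // Runtime values are checked once here, outside the pixel loop.
    public static void Apply(byte[] bgra, GreyscaleMethod method, bool invert)
    {
        switch (method)
        {
            case GreyscaleMethod.Average:
                if (invert) Run<GreyAverage, True>(bgra); else Run<GreyAverage, False>(bgra);
                break;
            case GreyscaleMethod.Luminance:
                if (invert) Run<GreyLuminance, True>(bgra); else Run<GreyLuminance, False>(bgra);
                break;
            default:
                if (invert) Run<GreyNone, True>(bgra); else Run<GreyNone, False>(bgra);
                break;
        }
    }

    // One source method; 3 x 2 = 6 possible instantiations.
    private static void Run<TGrey, TInvert>(byte[] bgra)
        where TGrey : struct, IGreyscale
        where TInvert : struct, IBool
    {
        // Assumption: 4 bytes per pixel in B, G, R, A order.
        for (int i = 0; i + 3 < bgra.Length; i += 4)
        {
            int b = bgra[i], g = bgra[i + 1], r = bgra[i + 2];

            if (typeof(TGrey) == typeof(GreyAverage))
                r = g = b = (r + g + b) / 3;
            else if (typeof(TGrey) == typeof(GreyLuminance))
                r = g = b = (int)(r * 0.2126 + g * 0.7152 + b * 0.0722);

            if (typeof(TInvert) == typeof(True))
            {
                r = 255 - r;
                g = 255 - g;
                b = 255 - b;
            }

            bgra[i] = (byte)b;
            bgra[i + 1] = (byte)g;
            bgra[i + 2] = (byte)r;
        }
    }
}
```

The source stays in one place, but the generated code still grows with each combination that is actually called, which is the size cost mentioned above.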