Improve Uniform Shader Variable Handling #566
Comments
Progress
Immediate ToDo
|
Current (relevant) API:
public void SetArray<T>(string name, T[] value);
public void Set<T>(string name, T value);
public T[] GetArray<T>(string name);
public T Get<T>(string name);
Potential API when going with a multi-overload design:
public void Set(string name, ContentRef<Texture> value);
public void Set(string name, float[] value);
public void Set(string name, Vector2[] value);
public void Set(string name, Vector3[] value);
public void Set(string name, Vector4[] value);
public void Set(string name, Matrix3[] value);
public void Set(string name, Matrix4[] value);
public void Set(string name, int[] value);
public void Set(string name, Point2[] value);
public void Set(string name, bool[] value);
public void Set(string name, float value);
public void Set(string name, Vector2 value);
public void Set(string name, Vector3 value);
public void Set(string name, Vector4 value);
public void Set(string name, Matrix3 value);
public void Set(string name, Matrix4 value);
public void Set(string name, int value);
public void Set(string name, Point2 value);
public void Set(string name, bool value);
public ContentRef<Texture> GetTexture(string name);
public float[] GetFloatArray(string name);
public Vector2[] GetVector2Array(string name);
public Vector3[] GetVector3Array(string name);
public Vector4[] GetVector4Array(string name);
public Matrix3[] GetMatrix3Array(string name);
public Matrix4[] GetMatrix4Array(string name);
public int[] GetIntArray(string name);
public Point2[] GetPoint2Array(string name);
public bool[] GetBoolArray(string name);
public float GetFloat(string name);
public Vector2 GetVector2(string name);
public Vector3 GetVector3(string name);
public Vector4 GetVector4(string name);
public Matrix3 GetMatrix3(string name);
public Matrix4 GetMatrix4(string name);
public int GetInt(string name);
public Point2 GetPoint2(string name);
public bool GetBool(string name);
Avoid, if it can be avoided. Worth investigating why the
Edit: More info on the
|
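To make the trade-off a bit more concrete, here is a minimal sketch of how the multi-overload design could store values without going through `object`. `UniformStore` and its flat `float[]` storage are assumptions made up for illustration, not Duality's actual implementation; only the `Set` / `GetFloat` / `GetFloatArray` shapes are taken from the listing above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: every overload writes into the same float[] storage,
// so values never pass through object and no boxing can occur.
public sealed class UniformStore
{
    private readonly Dictionary<string, float[]> values = new Dictionary<string, float[]>();

    public void Set(string name, float value)
    {
        this.values[name] = new float[] { value };
    }

    public void Set(string name, float[] value)
    {
        // Copy the array so later edits by the caller do not affect the store.
        this.values[name] = (float[])value.Clone();
    }

    public float GetFloat(string name)
    {
        // Assumption: unknown or empty entries read as zero.
        float[] stored;
        if (this.values.TryGetValue(name, out stored) && stored.Length > 0)
            return stored[0];
        return 0.0f;
    }

    public float[] GetFloatArray(string name)
    {
        // Assumption: unknown entries yield null, known ones a defensive copy.
        float[] stored;
        return this.values.TryGetValue(name, out stored) ? (float[])stored.Clone() : null;
    }
}
```

The remaining overloads (vectors, matrices, ints) would follow the same pattern, each one flattening its argument into the shared storage.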
@ilexp if you point me at a representative example where |
@AndyAyersMS Thanks for checking in! I'll try to keep it short: The use case I'm trying to get optimized boils down to the following:
public T Get<T>(string key) where T : struct
{
    if (typeof(T) == typeof(float))
    {
        // Transform data
        float result = ...;
        return (T)(object)result;
    }
    else if (typeof(T) == typeof(...))
    {
        // ...
    }
    // ...
}
I would expect optimization for
public T Get<T>(string key) where T : struct
{
    // No type checks!
    float result = ...;
    // Crucial: No boxing either!
    return result;
}
However, this doesn't seem to happen - I get lots of boxing allocations instead that vanish when I use a specialized non-generic method overload.
Edit: The unnecessary boxing part of this seems to trigger for certain structs only, definitely not for float as displayed above, see followup comment.
I'm currently investigating this in a separate project and have not managed to get the
public static void Main(string[] args)
{
    Console.WriteLine("Begin");
    TestCaseA();
    TestCaseB();
    TestCaseC();
    TestCaseD<int>();
    TestCaseE<int>();
    Console.WriteLine("End");
}
[MethodImpl(MethodImplOptions.NoInlining)]
public static int TestCaseA()
{
    return 1;
}
[MethodImpl(MethodImplOptions.NoInlining)]
public static int TestCaseB()
{
    if (true)
    {
        return 1;
    }
    else
    {
        return 2;
    }
}
[MethodImpl(MethodImplOptions.NoInlining)]
public static int TestCaseC()
{
    if (typeof(int) == typeof(int))
    {
        return 1;
    }
    else
    {
        return 2;
    }
}
[MethodImpl(MethodImplOptions.NoInlining)]
public static int TestCaseD<T>() where T : struct
{
    if (typeof(T) == typeof(T))
    {
        return 1;
    }
    else
    {
        return 2;
    }
}
[MethodImpl(MethodImplOptions.NoInlining)]
public static int TestCaseE<T>() where T : struct
{
    if (typeof(T) == typeof(int))
    {
        return 1;
    }
    else
    {
        return 2;
    }
}
To retrieve a JITed disassembly of each method, I'm building the project in Release mode, running it in Visual Studio, stepping into each method and switching to Disassembly. "Enable Just My Code" is false and "Suppress JIT Optimizations" is false as well. I'm getting the following disassembly for each method:
TestCaseA:
TestCaseB:
TestCaseC:
TestCaseD:
TestCaseE:
I should add that I'm not at all used to reading assembly code or debugging JIT optimizations, so if I took a wrong turn at some point, or you need additional information, please let me know.
For the last test, I'm using Visual Studio 2017 (Version 15.2 (26430.15)) on .NET Framework Version 4.7.02046.
Edit: Added two more test cases. C to E seem to be equal except for memory addresses. |
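For anyone who wants to reproduce the pattern outside of Duality, here is a small compilable version of the `typeof(T)` dispatch from the comment above. The `TypedValues` class, its two dictionaries and the `SetFloat` / `SetInt` helpers are hypothetical stand-ins for the elided "transform data" parts; only the shape of `Get<T>` follows the original snippet.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container; the storage layout is an assumption for this sketch.
public sealed class TypedValues
{
    private readonly Dictionary<string, float> floats = new Dictionary<string, float>();
    private readonly Dictionary<string, int> ints = new Dictionary<string, int>();

    public void SetFloat(string key, float value) { this.floats[key] = value; }
    public void SetInt(string key, int value) { this.ints[key] = value; }

    public T Get<T>(string key) where T : struct
    {
        if (typeof(T) == typeof(float))
        {
            // Missing keys leave result at its default value of zero.
            float result;
            this.floats.TryGetValue(key, out result);
            // Box-then-unbox cast; semantically a no-op when T is float.
            return (T)(object)result;
        }
        else if (typeof(T) == typeof(int))
        {
            int result;
            this.ints.TryGetValue(key, out result);
            return (T)(object)result;
        }
        throw new NotSupportedException("Unsupported type: " + typeof(T).Name);
    }
}
```

Whether the JIT removes the type checks and the box in each instantiation depends on the runtime version, which is exactly what the test cases in this thread probe.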
@AndyAyersMS I did a second test that focuses on the boxing itself, rather than omitting / resolving
public struct SmallStruct
{
    public int Value;
    public SmallStruct(int v)
    {
        this.Value = v;
    }
}
public struct BigStruct
{
    public int Value;
    public long Other;
    public BigStruct(int v, long o)
    {
        this.Value = v;
        this.Other = o;
    }
}
public struct GenericStruct<T>
{
    public int Value;
    public T Other;
    public GenericStruct(int v, T o)
    {
        this.Value = v;
        this.Other = o;
    }
}
public class Program
{
    public static void Main(string[] args)
    {
        Console.WriteLine("Begin");
        TestCaseA<SmallStruct>();
        TestCaseB<BigStruct>();
        TestCaseC<GenericStruct<long>>();
        TestCaseD<GenericStruct<string>>();
        Console.WriteLine("End");
    }
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static T TestCaseA<T>() where T : struct
    {
        if (typeof(T) == typeof(SmallStruct))
        {
            return (T)(object)new SmallStruct(1);
        }
        else
        {
            return default(T);
        }
    }
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static T TestCaseB<T>() where T : struct
    {
        if (typeof(T) == typeof(BigStruct))
        {
            return (T)(object)new BigStruct(1, 2);
        }
        else
        {
            return default(T);
        }
    }
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static T TestCaseC<T>() where T : struct
    {
        if (typeof(T) == typeof(GenericStruct<long>))
        {
            return (T)(object)new GenericStruct<long>(1, 2);
        }
        else
        {
            return default(T);
        }
    }
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static T TestCaseD<T>() where T : struct
    {
        if (typeof(T) == typeof(GenericStruct<string>))
        {
            return (T)(object)new GenericStruct<string>(1, "Hello");
        }
        else
        {
            return default(T);
        }
    }
}
Test case D is the one that I was stumbling over in my project, and looking at the disassembly, it is also the only one where boxing is not optimized away:
TestCaseA: Aside from the typeof check that's still there, no boxing with the small struct.
TestCaseB: No boxing with the big struct either
TestCaseC: The generic struct with long generates code equal to the big struct code
TestCaseD: The generic struct with string generates this, boxing and "other things (?)"
|
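As a rough alternative to reading disassembly, the boxing in test case D can also be observed by counting allocated bytes. The sketch below is an assumption-laden probe, not part of the original test project: `BoxingProbe`, `Create` and `MeasureAllocatedBytes` are hypothetical names, and it assumes a runtime that exposes `GC.GetAllocatedBytesForCurrentThread` (for example .NET Core 3.0 or later).

```csharp
using System;
using System.Runtime.CompilerServices;

// Same shape as the GenericStruct<T> used in the test cases above.
public struct GenericStruct<T>
{
    public int Value;
    public T Other;
    public GenericStruct(int v, T o)
    {
        this.Value = v;
        this.Other = o;
    }
}

public static class BoxingProbe
{
    // Mirrors TestCaseD: the (T)(object) cast may or may not box, depending on the JIT.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static T Create<T>() where T : struct
    {
        if (typeof(T) == typeof(GenericStruct<string>))
        {
            return (T)(object)new GenericStruct<string>(1, "Hello");
        }
        return default(T);
    }

    // Returns the bytes allocated on the current thread while calling Create<T> repeatedly.
    public static long MeasureAllocatedBytes<T>(int iterations) where T : struct
    {
        Create<T>(); // warm up so JIT compilation is not part of the measurement
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            Create<T>();
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

A result that grows with the iteration count for `GenericStruct<string>` would suggest that the box survived; a result near zero would suggest it was optimized away. The exact numbers depend on the runtime and its JIT settings, so treat them as a hint rather than proof.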
The JIT in 4.7.02046 (aka desktop CLR version 4.7) had a bug fix that inhibited some cases of the type equality optimizations. These were fixed in 4.7.1, which is available as a preview and should be officially released soon.
On latest CoreCLR I get the following code for your second test case examples. I would expect similar results from 4.7.1 too. If you get a chance to try 4.7.1 and you don't get good results, please let me know.
TestCaseD is clearly not getting optimized. There are challenges here because the CLR will share method implementations over all ref types in TestCaseD, so the type of
;;; Test case A
G_M22036_IG02:
B801000000 mov eax, 1
G_M22036_IG03:
C3 ret
;;; Test case B
G_M60708_IG02:
B801000000 mov eax, 1
BA02000000 mov edx, 2
8901 mov dword ptr [rcx], eax
48895108 mov qword ptr [rcx+8], rdx
488BC1 mov rax, rcx
G_M60708_IG03:
C3 ret
;;; Test case C
G_M4495_IG02:
B801000000 mov eax, 1
BA02000000 mov edx, 2
8901 mov dword ptr [rcx], eax
48895108 mov qword ptr [rcx+8], rdx
488BC1 mov rax, rcx
G_M4495_IG03:
C3 ret
;;; Test case D
G_M4522_IG01:
4157 push r15
4156 push r14
57 push rdi
56 push rsi
55 push rbp
53 push rbx
4883EC28 sub rsp, 40
4889542420 mov qword ptr [rsp+20H], rdx
488BD9 mov rbx, rcx
488BF2 mov rsi, rdx
G_M4522_IG02:
;; This bit of code is looking up the handle for the actual type of T
488B4E38 mov rcx, qword ptr [rsi+56]
488B09 mov rcx, qword ptr [rcx]
F6C101 test cl, 1
7404 je SHORT G_M4522_IG03
488B49FF mov rcx, qword ptr [rcx-1]
G_M4522_IG03:
;; Here's the check if typeof(T) == typeof(GenericStruct<string>)
48B8309598C6FC7F0000 mov rax, 0x7FFCC6989530
483BC8 cmp rcx, rax
;;; branch to the default(T) case if we don't have that type
0F8581000000 jne G_M4522_IG08
;;; all the rest of this down to the ret is from
;;; return (T)(object)new GenericStruct<string>(1, "Hello");
;;; we should be able to avoid the box / unbox / copy
BF01000000 mov edi, 1
48B9C8300090A8020000 mov rcx, 0x2A8900030C8
488B29 mov rbp, gword ptr [rcx]
48B9309598C6FC7F0000 mov rcx, 0x7FFCC6989530
E88508855F call CORINFO_HELP_NEWSFAST
4C8BF0 mov r14, rax
4D8D7E08 lea r15, bword ptr [r14+8]
498D0F lea rcx, bword ptr [r15]
488BD5 mov rdx, rbp
E873161F5F call CORINFO_HELP_CHECKED_ASSIGN_REF
41897F08 mov dword ptr [r15+8], edi
488B4E38 mov rcx, qword ptr [rsi+56]
488B09 mov rcx, qword ptr [rcx]
488BD1 mov rdx, rcx
8BC2 mov eax, edx
83E001 and eax, 1
85C0 test eax, eax
7404 je SHORT G_M4522_IG04
488B52FF mov rdx, qword ptr [rdx-1]
G_M4522_IG04:
85C0 test eax, eax
7404 je SHORT G_M4522_IG05
488B49FF mov rcx, qword ptr [rcx-1]
G_M4522_IG05:
493916 cmp qword ptr [r14], rdx
7408 je SHORT G_M4522_IG06
498BD6 mov rdx, r14
E823D13A5F call CORINFO_HELP_UNBOX
G_M4522_IG06:
498D7608 lea rsi, bword ptr [r14+8]
488BFB mov rdi, rbx
E887171F5F call CORINFO_HELP_ASSIGN_BYREF
48A5 movsq
488BC3 mov rax, rbx
G_M4522_IG07:
4883C428 add rsp, 40
5B pop rbx
5D pop rbp
5E pop rsi
5F pop rdi
415E pop r14
415F pop r15
C3 ret
G_M4522_IG08:
33C0 xor rax, rax
33D2 xor edx, edx
488903 mov gword ptr [rbx], rax
895308 mov dword ptr [rbx+8], edx
488BC3 mov rax, rbx
G_M4522_IG09:
4883C428 add rsp, 40
5B pop rbx
5D pop rbp
5E pop rsi
5F pop rdi
415E pop r14
415F pop r15
C3 ret |
@AndyAyersMS Thanks for your insight on this, and glad to hear it's not just me :) I'll check out the new desktop 4.7.1 CLR after it's released and see if that improves things.
Ahh, I see. That does sound a bit trickier than I originally thought. Will keep an eye on the issue you opened! Thanks again, really appreciated 👍 |
Summary
While investigating issue #219, it became apparent that the way Duality handles uniform shader variables on the core side is currently lacking. There are no efficient specialized data structures for this task, and there is no support for global shader variables besides a small set of builtin ones. The backend also updates variables too eagerly, causing redundant GL calls.
Analysis
[...] ShaderParameters class that encapsulates storage of any number of shader-field-to-value mappings that are agnostic to specific shader programs, and use it in BatchInfo instead of the current two-dictionary solution.
[...] ShaderParameters instances for value equality is very efficient.
[...] BuiltinShaderFields into GlobalShaderFields class and move it to the Duality.Drawing namespace. Provide API to allow users to set values manually.
[...] Render call, rather than on a per-material level.
[...] ShaderParameters as well.
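One possible way to get cheap value equality for a ShaderParameters-like container is to cache a content hash and use it as a fast rejection test before comparing values. The sketch below only illustrates that idea: `ShaderParametersSketch`, its `float[]` storage and the `Set` signature are assumptions, not the class that ended up in Duality.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a ShaderParameters-like container with cached hashing.
public sealed class ShaderParametersSketch : IEquatable<ShaderParametersSketch>
{
    // Sorted by name so the hash does not depend on the order of Set calls.
    private readonly SortedDictionary<string, float[]> values =
        new SortedDictionary<string, float[]>(StringComparer.Ordinal);
    private int hash;
    private bool hashValid;

    public void Set(string name, params float[] value)
    {
        this.values[name] = (float[])value.Clone();
        this.hashValid = false; // contents changed, recompute lazily
    }

    public override int GetHashCode()
    {
        if (!this.hashValid)
        {
            int h = 17;
            foreach (KeyValuePair<string, float[]> pair in this.values)
            {
                h = unchecked(h * 31 + pair.Key.GetHashCode());
                foreach (float f in pair.Value)
                {
                    // Treat +0 and -0 alike, since they compare equal below.
                    h = unchecked(h * 31 + (f == 0.0f ? 0 : f.GetHashCode()));
                }
            }
            this.hash = h;
            this.hashValid = true;
        }
        return this.hash;
    }

    public bool Equals(ShaderParametersSketch other)
    {
        if (ReferenceEquals(other, null)) return false;
        if (ReferenceEquals(other, this)) return true;
        if (this.values.Count != other.values.Count) return false;
        // Cheap rejection: differing hashes mean differing contents.
        if (this.GetHashCode() != other.GetHashCode()) return false;
        foreach (KeyValuePair<string, float[]> pair in this.values)
        {
            float[] otherValue;
            if (!other.values.TryGetValue(pair.Key, out otherValue)) return false;
            if (pair.Value.Length != otherValue.Length) return false;
            for (int i = 0; i < otherValue.Length; i++)
            {
                if (pair.Value[i] != otherValue[i]) return false;
            }
        }
        return true;
    }

    public override bool Equals(object obj)
    {
        return this.Equals(obj as ShaderParametersSketch);
    }
}
```

With the hash cached, comparing two parameter sets that differ usually costs a couple of integer comparisons, which matters if the comparison runs once per batch.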