Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Uniform Shader Variable Handling #566

Closed
ilexp opened this issue Sep 22, 2017 · 16 comments
Closed

Improve Uniform Shader Variable Handling #566

ilexp opened this issue Sep 22, 2017 · 16 comments
Assignees
Labels
Backend Area: Backend core / editor plugins (OpenTK, etc.) Breaking Change Breaks binary or source compatibility Core Area: Duality runtime or launcher Rendering Related to rendering / graphics Task ToDo that's neither a Bug, nor a Feature Unit Tests Good candidate for adding more tests
Milestone

Comments

@ilexp
Copy link
Member

ilexp commented Sep 22, 2017

Summary

While investigating issue #219, it became apparent that the way Duality handles uniform shader variables on the core side is currently lacking. There are no efficient specialized data structures for this task, and there is no support for global shader variables besides a small set of builtin ones. The backend also updates variables too eagerly, causing redundant GL calls.

Analysis

  • Introduce a ShaderParameters class that encapsulates storage of any number of shader-field-to-value mappings that are agnostic to specific shader programs, and use it in BatchInfo instead of the current two dictionary solution.
    • Consider going with an array-of-structs approach where the struct is an internal union type for storing different kinds of values.
    • Consider using a single float array to store all float value data, e.g. floats, vectors, matrices and arrays of each kind.
    • Make sure comparing two ShaderParameters instances for value equality is very efficient.
    • Make sure retrieving values by name or other means is very efficient.
    • Write unit tests for this container class.
  • Transform the static BuiltinShaderFields into GlobalShaderFields class and move it to the Duality.Drawing namespace. Provide API to allow users to set values manually.
    • Consider removing DrawTechnique virtual API that is used for that purpose for dynamic lighting and similar use cases, provided it could then be replaced with using global variables.
  • Update the backend to only set global variables on a per-program basis in the beginning of a Render call, rather than on a per-material level.
    • Consider also storing the last set parameters in the native shader program implementation and avoid setting them if the new values are still the same. Could actually do this with ShaderParameters as well.
@ilexp ilexp added Backend Area: Backend core / editor plugins (OpenTK, etc.) Breaking Change Breaks binary or source compatibility Core Area: Duality runtime or launcher Rendering Related to rendering / graphics Task ToDo that's neither a Bug, nor a Feature Unit Tests Good candidate for adding more tests labels Sep 22, 2017
@ilexp ilexp added this to the v3.0 milestone Sep 22, 2017
@ilexp
Copy link
Member Author

ilexp commented Sep 25, 2017

Progress

  • Investigated potential ShaderParameters implementation and API.
    • Can't unify all data storage, as ContentRef<Texture> cannot be unified with float data.
    • Store int data at full int precision? Probably not. So far, unifying all numeric values to float internally seems to outweigh the advantages of native int storage.
    • API needs to be extended compared to BatchInfo to allow more convenient get / set of uniform values.
    • WiP code / thoughts available in this gist.
  • Global shader parameters probably shouldn't be global, but local to a DrawDevice.

Immediate ToDo

  • Introduce a ShaderParameters class that encapsulates storage of any number of shader-field-to-value mappings that are agnostic to specific shader programs, and use it in BatchInfo instead of the current two dictionary solution.
    • Start with a naive implementation that mirrors the current BatchInfo implementation.
    • Gradually refine API and implementation to be more usable and efficient.
    • Things that need to be very fast:
      • Duplicating a ShaderParameters instance, until any values are actually changed.
      • Comparing two ShaderParameters instances for value equality.
      • Retrieving values by name or similar.
    • Things that are allowed to be slow-ish:
      • Setting / changing values.
  • Write unit tests for ShaderParameters.
  • Consider removing or replacing BatchInfo, since it will only consist of a ShaderParameters reference combined with a DrawTechnique reference.
  • Transform the static BuiltinShaderFields into a DrawDevice and Camera local ShaderParameters access point and provide API to allow users to set values manually.
    • Consider removing DrawTechnique virtual API that is used for that purpose for dynamic lighting and similar use cases, provided it could then be replaced with using global variables.
  • Update the backend to only set shared / "global" shader variables on a per-program basis in the beginning of a Render call, rather than on a per-material level.
    • Consider also storing the last set parameters in the native shader program implementation and avoid setting them if the new values are still the same. Could actually do this with ShaderParameters as well.

@ilexp ilexp self-assigned this Sep 26, 2017
@ilexp
Copy link
Member Author

ilexp commented Sep 27, 2017

Progress

  • Started prototyping a ShaderParameters implementation in the new develop-3.0-shaderparams-wip branch.
    • Got rid of the old "shared internal data" / "detach-on-write" approach, as it complicated things and is only useful when making copies that are never modified, which is something that should be avoided anyway.
    • Implemented equality checks via hash-only comparison using a 64 bit hash. Will need to see if that works out, false-positives are a no go and hash collisions cannot be tolerated in practice.
  • Wrote unit tests for basic ShaderParameters functionality.
  • Fixed an old hashing and equality bug that was inherited from BatchInfo.

Immediate ToDo

  • Remove BatchInfo.MainColor special case entirely and instead treat it as an actual shader parameter. Consider doing this in a separate develop-3.0-maincolor-removal branch.
  • Finish work on ShaderParameters implementation.
  • Write more unit tests for ShaderParameters.
  • Write API docs for ShaderParameters.
  • Consider removing or replacing BatchInfo, since it will only consist of a ShaderParameters reference combined with a DrawTechnique reference.
  • Transform the static BuiltinShaderFields into a DrawDevice and Camera local ShaderParameters access point and provide API to allow users to set values manually.
    • Consider removing DrawTechnique virtual API that is used for that purpose for dynamic lighting and similar use cases, provided it could then be replaced with using global variables.
  • Update the backend to only set shared / "global" shader variables on a per-program basis in the beginning of a Render call, rather than on a per-material level.
    • Consider also storing the last set parameters in the native shader program implementation and avoid setting them if the new values are still the same. Could actually do this with ShaderParameters as well.

@ilexp
Copy link
Member Author

ilexp commented Sep 27, 2017

Progress

  • Wrote API docs for ShaderParameters and extended API as needed so far.
  • Re-implemented BatchInfo by moving all internal data into a ShaderParameters instance.
    • Adjusted public API of BatchInfo and Material.
    • Adjusted usage in the overall codebase to match new API.
    • BatchInfo no longer overloads equality operators for value equality. Use Equals instead.
    • Material.Info now directly exposes its internal BatchInfo.

Immediate ToDo

  • Fix remaining issues after BatchInfo change.
    • Fix BatchInfoPropertyEditor for Vector3[] / Vector4[] shader variables. For some reason, editors keep being re-created every time, making editing impossible. Consider rewriting that part of the property editor, as it's pretty old and bad code.
    • Check BatchInfo editing for "custom material" use case.
    • Update all sample content to reflect Material changes.
  • Implement a test case to make sure hash collisions in ShaderParameters are so rare that they are in fact negligible.
    • This article shows that for 64 bit hashes, anything below 10 million values to be hashed has a negligible collision probability, assuming equal distribution, which would be way more than enough.
    • However, hash distribution is definitely not equal in the current implementation. The test needs to check whether the implemented algorithm is still good enough.
    • Access to the internal 64 bit hash value will be required for the test. Can probably do a HashSet<long> and assert that no duplicates are added after repeatedly changing a ShaderParams and adding its hash value.
  • Run a benchmark to see if performance changed, compare results with the ones from issue Reduce Render Pipeline Bureaucracy #326.
  • Remove BatchInfo.MainColor special case entirely and instead treat it as an actual shader parameter. Consider doing this in a separate develop-3.0-maincolor-removal branch.
  • Consider removing or replacing BatchInfo, since it will only consist of a ShaderParameters reference combined with a DrawTechnique reference.
  • Transform the static BuiltinShaderFields into a DrawDevice and Camera local ShaderParameters access point and provide API to allow users to set values manually.
    • Consider removing DrawTechnique virtual API that is used for that purpose for dynamic lighting and similar use cases, provided it could then be replaced with using global variables.
  • Update the backend to only set shared / "global" shader variables on a per-program basis in the beginning of a Render call, rather than on a per-material level.
    • Be aware that materials might define values that are also defined in the shared value set. In that case, prefer the shared value over the custom one.
    • Textures will also require special handling, as the shader parameter is just binding to a texture unit index, not the actual texture.
    • Consider also storing the last set parameters in the native shader program implementation and avoid setting them if the new values are still the same. Could actually do this with ShaderParameters as well.

@ilexp
Copy link
Member Author

ilexp commented Sep 28, 2017

Progress

  • Fixed BatchInfoPropertyEditor for Vector3[] / Vector4[] shader variables and rewrote that part of its implementation in a cleaner, more straightforward way.
  • Changed ShaderParameters serialization to write key-value pairs directly, rather than redirecting to its internal dictionaries, which are both XML noise and an implementation detail that may be subject to change.
  • Updated all sample content.
  • Implemented a hash collision test that makes sure that even when reducing hash size by 20 bits, there are still no collisions in a sample size of 1 million items.
  • Ran the benchmark
  • First investigation of allocation regression: They're caused by Get<ContentRef<Texture>> calls.
    • As it turns out, the typeof(T) == typeof(X) pattern is for some reason not optimized away entirely.
    • There's still boxing on (T)(object)value even if T == typeof(value).

Immediate ToDo

  • Fix memory allocations due to ShaderParameters get / set implementations, benchmark again.
    • Consider moving to a multi-overload API instead of generic API.
    • Consider other approaches as well.
  • Benchmark again.
  • Remove BatchInfo.MainColor special case entirely and instead treat it as an actual shader parameter. Consider doing this in a separate develop-3.0-maincolor-removal branch.
  • Consider removing or replacing BatchInfo, since it will only consist of a ShaderParameters reference combined with a DrawTechnique reference.
  • Transform the static BuiltinShaderFields into a DrawDevice and Camera local ShaderParameters access point and provide API to allow users to set values manually.
    • Consider removing DrawTechnique virtual API that is used for that purpose for dynamic lighting and similar use cases, provided it could then be replaced with using global variables.
  • Update the backend to only set shared / "global" shader variables on a per-program basis in the beginning of a Render call, rather than on a per-material level.
    • Be aware that materials might define values that are also defined in the shared value set. In that case, prefer the shared value over the custom one.
    • Textures will also require special handling, as the shader parameter is just binding to a texture unit index, not the actual texture.
    • Consider also storing the last set parameters in the native shader program implementation and avoid setting them if the new values are still the same. Could actually do this with ShaderParameters as well.

@ilexp
Copy link
Member Author

ilexp commented Sep 28, 2017

Current (relevant) API:

public void SetArray(string name, T[] value);
public void Set(string name, T value);

public T[] GetArray<T>(string name);
public T Get<T>(string name);

Potential API when going with a multi-overload design:

public void Set(string name, ContentRef<Texture> value);

public void Set(string name, float[] value);
public void Set(string name, Vector2[] value);
public void Set(string name, Vector3[] value);
public void Set(string name, Vector4[] value);
public void Set(string name, Matrix3[] value);
public void Set(string name, Matrix4[] value);
public void Set(string name, int[] value);
public void Set(string name, Point2[] value);
public void Set(string name, bool[] value);

public void Set(string name, float value);
public void Set(string name, Vector2 value);
public void Set(string name, Vector3 value);
public void Set(string name, Vector4 value);
public void Set(string name, Matrix3 value);
public void Set(string name, Matrix4 value);
public void Set(string name, int value);
public void Set(string name, Point2 value);
public void Set(string name, bool value);


public ContentRef<Texture> GetTexture(string name);

public float[] GetFloatArray(string name);
public Vector2[] GetVector2Array(string name);
public Vector3[] GetVector3Array(string name);
public Vector4[] GetVector4Array(string name);
public Matrix3[] GetMatrix3Array(string name);
public Matrix4[] GetMatrix4Array(string name);
public int[] GetIntArray(string name);
public Point2[] GetPoint2Array(string name);
public bool[] GetBoolArray(string name);

public float GetFloat(string name);
public Vector2 GetVector2(string name);
public Vector3 GetVector3(string name);
public Vector4 GetVector4(string name);
public Matrix3 GetMatrix3(string name);
public Matrix4 GetMatrix4(string name);
public int GetInt(string name);
public Point2 GetPoint2(string name);
public bool GetBool(string name);

Avoid, if it can be avoided. Worth investigating why the typeof(T) optimization doesn't kick in? Could also try to reduce API surface by removing all but the most relevant overloads.

Could be an opportunity for checking out System.Runtime.CompilerServices.Unsafe and see if it can be used in Duality without switching compilers or runtimes - this also means making sure the old NuGet package manager in Duality can handle it when it shows up as a dependency. Edit: On second thought, due to the potential to break compatibility, it might actually be better to do without this dependency for now. Try to solve this via oldschool optimization and API design.

Edit: More info on the typeof(T) optimization for further research as to why it isn't applied:

@AndyAyersMS
Copy link

@ilexp if you point me at a representative example where typeof(T) == typeof(X) left a lot of boxing behind, I'll take a deeper look.

@ilexp
Copy link
Member Author

ilexp commented Sep 29, 2017

@AndyAyersMS Thanks for checking in! I'll try to keep it short:

The use case I'm trying to get optimized boils down to the following:

public T Get<T>(string key) where T : struct
{
	if (typeof(T) == typeof(float))
	{
		// Transform data
		float result = ...;
		return (T)(object)result;
	}
	else if (typeof(T) == typeof(...))
	{
		// ...
	}
	// ...
}

I would expect optimization for Get<float> to end up with something equivalent to:

public T Get<T>(string key) where T : struct
{
	// No type checks!
	float result = ...;
	// Crucial: No boxing either!
	return result;
}

However, this doesn't seem to happen - I get lots of boxing allocations instead that vanish when I use a specialized non-generic method overload. Edit: The unnecessary boxing part of this seems to trigger for certain structs only, definitely not for float as displayed above, see followup comment.

I'm currently investigating this in a separate project and have not managed to get the typeof(T) check optimization to kick in at all. I have uploaded my test project here if you want to take a look, but I'll summarize its contents below:

public static void Main(string[] args)
{
	Console.WriteLine("Begin");

	TestCaseA();
	TestCaseB();
	TestCaseC();
	TestCaseD<int>();
	TestCaseE<int>();

	Console.WriteLine("End");
}
[MethodImpl(MethodImplOptions.NoInlining)]
public static int TestCaseA()
{
	return 1;
}
[MethodImpl(MethodImplOptions.NoInlining)]
public static int TestCaseB()
{
	if (true)
	{
		return 1;
	}
	else
	{
		return 2;
	}
}
[MethodImpl(MethodImplOptions.NoInlining)]
public static int TestCaseC()
{
	if (typeof(int) == typeof(int))
	{
		return 1;
	}
	else
	{
		return 2;
	}
}
[MethodImpl(MethodImplOptions.NoInlining)]
public static int TestCaseD<T>() where T : struct
{
	if (typeof(T) == typeof(T))
	{
		return 1;
	}
	else
	{
		return 2;
	}
}
[MethodImpl(MethodImplOptions.NoInlining)]
public static int TestCaseE<T>() where T : struct
{
	if (typeof(T) == typeof(int))
	{
		return 1;
	}
	else
	{
		return 2;
	}
}

To retrieve a JITed disassembly of each method, I'm building the project in Release mode, run it in Visual Studio, step into each method and switch to Disassembly. "Enable Just My Code" is false and "Suppress JIT Optimizations" is false as well. I'm getting the following disassembly for each method:

TestCaseA:

 mov         eax,1  
 ret  

TestCaseB:

 mov         eax,1  
 ret 

TestCaseC:

 sub         esp,20h  
 mov         rcx,7FF9BEDD9288h  
 call        00007FF9C062BBF0  
 mov         rsi,rax  
 mov         rcx,7FF9BEDD9288h  
 call        00007FF9C062BBF0  
 cmp         rax,rsi  
 jne         00007FF961020556  
 mov         eax,1  
 add         rsp,20h  
 pop         rsi  
 ret  
 mov         eax,2  
 add         rsp,20h  
 pop         rsi  
 ret 

TestCaseD:

 sub         esp,20h  
 mov         rcx,7FF9BEDD9288h  
 call        00007FF9C062BBF0  
 mov         rsi,rax  
 mov         rcx,7FF9BEDD9288h  
 call        00007FF9C062BBF0  
 cmp         rax,rsi  
 jne         00007FF9610005B6  
 mov         eax,1  
 add         rsp,20h  
 pop         rsi  
 ret  
 mov         eax,2  
 add         rsp,20h  
 pop         rsi  
 ret  

TestCaseE:

 sub         esp,20h  
 mov         rcx,7FF9BEDD9288h  
 call        00007FF9C062BBF0  
 mov         rsi,rax  
 mov         rcx,7FF9BEDD9288h  
 call        00007FF9C062BBF0  
 cmp         rax,rsi  
 jne         00007FF961000616  
 mov         eax,1  
 add         rsp,20h  
 pop         rsi  
 ret  
 mov         eax,2  
 add         rsp,20h  
 pop         rsi  
 ret  

I'll have to add that I'm not at all used to reading assembly code or debugging JIT optimizations, so if I took a wrong turn at some point, or you need additional information, please let me know. For the last test, I'm using Visual Studio 2017 (Version 15.2 (26430.15)) on .NET Framework Version 4.7.02046.

Edit: Added two more test cases. C to E seem to be equal except for memory addresses.

@ilexp
Copy link
Member Author

ilexp commented Sep 29, 2017

@AndyAyersMS I did a second test that focuses on the boxing itself, rather than omitting / resolving typeof(T) checks. Test project is here, code and disassembly below:

public struct SmallStruct
{
	public int Value;

	public SmallStruct(int v)
	{
		this.Value = v;
	}
}
public struct BigStruct
{
	public int Value;
	public long Other;

	public BigStruct(int v, long o)
	{
		this.Value = v;
		this.Other = o;
	}
}
public struct GenericStruct<T>
{
	public int Value;
	public T Other;

	public GenericStruct(int v, T o)
	{
		this.Value = v;
		this.Other = o;
	}
}
public class Program
{
	public static void Main(string[] args)
	{
		Console.WriteLine("Begin");

		TestCaseA<SmallStruct>();
		TestCaseB<BigStruct>();
		TestCaseC<GenericStruct<long>>();
		TestCaseD<GenericStruct<string>>();

		Console.WriteLine("End");
	}
	[MethodImpl(MethodImplOptions.NoInlining)]
	public static T TestCaseA<T>() where T : struct
	{
		if (typeof(T) == typeof(SmallStruct))
		{
			return (T)(object)new SmallStruct(1);
		}
		else
		{
			return default(T);
		}
	}
	[MethodImpl(MethodImplOptions.NoInlining)]
	public static T TestCaseB<T>() where T : struct
	{
		if (typeof(T) == typeof(BigStruct))
		{
			return (T)(object)new BigStruct(1, 2);
		}
		else
		{
			return default(T);
		}
	}
	[MethodImpl(MethodImplOptions.NoInlining)]
	public static T TestCaseC<T>() where T : struct
	{
		if (typeof(T) == typeof(GenericStruct<long>))
		{
			return (T)(object)new GenericStruct<long>(1, 2);
		}
		else
		{
			return default(T);
		}
	}
	[MethodImpl(MethodImplOptions.NoInlining)]
	public static T TestCaseD<T>() where T : struct
	{
		if (typeof(T) == typeof(GenericStruct<string>))
		{
			return (T)(object)new GenericStruct<string>(1, "Hello");
		}
		else
		{
			return default(T);
		}
	}
}

Test case D is the one that I was stumbling over in my project, and looking at the disassembly, it is also the only one where boxing is not optimized away:

TestCaseA: Aside from the typeof check that's still there, no boxing with the small struct.

 sub         esp,20h  
 mov         rcx,7FF960F05B18h  
 call        JIT_GetRuntimeType (07FF9C062BBF0h)  
 mov         rsi,rax  
 mov         rcx,7FF960F05B18h  
 call        JIT_GetRuntimeType (07FF9C062BBF0h)  
 cmp         rax,rsi  
 jne         00007FF961010536  
 mov         eax,1  
 add         rsp,20h  
 pop         rsi  
 ret  
 xor         eax,eax  
 add         rsp,20h  
 pop         rsi  
 ret 

TestCaseB: No boxing with the big struct either

 push        rdi  
 push        rsi  
 sub         rsp,28h  
 mov         rsi,rcx  
 mov         rcx,7FF960F05C40h  
 call        JIT_GetRuntimeType (07FF9C062BBF0h)  
 mov         rdi,rax  
 mov         rcx,7FF960F05C40h  
 call        JIT_GetRuntimeType (07FF9C062BBF0h)  
 cmp         rax,rdi  
 jne         00007FF9610105A9  
 mov         eax,1  
 mov         edx,2  
 mov         dword ptr [rsi],eax  
 mov         qword ptr [rsi+8],rdx  
 mov         rax,rsi  
 add         rsp,28h  
 pop         rsi  
 pop         rdi  
 ret  
 xor         eax,eax  
 xor         edx,edx  
 mov         dword ptr [rsi],eax  
 mov         qword ptr [rsi+8],rdx  
 mov         rax,rsi  
 add         rsp,28h  
 pop         rsi  
 pop         rdi  
 ret  

TestCaseC: The generic struct with long generates code equal to the big struct code

 push        rdi  
 push        rsi  
 sub         rsp,28h  
 mov         rsi,rcx  
 mov         rcx,7FF960F05E50h  
 call        JIT_GetRuntimeType (07FF9C062BBF0h)  
 mov         rdi,rax  
 mov         rcx,7FF960F05E50h  
 call        JIT_GetRuntimeType (07FF9C062BBF0h)  
 cmp         rax,rdi  
 jne         00007FF961010629  
 mov         eax,1  
 mov         edx,2  
 mov         dword ptr [rsi],eax  
 mov         qword ptr [rsi+8],rdx  
 mov         rax,rsi  
 add         rsp,28h  
 pop         rsi  
 pop         rdi  
 ret  
 xor         eax,eax  
 xor         edx,edx  
 mov         dword ptr [rsi],eax  
 mov         qword ptr [rsi+8],rdx  
 mov         rax,rsi  
 add         rsp,28h  
 pop         rsi  
 pop         rdi  
 ret  

TestCaseD: The generic struct with string generates this, boxing and "other things (?)"

 push        r15  
 push        r14  
 push        rdi  
 push        rsi  
 push        rbp  
 push        rbx  
 sub         rsp,28h  
 mov         qword ptr [rsp+20h],rdx  
 mov         rbx,rcx  
 mov         rsi,qword ptr [rdx+10h]  
 mov         rcx,qword ptr [rsi]  
 test        ecx,1  
 jne         00007FF961010685  
 jmp         00007FF96101068C  
 mov         rcx,qword ptr [rcx-1]  
 call        JIT_GetRuntimeType (07FF9C062BBF0h)  
 mov         rdi,rax  
 mov         rcx,7FF960F06058h  
 call        JIT_GetRuntimeType (07FF9C062BBF0h)  
 cmp         rax,rdi  
 jne         00007FF961010739  
 mov         edi,1  
 mov         rcx,28990003580h  
 mov         rbp,qword ptr [rcx]  
 mov         rcx,7FF960F06058h  
 call        JIT_TrialAllocSFastMP_InlineGetThread (07FF9C0622510h)  
 mov         r14,rax  
 lea         r15,[r14+8]  
 lea         rcx,[r15]  
 mov         rdx,rbp  
 call        JIT_CheckedWriteBarrier (07FF9C0623E00h)  
 mov         dword ptr [r15+8],edi  
 mov         rcx,qword ptr [rsi]  
 test        ecx,1  
 jne         00007FF9610106F0  
 jmp         00007FF9610106F7  
 mov         rcx,qword ptr [rcx-1]  
 mov         rdx,qword ptr [rsi]  
 test        edx,1  
 jne         00007FF961010704  
 jmp         00007FF96101070B  
 mov         rdx,qword ptr [rdx-1]  
 cmp         qword ptr [r14],rcx  
 je          00007FF96101071B  
 mov         rcx,rdx  
 mov         rdx,r14  
 call        JIT_Unbox (07FF9C06B88C0h)  
 lea         rsi,[r14+8]  
 mov         rdi,rbx  
 call        JIT_ByRefWriteBarrier (07FF9C0623F20h)  
 movs        qword ptr [rdi],qword ptr [rsi]  
 mov         rax,rbx  
 add         rsp,28h  
 pop         rbx  
 pop         rbp  
 pop         rsi  
 pop         rdi  
 pop         r14  
 pop         r15  
 ret  
 xor         eax,eax  
 xor         edx,edx  
 mov         qword ptr [rbx],rax  
 mov         dword ptr [rbx+8],edx  
 mov         rax,rbx  
 add         rsp,28h  
 pop         rbx  
 pop         rbp  
 pop         rsi  
 pop         rdi  
 pop         r14  
 pop         r15  
 ret  

@ilexp
Copy link
Member Author

ilexp commented Sep 29, 2017

Progress

  • Adjusted ShaderParameters API to distinguish between (blittable) value type, array and texture values with different methods, avoiding the previously found boxing allocations.
  • Ran the benchmark again
  • The remaining (slight) increase in allocations is due to temporary BatchInfo instances that are either created or copy-constructed during rendering. With the previous BatchInfo optimizations no longer in place, each of them will always allocate two internal dictionaries. There will be some optimization work to do on ShaderParameters.

Immediate ToDo

  • Fix memory allocations during rendering because of ShaderParameters eagerly allocating internal dictionaries.
    • Consider lazy instantiation and copying.
    • Consider not using dictionaries, but a more efficient data structure internally.
    • Consider internal pooling.
  • Benchmark and profile for allocations again.
  • Remove BatchInfo.MainColor special case entirely and instead treat it as an actual shader parameter. Consider doing this in a separate develop-3.0-maincolor-removal branch.
  • Consider removing or replacing BatchInfo, since it will only consist of a ShaderParameters reference combined with a DrawTechnique reference.
  • Transform the static BuiltinShaderFields into a DrawDevice and Camera local ShaderParameters access point and provide API to allow users to set values manually.
    • Consider removing DrawTechnique virtual API that is used for that purpose for dynamic lighting and similar use cases, provided it could then be replaced with using global variables.
  • Update the backend to only set shared / "global" shader variables on a per-program basis in the beginning of a Render call, rather than on a per-material level.
    • Be aware that materials might define values that are also defined in the shared value set. In that case, prefer the shared value over the custom one.
    • Textures will also require special handling, as the shader parameter is just binding to a texture unit index, not the actual texture.
    • Consider also storing the last set parameters in the native shader program implementation and avoid setting them if the new values are still the same. Could actually do this with ShaderParameters as well.

@AndyAyersMS
Copy link

The jit in 4.7.02046 (aka desktop CLR version 4.7) had a bug fix that inhibited some cases of the type equality optimizations. These were fixed in 4.7.1 which is available as a preview and which should be officially released soon.

On latest CoreCLR I get the following code for your second test case examples. I would expect similar results from 4.7.1 too. If you get a chance to try 4.7.1 and you don't get good results, please let me know.

TestCaseD is clearly not getting optimized. There are challenges here because the CLR will share method implementations over all ref types in TestCaseD, so the type of T is not know to the jit. However once the outcome of the type equality check is known it seems like we ought to be able to do better. I've opened dotnet/coreclr#14256 for this.

;;; Test case A
G_M22036_IG02:
       B801000000           mov      eax, 1

G_M22036_IG03:
       C3                   ret

;;; Test case B
G_M60708_IG02:
       B801000000           mov      eax, 1
       BA02000000           mov      edx, 2
       8901                 mov      dword ptr [rcx], eax
       48895108             mov      qword ptr [rcx+8], rdx
       488BC1               mov      rax, rcx

G_M60708_IG03:
       C3                   ret

;;; Test case C
G_M4495_IG02:
       B801000000           mov      eax, 1
       BA02000000           mov      edx, 2
       8901                 mov      dword ptr [rcx], eax
       48895108             mov      qword ptr [rcx+8], rdx
       488BC1               mov      rax, rcx

G_M4495_IG03:
       C3                   ret

;;; Test case D
G_M4522_IG01:
       4157                 push     r15
       4156                 push     r14
       57                   push     rdi
       56                   push     rsi
       55                   push     rbp
       53                   push     rbx
       4883EC28             sub      rsp, 40
       4889542420           mov      qword ptr [rsp+20H], rdx
       488BD9               mov      rbx, rcx
       488BF2               mov      rsi, rdx

G_M4522_IG02:
       ;; This bit of code is looking up the handle for the actual type of T
       488B4E38             mov      rcx, qword ptr [rsi+56]
       488B09               mov      rcx, qword ptr [rcx]
       F6C101               test     cl, 1
       7404                 je       SHORT G_M4522_IG03
       488B49FF             mov      rcx, qword ptr [rcx-1]

G_M4522_IG03:
       ;; Here's the check if typeof(T) == typeof(GenericStruct<string>)
       48B8309598C6FC7F0000 mov      rax, 0x7FFCC6989530
       483BC8               cmp      rcx, rax
       ;;; branch to the default(T) case if we don't have that type
       0F8581000000         jne      G_M4522_IG08
       ;;; all the rest of this down to the ret is from 
       ;;;     return (T)(object)new GenericStruct<string>(1, "Hello");
       ;;; we should be able to avoid the box / unbox / copy
       BF01000000           mov      edi, 1
       48B9C8300090A8020000 mov      rcx, 0x2A8900030C8
       488B29               mov      rbp, gword ptr [rcx]
       48B9309598C6FC7F0000 mov      rcx, 0x7FFCC6989530
       E88508855F           call     CORINFO_HELP_NEWSFAST
       4C8BF0               mov      r14, rax
       4D8D7E08             lea      r15, bword ptr [r14+8]
       498D0F               lea      rcx, bword ptr [r15]
       488BD5               mov      rdx, rbp
       E873161F5F           call     CORINFO_HELP_CHECKED_ASSIGN_REF
       41897F08             mov      dword ptr [r15+8], edi
       488B4E38             mov      rcx, qword ptr [rsi+56]
       488B09               mov      rcx, qword ptr [rcx]
       488BD1               mov      rdx, rcx
       8BC2                 mov      eax, edx
       83E001               and      eax, 1
       85C0                 test     eax, eax
       7404                 je       SHORT G_M4522_IG04
       488B52FF             mov      rdx, qword ptr [rdx-1]

G_M4522_IG04:
       85C0                 test     eax, eax
       7404                 je       SHORT G_M4522_IG05
       488B49FF             mov      rcx, qword ptr [rcx-1]

G_M4522_IG05:
       493916               cmp      qword ptr [r14], rdx
       7408                 je       SHORT G_M4522_IG06
       498BD6               mov      rdx, r14
       E823D13A5F           call     CORINFO_HELP_UNBOX

G_M4522_IG06:
       498D7608             lea      rsi, bword ptr [r14+8]
       488BFB               mov      rdi, rbx
       E887171F5F           call     CORINFO_HELP_ASSIGN_BYREF
       48A5                 movsq
       488BC3               mov      rax, rbx

G_M4522_IG07:
       4883C428             add      rsp, 40
       5B                   pop      rbx
       5D                   pop      rbp
       5E                   pop      rsi
       5F                   pop      rdi
       415E                 pop      r14
       415F                 pop      r15
       C3                   ret

G_M4522_IG08:
       33C0                 xor      rax, rax
       33D2                 xor      edx, edx
       488903               mov      gword ptr [rbx], rax
       895308               mov      dword ptr [rbx+8], edx
       488BC3               mov      rax, rbx

G_M4522_IG09:
       4883C428             add      rsp, 40
       5B                   pop      rbx
       5D                   pop      rbp
       5E                   pop      rsi
       5F                   pop      rdi
       415E                 pop      r14
       415F                 pop      r15
       C3                   ret

@ilexp
Copy link
Member Author

ilexp commented Sep 29, 2017

@AndyAyersMS Thanks for your insight on this, and glad to hear it's not just me :) I'll check out the new desktop 4.7.1 CLR after it's released and see if that improves things.

TestCaseD is clearly not getting optimized. There are challenges here because the CLR will share method implementations over all ref types in TestCaseD, so the type of T is not know to the jit. However once the outcome of the type equality check is known it seems like we ought to be able to do better.

Ahh, I see. That does sound a bit trickier than I originally thought. Will keep an eye on the issue you opened! Thanks again, really appreciated 👍

@ilexp
Copy link
Member Author

ilexp commented Sep 30, 2017

Progress

  • Adjusted ShaderParameters implementation to be more memory efficient and less eager in making
    new allocations.
    • Instead of two dictionaries, a sorted RawList<ValueItem> is used.
    • Storage space for data is no longer allocated for empty parameter collections.
  • Did another benchmark run.

Immediate ToDo

  • Remove BatchInfo.MainColor special case entirely and instead treat it as an actual shader parameter.
    • Can likely be replaced code-wise with color tint / vertex color in almost all cases.
    • Will require handling in builtin shaders and the currently existing fixed function fallback.
  • Consider removing or replacing BatchInfo, since it will only consist of a ShaderParameters reference combined with a DrawTechnique reference.
  • Consider reworking Canvas with regard to reducing allocations.
    • Allow switching device targets, so Canvas users can use the same instance continuously.
    • Integrate CanvasBuffer into Canvas directly.
    • Buffer the state stack, so repeated push / pop calls won't cause allocations.
    • Consider allowing direct material access and on-demand recalculating material / main texture stats on access.
  • Transform the static BuiltinShaderFields into a DrawDevice and Camera local ShaderParameters access point and provide API to allow users to set values manually.
    • Consider removing DrawTechnique virtual API that is used for that purpose for dynamic lighting and similar use cases, provided it could then be replaced with using global variables.
  • Update the backend to only set shared / "global" shader variables on a per-program basis in the beginning of a Render call, rather than on a per-material level.
    • Be aware that materials might define values that are also defined in the shared value set. In that case, prefer the shared value over the custom one.
    • Textures will also require special handling, as the shader parameter is just binding to a texture unit index, not the actual texture.
    • Consider also storing the last set parameters in the native shader program implementation and avoid setting them if the new values are still the same. Could actually do this with ShaderParameters as well.

@ilexp
Copy link
Member Author

ilexp commented Sep 30, 2017

Progress

  • Created a new develop-3.0-shaderparams-maincolor-wip branch and switched to it for MainColor removal.
  • Removed special color tint / pre-multiply handling for MainColor from all renderers, Canvas and internal drawing code.
  • Introduced MainColor as a new shader variable similar to MainTexture.
  • Added a temporary solution to shader variable default values, likely to be moved or redesigned.
  • Removed fixed function fallback from GraphicsBackend. Instead, the embedded minimal shader program is used when no other shader is set.

Immediate ToDo

  • Finish BatchInfo.MainColor removal.
    • Figure out a proper way to handle shader variable defaults. Can probably be done along the same lines as reflection info retrieval and then be stored runtime- and readonly in the core ShaderProgram Resource. If moving it there, remove the current storage from NativeShaderProgram.
    • Update the BatchInfo property editor to display a variables default value in case it is not defined in the parameters.
    • Check all CamView editor drawings to see if rendering still works as expected. Both bright and dark modes.
    • Check all sample projects whether they're working correctly.
    • Merge back to develop-3.0-shaderparams-wip
  • Consider removing or replacing BatchInfo, since it will only consist of a ShaderParameters reference combined with a DrawTechnique reference.
  • Consider reworking Canvas with regard to reducing allocations.
    • Allow switching device targets, so Canvas users can use the same instance continuously.
    • Integrate CanvasBuffer into Canvas directly.
    • Buffer the state stack, so repeated push / pop calls won't cause allocations.
    • Consider allowing direct material access and on-demand recalculating material / main texture stats on access. Edit: That one could only work, when keeping track of whether the current BatchInfo was already submitted, and doing an internal copy on change in that case, which would mean not exposing the material directly.
  • Transform the static BuiltinShaderFields into a DrawDevice and Camera local ShaderParameters access point and provide API to allow users to set values manually.
    • Consider removing DrawTechnique virtual API that is used for that purpose for dynamic lighting and similar use cases, provided it could then be replaced with using global variables.
  • Update the backend to only set shared / "global" shader variables on a per-program basis in the beginning of a Render call, rather than on a per-material level.
    • Be aware that materials might define values that are also defined in the shared value set. In that case, prefer the shared value over the custom one.
    • Textures will also require special handling, as the shader parameter is just binding to a texture unit index, not the actual texture.
    • Consider also storing the last set parameters in the native shader program implementation and avoid setting them if the new values are still the same. Could actually do this with ShaderParameters as well.

@ilexp
Copy link
Member Author

ilexp commented Oct 1, 2017

Progress

  • Moved default ShaderParameters instance from native shader program to DrawTechnique in anticipation of unifying various shader-related resources in issue Refactor Shader / DrawTechnique Resources #489.
  • Spent some time drafting how to handle shader variable defaults overall, no great conclusion yet.
    • ShaderParameters doesn't and shouldn't know about defaults.
    • BatchInfo could know about defaults because of its DrawTechnique reference, but it cannot apply them transparently, because that would mean hiding and wrapping its internal ShaderParameters instance.
    • From a code perspective, batchInfo.MainColor might return transparent black (value not defined), while the actual rendering uses solid white (as specified in the technique defaults).
    • From an editor perspective, the latest plan to display defaults instead would mean splitting code / editor views. Not a very intuitive or great solution.
  • Moved Canvas rework ideas to issue Rework Canvas to Reduce Allocations #568.

Immediate ToDo

  • Figure out how to handle shader variable defaults. First idea for a potential approach:
    • Change ShaderParameters API to force users to acknowledge undefined variables, for example by using a TryGet approach and removing convenience MainColor / MainTexture API. No silent fallbacks, treat it as a dictionary with Set / TryGet only.
    • Introduce BatchInfo and Material wrapper API for imperative ("assume all values exist") parameter retrieval that falls back to technique defaults.
    • In that case, there would be no split between code / editor views, but the APIs of ShaderParameters vs BatchInfo / Material would have different semantics. If this approach is implemented, that semantic difference needs to be made clear to both core developers and API users. Could capitalize on this by:
      • Providing a Keys property.
      • Renaming it to ShaderParameterCollection.
      • Maybe not even exposing it in Material and BatchInfo?
  • Finish BatchInfo.MainColor removal.
    • Check all CamView editor drawings to see if rendering still works as expected. Both bright and dark modes.
    • Check all sample projects whether they're working correctly.
    • Merge back to develop-3.0-shaderparams-wip
  • Transform the static BuiltinShaderFields into a DrawDevice and Camera local ShaderParameters access point and provide API to allow users to set values manually.
    • Consider removing DrawTechnique virtual API that is used for that purpose for dynamic lighting and similar use cases, provided it could then be replaced with using global variables.
  • Update the backend to only set shared / "global" shader variables on a per-program basis in the beginning of a Render call, rather than on a per-material level.
    • Be aware that materials might define values that are also defined in the shared value set. In that case, prefer the shared value over the custom one.
    • Textures will also require special handling, as the shader parameter is just binding to a texture unit index, not the actual texture.
    • Consider also storing the last set parameters in the native shader program implementation and avoid setting them if the new values are still the same. Could actually do this with ShaderParameters as well.

@ilexp
Copy link
Member Author

ilexp commented Oct 3, 2017

Progress

  • Renamed ShaderParameters to ShaderParameterCollection and changed its semantics to a strict, minimal key-value store with a Set / TryGet API and no shortcut properties.
  • Moved convenience and imperative get API to BatchInfo (mirrored in Material) and implemented a fallback to technique default in there.
  • Removed direct ShaderParameterCollection access from BatchInfo and Material.
  • Removed old color-and-material constructors from BatchInfo and Material, as they were almost unused.

Immediate ToDo

  • Finish BatchInfo.MainColor removal.
    • Check all CamView editor drawings to see if rendering still works as expected. Both bright and dark modes.
    • Check all sample projects whether they're working correctly.
    • Merge back to develop-3.0-shaderparams-wip
  • Transform the static BuiltinShaderFields into a DrawDevice and Camera local ShaderParameters access point and provide API to allow users to set values manually.
    • Consider removing DrawTechnique virtual API that is used for that purpose for dynamic lighting and similar use cases, provided it could then be replaced with using global variables.
  • Update the backend to only set shared / "global" shader variables on a per-program basis in the beginning of a Render call, rather than on a per-material level.
    • Be aware that materials might define values that are also defined in the shared value set. In that case, prefer the shared value over the custom one.
    • Textures will also require special handling, as the shader parameter is just binding to a texture unit index, not the actual texture.
    • Consider also storing the last set parameters in the native shader program implementation and avoid setting them if the new values are still the same. Could actually do this with ShaderParameters as well.

@ilexp
Copy link
Member Author

ilexp commented Oct 7, 2017

Progress

  • Wrapped up BatchInfo.MainColor special case removal.
  • Introduced shared shader parameters for Camera and DrawDevice.
  • Added per-program evaluation and overrides of shared shader variables in the graphics backend.
  • Replaced builtin shader field handling with automatically set DrawDevice shared shader variables.
  • Updated DynamicLighting sample to use shared shader variables for lighting setup. Since this no longer requires per-object customization of materials, dynamically lit objects can now be batched.
  • Removed support for preparation DrawTechniques, since the only use case so far can now be covered with shared shader variables in a much more efficient way.
  • Reduced DrawDevice allocations slightly by re-using RenderOptions and RenderStats instances.
  • Updated benchmark measurements:
  • Merged back to develop-3.0.

@ilexp ilexp closed this as completed Oct 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Backend Area: Backend core / editor plugins (OpenTK, etc.) Breaking Change Breaks binary or source compatibility Core Area: Duality runtime or launcher Rendering Related to rendering / graphics Task ToDo that's neither a Bug, nor a Feature Unit Tests Good candidate for adding more tests
Development

No branches or pull requests

2 participants