Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dictionary (etc) optimization suggestion, use readonly ref #24178

Closed
akraines opened this issue Nov 19, 2017 · 10 comments
Closed

Dictionary (etc) optimization suggestion, use readonly ref #24178

akraines opened this issue Nov 19, 2017 · 10 comments
Labels
area-System.Collections design-discussion Ongoing discussion about design without consensus enhancement Product code improvement that does NOT require public API changes/additions
Milestone

Comments

@akraines
Copy link

Now that readonly ref (in) is being added to the language, Dictionary.Contains, TryGetValue etc. can all use "in" parameters as they only read from the input. This would improve dictionary lookups of keys which value-tuples since they would be passed by ref and not by value.
Really this is relavent for any method that doesn't modify its parameters. This may be a clr/jit level optimization, since this can be identified by an analyser.

@jkotas
Copy link
Member

jkotas commented Nov 20, 2017

Modern calling conventions have this optimization built-in: Small value types are passed by value and large value types are passed by reference in the JIT code on x64 and arm64 today.

@akraines akraines reopened this Nov 20, 2017
@akraines
Copy link
Author

If that is the case then what is the big gain of passing a struct by ref? Until now, I understood that value types pass by value and passing by ref is an optimization. I therefore suggested the above optimization since it is easy to implement with the new "in" feature as it doesn't effect call sites.
How does the Jitter know if it should be pass a ref of the struct or a copy? Sometimes the called method may want a copy in order to modify it. Is this all included in the Jit optimization, that it detects how the struct is used?
This is important to me because based on your answer it means that it is unnecessary for me to mark my performance critical methods which receive structs (value-tuples) as "in" parameters, since the jitter will work this our for me. Is this correct?
Thanks

@jkotas
Copy link
Member

jkotas commented Nov 20, 2017

I understood that value types pass by value

Value types are logically passed by value. The low-level implementation implementation may be actually passing pointers to them around.

passing by ref is an optimization

Passing by ref is primarily a semantic change - the function can modify the value passed in. The different performance characteristics are secondary effect.

How does the Jitter know if it should be pass a ref of the struct or a copy?

The JIT follows the standard platform calling conventions. The exact rules differ from platform to platform. For example, https://docs.microsoft.com/en-us/cpp/build/parameter-passing describes how arguments are passed on Windows x64. The sentence relevant to this discussion is Structs/unions other than these sizes are passed as a pointer to memory allocated by the caller..

This is important to me because based on your answer it means that it is unnecessary for me to mark my performance critical methods which receive structs (value-tuples) as "in" parameter

You should measure and understand the performance effects of in on your code. in is double edged sword. It can both help or hurt the performance depending on the exact pattern. Also, it has a subtle semantic differences.

@mikedn
Copy link
Contributor

mikedn commented Nov 20, 2017

Dictionary.Contains, TryGetValue etc. can all use "in" parameters as they only read from the input

Nope because that would be a breaking change. Besides, passing things by ref is only (potentially) better for large value types. For reference types and small value types passing by ref is basically a de-optimization.

Value types are logically passed by value. The low-level implementation implementation may be actually passing pointers to them around.

Yes but when they do pass a pointer around they first make a copy of the value being passed as a parameter and pass a pointer to that copy. Passing values by ref avoids that copy.

@akraines
Copy link
Author

The pros and cons of the new "in" parameter are much clearer now.
It is still not clear why it would be a breaking change, but that is not relevant anymore.
Thanks

@mikedn
Copy link
Contributor

mikedn commented Nov 20, 2017

It is still not clear why it would be a breaking change

It depends on what you mean by "Dictionary.Contains, TryGetValue etc. can all use "in" parameters as they only read from the input". To me it sounds as if you're suggesting to change the existing methods and that's a breaking change.

An alternative is to add overloads (e.g. bool ContainsKey(ref TKey key)). Doable, though it adds more public APIs that people need to understand how and when to use.

@akraines
Copy link
Author

AFAIK, changing the implementation of Dictionary.Contains(T) to Dictionary.Contains(in T) would still work for the interface Dictionary.Contains(T), and therefore wouldn't be a breaking change. This of course assumes that the other aspects of the dictionary interface also can be assumed not to mutate the key, because abvously changing assumptions is a breaking change.
(Of course, you pointed out this would be a regression in performance in many cases, but that is not what we are discussing any more)

@stephentoub
Copy link
Member

AFAIK, changing the implementation of Dictionary.Contains(T) to Dictionary.Contains(in T) would still work for the interface Dictionary.Contains(T), and therefore wouldn't be a breaking change

What do you mean by changing the implementation? Changing from Dictionary.Contains(T) to Dictionary.Contains(in T) changes the API signature and is a breaking change.

@mikedn
Copy link
Contributor

mikedn commented Nov 21, 2017

Changing from Dictionary.Contains(T) to Dictionary.Contains(in T) changes the API signature and is a breaking change.

Yes, it looks like there is some confusion about what in really is. It looks like such a change would be source level compatible due to in not being required at the callsite and local temporaries being generated when literals are used as arguments (e.g. ContainsKey(42) would work).

But of course, in IL/metadata in is byref, like ref and out. So this breaks binary compatibility and possibly introduces other issues (e.g. worse performance).

@akraines
Copy link
Author

That is exactly what I missed. Thanks for clarifying.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.1.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 19, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Collections design-discussion Ongoing discussion about design without consensus enhancement Product code improvement that does NOT require public API changes/additions
Projects
None yet
Development

No branches or pull requests

5 participants