-
Notifications
You must be signed in to change notification settings - Fork 2.7k
CoreFx #15622 Dictionary<TKey, TValue>.Remove(TKey, out TValue) #10203
Conversation
@@ -592,11 +592,19 @@ private void Resize(int newSize, bool forceNewHashCodes) | |||
|
|||
public bool Remove(TKey key) | |||
{ | |||
TValue unused; | |||
return Remove(key, out unused); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In C# 7.0, this might be a place to use out *
in place of out unused
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't seem to compile in our compiler right now. If it did, I think it's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actual C# 7.0 syntax is out var _
. Using that seems to compile fine for me.
@@ -614,6 +622,9 @@ public bool Remove(TKey key) | |||
{ | |||
entries[last].next = entries[i].next; | |||
} | |||
|
|||
value = entries[i].value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be slightly clearer if this line went directly after the if on line 615
if (key == null) | ||
{ | ||
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key); | ||
} | ||
|
||
value = default(TValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could move just before the return false.
@@ -614,6 +622,9 @@ public bool Remove(TKey key) | |||
{ | |||
entries[last].next = entries[i].next; | |||
} | |||
|
|||
value = entries[i].value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation of public bool Remove(TKey key, out TValue value)
penalizes many users of public bool Remove(TKey key)
by introducing an unnecessary copy (and initialization) of the value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikedn Any suggestions? A branch? Duplicate the body of the function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicating the code might be the only option that doesn't impact performance. But it too has its cost since we're looking at a generic type. Using a branch to avoid the unnecessary copy might be a reasonable compromise.
But before doing any of that the performance impact should be measured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danmosemsft @mikedn Thanks for the review comments. I will work on getting the performance data. This review gives me a close and significant insight into working of C# and how to think differently when wearing a C# vs C++ hat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikedn hmm. A few questions, mainly as a learning for me...
Duplicating the code might be the only option that doesn't impact performance. But it too has its cost since we're looking at a generic type.
- Are we referring to maintenance cost? But then what is special in context of generic type... Seems there is more here... Please share.
Using a branch to avoid the unnecessary copy might be a reasonable compromise.
-
In non-duplication way, I'm not able to understand how we could avoid assignment at line 626, since we have to squirrel away the value before being overwritten. I think I am not able to understand what is meant by branch here. Appreciate your inputs.
-
I was curious why the C# compiler would not inline entire
Remove(key, out value)
into theRemove(key)
and distill off the unnecessary copies via intermediate code optimizations. For long I have always wondered why the chained constructors also won't inline (as seen in IL for even Release mode).
Performance data points I will collect
a. Old Remove implementation
b. Nested Remove implementation as in the PR with code re-org.
c. Branched version (but I need to be sure I understand what is meant by branch in this context).
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we referring to maintenance cost? But then what is special in context of generic type...
Maintenance cost too since we'll end up with 2 very similar pieces of code. But this is a generic type, this means that if you duplicate that code you're not adding just a single method, you're adding many more - one for each Dictionary
instantiation where TValue
is a value type (e.g. one for Dictionary<int, int>
, another one for Dictionary<int, bool>
and so on).
In non-duplication way, I'm not able to understand how we could avoid assignment at line 626...
I don't know exactly what @danmosemsft had in mind when he said branch but one possible option is to pass a bool
parameter to the method indicating whether to perform the copy or not. That would imply moving all this code to a new private method and having both Remove
overloads call that method.
I was curious why the C# compiler would not inline entire Remove(key, out value) into the...
The C# compiler doesn't do inlining, that's the job of the JIT compiler since inlining normally needs to take into account various "low level" aspects (e.g. native code size, calling convention). Anyway, there's too much code to inline.
Performance data points I will collect
Make sure to measure the case of large value types. I don't expect this change to have a measurable impact on Dictionary<int, int>
but the story may be different for something like Dictionary<int, Matrix>
.
@WinCPP could you please do some ad-hoc performance measurements? A simple console app is fine. This should help: https://github.com/dotnet/coreclr/blob/master/Documentation/workflow/UsingYourBuild.md Even easier, just install 2.0 and paste in the Dictionary code into your own code -- both the old and new implementations (I'd try a branch to avoid the copy). Make sure to build release. Then I'd set up some dictionaries with different numbers of entries and confirm the timings with each implementation. Timing with Stopwatch is fine, and you'd probably do a bunch of iterations and drop outliers and get the standard deviation. Perhaps @mikedn can point to a gist of code someone else has used for this purpose. I haven't got one handy. |
Or even something simple like this one: |
Thanks @danmosemsft @safern for the guidance. I will work on the performance part. |
For the performance testing, I want to use my local clr + corefx. I'm stuck in that. Please help. I want to use local CoreCLR because it has implementation and CoreFx because it has facade.
But the
|
@WinCPP it seems like the package name/version is wrong in your csproj. Would you mind sharing the
That you added on your csproj? |
@safern please find below the csproj contents: <Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>netcoreapp2.0</TargetFramework>
<RuntimeFrameworkVersion>2.0.0-beta-001776-00</RuntimeFrameworkVersion>
<RuntimeIdentifier>win7-x64</RuntimeIdentifier>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="Microsoft.Private.CoreFx.NETCoreApp" Version="4.4.0-beta-25211-0" />
</ItemGroup>
</Project> As mentioned in the step in the doc, I made sure that the Update: I also tried by changing
|
Going by the error string that I'm getting,
does it mean that I have to build by local CoreClr and / or CoreFx with .NETCoreApp version 2.0? And then only I can use my local bits...? Because following is the output for
|
Note that ".NET Command Line Tools" version != .NET Core version. The tools can target multiple .NET Core versions (like VS). Here's example of .NET Core only API addition, if that helps: https://github.com/dotnet/corefx/issues/1942 |
@karelz The link discusses about performance, but I could not get the fix for my issue... Actually there are two local builds to be used CoreClr and CoreFx and now I notice that each of them has user your build / dogfooding documents here and here respectively, with former pointing to latter. Steps in the former document for using local CoreClr bits worked perfectly. However this being a case of adding new API with implementation in CoreClr and facade in CoreFx, I thought I have to do the steps in CoreFx dogfooding document as well; and there it fails. So are the steps for using local CoreFx not to be done and, just do steps for local CoreClr? Then, how would I get the new API on CoreFx facade? |
I was still not able to solve the above problems so I abandoned that approach. I made copy of System.Collections.Tests and modified its contents and added performance tests for only two implementations of Remove: API implementation
OLD - bool Remove(TKey key)
NEW - bool Remove(Tkey key) that internally uses Remove(TKey key, out value) Following is the performance data that I got. Please let me know if it makes sense.
|
Thanks @WinCPP . This is a very hot method, so we don't want to take a 10% performance regression. I would try this approach now
Unless @jkotas has another idea. |
Well, the Dictionary is big and fat type already and each extra convenience APIs is making it fatter. The lookup loop is replicated in the implementation many times, so duplicating it one more time is hardly going to register on the radar. If we wanted to avoid this duplication, I think that the naïve implementation would be the best way to do it: if(!TryGetValue(key, out value)) return false;
Remove(key);
return true; Yes, it is not as fast as possible, but I do not see a problem with it. Dictionary is not the fastest collection for any given case. It is a collection that is fast enough for most common uses. The performance effects of adding every convenience API imaginable over time to the common collections is something for fxdc to think about. I am not sure whether this particular API would made it if it was my call... . |
@jkotas what kind of performance impact do you see in adding more convenience APIs? Size of the type? Code locality? |
@danmosemsft There is an out parameter that will have to be mandatorily assigned before leaving the private method. Essentially something like this, // Following is the core for the private method
if (FoundInDictionary)
{
if (WantTheValueOut)
{
value = entries[i].value;
}
else
{
value = default(TValue); // out parameter hence all return paths should assign a value
}
// Do the actual remove stuff
//
return true;
}
value = default(TValue);
return false; Therefore, a copy or a create is inevitable and will penalize the old I will capture stats for Just a layman's wishful thinking... I always wish if inlining for compilers could be smart to detect plain vanilla function overloads in which one wraps the other ( |
Yes, size of the type - that gets multipled for generic types. Contributes to disk footprint, startup time, ... .
Then they can be extension methods against the interface... . |
What does that solve? You still end up with a bunch of additional native code being generated. To add some numbers to this story: the existing |
Maybe we can build both |
Current find entry returns the index for the found entry by abstracting the direct bucket / entry list. Remove APIs work at the found location to adjust the root / next links for the buckets / entries to effect a removal; essentially this removal code is again a common block which is in fact dependent on results of the FindEntry functional block. It is between these two common functional blocks that the second Hope I am making sense... |
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key); | ||
} | ||
|
||
if (buckets != null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to duplicate this for loop code, we have a private method called FindEntry
which returns the index to that entry and if it is not found it returns -1 but it won't update the last value so we could probably add a FindEntryRemove(key)
where it finds the entry and does the bucket/entries update and then return the value... something like:
private int FindEntryRemove(TKey key)
{
if (buckets != null)
{
int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
int bucket = hashCode % buckets.Length;
int last = -1;
for (int i = buckets[bucket]; i >= 0; last = i, i = entries[i].next)
{
if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key))
{
if (last < 0)
{
buckets[bucket] = entries[i].next;
}
else
{
entries[last].next = entries[i].next;
}
return i;
}
}
}
return -1;
}
And then both removes could be implemented around that private API, something like:
int i = FindEntryRemove(key);
if (i > 0)
{
value = entries[i].value;
FreeEntry(i);
return true;
}
value = default(TValue);
return false;
And we could add a private method called FreeEntry(int index), that basically does:
entries[i].hashCode = -1;
entries[i].next = freeList;
entries[i].key = default(TKey);
entries[i].value = default(TValue);
freeList = i;
freeCount++;
version++;
@jkotas @danmosemsft any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure whether it would be a win. You would need to collect throughput and size data to tell what is better. I am ok with the implementation in one big method as is.
I would rather look into how to make the native code generated for this implementation smaller. For example, caching pointer to entries[i]
in a local may help to reduce native code size speed and improve speed: ref Entry entry = ref entries[i];
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@safern I believe this is in the pursuit of reducing duplicated code. But then the duplication, although restricted to few lines, happens between FindEntry
and FindEntryAndRemove
for the finding part. Unless we want to merge them and pass a boolean
to indicate 'remove' that will default to false
...? ... potentially introducing a branch on the paths where existing FindEntry
is used. Either ways, the method FindEntryAndRemove
still feels incomplete from design point of view because we have removed but not free'd the entry and freeing part comes as post fix...
- Curiosity question, wouldn't the C# compiler perform local CSE on the four lines which fetch the entry ... i.e.
entry[i].*
lines... with effect being that the entry is computed only once? This is based on my experience with C / C++ compilers though... - How does the usage of
default(TValue)
work? Does it initialize and then do a memberwise copy to destination (due to operator =) or, does it directly construct the default into memory referenced by the destination? If it is latter, does it still do memberwise init or it does sort of memset to zeros using span APIs? Thought in my head: cycles spent in memberwise init to default for a value type will be proportional to its size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C# compiler does very few trivial optimizations only. Optimizations like CSE are done by the JIT. Whether or not the optimization kicks in depends on number of factors. The best way to know for sure is to look the disassembly of the generated code and measure.
One thing to keep in mind is that the array indexing in C# is bound-checked. It restricts the optimizations that the JIT can do transparently, for example it cannot do the bound-check speculatively because of it would result in spurious exceptions.
I have glanced over the native code generated for Dictionary<int,int>.Remove
. The CSE for entry[i]
kicks in some cases, but not others. If you cache ref Entry entry = ref entries[i];
at the start of the loop and replace all 9 occurrences of entries[i]
with it, the native code generated for this method should be approximately 20% smaller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way, I think this one is fine to merge as is, and the optimizations for both the new and existing methods should be done in separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas thanks for the elaborate answer. I helps me understand more internals. I had not given a thought that the operator []
would do bounds checking... took it for plain simple C / C++ array access.
{ | ||
if (key == null) | ||
{ | ||
value = default(TValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the point of setting the value on exception thrown? It is not how similar method works. For example, Dictionary.TryGetValue
won't set the value on exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas hmmm as I mentioned earlier, in the initial code I did leave out assigning default to the out
parameter referring to some method but I am not able to find out which. I did the latest change based on Dictionary.TryGetValue
itself which actually assigned default value on false path (and in fact there is no exception since it is a Try
method)... please refer here.
So looked up MSDN for what should be the behavior of an out
parameter in C# (link) and it says this...
Although variables passed as out arguments do not have to be initialized before being passed, the called method is required to assign a value before the method returns.
And then this aligned with the TryGetValue
behavior mentioned previously and also with COM which mentions that callee is supposed to assign value to out parameter even on E_FAIL, in which case it may be a NULL pointer... hence...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although variables passed as out arguments do not have to be initialized before being passed, the called method is required to assign a value before the method returns.
This is only applicable when the method returns to its caller; if it instead throws an exception, this does not apply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephentoub Thanks for the clarification. Removing.
Assigning on the false path is fine. I am questioning assigning on the exception throwing path. Dictionary.TryGetValue does not assign on exception throwing path either. |
@jkotas ... yup got it. Removing. Now I recollect, the exception was inside |
Hmm... then as for the behavior of the API, can we say that the value in the Between... just curiosity.... the other |
|
The value is undefined in such situations, but ideally in most situations if the target previously had a value assigned it would remain unchanged. From a language perspective, |
Thanks @jkotas @stephentoub . I'll update the test case to expect that the contents of the out parameter remain unaltered... |
@safern @jkotas @stephentoub Please review the latest commit. Thanks. |
@@ -628,6 +628,48 @@ public bool Remove(TKey key) | |||
return false; | |||
} | |||
|
|||
public bool Remove(TKey key, out TValue value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a comment explaining that we're duplicating the code of Remove(TKey) for performance reasons. Othrwise someone will want to refactor it later...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... But then @jkotas has opened #10883 ... would that cover this as well...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#10883 is not about eliminating the code duplication between the different overloads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops ok. @danmosemsft @jkotas I have added a comment above both overloads. Kindly review the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but Jan should sign off also.
Looks good thanks for doing this @WinCPP ! Nit: if you'd made the comment addition in a 2nd commit, rather than squashing locally, we could easily see it was the only change you make. When we merge, there is a squash button so there's no need to squash everything yourself. |
I'll merge when CI's green. |
@dotnet-bot test tizen armel cross debug build (https://github.com/dotnet/coreclr/issues/10897) |
@dotnet-bot test tizen armel cross release build (https://github.com/dotnet/coreclr/issues/10897) |
@danmosemsft Oops ok. Will keep that in mind next time onwards. |
Ignoring Tizen as per #10897 |
Even when the method doesn't use any of the generic params? Like |
Yes, it is the case today. |
New API to return the value associated with key being removed, if the key is present.
@karelz @danmosemsft Kindly review.