-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize RAR's GetReferenceItems method #5929
Conversation
Changes "true" to "True" and "false" to "False", but that's ok because it should be case-insensitive.
This means we don't have to repeatedly unset it.
…emoved once nonForwardableMetadata?.Remove(ItemMetadataNames.*) is used to prevent the correct metadata value from being overwritten when metadata are re-added from nonForwardableMetadata.
src/Utilities/TaskItem.cs
Outdated
CopyOnWriteDictionary<string> copiedMetadata = _metadata.Clone(); // Copy on write! | ||
foreach (KeyValuePair<string, string> entry in destinationAsTaskItem.Metadata) | ||
{ | ||
copiedMetadata[entry.Key] = entry.Value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you clarify the purpose of this change? Is this to optimistically optimize the case where you are copying metadata values that exactly match the destination (in value and count) - in this case the foreach here will not cause a "copy" of the dictionary because it will just be writing the same values? If that's the case, why not destinationAsTaskItem.Metadata?.Count == _metadata?.Count
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was previously an optimization for if destinationAsTaskItem.Metadata was null, i.e., its count was 0, but there are also cases in which its count is small compared to _metadata?.Count. This is trying to improve that case, where there are a few metadata defined for destinationAsTaskItem but not as many as for the source. In that case, it's better to modify a clone of the source rather than modifying the destination directly.
src/Utilities/TaskItem.cs
Outdated
CopyOnWriteDictionary<string> copiedMetadata = _metadata.Clone(); // Copy on write! | ||
foreach (KeyValuePair<string, string> entry in destinationAsTaskItem.Metadata) | ||
{ | ||
copiedMetadata[entry.Key] = entry.Value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be escaping, as on line 331? I'm not sure, but you'll want to make sure there's a test. IIRC, everything is always escaped when it enters the MSBuild engine, and unescaped on the way out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copying metadata from a TaskItem to a TaskItem, so its escaped status should be the same in both cases. Just copying it over without escaping or unescaping it makes sense to me.
Line 331 is for ITaskItems that are not ITaskItem2s, and this will always be an ITaskItem2, so if some escaping/unescaping is warranted, it should be replacing entry.Value with destinationAsTaskItem.GetMetadataValueEscaped(entry.Key), but that's less performant, and I don't understand why that would be warranted. Doesn't mean I'm right, though.
{ | ||
referenceItem.SetMetadata(ItemMetadataNames.copyLocal, "false"); | ||
} | ||
referenceItem.SetMetadata(ItemMetadataNames.copyLocal, reference.IsCopyLocal.ToString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convert to ternary operator
Also, comparing to a null value apparently always returns false, so the check I was using was bogus. This fixes that.
src/Utilities/TaskItem.cs
Outdated
{ | ||
destinationAsTaskItem.Metadata = _metadata.Clone(); // Copy on write! | ||
CopyOnWriteDictionary<string> copiedMetadata; | ||
if (destinationAsTaskItem.Metadata == null || destinationAsTaskItem.Metadata.Count < _metadata.Count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment, please. And I think I personally would find it clearer if there were three explicit cases:
- Destination has no metadata.
- Destination has less metadata than source.
- Source has more metadata than destination.
@@ -2777,7 +2765,6 @@ private ITaskItem SetItemMetadata(List<ITaskItem> relatedItems, List<ITaskItem> | |||
// Clone metadata. | |||
referenceItem.CopyMetadataTo(item); | |||
// We don't have a fusion name for scatter files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove these comments please.
// Set the FusionName metadata properly. | ||
referenceItem.SetMetadata(ItemMetadataNames.fusionName, fusionName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "properly" mean? In general, how do you know this is safe? Is the local variable fusionName
guaranteed to be empty for all the paths where it was previously unset? How do you know?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If any of reference.GetSourceItems() has a fusionName, or reference.PrimarySourceItem has one, fusionName may have been set to an invalid value at this point. We remove it before forwarding referenceItem's metadata to other items, but we still need to set it properly at the end, i.e., give it the right fusionName.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind clarifying what you meant about fusionName being guaranteed to be empty? Previously, fusionName was set to a value near the top of this method, that value was passed on to a variety of "relatedItems," "satelliteItems," etc., and then it was unset from all of those other items. Here, instead, I'm unsetting it prior to using fusionName for any of those, and only giving it its value at the bottom, so CopyMetadataTo automatically skips fusionName.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The meta comment is: "properly" is basically useless in a code comment. Comments describe what you want the code to do/why it's done, and of course you want to do things "properly". Instead, describe details. Here, you might have a comment that fusionName
should be set last, so that it doesn't get copied to the derived items but is still available on the primary item.
item.RemoveMetadata(ItemMetadataNames.winmdImplmentationFile); | ||
item.RemoveMetadata(ItemMetadataNames.imageRuntime); | ||
item.RemoveMetadata(ItemMetadataNames.winMDFile); | ||
Dictionary<string, string> metadata = new Dictionary<string, string>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a more descriptive name for this variable? removedMetadata
?
if (!String.IsNullOrEmpty(meta)) | ||
{ | ||
metadata.Add(ItemMetadataNames.targetPath, meta); | ||
} | ||
item.RemoveMetadata(ItemMetadataNames.targetPath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this one has to be next to the add to the new list, can you move the others to be next to their equivalents too?
List<string> relatedFileExtensions = reference.GetRelatedFileExtensions(); | ||
List<string> satellites = reference.GetSatelliteFiles(); | ||
List<string> serializationAssemblyFiles = reference.GetSerializationAssemblyFiles(); | ||
string[] scatterFiles = reference.GetScatterFiles(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My gut feeling is that these will all be small enough that keeping them live all at once instead of in sequence won't be a memory problem. But do you have data on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. How would I get such data? Build OrchardCore and...somehow check whether something is being paged out?
// Set the FusionName metadata properly. | ||
referenceItem.SetMetadata(ItemMetadataNames.fusionName, fusionName); | ||
|
||
if (nonForwardableMetadata != null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment please
referenceItem.SetMetadata(ItemMetadataNames.winmdImplmentationFile, Path.GetFileName(reference.ImplementationAssembly)); | ||
nonForwardableMetadata?.Remove(ItemMetadataNames.winmdImplmentationFile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These inverted remove-from-the-removal-list things definitely need a comment please.
@@ -3455,7 +3455,7 @@ public void PrimaryFXAssemblyRefIsNotCopyLocal() | |||
|
|||
Assert.Single(t.ResolvedFiles); | |||
Assert.Equal(Path.Combine(s_myVersion20Path, "System.Data.dll"), t.ResolvedFiles[0].ItemSpec); | |||
Assert.Equal("false", t.ResolvedFiles[0].GetMetadata("CopyLocal")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert all these please since you went back to the ternary.
// This happens when the redist root is "null". This means there | ||
// was no IsRedistRoot flag in the Redist XML (or there was no | ||
// redist XML at all for this item). | ||
referenceItem.SetMetadata(ItemMetadataNames.isRedistRoot, (bool)reference.IsRedistRoot ? "true" : "false"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cast stands out. Should it be
referenceItem.SetMetadata(ItemMetadataNames.isRedistRoot, (bool)reference.IsRedistRoot ? "true" : "false"); | |
referenceItem.SetMetadata(ItemMetadataNames.isRedistRoot, reference.IsRedistRoot == true ? "true" : "false"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both ways work. We did a null check just before this, so it's either true or false.
src/Utilities/TaskItem.cs
Outdated
else | ||
{ | ||
copiedMetadata = destinationAsTaskItem.Metadata.Clone(); | ||
copiedMetadata.SetItems(Metadata.Where(entry => !destinationAsTaskItem.Metadata.ContainsKey(entry.Key))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was unsure between using Where or setting them one-by-one in this case. I did some benchmarking, and this is what I got:
Note that for each test case, I added NumStrings / 10
items to an ImmutableDictionary with NumStrings
items in it previously.
You can see that using Where had fewer allocations for all three sizes. (Base is creating but not combining the dictionaries, so Where actually allocated very little.) It lost on time, though, unless the dataset was large, and I don't have data as to whether that's expected sometimes, always, or never. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you instead do copiedMetadata = Metadata.Clone().SetItems(destinationAsTaskItem.Metadata)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and honestly, there are probably a lot of cases in which that would be better than what I currently have. What I'm trying to optimize for is adding metadata from a small dictionary to a large dictionary, and in that case, it would be better to add a few items individually rather than using SetItems to set a lot of items. I can run a test for that, though, and see if it's optimized nicely, in which case I'll switch to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like doing it that way was 50-75% slower if the adding dictionary had 10x as many items. Next, trying it with a smaller ratio...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/Utilities/TaskItem.cs
Outdated
else | ||
{ | ||
copiedMetadata = destinationAsTaskItem.Metadata.Clone(); | ||
copiedMetadata.SetItems(Metadata.Where(entry => !destinationAsTaskItem.Metadata.ContainsKey(entry.Key))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original non-optimized code down in the else
branch uses string.IsNullOrEmpty(value)
to decide if the metadatum should be overwritten. It doesn't seem to match your new logic here - you only check if the key exists in the metadata dictionary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks!
string meta = item.GetMetadata(ItemMetadataNames.winmdImplmentationFile); | ||
if (!String.IsNullOrEmpty(meta)) | ||
{ | ||
removedMetadata.Add(ItemMetadataNames.winmdImplmentationFile, meta); | ||
} | ||
item.RemoveMetadata(ItemMetadataNames.winmdImplmentationFile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Refactor these statements into a local function to be called 4 times?
/// </summary> | ||
public void SetItems(IEnumerable<KeyValuePair<string, V>> items) | ||
{ | ||
_backing = _backing.SetItems(items); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SetItems
and Where
will also need to be implemented on the .NET 3.5 version of ImmutableDictionary
in C:\src\msbuild\src\MSBuildTaskHost\Immutable\ImmutableDictionary.cs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Why isn't Where
implemented by LINQ as it is for most IEnumerables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course! I stand corrected, only SetItems
is missing.
Co-authored-by: Rainer Sigwald <raines@microsoft.com>
@@ -2818,6 +2747,61 @@ private ITaskItem SetItemMetadata(List<ITaskItem> relatedItems, List<ITaskItem> | |||
} | |||
} | |||
|
|||
if (reference.IsWinMDFile) | |||
{ | |||
// The ImplementationAssembly is only set if the implementation file exits on disk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Typo exits
.
src/Utilities/TaskItem.cs
Outdated
copiedMetadata = _metadata.Clone(); // Copy on write! | ||
copiedMetadata.SetItems(destinationAsTaskItem.Metadata); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is doing the same thing as the else
right below. I would expect the !IsNullOrEmpty
check here as well.
The else below:
- Take all of destination item's metadata.
- Overwrite with this item's metadata that are missing or empty on the destination item.
This block:
- Take all of this item's metadata.
- Overwrite with all of destination item's metadata.
For a missing destination metadatum this behaves the same, but for an empty one the SetItem
call in this block will erroneously overwrite it with the empty value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're absolutely right. I'm wondering if we're going at this the wrong way, though, since that would mean we'd need another Where
statement to filter out non-empty metadata, which would make this optimization noticeably less of an optimization. Empty metadata should be equivalent (in all cases) to missing metadata, I think. If that's true, it would probably be more efficient to maintain an invariant that TaskItems don't have empty metadata. The COW dictionary storing the metadata is private, so that isn't a problem, but it's theoretically a breaking change unless we keep a separate list for "metadata added as empty" for the unfortunately public MetadataNames (and MetadataCount). What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't go as far as to introduce a separate list for empty metadata unless it proves impactful. Can you measure how much this optimization is gaining us with and without handling empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I measured 3 scenarios (building from a clean state (I appended /restore to the command line for this one), building with caches but not processes to connect to, and building with both) for 3 versions of TaskItem (as it is in master, with this change plus a Where statement on the else if branch, and with all the empty metadata stored in a HashSet once each for building OrchardCore. Note that I kept the other optimizations in this PR for all three versions of TaskItem. I used my computer instead of a VM, mostly because my VM seems to be slow...but I also tried to avoid doing anything too intense in the background. Note that I forgot to use -m
for the first several tests, so I continued to not use it for the last few. If you want new results with having that enabled for all of them, I can do that.
I have full results saved away if you want to look. My first takeaway was that caches really matter. It took more than twice as long to build in every case if I deleted them and restored them. In contrast, presumably because OrchardCore is big, and I neglected to enable -m, having processes already running only saved about 170ms on average.
As far as differences between the versions of TaskItem, here are some times for the entire build in ms:
MSBuild execution times | Clean build | Process not running | Process running |
---|---|---|---|
Master | 108475 | 49657 | 49704 |
Where statements | 108253 | 48140 | 47949 |
Empty HashSet | 105464 | 49105 | 48708 |
Since it occurred to me that this has impacts outside RAR, I don't think just looking at RAR time makes sense. Are there other data you'd like to see? I'm not sure why the Where statements was so much worse than keeping an empty list for the clean state build—it's possible that was an outlier, since I only ran each of these once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I was actually expecting you to measure this particular aspect with a microbenchmark. Basically, see if the SetItems
call is helping (even with the Where
) given the expected metadata count. I second Ben's question below about what the realistic number of metadata is. Maybe you could add a logging statement here while building OrchardCore and see where the breaking point is relative to the distribution of metadata count seen during the build. (Assuming there is a breaking point and the new code is actually a bit slower if the number of metadata is small.)
Measuring the entire build end-to-end would have to be repeated several times to get trustworthy results. As you wrote, it could have been an outlier and, honestly, this optimization is likely going to be well within noise when running the entire thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Merging with @benvillalobos's comment) I'm currently a bit confused by the results I got when I tried to test this. I put in a simple check of how many custom metadata each taskitem has when its metadata is copied over, as well as the amount of metadata on the destination. The answer seemed to be that the destination never had any custom metadata for either MSBuild.sln or OrchardCore.sln, in which case we would never hit the case I optimized, and the time would be almost identical between them unless the other changes I made (which could have a minor effect, but I didn't expect to be really relevant) had much more of a positive effect than I'd expected. The same test suggested that the source TaskItems had 20-50 pieces of metadata, of which 15 were not custom metadata, so these wouldn't be huge numbers, but this operation also got called a lot, which might mean that a small optimization could have somewhat noticeable effects. What do you make of that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a good argument against pursuing this optimization further. If you really wanted to be sure, you could run the build under a profiler to see how much time we're spending in this function - which would be the upper limit of what any micro-optimization here could give us.
{ | ||
copiedMetadata = _metadata.Clone(); // Copy on write! | ||
} | ||
else if (destinationAsTaskItem.Metadata.Count < _metadata.Count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How likely are we to hit the case where destinationAsTaskItem.Metadata.Count < _metadata.Count
and both metadata count are very large? I don't think I've seen items with a standout number of metadata, but MSBuild has surprised me before.
Do we happen to have any idea of a "realistic average" number of metadata per TaskItem that comes through here? Something more detailed than just assuming it's the number of well-known metadata?
I added one more commit that should make it do exactly what the previous code path had done in all cases. Rather than fully reverting that part, I left it in, although I don't expect that to be a perf optimization anymore, just a potential improvement for some customers. If you want me to fully revert it and just leave the other part, I can, but it feels wrong to undo something that could theoretically improve perf if I don't think it could worsen perf. |
src/Utilities/TaskItem.cs
Outdated
else | ||
{ | ||
copiedMetadata = destinationAsTaskItem.Metadata.Clone(); | ||
copiedMetadata.SetItems(Metadata.Where(entry => !destinationAsTaskItem.Metadata.TryGetValue(entry.Key, out string val) || String.IsNullOrEmpty(val))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super-nit: Directly read this._metadata
instead of calling the this.Metadata
getter here for consistency with the if
branches?
A TaskItem in AppDomain A is identical to a TaskItem in AppDomain B, but the TaskItems cannot access each other's fields, only their properties. This changes accessing a possibly remote TaskItem's _metadata field into accessing its Metadata property.
won't have the opportunity to re-review
This change should not alter behavior other than changing casing on metadata (that should be case-insensitive), but it includes a series of changes (see individual commits) that simplify the code, correct typos, or prevent MSBuild from having to do unnecessary work. (Note that this means that most of the commits are expected to have no relevant impact on perf despite the title of this PR.) From building OrchardCore once with and once without this change, RAR's execution time declined by about 2.5% when there were no caches. This part of RAR does not rely on the caches, so this gain will not be wiped out by including caches.