-
Notifications
You must be signed in to change notification settings - Fork 7.6k
Fix performance issues in Add-Type #5243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
02a2a1d
to
fd56efe
Compare
fd56efe
to
db1d42d
Compare
{ | ||
this.sourceCode = ""; | ||
StringBuilder sb = new StringBuilder(paths.Length); | ||
sourceCode = ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sourceCode doesn't need to be initialized since it is always assigned after loop. Also we normally name class fields with beginning underscore: "_sourceCode".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove the line.
sourceCode
is used in many places in the file. If I rename it will be very noisy and unrelated with the performance fix - I think it is better to leave it as is.
foreach (string file in paths) | ||
{ | ||
this.sourceCode += System.IO.File.ReadAllText(file) + "\n"; | ||
sb.Append(System.IO.File.ReadAllText(file)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some feedback:
ReadAllText
returns a large string in the end, which may be allocated in LOH if the file size is larger than 85kb. The allocation of this large string will be wasted since we are appending it to the StringBuilder.
If usingReadAllLines
, then only small strings will be created, which can be quickly garbage collected. So it probably would be better if we go with
foreach (string file in paths)
{
foreach (string line in File.ReadAllLines(file))
{
sb.AppendLine(line);
}
}
sourceCode = sb.ToString();
- When there is only one file specified, which might be the case for the most time, this is still not performant. It could be better if we can treat the one-file case specially. Like:
if (paths.Length == 1)
{
sourceCode = File.ReadAllText(file);
}
else
{
var sb = new StringBuilder(paths.Length);
foreach (string file in paths)
{
foreach (string line in File.ReadAllLines(file))
{
sb.AppendLine(line);
}
}
sourceCode = sb.ToString();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thoughts was that we should get CompileAssemblyFromFile
and there is no point in doing a lot of optimization. But we can get it for a while, so I'll add the optimization. Only it will work for small files - StringBuilder.MaxCapacity
is int.MaxValue
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand your comment about CompileAssemblyFromFile
(is it a class?). Could you please elaborate it a bit more? I quickly searched about CSharpSyntaxTree.ParseText
, and didn't find a good way to make it consume multiple files all together directly from files. That API takes a SourceText
type, which has a static method From
that takes a file path, but not multiple files. But I'm sure there should be ways to do it, otherwise, how would a compiler work 😄.
For some background, the perf issue of Add-Type was reported by Azure Profiler team. They find this exact code causes a lot LOH allocations in some services and if we treat the path.length == 1
case specially, we can save, as quoted, "5.39 million samples here, or around 1% of total PowerShell.exe CPU cost". It was about the windows powershell, but this piece of code is same for powershell core.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't find a good way to make it consume multiple files all together directly from files
I didn't find too. So I decided to postpone this to a later PR.
Really CompileAssemblyFromFile
is in CodeDom. There isn't such method In CodeAnalysis.
I found using SourceText.From
.
Maybe @lzybkr know a right way.
{ | ||
this.sourceCode = ""; | ||
StringBuilder sb = new StringBuilder(paths.Length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use paths.Length
as the initial capacity is not very helpful. Maybe use a larger estimate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops 😄
|
||
foreach (string file in paths) | ||
{ | ||
FileInfo f = new FileInfo(file); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder what the cost is for obtaining file information. I assume the file doesn't need to be opened? Is the cost a reasonable trade off to StringBuilder automatic allocation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find out how it was implemented in CoreFX. 😕
In any case, if we want to handle a lot of files, we need to refactor the cmdlet so that we don't read all the files in one string, but compile file by file. I plan to do it later
{ | ||
StringBuilder sb = new StringBuilder((int)initLength); | ||
|
||
foreach (string file in paths) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment that this pattern is used to minimize potential temporary LOH allocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment added.
} | ||
else | ||
{ | ||
sourceCode = ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was my understanding that ReadAllText() uses StringBuilder internally. Do we know that it correctly handles strings larger than MaxInt characters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad. I removed the code.
{ | ||
FileInfo f = new FileInfo(file); | ||
initLength += f.Length; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calculate the total size might not be worth it. How about we just use a relatively large initial length? Say 8192 (8k)?
To be honest, I think if a user feed a file with more than 85k in size to Add-Type, they are using it wrongly (that Azure service might use Add-Type
wrongly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you ask Azure profile team about this? I guess they can have real things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean asking about the azure service might misuse Add-Type? @PaulHigin already replied them and brought it up to them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant what is typical file size they see in their traces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you mean the initial capacity for StringBuilder, I think there is no accurate answer for it as it's probably a case by case basis. The default initial capacity is 16 bytes, so I think as long as we are using a relatively large number, it should be OK. (it's micro optimization that may be premature anyway)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dislike always pre-allocating StringBuilder to a large size. It may be over allocated for the majority of cases. But if we do this we should definitely ensure we pre-allocate at a size less than a LOH block. I like the idea of pre-allocating intelligently but this means computing the file sizes and I am concerned that may have a high cost (I seem to remember a .Net bug where FileInfo resulted in opening files on some platforms).
I am inclined to just leave it at the .Net default allocation. To me the big perf win was removing the duplicate file read / string allocation in the base class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.Net default (16 bytes) is very small for file size. I think I agree that 8k is best in the case. Also we plan remove the code at all in follow PR so I ready put any value in the time :-)
{ | ||
this.sourceCode = ""; | ||
foreach (string file in paths) | ||
if (paths.Length == 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I don't think we need to special case this for a single file. My understanding is that ReadAllText() uses StringBuilder internally in a similar way so there is no gain in using it over our StringBuilder use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We plan remove the code at all in follow PR. We can keep it as is today, but I'll have that thought in follow PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that ReadAllText() uses StringBuilder internally in a similar way so there is no gain in using it over our StringBuilder use.
The gain we get in specializing paths.length == 1
is to save a big string list that holds all lines and the big string array that is returned. See https://referencesource.microsoft.com/#mscorlib/system/io/file.cs,1018
After inspecting the API code, I start to suspect whether the ReadAllLines()
API will actually save us from the LOH allocation because of the string list and the string array it converts to. It may double the LOH allocation ... Anyways, it's good that @iSazonov is going to eliminate the "reading text from file" code and instead try to compile the files directly.
fc1a6d9
to
8d8f6c2
Compare
@PaulHigin @daxian-dbw Thanks for review! It seems I found how compile files - I'll test and come back with new PR in days so you will have a choice that back port to PS 5.1. |
@iSazonov Thanks for spending effort redesign the code, looking forward to the new PR! |
@daxian-dbw, @iSazonov I just noticed that the AddTypeCommandBase abstract class is public. So removing the duplicate sourceCode read from there could potentially be a breaking change if someone was relying on that functionality. I feel it is an acceptable risk for Core 6 since it is unlikely anyone has a dependency on it. But for Windows (back porting) it could adversely affect a customer. |
Good catch! We can avoid breaking change if we move AddTypeCommandBase.EndProcessing code in AddTypeCommand.EndProcessing. Makes a new PR? |
@iSazonov Yes, moving the EndProcessing code into the base class seems like the right thing to do. |
Related #5158.
Fix description
Commit 1. Minor optimizations in
OutputError()
.Commit 2. Before the fix we read sources files twice in base.EndProcessing() and in this.EndProcessing() - it is excluded. Also now we read source files in StringBuilder to exclude large reallocations. Added one test.
Additional considerations
I expect the fix remove performance issues reported in #5158. So we can close the Issue by the PR and open new Issue to discuss refactoring the code to use
CompileAssemblyFromFile()
. Using Roslyn can open paths to enhance the Add-Type cmdlet's capabilities but requires a lot of work.