Fix performance issues in Add-Type #5243

Merged
merged 6 commits into from
Oct 31, 2017

Conversation

iSazonov
Collaborator

Related #5158.

Fix description

Commit 1. Minor optimizations in OutputError().

Commit 2. Before the fix, source files were read twice: once in base.EndProcessing() and once in this.EndProcessing(). The duplicate read is removed. Source files are now also read into a StringBuilder to avoid large reallocations. Added one test.

Additional considerations

I expect the fix to remove the performance issues reported in #5158, so we can close that issue with this PR and open a new issue to discuss refactoring the code to use CompileAssemblyFromFile(). Using Roslyn can open paths to enhance the Add-Type cmdlet's capabilities, but requires a lot of work.

iSazonov force-pushed the exclude-readalltext branch from fd56efe to db1d42d on October 26, 2017 14:57
{
this.sourceCode = "";
StringBuilder sb = new StringBuilder(paths.Length);
sourceCode = "";
Contributor

sourceCode doesn't need to be initialized, since it is always assigned after the loop. Also, we normally name class fields with a leading underscore: "_sourceCode".

Collaborator Author

I'll remove the line.

sourceCode is used in many places in the file. Renaming it would be very noisy and unrelated to the performance fix, so I think it is better to leave it as is.

foreach (string file in paths)
{
this.sourceCode += System.IO.File.ReadAllText(file) + "\n";
sb.Append(System.IO.File.ReadAllText(file));
Member

daxian-dbw commented Oct 26, 2017

Some feedback:

  1. ReadAllText returns one large string, which may be allocated on the LOH if the file is larger than about 85 KB. That large allocation is wasted, since we only append the string to the StringBuilder.
    With ReadAllLines, only small strings are created, which can be garbage collected quickly. So it would probably be better to go with:

    foreach (string file in paths)
    {
        foreach (string line in File.ReadAllLines(file))
        {
            sb.AppendLine(line);
        }
    }
    sourceCode = sb.ToString();

  2. When only one file is specified, which is probably the most common case, this is still not performant. It would be better to treat the one-file case specially, like:

    if (paths.Length == 1)
    {
        // Single file: read it directly, no StringBuilder needed.
        sourceCode = File.ReadAllText(paths[0]);
    }
    else
    {
        var sb = new StringBuilder(paths.Length);
        foreach (string file in paths)
        {
            foreach (string line in File.ReadAllLines(file))
            {
                sb.AppendLine(line);
            }
        }
        sourceCode = sb.ToString();
    }

Collaborator Author

My thought was that we should move to CompileAssemblyFromFile, so there is no point in doing a lot of optimization here. But it may take a while to get there, so I'll add the optimization. Note that it will only work for small files, since StringBuilder.MaxCapacity is int.MaxValue.

Member

daxian-dbw commented Oct 27, 2017

I don't understand your comment about CompileAssemblyFromFile (is it a class?). Could you please elaborate on it a bit more? I quickly searched for CSharpSyntaxTree.ParseText and didn't find a good way to make it consume multiple files directly from disk. That API takes a SourceText type, which has a static method From that takes a file path, but not multiple files. But I'm sure there are ways to do it; otherwise, how would a compiler work 😄.

For some background, the perf issue in Add-Type was reported by the Azure Profiler team. They found that this exact code causes a lot of LOH allocations in some services, and that if we treat the paths.Length == 1 case specially, we can save, as quoted, "5.39 million samples here, or around 1% of total PowerShell.exe CPU cost". That was about Windows PowerShell, but this piece of code is the same in PowerShell Core.
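
For readers less familiar with the LOH point above, here is a minimal sketch of how one might observe it. It is not code from this PR: the `LohSketch` class and its helper are hypothetical names, and it relies only on the documented .NET behavior that objects of 85,000 bytes or more go to the large object heap, which the runtime reports as generation 2.

```csharp
using System;

public static class LohSketch
{
    // Hypothetical helper: allocates a string of the given length and
    // returns the GC generation the runtime reports for it.
    public static int GenerationOfNewString(int charCount)
    {
        string s = new string('x', charCount);
        return GC.GetGeneration(s);
    }

    public static void Main()
    {
        // 1,000 chars is roughly 2 KB: an ordinary small-object allocation.
        Console.WriteLine(GenerationOfNewString(1000));

        // 100,000 chars is roughly 200 KB, well above the 85,000-byte
        // threshold, so the string lands on the large object heap.
        Console.WriteLine(GenerationOfNewString(100000));
    }
}
```

Because a .NET char is two bytes, a source file does not need to be 85 KB on disk for the string returned by ReadAllText to cross the threshold.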

Collaborator Author

> didn't find a good way to make it consume multiple files all together directly from files

I didn't find one either, so I decided to postpone this to a later PR.
Actually, CompileAssemblyFromFile is in CodeDom; there is no such method in CodeAnalysis.
What I found is SourceText.From.
Maybe @lzybkr knows the right way.

{
this.sourceCode = "";
StringBuilder sb = new StringBuilder(paths.Length);
Member

Using paths.Length as the initial capacity is not very helpful. Maybe use a larger estimate?

Collaborator Author

Oops 😄


foreach (string file in paths)
{
FileInfo f = new FileInfo(file);
Contributor

I wonder what the cost is for obtaining file information. I assume the file doesn't need to be opened? Is the cost a reasonable trade-off against StringBuilder's automatic allocation?

Collaborator Author

I couldn't find out how it is implemented in CoreFX. 😕
In any case, if we want to handle a lot of files, we need to refactor the cmdlet so that we don't read all the files into one string, but compile file by file. I plan to do that later.

{
StringBuilder sb = new StringBuilder((int)initLength);

foreach (string file in paths)
Contributor

Please add a comment that this pattern is used to minimize potential temporary LOH allocations.

Collaborator Author

Comment added.

}
else
{
sourceCode = "";
Contributor

It was my understanding that ReadAllText() uses StringBuilder internally. Do we know that it correctly handles strings larger than MaxInt characters?

Collaborator Author

My bad. I removed the code.

{
FileInfo f = new FileInfo(file);
initLength += f.Length;
}
Member

Calculating the total size might not be worth it. How about we just use a relatively large initial capacity? Say 8192 (8k)?
To be honest, I think if a user feeds a file of more than 85 KB to Add-Type, they are using it wrongly (that Azure service might be using Add-Type wrongly).

Collaborator Author

Could you ask the Azure Profiler team about this? I guess they may have real data.

Member

daxian-dbw commented Oct 27, 2017

Do you mean asking whether the Azure service might be misusing Add-Type? @PaulHigin has already replied to them and brought it up.

Collaborator Author

I meant: what is the typical file size they see in their traces?

Member

If you mean the initial capacity for StringBuilder, I think there is no accurate answer, as it probably depends on the case. The default initial capacity is 16 characters, so I think as long as we are using a relatively large number, it should be OK. (It's a micro-optimization that may be premature anyway.)
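
As a quick illustration of the capacities being discussed, the following sketch (not code from this PR; `CapacitySketch` and its methods are hypothetical names) prints the default StringBuilder capacity next to a pre-sized one. Capacity is measured in characters, so the default is 16 chars rather than 16 bytes.

```csharp
using System;
using System.Text;

public static class CapacitySketch
{
    // Capacity of a StringBuilder created with the parameterless constructor.
    public static int DefaultCapacity()
    {
        return new StringBuilder().Capacity;
    }

    // Capacity of a StringBuilder created with an explicit initial capacity.
    public static int PresizedCapacity(int initialCapacity)
    {
        return new StringBuilder(initialCapacity).Capacity;
    }

    public static void Main()
    {
        Console.WriteLine(DefaultCapacity());      // default: 16 characters
        Console.WriteLine(PresizedCapacity(8192)); // the 8k value proposed above
    }
}
```

An initial capacity of 8192 characters is about 16 KB of buffer, which stays well under the 85,000-byte LOH threshold.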

Contributor

I dislike always pre-allocating StringBuilder to a large size. It may be over-allocated for the majority of cases. But if we do this, we should definitely ensure we pre-allocate at a size smaller than an LOH block. I like the idea of pre-allocating intelligently, but this means computing the file sizes, and I am concerned that may have a high cost (I seem to remember a .NET bug where FileInfo resulted in opening files on some platforms).

I am inclined to just leave it at the .NET default allocation. To me, the big perf win was removing the duplicate file read / string allocation in the base class.

Collaborator Author

The .NET default (16 characters) is very small compared to a file size. I think I agree that 8k is best in this case. Also, we plan to remove this code entirely in a follow-up PR, so I'm ready to put in any value for the time being :-)

{
this.sourceCode = "";
foreach (string file in paths)
if (paths.Length == 1)
Contributor

Also I don't think we need to special case this for a single file. My understanding is that ReadAllText() uses StringBuilder internally in a similar way so there is no gain in using it over our StringBuilder use.

Collaborator Author

We plan to remove this code entirely in a follow-up PR. We can keep it as is for now, but I'll keep that thought in mind for the follow-up PR.

Member

> My understanding is that ReadAllText() uses StringBuilder internally in a similar way so there is no gain in using it over our StringBuilder use.

The gain we get from special-casing paths.Length == 1 is avoiding the big string list that holds all lines and the big string array that is returned. See https://referencesource.microsoft.com/#mscorlib/system/io/file.cs,1018

After inspecting the API code, I am starting to doubt whether the ReadAllLines() API will actually save us from LOH allocations, because of the string list it builds and the string array that list is converted to. It may double the LOH allocation ... Anyway, it's good that @iSazonov is going to eliminate the "reading text from file" code and instead try to compile the files directly.
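
One possible way around the list-plus-array concern, sketched here as an assumption rather than what the PR does, is File.ReadLines, which enumerates lines lazily instead of materializing them all first. The `ReadSketch` class and `ReadSources` helper below are hypothetical names; the 8192 initial capacity is the value floated earlier in this thread.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadSketch
{
    // Hypothetical helper: concatenates the given source files into one string.
    public static string ReadSources(string[] paths)
    {
        if (paths.Length == 1)
        {
            // Single file: one read, no intermediate per-line strings.
            return File.ReadAllText(paths[0]);
        }

        var sb = new StringBuilder(8192);
        foreach (string file in paths)
        {
            // File.ReadLines streams lines one at a time, so no List<string>
            // or string[] holding the whole file is created.
            foreach (string line in File.ReadLines(file))
            {
                sb.AppendLine(line);
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        string a = Path.Combine(dir, "a.cs");
        string b = Path.Combine(dir, "b.cs");
        File.WriteAllText(a, "class A {}");
        File.WriteAllText(b, "class B {}");

        Console.Write(ReadSources(new[] { a, b }));

        Directory.Delete(dir, true);
    }
}
```

The final sb.ToString() still produces one string holding all the sources, so this only trims the intermediate allocations; compiling the files directly would avoid that last copy as well.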

SteveL-MSFT added the Backport-5.1-Consider label (Consider to backport to Windows PowerShell 5.1 due to impact) on Oct 28, 2017
iSazonov force-pushed the exclude-readalltext branch from fc1a6d9 to 8d8f6c2 on October 30, 2017 07:16
@iSazonov
Collaborator Author

@PaulHigin @daxian-dbw Thanks for the review! It seems I have found how to compile the files directly. I'll test it and come back with a new PR in a few days, so you will have a choice of what to backport to PS 5.1.

iSazonov merged commit 1c446c1 into PowerShell:master on Oct 31, 2017
iSazonov deleted the exclude-readalltext branch on October 31, 2017 15:00
@daxian-dbw
Member

daxian-dbw commented Oct 31, 2017

@iSazonov Thanks for putting effort into redesigning the code; looking forward to the new PR!
As for porting to Windows, one thing we all need to know is that in Windows PowerShell, Add-Type goes through a completely different code path (it uses CodeDom in full .NET). So your new PR most likely won't be ported back, but this micro-optimization change may be.

@PaulHigin
Contributor

@daxian-dbw, @iSazonov I just noticed that the AddTypeCommandBase abstract class is public. So removing the duplicate sourceCode read from there could potentially be a breaking change if someone was relying on that functionality. I feel it is an acceptable risk for Core 6 since it is unlikely anyone has a dependency on it. But for Windows (back porting) it could adversely affect a customer.

@iSazonov
Collaborator Author

iSazonov commented Nov 1, 2017

Good catch! We can avoid the breaking change if we move the AddTypeCommandBase.EndProcessing code into AddTypeCommand.EndProcessing. Should I make a new PR?

@PaulHigin
Contributor

@iSazonov Yes, moving the EndProcessing code into the base class seems like the right thing to do.

daxian-dbw added the Breaking-Change label (breaking change that may affect users) on Nov 3, 2017
SteveL-MSFT removed the Backport-5.1-Consider label (Consider to backport to Windows PowerShell 5.1 due to impact) on Feb 1, 2018
4 participants