Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch Windows Checked build to -O2 from -O1 #59235

Merged

Conversation

BruceForstall
Copy link
Member

That is, optimize for speed, not for size.

@ghost
Copy link

ghost commented Sep 16, 2021

Tagging subscribers to this area: @hoyosjs
See info in area-owners.md if you want to be subscribed.

Issue Details

That is, optimize for speed, not for size.

Author: BruceForstall
Assignees: -
Labels:

area-Infrastructure-coreclr

Milestone: -

@EgorBo
Copy link
Member

EgorBo commented Sep 17, 2021

Does it make superpmi faster?

@BruceForstall
Copy link
Member Author

Does it make superpmi faster?

When I first pushed this, with #53888, I measured improvements: #53849.

I didn't make any changes to the CLR to fix the problems we saw then, but I just wanted to double-check. Maybe I'll have time to investigate sometime...

@BruceForstall BruceForstall force-pushed the SwitchCheckedWindowsBuildToO2 branch from 017370e to 134704d Compare October 1, 2021 00:01
@dotnet dotnet deleted a comment from azure-pipelines bot Oct 1, 2021
@BruceForstall
Copy link
Member Author

For win-x64 Checked build, I measure a code size increase of:

clrjit.dll: 3,620,864 => 4,058,112
coreclr.dll: 13,232,640 => 14,578,176

For runtime, superpmi replay (which is almost 100% JIT time) is 11% faster with -O2 across our current collections.

@BruceForstall BruceForstall force-pushed the SwitchCheckedWindowsBuildToO2 branch from 134704d to 64dd823 Compare October 4, 2021 16:27
@BruceForstall
Copy link
Member Author

/azp run runtime-coreclr outerloop, runtime-coreclr jitstress

@dotnet dotnet deleted a comment from azure-pipelines bot Oct 4, 2021
@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

BruceForstall added a commit to BruceForstall/runtime that referenced this pull request Oct 6, 2021
The win-x64 ABI requires allocating 32 bytes of outgoing argument
stack space for every call, which can be used by callees. In the case
where there are no calls in a function, there is no need to allocate
that space. The JIT computes the required argument space during
`fgSimpleLowering` by processing the IR and looking for call instructions,
but also accounting for a few special cases where calls are not visible
in the IR. One case was missing, leading to a zero-sized outgoing argument
space even though a call was still generated. This can cause corruption in the
caller stack frame. The case is where there are no explicit calls but the
JIT is going to generate a PInvoke frame initialization helper call. This
shouldn't happen, however, it occurs because the JIT eliminates a PInvoke
that is statically dead code, but still generates the prolog setup code.

A general solution would be to eliminate the PInvoke setup code if there
are no remaining PInvokes in the function. The simple change made here is
to ensure there is sufficient outgoing argument space for the PInvoke helper
call if that will be generated.

This was found by seeing test crashes due to stack corruption when
building the CLR with the VC++ compiler `-O2` optimization switch, which
stored data in the incoming argument stack space of the PInvoke helper function
(with PR dotnet#59235).

There are 5 SPMI diffs: 2 in coreclr tests (the ones that failed at runtime),
2 in libraries tests, and 1 in System.Net.Http.WinHttpChannelBinding:.ctor(SafeWinHttpHandle):this
@BruceForstall BruceForstall force-pushed the SwitchCheckedWindowsBuildToO2 branch from 64dd823 to 983194e Compare October 6, 2021 04:30
@BruceForstall
Copy link
Member Author

/azp run runtime-coreclr outerloop, runtime-coreclr jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

BruceForstall added a commit that referenced this pull request Oct 6, 2021
…per (#60050)

The win-x64 ABI requires allocating 32 bytes of outgoing argument
stack space for every call, which can be used by callees. In the case
where there are no calls in a function, there is no need to allocate
that space. The JIT computes the required argument space during
`fgSimpleLowering` by processing the IR and looking for call instructions,
but also accounting for a few special cases where calls are not visible
in the IR. One case was missing, leading to a zero-sized outgoing argument
space even though a call was still generated. This can cause corruption in the
caller stack frame. The case is where there are no explicit calls but the
JIT is going to generate a PInvoke frame initialization helper call. This
shouldn't happen, however, it occurs because the JIT eliminates a PInvoke
that is statically dead code, but still generates the prolog setup code.

A general solution would be to eliminate the PInvoke setup code if there
are no remaining PInvokes in the function. The simple change made here is
to ensure there is sufficient outgoing argument space for the PInvoke helper
call if that will be generated.

This was found by seeing test crashes due to stack corruption when
building the CLR with the VC++ compiler `-O2` optimization switch, which
stored data in the incoming argument stack space of the PInvoke helper function
(with PR #59235).

There are 5 SPMI diffs: 2 in coreclr tests (the ones that failed at runtime),
2 in libraries tests, and 1 in System.Net.Http.WinHttpChannelBinding:.ctor(SafeWinHttpHandle):this
@BruceForstall BruceForstall marked this pull request as ready for review October 6, 2021 15:55
@BruceForstall
Copy link
Member Author

The OSX test failure is due to infra (out of disk space).

@BruceForstall
Copy link
Member Author

@janvorli @jkotas @dotnet/runtime-infrastructure This change is ready to go. Since there are problems with -O2 on x86 (#59845), I simply changed it to only use -O2 on non-x86 platforms.

Any comments?

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

That is, optimize for speed, not for size.

Don't do this for Windows x86, due to
dotnet#59845
@BruceForstall BruceForstall force-pushed the SwitchCheckedWindowsBuildToO2 branch from 983194e to a380ba7 Compare October 7, 2021 00:50
@BruceForstall
Copy link
Member Author

libraries failure is #60097

@BruceForstall BruceForstall merged commit f21a315 into dotnet:main Oct 7, 2021
@BruceForstall BruceForstall deleted the SwitchCheckedWindowsBuildToO2 branch October 7, 2021 05:15
@ghost ghost locked as resolved and limited conversation to collaborators Nov 6, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants