Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add checks for missing SSA numbers #57274

Merged
merged 3 commits into from
Aug 17, 2021

Conversation

jakobbotsch
Copy link
Member

In several places there were cases where we were checking lvaInSsa for
the lcl num of a local tree node, but then proceeded to use the ssa
number directly after. It is possible for a local to be in SSA without
tree nodes themselves having SSA form built for them, for example if
unreachable code makes it to SSA building. This adds some additional
check for missing SSA numbers.

I'm not personally a big fan of not being able to rely on all blocks
having SSA form built, but these checks would for instance have
prevented the JIT crash in #57061. Also, if someone has ideas on a
different way to do this then please don't hold back. Maybe we should
add a BBF_SSA_BUILT instead, and not process basic blocks without this flag?
Or maybe even noway_assert this.

No SPMI diffs with this change.

PTAL @AndyAyersMS cc @dotnet/jit-contrib

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 12, 2021
Comment on lines +527 to +528
if (treeRhs->OperIsScalarLocal() && lvaInSsa(treeRhs->AsLclVarCommon()->GetLclNum()) &&
treeRhs->AsLclVarCommon()->HasSsaName())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for HasSsaName to return true yet lvaInSsa be false? Seems counterintuitive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but yes, it's possible. For promoted structs that aren't tracked we sometimes give them the same SSA number as their tracked field if LclVarDsc::CanBeReplacedWithItsField returns true.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That interaction is probably something to consider refactoring in .NET 7.0. If it is intended then I doubt the compiler is handling it correctly in all places.

Copy link
Contributor

@SingleAccretion SingleAccretion Aug 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. IIRC if it is the condition I am thinking about, it is the fix for the regressions that came up after we stopped lying about struct types.

Copy link
Contributor

@briansull briansull Aug 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to lookup IIRC:

IIRC
ABBREVIATION
IIRC :: If I Recall Correctly.
"IIRC, that was in the seventies"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like #41242 is related to this.

@SingleAccretion
Copy link
Contributor

SingleAccretion commented Aug 12, 2021

Maybe we should add a BBF_SSA_BUILT instead, and not process basic blocks without this flag?

I am not knowledgeable in this area, what would be the downsides to this option? It seems appealing to just have the core optimizer loops change from using for (BasicBlock* block : Blocks()) to for (BasicBlock* block : BlocksInSsa()).

@jakobbotsch
Copy link
Member Author

I am not knowledgeable in this area, what would be the downsides to this option? It seems appealing to just have the core optimizer loops change from using for (BasicBlock* block : Blocks()) to for (BasicBlock* block : BlocksInSsa()).

It seems weird to me that we would ever have basic blocks that come through SSA without having anything built for them, because that means we are keeping around basic blocks that the SSA pass knows are unreachable. So maybe this implies that we should noway_assert that we visit all blocks in the SSA pass.

Of course it would still be possible that we add new blocks or trees after SSA where the checks in this PR would then help. But, I would prefer to have something that verifies that all blocks/IR are in expected format between each pass, but that's probably a bigger change.

@JulieLeeMSFT
Copy link
Member

@jakobbotsch is this a bug fix and do you want to take this to .NET 6?

Copy link
Contributor

@briansull briansull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a great improvement.

@@ -7334,7 +7334,7 @@ bool Compiler::optComputeLoopSideEffectsOfBlock(BasicBlock* blk)
if (rhsVN != ValueNumStore::NoVN)
{
rhsVN = vnStore->VNNormalValue(rhsVN);
if (lvaInSsa(lhsLcl->GetLclNum()))
if (lvaInSsa(lhsLcl->GetLclNum()) && lhsLcl->HasSsaName())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you review all of the places where we call GetSsaNum() and were missing the guard HasSsaName() ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's exactly what I did. The most problematic cases were when we were calling GetPerSsaData without this guard, since that will index into bad memory.

@@ -2963,7 +2963,7 @@ GenTree* Compiler::optCopyAssertionProp(AssertionDsc* curAssertion,

// Extract the matching lclNum and ssaNum.
const unsigned copyLclNum = (op1.lcl.lclNum == lclNum) ? op2.lcl.lclNum : op1.lcl.lclNum;
unsigned copySsaNum = BAD_VAR_NUM;
unsigned copySsaNum = SsaConfig::RESERVED_SSA_NUM;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change here the minimal fix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SetLclNum call below resets the SSA number, so while this was clearly bogus, it did not cause any issues (because the order was SetSsaNum -> SetLclNum before). However, we only call this function during local assertion prop and swapping the order below did not have any diffs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual fix is one of the guards added in optComputeLoopSideEffectsOfBlock.

if (lclDefSsaNum != SsaConfig::RESERVED_SSA_NUM)
{
lvaTable[lclNum].GetPerSsaData(lclDefSsaNum)->m_vnPair = newLhsVNPair;
wasInSsa = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also a fix here as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this is another place where we could potentially call GetPerSsaData with a bad SSA num. But here it was a bit harder to get the control flow the same as before since the tree is not available when we call lvaInSsa above, so I had to introduce the variable.

In several places there were cases where we were checking lvaInSsa for
the lcl num of a local tree node, but then proceeded to use the ssa
number directly after. It is possible for a local to be in SSA without
tree nodes themselves having SSA form built for them, for example if
unreachable code makes it to SSA building. This adds some additional
check for missing SSA numbers.
@jakobbotsch
Copy link
Member Author

@JulieLeeMSFT Overnight I found a couple of x86 tests that are still failing after #57061 that this PR fixes, so I would like to have it for .NET 6.

@jakobbotsch
Copy link
Member Author

The test failures seem to be jobs that were killed (?) -- at least, the logs end very abruptly.
Another one seems to be failing elsewhere as well: https://runfo.azurewebsites.net/search/tests/?q=started%3A~7%20definition%3Aruntime%20name%3A%22system.diagnostics.tests.processmoduletests.longmodulefilenamesaresupported%22

Given that the CI was completely green before I added two tests (which are passing), I think that this one can be merged. I'll give it a few more hours to see if anyone else has feedback.

@jakobbotsch jakobbotsch merged commit 57839c4 into dotnet:main Aug 17, 2021
@jakobbotsch jakobbotsch deleted the harden-missing-ssa-nums branch August 17, 2021 17:48
@JulieLeeMSFT
Copy link
Member

JulieLeeMSFT commented Aug 17, 2021

Overnight I found a couple of x86 tests that are still failing after #57061 that this PR fixes, so I would like to have it for .NET 6.

Let's do that. If you can fix within a week, we might be able to backport to RC1. If not, we can do it for RC2.
[Edit] Ah, you already merged. Good that it met RC1 timeline.

@ghost ghost locked as resolved and limited conversation to collaborators Sep 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants