Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix SSP restoring in edge cases #104820

Merged
merged 5 commits into from
Jul 17, 2024

Conversation

janvorli
Copy link
Member

There are edge cases when the SSP restoring for continuation after a catch handler completes doesn't work correctly. The problem is caused by the fact that we scan for the Rip of the frame handling the exception on the shadow stack to find where to restore it, and in those edge cases, the same address can be there multiple times and the first occurence is not the right one. For example, when an exception is thrown from a catch handler, it escapes the handler and the handler for the escaped exception is in the same method as the one that invoked the handler.

This change fixes it by finding the SSP of the first managed frame where we search for the handler and then updating the SSP with every unwind. The SSP is stored in the REGDISPLAY. So when we reach the CallCatchFunclet, the REGDISPLAY contains the SSP to restore.

There was also one more issue with restoring the SSP. I turned out that the incsspq instruction uses only the lowest 8 bits of the argument to increment the SSP, so the ClrRestoreNonVolatileContextWorker needs to have a loop that repeats that instruction in case we need to move it by more than 255 slots.

There are edge cases when the SSP restoring for continuation after a
catch handler completes doesn't work correctly. The problem is caused by
the fact that we scan for the Rip of the frame handling the exception on
the shadow stack to find where to restore it, and in those edge cases,
the same address can be there multiple times and the first occurence is
not the right one. For example, when an exception is thrown from a catch
handler, it escapes the handler and the handler for the escaped
exception is in the same method as the one that invoked the handler.

This change fixes it by finding the SSP of the first managed frame where
we search for the handler and then updating the SSP with every unwind.
The SSP is stored in the REGDISPLAY. So when we reach the
CallCatchFunclet, the REGDISPLAY contains the SSP to restore.

There was also one more issue with restoring the SSP. I turned out that
the incsspq instruction uses only the lowest 8 bits of the argument to
increment the SSP, so the ClrRestoreNonVolatileContextWorker needs to
have a loop that repeats that instruction in case we need to move it by
more than 255 slots.
@janvorli janvorli added this to the 9.0.0 milestone Jul 12, 2024
@janvorli janvorli requested review from jkotas and VSadov July 12, 2024 18:37
@janvorli janvorli self-assigned this Jul 12, 2024
// The float updating unwinds the stack so the pRD->pCurrentContext->Rip contains correct unwound Rip
// This is used for exception handling and the Rip extracted from m_pCallerReturnAddress is slightly
// off, which causes problem with searching for the return address on shadow stack on x64, so
// we keep the value from the unwind.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it a subtle bug that the Rip was off, or we just did not care about the correct IP in this case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's actually expected due to the way how the InlinedCallFrame is created in some cases. Neither the GC stack walk nor the new EH cared, but it caused problem with the newly added search for the first managed frame, as it sometimes is one that called the pinvoke (QCALL).

@jkotas
Copy link
Member

jkotas commented Jul 12, 2024

Do we have test coverage for these corner cases?

@janvorli
Copy link
Member Author

Do we have test coverage for these corner cases?

There is no way to really test those, as Windows would happily run and silently fixup things.

@VSadov
Copy link
Member

VSadov commented Jul 12, 2024

Do we have test coverage for these corner cases?

There is no way to really test those, as Windows would happily run and silently fixup things.

Also testing the topmost SSP against what we can figure from SP is rarely helpful either. The top item is often correct, while we did not pop the things properly and the SSP keeps growing, for example.

There is some sensitivity to wrong SSP in hijacking as OS unhijacks using SSP and we use our own stashed value (but we assert that it works the same), so incorrect SSP may lead to asserts. However observing this requires a fairly tight race with suspension.
I wonder if running GC stress with CET enabled would trip on this more often. Although it would still be mostly about the top of the stack, which could be correct while the rest is not.

@VSadov
Copy link
Member

VSadov commented Jul 12, 2024

Maybe there is a way to disable the OS fixup behavior, but then, I'd not be surprised something beyond our control requires it and nothing will run in such mode, even if it was possible to set up.

Update_Loop:
cmp r11, rax
cmovb rax, r11
incsspq rax
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed documented as only bits 0-7 are used, regardless of operand size.

Perhaps there is a way to save a byte of assembly by doing INCSSPD eax :-)

Copy link
Member Author

@janvorli janvorli Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The incsspd increments the SSP by multiples of 4, not by 8 :-)

Copy link
Member

@VSadov VSadov Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also make it use bits 0-3, because why not ... :-)

@janvorli
Copy link
Member Author

Maybe there is a way to disable the OS fixup behavior

There is not, I've explicitly asked Windows folks about it.

Copy link
Member

@VSadov VSadov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@jkotas
Copy link
Member

jkotas commented Jul 13, 2024

Is it possible for the shadow stack to overflow before this fix - can we use that to build regression test? Or can the effects of this fix be observed by better performance?

If there is really no way to observe the effects of this fix, I am wondering why it is needed. If it is the case, can we simplify things by leaving it to the OS to fix things up as necessary?

@VSadov
Copy link
Member

VSadov commented Jul 13, 2024

Is it possible for the shadow stack to overflow before this fix

yes, if we pop SSP to the fist expected IP match, but we need to pop to some other match, the overflow could be the ultimate result. After enough iterations. It could take many iterations though and very special crafted repro.

@jkotas
Copy link
Member

jkotas commented Jul 13, 2024

We should create the special repro. I would expect that it should be a viable outer loop test at least.

@VSadov
Copy link
Member

VSadov commented Jul 13, 2024

It is possible that we already have tests when ssp grows, but we do not see that since the tests do not run long enough.

I thought about some kind of asserting that shadow stack is roughly the same depth as the regular (i.e. like not twice deep), and use in some random places, but not sure how that can be done in practice, considering native frames on stack.

@janvorli
Copy link
Member Author

I will try to create a repro that would result in stack overflow without this change.

@janvorli
Copy link
Member Author

@jkotas I have added a test that fails with (shadow) stack overflow before this fix and passes after.

@jkotas
Copy link
Member

jkotas commented Jul 16, 2024

(shadow) stack overflow

Just curious - is there a special status code for shadow stack overflow or does it get reported as regular stack overflow?

@janvorli
Copy link
Member Author

Just curious - is there a special status code for shadow stack overflow or does it get reported as regular stack overflow?

It gets reported as regular stack overflow.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for creating the test!

@VSadov
Copy link
Member

VSadov commented Jul 16, 2024

BTW, The test also demonstrates that the fixup performed by Windows is not always correct.

It is a good and perhaps the only possible backward-compat heuristic - pop to the nearest shadow stack item that matches the return site. However, the site may be recorded on the shadow stack multiple times and the code may be returning not to the lowermost occurrence.

@janvorli
Copy link
Member Author

BTW, The test also demonstrates that the fixup performed by Windows is not always correct.

I think that it will eventually fix itself after a couple of returns - one return might pick a wrong location, but then a next return will be wrong again, so yet another fixup will be made that finally corrects things.

@janvorli janvorli merged commit 4f38f92 into dotnet:main Jul 17, 2024
91 of 95 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Aug 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants