Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

null-check the redirect context before using. #65910

Merged
merged 3 commits into from
Mar 1, 2022
Merged

Conversation

VSadov
Copy link
Member

@VSadov VSadov commented Feb 25, 2022

Fixes: #65890

In rare cases (typically on x86) the redirect context can be NULL. It is supposed to be preallocated before redirecting a thread, but we could be short on memory at the time when we tried to allocate.

The condition should not be fatal. It should just cause the current redirection attempt to fail and cause a retry.
On the next attempt we may be able to allocate or stop the thread via other mechanisms.
We used to indirectly rely on EEGetThreadContext to fail when the context is NULL. Now the first use is SetXStateFeaturesMask which crashes with NRE.

We should just check pCtx for NULL before using it.

}

// We may not have a preallocated context. Could be short on memory when we tried to preallocate.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The crash was hit in the CI by a lot of tests. It sounds very unlikely we would be running out of memory that often.

Isn't the actual problem a race condition somewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern as well. The situation should not be fatal, but should not be common either. These tests do not look like something that should run close to OOM.
I am not sure if this was always happening and handled or it is something new.

I could not reproduce this on purpose. I've spent quite some time trying. Maybe having fewer resources on lab machines is a factor.
I think the fix makes sense one way or another, but why we need it more than I'd expect is still a question.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unable to reproduce this so far. Did we see this on anything other than Windows7/x86?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have seen it on Win7/x86 only

Copy link
Member Author

@VSadov VSadov Feb 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We call InitializeContext with 0 for context buffer size to get the actual buffer size. The expected result is to get ERROR_INSUFFICIENT_BUFFER and to get the size required.
Rarely, for unknown reasons we get a different error. We have disabled an assert because of this.

// The following assert is valid, but gets triggered in some Win7 runs with no impact on functionality.

I wonder if in such case we also get invaid size (like 0) and thus the allocation fails.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec apparently mentions that:

 If the function fails with an error other than ERROR_INSUFFICIENT_BUFFER, the contents of ContextLength are undefined.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just do not know what can possibly go wrong with a call like InitializeContext(NULL, context, NULL, &contextSize); - that is the same use as in the example.

Copy link
Member Author

@VSadov VSadov Feb 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not even try to allocate when InitializeContext returns unexpected results.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps that is the root cause, but the reason why the call may fail on Win7/x86 is still unclear.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@VSadov
Copy link
Member Author

VSadov commented Mar 1, 2022

Thanks!!

@VSadov VSadov merged commit 5130645 into dotnet:main Mar 1, 2022
@VSadov VSadov deleted the ctxNull branch March 1, 2022 06:51
VSadov added a commit to VSadov/runtime that referenced this pull request Mar 2, 2022
* null-check the redirect context before using.

* tweak the comment

* do not allocate context if InitializeContext has unexpected results.
agocke pushed a commit to agocke/runtime that referenced this pull request Mar 8, 2022
* null-check the redirect context before using.

* tweak the comment

* do not allocate context if InitializeContext has unexpected results.
carlossanlop pushed a commit that referenced this pull request Mar 8, 2022
…#66120)

* Use CopyContext to restore saved context on X86 (#65490)

* Use CopyContext to restore saved context on X86

* PR feedback

* more PR feedback

* Do not copy XState other than AVX when redirecting for GC stress (#65825)

* Do not copy XState other than AVX

* #if defined(TARGET_X86) || defined(TARGET_AMD64)

* mask XState unconditionally

* Ensure XSTATE_MASK_AVX is set before calling EEGetThreadContext

* redundant supportsAVX, more clear comment

* PR feedback

* null-check the redirect context before using. (#65910)

* null-check the redirect context before using.

* tweak the comment

* do not allocate context if InitializeContext has unexpected results.

* EE Suspension on x86 should use RtlRestoreContext when available (#65878)

* RestoreContextSimulated

* probe for RtlRestoreContext

* ntdll.dll

* restore self-trap sequence

* PR feedback

* Clarify CopyContext in RedirectedHandledJITCaseExceptionFilter

* simpler indentation.

* restore last error on the legacy path.

* Update src/coreclr/vm/threads.h

Co-authored-by: Dan Moseley <danmose@microsoft.com>

Co-authored-by: Dan Moseley <danmose@microsoft.com>
carlossanlop pushed a commit that referenced this pull request Mar 15, 2022
* Use CopyContext to restore saved context on X86 (#65490)

* Use CopyContext to restore saved context on X86

* PR feedback

* more PR feedback

* Do not copy XState other than AVX when redirecting for GC stress (#65825)

* Do not copy XState other than AVX

* #if defined(TARGET_X86) || defined(TARGET_AMD64)

* mask XState unconditionally

* Ensure XSTATE_MASK_AVX is set before calling EEGetThreadContext

* redundant supportsAVX, more clear comment

* PR feedback

* null-check the redirect context before using. (#65910)

* null-check the redirect context before using.

* tweak the comment

* do not allocate context if InitializeContext has unexpected results.

* EE Suspension on x86 should use RtlRestoreContext when available (#65878)

* RestoreContextSimulated

* probe for RtlRestoreContext

* ntdll.dll

* restore self-trap sequence

* PR feedback

* Clarify CopyContext in RedirectedHandledJITCaseExceptionFilter

* simpler indentation.

* restore last error on the legacy path.

* Update src/coreclr/vm/threads.h

Co-authored-by: Dan Moseley <danmose@microsoft.com>

Co-authored-by: Vladimir Sadov <vsadov@microsoft.com>
Co-authored-by: Dan Moseley <danmose@microsoft.com>
@ghost ghost locked as resolved and limited conversation to collaborators Mar 31, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fatal error. Internal CLR error. (0x80131506)
2 participants