This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand the change. The lazyState edi, esi and ebx are initialized from the baseState members, so the context pointers would point to the same values.
The only difference is the EBP and it seems to me that we were missing copying EBP from baseState to lazyState. At least amd64 version copies all callee saved registers. Isn't this actually causing the problem you were trying to fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that
HelperMethodFrame::InsureInit
passes a pointer to a local variable (unwound
) aslazyState
, which means that&(lazyState->...)
becomes invalid afterHelperMethodFrame::InsureInit
returns.PAL_VirtualUnwind
will not updatenonVolRegPtrs.Edi
ifedi
is not saved in the corresponding helper function. In this case,nonVolRegPtrs.Edi
remains as&(lazyState->_edi)
, and thusLazyMachState::unwindLazyState
setslazyState->_pEdi
as&(lazyState->_edi)
just before returning.Later, EH system updates a resume context via using this invalid pointer if a managed exception is thrown inside any helper function that does not save
edi
, and the managed code that invokes this helper function catches this exception.This incorrect update causes test failures in the PR description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is interesting that x64 has the same structure, but does not suffer from this issue. I guess that x64 does not suffer from this issue as it implements
HelperMethodFrame::UpdateRegDisplay
differently.Here is x64 implementation:
Here is x86 implementation:
Unlike x86, x64 uses saved register values directly instead of dereferencing a corresponding pointer. As I understand, only pointers are invalid, and thus x64 may not suffer from this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@parjong, I have looked at the sources again and the issue is somewhere else. The amd64 code is correct including the context pointers. Look at the LazyMachState::setLazyStateFromUnwind. It copies the context pointers from the passed in argument (which is the local you have referred to) to the target "this". And during that process, it converts the context pointers that were pointing inside of the source MachState to point to the same registers, but in the target mach state. See here:
https://github.com/dotnet/coreclr/blob/master/src/vm/amd64/gmscpu.h#L153-L164.
And this translation is missing in the x86 version. It is done in ARM and ARM64 too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@janvorli Hmm... I guess that the fixup that you mentioned also fixes this issue:
ctx
is always correct (unlikenonVolRegPtrs
), and the fixup will update an (incorrect) interior pointer (untouched byPAL_VirtualUnwind
) as a correct pointer that points a corresponding variable insideMachState
itself.For me, however, it seems that this fixup is necessary only because
LazyMachState::unwindLazyState
incorrectly initializesnonVolRegPtrs
.As I understand,
baseState
is the captured state at the beginning of unmanaged function. If unmanaged fuction does not touch a callee-saved register, the corresponding value inbaseState
should be what we want to find. If unmanaged fuction touches this register, then managed code should save its previous value on the stack. This means that the corresponding pointer will be updated byPAL_VirtualUnwind
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@janvorli How about changing all the architectures at once? Is it too risky?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the x86/Windows implementation of
LazyMachState::unwindLazyState
.It initializes
lazyState->_pXXX
as&baseState->_XXX
at the beginning (similarly as this PR does):It seems that the fixup approach resolves the semantic difference between x86/Linux and all the other ports, but introduces another semantic difference between x86/Linux and x86/Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still not 100% sure there are no hidden issues with that approach. I feel that modifying all the architectures at once now without thorough testing using CoreCLR and CoreFX tests with various GC stress modes would be risky. Possible issues in this area could manifest themselves as GC holes that are inherently hard to debug.
Since the x86 Linux EH and unwinding is like the other architectures and diverged from the x86 windows, it is better to keep it aligned with the other architectures rather than the Windows x86 wherever it is possible.
I'd prefer creating a separate github issue for looking into the possibility of changing all architectures to the way you have suggested and spending time testing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@janvorli I tried to implement the fixup approach as you suggested. I found that the fixup code would be different from other architectures due to the difference in
LazyMachState
.To be specific, all the other architectures use the following layout for
LazyMachState
:But, x86 interleaves callee saved registeres with callee saved register pointers:
Should I revise
LazyMachState
?HelperMethodFrame::InsureInit
is the only function thatLazyMachState::unwindLazyState
, and bothLazyMachState::unwindLazyState
andsetLazyStateFromUnwind
are implemented in gmscpu.h (and gmsx86.cpp).I'm not sure, but it seems that these functions are related with architecture rather than EH model. Could you share your opinion?
@jkotas Could you please take a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not have much to add to what you have discussed already.
Your proposed fix looks fine to me.
To check for the hidden gotchas, you can try doing the same change for x64 and submit a PR just to run tests on it.