-
Notifications
You must be signed in to change notification settings - Fork 2.7k
[x86/Linux] Fix incorrect initialization in LazyMachState::unwindLazyState #10888
Conversation
@dotnet-bot test Ubuntu16.04 arm Cross Debug Build please |
@dotnet-bot test Ubuntu x64 Checked Build and Test please |
@dotnet-bot test Ubuntu arm Cross Release Build please |
@dotnet-bot test Ubuntu arm Cross Release Build please |
@dotnet-bot test OSX10.12 x64 Checked Build and Test please |
nonVolRegPtrs.Esi = &(lazyState->_esi); | ||
nonVolRegPtrs.Ebx = &(lazyState->_ebx); | ||
nonVolRegPtrs.Ebp = &(lazyState->_ebp); | ||
nonVolRegPtrs.Edi = &(baseState->_edi); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand the change. The lazyState edi, esi and ebx are initialized from the baseState members, so the context pointers would point to the same values.
The only difference is the EBP and it seems to me that we were missing copying EBP from baseState to lazyState. At least amd64 version copies all callee saved registers. Isn't this actually causing the problem you were trying to fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that HelperMethodFrame::InsureInit
passes a pointer to a local variable (unwound
) as lazyState
, which means that &(lazyState->...)
becomes invalid after HelperMethodFrame::InsureInit
returns.
PAL_VirtualUnwind
will not update nonVolRegPtrs.Edi
if edi
is not saved in the corresponding helper function. In this case, nonVolRegPtrs.Edi
remains as &(lazyState->_edi)
, and thus LazyMachState::unwindLazyState
sets lazyState->_pEdi
as &(lazyState->_edi)
just before returning.
Later, EH system updates a resume context via using this invalid pointer if a managed exception is thrown inside any helper function that does not save edi
, and the managed code that invokes this helper function catches this exception.
This incorrect update causes test failures in the PR description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is interesting that x64 has the same structure, but does not suffer from this issue. I guess that x64 does not suffer from this issue as it implements HelperMethodFrame::UpdateRegDisplay
differently.
Here is x64 implementation:
#define CALLEE_SAVED_REGISTER(regname) pRD->pCurrentContext->regname = pUnwoundState->m_Capture.regname;
ENUM_CALLEE_SAVED_REGISTERS();
#undef CALLEE_SAVED_REGISTER
Here is x86 implementation:
#define CALLEE_SAVED_REGISTER(regname) pRD->pCurrentContext->regname = *((DWORD*) m_MachState.p##regname());
ENUM_CALLEE_SAVED_REGISTERS();
#undef CALLEE_SAVED_REGISTER
Unlike x86, x64 uses saved register values directly instead of dereferencing a corresponding pointer. As I understand, only pointers are invalid, and thus x64 may not suffer from this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@parjong, I have looked at the sources again and the issue is somewhere else. The amd64 code is correct including the context pointers. Look at the LazyMachState::setLazyStateFromUnwind. It copies the context pointers from the passed in argument (which is the local you have referred to) to the target "this". And during that process, it converts the context pointers that were pointing inside of the source MachState to point to the same registers, but in the target mach state. See here:
https://github.com/dotnet/coreclr/blob/master/src/vm/amd64/gmscpu.h#L153-L164.
And this translation is missing in the x86 version. It is done in ARM and ARM64 too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@janvorli Hmm... I guess that the fixup that you mentioned also fixes this issue: ctx
is always correct (unlike nonVolRegPtrs
), and the fixup will update an (incorrect) interior pointer (untouched by PAL_VirtualUnwind
) as a correct pointer that points a corresponding variable inside MachState
itself.
For me, however, it seems that this fixup is necessary only because LazyMachState::unwindLazyState
incorrectly initializes nonVolRegPtrs
.
As I understand, baseState
is the captured state at the beginning of unmanaged function. If unmanaged fuction does not touch a callee-saved register, the corresponding value in baseState
should be what we want to find. If unmanaged fuction touches this register, then managed code should save its previous value on the stack. This means that the corresponding pointer will be updated byPAL_VirtualUnwind
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@janvorli How about changing all the architectures at once? Is it too risky?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the x86/Windows implementation of LazyMachState::unwindLazyState
.
It initializes lazyState->_pXXX
as &baseState->_XXX
at the beginning (similarly as this PR does):
#ifndef DACCESS_COMPILE
lazyState->_pEdi = &baseState->_edi;
lazyState->_pEsi = &baseState->_esi;
lazyState->_pEbx = &baseState->_ebx;
lazyState->_pEbp = &baseState->_ebp;
#endif
It seems that the fixup approach resolves the semantic difference between x86/Linux and all the other ports, but introduces another semantic difference between x86/Linux and x86/Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still not 100% sure there are no hidden issues with that approach. I feel that modifying all the architectures at once now without thorough testing using CoreCLR and CoreFX tests with various GC stress modes would be risky. Possible issues in this area could manifest themselves as GC holes that are inherently hard to debug.
Since the x86 Linux EH and unwinding is like the other architectures and diverged from the x86 windows, it is better to keep it aligned with the other architectures rather than the Windows x86 wherever it is possible.
I'd prefer creating a separate github issue for looking into the possibility of changing all architectures to the way you have suggested and spending time testing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@janvorli I tried to implement the fixup approach as you suggested. I found that the fixup code would be different from other architectures due to the difference in LazyMachState
.
To be specific, all the other architectures use the following layout for LazyMachState
:
struct LazyMachState {
TADDR CalleeSavedRegisters[...];
PTR_TADDR CalleeSavedRegisterPointers[...];
}
But, x86 interleaves callee saved registeres with callee saved register pointers:
PTR_TADDR _pEdi;
TADDR _edi;
PTR_TADDR _pEsi;
TADDR _esi;
PTR_TADDR _pEbx;
TADDR _ebx;
PTR_TADDR _pEbp;
TADDR _ebp;
Should I revise LazyMachState
?
HelperMethodFrame::InsureInit
is the only function that LazyMachState::unwindLazyState
, and both LazyMachState::unwindLazyState
and setLazyStateFromUnwind
are implemented in gmscpu.h (and gmsx86.cpp).
I'm not sure, but it seems that these functions are related with architecture rather than EH model. Could you share your opinion?
@jkotas Could you please take a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not have much to add to what you have discussed already.
Your proposed fix looks fine to me.
To check for the hidden gotchas, you can try doing the same change for x64 and submit a PR just to run tests on it.
@dotnet-bot test Tizen armel Cross Debug Build please |
@parjong yes, there is. I have just triggered it for your testing PR. |
The current implementation initializes nonVolRegPtrs.Reg as lazyState->Reg, which may result in incorrect update when Reg is not stored. This incorrect initialization results in several Pri2 test failures:
This commit resolves these test failures.