-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ELT call stub issues on Linux #10706
Comments
Could you please share the details about the segmentation fault? |
This looks suspect:
We should be running with aligned stack pointer here. It should not be required to manually align the stack pointer. Calling the C/C++ stub methods with misagligned stack pointer can lead to segmentation faults. This was introduced in dotnet/coreclr#12603. @sergign60 Do you happen to remember why you had to do the manual stack alignment there? |
In addition to wherever the stack alignment issue raised by Jan leads, I think you've got a more fundamental issue - there are two different variations of the callbacks: The assembly you copied from the runtime isn't actually either of those signatures. The runtime uses it as if it were a FunctionEnter3 but when we do so we are relying on an implementation detail of the jitted code that passes us a 2nd hidden argument not mentioned in the FunctionEnter3 signature. In your case you are casting incorrectly and claiming to the runtime that the assembly routine implements the FunctionEnter3WithInfo signature. Because the COR_PROF_ELT_INFO argument is not the same type as the hidden profiledRsp argument, the ABI doesn't match up and the code crashes. You either need to use SetEnterLeaveFunctionHooks3 that takes FunctionEnter3 as input, or you need to implement a method with the FunctionEnter3WithInfo signature. The runtime doesn't have any implementation of FunctionEnter3WithInfo to use an example, but a decent starting point that would probably get you unblocked is to write the function in C. Technically that doesn't give strong enough register preservation requirements according to MSDN, but if you stay away from doing any floating point math (including things the compiler would auto-vectorize into floating point math) it would probably work. Looking over some of the code it doesn't feel like even the devs at Microsoft quite agree on what the correct bar is for preserving registers here, but I hope we can start getting that ironed out (#19023). I'll look around and see if I can find an example implementation of FunctionEnter3WithInfo that looks correct. |
@noahfalk Thank you, that was what my gut was telling me as well, I just needed to validate. Perhaps it would help if I posted my working code from the Windows side? The syntax is different as I was relying on VS/ML64 over there. But it might be a better starting point, if someone can help translate to what really needs done. I can work on this on the straight C approach as well - but again, not fully understanding the ABI and how to deal with saving registers is my main issue. Assembly just adds a level of syntax on top of that issue. And yes, what I need is a working FunctionEnter3WithInfo method. Thanks again for the help. @jkotas no additional detail on the fault. Segmentation fault (core dumped) and process exits. |
I peaked at some of our old test code laying around and it was doing it directly as C/C++:
At the moment I'm not even sure we at Microsoft have a solid consensus on what the ABI is supposed to be - ProfileEnter callers/ProfilerEnter implementers/MSDN documentation/comments are all making different claims about what the ABI is (see dotnet/coreclr#19023) so I don't have an immediate answer for you - I'm working on it : ) In the meantime doing it in C may not qualify as rigorously correct, but it will probably unblock you while we work this out. Once we've got a specific ABI we are trying to hit we can help a bit with the mechanics of implementing an example routine. |
Thanks, @noahfalk . I'll try this approach and see what I get. When I was originally working on some very prototype-y code, I was doing similar, and it was at least passing functionID and it was stable (didn't ever try to use the ELT info). So this makes some sense. Will let you know how it turns out. |
@noahfalk this "doesn't not work." It no longer results in a segmentation fault, however I've yet to validate the results of the API call. If they are good - then I can keep working. Otherwise I'm just stuck at a new spot. :) Just curious, why / how would this work, and not destroy the stack in the process? |
The current implementation for FunctionEnter3WithInfo (and similar for Leave/Tailcall) calls your callback with a stack like this: your_function_enter_3_with_info ProfileEnterNaked saves and restores some registers on your behalf (like the integer argument registers and some of the return registers). The remaining registers aren't saved, but it works either because undocumented implementation details in the JIT means we don't wind up caring about those registers, or your callback doesn't wind up trashing them. For an example that might not work you could intrument a managed method that takes floating point arguments and you call a native method in your callback that also uses floating point arguments. I'd expect the native call with floating point args would change some of the xmm register values, no code in any of those functions would restore them and after the callback returns the jitted code would observe the arguments used within your callback, not the ones its caller originally provided. |
Makes sense. Thanks! |
@noahfalk your above explanation makes sense. It got me past the point that I cared about it - I was only trying to get to class layout and field info. But now, moving on, trying to get parameter info. And seeing some odd behavior that I think is related. Take this .net method:
I want to get the I start to set things up by getting the range of arguments from the profiling api:
But then something a bit odd happens. My
Of course the documentation here says that "Arguments that are stored in registers are spilled into memory to build the structures. " So, I'm guessing that until I resolve this, I'm not going to be getting valid addresses for arguments. Let me know if there's anything else I can do to help this all out. |
@dotnetjt are you sure the data is not there? Try printing out the values of the data from the pArgumentInfo array. I would recommend you use non-pointer types so you can easily validate using printf debugging. Your screenshot indicates to me that the data might be there but your debugger UI doesn't know to associate |
@mjsabby You may just be entirely right. I let my loop run through the ranges array, and there was a 2nd value there. But, I have a separate issue that seems to be something different from my Windows version. When I enumerate I pass through a couple of helper methods, and in this case (correctly) land on a helper to pull that string buffer out of memory. I know this code works, as it works on fields (ObjectID of the class + offset of field). But passing the startAddress of a param doesn't. Admittedly, I haven't dug into it yet, but maybe you know something I don't about why this works fine on windows, but not Linux. Basically, it looks like this (startAddress being passed to the helper):
Anything obvious that I'm missing? |
I don't have any immediate insight on why you're running into this. I recommend you look at the length offset that is returned from |
Nope, definitely some issue. If I take the
Whereas if I take the address of a field in a class layout, run it through the exact same helper, it's good:
Substituting into gdb what my code does:
So, all I'm trying to say is that the code that looks at memory is good. But for parameters, I'm getting suspect startAddresses, which I am going to assume is either this issue around the calling convention and register preservation, or it is something else as of yet unreported. ¯_(ツ)_/¯ Thanks for your continued help on this guys. |
microsoft/clr-samples#11 is tracking adding a sample that uses the ELT APIs for Linux X64 (as well as the well-tested Windows X64/X86) for everyone's benefit. I did a quick test this morning, and while it wasn't exhaustive I'm able to do some inspections but the jury isn't out yet so I might poke more when I add that sample. Your issue doesn't say but are you using a recent build of CoreCLR? The official .NET Core 2.1 latest builds may have bugs. Given that @noahfalk is still chasing exactly what is preserved or not, can you try just preserving and restoring everything? The sample I intend to write at first will do just that. |
Sorry I've been preoccupied with some other issues so I haven't gotten a chance to to look into this 2nd part of the issue yet. If there is a repro that would be handy, but if not there might be enough info above or I can tinker with our profiler tests. Sorry I just haven't gotten to this one yet. |
@mjsabby would you be willing to share with me what you have so far? As I've mentioned, assembly is far from my strong suit. It might be enough to keep me going and I can help test and provide feedback. |
Also, using current 2.1 release. |
Just wanted to give an update - haven't forgotten but its going to be at least another week before we can get to investigating this. |
Hey @mjsabby and @noahfalk just an update for you. I took what @mjsabby provided in the ELT sample referenced and plugged in:
Still the same result. The 'startAddress` given for the argument is wrong, by a long shot. I hate to beat a dead horse, but this to me seems like an issue somewhere with the API giving back the right addresses. Or, something continues to clobber the stack, despite the above ^. I'm happy to test out whatever you throw over my way. :) |
@noahfalk any more traction on this? |
@sywhang I think started looking into it yesterday, not sure where he got with it so far |
Hi, I think I might be hitting the same issue so I thought additional info could be useful. My printf(
" argumentInfo:\n numRanges: %u\n totalArgumentSize: %u\n ranges:\n",
argumentInfo->numRanges,
argumentInfo->totalArgumentSize
);
for (uint64_t i = 0; i < argumentInfo->numRanges; i++) {
COR_PRF_FUNCTION_ARGUMENT_RANGE range = argumentInfo->ranges[i];
printf(
" startAddress: %p\n length: %u\n",
(void *) range.startAddress,
range.length
);
printf(" data:");
for (UINT_PTR index = 0; index < range.length; index++) {
uint8_t byte = *(((uint8_t *) range.startAddress) + index);
printf(" %02x", byte);
}
printf("\n");
} With the following C# function being profiled: static int foo(int a, int b, int c, string d)
{
return a + b + c + d.Length;
} For
The data doesn't make sense. I'm not familiar with the memory layout of .NET objects so maybe I missed something. Complete code is available at https://github.com/bbc2/dotnet-test-profiler/tree/function-enter-arg-issue.
|
Hey @dotnetjt and @bbc2, thanks for reporting this issue. I was told about this issue by @noahfalk a couple days ago, but I haven't been able to spend much time on it because of a more pressing issue I've been asked to look at. I will spend some time today to go over this and get back to you later tonight. This may be in the form of a full sample ELT profiler code. (That is, if this is not an issue on our ELT API but simply a misuse of our API) If it turns out that our API has an issue, then I will also follow up on this thread with what I found. Thanks, |
@bbc2 your code looks correct to me for grabbing an int out of memory, assuming the startAddress you are given is correct. :) My profiler is a little more open ended and does a lot of stuff before hand, inspecting signatures to determine the type, checking to see if it's boxed, etc. But in a controlled test, you can do exactly what you are there, and that's how I encountered the same issue. Glad you were able to see it as well so I feel more confident that it's not a coding issue on my end. |
So sorry for the late response. I just wanted to let you know that I am still looking into this issue! I wrote an example profiler for Windows but it turns out that there are subtle changes I have to make it also work on Linux (which mostly has to do with the wide string types that the profiler API uses).. I've just been pulled away for random things so I haven't had the time to get back to this but I still wanted to let you know that I haven't run away from this :) I will get post an update to this thread as soon as I have something concrete and tested on all platforms! |
@sywhang any updates? Completely understand on the string types.... been there myself! |
So I was able to replicate your issue - I wrote a sample profiler that can successfully retrieve function arguments with FunctionEnter3WithInfo callback, but the same profiler code does not seem to behave as documented on Linux. I'm still continuing my investigation on whether it is the profiler that's at fault, or the runtime API itself that's at fault. It may be entirely possible that you've unveiled a bug in a part of the profiler API that hasn't been tested well, so thank you for reporting this issue! I'll continue posting updates as I continue my investigation. |
I've successfully verified that this is indeed a runtime bug and filed https://github.com/dotnet/coreclr/issues/19622 to track the specific issue in the runtime (since this thread is more about the general usage of ELT API I thought it'd be better to track the runtime bug in a different thread :) ) |
@sywhang @noahfalk Just to add more visibility to this, I'm also blocked from porting an existing Profiler-based Diagnostic tool from Windows to Linux, I see the exact same behaviour that @dotnetjt describes in https://github.com/dotnet/coreclr/issues/18977#issuecomment-407596910. Will a fix for dotnet/coreclr#19622 make it into the 3.0 release? Also, is it also feasible to have it back-ported to a previous release, for instance 2.2? |
I wasn't aware of this issue, but independently ran in to it and fixed it in #39550. There were issues with the assembly stubs and the argument iterator, they should all be fixed starting with 5.0. |
Hi @noahfalk - this is to continue the discussion from a closed issue prior.
I'm (slowly) porting our windows CLR profiler to linux. Everything was going spectacularly well until I tried to use the ICorProfilerInfo3::GetFunctionEnter3Info method. Calling this immediately results in a segmentation fault:
I'm smart enough to know the issue is probably a bad pointer for eltInfo, which of course is going to be the enter stub code. Let's pretend for a moment that I don't know or understand assembly or calling conventions enough to and that I only got my windows version working with the help of some really great samples that magically worked.
Here's my setup of the ELT hooks:
These are defined as:
And finally, the assembly behind all this, which is pretty much a straight lift of
AsmHelpers.S
but with method names changed:It seems to me like the frame isn't being set up right, but I just have no idea where to begin. Help? Thanks in advance for any insight.
The text was updated successfully, but these errors were encountered: