Question: gdb/openocd synchronization/possible race condition #186
let's say it is somehow related, but generally questions should be addressed via the project forum.
unfortunately it is the only portable synchronisation method I could find.
it happens when OpenOCD decides to print this message. in most cases it is when the board is ready.
I have no control over this.
for you to have a reference, the SEGGER J-Link GDB Server, by design, prints a message to confirm it is waiting for connections. this is a good synchronisation method, and I don't recall such problems with the J-Link plug-in. OpenOCD does not print such messages, so I had to find a workaround. I personally do not use OpenOCD, and do not recommend it to users. if you decide to improve OpenOCD, I suggest you contribute a patch to OpenOCD, to print the waiting connection message, and later I can update the plug-ins to use it.
Hi Liviu - thanks for the feedback.
starts the GDB client.
Thanks Liviu - I was confusing gdb starting openocd (piped mode) with socket mode used here where openocd and gdb are started separately (in this case by the plugin). I'll continue looking into this to see if I can get a better solution. At the moment the only one I have is to put a counter delay at the top of gdbinit which is a real hack! :-)
yeah, you can switch to J-Link.
I hear ya! :-) But unfortunately I have to support OpenOCD in this case. :-(
In case it helps anybody else ... one hack (obviously not ideal) which helps is to delay for a period before allowing gdb to proceed. The period to delay will vary and must be chosen by trial and error.
Where -n specifies the number of seconds (- 1?) to delay and the value needs to be large enough to give openocd time to fully init.
FWIW I have worked around this for now using my own hack to OpenOCD but this is not an ideal/general fix for everybody. When I get the time I will follow up on this with the OpenOCD project - e.g. start a discussion about the issue and maybe request an enhancement to address it.
Liviu - I presume that you build OpenOCD as-is and don't apply any of your own patches/mods to it?
that's correct. unless absolutely necessary, I avoid adding patches, since they increase maintenance cost. unfortunately your patch is not ideal; I would look for a more elaborate solution.
I agree but I am using that for now as I build OpenOCD with some other mods right now.
a more portable trick would be a small delay in ms, default 500, user configurable in the plug-in.
I don't think that depending on a timed delay is a good idea.
if you manage to push some changes to the mainstream openocd to have a stable, 100% reliable way of getting this message, even if only when enabled by a switch or command, then I'll consider updating the plug-in, although we need to maintain compatibility with older versions too.
since I doubt there will be a fix soon, we'll close this for now.
Just looking at the OpenOCD code here: https://sourceforge.net/p/openocd/code/ci/master/tree/src/server/server.c Does this offer what's needed?
I guess it should work too, but I don't think it'll make a big difference from hunting for my message.
I'm not so sure.
I don't know what to say, in my case the full listing is
from this sequence I understand that the JTAG probe is initialised before the socket, so when the gdb client connects, everything should be ready. could you check your console when you encounter the problem and compare it with this sequence?
What probe are you using? Olimex or something similar? Put a delay/sleep of a few seconds in the relevant OpenOCD JTAG interface driver's init method and see if it still works. I suspect that it will not and you will get something like GDB error E0 or some other anomalous behaviour.
you mean to patch the openocd source code?
Yes, but just to show that the current -c 'echo "Started by GNU MCU Eclipse"' approach is not robust. Not as a fix. Maybe having the plugin wait for "accepting 'gdb' connection on tcp/xxxx" would work in the general case. I'll see if I can try this out to check if it does work.
ah, right, I know it is not robust. and, although I did not measure it, I think that the "accepting..." message is only a few milliseconds away from the echo message, which might be only marginally better, but not enough. could you add some messages in your patched init method, one before the sleep and one after it, to see how they interleave with the rest of the messages? I expected the init to complete well before preparing the socket, but if openocd is multithreaded, the sleep itself may give control to the next thread, which might be the one to prepare the socket.
perhaps a more robust approach would be to make the synchronisation mechanism configurable by the user, i.e. in the gdb client section to add a field to configure either a string or a delay in milliseconds, and maybe also make the echo string added to openocd optional or configurable.
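As a rough illustration of the configurable synchronisation suggested above (hunt for a string, or fall back to a fixed delay), here is a minimal Python sketch. It is not the plug-in's code, which is written in Java; the function name `wait_until_ready` and all of its parameters are hypothetical, and the 500 ms default is only borrowed from the earlier suggestion in this thread.

```python
import time


def wait_until_ready(lines, marker=None, delay_ms=500, timeout_s=10.0):
    """Return True once the server is assumed ready, False on timeout.

    lines     -- iterable of text lines from the server console (hypothetical input)
    marker    -- substring to hunt for; None means "use the fixed delay instead"
    delay_ms  -- fallback delay in milliseconds
    timeout_s -- stop hunting for the marker after this many seconds
    """
    if marker is None:
        # Delay mode: no real synchronisation, just wait and hope.
        time.sleep(delay_ms / 1000.0)
        return True

    deadline = time.monotonic() + timeout_s
    for line in lines:
        if marker in line:
            return True
        # The deadline is only checked when a line arrives, so a stream
        # that stalls without closing would still block here; real code
        # would need a reader thread or a non-blocking read.
        if time.monotonic() > deadline:
            break
    return False
```

With `marker=None` the function degrades to the plain timed delay, which is exactly as fragile as the delay discussed earlier; the marker mode is only as reliable as the message it hunts for.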
The sleep in multi threaded openocd might not help.
OpenOCD is single threaded right now and has been for a long time.
are you sure? on macOS, the Activity Monitor reports 3 threads.
ok, good plan. the only drawback I see for this solution is that the plug-in will no longer work with older openocds, since that message was added not so long ago.
yes, we can do it, but it'll have the same disadvantage, it'll not work with older openocds.
There is definitely no thread creation code in OpenOCD.
Having said that I do see multiple threads in ProcessExplorer on Windows 10 too
but I suspect that threads in addition to the main one created by OpenOCD are created by the OS for its own purposes/reasons - e.g. https://stackoverflow.com/questions/34822072/why-does-windows-10-start-extra-threads-in-my-program The code itself is definitely not multithreaded.
This message:
exists in all of the following OpenOCD versions:
This message
appears in:
This message:
appears in:
As such checking for "accepting 'gdb' connection" could be an appropriate and backward compatible check or else a check for the three possible formats outlined above? My only other concern is that the LOG_INFO message appears before the call to new_connection() which may be the code that actually prepares the socket for connections in which case there could still be a race condition albeit a lot less severe than the existing one...
But in the meantime let me experiment with my modded plugin and OpenOCD 0.10.0+dev at least. :-)
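A minimal Python sketch of the substring check proposed above. It assumes only the `tcp/<port>` wording quoted elsewhere in this thread; the other message formats are not reproduced here, so for those the sketch reports "ready" without a port. The names `READY_SUBSTRING`, `TCP_PORT_RE` and `parse_ready_line` are hypothetical.

```python
import re

# Substring suggested in the thread as common to the known message formats.
READY_SUBSTRING = "accepting 'gdb' connection"

# Assumption: matches only the "... on tcp/<port>" wording quoted in this thread.
TCP_PORT_RE = re.compile(r"accepting 'gdb' connections? on tcp/(\d+)")


def parse_ready_line(line):
    """Return (ready, port); port is None when no tcp/<port> suffix is found."""
    if READY_SUBSTRING not in line:
        return (False, None)
    match = TCP_PORT_RE.search(line)
    return (True, int(match.group(1)) if match else None)
```

Matching on the shared substring rather than on a full line keeps the check tolerant of prefixes and of the singular/plural difference, at the cost of breaking if a future OpenOCD rewords the message.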
as long as future versions will not change it, yes... :-(
if you confirm that hunting for this string solves your problem, I'll update the plug-ins.
Hmmm - yes - forward compatibility is always a challenge... :-|
Thanks - I'm looking into it now with my modified GME openocd plugin and will do some experimentation and report back. BTW - in relation to earlier questions about the relative timing of messages from OpenOCD this might clarify:
The "Listening on port 3333 for gdb connections" message seems to be a SiFive addition - not sure why.
I apologise if this is a stupid question, but would it be viable to probe the port with netstat or lsof to see if the advertised port is really LISTENING? I'm not even sure if there is a Windows equivalent for it.
as long as it is intended to help, there is no such thing as a stupid question. it might work, but:
it seems complicated and heavy.
Probing it at the Java level gets rid of the parsing: http://www.geekality.net/2013/04/30/java-simple-check-to-see-if-a-server-is-listening-on-a-port/ But I'm not sure whether an empty connection followed by a disconnect would trigger something undesired on the OpenOCD side. I tried changing the wait to look for "accepting 'gdb' connection" and I got timeouts, so I wondered whether I ever received that text. So I added a lot of System.out.println calls and ran Eclipse with -consoleLog, and for some reason the last text I received was "Started by GNU ARM Eclipse". Is it possible that something interrupts or breaks the error pipe, so that I stop receiving output from it?
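For reference, the kind of port probe discussed above can be sketched in a few lines of Python (the linked article does the equivalent in Java). This is only an illustration: the function name `is_port_listening` and the timeout value are hypothetical, and, as noted above, it is not known whether an empty connect/disconnect has side effects on the OpenOCD side.

```python
import socket


def is_port_listening(host, port, timeout_s=0.5):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection raises OSError (for example, connection
        # refused or timed out) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

A caller would typically poll this in a loop with a short sleep until it returns True or an overall deadline expires, for example `is_port_listening("localhost", 3333)` for the gdb port mentioned in this thread.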
where exactly is the openocd log written, to stdout or stderr? is it the same stream as the one used for
This is what we were getting in the log (e.g. when Eclipse is launched with -consoleLog):
The "accepting 'gdb' connections on tcp/3333" message doesn't appear there even though it appears in the OpenOCD log in Eclipse, and it is going to stderr, as it's coloured red in the console, and other LOG_INFO messages are obviously visible to the plugin Java code. It's very, very odd....
Just to clarify - Anton sees "Started by GNU ARM Eclipse" last because he is using a version of OpenOCD with my ugly hack that moves this log message to the end of openocd.c:openocd_thread() instead of it appearing at the start as it normally does in the unhacked openocd.
it looks like there was a good reason why I used the
In our case it should be stderr. I should probably try the same with the vanilla OpenOCD. I tried to debug the plugin, so I got the Eclipse SDK, and I got as far as having the SDK Eclipse compile the plugin and then run Eclipse Oxygen with it, but I didn't get far enough to be able to debug it. Something odd is going on in OpenOCD that does not work well with the way the pipes are handled.
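One way to take the stdout/stderr question out of the picture while experimenting is to merge the two streams before hunting for a message. The Python sketch below shows the idea with the standard `subprocess` module; it is an assumption for illustration only, not what the plug-in does, and the function name `read_merged_output` is hypothetical.

```python
import subprocess
import sys


def read_merged_output(cmd):
    """Run cmd and return its stdout and stderr lines from a single pipe."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # redirect stderr into the stdout pipe
        text=True,
    )
    # Reading one pipe to EOF avoids the case where an unread second
    # pipe fills up and blocks the child process.
    lines = [line.rstrip("\n") for line in proc.stdout]
    proc.wait()
    return lines
```

The relative order of lines written to the two streams is not guaranteed after merging, because the child may buffer them differently, so this helps with "did the message arrive at all" rather than with exact sequencing.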
Hi guys! By the way, I found that GDB 8.0 + OpenOCD 0.10.0 is in most cases an unusable combination when an RTOS (ThreadX in my case) is used together with OpenOCD's RTOS support module. I often see messages like these when I try to stop the process:
Everything works well with GDB 7.7.1 (original) and 7.8 (Linaro branch) and OpenOCD 0.10.0. I currently have no idea how to report it, so I have just frozen the GDB version I use. The issue was found on Manjaro Linux using the GDB from the repos and Qt Creator, but I can reproduce it using only GDB and OpenOCD from the command line. I work with the Cypress FX3 chip on our production board.
I cannot confirm it, for my RISC-V tests I use the latest GDB and the latest OpenOCD and did not encounter this problem.
Cypress FX3 uses an ARM926EJ-S core and the ThreadX RTOS. OpenOCD's ability to handle RTOS threads is used too. It seems that the RTOS and its handling on the OpenOCD/GDB side is the major factor.
I don't see what these posts have to do with the original topic of this thread?!
they probably don't. @h4tr3d, for support and general questions, please use the project forum.
I hope it's OK to post this here in case anybody has some insight even though admittedly it's almost certainly not a GNU ARM Eclipse plugin issue per se (or at all) and I realise that the FAQ says to only post GNU ARM Eclipse issues.
I see here (http://gnuarmeclipse.github.io/debug/openocd/) under "Quote the entire echo command" a snippet about gdb synchronizing with openocd using the -c 'echo "Started by GNU ARM Eclipse"' command.
Is that the sum total of synchronization between gdb and openocd?
At what point does it happen?
In particular does that synchronization ensure that openocd has initialized fully (e.g. gdb/mi port 3333 and telnet port 4444 servers if applicable etc.) or just that it has started up but may still have initialization to do?
My problem is that in some cases it looks like gdb tries to attach to openocd's gdb/mi port 3333 before openocd has successfully initialized that "server" and I get this:
Specifically I don't get this if I have one of our FlashPro programmers attached (driven by a custom JTAG driver that I have added to openocd). But if I have more than one then the added time taken to enumerate/init etc. these seems to delay openocd sufficiently that it has not yet initialized the gdb/mi port 3333 server by the time that gdb attempts to connect.
However even though it has not happened to date with a single programmer attached it seems to me that there is some latent race condition between gdb and openocd and it was simply luck that avoided problems previously?
Any tips on how to synchronize gdb and openocd in order to avoid this problem?
Thanks and apologies again if this is somewhat off topic from GNU ARM Eclipse proper.
Regards
Tommy