Stop.bat is buggy #117
Comments
Also, the Stop script types a "q" into whatever app happens to be in the foreground if the telnet app quits early for whatever reason. This is probably made more likely by my workaround (even if I keep a single "q" and only change the delay before "q" from 50ms to 5000ms.) |
Thanks for the info. Before I spend time investigating this, can you please tell me what is the reason why you sometimes want to shut down IBC with a stop command immediately after you start it? I suspect what's happening is that if you send the stop before TWS/Gateway has displayed the main window and built the menus, the stop task simply waits until that has happened (but I need to check the code to confirm this, which I'll do when I have some spare moments - rare things they are!). If that's the case, it should be easy enough for IBC to just exit without waiting for TWS/Gateway to complete its initialisation. I've noticed the 'q' issue in the past, but have never tried hard to solve it, since it didn't seem to cause any problems and no-one has complained about it before. Telnet on Windows is such a pig that the best thing to do might be to write a tiny Java app that connects to IBC and sends the STOP command (or any other command that IBC recognises). |
I don't really want to shut it down right after I start it, but I think multiple watchdogs may step on each other sometimes. I also feel like there are other ways for IBC to get into this unresponsive state (because I've seen it in cases where there is only one watchdog and it can't step on itself), but this is an easy to repro case and it might point to a more general issue. I often end up in situations where the IBC process can't be killed but it also doesn't have a working connection to IB, and new instances of IBC can't be started because there is an open handle on one of the files it needs, presumably from a zombie process. I'm trying another strategy where I don't try to restart IBC every time IB loses connection, just leave it running all day. But this is not a good solution because if I ever log into the account manually, it will kick out IBGateway and it will never get restarted if I forget to restart the whole shebang. Also, I think this type of instability might be affecting other frameworks that use IBC, like ib_insync. The connection died semi-permanently too often while I was using the ib_insync watchdog (that's why I'm trying to reimplement my own watchdog). Overall I think everyone would be better served if there were no race conditions. Hope that gives more context on the issue. Thanks! |
Ok I've spent some time on this and there is definitely something wrong when you use Stop.bat before Gateway/TWS reaches a certain stage in its initialisation (and their initialisation sequences are quite different in significant ways). There are some aspects of what's happening that I don't quite understand yet. I won't be able to get back to this until this evening (UK time), and it might take some time to get it sorted. So in the meantime can I ask that you make some effort to prevent Stop.bat being called immediately after starting Gateway, because that is really an awkward situation. Bear in mind that if Stop.bat is called before IBC has managed to get its command server running (about a second or so on my server), it is guaranteed to do nothing, which your watchdog(s?) may not expect. I'm also rapidly convincing myself that Windows telnet is really not up to the job, so I probably will implement a small command sender program (that could also be used as a library), but don't hold your breath! By the way, what are these watchdogs? Why is there more than one of them? And why do you 'restart IBC every time IB loses connection'? (I presume you mean the Gateway/TWS's connection to the IB server?) |
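One way for a watchdog to avoid calling Stop.bat before the command server is listening is to poll the port first. The following is a minimal Python sketch, not part of IBC: the helper name `wait_for_command_server` is hypothetical, and port 7462 is only an assumption standing in for whatever `CommandServerPort` is set to in your IBC configuration. It also assumes that briefly opening and closing a connection does not upset the command server, which this thread does not confirm.

```python
import socket
import time


def wait_for_command_server(host="127.0.0.1", port=7462, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Assumption: port 7462 matches the CommandServerPort setting in your IBC
    configuration; adjust it if yours differs.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a connection just to probe the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or connection refused): wait briefly and retry.
            time.sleep(interval)
    return False
```

A watchdog could call `wait_for_command_server()` after starting Gateway and only issue a stop once it returns True.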
I've fixed these problems and I hope all will be well now. I've created a new version 3.8.6-beta.2, which you can download from here: https://1drv.ms/u/s!AlqfLEOWDJ9Zh8ckH_dReHyajw45KQ?e=iYXyxm Please can you give this version a try, and let me know how it goes. If everything is ok I'll make an 'official' release. Changes in this version:
I've just discovered that if you run Stop.bat before IBC has opened the command processor socket (ie within 20 seconds of that, so you can even run it up to about 19 seconds before starting IBC), IBC does shut down straight away but currently it writes a lot of identical error messages to the log file. It's too late to fix this now, but it would be worth you using the new version anyway and I'll try to fix this untidiness tomorrow. |
Stop doesn't stop at all now once IBG is logged in. You can also try a script like this to test various corner cases:
|
Oops! Late night working is never a good idea... A new version 3.8.6-beta.3 is now at: https://1drv.ms/u/s!AlqfLEOWDJ9Zh-At2bALrtTIH2Yv9w?e=CAq5Hn Note that I've also modified the SendStopCommand.vbs to not send the EXIT command: IBC now always closes the connection when it receives a STOP command. It might be worth pointing out that in some circumstances where STOP is invoked before TWS/Gateway has finished initialising, or is processing an asynchronous task such as overriding the API port setting, calling STOP causes an InterruptedException that is logged by TWS, and the log entries appear in the IBC log. They are harmless and don't prevent the STOP being actioned. Regarding your script, I'm not currently a Python speaker, but it certainly has its uses, so perhaps I should put some time into it... |
Any update on this? |
So sorry I got busy and couldn't get back to this. Maybe next week. Currently I have one out of two scripts turned off to keep it stable. |
Hi! Sorry for the late reply and thanks a lot for the fixes! The stop is reliable now. However, it doesn't work at all when run from Windows Task Scheduler. Try to add this to a .bat file and create a task and run it:
You will notice the java process is still running after, as well as a conhost and a telnet process. Running Stop.bat from the command line is able to kill the java process. Task Scheduler is really weird: it creates some crazy permissions, and in general it's not easy to kill processes started from it. Hope you're able to debug and find a workaround. Note that running the batch file above from a command line shuts down all processes fine. So it's only an issue when it's run from Task Scheduler. |
I've fixed the problem with Task Scheduler. The reason it failed is that the However when running from Task Scheduler, it makes sure that the windows created are not the active window, presumably to ensure that anything the user is doing is not hijacked because of a scheduled task running. The solution was to amend the scripts to explicitly make the telnet window active. So just download the updated I've updated the Windows release zip to contain the amended files. |
Hmm, I pulled the changes, but it still doesn't work for me. |
When you start a program using Task Scheduler with "Run whether user is logged on or not", it is started in session 0 (you can see this from the Session Id column in Task Manager). Programs in session 0 (typically services) can run a user interface, but are provided with no resources to actually make that user interface visible or to receive any input (for example you could actually run TWS or Gateway in this way, and they would work fine, but there's absolutely no way to access their GUIs.) This is why you can't see the telnet window. And since there is no way to get input to the telnet window, the .vbs script that tries to send keys to the telnet window simply cannot work. So there is no possibility of the current Stop.bat working as a scheduled task with "Run whether user is logged on or not". Before I go any further, might I ask why you need to run Stop.bat as a scheduled task at all? The fact that it's a scheduled task implies that you know when you want to shut down TWS/Gateway and this doesn't change frequently - ie you're not using Stop.bat in a sort of tactical 'as and when' manner. So why can't you just set the But if you have a genuine use-case for closing down on a scheduled basis and the This would actually be a nice little community enhancement, if you or anyone else wants to do it, but please let me know first. |
I see. Yes in fact to keep IBG stable I sometimes have to shut it down explicitly. IBG sometimes gets into a state where I can connect to it but most operations fail, like getting account values. It seems to be running out of memory when too many messages get logged. Lots of stuff gets logged because I do a lot of connects and disconnects so I can use a single instance of ib_insync to keep tabs on two gateways at the same time. I could probably work around this, separate to two ib_insync instances to do less connect/disconnects, but I can't be totally sure that there's no other way for IB to get screwed up that might require a restart. I do need to check that IBG is up by connecting to it, I can't just do a simple timed startup and shutdown, because I sometimes manually log in with TWS, which kicks off the IBG connection, which shuts down IBG, and then it needs to get restarted. I don't want to forget one day. For now I have worked around the lack of Stop.bat by remembering the java process that IBC starts, then terminating it when I need to by pid. So it's not an urgent issue for me, would just be cleaner to have a working stop. No worries, don't do it just for me. I might volunteer to work on it normally, but I have no experience in networking and I suck at Java, so I wouldn't be efficient at it... |
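The terminate-by-pid workaround described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `terminate_pid` is a hypothetical helper name, and how the pid of the Java process is obtained is left out because the thread does not say. On Windows it shells out to the standard `taskkill` command; elsewhere it sends SIGTERM.

```python
import os
import signal
import subprocess
import sys


def terminate_pid(pid):
    """End the process with the given pid (on Windows, its child processes too)."""
    if sys.platform == "win32":
        # taskkill: /PID selects the process, /T includes its child processes,
        # /F forces termination.
        subprocess.run(["taskkill", "/PID", str(pid), "/T", "/F"], check=False)
    else:
        # On other platforms, ask the process to terminate.
        os.kill(pid, signal.SIGTERM)
```

This bypasses IBC entirely, so TWS/Gateway gets no chance to shut down in an orderly way, which is why a working stop command would be the cleaner option.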
Maybe use netcat on Linux and something like this on Windows https://nmap.org/ncat/ to be able to feed messages from a file rather than stdin? Since the windows installation instructions on turning on telnet are not trivial anyway, this might not be an extra burden... |
There are several points to be addressed here:
|
I've now fixed the bug I referred to in point 4 of my most recent post above. Note that the description of the |
Thanks. I'm good for now with my workarounds. I'm using separate scripts per login now, so I don't need to connect/disconnect over and over, so IBG doesn't become unstable. Closing executables by pid is not the worst. But if stop.bat ever becomes fully functional, I can use it to clean up my code. Nothing urgent. Thanks for all your help! |
Hi! Using
|
@dmytro-sheyko you're absolutely right that the current approach on Windows is a complete kludge, and I should have done something more appropriate about it 15 years ago! But the fact is that no-one ever seemed to use the STOP command back then, and I suspect that even now hardly anyone uses it on Windows, and the kludge does actually work, so I never gave it much serious thought. Though I suspect I've wasted much more time on dealing with it as it is than it would have taken to do it right!... So thanks for your little program. There's a bit more to it than just the program though: the fact that it's Java means that we have to also provide a script to run it that will locate the correct Java to use, in exactly the same way as the IBC scripts, because the user may well not have a 'standard' Java installation (only the one that's installed with TWS/Gateway). I would envisage factoring out the Java-location code as a separate script, called from Stop.bat and StartIBC.bat. There is also the question of whether to also provide a Linux script - probably not necessary since telnet is so much more usable on Linux, but it might save someone some time working it out for themselves. Would you be interested in providing a PR to cover this and replace the existing mechanism? |
@rlktradewright, I've created a pull request, please review. I did not extract the Java-location code as a separate script, but hopefully what I did is good enough to start. Also I did not touch Linux scripts. |
Hi, @rlktradewright! Any updates on Stop.bat on Windows? I still get into situations where Stop is not able to stop a running IBC process, telnet doesn't seem to connect. |
I wasn't aware that any update is needed. It works fine for me. Reading quickly through the above, I notice that I mentioned quite a few things that you need to take into account. So can you please give me a description of what you're trying to do with Stop.bat and the circumstances when it apparently doesn't work. |
If, by highlighting those two posts above, you're trying to draw my attention to them, then you're wasting your time. The pull request was a non-starter for all the reasons detailed in my review of it. And you haven't answered my question. As far as I'm concerned, Stop.bat works fine, so if you want me to 'fix' it you'll need to make clear what you think is wrong with it. Otherwise you're just wasting my time as well. |
Oops, please ignore my previous post. I was confused by an alert I received, though I can't seem to find any record of it now. I'm concerned that you're still having problems with Stop.bat. If there is a bug in it, or in IBC, obviously I'm keen to fix it, but I don't have any evidence of a bug in either. The most recent thing you said was:
That seems to imply that it does work sometimes, at least. So it looks like your problem is probably environmental in some way, but I can't speculate what without further information. However, here's a completely different approach. Below is a small Python script that does the job perfectly and can be run from Task Scheduler without the user being logged in, if you still want to be able to do this. I am not a Pythonista, so it's been a bit of a learning curve to get this working, and it could probably be improved to make it more robust, but it seems to work well provided the command string is correct (so you can replace STOP with RESTART to do a restart). You'll need to install Python on Windows (if you don't already have it), but that's a trivial download-and-install from https://www.python.org/downloads/windows/. Once you've installed Python, create a Python script file called, for example, StopTWS.py, copy and paste the script below into it and save it in C:\IBC (unfortunately GitHub doesn't allow Python files to be attached to comments). The command to run it is then simply:
Here's the script:
I hope this will prove useful to you. I'll probably include it in some form in the User Guide and the Windows IBC download as a Stop.BAT replacement. By the way, this script should also work on Linux, but I haven't tested this (I broke my Linux VM trying to get Python properly installed on it, so...). |
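A command sender of the kind described in this thread can be very small. The following is a minimal sketch and not the script posted in the comment above: the function name `send_command` is hypothetical, and it assumes the command server listens on port 7462 (substitute your own `CommandServerPort` value) and accepts a CRLF-terminated command line, as a telnet client would send. It relies on the behaviour mentioned earlier in the thread that IBC closes the connection after receiving STOP.

```python
import socket


def send_command(command="STOP", host="127.0.0.1", port=7462, timeout=10.0):
    """Send one command line to the IBC command server and return any reply text.

    Assumptions (not confirmed by this thread): the server listens on port 7462
    and accepts a CRLF-terminated line, as a telnet client would send.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall((command + "\r\n").encode("ascii"))
        chunks = []
        try:
            # Read until the server closes the connection.
            while True:
                data = conn.recv(4096)
                if not data:
                    break
                chunks.append(data)
        except (socket.timeout, ConnectionResetError):
            # No further reply within the timeout, or the server dropped the link.
            pass
    return b"".join(chunks).decode("ascii", errors="replace")
```

Calling `send_command("STOP")` needs no window or keyboard input, so unlike the telnet and SendStopCommand.vbs combination it does not depend on an interactive desktop session.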
Hi! I'm running into a lot of instability with my watchdog script because IBC is not being shut down properly by Stop.bat. I'm working in Windows 10 on a fast machine.
A minor issue is that the timeout before "q" is too short, so the telnet window doesn't always close and stays up with this message:
I worked around it by adding an additional "q" at the end of SendStopCommand.vbs with a couple seconds delay.
The much worse problem is that if Stop.bat is called too soon after StartGateway.bat, IBC ends up in a broken state. The telnet window says the process shut down, but it didn't, and any subsequent calls to Stop.bat will fail with the telnet window stuck and blank: it doesn't accept any input, and the only way to close it is to hard close it. The only way to stop IBC at this point is to kill it manually. It's easy to repro: manually run StartGateway.bat, then as quickly as possible manually run Stop.bat. Then try to run Stop.bat again.
I also saw this message one time in the telnet window: "ERROR null source ". Really depends on timing of when you run Stop vs Start.
I also don't know how to cleanly kill the stuck telnet app, because it was started by the batch file with "start", so killing the process of the batch file does nothing:
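import subprocess
import time

# Raw string so the backslashes in the Windows path are not treated as escapes.
p = subprocess.Popen(r"C:\IBC\Stop.bat")
time.sleep(2)
# This only kills the batch file's own process, not the telnet app it launched with "start".
p.kill()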
I could kill all "telnet" processes, but that's not clean.
It would really help me out if you can offer a solution to these issues. Thanks!