-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ctrl+C causes Python interpreter with pysat loaded to crash not gracefully yield an exception #53
Comments
Yep, that's a problem. It used to work properly a few months ago but now this is what happens. I need to investigate what causes it. |
OK, I think this concrete issue seems to be caused by the signal handling, which works fine on other systems, e.g. on Linux or MacOS. Since I don't have access to Windows, it is kind of problematic for me to investigate and so to fix it. :) |
I have access to Windows, will try to reproduce it. |
Thanks, @dulanov! |
So, ...
C:\Users\ABC\AppData\Local\Temp\pip-install-i8zkex_7\python-sat_d5feacf086464b6a84d8a0a15faaa9e5\solvers\cadical103\internal.hpp(21): fatal error C1083: Datei (Include) kann nicht geöffnet werden: "unistd.h": No such file or directory
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\VC\\Tools\\MSVC\\14.38.33130\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
... Here But (.venv) PS C:\Workspace\References\pysathq-pysat\examples> python.exe lbx.py -d -e all -s glucose3 -vv 004b0f451f7d96f6a572e9e76360f51a-spg_420_280.cnf.xz
Traceback (most recent call last):
File "C:\Workspace\References\pysathq-pysat\examples\lbx.py", line 584, in <module>
for i, mcs in enumerate(mcsls.enumerate()):
File "C:\Workspace\References\pysathq-pysat\examples\lbx.py", line 301, in enumerate
mcs = self.compute()
^^^^^^^^^^^^^^
File "C:\Workspace\References\pysathq-pysat\examples\lbx.py", line 284, in compute
self._filter_satisfied(update_setd=True)
File "C:\Workspace\References\pysathq-pysat\examples\lbx.py", line 374, in _filter_satisfied
setd = setd.union(set(cl))
^^^^
KeyboardInterrupt @GregoryMorse It seems to me that this problem no longer exists. |
Installation from pip should work on the platforms which I compiled it for (there are a few binary wheels provided). As for the ctrl+c issue, it got interrupted while in Python (not the "solvers" C extension). By any chance, can you try it on hard CNF instances where a SaT call takes a lot of time so that you interrupt it during a SAT call? |
This is strange because
May be instead of build should be install from, I also can fix it, no problem. |
It was exactly hard CNF formula (89Mb) and I tried several times, but I will try to more deeply analyze the moment of interrupt, ok. |
Well, I guess it means you are using a different platform. 🙂 |
Sure, but the traceback log you attached clearly shows the execution was interrupted in Python. It would help to catch it when it is inside one of the solver's code. |
Yes, it's not yet supported. It can be supported if @rjungbeck can compile pypblib for Python 3.12. 🙂 |
May be a good task to automate it using GitHub Actions ;) |
I found pretty simple but not trivial hard CNF borrowed from https://github.com/arminbiere/satch repository and this error now is reproducable: (.venv) PS C:\Workspace\References\pysathq-pysat\examples> python.exe lbx.py -d -e all -s glucose3 -vv .\cnfs\prime65537.cnf
(.venv) PS C:\Workspace\References\pysathq-pysat\examples> $LASTEXITCODE
-1073741784 On NT family (Windows) it means:
in WSL2 (Ubuntu 22.04 LTS) similar situation took: du@ABC:/mnt/c/Workspace/References/pysathq-pysat/examples$ python3 lbx.py -d -e all -s glucose3 -vv cnfs/prime65537.cnf
^CTraceback (most recent call last):
File "/mnt/c/Workspace/References/pysathq-pysat/examples/lbx.py", line 584, in <module>
for i, mcs in enumerate(mcsls.enumerate()):
File "/mnt/c/Workspace/References/pysathq-pysat/examples/lbx.py", line 301, in enumerate
mcs = self.compute()
File "/mnt/c/Workspace/References/pysathq-pysat/examples/lbx.py", line 285, in compute
self._compute()
File "/mnt/c/Workspace/References/pysathq-pysat/examples/lbx.py", line 408, in _compute
if self.oracle.solve(assumptions=self.ss_assumps + self.bb_assumps + [self.setd[i]]):
File "/home/du/.local/lib/python3.10/site-packages/pysat/solvers.py", line 564, in solve
return self.solver.solve(assumptions)
File "/home/du/.local/lib/python3.10/site-packages/pysat/solvers.py", line 2827, in solve
self.status = pysolvers.glucose3_solve(self.glucose, assumptions,
pysolvers.error: Caught keyboard interrupt
du@ABC:/mnt/c/Workspace/References/pysathq-pysat/examples$ echo $?
1 |
Thanks, @dulanov. It would be nice to understand what causes this on Windows. I understand it may be difficult. |
I'd guess the signal handler code is buggy on Windows and crashes. The easiest way to track the issue down if being able to reproduce it is to attach a debugger and see where it crashes. It could potentially be a concurrent programming issue and exist on Linux as well just be highly unlikely there by how signal handler is scheduled. |
Also this looks like a bug in glucose from the Python stack trace. |
Interesting, will try with more fresh Glucose (4+) and also if you right, |
The build process itself is automated via AppVeyor (similar to Github Axctions) . But it requires that the new target environment (Python 3.12) is configured in appveyor.yml, which must be done manually. |
I have made a Python 3.12 version of pypblib (https://github.com/rjungbeck/pypblib/releases/download/pypblib-v1.0.439/pypblib-0.0.4-cp312-cp312-win_amd64.wh) |
Thanks so much, @rjungbeck! I've now included this configuration in |
Error is reproducable with all solvers including minisat (by default). I found a place in (.venv) PS C:\Workspace\References\pysathq-pysat> python.exe .\examples\lbx.py -d -e all -s glucose3 -vv .\examples\cnfs\prime65537.cnf
Traceback (most recent call last):
File "C:\Workspace\References\pysathq-pysat\examples\lbx.py", line 584, in <module>
for i, mcs in enumerate(mcsls.enumerate()):
File "C:\Workspace\References\pysathq-pysat\examples\lbx.py", line 301, in enumerate
mcs = self.compute()
^^^^^^^^^^^^^^
File "C:\Workspace\References\pysathq-pysat\examples\lbx.py", line 285, in compute
self._compute()
File "C:\Workspace\References\pysathq-pysat\examples\lbx.py", line 408, in _compute
if self.oracle.solve(assumptions=self.ss_assumps + self.bb_assumps + [self.setd[i]]):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Workspace\References\pysathq-pysat\.venv\Lib\site-packages\python_sat-0.1.8.dev12-py3.12-win-amd64.egg\pysat\solvers.py", line 564, in solve
return self.solver.solve(assumptions)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Workspace\References\pysathq-pysat\.venv\Lib\site-packages\python_sat-0.1.8.dev12-py3.12-win-amd64.egg\pysat\solvers.py", line 2827, in solve
self.status = pysolvers.glucose3_solve(self.glucose, assumptions,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
(.venv) PS C:\Workspace\References\pysathq-pysat> $LASTEXITCODE
-1073741510 where exit Code Without this change (.venv) PS C:\Workspace\References\pysathq-pysat> python.exe .\examples\lbx.py -d -e all -s glucose4 -vv .\examples\cnfs\prime65537.cnf
(.venv) PS C:\Workspace\References\pysathq-pysat> $LASTEXITCODE
-1073741784 P.S. File |
Disabling the signal handler causes a pass through but it's not a good solution at all. Obviously something with the use of PyOS_setsig is the issue. One known issue is use of import_array() for numpy. Apparently it sets up signal handlers and can throw off the expected behavior. With a simple debug build and debugging session this is an easy solve. Just pinpoint the root cause of the crash and it won't be hard to back track abd figure out the code issue. So this discussion seems a bit silly. Disabling the signal handler is not a fix. This should be a portable Windows supported API. Some small detail is misimplemented. It's likely buggy on Linux too, it's just there for whatever reason it doesn't crash. This is with high likelihood a universal bug from a clean coding practice standpoint. Microsoft tends to be more standards compliant these days than anyone else. So stability on Windows or MSVC means stability on Linux or GCC. But the reverse is never true. In fact, despite all the hate towards Microsoft and praise towards Linux over the years, the best programmers are invariably always the ones who can write portable code. So again to conclude, this project has poor signal handling code which requires manual debugging to solve. |
@GregoryMorse I ended up in a rabbit hole ;) Apparently, we are dealing with the standard problem of processing kernel signals using [1] https://stackoverflow.com/questions/7334595/longjmp-out-of-signal-handler |
Very nice find. So undefined behavior seems to work on Linux not on Windows. That is why I often consider source quality as higher when a shared source works on both which is the case for this repo. So it should be redesigned to not use long jump it sounds like. I generally think setting a flag in the signal handler and periodically checking the flag in all potentially long running operations is the safest way to go. |
@alexeyignatiev looks like wrong usage of |
Thanks for the finding, @dulanov. The question is why I decided to replace POSIX functions that I first used with C standard library alternative ones. 🙂 May 2018 was so long ago (even before a working Windows version) that it looks hard to understand now but there must have been a justification. |
Any time Ctrl+C is used to interrupt running code, it will throw a
KeyboardInterrupt
. But something is going on after doing SAT solving which causes it to not work all of the time and instead just exit the interpreter. Either this indicates Ctrl+C is being explicitly caught and inappropriately dealt with or some type of cleanup code is probably cause a crash.Obviously important is that this is being run on Windows as Linux can have different behavior in this area as it uses a signal handling system which is a bit different. My event viewer clearly shows something is causing crashes and it never happens if I do not load pysat:
Since 0xc0000028 is a stack corruption issue, and everything else I am using is pure python, it implies that the compiled C code is responsible for this... Python 3.8 almost never crashes even under absurd memory load, etc.
I can attach a debugger and get a stack trace shortly as it will help narrow down which C code is the culprit.
This is not particularly helpful beyond showing it definitely is some type of memory mismanagement in the object deletion chain for LBX. Obviously something is not cleaning itself up properly and causing crashes.
The text was updated successfully, but these errors were encountered: