Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BSK not yet compatible with python 3.12 #765

Open
schaubh opened this issue Aug 17, 2024 · 7 comments · May be fixed by #899
Open

BSK not yet compatible with python 3.12 #765

schaubh opened this issue Aug 17, 2024 · 7 comments · May be fixed by #899
Assignees
Labels
enhancement New feature or request

Comments

@schaubh
Copy link
Contributor

schaubh commented Aug 17, 2024

Describe your use case
Python 3.12 is not yet supported. Need to work out updates to make Basilisk compatible with 3.12 and retain prior python compatibility.

@schaubh schaubh added the enhancement New feature or request label Aug 17, 2024
@schaubh schaubh self-assigned this Aug 17, 2024
@schaubh schaubh added this to Basilisk Aug 17, 2024
@sassy-asjp
Copy link
Contributor

I suspected the segfault under python3.12 was related to how SysModel is set up in SWIG separately for cSysModel and sysModel modules and hacked something together to test my suspicion. This diff fixes the segfault in python3.12, but isn't a proper thing that can be merged.

I'm not actively working on python3.12 support, but if anyone is, I'm sharing this snippet in case it is useful.

diff --git a/pyproject.toml b/pyproject.toml
index fe1a90f27..811e56264 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -14,7 +14,7 @@ requires = [
 [project]
 name        = 'Basilisk'
 dynamic     = ["version", "dependencies", "optional-dependencies"]
-requires-python = ">=3.8, <3.12"
+requires-python = ">=3.8, <3.13"
 
 readme      = "README.md"
 license     = {file = "LICENSE"}
diff --git a/src/architecture/_GeneralModuleFiles/py_sys_model.i b/src/architecture/_GeneralModuleFiles/py_sys_model.i
deleted file mode 100644
index 71b4a0b5e..000000000
--- a/src/architecture/_GeneralModuleFiles/py_sys_model.i
+++ /dev/null
@@ -1,57 +0,0 @@
-
-%module(directors="1") sysModel
-%{
-   #include "sys_model.h"
-%}
-
-%pythoncode %{
-import sys
-import traceback
-from Basilisk.architecture.swig_common_model import *
-%}
-%include "std_string.i"
-%include "swig_conly_data.i"
-%include "architecture/utilities/bskLogging.h"
-
-%feature("director") SysModel;
-%feature("pythonappend") SysModel::SysModel %{
-    self.__super_init_called__ = True%}
-%rename("_SysModel") SysModel;
-%include "sys_model.i"
-
-%pythoncode %{
-class SuperInitChecker(type):
-
-    def __call__(cls, *a, **kw):
-        rv = super(SuperInitChecker, cls).__call__(*a, **kw)
-        if not getattr(rv, "__super_init_called__", False):
-            error_msg = (
-               "Need to call parent __init__ in SysModel subclasses:\n"
-               f"class {cls.__name__}(sysModel.SysModel):\n"
-               "    def __init__(...):\n"
-               "        super().__init__()"
-            )
-            raise SyntaxError(error_msg)
-        return rv
-
-def logError(func):
-    """Decorator that prints any exceptions that happen when
-    the original function is called, and then raises them again."""
-    def inner(*arg, **kwargs):
-        try:
-            return func(*arg, **kwargs)
-        except Exception:
-            traceback.print_exc()
-            raise
-    return inner
-
-class SysModel(_SysModel, metaclass=SuperInitChecker):
-    bskLogger: BSKLogger = None
-
-    def __init_subclass__(cls):
-        # Make it so any exceptions in UpdateState and Reset
-        # print any exceptions before returning control to
-        # C++ (at which point exceptions will crash the program)
-        cls.UpdateState = logError(cls.UpdateState)
-        cls.Reset = logError(cls.Reset)
-%}
diff --git a/src/architecture/_GeneralModuleFiles/sys_model.i b/src/architecture/_GeneralModuleFiles/sys_model.i
index 63cf55ed7..639ea34cc 100644
--- a/src/architecture/_GeneralModuleFiles/sys_model.i
+++ b/src/architecture/_GeneralModuleFiles/sys_model.i
@@ -1,5 +1,5 @@
 
-%module cSysModel
+%module(directors="1") sysModel
 %{
    #include "sys_model.h"
 %}
@@ -18,6 +18,8 @@ from Basilisk.architecture.swig_common_model import *
 from typing import Union, Iterable
 %}
 
+%feature("director") SysModel;
+
 %extend SysModel
 {
     %pythoncode %{

@schaubh
Copy link
Contributor Author

schaubh commented Oct 17, 2024

Thanks for sharing your insight @sassy-asjp .

@sassy-asjp
Copy link
Contributor

@schaubh

I did a bit more reading and noticed https://stackoverflow.com/questions/78200321/python3-12-c-api-segfaults-with-openmp which seemed similar to the segfault we get. The director feature in SWIG likely is and was not handling the GIL correctly, which was revealed by GIL changes in python3.12. I tried changing the name of SelfInit to just S and the failing function succeeded, but crashes soon after.

The fix has to be in SWIG since it's in the auto-generated code using the director feature.

I think my original hunch was off base, and just happened to fix the segfault by breaking director feature entirely, causing the problematic code sections to never be reached in the first place.

@schaubh
Copy link
Contributor Author

schaubh commented Dec 27, 2024

Thanks for the additional insight @sassy-asjp!

@sassy-asjp
Copy link
Contributor

@schaubh

It seems like the problem is directors AND threads. I hacked out the thread functionality and this diff passes all tests though isn't something we can merge.

diff --git a/pyproject.toml b/pyproject.toml
index 89bddb0e3..b55fa0f06 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -14,7 +14,7 @@ requires = [
 [project]
 name        = 'Basilisk'
 dynamic     = ["version", "dependencies"]
-requires-python = ">=3.8, <3.12"
+requires-python = ">=3.8, <3.13"
 
 readme      = "README.md"
 license     = {file = "LICENSE"}
diff --git a/src/architecture/system_model/sim_model.cpp b/src/architecture/system_model/sim_model.cpp
index d7301dae5..593a11b2e 100755
--- a/src/architecture/system_model/sim_model.cpp
+++ b/src/architecture/system_model/sim_model.cpp
@@ -20,40 +20,6 @@
 #include "sim_model.h"
 #include <iostream>
 
-void activateNewThread(void *threadData)
-{
-
-    auto *theThread = static_cast<SimThreadExecution*> (threadData);
-
-    //std::cout << "Starting thread yes" << std::endl;
-    theThread->postInit();
-
-    while(theThread->threadValid())
-    {
-        theThread->lockThread();
-        if(theThread->selfInitNow){
-            theThread->selfInitProcesses();
-            theThread->selfInitNow = false;
-        }
-        else if(theThread->crossInitNow){
-            theThread->crossInitProcesses();
-            theThread->crossInitNow = false;
-        }
-        else if(theThread->resetNow){
-            theThread->resetProcesses();
-            theThread->resetNow = false;
-        }
-        else{
-            theThread->StepUntilStop();
-        }
-        //std::cout << "Stepping thread"<<std::endl;
-        theThread->unlockParent();
-
-    }
-    //std::cout << "Killing thread" << std::endl;
-
-}
-
 SimThreadExecution::SimThreadExecution(uint64_t threadIdent, uint64_t currentSimNanos) :
     currentThreadNanos(currentSimNanos), threadID(threadIdent)
 {
@@ -176,11 +142,6 @@ void SimThreadExecution::moveProcessMessages() const {
  @return void
  */
 void SimThreadExecution::waitOnInit() {
-    std::unique_lock<std::mutex> lck(this->initReadyLock);
-    while(!this->threadActive())
-    {
-        (this)->initHoldVar.wait(lck);
-    }
 }
 
 /*! This method allows the startup activities to alert the parent thread once
@@ -189,9 +150,6 @@ void SimThreadExecution::waitOnInit() {
  @return void
  */
 void SimThreadExecution::postInit() {
-    std::unique_lock<std::mutex> lck(this->initReadyLock);
-    this->threadReady();
-    this->initHoldVar.notify_one();
 }
 
 /*! This method is used by the "child" thread to walk through all of its tasks
@@ -275,7 +233,7 @@ void SimModel::StepUntilStop(uint64_t SimStopTime, int64_t stopPri)
         simThread->stopThreadNanos = SimStopTime;
         simThread->stopThreadPriority = stopPri;
         if(simThread->procCount() > 0) {
-            simThread->unlockThread();
+            simThread->StepUntilStop();
         }
     }
     this->NextTaskTime = (uint64_t) ~0;
@@ -283,7 +241,6 @@ void SimModel::StepUntilStop(uint64_t SimStopTime, int64_t stopPri)
     for(auto const& simThread : this->threadList)
     {
         if(simThread->procCount() > 0) {
-            simThread->lockParent();
             this->NextTaskTime = simThread->NextTaskTime < this->NextTaskTime ?
                                  simThread->NextTaskTime : this->NextTaskTime;
             this->CurrentNanos = simThread->CurrentNanos < this->CurrentNanos ?
@@ -322,11 +279,7 @@ void SimModel::selfInitSimulation()
 {
     for(auto const& simThread : this->threadList)
     {
-        simThread->selfInitNow = true;
-        simThread->unlockThread();
-    }
-    for(auto const& simThread : this->threadList) {
-        simThread->lockParent();
+        simThread->selfInitProcesses();
     }
     this->NextTaskTime = 0;
     this->CurrentNanos = 0;
@@ -340,17 +293,9 @@ void SimModel::selfInitSimulation()
  */
 void SimModel::resetInitSimulation() const
 {
-
-
     for(auto const& simThread : this->threadList)
     {
-        simThread->resetNow = true;
-        simThread->unlockThread();
-    }
-    for(auto const& simThread : this->threadList)
-    {
-        simThread->lockParent();
-
+        simThread->resetProcesses();
     }
 }
 
@@ -464,8 +409,6 @@ void SimModel::resetThreads(uint64_t threadCount)
 void SimModel::deleteThreads() {
     for(auto const& simThread : this->threadList)
     {
-        simThread->killThread();
-        simThread->unlockThread();
         if(simThread->threadContext && simThread->threadContext->joinable()) {
             simThread->threadContext->join();
             delete simThread->threadContext;
@@ -475,6 +418,9 @@ void SimModel::deleteThreads() {
     this->threadList.clear();
 }
 
+void dummy() {
+}
+
 /*! This method provides a seamless allocation of processes onto active threads
     for any processes that haven't already been placed onto a thread.  If the
     user has allocated N threads, this method just walks through those threads
@@ -505,8 +451,7 @@ void SimModel::assignRemainingProcs() {
         simThread->nextProcPriority = (*it)->processPriority;
         simThread->NextTaskTime = 0;
         simThread->CurrentNanos = 0;
-        //simThread->lockThread();
-        simThread->threadContext = new std::thread(activateNewThread, simThread);
+        simThread->threadContext = new std::thread(dummy);
     }
     for(auto const& simThread : this->threadList)
     {

I guess moving forward there are really two courses of action:

  • Properly remove the thread functionality now, and consider reimplementing thread functionality whenever SWIG is fixed.
  • Wait on SWIG bug fix.

I'll see if I can find some time to create a minimal example showing the bug and post it to my bug ticket with SWIG, but no one has said anything on my ticket since I opened it so idk what the timeline on fixing the SWIG bug would be.

@schaubh
Copy link
Contributor Author

schaubh commented Jan 8, 2025

howdy @sassy-asjp , thanks for the great update. This certainly helps narrow down what could be going on with Python 3.12 and BSK. I did try running the fastest SWIG 4.3 and it didn't make a difference. Their release notes claim to be supporting Python 3.12 and the in development 3.13 at this point.

Knowing that the issue is with the multi-threading is great to know. To paraphrase your message above, you don't see anything wrong with the code we are using, there appears to be an issue with SWIG itself. If you come across a work-around idea let me know. I didn't write this multi-threading code, but I'm glad to reach out to him to get his input and support. At this point we might wait and see if the SWIG community responds to your bug report?

@sassy-asjp
Copy link
Contributor

I poked around trying to manually take the Python GIL before potential director calls, and realized that we have been using multithreading incorrectly in a subtle way.

SWIG modules were built without multithreading feature, which means that SWIG will hold onto the Python GIL through the entire native procedure call. I'm not sure what the original justification was, or whether it was a mistake, but multithreading feature adds extra synchronization code which can reduce performance and might have seemed unnecessary. Basilisk's use of multithreading is only for multithreaded computation (not waiting on disk, etc.) and none of the code messes with the GIL (no risk of deadlock).

This originally worked for those reasons, however changes in Python 3.12 likely to support the per interpreter GIL feature or GIL disabled builds. I turned on the threads feature on the relevant modules and it seems to work.

@sassy-asjp sassy-asjp linked a pull request Jan 11, 2025 that will close this issue
@schaubh schaubh linked a pull request Jan 11, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: No status
Development

Successfully merging a pull request may close this issue.

2 participants