Refactor interactive task scheduler #798
Walkthrough

Introduces `_execute_multiple_tasks` in `blockallocation` and rewires `BlockAllocationTaskScheduler` threads to it. Reworks `onetoone` to thread-based single-task execution via `execute_task_dict`, using an interface bootstrapped by spawner utilities. Simplifies `shared.py` to a single `execute_task_dict` API and adds `task_done`. Tests updated to import and use `_execute_multiple_tasks` from `blockallocation`.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Q as FutureQueue
    participant WT as WorkerThread (_execute_multiple_tasks)
    participant S as Spawner
    participant IF as Interface
    participant T as Task (execute_task_dict)
    Q->>WT: get() task_dict or shutdown
    alt task_dict contains fn/future
        WT->>S: create connections (cores, kwargs)
        WT->>IF: interface_bootup(get_interactive_execute_command, connections)
        WT->>T: execute_task_dict(task_dict, interface, cache/log opts)
        T-->>WT: result/exception set on future
        WT->>Q: task_done()
        WT->>IF: shutdown on error
    else shutdown signal
        WT->>IF: shutdown(wait)
        WT->>Q: task_done()
        opt queue_join_on_shutdown
            WT->>Q: join()
        end
        WT-->>WT: exit
    end
```
```mermaid
sequenceDiagram
    autonumber
    participant Sched as OneProcessTaskScheduler
    participant Th as Thread (_execute_task_in_thread)
    participant S as Spawner
    participant IF as Interface
    participant Exec as execute_task_dict
    Sched->>Th: start with task_dict, cores, opts
    Th->>S: create connections (cores, kwargs)
    Th->>IF: interface_bootup(get_interactive_execute_command, connections)
    Th->>Exec: execute_task_dict(task_dict, interface, cache/log opts)
    Exec-->>Th: set future result/exception
    Th->>IF: shutdown on error
```
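The queue-driven worker loop in the first diagram can be sketched with stdlib primitives. This is an illustrative stand-in, not the library's implementation: `worker_loop` simplifies `_execute_multiple_tasks` to a local function call instead of a socket interface, and the task-dict keys (`fn`, `args`, `future`, `shutdown`) mirror the ones named in the review below.

```python
import queue
import threading
from concurrent.futures import Future

def worker_loop(future_queue: queue.Queue) -> None:
    # Stand-in for the block-allocation worker: pull task dictionaries,
    # execute them, and acknowledge every queue item with task_done().
    while True:
        task_dict = future_queue.get()
        if task_dict.get("shutdown"):
            future_queue.task_done()
            break
        f = task_dict["future"]
        try:
            result = task_dict["fn"](*task_dict.get("args", ()),
                                     **task_dict.get("kwargs", {}))
        except Exception as exc:
            f.set_exception(exc)
        else:
            f.set_result(result)
        finally:
            future_queue.task_done()

q: queue.Queue = queue.Queue()
t = threading.Thread(target=worker_loop, args=(q,))
t.start()
f: Future = Future()
q.put({"fn": lambda x: x + 1, "args": (2,), "future": f})
q.put({"shutdown": True})
q.join()   # returns once the worker has acknowledged both items
t.join()
print(f.result())  # 3
```

The sentinel-dict shutdown and the `task_done()`-per-item pairing are the two invariants the review comments below keep returning to.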
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main     #798   +/-   ##
=======================================
  Coverage   97.74%   97.75%
=======================================
  Files          32       32
  Lines        1466     1468       +2
=======================================
+ Hits         1433     1435       +2
  Misses         33       33
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
executorlib/task_scheduler/interactive/shared.py (2)
58-65: Do not shut down the interface on per-task exceptions; it kills the worker for subsequent tasks

Calling interface.shutdown() here tears down the long-lived worker. After one task fails, the thread keeps looping with a dead interface and future tasks will fail/hang. Let the backend keep running and only set the future's exception.
Apply:
```diff
@@ def _execute_task_without_cache(interface: SocketInterface, task_dict: dict):
-    except Exception as thread_exception:
-        interface.shutdown(wait=True)
-        f.set_exception(exception=thread_exception)
+    except Exception as thread_exception:
+        f.set_exception(exception=thread_exception)
@@ def _execute_task_with_cache(
-    except Exception as thread_exception:
-        interface.shutdown(wait=True)
-        f.set_exception(exception=thread_exception)
+    except Exception as thread_exception:
+        f.set_exception(exception=thread_exception)
```

Also applies to: 104-106
94-95: Cache hit detection should use an existence check, not membership in a list

Comparing absolute paths to the result of get_cache_files() is brittle. Use os.path.isfile for correctness and performance.

```diff
-    if file_name not in get_cache_files(cache_directory=cache_directory):
+    if not os.path.isfile(file_name):
```

executorlib/task_scheduler/interactive/blockallocation.py (2)
85-92: Infinite loop when shrinking max_workers

Filtering alive threads in a tight `while` without waiting never reduces `len(self._process)` until threads actually exit; the result is a busy-spin/hang.

```diff
-        if self._max_workers > max_workers:
-            for _ in range(self._max_workers - max_workers):
-                self._future_queue.queue.insert(0, {"shutdown": True, "wait": True})
-            while len(self._process) > max_workers:
-                self._process = [
-                    process for process in self._process if process.is_alive()
-                ]
+        if self._max_workers > max_workers:
+            # Ask extra workers to exit
+            for _ in range(self._max_workers - max_workers):
+                self._future_queue.queue.insert(0, {"shutdown": True, "wait": True})
+            # Wait until the desired number of workers remain
+            from time import sleep
+            while True:
+                alive = [p for p in self._process if p.is_alive()]
+                if len(alive) <= max_workers:
+                    self._process = alive
+                    break
+                sleep(0.05)
```
93-101: New workers do not receive a worker_id

Existing threads get worker_id in init, but workers added later don't. Pass stable IDs for observability and resource distribution.

```diff
-        new_process_lst = [
-            Thread(
-                target=_execute_multiple_tasks,
-                kwargs=self._process_kwargs,
-            )
-            for _ in range(max_workers - self._max_workers)
-        ]
+        start_id = self._max_workers
+        new_process_lst = [
+            Thread(
+                target=_execute_multiple_tasks,
+                kwargs=self._process_kwargs | {"worker_id": start_id + i},
+            )
+            for i in range(max_workers - self._max_workers)
+        ]
```

executorlib/task_scheduler/interactive/onetoone.py (1)
218-269: Leak: spawned interfaces are never shut down in _execute_task_in_thread. A new interface is booted per task but not closed, risking orphaned subprocesses and leaked resources. Ensure shutdown in a finally block.
Apply this diff:
```diff
 def _execute_task_in_thread(
@@
-    execute_task_dict(
-        task_dict=task_dict,
-        interface=interface_bootup(
-            command_lst=get_interactive_execute_command(
-                cores=cores,
-            ),
-            connections=spawner(cores=cores, **kwargs),
-            hostname_localhost=hostname_localhost,
-            log_obj_size=log_obj_size,
-            worker_id=worker_id,
-        ),
-        cache_directory=cache_directory,
-        cache_key=cache_key,
-        error_log_file=error_log_file,
-    )
+    interface = interface_bootup(
+        command_lst=get_interactive_execute_command(cores=cores),
+        connections=spawner(cores=cores, **kwargs),
+        hostname_localhost=hostname_localhost,
+        log_obj_size=log_obj_size,
+        worker_id=worker_id,
+    )
+    try:
+        execute_task_dict(
+            task_dict=task_dict,
+            interface=interface,
+            cache_directory=cache_directory,
+            cache_key=cache_key,
+            error_log_file=error_log_file,
+        )
+    finally:
+        # Ensure spawned backend is always torn down
+        interface.shutdown(wait=True)
```
🧹 Nitpick comments (11)
executorlib/task_scheduler/interactive/shared.py (4)
108-110: Ensure Future state transitions are consistent on cache hits

Call set_running_or_notify_cancel() before set_result() for symmetry with the no-cache path.

```diff
-    future = task_dict["future"]
-    future.set_result(result)
+    future = task_dict["future"]
+    if not future.done() and future.set_running_or_notify_cancel():
+        future.set_result(result)
```
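For reference, the stdlib `concurrent.futures.Future` behavior this guard relies on can be checked directly: `set_running_or_notify_cancel()` returns `False` for a cancelled future, so the guarded pattern skips `set_result()` instead of raising.

```python
from concurrent.futures import Future

# Normal path: the guard passes and the result is set.
g = Future()
if not g.done() and g.set_running_or_notify_cancel():
    g.set_result("cached value")
print(g.result())  # cached value

# Cancelled path: a pending Future can be cancelled before any worker
# claims it; done() is then True, so the guard prevents an InvalidStateError.
f = Future()
f.cancel()
if not f.done() and f.set_running_or_notify_cancel():
    f.set_result("cached value")  # never reached
print(f.cancelled())  # True
```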
49-65: Avoid mutating the incoming task_dict; send a payload without 'future'

Popping 'future' only on the cache-miss path makes mutation inconsistent and risks leaking internal keys to the backend if code changes. Build a payload containing only allowed keys.

```diff
-    f = task_dict.pop("future")
+    f = task_dict["future"]
+    payload = {k: task_dict[k] for k in ("fn", "args", "kwargs") if k in task_dict}
@@
-    f.set_result(interface.send_and_receive_dict(input_dict=task_dict))
+    f.set_result(interface.send_and_receive_dict(input_dict=payload))
```

And analogously in the cache-miss block of _execute_task_with_cache.
Also applies to: 86-103
25-29: Docstring default is misleading for cache_directory

Signature defaults to None (no caching), but the docstring says Defaults to "executorlib_cache". Please align the docstring.

```diff
-        cache_directory (str, optional): The directory to store cache files. Defaults to "executorlib_cache".
+        cache_directory (str, optional): If provided, directory to store cache files; otherwise no caching.
```
44-46: Silently suppressing task_done() errors can hide bugs

Suppressing ValueError avoids crashes but can mask double-acks. Consider logging once when suppression occurs.
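A minimal sketch of what logged suppression could look like; the logger name is an assumption for illustration, not part of the current module.

```python
import logging
import queue

logger = logging.getLogger("executorlib.shared")  # hypothetical logger name

def task_done(future_queue: queue.Queue) -> None:
    # Acknowledge one queue item; queue.Queue raises
    # ValueError("task_done() called too many times") on a double-ack,
    # which we log instead of silently swallowing.
    try:
        future_queue.task_done()
    except ValueError:
        logger.warning("task_done() called more often than items were queued")

q = queue.Queue()
q.put("task")
q.get()
task_done(q)  # normal acknowledgement
task_done(q)  # double-ack: logged as a warning rather than raised
```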
executorlib/task_scheduler/interactive/blockallocation.py (1)
196-218: Docstring nit: "single tasks" → "tasks from a queue". Minor wording fix.
tests/test_mpiexecspawner.py (1)
13-13: Import/call-site updates look correct; consider avoiding private API in tests

Switching to _execute_multiple_tasks matches the refactor. Relying on a private name in tests is brittle; consider re-exporting a public alias (e.g., execute_multiple_tasks) for test use.
Also applies to: 263-269, 445-450, 461-466, 477-482, 495-500, 518-524, 537-543
tests/test_singlenodeexecutor_shell_executor.py (1)
8-8: Aligned with new entrypoint; same note on private API

The import and invocations are correct. Prefer a public alias over importing a leading-underscore function.
Also applies to: 35-40, 61-66, 88-93
tests/test_singlenodeexecutor_shell_interactive.py (1)
9-9: Updated to new entrypoint and parameters; good

Including spawner and init_function is consistent with the new signature. Same caveat about private API imports.
Also applies to: 91-97
tests/test_fluxpythonspawner.py (1)
114-119: Minor: Avoid redundant queue joins in tests.

_execute_multiple_tasks already joins the queue on shutdown by default (queue_join_on_shutdown=True). The extra q.join() right after the call is redundant.
- Option A: Keep as-is (harmless).
- Option B: Pass queue_join_on_shutdown=False and keep q.join().
- Option C (cleanest): Keep default and drop the trailing q.join().
Also applies to: 129-135
executorlib/task_scheduler/interactive/onetoone.py (2)
71-99: Docstring and shutdown semantics nit.
- Wording: “Execute a single tasks…” → “Execute tasks…”.
- Consider mirroring blockallocation’s queue_join_on_shutdown flag for consistency and test control.
```diff
-    Execute a single tasks in parallel using the message passing interface (MPI).
+    Execute tasks in parallel using the message passing interface (MPI).
```
210-216: Wrapper now targets a thread function, so the name is misleading. _wrap_execute_task_in_separate_process now starts a thread; consider renaming for clarity (e.g., _submit_task_threaded) in a follow-up to reduce confusion.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
- executorlib/task_scheduler/interactive/blockallocation.py (4 hunks)
- executorlib/task_scheduler/interactive/onetoone.py (3 hunks)
- executorlib/task_scheduler/interactive/shared.py (2 hunks)
- tests/test_fluxpythonspawner.py (3 hunks)
- tests/test_mpiexecspawner.py (8 hunks)
- tests/test_singlenodeexecutor_shell_executor.py (4 hunks)
- tests/test_singlenodeexecutor_shell_interactive.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
tests/test_singlenodeexecutor_shell_executor.py (1)
- executorlib/task_scheduler/interactive/blockallocation.py (1): _execute_multiple_tasks (182-250)

tests/test_mpiexecspawner.py (1)
- executorlib/task_scheduler/interactive/blockallocation.py (2): BlockAllocationTaskScheduler (18-179), _execute_multiple_tasks (182-250)

tests/test_fluxpythonspawner.py (1)
- executorlib/task_scheduler/interactive/blockallocation.py (2): BlockAllocationTaskScheduler (18-179), _execute_multiple_tasks (182-250)
executorlib/task_scheduler/interactive/blockallocation.py (5)
- executorlib/standalone/command.py (1): get_interactive_execute_command (94-116)
- executorlib/standalone/inputcheck.py (2): check_resource_dict (63-71), check_resource_dict_is_empty (74-81)
- executorlib/standalone/interactive/communication.py (3): interface_bootup (156-204), send_dict (46-57), shutdown (120-134)
- executorlib/standalone/interactive/spawner.py (4): BaseSpawner (9-59), MpiExecSpawner (141-158), shutdown (42-49), shutdown (117-129)
- executorlib/task_scheduler/interactive/shared.py (2): execute_task_dict (11-41), task_done (44-46)
executorlib/task_scheduler/interactive/shared.py (2)
- executorlib/standalone/interactive/communication.py (1): SocketInterface (14-153)
- executorlib/standalone/serialize.py (1): serialize_funct (31-85)
tests/test_singlenodeexecutor_shell_interactive.py (1)
- executorlib/task_scheduler/interactive/blockallocation.py (1): _execute_multiple_tasks (182-250)
executorlib/task_scheduler/interactive/onetoone.py (4)
- executorlib/standalone/command.py (1): get_interactive_execute_command (94-116)
- executorlib/standalone/interactive/communication.py (1): interface_bootup (156-204)
- executorlib/standalone/interactive/spawner.py (2): BaseSpawner (9-59), MpiExecSpawner (141-158)
- executorlib/task_scheduler/interactive/shared.py (1): execute_task_dict (11-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: unittest_old
- GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-mpich.yml)
- GitHub Check: unittest_flux_openmpi
- GitHub Check: unittest_openmpi (ubuntu-latest, 3.11)
- GitHub Check: unittest_openmpi (macos-latest, 3.13)
- GitHub Check: unittest_openmpi (ubuntu-22.04-arm, 3.13)
- GitHub Check: unittest_openmpi (ubuntu-latest, 3.12)
- GitHub Check: unittest_openmpi (ubuntu-24.04-arm, 3.13)
- GitHub Check: unittest_openmpi (ubuntu-latest, 3.13)
- GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-openmpi.yml)
- GitHub Check: unittest_win
- GitHub Check: unittest_slurm_mpich
- GitHub Check: notebooks_integration
🔇 Additional comments (4)
executorlib/task_scheduler/interactive/blockallocation.py (1)
69-71: Switching worker target to _execute_multiple_tasks looks good

The new entrypoint is clearer and keeps orchestration close to block allocation.
tests/test_fluxpythonspawner.py (1)
8-8: Import update to new internal API looks correct. Switching to BlockAllocationTaskScheduler and _execute_multiple_tasks from blockallocation matches the refactor intent.
executorlib/task_scheduler/interactive/onetoone.py (2)
5-6: New imports align with the thread-based execution path. get_interactive_execute_command and interface_bootup are the right dependencies for per-task interface bootstrapping.
65-67: Switching scheduler target to _execute_single_task is consistent. Thread target rename is OK and matches the new control flow.
```python
def _execute_multiple_tasks(
    future_queue: queue.Queue,
    cores: int = 1,
    spawner: type[BaseSpawner] = MpiExecSpawner,
    hostname_localhost: Optional[bool] = None,
    init_function: Optional[Callable] = None,
    cache_directory: Optional[str] = None,
    cache_key: Optional[str] = None,
    queue_join_on_shutdown: bool = True,
    log_obj_size: bool = False,
    error_log_file: Optional[str] = None,
    worker_id: Optional[int] = None,
    **kwargs,
) -> None:
    """
    Execute a single tasks in parallel using the message passing interface (MPI).

    Args:
        future_queue (queue.Queue): task queue of dictionary objects which are submitted to the parallel process
        cores (int): defines the total number of MPI ranks to use
        spawner (BaseSpawner): Spawner to start process on selected compute resources
        hostname_localhost (boolean): use localhost instead of the hostname to establish the zmq connection. In the
                                      context of an HPC cluster this essential to be able to communicate to an
                                      Executor running on a different compute node within the same allocation. And
                                      in principle any computer should be able to resolve that their own hostname
                                      points to the same address as localhost. Still MacOS >= 12 seems to disable
                                      this look up for security reasons. So on MacOS it is required to set this
                                      option to true
        init_function (Callable): optional function to preset arguments for functions which are submitted later
        cache_directory (str, optional): The directory to store cache files. Defaults to "executorlib_cache".
        cache_key (str, optional): By default the cache_key is generated based on the function hash, this can be
                                   overwritten by setting the cache_key.
        queue_join_on_shutdown (bool): Join communication queue when thread is closed. Defaults to True.
        log_obj_size (bool): Enable debug mode which reports the size of the communicated objects.
        error_log_file (str): Name of the error log file to use for storing exceptions raised by the Python functions
                              submitted to the Executor.
        worker_id (int): Communicate the worker which ID was assigned to it for future reference and resource
                         distribution.
    """
    interface = interface_bootup(
```
🛠️ Refactor suggestion
Harden worker against unexpected exceptions and ensure cleanup
If an exception escapes the loop, the remote process is left running. Wrap with try/except/finally to always shut down the interface.
```diff
 def _execute_multiple_tasks(
@@
-    interface = interface_bootup(
+    interface = interface_bootup(
@@
-    if init_function is not None:
-        interface.send_dict(
-            input_dict={"init": True, "fn": init_function, "args": (), "kwargs": {}}
-        )
-    while True:
-        task_dict = future_queue.get()
-        if "shutdown" in task_dict and task_dict["shutdown"]:
-            interface.shutdown(wait=task_dict["wait"])
-            task_done(future_queue=future_queue)
-            if queue_join_on_shutdown:
-                future_queue.join()
-            break
-        elif "fn" in task_dict and "future" in task_dict:
-            execute_task_dict(
-                task_dict=task_dict,
-                interface=interface,
-                cache_directory=cache_directory,
-                cache_key=cache_key,
-                error_log_file=error_log_file,
-            )
-            task_done(future_queue=future_queue)
+    try:
+        if init_function is not None:
+            interface.send_dict(
+                input_dict={"init": True, "fn": init_function, "args": (), "kwargs": {}}
+            )
+        while True:
+            task_dict = future_queue.get()
+            if "shutdown" in task_dict and task_dict["shutdown"]:
+                interface.shutdown(wait=task_dict["wait"])
+                task_done(future_queue=future_queue)
+                if queue_join_on_shutdown:
+                    future_queue.join()
+                break
+            elif "fn" in task_dict and "future" in task_dict:
+                try:
+                    execute_task_dict(
+                        task_dict=task_dict,
+                        interface=interface,
+                        cache_directory=cache_directory,
+                        cache_key=cache_key,
+                        error_log_file=error_log_file,
+                    )
+                finally:
+                    task_done(future_queue=future_queue)
+    except Exception:
+        # Best-effort cleanup if something unexpected happens
+        interface.shutdown(wait=True)
+        raise
```
🤖 Prompt for AI Agents
In executorlib/task_scheduler/interactive/blockallocation.py around lines 182 to
221, the function _execute_multiple_tasks can let exceptions escape the task
loop and leave the remote interface/process running; wrap the main execution
loop and any code that can raise in a try/except/finally so that exceptions are
caught, logged to error_log_file (or re-raised after logging) and the interface
is always cleaned up in the finally block (call interface.shutdown() and any
spawner/cleanup routines), ensuring any threads/queues are joined and resources
released even on error.