
Conversation

jan-janssen
Member

@jan-janssen jan-janssen commented Aug 21, 2025

Summary by CodeRabbit

  • Documentation
    • Updated HPC notebooks: aligned job example with the current Slurm job workflow, added a GPU availability demonstration, and improved formatting and captured example outputs for clarity.
  • Refactor
    • Renamed an HPC executor in the public API to align terminology and usage; this may require updating imports and references in your code.

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.



@coderabbitai
Contributor

coderabbitai bot commented Aug 21, 2025

Walkthrough

Replaced SlurmAllocationExecutor with SlurmJobExecutor in the HPC job notebook example and updated notebook formatting/outputs. Public API reflects the class rename from SlurmAllocationExecutor to SlurmJobExecutor in executorlib. Control flow and submit/result usage remain unchanged.
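For orientation, here is a minimal sketch of the renamed API as exercised by the notebook (assuming executorlib is installed and the code runs inside an active SLURM allocation; only the class name changed):

```python
# Minimal sketch, assuming an active SLURM allocation (e.g. started via
# `salloc` or inside an `sbatch` script); the class was previously named
# SlurmAllocationExecutor.
from executorlib import SlurmJobExecutor

with SlurmJobExecutor() as exe:
    future = exe.submit(sum, [1, 1])  # submit/result usage is unchanged by the rename
    print(future.result())            # prints 2
```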

Changes

Cohort / File(s) Summary
Notebook: Slurm job example
notebooks/3-hpc-job.ipynb
Replaced SlurmAllocationExecutor with SlurmJobExecutor inside the context manager; submit and result handling unchanged.
Notebook: HPC cluster demo & formatting
notebooks/2-hpc-cluster.ipynb
Reflowed cell metadata and sources, added execution outputs and a GPU sampling demo, reformatted code blocks and markdown for readability.
Public API rename (executorlib)
executorlib/*
Renamed exported class SlurmAllocationExecutor → SlurmJobExecutor in the public API.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant Notebook
  participant Executor as SlurmJobExecutor
  participant Scheduler as Slurm Scheduler
  participant Job as Compute Job

  User->>Notebook: run Slurm example
  Notebook->>Executor: with SlurmJobExecutor(...)
  Executor->>Scheduler: submit(command, resources)
  Scheduler->>Job: dispatch
  Job-->>Scheduler: finish (status, output)
  Scheduler-->>Executor: return job result
  Executor-->>Notebook: result
  Notebook-->>User: print result

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


Poem

In the queue I nibble logs and run,
SlurmJob hops in, the change is done.
Allocation name now neatly shed,
Jobs submit, then curl in bed.
I twitch—results arrive, hooray! 🐇



@codecov

codecov bot commented Aug 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.67%. Comparing base (b3cf4c8) to head (6946ab6).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #784   +/-   ##
=======================================
  Coverage   97.67%   97.67%           
=======================================
  Files          33       33           
  Lines        1463     1463           
=======================================
  Hits         1429     1429           
  Misses         34       34           


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
notebooks/3-hpc-job.ipynb (1)

45-49: Make the snippet copy-paste runnable (include import and tiny precondition note).

Readers often copy the fenced block directly; adding the import and a brief note avoids confusion.

Apply this minimal diff to the fenced example:

- with SlurmJobExecutor() as exe:
+ from executorlib import SlurmJobExecutor  # required when using this block standalone
+ # Note: run inside an existing SLURM allocation (e.g., via `salloc` or within an `sbatch` script)
+ with SlurmJobExecutor() as exe:
     future = exe.submit(sum, [1, 1])
     print(future.result())

📥 Commits

Reviewing files that changed from the base of the PR and between b3cf4c8 and dcbb5a5.

📒 Files selected for processing (1)
  • notebooks/3-hpc-job.ipynb (1 hunks)
🔇 Additional comments (1)
notebooks/3-hpc-job.ipynb (1)

46-46: Switch to SlurmJobExecutor is correct and consistent with the import.

The example now matches the import in the preceding code cell and reflects the current public API.

Comment on lines 45 to 49
"```python\n",
"with SlurmAllocationExecutor() as exe:\n",
"with SlurmJobExecutor() as exe:\n",
" future = exe.submit(sum, [1, 1])\n",
" print(future.result())\n",
"```"

💡 Verification agent

🧩 Analysis chain

Sanity-check for lingering references to the old class name across the repo.

Since this change is part of a rename/cleanup, verify no other notebooks/docs still mention SlurmAllocationExecutor.


🏁 Script executed:

#!/bin/bash
# Find any remaining references to the old class name.
rg -n -S "SlurmAllocationExecutor|SlurmAllocationExecutor\(" -C2

# Also check for mixed guidance in notebooks and docs mentioning both names.
rg -n -S "SlurmJobExecutor" -g '!**/site/**' -C2

Length of output: 56702


Lingering test class name referencing SlurmAllocationExecutor needs update

The only remaining reference to the old class name was found in your test suite. Please rename the test class to align with the new executor name:

• tests/test_singlenodeexecutor_plot_dependency.py:156

- class TestSlurmAllocationExecutorWithDependencies(unittest.TestCase):
+ class TestSlurmJobExecutorWithDependencies(unittest.TestCase):

No other occurrences of SlurmAllocationExecutor were detected across notebooks, docs, or code. Please make this change to keep test names consistent with the renamed executor.

🤖 Prompt for AI Agents
In notebooks/3-hpc-job.ipynb around lines 45 to 49 there is a lingering
reference to the old executor name; update the test class in
tests/test_singlenodeexecutor_plot_dependency.py at line 156 to use the new
executor name (replace SlurmAllocationExecutor with SlurmJobExecutor) so the
test class name matches the renamed executor across the codebase.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (5)
notebooks/2-hpc-cluster.ipynb (5)

95-97: Avoid site-specific partition in example.

The hard-coded partition s.cmfe will confuse users. Prefer a placeholder or a generic partition.

-            "partition": "s.cmfe",
+            "partition": "<your_partition>",

158-165: Unify cache directory across examples and modernize cleanup code.

Examples alternate between ./file and ./cache. Use one path consistently, and improve the cleanup cell with pathlib + contextlib.suppress.

@@
-with FluxClusterExecutor(cache_directory="./file") as exe:
+with FluxClusterExecutor(cache_directory="./cache") as exe:
@@
-with FluxClusterExecutor(cache_directory="./file") as exe:
+with FluxClusterExecutor(cache_directory="./cache") as exe:
@@
-import os
-import shutil
-
-cache_dir = "./file"
-if os.path.exists(cache_dir):
-    print(os.listdir(cache_dir))
-    try:
-        shutil.rmtree(cache_dir)
-    except OSError:
-        pass
+from pathlib import Path
+from contextlib import suppress
+import shutil
+
+cache_dir = Path("./cache")
+if cache_dir.exists():
+    print([p.name for p in cache_dir.iterdir()])
+    with suppress(OSError):
+        shutil.rmtree(cache_dir)

Also applies to: 212-215, 274-281


222-243: Make GPU discovery example independent of TensorFlow.

TensorFlow is heavy and often unavailable on head/login nodes. Suggest a lightweight, broadly portable snippet that first respects CUDA_VISIBLE_DEVICES and then falls back to nvidia-smi if present.

-    "```python\n",
-    "def get_available_gpus():\n",
-    "    import socket\n",
-    "    from tensorflow.python.client import device_lib\n",
-    "    local_device_protos = device_lib.list_local_devices()\n",
-    "    return [\n",
-    "        (x.name, x.physical_device_desc, socket.gethostname()) \n",
-    "        for x in local_device_protos if x.device_type == 'GPU'\n",
-    "    ]\n",
-    "```\n",
+    "```python\n",
+    "def get_available_gpus():\n",
+    "    import os, socket, shutil, subprocess\n",
+    "    host = socket.gethostname()\n",
+    "    devices = []\n",
+    "    # Respect CUDA_VISIBLE_DEVICES when set\n",
+    "    cvd = os.environ.get(\"CUDA_VISIBLE_DEVICES\")\n",
+    "    if cvd:\n",
+    "        ids = [i.strip() for i in cvd.split(\",\") if i.strip()]\n",
+    "        devices.extend([(f\"GPU:{i}\", \"via CUDA_VISIBLE_DEVICES\", host) for i in ids])\n",
+    "    # Fallback to nvidia-smi when available on workers\n",
+    "    if shutil.which(\"nvidia-smi\"):\n",
+    "        out = subprocess.check_output([\n",
+    "            \"nvidia-smi\", \"--query-gpu=index,name,uuid\", \"--format=csv,noheader\"\n",
+    "        ], text=True)\n",
+    "        for line in out.strip().splitlines():\n",
+    "            idx, name, uuid = [x.strip() for x in line.split(\",\")]\n",
+    "            devices.append((f\"GPU:{idx}\", f\"{name} {uuid}\", host))\n",
+    "    return devices\n",
+    "```\n",

285-301: Kernel metadata may not exist in user environments (“Flux”).

Unless your CI publishes a “Flux” kernelspec, users opening this notebook will fall back to a different kernel or fail. Consider using a generic Python kernel in the metadata or confirm the kernelspec is available in your docs environment.

-  "kernelspec": {
-   "display_name": "Flux",
-   "language": "python",
-   "name": "flux"
-  },
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },

If you keep “Flux”, please confirm it exists in your binder/CI image.


69-89: Slurm time and memory template: minor robustness note.

--time currently floors seconds to minutes (run_time_max // 60) with a 1-minute lower bound. That’s fine, but it silently rounds down. Consider rounding up to avoid under-allocating for short runs.

-#SBATCH --time={{ [1, run_time_max // 60]|max }}
+#SBATCH --time={{ [1, (run_time_max + 59) // 60]|max }}

This keeps the interface (seconds) but allocates enough minutes for near-threshold jobs.
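As a quick numeric check of the two expressions (plain Python, independent of the Jinja template; values are illustrative):

```python
# Compare floor vs. ceiling division when converting a run time in seconds
# to SBATCH minutes.
for run_time_max in (59, 60, 61, 90, 120):
    floor_minutes = max(1, run_time_max // 60)        # current template: rounds down
    ceil_minutes = max(1, (run_time_max + 59) // 60)  # proposed: rounds up
    print(run_time_max, floor_minutes, ceil_minutes)
# e.g. 90 s -> floor allocates 1 minute (too little), ceiling allocates 2
```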


📥 Commits

Reviewing files that changed from the base of the PR and between dcbb5a5 and 6946ab6.

📒 Files selected for processing (1)
  • notebooks/2-hpc-cluster.ipynb (2 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
notebooks/2-hpc-cluster.ipynb

7-7: Module level import not at top of file

(E402)


32-32: Module level import not at top of file

(E402)


33-33: Module level import not at top of file

(E402)


38-41: Use contextlib.suppress(OSError) instead of try-except-pass

Replace with contextlib.suppress(OSError)

(SIM105)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.13)
  • GitHub Check: unittest_mpich (macos-latest, 3.13)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.11)
  • GitHub Check: unittest_mpich (ubuntu-24.04-arm, 3.13)
  • GitHub Check: unittest_flux_mpich
  • GitHub Check: unittest_openmpi (macos-latest, 3.13)
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-mpich.yml)
  • GitHub Check: notebooks_integration
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.12)
  • GitHub Check: unittest_openmpi (ubuntu-24.04-arm, 3.13)
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-openmpi.yml)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.13)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.11)
  • GitHub Check: unittest_openmpi (ubuntu-22.04-arm, 3.13)
  • GitHub Check: notebooks
  • GitHub Check: unittest_win
  • GitHub Check: unittest_slurm_mpich
  • GitHub Check: unittest_flux_openmpi
🔇 Additional comments (3)
notebooks/2-hpc-cluster.ipynb (3)

129-139: Dependency-chaining example LGTM.

Clear and correct use of futures-as-arguments to establish dependencies in submission mode.
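For readers skimming the review, a minimal sketch of the futures-as-arguments pattern being praised here (hypothetical values; assumes executorlib with the pysqa and h5py extras and a reachable scheduler, with the executor class taken from the notebook's SLURM section):

```python
from executorlib import SlurmClusterExecutor

# Passing a Future as an argument tells the executor to schedule the second
# task only after the first finishes, substituting its result automatically.
with SlurmClusterExecutor(cache_directory="./cache") as exe:
    a = exe.submit(sum, [1, 1])   # first task
    b = exe.submit(sum, [a, 1])   # depends on `a`; the Future is replaced by 2
    print(b.result())             # prints 3
```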


187-194: MPI example LGTM, returns rank-wise results as documented.

The example correctly requests cores=2 and returns a list of (i, size, rank) tuples.

Also applies to: 212-215


148-156: Outputs cleared in notebooks/2-hpc-cluster.ipynb
All code cell outputs have been removed and execution counts reset to null across the notebook. No remaining outputs detected—ready to commit the cleaned notebook.

Comment on lines +9 to +12
"In contrast to the [Single Node Executor](https://executorlib.readthedocs.io/en/latest/1-single-node.html) and the [HPC Job Executor](https://executorlib.readthedocs.io/en/latest/3-hpc-job.html) the HPC Submission Executors do not communicate via the [zero message queue](https://zeromq.org) but instead store the python functions on the file system and uses the job scheduler to handle the dependencies of the Python functions. Consequently, the block allocation `block_allocation` and the init function `init_function` are not available in the HPC Cluster Executors. At the same time it is possible to close the Python process which created the `Executor`, wait until the execution of the submitted Python functions is completed and afterwards reload the results from the cache.\n",
"\n",
"Internally the HPC submission mode is using the [Python simple queuing system adatper (pysqa)](https://pysqa.readthedocs.io) to connect to HPC job schedulers and the [h5py](https://www.h5py.org) package for serializing the Python functions to store them on the file system. Both packages are optional dependency of executorlib. The installation of the [pysqa](https://pysqa.readthedocs.io) package and the [h5py](https://www.h5py.org) package are covered in the installation section. "
]

⚠️ Potential issue

Fix typos and tighten language in the intro; use correct “ZeroMQ” casing.

A few user-facing typos and minor grammar nits here. Proposed inline fixes below.

-    "# HPC Cluster Executor\n",
-    "In contrast to the [Single Node Executor](https://executorlib.readthedocs.io/en/latest/1-single-node.html) and the [HPC Job Executor](https://executorlib.readthedocs.io/en/latest/3-hpc-job.html) the HPC Submission Executors do not communicate via the [zero message queue](https://zeromq.org) but instead store the python functions on the file system and uses the job scheduler to handle the dependencies of the Python functions. Consequently, the block allocation `block_allocation` and the init function `init_function` are not available in the HPC Cluster Executors. At the same time it is possible to close the Python process which created the `Executor`, wait until the execution of the submitted Python functions is completed and afterwards reload the results from the cache.\n",
+    "# HPC Cluster Executor\n",
+    "In contrast to the [Single Node Executor](https://executorlib.readthedocs.io/en/latest/1-single-node.html) and the [HPC Job Executor](https://executorlib.readthedocs.io/en/latest/3-hpc-job.html), the HPC Submission Executors do not communicate via [ZeroMQ](https://zeromq.org) but instead store the Python functions on the file system and use the job scheduler to handle their dependencies. Consequently, the block allocation `block_allocation` and the init function `init_function` are not available in the HPC Cluster Executors. At the same time, it is possible to close the Python process that created the `Executor`, wait until the submitted Python functions are completed, and afterwards reload the results from the cache.\n",
@@
-    "Internally the HPC submission mode is using the [Python simple queuing system adatper (pysqa)](https://pysqa.readthedocs.io) to connect to HPC job schedulers and the [h5py](https://www.h5py.org) package for serializing the Python functions to store them on the file system. Both packages are optional dependency of executorlib. The installation of the [pysqa](https://pysqa.readthedocs.io) package and the [h5py](https://www.h5py.org) package are covered in the installation section. "
+    "Internally, the HPC submission mode uses the [Python Simple Queuing System adapter (pysqa)](https://pysqa.readthedocs.io) to connect to HPC job schedulers and the [h5py](https://www.h5py.org) package for serializing Python functions to store them on the file system. Both packages are optional dependencies of executorlib. The installation of [pysqa](https://pysqa.readthedocs.io) and [h5py](https://www.h5py.org) is covered in the installation section."
📝 Committable suggestion


Suggested change
"In contrast to the [Single Node Executor](https://executorlib.readthedocs.io/en/latest/1-single-node.html) and the [HPC Job Executor](https://executorlib.readthedocs.io/en/latest/3-hpc-job.html) the HPC Submission Executors do not communicate via the [zero message queue](https://zeromq.org) but instead store the python functions on the file system and uses the job scheduler to handle the dependencies of the Python functions. Consequently, the block allocation `block_allocation` and the init function `init_function` are not available in the HPC Cluster Executors. At the same time it is possible to close the Python process which created the `Executor`, wait until the execution of the submitted Python functions is completed and afterwards reload the results from the cache.\n",
"\n",
"Internally the HPC submission mode is using the [Python simple queuing system adatper (pysqa)](https://pysqa.readthedocs.io) to connect to HPC job schedulers and the [h5py](https://www.h5py.org) package for serializing the Python functions to store them on the file system. Both packages are optional dependency of executorlib. The installation of the [pysqa](https://pysqa.readthedocs.io) package and the [h5py](https://www.h5py.org) package are covered in the installation section. "
]
"# HPC Cluster Executor\n",
"In contrast to the [Single Node Executor](https://executorlib.readthedocs.io/en/latest/1-single-node.html) and the [HPC Job Executor](https://executorlib.readthedocs.io/en/latest/3-hpc-job.html), the HPC Submission Executors do not communicate via [ZeroMQ](https://zeromq.org) but instead store the Python functions on the file system and use the job scheduler to handle their dependencies. Consequently, the block allocation `block_allocation` and the init function `init_function` are not available in the HPC Cluster Executors. At the same time, it is possible to close the Python process that created the `Executor`, wait until the submitted Python functions are completed, and afterwards reload the results from the cache.\n",
"\n",
"Internally, the HPC submission mode uses the [Python Simple Queuing System adapter (pysqa)](https://pysqa.readthedocs.io) to connect to HPC job schedulers and the [h5py](https://www.h5py.org) package for serializing Python functions to store them on the file system. Both packages are optional dependencies of executorlib. The installation of [pysqa](https://pysqa.readthedocs.io) and [h5py](https://www.h5py.org) is covered in the installation section."
🤖 Prompt for AI Agents
In notebooks/2-hpc-cluster.ipynb around lines 9 to 12, correct typos and tighten
wording: capitalize ZeroMQ, fix "adatper" to "adapter", change "packages are
optional dependency" to "packages are optional dependencies of executorlib",
make subject/verb agreement consistent (e.g., "store the Python functions on the
file system and use the job scheduler"), change "which created" to "that
created" for the Python process sentence and tighten it to "close the Python
process that created the Executor, wait for submitted functions to complete,
then reload results from cache", and ensure consistent capitalization of
"Python" and variable names like `Executor`, `block_allocation`, and
`init_function`.

Comment on lines +19 to +23
"## SLURM\n",
"The [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com) job scheduler is currently the most commonly used job scheduler for HPC clusters. In the HPC submission mode executorlib internally uses the [sbatch](https://slurm.schedmd.com/sbatch.html) command this is in contrast to the [HPC allocatiom mode] which internally uses the [srun](https://slurm.schedmd.com/srun.html) command. \n",
"\n",
"The connection to the job scheduler is based on the [Python simple queuing system adatper (pysqa)](https://pysqa.readthedocs.io). It provides a default configuration for most commonly used job schedulers including SLURM, in addition it is also possible to provide the submission template as part of the resource dictionary `resource_dict` or via the path to the configuration directory with the `pysqa_config_directory` parameter. All three options are covered in more detail on the [pysqa documentation](https://pysqa.readthedocs.io)."
]

⚠️ Potential issue

SLURM section: typo fixes and wording around sbatch vs srun.

Minor typos and readability improvements; avoid the dangling link placeholder for “HPC allocation mode”.

-    "The [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com) job scheduler is currently the most commonly used job scheduler for HPC clusters. In the HPC submission mode executorlib internally uses the [sbatch](https://slurm.schedmd.com/sbatch.html) command this is in contrast to the [HPC allocatiom mode] which internally uses the [srun](https://slurm.schedmd.com/srun.html) command. \n",
+    "The [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com) job scheduler is currently the most commonly used job scheduler for HPC clusters. In HPC submission mode, executorlib uses the [sbatch](https://slurm.schedmd.com/sbatch.html) command; in contrast, the HPC allocation mode uses [srun](https://slurm.schedmd.com/srun.html).\n",
@@
-    "The connection to the job scheduler is based on the [Python simple queuing system adatper (pysqa)](https://pysqa.readthedocs.io). It provides a default configuration for most commonly used job schedulers including SLURM, in addition it is also possible to provide the submission template as part of the resource dictionary `resource_dict` or via the path to the configuration directory with the `pysqa_config_directory` parameter. All three options are covered in more detail on the [pysqa documentation](https://pysqa.readthedocs.io)."
+    "The connection to the job scheduler is based on the [Python Simple Queuing System adapter (pysqa)](https://pysqa.readthedocs.io). It provides a default configuration for most commonly used job schedulers, including SLURM. Alternatively, you can provide the submission template as part of the resource dictionary `resource_dict` or via a configuration directory path with the `pysqa_config_directory` parameter. All three options are covered in more detail in the [pysqa documentation](https://pysqa.readthedocs.io)."
📝 Committable suggestion


Suggested change
"## SLURM\n",
"The [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com) job scheduler is currently the most commonly used job scheduler for HPC clusters. In the HPC submission mode executorlib internally uses the [sbatch](https://slurm.schedmd.com/sbatch.html) command this is in contrast to the [HPC allocatiom mode] which internally uses the [srun](https://slurm.schedmd.com/srun.html) command. \n",
"\n",
"The connection to the job scheduler is based on the [Python simple queuing system adatper (pysqa)](https://pysqa.readthedocs.io). It provides a default configuration for most commonly used job schedulers including SLURM, in addition it is also possible to provide the submission template as part of the resource dictionary `resource_dict` or via the path to the configuration directory with the `pysqa_config_directory` parameter. All three options are covered in more detail on the [pysqa documentation](https://pysqa.readthedocs.io)."
]
[
"## SLURM\n",
"The [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com) job scheduler is currently the most commonly used job scheduler for HPC clusters. In HPC submission mode, executorlib uses the [sbatch](https://slurm.schedmd.com/sbatch.html) command; in contrast, the HPC allocation mode uses [srun](https://slurm.schedmd.com/srun.html).\n",
"\n",
"The connection to the job scheduler is based on the [Python Simple Queuing System adapter (pysqa)](https://pysqa.readthedocs.io). It provides a default configuration for most commonly used job schedulers, including SLURM. Alternatively, you can provide the submission template as part of the resource dictionary `resource_dict` or via a configuration directory path with the `pysqa_config_directory` parameter. All three options are covered in more detail in the [pysqa documentation](https://pysqa.readthedocs.io)."
]
🤖 Prompt for AI Agents
In notebooks/2-hpc-cluster.ipynb around lines 19 to 23, the SLURM section has
typos and unclear wording about sbatch vs srun and a dangling placeholder for
“HPC allocation mode”; update the text to fix spelling errors (e.g.,
"allocatiom" → "allocation", "adatper" → "adapter"), clarify that the submission
mode uses sbatch while the allocation mode uses srun, and remove or replace the
dangling link placeholder so both modes link properly (e.g., link “HPC
allocation mode” to the srun docs); also improve punctuation for readability and
ensure the pysqa sentence reads smoothly.

Comment on lines +60 to +61
"Specific parameters for `SlurmClusterExecutor` like the maximum run time `\"run_time_max\"`, the maximum memory `\"memory_max\"` or the submission template for the job submission script `\"submission_template\"` can be specified as part of the resource dictionary. Again it is possible to specify the resource dictonary `resource_dicionary` either for each function in the `submit()` function or during the initialization of the `SlurmClusterExecutor`."
]

⚠️ Potential issue

Correct “resource dictionary” typos and variable name.

Use consistent spelling and the public API name resource_dict.

-    "Specific parameters for `SlurmClusterExecutor` like the maximum run time `\"run_time_max\"`, the maximum memory `\"memory_max\"` or the submission template for the job submission script `\"submission_template\"` can be specified as part of the resource dictionary. Again it is possible to specify the resource dictonary `resource_dicionary` either for each function in the `submit()` function or during the initialization of the `SlurmClusterExecutor`."
+    "Specific parameters for `SlurmClusterExecutor`, such as the maximum run time `\"run_time_max\"`, maximum memory `\"memory_max\"`, or the submission template for the job submission script `\"submission_template\"`, can be specified as part of the resource dictionary. It is possible to specify the resource dictionary `resource_dict` either per call in `submit()` or during the initialization of the `SlurmClusterExecutor`."
📝 Committable suggestion


Suggested change
"Specific parameters for `SlurmClusterExecutor` like the maximum run time `\"run_time_max\"`, the maximum memory `\"memory_max\"` or the submission template for the job submission script `\"submission_template\"` can be specified as part of the resource dictionary. Again it is possible to specify the resource dictonary `resource_dicionary` either for each function in the `submit()` function or during the initialization of the `SlurmClusterExecutor`."
]
"Specific parameters for `SlurmClusterExecutor`, such as the maximum run time `\"run_time_max\"`, maximum memory `\"memory_max\"`, or the submission template for the job submission script `\"submission_template\"`, can be specified as part of the resource dictionary. It is possible to specify the resource dictionary `resource_dict` either per call in `submit()` or during the initialization of the `SlurmClusterExecutor`."
🤖 Prompt for AI Agents
In notebooks/2-hpc-cluster.ipynb around lines 60 to 61, fix the typos and API
name: replace the misspelled phrase "resource dictonary" and
"resource_dicionary" with the correct spelling "resource dictionary" and the
public API variable name "resource_dict" respectively so the text consistently
uses "resource dictionary" and "resource_dict" in both the descriptive sentence
and the code/variable references.
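To make the two submission paths concrete, a hedged sketch of where `resource_dict` can be supplied (keys taken from the notebook text; exact keys and units depend on the executorlib version, so treat this as an assumption rather than the documented signature):

```python
from executorlib import SlurmClusterExecutor

# Option 1: defaults for every submission, set at initialization.
with SlurmClusterExecutor(
    cache_directory="./cache",
    resource_dict={"run_time_max": 120},  # assumed to be seconds, per the template
) as exe:
    # Option 2: per-call override in submit().
    future = exe.submit(sum, [1, 1], resource_dict={"run_time_max": 60})
    print(future.result())
```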

Comment on lines +250 to +252
"### Cleaning Cache\n",
"Finally, as the HPC Cluster Executors leverage the file system to communicate serialized Python functions, it is important to clean up the cache directory specified by the `cache_directory` parameter once the results of the submitted Python functions are no longer needed. The serialized Python functions are stored in binary format using the [cloudpickle](https://github.com/cloudpipe/cloudpickle) library for serialization. This format is design for caching but not for long-term storage. The user is responsible for the long-term storage of their data."
]

⚠️ Potential issue

Typos: “a lot flexibility”, “design for caching”.

Small phrasing cleanups for the Cache section.

-    "Finally, as the HPC Cluster Executors leverage the file system to communicate serialized Python functions, it is important to clean up the cache directory specified by the `cache_directory` parameter once the results of the submitted Python functions are no longer needed. The serialized Python functions are stored in binary format using the [cloudpickle](https://github.com/cloudpipe/cloudpickle) library for serialization. This format is design for caching but not for long-term storage. The user is responsible for the long-term storage of their data."
+    "Finally, as the HPC Cluster Executors leverage the file system to communicate serialized Python functions, it is important to clean up the cache directory specified by the `cache_directory` parameter once the results are no longer needed. The serialized Python functions are stored in binary format using the [cloudpickle](https://github.com/cloudpipe/cloudpickle) library. This format is designed for caching, not long‑term storage. The user is responsible for the long‑term storage of their data."
📝 Committable suggestion


Suggested change
"### Cleaning Cache\n",
"Finally, as the HPC Cluster Executors leverage the file system to communicate serialized Python functions, it is important to clean up the cache directory specified by the `cache_directory` parameter once the results of the submitted Python functions are no longer needed. The serialized Python functions are stored in binary format using the [cloudpickle](https://github.com/cloudpipe/cloudpickle) library for serialization. This format is design for caching but not for long-term storage. The user is responsible for the long-term storage of their data."
]
"### Cleaning Cache\n",
"Finally, as the HPC Cluster Executors leverage the file system to communicate serialized Python functions, it is important to clean up the cache directory specified by the `cache_directory` parameter once the results are no longer needed. The serialized Python functions are stored in binary format using the [cloudpickle](https://github.com/cloudpipe/cloudpickle) library. This format is designed for caching, not long-term storage. The user is responsible for the long-term storage of their data."
]

@jan-janssen jan-janssen merged commit 290dd15 into main Aug 21, 2025
137 of 147 checks passed
@jan-janssen jan-janssen deleted the slurm_example_fix branch August 21, 2025 09:58
