
Conversation

jan-janssen
Member

@jan-janssen jan-janssen commented Aug 21, 2025

Summary by CodeRabbit

  • Documentation
    • Updated HPC notebooks: aligned job example with the current Slurm job workflow, added a GPU availability demonstration, and improved formatting and captured example outputs for clarity.
  • Refactor
    • Renamed an HPC executor in the public API to align terminology and usage; this may require updating imports and references in your code.

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.



@coderabbitai
Contributor

coderabbitai bot commented Aug 21, 2025

Walkthrough

Replaced SlurmAllocationExecutor with SlurmJobExecutor in the HPC job notebook example and updated notebook formatting/outputs. Public API reflects the class rename from SlurmAllocationExecutor to SlurmJobExecutor in executorlib. Control flow and submit/result usage remain unchanged.
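For orientation, here is a minimal sketch of the renamed API as exercised by the notebook (assuming executorlib is installed and the code runs inside an active SLURM allocation; only the class name changed):

```python
# Minimal sketch, assuming an active SLURM allocation (e.g. started via
# `salloc` or inside an `sbatch` script); the class was previously named
# SlurmAllocationExecutor.
from executorlib import SlurmJobExecutor

with SlurmJobExecutor() as exe:
    future = exe.submit(sum, [1, 1])  # submit/result usage is unchanged by the rename
    print(future.result())            # prints 2
```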

Changes

Cohort / File(s) Summary
Notebook: Slurm job example
notebooks/3-hpc-job.ipynb
Replaced SlurmAllocationExecutor with SlurmJobExecutor inside the context manager; submit and result handling unchanged.
Notebook: HPC cluster demo & formatting
notebooks/2-hpc-cluster.ipynb
Reflowed cell metadata and sources, added execution outputs and a GPU sampling demo, reformatted code blocks and markdown for readability.
Public API rename (executorlib)
executorlib/*
Renamed exported class SlurmAllocationExecutor → SlurmJobExecutor in the public API.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant Notebook
  participant Executor as SlurmJobExecutor
  participant Scheduler as Slurm Scheduler
  participant Job as Compute Job

  User->>Notebook: run Slurm example
  Notebook->>Executor: with SlurmJobExecutor(...)
  Executor->>Scheduler: submit(command, resources)
  Scheduler->>Job: dispatch
  Job-->>Scheduler: finish (status, output)
  Scheduler-->>Executor: return job result
  Executor-->>Notebook: result
  Notebook-->>User: print result

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


Poem

In the queue I nibble logs and run,
SlurmJob hops in, the change is done.
Allocation name now neatly shed,
Jobs submit, then curl in bed.
I twitch—results arrive, hooray! 🐇



@codecov

codecov bot commented Aug 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.67%. Comparing base (b3cf4c8) to head (6946ab6).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #784   +/-   ##
=======================================
  Coverage   97.67%   97.67%           
=======================================
  Files          33       33           
  Lines        1463     1463           
=======================================
  Hits         1429     1429           
  Misses         34       34           


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
notebooks/3-hpc-job.ipynb (1)

45-49: Make the snippet copy-paste runnable (include import and tiny precondition note).

Readers often copy the fenced block directly; adding the import and a brief note avoids confusion.

Apply this minimal diff to the fenced example:

- with SlurmJobExecutor() as exe:
+ from executorlib import SlurmJobExecutor  # required when using this block standalone
+ # Note: run inside an existing SLURM allocation (e.g., via `salloc` or within an `sbatch` script)
+ with SlurmJobExecutor() as exe:
     future = exe.submit(sum, [1, 1])
     print(future.result())

📥 Commits

Reviewing files that changed from the base of the PR and between b3cf4c8 and dcbb5a5.

📒 Files selected for processing (1)
  • notebooks/3-hpc-job.ipynb (1 hunks)
🔇 Additional comments (1)
notebooks/3-hpc-job.ipynb (1)

46-46: Switch to SlurmJobExecutor is correct and consistent with the import.

The example now matches the import in the preceding code cell and reflects the current public API.

Comment on lines 45 to 49
"```python\n",
"with SlurmAllocationExecutor() as exe:\n",
"with SlurmJobExecutor() as exe:\n",
" future = exe.submit(sum, [1, 1])\n",
" print(future.result())\n",
"```"

💡 Verification agent

🧩 Analysis chain

Sanity-check for lingering references to the old class name across the repo.

Since this change is part of a rename/cleanup, verify no other notebooks/docs still mention SlurmAllocationExecutor.


🏁 Script executed:

#!/bin/bash
# Find any remaining references to the old class name.
rg -n -S "SlurmAllocationExecutor|SlurmAllocationExecutor\(" -C2

# Also check for mixed guidance in notebooks and docs mentioning both names.
rg -n -S "SlurmJobExecutor" -g '!**/site/**' -C2

Length of output: 56702


Lingering test class name referencing SlurmAllocationExecutor needs update

The only remaining reference to the old class name was found in your test suite. Please rename the test class to align with the new executor name:

• tests/test_singlenodeexecutor_plot_dependency.py:156

- class TestSlurmAllocationExecutorWithDependencies(unittest.TestCase):
+ class TestSlurmJobExecutorWithDependencies(unittest.TestCase):

No other occurrences of SlurmAllocationExecutor were detected across notebooks, docs, or code. Please make this change to keep test names consistent with the renamed executor.

🤖 Prompt for AI Agents
In notebooks/3-hpc-job.ipynb around lines 45 to 49 there is a lingering
reference to the old executor name; update the test class in
tests/test_singlenodeexecutor_plot_dependency.py at line 156 to use the new
executor name (replace SlurmAllocationExecutor with SlurmJobExecutor) so the
test class name matches the renamed executor across the codebase.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (5)
notebooks/2-hpc-cluster.ipynb (5)

95-97: Avoid site-specific partition in example.

The hard-coded partition s.cmfe will confuse users. Prefer a placeholder or a generic partition.

-            "partition": "s.cmfe",
+            "partition": "<your_partition>",

158-165: Unify cache directory across examples and modernize cleanup code.

Examples alternate between ./file and ./cache. Use one path consistently, and improve the cleanup cell with pathlib + contextlib.suppress.

@@
-with FluxClusterExecutor(cache_directory="./file") as exe:
+with FluxClusterExecutor(cache_directory="./cache") as exe:
@@
-with FluxClusterExecutor(cache_directory="./file") as exe:
+with FluxClusterExecutor(cache_directory="./cache") as exe:
@@
-import os
-import shutil
-
-cache_dir = "./file"
-if os.path.exists(cache_dir):
-    print(os.listdir(cache_dir))
-    try:
-        shutil.rmtree(cache_dir)
-    except OSError:
-        pass
+from pathlib import Path
+from contextlib import suppress
+import shutil
+
+cache_dir = Path("./cache")
+if cache_dir.exists():
+    print([p.name for p in cache_dir.iterdir()])
+    with suppress(OSError):
+        shutil.rmtree(cache_dir)

Also applies to: 212-215, 274-281


222-243: Make GPU discovery example independent of TensorFlow.

TensorFlow is heavy and often unavailable on head/login nodes. Suggest a lightweight, broadly portable snippet that first respects CUDA_VISIBLE_DEVICES and then falls back to nvidia-smi if present.

-    "```python\n",
-    "def get_available_gpus():\n",
-    "    import socket\n",
-    "    from tensorflow.python.client import device_lib\n",
-    "    local_device_protos = device_lib.list_local_devices()\n",
-    "    return [\n",
-    "        (x.name, x.physical_device_desc, socket.gethostname()) \n",
-    "        for x in local_device_protos if x.device_type == 'GPU'\n",
-    "    ]\n",
-    "```\n",
+    "```python\n",
+    "def get_available_gpus():\n",
+    "    import os, socket, shutil, subprocess\n",
+    "    host = socket.gethostname()\n",
+    "    devices = []\n",
+    "    # Respect CUDA_VISIBLE_DEVICES when set\n",
+    "    cvd = os.environ.get(\"CUDA_VISIBLE_DEVICES\")\n",
+    "    if cvd:\n",
+    "        ids = [i.strip() for i in cvd.split(\",\") if i.strip()]\n",
+    "        devices.extend([(f\"GPU:{i}\", \"via CUDA_VISIBLE_DEVICES\", host) for i in ids])\n",
+    "    # Fallback to nvidia-smi when available on workers\n",
+    "    if shutil.which(\"nvidia-smi\"):\n",
+    "        out = subprocess.check_output([\n",
+    "            \"nvidia-smi\", \"--query-gpu=index,name,uuid\", \"--format=csv,noheader\"\n",
+    "        ], text=True)\n",
+    "        for line in out.strip().splitlines():\n",
+    "            idx, name, uuid = [x.strip() for x in line.split(\",\")]\n",
+    "            devices.append((f\"GPU:{idx}\", f\"{name} {uuid}\", host))\n",
+    "    return devices\n",
+    "```\n",

285-301: Kernel metadata may not exist in user environments (“Flux”).

Unless your CI publishes a “Flux” kernelspec, users opening this notebook will fall back to a different kernel or fail. Consider using a generic Python kernel in the metadata or confirm the kernelspec is available in your docs environment.

-  "kernelspec": {
-   "display_name": "Flux",
-   "language": "python",
-   "name": "flux"
-  },
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },

If you keep “Flux”, please confirm it exists in your binder/CI image.


69-89: Slurm time and memory template: minor robustness note.

--time currently floors seconds to minutes (run_time_max // 60) with a 1-minute lower bound. That’s fine, but it silently rounds down. Consider rounding up to avoid under-allocating for short runs.

-#SBATCH --time={{ [1, run_time_max // 60]|max }}
+#SBATCH --time={{ [1, (run_time_max + 59) // 60]|max }}

This keeps the interface (seconds) but allocates enough minutes for near-threshold jobs.
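As a quick numeric check of the two expressions (plain Python, independent of the Jinja template; values are illustrative):

```python
# Compare floor vs. ceiling division when converting a run time in seconds
# to SBATCH minutes.
for run_time_max in (59, 60, 61, 90, 120):
    floor_minutes = max(1, run_time_max // 60)        # current template: rounds down
    ceil_minutes = max(1, (run_time_max + 59) // 60)  # proposed: rounds up
    print(run_time_max, floor_minutes, ceil_minutes)
# e.g. 90 s -> floor allocates 1 minute (too little), ceiling allocates 2
```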


📥 Commits

Reviewing files that changed from the base of the PR and between dcbb5a5 and 6946ab6.

📒 Files selected for processing (1)
  • notebooks/2-hpc-cluster.ipynb (2 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
notebooks/2-hpc-cluster.ipynb

7-7: Module level import not at top of file

(E402)


32-32: Module level import not at top of file

(E402)


33-33: Module level import not at top of file

(E402)


38-41: Use contextlib.suppress(OSError) instead of try-except-pass

Replace with contextlib.suppress(OSError)

(SIM105)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.13)
  • GitHub Check: unittest_mpich (macos-latest, 3.13)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.11)
  • GitHub Check: unittest_mpich (ubuntu-24.04-arm, 3.13)
  • GitHub Check: unittest_flux_mpich
  • GitHub Check: unittest_openmpi (macos-latest, 3.13)
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-mpich.yml)
  • GitHub Check: notebooks_integration
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.12)
  • GitHub Check: unittest_openmpi (ubuntu-24.04-arm, 3.13)
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-openmpi.yml)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.13)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.11)
  • GitHub Check: unittest_openmpi (ubuntu-22.04-arm, 3.13)
  • GitHub Check: notebooks
  • GitHub Check: unittest_win
  • GitHub Check: unittest_slurm_mpich
  • GitHub Check: unittest_flux_openmpi
🔇 Additional comments (3)
notebooks/2-hpc-cluster.ipynb (3)

129-139: Dependency-chaining example LGTM.

Clear and correct use of futures-as-arguments to establish dependencies in submission mode.
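For readers skimming the review, a minimal sketch of the futures-as-arguments pattern being praised here (hypothetical values; assumes executorlib with the pysqa and h5py extras and a reachable scheduler, with the executor class taken from the notebook's SLURM section):

```python
from executorlib import SlurmClusterExecutor

# Passing a Future as an argument tells the executor to schedule the second
# task only after the first finishes, substituting its result automatically.
with SlurmClusterExecutor(cache_directory="./cache") as exe:
    a = exe.submit(sum, [1, 1])   # first task
    b = exe.submit(sum, [a, 1])   # depends on `a`; the Future is replaced by 2
    print(b.result())             # prints 3
```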


187-194: MPI example LGTM, returns rank-wise results as documented.

The example correctly requests cores=2 and returns a list of (i, size, rank) tuples.

Also applies to: 212-215


148-156: Outputs cleared in notebooks/2-hpc-cluster.ipynb
All code cell outputs have been removed and execution counts reset to null across the notebook. No remaining outputs detected—ready to commit the cleaned notebook.

Comment on lines +9 to +12
"In contrast to the [Single Node Executor](https://executorlib.readthedocs.io/en/latest/1-single-node.html) and the [HPC Job Executor](https://executorlib.readthedocs.io/en/latest/3-hpc-job.html) the HPC Submission Executors do not communicate via the [zero message queue](https://zeromq.org) but instead store the python functions on the file system and uses the job scheduler to handle the dependencies of the Python functions. Consequently, the block allocation `block_allocation` and the init function `init_function` are not available in the HPC Cluster Executors. At the same time it is possible to close the Python process which created the `Executor`, wait until the execution of the submitted Python functions is completed and afterwards reload the results from the cache.\n",
"\n",
"Internally the HPC submission mode is using the [Python simple queuing system adatper (pysqa)](https://pysqa.readthedocs.io) to connect to HPC job schedulers and the [h5py](https://www.h5py.org) package for serializing the Python functions to store them on the file system. Both packages are optional dependency of executorlib. The installation of the [pysqa](https://pysqa.readthedocs.io) package and the [h5py](https://www.h5py.org) package are covered in the installation section. "
]

⚠️ Potential issue

Fix typos and tighten language in the intro; use correct “ZeroMQ” casing.

A few user-facing typos and minor grammar nits here. Proposed inline fixes below.

-    "# HPC Cluster Executor\n",
-    "In contrast to the [Single Node Executor](https://executorlib.readthedocs.io/en/latest/1-single-node.html) and the [HPC Job Executor](https://executorlib.readthedocs.io/en/latest/3-hpc-job.html) the HPC Submission Executors do not communicate via the [zero message queue](https://zeromq.org) but instead store the python functions on the file system and uses the job scheduler to handle the dependencies of the Python functions. Consequently, the block allocation `block_allocation` and the init function `init_function` are not available in the HPC Cluster Executors. At the same time it is possible to close the Python process which created the `Executor`, wait until the execution of the submitted Python functions is completed and afterwards reload the results from the cache.\n",
+    "# HPC Cluster Executor\n",
+    "In contrast to the [Single Node Executor](https://executorlib.readthedocs.io/en/latest/1-single-node.html) and the [HPC Job Executor](https://executorlib.readthedocs.io/en/latest/3-hpc-job.html), the HPC Submission Executors do not communicate via [ZeroMQ](https://zeromq.org) but instead store the Python functions on the file system and use the job scheduler to handle their dependencies. Consequently, the block allocation `block_allocation` and the init function `init_function` are not available in the HPC Cluster Executors. At the same time, it is possible to close the Python process that created the `Executor`, wait until the submitted Python functions are completed, and afterwards reload the results from the cache.\n",
@@
-    "Internally the HPC submission mode is using the [Python simple queuing system adatper (pysqa)](https://pysqa.readthedocs.io) to connect to HPC job schedulers and the [h5py](https://www.h5py.org) package for serializing the Python functions to store them on the file system. Both packages are optional dependency of executorlib. The installation of the [pysqa](https://pysqa.readthedocs.io) package and the [h5py](https://www.h5py.org) package are covered in the installation section. "
+    "Internally, the HPC submission mode uses the [Python Simple Queuing System adapter (pysqa)](https://pysqa.readthedocs.io) to connect to HPC job schedulers and the [h5py](https://www.h5py.org) package for serializing Python functions to store them on the file system. Both packages are optional dependencies of executorlib. The installation of [pysqa](https://pysqa.readthedocs.io) and [h5py](https://www.h5py.org) is covered in the installation section."
📝 Committable suggestion


Suggested change
"In contrast to the [Single Node Executor](https://executorlib.readthedocs.io/en/latest/1-single-node.html) and the [HPC Job Executor](https://executorlib.readthedocs.io/en/latest/3-hpc-job.html) the HPC Submission Executors do not communicate via the [zero message queue](https://zeromq.org) but instead store the python functions on the file system and uses the job scheduler to handle the dependencies of the Python functions. Consequently, the block allocation `block_allocation` and the init function `init_function` are not available in the HPC Cluster Executors. At the same time it is possible to close the Python process which created the `Executor`, wait until the execution of the submitted Python functions is completed and afterwards reload the results from the cache.\n",
"\n",
"Internally the HPC submission mode is using the [Python simple queuing system adatper (pysqa)](https://pysqa.readthedocs.io) to connect to HPC job schedulers and the [h5py](https://www.h5py.org) package for serializing the Python functions to store them on the file system. Both packages are optional dependency of executorlib. The installation of the [pysqa](https://pysqa.readthedocs.io) package and the [h5py](https://www.h5py.org) package are covered in the installation section. "
]
"# HPC Cluster Executor\n",
"In contrast to the [Single Node Executor](https://executorlib.readthedocs.io/en/latest/1-single-node.html) and the [HPC Job Executor](https://executorlib.readthedocs.io/en/latest/3-hpc-job.html), the HPC Submission Executors do not communicate via [ZeroMQ](https://zeromq.org) but instead store the Python functions on the file system and use the job scheduler to handle their dependencies. Consequently, the block allocation `block_allocation` and the init function `init_function` are not available in the HPC Cluster Executors. At the same time, it is possible to close the Python process that created the `Executor`, wait until the submitted Python functions are completed, and afterwards reload the results from the cache.\n",
"\n",
"Internally, the HPC submission mode uses the [Python Simple Queuing System adapter (pysqa)](https://pysqa.readthedocs.io) to connect to HPC job schedulers and the [h5py](https://www.h5py.org) package for serializing Python functions to store them on the file system. Both packages are optional dependencies of executorlib. The installation of [pysqa](https://pysqa.readthedocs.io) and [h5py](https://www.h5py.org) is covered in the installation section."
🤖 Prompt for AI Agents
In notebooks/2-hpc-cluster.ipynb around lines 9 to 12, correct typos and tighten
wording: capitalize ZeroMQ, fix "adatper" to "adapter", change "packages are
optional dependency" to "packages are optional dependencies of executorlib",
make subject/verb agreement consistent (e.g., "store the Python functions on the
file system and use the job scheduler"), change "which created" to "that
created" for the Python process sentence and tighten it to "close the Python
process that created the Executor, wait for submitted functions to complete,
then reload results from cache", and ensure consistent capitalization of
"Python" and variable names like `Executor`, `block_allocation`, and
`init_function`.

Comment on lines +19 to +23
"## SLURM\n",
"The [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com) job scheduler is currently the most commonly used job scheduler for HPC clusters. In the HPC submission mode executorlib internally uses the [sbatch](https://slurm.schedmd.com/sbatch.html) command this is in contrast to the [HPC allocatiom mode] which internally uses the [srun](https://slurm.schedmd.com/srun.html) command. \n",
"\n",
"The connection to the job scheduler is based on the [Python simple queuing system adatper (pysqa)](https://pysqa.readthedocs.io). It provides a default configuration for most commonly used job schedulers including SLURM, in addition it is also possible to provide the submission template as part of the resource dictionary `resource_dict` or via the path to the configuration directory with the `pysqa_config_directory` parameter. All three options are covered in more detail on the [pysqa documentation](https://pysqa.readthedocs.io)."
]

⚠️ Potential issue

SLURM section: typo fixes and wording around sbatch vs srun.

Minor typos and readability improvements; avoid the dangling link placeholder for “HPC allocation mode”.

-    "The [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com) job scheduler is currently the most commonly used job scheduler for HPC clusters. In the HPC submission mode executorlib internally uses the [sbatch](https://slurm.schedmd.com/sbatch.html) command this is in contrast to the [HPC allocatiom mode] which internally uses the [srun](https://slurm.schedmd.com/srun.html) command. \n",
+    "The [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com) job scheduler is currently the most commonly used job scheduler for HPC clusters. In HPC submission mode, executorlib uses the [sbatch](https://slurm.schedmd.com/sbatch.html) command; in contrast, the HPC allocation mode uses [srun](https://slurm.schedmd.com/srun.html).\n",
@@
-    "The connection to the job scheduler is based on the [Python simple queuing system adatper (pysqa)](https://pysqa.readthedocs.io). It provides a default configuration for most commonly used job schedulers including SLURM, in addition it is also possible to provide the submission template as part of the resource dictionary `resource_dict` or via the path to the configuration directory with the `pysqa_config_directory` parameter. All three options are covered in more detail on the [pysqa documentation](https://pysqa.readthedocs.io)."
+    "The connection to the job scheduler is based on the [Python Simple Queuing System adapter (pysqa)](https://pysqa.readthedocs.io). It provides a default configuration for most commonly used job schedulers, including SLURM. Alternatively, you can provide the submission template as part of the resource dictionary `resource_dict` or via a configuration directory path with the `pysqa_config_directory` parameter. All three options are covered in more detail in the [pysqa documentation](https://pysqa.readthedocs.io)."
📝 Committable suggestion


Suggested change
"## SLURM\n",
"The [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com) job scheduler is currently the most commonly used job scheduler for HPC clusters. In the HPC submission mode executorlib internally uses the [sbatch](https://slurm.schedmd.com/sbatch.html) command this is in contrast to the [HPC allocatiom mode] which internally uses the [srun](https://slurm.schedmd.com/srun.html) command. \n",
"\n",
"The connection to the job scheduler is based on the [Python simple queuing system adatper (pysqa)](https://pysqa.readthedocs.io). It provides a default configuration for most commonly used job schedulers including SLURM, in addition it is also possible to provide the submission template as part of the resource dictionary `resource_dict` or via the path to the configuration directory with the `pysqa_config_directory` parameter. All three options are covered in more detail on the [pysqa documentation](https://pysqa.readthedocs.io)."
]
[
"## SLURM\n",
"The [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com) job scheduler is currently the most commonly used job scheduler for HPC clusters. In HPC submission mode, executorlib uses the [sbatch](https://slurm.schedmd.com/sbatch.html) command; in contrast, the HPC allocation mode uses [srun](https://slurm.schedmd.com/srun.html).\n",
"\n",
"The connection to the job scheduler is based on the [Python Simple Queuing System adapter (pysqa)](https://pysqa.readthedocs.io). It provides a default configuration for most commonly used job schedulers, including SLURM. Alternatively, you can provide the submission template as part of the resource dictionary `resource_dict` or via a configuration directory path with the `pysqa_config_directory` parameter. All three options are covered in more detail in the [pysqa documentation](https://pysqa.readthedocs.io)."
]
🤖 Prompt for AI Agents
In notebooks/2-hpc-cluster.ipynb around lines 19 to 23, the SLURM section has
typos and unclear wording about sbatch vs srun and a dangling placeholder for
“HPC allocation mode”; update the text to fix spelling errors (e.g.,
"allocatiom" → "allocation", "adatper" → "adapter"), clarify that the submission
mode uses sbatch while the allocation mode uses srun, and remove or replace the
dangling link placeholder so both modes link properly (e.g., link “HPC
allocation mode” to the srun docs); also improve punctuation for readability and
ensure the pysqa sentence reads smoothly.

Comment on lines +60 to +61
"Specific parameters for `SlurmClusterExecutor` like the maximum run time `\"run_time_max\"`, the maximum memory `\"memory_max\"` or the submission template for the job submission script `\"submission_template\"` can be specified as part of the resource dictionary. Again it is possible to specify the resource dictonary `resource_dicionary` either for each function in the `submit()` function or during the initialization of the `SlurmClusterExecutor`."
]

⚠️ Potential issue

Correct “resource dictionary” typos and variable name.

Use consistent spelling and the public API name resource_dict.

-    "Specific parameters for `SlurmClusterExecutor` like the maximum run time `\"run_time_max\"`, the maximum memory `\"memory_max\"` or the submission template for the job submission script `\"submission_template\"` can be specified as part of the resource dictionary. Again it is possible to specify the resource dictonary `resource_dicionary` either for each function in the `submit()` function or during the initialization of the `SlurmClusterExecutor`."
+    "Specific parameters for `SlurmClusterExecutor`, such as the maximum run time `\"run_time_max\"`, maximum memory `\"memory_max\"`, or the submission template for the job submission script `\"submission_template\"`, can be specified as part of the resource dictionary. It is possible to specify the resource dictionary `resource_dict` either per call in `submit()` or during the initialization of the `SlurmClusterExecutor`."
📝 Committable suggestion


Suggested change
"Specific parameters for `SlurmClusterExecutor` like the maximum run time `\"run_time_max\"`, the maximum memory `\"memory_max\"` or the submission template for the job submission script `\"submission_template\"` can be specified as part of the resource dictionary. Again it is possible to specify the resource dictonary `resource_dicionary` either for each function in the `submit()` function or during the initialization of the `SlurmClusterExecutor`."
]
"Specific parameters for `SlurmClusterExecutor`, such as the maximum run time `\"run_time_max\"`, maximum memory `\"memory_max\"`, or the submission template for the job submission script `\"submission_template\"`, can be specified as part of the resource dictionary. It is possible to specify the resource dictionary `resource_dict` either per call in `submit()` or during the initialization of the `SlurmClusterExecutor`."
🤖 Prompt for AI Agents
In notebooks/2-hpc-cluster.ipynb around lines 60 to 61, fix the typos and API
name: replace the misspelled phrase "resource dictonary" and
"resource_dicionary" with the correct spelling "resource dictionary" and the
public API variable name "resource_dict" respectively so the text consistently
uses "resource dictionary" and "resource_dict" in both the descriptive sentence
and the code/variable references.
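To make the two submission paths concrete, a hedged sketch of where `resource_dict` can be supplied (keys taken from the notebook text; exact keys and units depend on the executorlib version, so treat this as an assumption rather than the documented signature):

```python
from executorlib import SlurmClusterExecutor

# Option 1: defaults for every submission, set at initialization.
with SlurmClusterExecutor(
    cache_directory="./cache",
    resource_dict={"run_time_max": 120},  # assumed to be seconds, per the template
) as exe:
    # Option 2: per-call override in submit().
    future = exe.submit(sum, [1, 1], resource_dict={"run_time_max": 60})
    print(future.result())
```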

Comment on lines +250 to +252
"### Cleaning Cache\n",
"Finally, as the HPC Cluster Executors leverage the file system to communicate serialized Python functions, it is important to clean up the cache directory specified by the `cache_directory` parameter once the results of the submitted Python functions are no longer needed. The serialized Python functions are stored in binary format using the [cloudpickle](https://github.com/cloudpipe/cloudpickle) library for serialization. This format is design for caching but not for long-term storage. The user is responsible for the long-term storage of their data."
]

⚠️ Potential issue

Typos: “a lot flexibility”, “design for caching”.

Small phrasing cleanups for the Cache section.

-    "Finally, as the HPC Cluster Executors leverage the file system to communicate serialized Python functions, it is important to clean up the cache directory specified by the `cache_directory` parameter once the results of the submitted Python functions are no longer needed. The serialized Python functions are stored in binary format using the [cloudpickle](https://github.com/cloudpipe/cloudpickle) library for serialization. This format is design for caching but not for long-term storage. The user is responsible for the long-term storage of their data."
+    "Finally, as the HPC Cluster Executors leverage the file system to communicate serialized Python functions, it is important to clean up the cache directory specified by the `cache_directory` parameter once the results are no longer needed. The serialized Python functions are stored in binary format using the [cloudpickle](https://github.com/cloudpipe/cloudpickle) library. This format is designed for caching, not long‑term storage. The user is responsible for the long‑term storage of their data."
📝 Committable suggestion


Suggested change
"### Cleaning Cache\n",
"Finally, as the HPC Cluster Executors leverage the file system to communicate serialized Python functions, it is important to clean up the cache directory specified by the `cache_directory` parameter once the results of the submitted Python functions are no longer needed. The serialized Python functions are stored in binary format using the [cloudpickle](https://github.com/cloudpipe/cloudpickle) library for serialization. This format is design for caching but not for long-term storage. The user is responsible for the long-term storage of their data."
]
"### Cleaning Cache\n",
"Finally, as the HPC Cluster Executors leverage the file system to communicate serialized Python functions, it is important to clean up the cache directory specified by the `cache_directory` parameter once the results are no longer needed. The serialized Python functions are stored in binary format using the [cloudpickle](https://github.com/cloudpipe/cloudpickle) library. This format is designed for caching, not long-term storage. The user is responsible for the long-term storage of their data."
]

@jan-janssen jan-janssen merged commit 290dd15 into main Aug 21, 2025
137 of 147 checks passed
@jan-janssen jan-janssen deleted the slurm_example_fix branch August 21, 2025 09:58
