pythonworkflow · jan-janssen · Apr 24, 2025 · Apr 24, 2025
diff --git a/example_workflows/arithmetic/jobflow.ipynb b/example_workflows/arithmetic/jobflow.ipynb
@@ -24,58 +24,63 @@
   {
    "id": "982a4fbe-7cf9-45dd-84ae-9854149db0b9",
    "cell_type": "markdown",
-   "source": "# jobflow",
+   "source": [
+    "# jobflow"
+   ],
    "metadata": {}
   },
   {
    "id": "e6180712-d081-45c7-ba41-fc5191f10427",
    "cell_type": "markdown",
-   "source": "## Define workflow with jobflow",
+   "source": [
+    "## Define workflow with jobflow\n",
+    "\n",
+    "This tutorial demonstrates how to define a workflow with `jobflow`, export it via the Python Workflow Definition (PWD), and load it with `aiida` and `pyiron`.\n",
+    "\n",
+    "[`jobflow`](https://joss.theoj.org/papers/10.21105/joss.05995) was developed to simplify the development of high-throughput workflows. It uses a decorator-based approach to define “Job”s that can be connected to form complex workflows (“Flow”s). `jobflow` is the workflow language of the workflow library [`atomate2`](https://chemrxiv.org/engage/chemrxiv/article-details/678e76a16dde43c9085c75e9), which is designed to replace [`atomate`](https://www.sciencedirect.com/science/article/pii/S0927025617303919), the library that was central to the development of the [Materials Project](https://pubs.aip.org/aip/apm/article/1/1/011002/119685/Commentary-The-Materials-Project-A-materials) database."
+   ],
    "metadata": {}
   },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "First, we import the `job` decorator and the `Flow` class from `jobflow`, as well as the necessary functions from the Python Workflow Definition and the example arithmetic workflow."
+   ],
+   "metadata": {
+    "collapsed": false
+   },
+   "id": "69bedfb9ec12c092"
+  },
   {
    "id": "000bbd4a-f53c-4eea-9d85-76f0aa2ca10b",
    "cell_type": "code",
-   "source": "from jobflow import job, Flow",
+   "source": [
+    "from jobflow import job, Flow"
+   ],
    "metadata": {
     "trusted": true,
     "ExecuteTime": {
-     "end_time": "2025-04-24T10:30:16.328511Z",
-     "start_time": "2025-04-24T10:30:16.309562Z"
+     "end_time": "2025-04-24T12:51:34.747117656Z",
+     "start_time": "2025-04-24T12:51:33.203979325Z"
     }
    },
-   "outputs": [
-    {
-     "ename": "ModuleNotFoundError",
-     "evalue": "No module named 'jobflow'",
-     "output_type": "error",
-     "traceback": [
-      "\u001B[31m---------------------------------------------------------------------------\u001B[39m",
-      "\u001B[31mModuleNotFoundError\u001B[39m                       Traceback (most recent call last)",
-      "\u001B[36mCell\u001B[39m\u001B[36m \u001B[39m\u001B[32mIn[4]\u001B[39m\u001B[32m, line 1\u001B[39m\n\u001B[32m----> \u001B[39m\u001B[32m1\u001B[39m \u001B[38;5;28;01mfrom\u001B[39;00m\u001B[38;5;250m \u001B[39m\u001B[34;01mjobflow\u001B[39;00m\u001B[38;5;250m \u001B[39m\u001B[38;5;28;01mimport\u001B[39;00m job, Flow\n",
-      "\u001B[31mModuleNotFoundError\u001B[39m: No module named 'jobflow'"
-     ]
-    }
-   ],
-   "execution_count": 4
+   "outputs": [],
+   "execution_count": 1
   },
   {
    "id": "06c2bd9e-b2ac-4b88-9158-fa37331c3418",
    "cell_type": "code",
-   "source": "from python_workflow_definition.jobflow import write_workflow_json",
+   "source": [
+    "from python_workflow_definition.jobflow import write_workflow_json"
+   ],
    "metadata": {
     "trusted": true
    },
    "outputs": [],
    "execution_count": 2
   },
   {
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2025-04-24T10:30:04.618439Z",
-     "start_time": "2025-04-24T10:30:04.598701Z"
-    }
-   },
+   "metadata": {},
    "cell_type": "code",
    "source": [
     "from workflow import (\n",
@@ -85,7 +90,17 @@
    ],
    "id": "f9217ce7b093b5fc",
    "outputs": [],
-   "execution_count": 1
+   "execution_count": null
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "Using the `job` decorator, the functions imported from the arithmetic workflow are transformed into jobflow “Job”s. These “Job”s delay the execution of Python functions and can be chained into workflows (“Flow”s). A “Job” can return serializable outputs (e.g., a number, a dictionary, or a Pydantic model) or a so-called “Response” object, which enables dynamic workflows in which the number of nodes is not known before the workflow is executed."
+   ],
+   "metadata": {
+    "collapsed": false
+   },
+   "id": "2639deadfae9c591"
   },
   {
    "metadata": {
@@ -95,7 +110,9 @@
     }
    },
    "cell_type": "code",
-   "source": "workflow_json_filename = \"jobflow_simple.json\"",
+   "source": [
+    "workflow_json_filename = \"jobflow_simple.json\""
+   ],
    "id": "1feba0898ee4e361",
    "outputs": [],
    "execution_count": 2
@@ -110,31 +127,17 @@
     "get_prod_and_div = job(_get_prod_and_div)"
    ],
    "metadata": {
-    "trusted": true,
-    "ExecuteTime": {
-     "end_time": "2025-04-24T10:30:05.169761Z",
-     "start_time": "2025-04-24T10:30:05.043635Z"
-    }
+    "trusted": true
    },
-   "outputs": [
-    {
-     "ename": "NameError",
-     "evalue": "name 'job' is not defined",
-     "output_type": "error",
-     "traceback": [
-      "\u001B[31m---------------------------------------------------------------------------\u001B[39m",
-      "\u001B[31mNameError\u001B[39m                                 Traceback (most recent call last)",
-      "\u001B[36mCell\u001B[39m\u001B[36m \u001B[39m\u001B[32mIn[3]\u001B[39m\u001B[32m, line 1\u001B[39m\n\u001B[32m----> \u001B[39m\u001B[32m1\u001B[39m get_sum = \u001B[43mjob\u001B[49m(_get_sum)\n\u001B[32m      2\u001B[39m get_prod_and_div = job(_get_prod_and_div, data=[\u001B[33m\"\u001B[39m\u001B[33mprod\u001B[39m\u001B[33m\"\u001B[39m, \u001B[33m\"\u001B[39m\u001B[33mdiv\u001B[39m\u001B[33m\"\u001B[39m])\n",
-      "\u001B[31mNameError\u001B[39m: name 'job' is not defined"
-     ]
-    }
-   ],
-   "execution_count": 3
+   "outputs": [],
+   "execution_count": null
   },
   {
    "id": "ecef1ed5-a8d3-48c3-9e01-4a40e55c1153",
    "cell_type": "code",
-   "source": "obj = get_prod_and_div(x=1, y=2)",
+   "source": [
+    "obj = get_prod_and_div(x=1, y=2)"
+   ],
    "metadata": {
     "trusted": true
    },
@@ -144,7 +147,9 @@
   {
    "id": "2b88a30a-e26b-4802-89b7-79ca08cc0af9",
    "cell_type": "code",
-   "source": "w = get_sum(x=obj.output.prod, y=obj.output.div)",
+   "source": [
+    "w = get_sum(x=obj.output.prod, y=obj.output.div)"
+   ],
    "metadata": {
     "trusted": true
    },
@@ -154,17 +159,31 @@
   {
    "id": "a5e5ca63-2906-47c9-bac6-adebf8643cba",
    "cell_type": "code",
-   "source": "flow = Flow([obj, w])",
+   "source": [
+    "flow = Flow([obj, w])"
+   ],
    "metadata": {
     "trusted": true
    },
    "outputs": [],
    "execution_count": 8
   },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "As jobflow itself is only a workflow language, the workflows are typically executed on high-performance computers with a workflow manager such as [FireWorks](https://onlinelibrary.wiley.com/doi/full/10.1002/cpe.3505) or [jobflow-remote](https://github.com/Matgenix/jobflow-remote). For smaller and test workflows, simple linear, non-parallel execution of the workflow graph can be performed with jobflow itself. All outputs of individual jobs are saved in a database. For high-throughput applications, a MongoDB database is typically used. For testing and smaller workflows, an in-memory database can be used instead."
+   ],
+   "metadata": {
+    "collapsed": false
+   },
+   "id": "27688edd256f1420"
+  },
   {
    "id": "e464da97-16a1-4772-9a07-0a47f152781d",
    "cell_type": "code",
-   "source": "write_workflow_json(flow=flow, file_name=workflow_json_filename)",
+   "source": [
+    "write_workflow_json(flow=flow, file_name=workflow_json_filename)"
+   ],
    "metadata": {
     "trusted": true
    },
@@ -174,7 +193,9 @@
   {
    "id": "bca646b2-0a9a-4271-966a-e5903a8c9031",
    "cell_type": "code",
-   "source": "!cat {workflow_json_filename}",
+   "source": [
+    "!cat {workflow_json_filename}"
+   ],
    "metadata": {
     "trusted": true
    },
@@ -187,16 +208,34 @@
    ],
    "execution_count": 10
   },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "Finally, you can write the workflow data into a JSON file to be imported later."
+   ],
+   "metadata": {
+    "collapsed": false
+   },
+   "id": "65389ef27c38fdec"
+  },
   {
    "id": "87a27540-c390-4d34-ae75-4739bfc4c1b7",
    "cell_type": "markdown",
-   "source": "## Load Workflow with aiida",
+   "source": [
+    "## Load Workflow with aiida\n",
+    "\n",
+    "In this part, we will demonstrate how to import the `jobflow` workflow into `aiida` via the PWD."
+   ],
    "metadata": {}
   },
   {
    "id": "66a1b3a6-3d3b-4caa-b58f-d8bc089b1074",
    "cell_type": "code",
-   "source": "from aiida import load_profile\n\nload_profile()",
+   "source": [
+    "from aiida import load_profile\n",
+    "\n",
+    "load_profile()"
+   ],
    "metadata": {
     "trusted": true
    },
@@ -215,17 +254,32 @@
   {
    "id": "4679693b-039b-45cf-8c67-5b2b3d705a83",
    "cell_type": "code",
-   "source": "from python_workflow_definition.aiida import load_workflow_json",
+   "source": [
+    "from python_workflow_definition.aiida import load_workflow_json"
+   ],
    "metadata": {
     "trusted": true
    },
    "outputs": [],
    "execution_count": 12
   },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "We import the necessary modules from `aiida` and the PWD, and then load the workflow JSON file."
+   ],
+   "metadata": {
+    "collapsed": false
+   },
+   "id": "cc7127193d31d8ef"
+  },
   {
    "id": "68c41a61-d185-47e8-ba31-eeff71d8b2c6",
    "cell_type": "code",
-   "source": "wg = load_workflow_json(file_name=workflow_json_filename)\nwg",
+   "source": [
+    "wg = load_workflow_json(file_name=workflow_json_filename)\n",
+    "wg"
+   ],
    "metadata": {
     "trusted": true
    },
@@ -246,10 +300,22 @@
    ],
    "execution_count": 13
   },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "Finally, we can run the workflow with `aiida`."
+   ],
+   "metadata": {
+    "collapsed": false
+   },
+   "id": "4816325767559bbe"
+  },
   {
    "id": "05228ece-643c-420c-8df8-4ce3df379515",
    "cell_type": "code",
-   "source": "wg.run()",
+   "source": [
+    "wg.run()"
+   ],
    "metadata": {
     "trusted": true
    },
@@ -265,13 +331,19 @@
   {
    "id": "2c942094-61b4-4e94-859a-64f87b5bec64",
    "cell_type": "markdown",
-   "source": "## Load Workflow with pyiron_base",
+   "source": [
+    "## Load Workflow with pyiron_base\n",
+    "\n",
+    "In this part, we will demonstrate how to import the `jobflow` workflow into `pyiron` via the PWD."
+   ],
    "metadata": {}
   },
   {
    "id": "ea102341-84f7-4156-a7d1-c3ab1ea613a5",
    "cell_type": "code",
-   "source": "from python_workflow_definition.pyiron_base import load_workflow_json",
+   "source": [
+    "from python_workflow_definition.pyiron_base import load_workflow_json"
+   ],
    "metadata": {
     "trusted": true
    },
@@ -281,7 +353,10 @@
   {
    "id": "8f2a621d-b533-4ddd-8bcd-c22db2f922ec",
    "cell_type": "code",
-   "source": "delayed_object_lst = load_workflow_json(file_name=workflow_json_filename)\ndelayed_object_lst[-1].draw()",
+   "source": [
+    "delayed_object_lst = load_workflow_json(file_name=workflow_json_filename)\n",
+    "delayed_object_lst[-1].draw()"
+   ],
    "metadata": {
     "trusted": true
    },
@@ -300,7 +375,9 @@
   {
    "id": "cf80267d-c2b0-4236-bf1d-a57596985fc1",
    "cell_type": "code",
-   "source": "delayed_object_lst[-1].pull()",
+   "source": [
+    "delayed_object_lst[-1].pull()"
+   ],
    "metadata": {
     "trusted": true
    },
@@ -322,14 +399,24 @@
    "execution_count": 17
   },
   {
-   "id": "9d819ed0-689c-46a7-9eff-0afb5ed66efc",
-   "cell_type": "code",
-   "source": "",
+   "cell_type": "markdown",
+   "source": [
+    "Here, the procedure is the same as before: import the necessary `pyiron_base` module from the PWD, load the workflow JSON file, and run the workflow with `pyiron`."
+   ],
    "metadata": {
-    "trusted": true
+    "collapsed": false
    },
+   "id": "9414680d1cbc3b2e"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
    "outputs": [],
-   "execution_count": null
+   "source": [],
+   "metadata": {
+    "collapsed": false
+   },
+   "id": "c199b28f3c0399cc"
   }
  ]
 }