---
jupyter:
  jupytext:
    text_representation:
      extension: .md
      format_name: markdown
      format_version: '1.3'
    jupytext_version: 1.13.7
  kernelspec:
    display_name: Python 3
    language: python
    name: python3
---

# Airflow

For pipelines that support Python-based execution you can directly use the
TorchX API. TorchX is designed to be easily integrated into other applications
via the programmatic API, so no special Airflow integrations are needed.

With TorchX, you can use Airflow for the pipeline orchestration and run your
PyTorch application (e.g. distributed training) on a remote GPU cluster.

```python
import datetime
import pendulum

from airflow.utils.state import DagRunState, TaskInstanceState
from airflow.utils.types import DagRunType
from airflow.models.dag import DAG
from airflow.decorators import task


DATA_INTERVAL_START = pendulum.datetime(2021, 9, 13, tz="UTC")
DATA_INTERVAL_END = DATA_INTERVAL_START + datetime.timedelta(days=1)
```

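This tutorial uses the builtin `utils.sh` component on the `local_cwd`
scheduler so that it is fully self-contained. Targeting a remote GPU cluster
only changes the component and scheduler arguments passed to the runner. A
minimal sketch, assuming a Kubernetes scheduler has been configured for TorchX
and a hypothetical `train.py` entrypoint exists in the image:

```python
from torchx.runner import get_runner

with get_runner() as runner:
    # dist.ddp is TorchX's builtin distributed training component;
    # "-j 2x2" requests 2 nodes with 2 processes each. The script name
    # and cfg values below are placeholders, not part of this tutorial.
    app_id = runner.run_component(
        "dist.ddp",
        ["--script", "train.py", "-j", "2x2"],
        scheduler="kubernetes",
        cfg={"namespace": "default", "queue": "default"},
    )
    runner.wait(app_id, wait_interval=5).raise_for_status()
```
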
To launch a TorchX job from Airflow you can create an Airflow Python task to
import the runner, launch the job, and wait for it to complete. If you're
running on a remote cluster you may need to use the virtualenv task to install
the `torchx` package (see the sketch after the next code block).

```python
@task(task_id='hello_torchx')
def run_torchx(message):
    """This is a function that will run within the DAG execution"""
    from torchx.runner import get_runner

    with get_runner() as runner:
        # Run the utils.sh component on the local_cwd scheduler.
        app_id = runner.run_component(
            "utils.sh",
            ["echo", message],
            scheduler="local_cwd",
        )

        # Wait for the job to complete
        status = runner.wait(app_id, wait_interval=1)

        # raise_for_status will raise an exception if the job didn't succeed
        status.raise_for_status()

        # Finally we can print all of the log lines from the TorchX job so they
        # show up in the workflow logs. "sh" is the role name and k=0 selects
        # the first replica.
        for line in runner.log_lines(app_id, "sh", k=0):
            print(line, end="")
```

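If the Airflow workers don't have `torchx` preinstalled, the task can instead
be declared with Airflow's TaskFlow virtualenv decorator, which builds an
isolated environment and installs the package at run time. A minimal sketch,
assuming Airflow 2.x and network access to PyPI from the workers (the task and
function names here are illustrative):

```python
@task.virtualenv(
    task_id='hello_torchx_venv',
    requirements=["torchx"],
    system_site_packages=False,
)
def run_torchx_in_venv(message):
    # Imports must live inside the function body because it executes
    # in the freshly created virtualenv, not the worker's environment.
    from torchx.runner import get_runner

    with get_runner() as runner:
        app_id = runner.run_component(
            "utils.sh",
            ["echo", message],
            scheduler="local_cwd",
        )
        runner.wait(app_id, wait_interval=1).raise_for_status()
```
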
Once we have the task defined we can put it into an Airflow DAG and run it as
usual.

```python
from torchx.schedulers.ids import make_unique

with DAG(
    dag_id=make_unique('example_python_operator'),
    schedule_interval=None,
    start_date=DATA_INTERVAL_START,
    catchup=False,
    tags=['example'],
) as dag:
    run_job = run_torchx("Hello, TorchX!")


# Manually trigger a DAG run and execute the task in-process so the
# example is runnable without a standing Airflow scheduler.
dagrun = dag.create_dagrun(
    state=DagRunState.RUNNING,
    execution_date=DATA_INTERVAL_START,
    data_interval=(DATA_INTERVAL_START, DATA_INTERVAL_END),
    start_date=DATA_INTERVAL_END,
    run_type=DagRunType.MANUAL,
)
ti = dagrun.get_task_instance(task_id="hello_torchx")
ti.task = dag.get_task(task_id="hello_torchx")
ti.run(ignore_ti_state=True)
assert ti.state == TaskInstanceState.SUCCESS
```

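The manual `create_dagrun`/`TaskInstance` steps above mirror Airflow's
DAG-testing recipe. On newer Airflow releases the same check can be done more
simply; a sketch, assuming Airflow 2.5 or later is available:

```python
# dag.test() executes the whole DAG in-process, replacing the manual
# DagRun and TaskInstance bookkeeping above (Airflow >= 2.5).
dag.test()
```
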
If all goes well, you should see `Hello, TorchX!` printed above.

## Next Steps

* Check out the [runner API documentation](../runner.rst) to learn more about
  programmatic usage of TorchX
* Browse through the collection of [builtin components](../components/overview.rst)
  which can be used in your Airflow pipeline