From bd21e5a610b6c93fb998f74b8711e6b66e3fbfde Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Georges=20Lorr=C3=A9?= <35808396+GeorgesLorre@users.noreply.github.com>
Date: Fri, 18 Aug 2023 15:53:22 +0200
Subject: [PATCH] Update docs with the new CLI commands (#370)

Co-authored-by: Philippe Moussalli
---
 docs/getting_started.md |  7 +++++
 docs/pipeline.md        | 69 ++++++++++++++++++++++++++++++++++++++---
 2 files changed, 71 insertions(+), 5 deletions(-)

diff --git a/docs/getting_started.md b/docs/getting_started.md
index 1ef68e440..fe8328695 100644
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -309,3 +309,10 @@ fondant explore --data-directory "path/to/your/data"
 ```
 Note that if you use a remote path (S3, GCS) you can also pass credentials using the `--credentials` flag.
 For all the options of the data explorer run `fondant explore --help`.
+
+
+
+## Running at scale
+
+You can find more information on how to configure and run your pipeline on different runners [here](pipeline.md).
+
diff --git a/docs/pipeline.md b/docs/pipeline.md
index beb355601..756830b72 100644
--- a/docs/pipeline.md
+++ b/docs/pipeline.md
@@ -115,15 +115,67 @@ where processing one row significantly increases the number of rows in the datas
 By setting a lower value for input partition rows, you can mitigate issues where the processed data
 grows larger than the available memory before being written to disk.
 
-## Compiling a pipeline
+## Compiling and Running a pipeline
 
 Once all your components are added to your pipeline you can use different compilers to run your pipeline:
 
+!!! note "IMPORTANT"
+    When using other runners you will need to make sure that your new environment has access to:
+    - The base_path of your pipeline (this can be a storage bucket like S3, GCS, etc.; see the sketch below)
+    - The images used in your pipeline (make sure you have access to the registries where the images are stored)
+
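+For example, a pipeline whose base_path points to a cloud storage bucket could look like the
+minimal sketch below (the bucket path is a placeholder, and the `Pipeline` arguments are assumed
+to be the ones described earlier in this document):
+
+```python
+from fondant.pipeline import Pipeline
+
+# Minimal sketch: the base_path below is a placeholder; point it at a bucket that both
+# your local environment and the remote runner can read from and write to.
+pipeline = Pipeline(
+    pipeline_name="my_pipeline",
+    base_path="gs://my-fondant-artifacts/my_pipeline",
+)
+```
+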
 ### Kubeflow
-TODO: update this once kubeflow compiler is implemented
-~~Once the pipeline is built, you need to initialize the client with the kubeflow host path (more info about the host path can be found in the [infrastructure documentation](https://github.com/ml6team/fondant/blob/main/docs/infrastructure.md))
-and use it to compile and run the pipeline with the `compile_and_run()` method. This performs static checking to ensure that all required arguments are provided to the components and that the required input data subsets are available. If the checks pass, a URL will be provided, allowing you to visualize and monitor the execution of your pipeline.~~
+
+The Kubeflow compiler will take your pipeline and compile it to a Kubeflow pipeline spec. This spec can be used to run your pipeline on a Kubeflow cluster. There are 2 ways to compile your pipeline to a Kubeflow spec:
+
+- Using the CLI:
+```bash
+fondant compile <pipeline_ref> --kubeflow --output <output_path>
+```
+
+- Using the compiler directly:
+```python
+from fondant.compiler import KubeFlowCompiler
+
+
+pipeline = ...
+
+compiler = KubeFlowCompiler()
+compiler.compile(pipeline=pipeline, output_path="pipeline.yaml")
+```
+
+Both of these options will produce a Kubeflow specification as a file. If you also want to immediately start a run, you can use the runner we provide (see below).
+
+### Running a Kubeflow compiled pipeline
+
+You will need a Kubeflow cluster to run your pipeline on, and you will need to specify the host of that cluster. More info on setting up a Kubeflow pipelines deployment and the host path can be found in the [infrastructure documentation](infrastructure.md).
+
+There are 2 ways to run a Kubeflow compiled pipeline:
+
+- Using the CLI:
+```bash
+fondant run <pipeline_ref> --kubeflow --host <kubeflow_host>
+```
+NOTE: the pipeline ref is either the path to the compiled pipeline spec OR a reference to a fondant pipeline, in which case the compiler will compile the pipeline first before running.
+
+
+- Using the compiler directly:
+```python
+from fondant.compiler import KubeFlowCompiler
+from fondant.runner import KubeflowRunner
+
+# Your pipeline definition here
+
+if __name__ == "__main__":
+    compiler = KubeFlowCompiler()
+    compiler.compile(pipeline=pipeline, output_path="pipeline.yaml")
+    runner = KubeflowRunner(
+        host="YOUR KUBEFLOW HOST",
+    )
+    runner.run(input_spec="pipeline.yaml")
+```
+
+Once your pipeline is running you can monitor it using the Kubeflow UI.
 
 ### Docker-Compose
 
@@ -188,4 +240,11 @@ Navigate to the folder where your docker compose is located and run (you need to
 docker compose up
 ```
 
-This will start the pipeline and provide logs per component(service)
\ No newline at end of file
+Or you can use the fondant CLI to run the pipeline:
+```bash
+fondant run <pipeline_ref> --local
+```
+
+NOTE: the pipeline ref is either the path to the compiled pipeline spec OR a reference to a fondant pipeline, in which case the compiler will compile the pipeline first before running.
+
+This will start the pipeline and provide logs per component (service).
\ No newline at end of file