Merge pull request #158 from AstraZeneca/improved_docs
chore: update workflows
vijayvammi authored Jun 12, 2024
2 parents 54c3217 + 2325a7b commit f7bfb2b
Showing 10 changed files with 55 additions and 241 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/pr.yaml
@@ -34,5 +34,5 @@ jobs:
argo version
- run: |
python -m poetry install --without docs,binary,perf,tutorial
python -m poetry install --without docs,binary,perf,tutorial,compare
poetry run tox
2 changes: 1 addition & 1 deletion .github/workflows/release.yaml
@@ -33,7 +33,7 @@ jobs:
argo version
- run: python -m pip install poetry
- run: |
python -m poetry install --without docs,binary,perf,tutorial
python -m poetry install --without docs,binary,perf,tutorial,compare
poetry run tox
Release:
Binary file modified docs/assets/work_dark.png
217 changes: 0 additions & 217 deletions docs/concepts/the-big-picture.md

This file was deleted.

Binary file added docs/image.png
27 changes: 17 additions & 10 deletions docs/index.md
@@ -8,7 +8,9 @@
<span style="font-size:0.75em;">
<a href="https://www.flaticon.com/free-icons/runner" title="runner icons">Runner icons created by Leremy - Flaticon</a>
</span>
---


<hr style="border:2px dotted orange">

## Example

@@ -78,8 +80,9 @@ The difference between native driver and runnable orchestration:
- [x] Reproducible by default, runnable stores metadata about code/data/config for every execution.
- [x] The pipeline is `runnable` in any environment.

<hr style="border:2px dotted orange">

## why runnable?
## Why runnable?

Obviously, there are a lot of orchestration tools. A well maintained and curated [list is
available here](https://github.com/EthicalML/awesome-production-machine-learning/).
@@ -106,16 +109,14 @@ Broadly, they could be classed into ```native``` or ```meta``` orchestrators.
- Easy to get started and run locally.
- Ideal for quick experimentation or research activities.

```runnable``` is a _meta_ orchestrator with simple API, geared towards data engineering, data science activities.
```runnable``` is a _meta_ orchestrator with simple API, geared towards data engineering, data science projects.
It works in conjunction with _native_ orchestrators and is an alternative to [kedro](https://docs.kedro.org/en/stable/index.html)
or [metaflow](https://metaflow.org/).

```runnable``` could also function as an SDK for _native_ orchestrators as it always compiles pipeline definitions
to _native_ orchestrators.




```runnable``` stands out based on these design principles.

<div class="grid cards" markdown>

- :material-clock-fast:{ .lg .middle } __Easy to adopt, it's mostly your code__
@@ -126,13 +127,13 @@ or [metaflow](https://metaflow.org/).

- No APIs or decorators or any imposed structure.

[:octicons-arrow-right-24: Getting started](concepts/the-big-picture.md)
[:octicons-arrow-right-24: Getting started](concepts/index.md)

- :building_construction:{ .lg .middle } __Bring your infrastructure__

---

Minimal disruption to your current infrastructure patterns.
```runnable``` is not a platform. It works with your platforms.

- ```runnable``` composes pipeline definitions suited to your infrastructure.

@@ -173,7 +174,13 @@ or [metaflow](https://metaflow.org/).

Moving away from runnable is as simple as deleting relevant files.

- Your application code remains as it is.


</div>

## Comparisons/alternatives
<hr style="border:2px dotted orange">

## Comparisons

--8<-- "examples/comparisons/README.md"
42 changes: 32 additions & 10 deletions examples/comparisons/README.md
@@ -34,11 +34,17 @@ func_task = PythonTask(name="function", function=func, returns=["z"], catalog=ca

Below are the implementations in alternative frameworks. Note that
the below are the best of our understanding of the frameworks, please let us
know if there are alternate implementations.
know if there are better implementations.


Along with the observations, we have implemented [MNIST example in pytorch](https://github.com/pytorch/examples/blob/main/mnist/main.py)
in multiple frameworks for comparing actual implementations against popular examples.

<hr style="border:2px dotted orange">

### metaflow

The function in metaflow's step would rougly be:
The function in metaflow's step would roughly be:

```python
from metaflow import step, conda, FlowSpec
@@ -62,8 +68,6 @@ class Flow(FlowSpec)

The differences:



##### dependency management:

```runnable``` depends on the activated virtualenv for dependencies which is natural to python.
@@ -99,9 +103,17 @@ pipelines themselves and can run in isolation. This is not true in ```metaflow``

##### unit testing pipelines

```runnable``` pipelines are testable using ```mocked``` executor where the executables can be mocked/patched. In ```metaflow```, it depends on how the
python function is wrapped in the pipeline.
```runnable``` pipelines are testable using ```mocked``` executor where the executables can be mocked/patched.
In ```metaflow```, it depends on how the python function is wrapped in the pipeline.
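
The idea of patching an executable can be illustrated without runnable at all. The sketch below assumes a hypothetical pipeline modelled as a dict of step functions; `STEPS`, `load_data`, `train` and `run_pipeline` are illustrative names, not part of runnable's `mocked` executor API. Only the expensive step is replaced, while the rest of the pipeline runs unchanged:

```python
from unittest import mock


def load_data() -> list[int]:
    # Hypothetical expensive step: in real life this might query a warehouse.
    raise RuntimeError("no database available in the test environment")


def train(data: list[int]) -> float:
    # Hypothetical cheap step that we still want to exercise in tests.
    return sum(data) / len(data)


# Hypothetical pipeline: an ordered mapping of step name -> executable.
STEPS = {"load_data": load_data, "train": train}


def run_pipeline(steps: dict = STEPS) -> float:
    data = steps["load_data"]()
    return steps["train"](data)


def test_pipeline_with_patched_step() -> None:
    # Patch only the expensive executable; "train" still runs for real.
    with mock.patch.dict(STEPS, {"load_data": lambda: [1, 2, 3]}):
        assert run_pipeline() == 2.0
    # On exit, patch.dict restores the original mapping.
    assert STEPS["load_data"] is load_data


test_pipeline_with_patched_step()
```

Because `mock.patch.dict` restores the original mapping on exit, one test's patch does not leak into the next.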

##### distributed training

```metaflow``` supports distributed training.

As of now, ```runnable``` does not support distributed training but is in the works.


<hr style="border:2px dotted orange">

### kedro

@@ -128,17 +140,17 @@ def create_pipeline(**kwargs) -> Pipeline:

```

##### Structure
##### Footprint

Kedro needs a structure and configuration to set up a new project and provides
a CLI to get started.
```kedro``` has a larger footprint in the domain code by the configuration files. It is tightly structured and
provides a CLI to get started.

To use ```runnable``` as part of the project requires
adding a pipeline definition file (in python or yaml) and an optional configuration file.

##### dataflow

Kedro needs the data flowing through the pipeline via catalog.yaml which
Kedro needs the data flowing through the pipeline via ```catalog.yaml``` which
provides a central place to understand the data.

In ```runnable```, the data is presented to the individual tasks as
@@ -147,3 +159,13 @@ requested by the ```catalog``` instruction.
##### notebooks

Kedro supports notebooks for exploration but not as tasks of the pipeline.

##### dynamic pipelines

```kedro``` does not support dynamic pipelines or map state.
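
A map state runs the same branch once for every item of an iterable that is only known at run time, then gathers the results. A plain-Python sketch of that fan-out/fan-in pattern, using hypothetical names (`map_state`, `train`) rather than runnable's own API, could look like:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable


def map_state(
    branch: Callable[[Any], Any],
    iterate_on: Iterable[Any],
    max_workers: int = 4,
) -> list[Any]:
    """Run `branch` once per item (fan-out) and gather results in input order (fan-in)."""
    # The items are only materialised at run time, which is what makes the pipeline dynamic.
    items = list(iterate_on)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(branch, items))


def train(learning_rate: float) -> dict:
    # Hypothetical branch: a stand-in for a training run that returns a fake score.
    return {"lr": learning_rate, "score": 1.0 - learning_rate}


results = map_state(train, [0.1, 0.01, 0.001])
best = max(results, key=lambda r: r["score"])
print(best)
```

Threads are used here only to keep the sketch self-contained; an orchestrator would typically fan each branch out to a separate process, container or job.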

##### distributed training

```kedro``` supports distributed training via a [plugin](https://github.com/getindata/kedro-azureml).

As of now, ```runnable``` does not support distributed training but is in the works.
1 change: 1 addition & 0 deletions examples/comparisons/kedro/README.md
@@ -0,0 +1 @@
Please [follow this repository](https://github.com/toohsk/kedro_gradio/tree/mnist-example) for the setup.
1 change: 1 addition & 0 deletions examples/comparisons/kfp/README.md
@@ -0,0 +1 @@
The best implementation would be [similar to this](https://medium.com/@lorenzo.colombi/kubeflow-pipeline-v2-tutorial-end-to-end-mnist-classifier-example-dc66714c2649).