Add getting started documentation #250
```python
)

my_pipeline.add_op(load_from_hf_hub, dependencies=[])
```
Not sure if it is the best example, since it seems quite complex.
We might just have to explain it a bit better, as it is the most useful component for easily loading some data to get started. I'm not sure whether we want to describe this as a third kind of component, a "generic" component.
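For the "explain it a bit better" point: in the snippet, `dependencies=[]` reads as declaring that `load_from_hf_hub` has no upstream steps, so it can run first, while later steps would list the ops they consume. The toy class below only models that ordering. `ToyPipeline`, `execution_order`, and every op name other than `load_from_hf_hub` are hypothetical; this is a sketch, not Fondant's `Pipeline` API.

```python
from __future__ import annotations

from graphlib import TopologicalSorter


class ToyPipeline:
    """Hypothetical stand-in that records ops and their dependencies (not Fondant's class)."""

    def __init__(self, pipeline_name: str) -> None:
        self.pipeline_name = pipeline_name
        # Maps each op name to the names of the ops it depends on.
        self._graph: dict[str, list[str]] = {}

    def add_op(self, op: str, dependencies: list[str] | None = None) -> None:
        deps = list(dependencies or [])
        for dep in deps:
            # Only allow dependencies on ops that were already added.
            if dep not in self._graph:
                raise ValueError(f"{op!r} depends on unknown op {dep!r}")
        self._graph[op] = deps

    def execution_order(self) -> list[str]:
        # Topological order: upstream ops come before the ops that depend on them.
        return list(TopologicalSorter(self._graph).static_order())


pipeline = ToyPipeline("my_pipeline")
pipeline.add_op("load_from_hf_hub", dependencies=[])
pipeline.add_op("filter_images", dependencies=["load_from_hf_hub"])
pipeline.add_op("caption_images", dependencies=["filter_images"])
print(pipeline.execution_order())
```

For this simple chain, `execution_order()` returns the ops in the order they were chained.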
Thanks @GeorgesLorre! Don't forget to add the file to the documentation index.
## Running your pipeline

A Fondant pipeline needs to be compiled before it can be run. This means translating the user-friendly Fondant pipeline definition into something that can be executed by a runner.
I would leave out the compilation step here and only explain the combined functionality you get when using `fondant run`.
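For readers who want a concrete picture of what "compiling" means here (translating the pipeline definition into something a runner can execute), the toy sketch below turns a list of steps into a docker-compose-style dictionary. `OpSpec`, `compile_to_compose`, and the image names are hypothetical; this is not Fondant's compiler or its actual output format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OpSpec:
    """Hypothetical, minimal description of one pipeline step."""
    name: str
    image: str
    dependencies: list[str] = field(default_factory=list)


def compile_to_compose(pipeline_name: str, ops: list[OpSpec]) -> dict:
    """Toy 'compiler': translate op specs into a docker-compose-style dict."""
    services = {}
    for op in ops:
        service = {"image": op.image}
        if op.dependencies:
            # Each step waits until its upstream steps have exited successfully.
            service["depends_on"] = {
                dep: {"condition": "service_completed_successfully"}
                for dep in op.dependencies
            }
        services[op.name] = service
    return {"name": pipeline_name, "services": services}


spec = compile_to_compose("my_pipeline", [
    OpSpec("load_from_hf_hub", "example/load_from_hf_hub:latest"),
    OpSpec("filter_images", "example/filter_images:latest", ["load_from_hf_hub"]),
])
print(spec)
```

A single `run` command could then perform this translation and hand the result to the runner in one step, which is the combined behavior the comment above suggests documenting.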
pyproject.toml (outdated)
```toml
kfp = { version = ">= 1.8.19"}

kfp = { version = ">= 1.8.19", optional = true }
kubernetes = { version = ">= 18.20.0", optional = true }
pandas = { version = ">= 1.3.5", optional = true }

[tool.poetry.extras]
pipelines = ["kfp", "kubernetes"]
pipelines = ["kubernetes"]
```
What's the reason for these changes? `kfp` is quite a big dependency which you don't need when installing `fondant` in a component, which is why we added it as an optional dependency to the `pipelines` extra.
`kfp` is used by `pipeline.py`, so it is needed for `fondant compile` and `fondant run`.
Yes, so then the user needs to install `fondant[pipelines]` (`fondant[kfp]` might be better). But we don't want to install this in every component.
docs/getting_started.md (outdated)
Note that if you use a local `base_path` in your pipeline declaration, this path will be mounted in the Docker containers. This means that the data will be stored locally on your machine. If you use a cloud storage path, the data will be stored in the cloud.

Now that we have compiled our pipeline, we can run it:
Maybe we should mention that the Docker images are stored in the GitHub Container Registry. I hadn't pulled images from there before, which led to an additional `docker login` step for me.
I basically followed the steps here: https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#authenticating-to-the-container-registry
Do we need to log in when the images are public, or only for private ones?
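On the note in the getting-started diff that a local `base_path` is mounted into the Docker containers while a cloud storage path is not, here is a minimal sketch of that decision. The function name, the set of cloud URI schemes, and mounting at the identical path inside the container are all assumptions for illustration, not Fondant's actual behavior.

```python
from __future__ import annotations

import os
from urllib.parse import urlparse

# Assumption: URI schemes treated as cloud storage (nothing gets mounted for these).
CLOUD_SCHEMES = {"gs", "gcs", "s3", "az", "abfs"}


def volume_for_base_path(base_path: str) -> str | None:
    """Return a docker-style 'host:container' bind mount for a local base_path.

    Cloud URIs return None: the containers would read and write them directly.
    """
    if urlparse(base_path).scheme in CLOUD_SCHEMES:
        return None
    local = os.path.abspath(base_path)
    # Assumption: mount at the same path inside the container so paths stay valid.
    return f"{local}:{local}"


print(volume_for_base_path("/data/my_pipeline"))
print(volume_for_base_path("gs://my-bucket/my_pipeline"))
```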
Thanks @GeorgesLorre, some minor comments, but otherwise looks good!
README.md (outdated)
@@ -85,6 +85,11 @@ Eg. generating logos:

<p align="right">(<a href="#chocolate_bar-fondant">back to top</a>)</p>

## 💨 Getting Started
I would move this section before the example pipelines.
```python
dataframe[[("images", "data")]].map(extract_dimensions)
dataframe[[("images", "width"), ("images", "height")]] = dataframe[
    [("images", "data")]
].apply(lambda x: extract_dimensions(x.iloc[0]), axis=1)
```
Can we use the column name here instead of the index? That would make it easier to understand.
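A sketch of the reviewer's suggestion: select the image bytes by the column name `("images", "data")` instead of positionally with `x.iloc[0]`. The `extract_dimensions` below is a hypothetical stand-in that reads width and height from a PNG header so the example runs without an imaging library; the component's real helper may differ. `result_type="expand"` turns the returned `(width, height)` tuples into two columns.

```python
import struct

import pandas as pd


def extract_dimensions(image_bytes: bytes) -> tuple:
    """Hypothetical stand-in: read (width, height) from a PNG's IHDR chunk."""
    # 8-byte PNG signature + 4-byte chunk length + b"IHDR", then width and height.
    return struct.unpack(">II", image_bytes[16:24])


def add_image_dimensions(dataframe: pd.DataFrame) -> pd.DataFrame:
    dims = dataframe.apply(
        # Look the value up by its column name, not by position.
        lambda row: extract_dimensions(row[("images", "data")]),
        axis=1,
        result_type="expand",
    )
    dataframe[("images", "width")] = dims[0]
    dataframe[("images", "height")] = dims[1]
    return dataframe
```

Assigning the two columns separately avoids relying on how `dims` (columns `0` and `1`) aligns with a multi-column assignment.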
docs/getting_started.md (outdated)
```python
my_pipeline = Pipeline(
    pipeline_name='my_pipeline',
    base_path='/home/username/my_pipeline', <--- Make sure to update this
```
Suggested change:

```diff
-    base_path='/home/username/my_pipeline', <--- Make sure to update this
+    base_path='/home/username/my_pipeline', # TODO: update this
```
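As an alternative to a `# TODO` on a hardcoded `/home/username/...` path, the guide could derive a default. A minimal sketch: `default_base_path` and the `MY_PIPELINE_BASE_PATH` environment variable are hypothetical names, not part of Fondant; the returned string would be passed as `base_path=`.

```python
import os
from pathlib import Path


def default_base_path(pipeline_name: str) -> str:
    """Pick a base_path without hardcoding /home/username.

    Hypothetical convention: honor MY_PIPELINE_BASE_PATH if it is set
    (a local directory or a cloud URI), otherwise use ~/<pipeline_name>.
    """
    override = os.environ.get("MY_PIPELINE_BASE_PATH")
    if override:
        return override
    return str(Path.home() / pipeline_name)


# Hypothetical usage:
# my_pipeline = Pipeline(
#     pipeline_name="my_pipeline",
#     base_path=default_base_path("my_pipeline"),
# )
```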
docs/getting_started.md (outdated)
Now that we have a pipeline, we can add components to it. Components are the building blocks of your pipeline. They are the individual steps that will be executed in your pipeline. There are two main types of components:

- reusable components: These are components that are already created by the community and can be easily used in your pipeline. You can find a list of reusable components [here](https://github.com/ml6team/fondant/tree/main/components). They often have arguments that you can set to configure them for your use case.
Suggested change:

```diff
-- reusable components: These are components that are already created by the community and can be easily used in your pipeline. You can find a list of reusable components [here](https://github.com/ml6team/fondant/tree/main/components). They often have arguments that you can set to configure them for your use case.
+- **reusable components**: These are components that are already created by the community and can be easily used in your pipeline. You can find a list of reusable components [here](https://github.com/ml6team/fondant/tree/main/components). They often have arguments that you can set to configure them for your use case.
```
docs/getting_started.md (outdated)
- custom components: These are the components you create to solve your use case. A custom component can be easily created by adding a `fondant_component.yaml`, `dockerfile` and `main.py` file to your component subdirectory. The `fondant_component.yaml` file contains the specification of your component. You can find more information about it [here](https://github.com/ml6team/fondant/blob/main/docs/component_spec.md)
Suggested change:

```diff
-- custom components: These are the components you create to solve your use case. A custom component can be easily created by adding a `fondant_component.yaml`, `dockerfile` and `main.py` file to your component subdirectory. The `fondant_component.yaml` file contains the specification of your component. You can find more information about it [here](https://github.com/ml6team/fondant/blob/main/docs/component_spec.md)
+- **custom components**: These are the components you create to solve your use case. A custom component can be easily created by adding a `fondant_component.yaml`, `dockerfile` and `main.py` file to your component subdirectory. The `fondant_component.yaml` file contains the specification of your component. You can find more information about it [here](https://github.com/ml6team/fondant/blob/main/docs/component_spec.md)
```
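The three-file layout described for custom components can be scaffolded with a few lines of standard-library Python. This is a hypothetical helper (`scaffold_component` is not a Fondant command), and the file contents are placeholders: a real `fondant_component.yaml` must follow the component spec linked in the text.

```python
from pathlib import Path


def scaffold_component(components_dir: Path, name: str) -> Path:
    """Create <components_dir>/<name>/ with the three files a custom component needs."""
    component_dir = components_dir / name
    # exist_ok=False: refuse to overwrite an existing component directory.
    component_dir.mkdir(parents=True, exist_ok=False)
    files = {
        # Placeholder only; fill in according to the component spec.
        "fondant_component.yaml": f"name: {name}\n",
        "dockerfile": "# Build instructions for the component image go here\n",
        "main.py": "# Component logic goes here\n",
    }
    for filename, content in files.items():
        (component_dir / filename).write_text(content)
    return component_dir
```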
docs/getting_started.md (outdated)
```python
logger.info("Filtering dataset...")

dataframe[[("images", "width"), ("images", "height")]] = \
    dataframe[[("images", "data")]].apply(lambda x: extract_dimensions(x.iloc[0]), axis=1)
```
Same comment as above.
LGTM!
No description provided.