Refactor Pipeline interface #886
With Fondant we want to become Dataset-focused. This means that the current way of defining a Fondant workflow could use some changes to promote this.

This is the current way:

    # Define pipeline
    pipeline = Pipeline(
        name="foo", description="bar", base_path="baz"
    )
    # Register components
    data1 = pipeline.read(a_component)
    data2 = data1.apply(a_different_component)

Ideally we want to go to something like this:

    # Register components
    data1 = Dataset.read("some_ref_to_a_manifest")
    data2 = data1.apply(a_different_component)

Here you get new datasets by applying operations on existing datasets.

The current Pipeline class has a couple of responsibilities:
We need to redistribute these responsibilities if we want to remove the pipeline interface.

Move to the Compiler/Runner:

Move to the Dataset:

The different compilers/runners will then need to work with a dataset as input. We will also need logic here to build the correct graph of operations and translate it into the runner-specific pipeline spec.
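
To make the last point concrete, here is a minimal sketch of how a compiler/runner could recover the graph of operations from a dataset alone. It is not Fondant's implementation: the `Dataset` class below is a hypothetical stand-in that only records its lineage, components are represented by plain strings, and `build_graph` is an assumed helper name. It also only handles a linear chain of operations, not branching or merging.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dataset:
    """Hypothetical lazy dataset: records how it was produced, executes nothing."""

    operation: str                      # name of the operation producing this dataset
    parent: Optional[Dataset] = None    # dataset the operation was applied to

    @classmethod
    def read(cls, manifest_ref: str) -> Dataset:
        # Assumption: reading is modelled as a root operation without a parent.
        return cls(operation=f"read:{manifest_ref}")

    def apply(self, component: str) -> Dataset:
        # Applying a component returns a new dataset pointing back to this one.
        return Dataset(operation=component, parent=self)


def build_graph(dataset: Dataset) -> dict[str, list[str]]:
    """Walk back from the final dataset and return {operation: [dependencies]}."""
    graph: dict[str, list[str]] = {}
    node: Optional[Dataset] = dataset
    while node is not None:
        graph[node.operation] = [node.parent.operation] if node.parent else []
        node = node.parent
    # Reverse so operations are listed in execution order (root first).
    return dict(reversed(list(graph.items())))


data1 = Dataset.read("some_ref_to_a_manifest")
data2 = data1.apply("a_different_component")
graph = build_graph(data2)
```

A runner would then translate `graph` into its own pipeline spec. Because each dataset carries a reference to its parent, no separate pipeline object is needed to hold the registered operations.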
As discussed offline with @GeorgesLorre, first step is to merge the |