Make Dataset the first class interface #853

Closed
7 tasks done
Tracked by #830
GeorgesLorre opened this issue Feb 14, 2024 · 0 comments

Comments

Collaborator

GeorgesLorre commented Feb 14, 2024

Given our vision, we want to promote the dataset as the central concept.

Fondant is currently very pipeline-focused: we treat a pipeline as a graph of operations with intermediate datasets. We should rethink our primary interface to enable a dataset-first approach.

We should pack more functionality in the Dataset class:

  • view data preview (html formatted like pandas)
  • view data schema / metadata
  • view lineage
  • ...
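
To make the list above more concrete, here is a minimal sketch of what such a dataset object could look like. Everything in it is an assumption for illustration: `DatasetSketch`, its attributes, and the methods `schema`, `lineage`, and `preview_html` are hypothetical names, not Fondant's actual API. The only real convention used is `_repr_html_`, which Jupyter calls to render an object as HTML, the same hook pandas uses for its table previews.

```python
from __future__ import annotations

from dataclasses import dataclass
from html import escape


@dataclass
class DatasetSketch:
    """Hypothetical dataset-first object (illustration only, not Fondant's API)."""

    name: str
    fields: dict[str, str]  # column name -> type name
    rows: list[dict]  # small in-memory sample used for previews
    parent: DatasetSketch | None = None  # dataset this one was derived from
    operation: str | None = None  # operation that produced this dataset

    def schema(self) -> dict[str, str]:
        """Return a copy of the column-to-type mapping."""
        return dict(self.fields)

    def lineage(self) -> list[str]:
        """Return dataset names from the oldest ancestor down to this dataset."""
        chain = []
        node = self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return list(reversed(chain))

    def preview_html(self, limit: int = 5) -> str:
        """Render the first `limit` sample rows as an HTML table."""
        columns = list(self.fields)
        header = "".join(f"<th>{escape(col)}</th>" for col in columns)
        body = "".join(
            "<tr>"
            + "".join(f"<td>{escape(str(row.get(col, '')))}</td>" for col in columns)
            + "</tr>"
            for row in self.rows[:limit]
        )
        return f"<table><tr>{header}</tr>{body}</table>"

    def _repr_html_(self) -> str:
        # Jupyter picks this up automatically when the object is displayed.
        return self.preview_html()
```

In a notebook, evaluating such an object in a cell would display the table preview, while `schema()` and `lineage()` stay available for programmatic inspection.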

We should abstract away the Pipeline concept. Compiling and starting a pipeline is a cheap operation, so it should be less static. If we store the right information on the Dataset class, we can create pipelines from a dataset (and its dependencies).
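
One way to picture "creating a pipeline from a dataset and its dependencies" is a depth-first walk over the datasets a target depends on, collecting the operations in execution order. The sketch below assumes each dataset records the operation that produced it and the datasets it was built from. `DatasetNode` and `derive_pipeline` are hypothetical names for illustration and are not part of Fondant.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetNode:
    """Hypothetical dataset record: what produced it and from which inputs."""

    name: str
    operation: str | None = None  # operation that produced this dataset
    inputs: tuple[DatasetNode, ...] = ()  # datasets this one depends on


def derive_pipeline(target: DatasetNode) -> list[str]:
    """Return the operations needed to build `target`, dependencies first."""
    ordered: list[str] = []
    seen: set[str] = set()

    def visit(node: DatasetNode) -> None:
        if node.name in seen:
            return  # shared dependencies are scheduled only once
        seen.add(node.name)
        for parent in node.inputs:
            visit(parent)
        if node.operation is not None:
            ordered.append(node.operation)

    visit(target)
    return ordered


# Usage: three datasets chained together, each derived from the previous one.
raw = DatasetNode("raw", operation="load")
filtered = DatasetNode("filtered", operation="filter", inputs=(raw,))
embedded = DatasetNode("embedded", operation="embed", inputs=(filtered,))
plan = derive_pipeline(embedded)  # operations in dependency order
```

Because the plan is derived on demand from the target dataset, no separate static pipeline definition has to be maintained in this sketch.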

Tasks

  1. mrchtr
  2. Core
    mrchtr
  3. GeorgesLorre

2 participants