[DataCatalog]: Enhance _FrozenDatasets public API #3926

Closed
ElenaKhaustova opened this issue Jun 4, 2024 · 1 comment
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@ElenaKhaustova
Contributor

Description

Users struggle to understand and use the _FrozenDatasets public API because its documentation is unclear and its functionality is limited. They find it hard to get a dataset by name, iterate through datasets, and get metadata. They are unsure what advantages _FrozenDatasets offers, and they find it unintuitive to work with because of its underscore prefix and its limited functionality compared to the private API.

We propose:

  1. Enhance the _FrozenDatasets public API to provide more comprehensive functionality, including the ability to iterate over the datasets ([DataCatalog]: Iterate through datasets objects in the catalog #3916), access some metadata (type of dataset, type of file, filepath), and use methods like get_by_name() for flexible dataset retrieval.
  2. Increase users' awareness of the _FrozenDatasets API through tutorials and documentation updates. Highlight the public API's capabilities and provide guidance on how to use it effectively for dataset management and retrieval.
  3. Consider allowing DataCatalog modifications and getting rid of _FrozenDatasets. This is a broader question related to another issue that will be linked later.
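
As a rough illustration of the first proposal, here is a minimal sketch of what a read-only container with iteration, name-based lookup, and basic metadata could look like. It is not Kedro code: `ReadOnlyDatasets`, `DatasetInfo`, `ToyDataset`, and `metadata()` are hypothetical names invented for this example, and only `get_by_name()` comes from the proposal itself.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Iterator, Optional


@dataclass(frozen=True)
class DatasetInfo:
    """Hypothetical metadata record; the field names are assumptions."""

    dataset_type: str
    filepath: Optional[str]


class ToyDataset:
    """Stand-in for a real dataset object (not a Kedro class)."""

    def __init__(self, filepath: Optional[str] = None) -> None:
        self.filepath = filepath


class ReadOnlyDatasets(Mapping):
    """Hypothetical read-only view over datasets, keyed by name."""

    def __init__(self, datasets: dict) -> None:
        # Copy the input and wrap it so the view cannot be mutated in place.
        self._datasets = MappingProxyType(dict(datasets))

    def __getitem__(self, name: str) -> Any:
        return self._datasets[name]

    def __iter__(self) -> Iterator[str]:
        # Like a dict, iteration yields names; .values() yields the objects.
        return iter(self._datasets)

    def __len__(self) -> int:
        return len(self._datasets)

    def get_by_name(self, name: str, default: Any = None) -> Any:
        """Return the dataset registered under `name`, or `default`."""
        return self._datasets.get(name, default)

    def metadata(self, name: str) -> DatasetInfo:
        """Expose the dataset type and filepath without private access."""
        dataset = self._datasets[name]
        return DatasetInfo(
            dataset_type=type(dataset).__name__,
            filepath=getattr(dataset, "filepath", None),
        )


datasets = ReadOnlyDatasets(
    {"cars": ToyDataset("data/cars.csv"), "scratch": ToyDataset()}
)

# Iterate over names and dataset objects through public methods only.
for name, dataset in datasets.items():
    print(name, datasets.metadata(name))
```

Subclassing `Mapping` gives `keys()`, `values()`, `items()`, and `get()` without extra code while leaving item assignment undefined, so the container stays read-only. Whether plain iteration should yield names (as here) or dataset objects is a design choice this sketch does not settle.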

Context

Some quotes from the user feedback:

  • "_FrozenDataset class is very confusing because we don't know exactly what's protected. I think the class itself starts with an underscore, so it doesn't really feel safe to loop over a catalog.datasets and to run into a private class. And I even don't know how to handle it whether when I use catalog.datasets, I think it's just a standard dictionary."
  • "_FrozenDataset does not have the get accessor, so one cannot get dataset by name thus prefer using private _get_dataset() method."
  • "There's no straightforward way to iterate over frozen datasets like for dataset in catalog.datasets, so you have to iterate via names and use private _get_dataset() method."

  • "Users often don't even know that catalog api exists."
  • "The public API primarily offers basic functions such as searching datasets by name and performing load and save operations. This restrictiveness often necessitates the use of private APIs to access more detailed metadata not available through the public API, so one has to break it."

@ElenaKhaustova ElenaKhaustova added the Issue: Feature Request New feature or improvement to existing feature label Jun 4, 2024
@ElenaKhaustova ElenaKhaustova changed the title [DataCatalog]: Enhance FrozenDatasets public API [DataCatalog]: Enhance _FrozenDatasets public API Jun 4, 2024
@ElenaKhaustova ElenaKhaustova self-assigned this Aug 5, 2024
@merelcht merelcht moved this to To Do in Kedro Framework Aug 5, 2024
@ElenaKhaustova
Contributor Author

Solved in #4151
