[DataCatalog]: Simplify the way to access catalog #3923
Comments
The boilerplate required to extract the catalog from the session is clear. Do we have any insight into what is difficult about the following?

    from kedro.config import OmegaConfigLoader
    from kedro.io import DataCatalog

    conf_loader = OmegaConfigLoader(conf_source="conf", base_env="base", default_run_env="local")
    conf_catalog = conf_loader["catalog"]
    catalog = DataCatalog.from_config(conf_catalog)

(Asking because this was discussed in #2967.)
If we focus this issue on how to access the catalog for an existing project or session though, this is more of a Kedro Framework issue and not a
From reading this issue, it sounds to me that these users aren't aware of getting the catalog via the config loader, as @astrojuanlu shows in the snippet above. We have worked on improving that massively for the
During backlog refinement we decided to close this issue as there isn't a specific action for us to take.
Description
Currently, there are two ways of accessing the catalog: use the `DataCatalog.from_config()` method, or instantiate a `KedroSession`, load the context, and access the catalog from there.
Users point out that:
We propose to explore the feasibility of developing a clear and intuitive API for accessing the catalog directly from a Kedro project, eliminating the need for a session / hiding session creation.
Context
The current method for acquiring the Data Catalog is cumbersome: it involves several steps, and having to start a Kedro session and create a context adds unnecessary complexity for users who simply want to access the catalog. The pain point is both the complexity and the inconsistency of accessing the data catalog from a Kedro project. One user highlights that obtaining the catalog typically means searching the Kedro documentation for the appropriate code snippet to copy and paste, which is inefficient. To work around this, the user wrote a custom function, `catalog_from_project()`, to streamline the process. The function simplifies the task, and it also suggests that such a utility would be useful if included directly in Kedro.
Frequent changes in the methods for acquiring a Kedro catalog across versions (such as the changes from Kedro 0.16 to 0.17) make it difficult to maintain compatibility. This variability forces plugins like Kedro-Viz to implement complex logic to adapt to version differences.
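The issue does not show the user's `catalog_from_project()` function. As a rough illustration only, such a helper might wrap the session-based steps as in the sketch below. The signature and the `env` parameter are assumptions, and the Kedro calls used (`bootstrap_project`, `KedroSession.create`, `load_context`) reflect recent Kedro releases and may differ in older versions.

```python
from pathlib import Path
from typing import Any, Optional, Union


def catalog_from_project(
    project_path: Union[str, Path], env: Optional[str] = None
) -> Any:
    """Hypothetical helper: return the catalog of the Kedro project at project_path."""
    # Imported lazily so that defining this helper does not itself require
    # Kedro to be importable (an illustrative choice, not a Kedro convention).
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    project_path = Path(project_path).resolve()
    # Read the project metadata and configure the project settings.
    bootstrap_project(project_path)
    with KedroSession.create(project_path=project_path, env=env) as session:
        # The context builds the catalog from the project configuration,
        # which is where project hooks get a chance to modify it.
        return session.load_context().catalog
```

A call such as `catalog_from_project(".")` from the project root would then replace the copy-pasted snippet, at the cost of still creating a session behind the scenes.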
Some users suggest having a read-only `DataCatalog` instance: creating a data catalog instance, at least for read-only use cases, that does not rely on creating a full-blown Kedro session.
Implementation Notes
The session creation step is needed to apply hooks that can change the catalog upon loading, so it can be hard to eliminate session creation completely. We can consider encapsulating session creation logic and providing an interface such as `from kedro.framework.project.session.context import catalog` and/or `from kedro.framework.project import catalog`, with or without session creation.
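One way to read "encapsulating session creation" is an importable object that defers the expensive work until the catalog is first used. The sketch below is a generic Python illustration of that idea, not an existing Kedro API: `LazyCatalog` and its `factory` argument are hypothetical names, and the factory stands in for whatever code would create the session and return the catalog.

```python
from typing import Any, Callable, Optional


class LazyCatalog:
    """Hypothetical proxy that builds the real catalog on first attribute access."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._catalog: Optional[Any] = None

    def _load(self) -> Any:
        # Build the underlying catalog once and cache it for later calls.
        if self._catalog is None:
            self._catalog = self._factory()
        return self._catalog

    def __getattr__(self, name: str) -> Any:
        # Only called for attributes not found on the proxy itself.
        # Private names are never delegated, which avoids infinite recursion
        # if the proxy is copied or unpickled before __init__ has run.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._load(), name)
```

A module could expose `catalog = LazyCatalog(factory)` so that importing it stays cheap, and the session (with its hooks) is only created when a method on `catalog` is first called. Whether hooks should run on such an implicit load is exactly the design question the notes above leave open.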