[DataCatalog]: Autocompletion support for accessing datasets #3914

Closed
ElenaKhaustova opened this issue Jun 3, 2024 · 7 comments

@ElenaKhaustova
Contributor

ElenaKhaustova commented Jun 3, 2024

Description

Users struggle to find datasets within the catalog, particularly when dealing with a large number of datasets. They express the need for autocomplete functionality when accessing datasets in the catalog.

We propose implementing autocompletion support for accessing datasets in the catalog, enabling users to receive suggestions for dataset names as they type.

Relates to #1721

Context

  • "We have a lot of catalog entries which is common because we store a lot of intermediate results, we go back and forth with the YAML file to find how they named the dataset."
  • "I think that the tab completion would be nice to have."
  • "The current separate configuration structure requires excessive navigation."

ElenaKhaustova added the "Issue: Feature Request" label on Jun 3, 2024
@astrojuanlu
Member

Re-stating what I said about dynamic properties (aka "pandas .column access") in #1721:

The problem with doing the dynamic properties is that some dataset names that are valid in YAML would become illegal in that way (same problem as with pandas columns) and also it would pollute the namespace of the DataCatalog (again, same problem)
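To make both objections concrete, here is a minimal sketch. The dataset names and the `CatalogSketch` class are made up for illustration; this is not Kedro's actual DataCatalog. It checks which names could be written as `catalog.<name>` at all, and which would shadow an existing method.

```python
import keyword

# Hypothetical dataset names, as they might appear as keys in a catalog YAML file.
dataset_names = [
    "companies",
    "model_input_table",
    "reviews-2024",      # hyphen: valid YAML key, not a Python identifier
    "01_raw.shuttles",   # leading digit and dot: not a Python identifier
    "class",             # reserved keyword
    "load",              # valid identifier, but clashes with a method below
]


class CatalogSketch:
    """Hypothetical stand-in for a catalog class with a couple of methods."""

    def load(self, name):
        raise NotImplementedError

    def save(self, name, data):
        raise NotImplementedError


def usable_as_attribute(name: str) -> bool:
    """Return True if `catalog.<name>` would be valid Python syntax."""
    return name.isidentifier() and not keyword.iskeyword(name)


# Names that cannot be reached with dot access at all.
illegal = [n for n in dataset_names if not usable_as_attribute(n)]

# Names that are legal but would collide with the class's own namespace.
collisions = [n for n in dataset_names if hasattr(CatalogSketch, n)]

print("not usable as attributes:", illegal)
print("shadowing existing attributes:", collisions)
```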

Is this something that could be addressed with https://github.com/kedro-org/vscode-kedro @noklam ?

@noklam
Contributor

noklam commented Jun 6, 2024

I think addressing this with the VSCode extension is possible, but we should first exhaust solutions that work in most places. I know inheriting from dict is discouraged for good reasons, but it is close to the ideal solution and satisfies all my needs. WDYT?

image

I like:

  • the catalog is a dictionary-like interface, just like ConfigLoader
  • iterating over datasets feels very natural to me, though it was pointed out that we need to decide how to handle dataset factories
  • no implementation is needed for autocompletion; it works everywhere.
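For reference, a dictionary-like interface does not strictly require inheriting from dict. Below is a rough sketch using `collections.abc.Mapping`; the class name `MappingCatalogSketch` and the string "datasets" are placeholders, not Kedro's API. Note that it does not address dataset factories (patterns resolved lazily would not show up when iterating), and whether a given tool offers key completion for such a class would need to be tested per tool.

```python
from collections.abc import Mapping
from typing import Any, Iterator


class MappingCatalogSketch(Mapping):
    """Hypothetical read-only, dict-like catalog (not Kedro's DataCatalog)."""

    def __init__(self, datasets: dict) -> None:
        # Copy so later changes to the caller's dict do not leak in.
        self._datasets = dict(datasets)

    def __getitem__(self, name: str) -> Any:
        return self._datasets[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._datasets)

    def __len__(self) -> int:
        return len(self._datasets)


# Placeholder values; a real catalog would hold dataset objects.
catalog = MappingCatalogSketch(
    {"companies": "CSVDataset", "reviews": "ParquetDataset"}
)

# Mapping supplies keys(), items(), values(), get() and `in` for free.
for name, dataset in catalog.items():
    print(name, dataset)
```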

@astrojuanlu
Member

This looks fantastic, and if it works on IPython I'm sure it will work in other places. Wondering if dicts are special-cased or if it's enough for a class to implement __getitem__ and keys().

According to https://stackoverflow.com/a/38732914, it's supported since 2014 ipython/ipython#5304

@astrojuanlu
Member

@noklam
Contributor

noklam commented Jun 6, 2024

Quote from the Slack discussion: I think we have a promising solution now (TypedDict)! Is this ready enough to put in a sprint? Do we want to discuss the API? I proposed one and asked in Slack.

image
Note that this is non-breaking, so we can add it to the current DataCatalog without introducing a new one, but of course we also want to make sure we don't add a new API that we are going to deprecate next.

I suggest we list all the requirements first, then we can decide whether dict, TypedDict, UserDict, or something else is better.

Nice find. I think it's also important to test a few different targets (IPython, notebook, VSCode, PyCharm); my gut feeling is that there is no standard protocol and it is up to each tool to decide. dir is the well-known mechanism for attribute (.) autocompletion; dictionary ([) key completion is more mysterious.
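As a reminder of how the attribute side works, here is a small sketch of a class that customises `__dir__`, which is what `dir()` (and completers built on it) consult. `AttrCatalogSketch` and its contents are hypothetical and only illustrate the mechanism; names that are not valid identifiers are left out because they could never be typed after a dot.

```python
class AttrCatalogSketch:
    """Hypothetical catalog exposing datasets as attributes (illustration only)."""

    def __init__(self, datasets):
        self._datasets = dict(datasets)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        # Go through __dict__ to avoid recursing if _datasets is not set yet.
        datasets = self.__dict__.get("_datasets", {})
        try:
            return datasets[name]
        except KeyError:
            raise AttributeError(name) from None

    def __dir__(self):
        # Advertise dataset names next to the regular attributes.
        dynamic = {n for n in self._datasets if n.isidentifier()}
        return sorted(set(super().__dir__()) | dynamic)


catalog = AttrCatalogSketch(
    {"companies": "CSVDataset", "reviews-2024": "ParquetDataset"}
)

print("companies" in dir(catalog))
print("reviews-2024" in dir(catalog))
```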

@astrojuanlu
Member

If we want IPython and Jupyter autocompletion, there's no need to change the inheritance relationship of the DataCatalog class; it suffices to add an _ipython_key_completions_() method

https://ipython.readthedocs.io/en/stable/config/integrating.html#tab-completion

See (the code snippet I linked above)
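A minimal sketch of what that could look like, assuming a class that keeps its datasets in a private dict. The class name `KeyCompletionCatalogSketch` and the `_datasets` attribute are hypothetical, not the real DataCatalog internals; the only part taken from the IPython docs is the `_ipython_key_completions_` hook, which should return the possible keys for `obj[key]`.

```python
class KeyCompletionCatalogSketch:
    """Hypothetical catalog; only the hook name comes from IPython's docs."""

    def __init__(self, datasets):
        self._datasets = dict(datasets)

    def __getitem__(self, name):
        return self._datasets[name]

    def _ipython_key_completions_(self):
        # IPython calls this to get candidate keys when completing `catalog["<TAB>`.
        return list(self._datasets)


catalog = KeyCompletionCatalogSketch(
    {"companies": "CSVDataset", "reviews-2024": "ParquetDataset"}
)

# Outside IPython the hook is just a regular method.
print(catalog._ipython_key_completions_())
```

Because completion happens inside a string subscript, names that are not valid identifiers (such as "reviews-2024" here) are not a problem with this approach.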

@merelcht
Member

> If we want IPython and Jupyter autocompletion, there's no need to change the inheritance relationship of the DataCatalog class; it suffices to add an _ipython_key_completions_() method
>
> https://ipython.readthedocs.io/en/stable/config/integrating.html#tab-completion
>
> See (the code snippet I linked above)

If we can indeed do it this way, I'm all for it. Changing the DataCatalog to inherit from TypedDict is something we could experiment with for the newly designed "DataCatalog2" (for lack of a better name). Right now, I need more clarity on the implications of making DataCatalog inherit from TypedDict, including whether that affects mutability.
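One data point on the mutability question, as a sketch rather than a design proposal: TypedDict only informs static type checkers, and at runtime the value is a plain, mutable dict. The `CatalogEntries` type and its keys below are invented for illustration; note also that the keys have to be declared in code up front.

```python
from typing import TypedDict


class CatalogEntries(TypedDict):
    """Hypothetical static description of two catalog entries."""

    companies: str
    reviews: str


entries = CatalogEntries(companies="CSVDataset", reviews="ParquetDataset")

# At runtime this is an ordinary dict, so nothing prevents mutation.
print(type(entries) is dict)
entries["companies"] = "ParquetDataset"
print(entries)
```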
