-
Notifications
You must be signed in to change notification settings - Fork 905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add kedro catalog factory list
CLI command
#2796
Add kedro catalog factory list
CLI command
#2796
Conversation
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Implementation wise, manually tested and it works as expected! 💯 |
I think this is an interesting point. I tend to agree that |
This new command should also be added to the docs with a short explanation. |
I had a look at the existing 'list' style commands and we have:
As a note here - it could be more intuitive to reduce the number of keywords by changing The convention appears to be putting the verb at the end of the command with additional parameters being used to narrow down the output. So initial thoughts would be that However, it does depend on whether there are future plans for providing different types of outputs to allow users to better understand their catalog i.e. returning a full 'expanded' catalog, or providing a dataset name and returning which pattern it matched. In this case we could consider the addition of another key word like |
Thanks for the detailed analysis @amandakys ! I think reading this and keeping in mind that we have #2791 coming up I would suggest going for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good! I left some small comments. When those are resolved + the name of the command this will be pretty much ready for merging 🙂
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
kedro catalog factories
CLI commandkedro catalog factory list
CLI command
Fixed! |
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the implementation looks good and concise the test are also well-structured and verifies the core functionality. I've just left some minor comments and nitpicks. I think it will be ready to merge once the the other reviewer comments are also addressed. Happy to approve once thats done. 😄
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Still in draft mode as I've got another question for reviewers: I've refactored Does anyone have any opinions on this? |
Also to note - the code that is tested by
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @AhdraMeraliQB ! ⭐ The recursive approach in the cli test is great 👍
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementations look all good, I have other opinion but I have seen most people agree with the current naming, so I don't want to block it because of this.
I just want to leave my comment. I am not a fan of kedro catalog factory list
because it is hard to remember and I have never seen any cli that is 4 levels deep. While data factory
is something we called, I really think this should be just an advance feature of catalog
. Similarly we have kedro pipeline
instead of kedro modular_pipeline
.
In additional, having everything as one command is easier to discover and read in documentations compare to multiple commands. I feel like this is a bit similar to our discussion with pipeline structure (flat vs hierarchical)
I agree with @noklam that I agree we should try to limit bikeshedding |
In my mind I prefer just kedro catalog list and introduce new arguments - - resolve for the addition functions Pro:
Cons:
If we are going to change these CLI, it's a breaking change and we cannot add it later. Introducing a new command is not breaking but it is more intrusive we introduce a new one and later reactor it into other commands. |
After discussion with @merelcht, @noklam, @amandakys, @ankatiyar, and @astrojuanlu we have decided to implement this functionality under the command |
Ignore the target branch for now, #2793 to be merged first.
Description
This PR introduces the command `kedro catalog factory list <edited> which lists out all dataset factories in the catalog config in order of how they are matched (i.e. if an explicitly named dataset could match several of the factories in the catalog, it would be resolved with the first on the list.)
The priority is determined as such:
Question for reviewers
An alternative suggestion for the name of the command was
kedro catalog patterns
. I went with factories to attempt to streamline the terminology used in the docs and other communications, but am otherwise impartial. Any thoughts on this?Added note on streamlining the terminology
It looks like
dataset factory
is equiv tofactory pattern
,catalog factory
,dataset pattern
,dataset factory pattern
and others that have been used interchangeably. As per #2670 any mentions of a catalog entry that makes use of placeholders will be referred to as adataset factory
.Catalog factories
refer to all dataset factories within a specific catalog.What this means in practice / Why this is useful
As seen in the test case, a catalog is created with the following patterns:
an_example_{place}_{holder}
,an_example_{placeholder}
,an_{example_placeholder}
,on_{example_placeholder}
,The explicitly declared
an_example_data_set
could match any of patterns 1, 2, or 3:an_example_{data}_{set}
,an_example_{data_set}
,an_{example_data_set}
. Priority matching means that it will be resolved with the first pattern.kedro catalog factories
allows users to confirm the order in which factory patterns are considered for matching.Checklist
RELEASE.md
file