How to maintain external datasets contributions #535
Comments
Link: #517 (comment) Maybe we can close this ticket?
kedro-org/kedro#517 was a different (although related) discussion. In the middle of it though, I raised the question "Should we accept every dataset that is in good shape in kedro-datasets?" and the answer seemed to be yes. However, this was at the very end of our meeting and there was not nearly enough time to weigh the pros and cons. So I'd say we keep it open. Having said that, there are a number of pull requests open already, and I think it's unfair to hold them up for lack of a firm consensus on this topic.
For example, consider discoverability. The current monorepo approach already hinders the visibility of the individual plugins, as described in #401. For datasets inside kedro-datasets, the effect is even larger. On top of that, the actual business logic of custom datasets is hidden behind private methods that don't get documented by default: kedro-org/kedro#1936 (comment)
(And this is aside from the maintenance issues @noklam mentioned)
I think we are underestimating the maintenance burden of the current approach. Lots of people in the team have trouble building the docs locally, because one has to install all the dependencies of all datasets for that to work. @rashidakanchwala can attest - she struggled a lot, and now I'm unable to do it myself (troubleshooting some weird conflicts raised by pip). On the other hand, there have been users in the past who were confused and couldn't even run the test suite. It happened for #360 and also for #435. I think it's time to seriously consider breaking kedro-datasets apart.
I do keep wondering if we could have a low-code dataset contribution workflow on the website that would let us accept contributions and manage the test suite for users.
A user literally ran out of disk space when trying to install kedro-datasets test dependencies while troubleshooting a pip conflict: #597 (comment)
This happened to me this week while running tests to figure out the issues with the kedro-datasets dependencies 😬
Description
Why is this raised?
With more dataset PRs coming in, it becomes harder to maintain all the datasets. For the exotic datasets in particular, we don't have a setup for every possible environment (e.g. Snowflake/Databricks). This creates a challenge for maintaining all the datasets since we don't have the re
This also leads to the question "Does every dataset belong to kedro-datasets?" The answer is no, since there are a few popular datasets maintained separately in kedro-mlflow as well.
Possible Action
More Discussion
How do we want to maintain the contributions? How do we draw the line between something that should be a separate plugin and something that goes into kedro-datasets?
Cc @astrojuanlu
Idea raised during retro:
kedro-mlflow has its own datasets.