Kedro-dataset release process #405

Open
noklam opened this issue Mar 16, 2023 · 9 comments


@noklam
Contributor

noklam commented Mar 16, 2023

Introduction

How are we going to release when certain libraries are not compatible? For example, if tensorflow has no support for Python 3.11, how do we handle this in our CI?

Background

Since the separation of kedro-datasets, it's now possible to upgrade kedro and kedro-datasets separately. Prior to this, kedro was always compatible with all datasets, so we didn't have this challenge before.

Problem

  • How do we make our CI work and allow certain DataSets to skip CI?
  • Should the user always install the latest version?
    • For example, let's say version 1.0.10 supports Python 3.10 for Tensorflow and 1.0.11 adds support for Python 3.11. In theory, if users are using Python<3.11, it would not be a problem if they install 1.0.11.

Possible Solution

  • We could create some kind of tag/decorator to skip tests at the "file" or "module" level. It may get a little bit messy w
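
As a rough illustration of a "module"-level skip, here is a minimal sketch using only the standard library's unittest. The helper name `require_library` and the choice of tensorflow as the guarded dependency are assumptions for illustration, not something kedro-datasets currently ships.

```python
import importlib.util
import unittest


def require_library(name: str) -> None:
    """Raise unittest.SkipTest if `name` cannot be found on this interpreter.

    Hypothetical helper, not part of kedro-datasets.
    """
    if importlib.util.find_spec(name) is None:
        raise unittest.SkipTest(f"{name} is not installed for this Python version")


def setUpModule():
    # unittest calls this once before any test in the file. Raising SkipTest
    # here reports the whole module as skipped instead of as an error.
    require_library("tensorflow")


class TestTensorFlowDataset(unittest.TestCase):
    def test_placeholder(self):
        # A real test would import the library and exercise the dataset here.
        self.assertTrue(True)
```

With pytest, calling `pytest.importorskip("tensorflow")` at the top of a test file, or setting a module-level `pytestmark = pytest.mark.skipif(...)`, should give a similar file-level skip.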
@noklam
Contributor Author

noklam commented Mar 17, 2023

Pre-requisite of

  • #2048

@astrojuanlu
Member

I didn't think this through 100% but maybe the whole point of having a meta-package like this is that we test it cohesively for certain versions of Python and dependencies? Otherwise maybe it would be better to just have each dataset in a separate package to avoid this conundrum, at the cost of increasing the overhead a bit.

@noklam
Contributor Author

noklam commented Mar 22, 2023

kedro-org/kedro#2417 Related

@astrojuanlu
Member

These days I'm working more with kedro-datasets and I'm feeling the pain of installing all the dependencies myself, so I understand where this frustration comes from.

But if we're packaging it as a single project in PyPI... I stand by my point, we should validate it as a whole.

@noklam
Contributor Author

noklam commented Mar 22, 2023

@astrojuanlu How would we validate it as a whole? In this case, if tensorflow never releases a Python 3.11 version, do we just not release newer versions, or do we bump the semantic version every time some library drops support and be okay with the rest?

This may be off-topic.

I was checking out examples of pyproject.toml yesterday and finding inspiration from pandas. They did not expose this `optional-dependency` to PyPI (at least I couldn't figure out a way to do `pip install pandas[hdf]` or equivalent).
https://pandas.pydata.org/docs/getting_started/install.html

Instead, they take a more passive path: they just provide a list of optional dependencies, and if one is missing you see the error and do the install yourself.
https://github.com/pandas-dev/pandas/blob/5c155883fdc5059ee7e21b20604a021d3aa92b01/pyproject.toml#L58
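
To make the "see the error and install it yourself" approach concrete, here is a minimal sketch of a lazy optional import. The function name `import_optional` and the install hint are hypothetical; this is not pandas' or kedro-datasets' actual code, only the general pattern.

```python
import importlib


def import_optional(module_name: str, install_hint: str):
    """Import an optional dependency, or raise an error explaining how to get it.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Missing optional dependency '{module_name}'. "
            f"Install it with: pip install {install_hint}"
        ) from exc
```

A dataset would call something like this inside its load or save method rather than at import time, so that users only hit the error for the datasets they actually use.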

@astrojuanlu
Member

How would we validate it as a whole? In this case, if tensorflow never releases a Python 3.11 version, do we just not release newer versions, or do we bump the semantic version every time some library drops support and be okay with the rest?

This may be off-topic.

I share your concerns, it's just hard. But maybe it's an excuse to consider unbundling kedro-datasets into different packages.

Instead, they take a more passive path: they just provide a list of optional dependencies, and if one is missing you see the error and do the install yourself.

Yeah I think pandas does a great job at offering these optional dependencies as "progressive enhancement". But kedro-datasets is a collection of disjoint things, so I'm not sure they can be compared on equal grounds.

I know I'm not being very helpful, sorry about that 😬 My point is that I think we're trying to solve a problem that is just very hard, potentially introducing lots of complexity in our tests and CI and import mechanisms (see also kedro-org/kedro#138).

@astrojuanlu
Member

5 months in, do you think there are any outstanding pain points we should address?

@noklam
Contributor Author

noklam commented Aug 22, 2023

@astrojuanlu

  1. We have SnowparkTableDataSet, which supports Python 3.8 only. (Thus the docs build only works on Python 3.8. If one day dataset A works on Python 3.8 only and dataset B works on Python 3.9 only, then we have a new problem.)
  2. There are no tests written for it, so the problem has not surfaced.
  3. We may eventually have tests that need to be run conditionally on specific Python versions.
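
For point 3, one possible shape is a small table of supported Python versions per dataset plus a conditional skip. This is only a sketch: `SUPPORTED_PYTHONS` and `supported_here` are hypothetical names, and the single entry just restates the Python 3.8 constraint from point 1.

```python
import sys
import unittest

# Hypothetical table: (major, minor) versions that each dataset's dependencies support.
# Datasets that are not listed are assumed to work on every tested version.
SUPPORTED_PYTHONS = {
    "SnowparkTableDataSet": {(3, 8)},
}


def supported_here(dataset: str, version=sys.version_info[:2]) -> bool:
    """Return True if `dataset` is expected to work on the given Python version."""
    supported = SUPPORTED_PYTHONS.get(dataset)
    return supported is None or tuple(version) in supported


@unittest.skipUnless(
    supported_here("SnowparkTableDataSet"),
    "SnowparkTableDataSet supports Python 3.8 only",
)
class TestSnowparkTableDataSet(unittest.TestCase):
    def test_placeholder(self):
        # A real test would exercise the dataset here.
        self.assertTrue(True)
```

Keeping the version constraints in one table means a CI matrix and the docs build could, in principle, read from the same source instead of each hard-coding Python 3.8.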

I wouldn't close the issue, but it seems that it is not causing any problem so we may just leave it for now.

@astrojuanlu
Member

We have SnowparkTableDataSet, which supports Python 3.8 only. (Thus the docs build only works on Python 3.8. If one day dataset A works on Python 3.8 only and dataset B works on Python 3.9 only, then we have a new problem.)

Yeah I think special cases like these merit having a separate package. Otherwise kedro-datasets will soon become a dumpster fire.

@astrojuanlu astrojuanlu transferred this issue from kedro-org/kedro Oct 23, 2023