Kedro-dataset release process #405

noklam · 2023-03-16T19:15:54Z

Introduction

How should we release when certain libraries are not compatible with every Python version we support? For example, if TensorFlow has no support for Python 3.11, how do we handle this in our CI?

Background

Since kedro-datasets was split out of kedro, it has become possible to upgrade kedro and kedro-datasets separately. Before that, kedro was always compatible with all datasets, so we never faced this challenge.

Problem

How do we make our CI work while allowing certain DataSets to skip CI?
Should users always install the latest version?
- For example, let's say version 1.0.10 supports Python 3.10 for TensorFlow and 1.0.11 adds support for Python 3.11. In theory, users on Python < 3.11 would have no problem installing 1.0.11.

Possible Solution

We could create some kind of tag or decorator to skip tests at the "file" or "module" level. It may get a little bit messy w
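A decorator of the kind described above could look roughly like the sketch below. It is a minimal, stdlib-only sketch: `requires_python` and `TestTensorFlowModelDataset` are hypothetical names, and the Python ≤ 3.10 bound is only an example. In a pytest code base, the native equivalent is `pytest.mark.skipif(condition, reason=...)`, which can be applied to a whole file by assigning it to a module-level `pytestmark`.

```python
import sys
import unittest


def requires_python(min_version=None, max_version=None):
    """Hypothetical helper: skip a test function or class outside a Python range.

    Versions are (major, minor) tuples and both bounds are inclusive.
    """
    current = sys.version_info[:2]
    too_old = min_version is not None and current < tuple(min_version)
    too_new = max_version is not None and current > tuple(max_version)
    if too_old or too_new:
        # unittest.skip marks the item as skipped; pytest also reports
        # skipped unittest.TestCase classes as skipped.
        return unittest.skip(f"not supported on Python {current[0]}.{current[1]}")
    # Supported version: return the test unchanged.
    return lambda test_item: test_item


# Illustrative usage (hypothetical): a TensorFlow-style test class limited to Python <= 3.10.
@requires_python(max_version=(3, 10))
class TestTensorFlowModelDataset(unittest.TestCase):
    def test_save_and_load(self):
        pass  # the real dataset test would go here
```

Keeping the version bounds in one decorator means the skip condition lives next to the test it guards, rather than in CI configuration.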


noklam · 2023-03-17T13:22:56Z

Prerequisite of #2048

astrojuanlu · 2023-03-20T15:22:44Z

I didn't think this through 100%, but maybe the whole point of having a meta-package like this is that we test it cohesively for certain versions of Python and dependencies? Otherwise, maybe it would be better to just put each dataset in a separate package to avoid this conundrum, at the cost of a bit more overhead.

noklam · 2023-03-22T15:16:01Z

Related: kedro-org/kedro#2417

astrojuanlu · 2023-03-22T15:26:52Z

These days I'm working more with kedro-datasets and I'm feeling the pain of installing all the dependencies myself, so I understand where this frustration comes from.

But if we're packaging it as a single project on PyPI, I stand by my point: we should validate it as a whole.

noklam · 2023-03-22T15:37:58Z

@astrojuanlu How would we validate it as a whole? In this case, if TensorFlow never releases a Python 3.11 version, do we simply stop releasing newer versions, or do we bump the semantic version every time a library drops support and carry on with the rest?

This may be off-topic.

I was looking at examples of `pyproject.toml` yesterday and taking inspiration from pandas. They did not expose this `optional-dependency` to PyPI (at least I couldn't figure out a way to do `pip install pandas[hdf]` or equivalent).
https://pandas.pydata.org/docs/getting_started/install.html

Instead, they take a more passive path: they just provide a list of optional dependencies, and when one is missing you see the error and install it yourself.
https://github.com/pandas-dev/pandas/blob/5c155883fdc5059ee7e21b20604a021d3aa92b01/pyproject.toml#L58
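The "passive" approach above boils down to importing an optional dependency lazily and turning a bare `ImportError` into an actionable message. A minimal sketch in that spirit follows; `import_optional_dependency`, its `hint` parameter and `_load_with_tensorflow` are hypothetical names for illustration, not an existing kedro-datasets or pandas API.

```python
import importlib
from types import ModuleType


def import_optional_dependency(name: str, hint: str = "") -> ModuleType:
    """Hypothetical helper: import an optional dependency or fail with an actionable error."""
    try:
        return importlib.import_module(name)
    except ImportError as err:
        # ModuleNotFoundError is a subclass of ImportError, so both are caught here.
        install_hint = hint or f"pip install {name}"
        raise ImportError(
            f"Missing optional dependency '{name}'. Install it with: {install_hint}"
        ) from err


# Illustrative usage (hypothetical): the import happens only when the dataset is used,
# so users who never touch it never need the package installed.
def _load_with_tensorflow(path: str):
    tf = import_optional_dependency("tensorflow")
    return tf.keras.models.load_model(path)
```

The trade-off is that a missing dependency surfaces at use time rather than install time, which is exactly the "see the error and do the install yourself" behaviour described above.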

astrojuanlu · 2023-03-22T15:51:14Z

> How would we validate it as a whole? In this case, if TensorFlow never releases a Python 3.11 version, do we simply stop releasing newer versions, or do we bump the semantic version every time a library drops support and carry on with the rest?
>
> This may be off-topic.

I share your concerns; it's just hard. But maybe it's an excuse to consider unbundling kedro-datasets into different packages.

> Instead, they take a more passive path: they just provide a list of optional dependencies, and when one is missing you see the error and install it yourself.

Yeah I think pandas does a great job at offering these optional dependencies as "progressive enhancement". But kedro-datasets is a collection of disjoint things, so I'm not sure they can be compared on equal grounds.

I know I'm not being very helpful, sorry about that 😬 My point is that I think we're trying to solve a problem that is just very hard, potentially introducing lots of complexity in our tests and CI and import mechanisms (see also kedro-org/kedro#138).

astrojuanlu · 2023-08-22T13:38:46Z

5 months in, do you think there are any outstanding pain points we should address?

noklam · 2023-08-22T13:54:01Z

@astrojuanlu

- We have `SnowparkTableDataSet`, which supports Python 3.8 only (so building the docs only works on Python 3.8). If one day dataset A works only on Python 3.8 and dataset B only on Python 3.9, we have a new problem.
- There are no tests written for it, so the problem has not surfaced.
- We may eventually need to run some tests conditionally on specific Python versions.

I wouldn't close the issue, but since it doesn't seem to be causing any problems, we can just leave it for now.
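Running tests conditionally per Python version could also be driven by a single compatibility table that a `conftest.py` turns into pytest's `collect_ignore` list (paths in it are relative to the `conftest.py` directory and are not collected). This is a minimal sketch under assumptions: only the Snowpark 3.8-only entry reflects this thread; the `tests/...` paths, the other version ranges and `unsupported_test_paths` are hypothetical.

```python
import sys

# Hypothetical compatibility table: test path -> (min, max) supported Python, inclusive.
# Only the Snowpark 3.8-only constraint comes from the discussion; the rest is illustrative.
PYTHON_SUPPORT = {
    "tests/snowflake": ((3, 8), (3, 8)),
    "tests/tensorflow": ((3, 8), (3, 10)),
    "tests/pandas": ((3, 8), (3, 11)),
}


def unsupported_test_paths(version=None, support=PYTHON_SUPPORT):
    """Return the test paths whose datasets do not support the given (major, minor) version."""
    version = tuple(version or sys.version_info[:2])
    return sorted(
        path
        for path, (low, high) in support.items()
        if not (low <= version <= high)
    )


# In a conftest.py, pytest skips collecting every path listed in ``collect_ignore``.
collect_ignore = unsupported_test_paths()
```

Keeping the matrix in one place also makes the docs-build problem visible: if no single Python version satisfies every entry, no interpreter can import all datasets at once.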

astrojuanlu · 2023-08-22T13:56:14Z

> We have `SnowparkTableDataSet`, which supports Python 3.8 only (so building the docs only works on Python 3.8). If one day dataset A works only on Python 3.8 and dataset B only on Python 3.9, we have a new problem.

Yeah I think special cases like these merit having a separate package. Otherwise kedro-datasets will soon become a dumpster fire.

noklam mentioned this issue on Mar 23, 2023 in "Support python 3.11" kedro-org/kedro#2270 (Closed)

astrojuanlu transferred this issue from kedro-org/kedro Oct 23, 2023
