kedro-datasets release 1.5.0 doesn't reflect SparkDataSet well #290

ElenaMironovaQB · 2023-08-02T07:12:40Z

Description


After yesterday's release of kedro-datasets==1.5.0, our CI started failing during system tests that do a kedro run for a pipeline with Spark (see the screenshot). As far as I can see, SparkDataSet is still defined with the same name as before. With kedro-datasets==1.4.2 the same tests ran smoothly. I also couldn't find anything relevant in the release notes.

Context


Our system tests that run Kedro on pipelines with Spark stopped running.
More discussion on slack: https://kedro-org.slack.com/archives/C03RKP2LW64/p1690896281915309

Steps to Reproduce

With kedro-datasets==1.5.0 installed, run a pipeline where kedro-datasets[spark.SparkDataSet] is used.
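Before re-running the pipeline, it can help to check which extras the installed kedro-datasets distribution actually declares, rather than what the source tree suggests. The sketch below is a hypothetical diagnostic (the helper name `declared_extras` is made up for illustration); it uses only the standard-library `importlib.metadata` module to read the `Provides-Extra` fields of an installed distribution, and is written to run on Python 3.8 as in the environment above. Whether the 1.5.0 wheel lists a Spark extra is what it would reveal; that is not confirmed in this thread.

```python
from importlib.metadata import PackageNotFoundError, metadata
from typing import List, Optional


def declared_extras(dist_name: str) -> Optional[List[str]]:
    """Return the extras an installed distribution declares, or None if not installed.

    Hypothetical helper: it reads the core-metadata ``Provides-Extra``
    fields recorded for the installed distribution.
    """
    try:
        meta = metadata(dist_name)
    except PackageNotFoundError:
        return None
    # get_all() returns None when the field is absent; normalise that to [].
    return meta.get_all("Provides-Extra") or []


if __name__ == "__main__":
    extras = declared_extras("kedro-datasets")
    if extras is None:
        print("kedro-datasets is not installed")
    else:
        # Inspect the list for a Spark-related entry.
        for name in sorted(extras):
            print(name)
```

Comparing the printed list between kedro-datasets==1.4.2 and 1.5.0 would show whether the Spark extra is still advertised.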

Expected Result


The pipeline should run successfully to completion.

Actual Result


![Screenshot 2023-08-01 at 15 24 22](https://github.com/kedro-org/kedro-plugins/assets/64854268/a721559d-4687-42ac-bb9f-f83b351b6001)


Your Environment


Kedro version used (pip show kedro or kedro -V): 0.18.11
Kedro plugin and kedro plugin version used (pip show kedro-airflow): kedro-datasets==1.5.0
Python version used (python -V): 3.8
Operating system and version: ubuntu-2004:202201-02


noklam · 2023-08-10T13:52:56Z

I suspect #263 is the root cause.

noklam · 2023-08-10T14:24:22Z

@sbrugman, is pip install kedro-datasets[pandas.CSVDataSet] still possible? I think this is an undesired side effect. I did a quick search, and it seems that a standard pyproject.toml doesn't support pip install kedro-datasets[pandas.CSVDataSet], only pip install kedro-datasets[pandas].

At this point I don't think we want to bring in a more advanced tool like Poetry just for this.

ElenaMironovaQB changed the title ~~<Title> kedro-datasets release 1.5.0 doesn't reflect SparkDataSet well~~ kedro-datasets release 1.5.0 doesn't reflect SparkDataSet well Aug 2, 2023

merelcht added the bug (Something isn't working) label Aug 2, 2023

This was referenced Aug 4, 2023

Automatically trigger kedro-starters release on the release of kedro kedro-org/kedro#2889

Merged

Create github action workflows for automatic release kedro-org/kedro-starters#140

Merged

DimedS self-assigned this Aug 9, 2023

ankatiyar mentioned this issue Aug 9, 2023

kedro-airflow builds are failing #295

Closed

DimedS linked a pull request Aug 11, 2023 that will close this issue

fix(datasets): Correct pyproject.toml syntax for optional dependencies #302

Merged


noklam mentioned this issue Aug 14, 2023

Kedro release 0.18.13 - official support for Python 3.11 kedro-org/kedro#2919

Closed


DimedS closed this as completed in #302 Aug 14, 2023

noklam self-assigned this Aug 14, 2023
