[KED-1796] Problems with relative paths under Windows #412

fdroessler · 2020-06-15T07:56:47Z

Hopefully this error is not due to a change in kedro that I have missed, but a couple of people have reported it to me.

Description

Switching from absolute paths to relative paths results in some (maybe cached) state where kedro tries to load a non-existing version of a dataset.

Context

We developed an ML pipeline whose data was stored on a separate partition (D:/data), while the kedro project repo lived at something like C:/code/project. Later, on a different machine, a colleague cloned the repo and copied the data into the data folder inside the project repo. The code was now in C:/code/ and the data in C:/code/data/, and the paths in catalog.yml were changed to relative paths to reflect the new location. Starting IPython via the kedro command (kedro ipython) seemed fine, but loading a versioned dataset failed with a VersionNotFoundError saying that a specific version of the dataset could not be loaded from disk.

Trying to explicitly load a version that does exist on the machine also failed, because for some reason a C: was appended to the file URI, so that it looked like <path to versioned file>/C:, for example:
'01_raw/preprocessed_shuttles.csv/2020-06-15T07.44.54.647Z/C:'

If absolute paths (C:/..../kedro_project/data/dataset) are used instead of relative paths (data/dataset), everything works as expected.

Steps to Reproduce using space flight tutorial

Set up the catalog so that it initially points to a different location.
At this point in the tutorial (https://kedro.readthedocs.io/en/stable/03_tutorial/04_create_pipelines.html#persisting-pre-processed-data), use versioned datasets.
Copy all files over to the project data directory and change catalog.yml to relative paths: data/...
Start IPython using kedro ipython and try catalog.load("preprocessed_shuttles") or catalog.load("preprocessed_shuttles", version='2020-06-15T07.44.54.647Z') to see both errors.

Expected Result

Kedro should load the latest dataset on disk independent of absolute and relative paths.

Actual Result

Copying the data and changing the YAML to relative paths results in error messages when trying to load non-existent versions.

VersionNotFoundError: Did not find any versions for CSVDataSet(filepath=C:\Users\user\git\space\data\01_raw\preprocessed_shuttles.csv, protocol=file, save_args={'index': False}, version=Version(load=None, save='2020-06-15T07.45.43.589Z'))

DataSetError: Failed while loading data from data set CSVDataSet(filepath=C:\Users\user\git\space\data\01_raw\preprocessed_shuttles.csv, protocol=file, save_args={'index': False}, version=Version(load='2020-06-15T07.44.54.647Z', save=None)).
[WinError 123] The filename, directory name, or volume label syntax is incorrect: 'C:/Users/user/git/space/data/01_raw/preprocessed_shuttles.csv/2020-06-15T07.44.54.647Z/C:'
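
To make the malformed path easier to see, here is a minimal sketch that compares the path from the error message with the layout Kedro documents for versioned datasets (<filepath>/<version>/<filename>). The helper name versioned_path is hypothetical and written for illustration only. It is not Kedro's internal code, and it does not reproduce or explain the bug. It only shows that the last path component is "C:" where the file name would be expected.

```python
from pathlib import PurePosixPath


def versioned_path(filepath: str, version: str) -> PurePosixPath:
    """Hypothetical helper: build <filepath>/<version>/<filename>."""
    base = PurePosixPath(filepath)
    return base / version / base.name


# Values copied from the error messages in this report.
filepath = "C:/Users/user/git/space/data/01_raw/preprocessed_shuttles.csv"
version = "2020-06-15T07.44.54.647Z"

expected = versioned_path(filepath, version)
observed = PurePosixPath(
    "C:/Users/user/git/space/data/01_raw/preprocessed_shuttles.csv"
    "/2020-06-15T07.44.54.647Z/C:"
)

# Both paths share the same versioned directory; only the last component differs.
print(expected.parent == observed.parent)
print(expected.name)
print(observed.name)
```

PurePosixPath is used so the comparison behaves the same on any operating system, since it never interprets "C:" as a drive.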

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

Kedro version used (pip show kedro or kedro -V): 0.16.1
Python version used (python -V): 3.7.7
Operating system and version: Windows 10

idanov · 2020-06-18T15:42:31Z

@fdroessler Thank you for reporting this! We've been working through a backlog of some Windows specific issues and we'll add this one to the list. Once we have a solution for this, we'll share what that is and release a bugfix.

idanov · 2020-06-18T15:46:12Z

Linking in issue #390 since it seems that it's reporting the same problem.

andrii-ivaniuk · 2020-06-26T13:03:13Z

@fdroessler Thank you for reporting this issue. It was fixed in commit 390c02f.

fdroessler added the Issue: Bug Report 🐞 Bug that needs to be fixed label Jun 15, 2020

idanov changed the title ~~Problems with relative paths under Windows~~ [KED-1796] Problems with relative paths under Windows Jun 18, 2020

andrii-ivaniuk closed this as completed Jun 26, 2020

astrojuanlu mentioned this issue Oct 20, 2023

Add ability to Specify a "root" for DataCatalog.from_config #2965

Closed
