[KED-1796] Problems with relative paths under Windows #412

Closed
fdroessler opened this issue Jun 15, 2020 · 3 comments
Labels
Issue: Bug Report 🐞 Bug that needs to be fixed

Comments

@fdroessler
Contributor

fdroessler commented Jun 15, 2020

Hopefully this error is not due to a change in kedro that I have missed, but a couple of people have reported it to me.

Description

Switching from absolute paths to relative paths results in some (maybe cached) state where kedro tries to load a nonexistent version of a dataset.

Context

We developed an ML pipeline whose data was stored on a separate partition (D:/data), while the kedro project repo was at something like C:/code/project. Later, on a different machine, a colleague cloned the repo and copied the data into the data folder within the project repo. Now the code was in C:/code/ and the data in C:/code/data/, and the paths in catalog.yml were adjusted to reflect the new location using relative paths. When starting IPython via the kedro command (kedro ipython) everything seemed OK, but trying to load a versioned dataset resulted in an error saying that a specific version of the dataset could not be loaded from disk: VersionNotFoundError.

Trying to explicitly load a version that exists on the machine also resulted in an error, because for some reason a C: component was appended to the file URI, so that it looked like <path to versioned file>/C:, for example:
'01_raw/preprocessed_shuttles.csv/2020-06-15T07.44.54.647Z/C:'

If absolute paths (C:/..../kedro_project/data/dataset) are used instead of relative paths (data/dataset), everything works as expected.
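
Since absolute paths work, one possible stopgap is to anchor relative paths at the project root before they reach the catalog. The sketch below is only an illustration of that idea using the standard library; `to_absolute` and `project_root` are hypothetical names, not part of kedro, and how the result is fed into catalog.yml or a programmatically built catalog is left open.

```python
from pathlib import Path


def to_absolute(filepath: str, project_root: str) -> str:
    """Return filepath as an absolute, forward-slash path.

    Hypothetical helper (not a kedro API): relative paths are anchored
    at project_root, absolute paths are returned unchanged.
    """
    path = Path(filepath)
    if not path.is_absolute():
        path = Path(project_root) / path
    return path.as_posix()


# Example: anchor a relative catalog path at the current working directory.
root = Path.cwd()
print(to_absolute("data/01_raw/preprocessed_shuttles.csv", str(root)))
```

This only sidesteps the problem; it does not explain why the relative path is mishandled.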

Steps to Reproduce using space flight tutorial

  1. Catalog pointing to a different location initially.
  2. At this point (https://kedro.readthedocs.io/en/stable/03_tutorial/04_create_pipelines.html#persisting-pre-processed-data), use versioned datasets.
  3. Copy all files over to the project data directory and change catalog.yml to relative paths: data/...
  4. Start ipython using kedro ipython and try loading catalog.load("preprocessed_shuttles") or catalog.load("preprocessed_shuttles", version='2020-06-15T07.44.54.647Z') to see both errors.

Expected Result

Kedro should load the latest dataset on disk independent of absolute and relative paths.
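
To make "latest dataset on disk" concrete, here is a minimal sketch that works independently of kedro. It assumes each version is stored as a folder named with a fixed-width timestamp under the dataset path (as in the paths shown in this report), so sorting the folder names as strings also sorts them by time. `latest_version_on_disk` is a hypothetical helper, and the two timestamps are reused from this report purely as sample folder names.

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_version_on_disk(dataset_dir: str) -> Optional[str]:
    """Return the newest version folder name under dataset_dir, or None.

    Hypothetical helper: assumes version folders use fixed-width
    timestamps, so lexicographic order equals chronological order.
    """
    root = Path(dataset_dir)
    if not root.is_dir():
        return None
    versions = sorted(p.name for p in root.iterdir() if p.is_dir())
    return versions[-1] if versions else None


# Demo in a throwaway directory with two sample version folders.
with tempfile.TemporaryDirectory() as tmp:
    dataset_dir = Path(tmp) / "preprocessed_shuttles.csv"
    for version in ("2020-06-15T07.44.54.647Z", "2020-06-15T07.45.43.589Z"):
        (dataset_dir / version).mkdir(parents=True)
    latest = latest_version_on_disk(str(dataset_dir))

print(latest)
```

Running something like this against the real data folder can confirm that versions do exist on disk even when kedro reports VersionNotFoundError.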

Actual Result

Copying the data and changing the YAML to relative paths results in error messages about trying to load nonexistent versions.

VersionNotFoundError: Did not find any versions for CSVDataSet(filepath=C:\Users\user\git\space\data\01_raw\preprocessed_shuttles.csv, protocol=file, save_args={'index': False}, version=Version(load=None, save='2020-06-15T07.45.43.589Z'))
DataSetError: Failed while loading data from data set CSVDataSet(filepath=C:\Users\user\git\space\data\01_raw\preprocessed_shuttles.csv, protocol=file, save_args={'index': False}, version=Version(load='2020-06-15T07.44.54.647Z', save=None)).
[WinError 123] The filename, directory name, or volume label syntax is incorrect: 'C:/Users/user/git/space/data/01_raw/preprocessed_shuttles.csv/2020-06-15T07.44.54.647Z/C:'
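
To pinpoint what is wrong with the path in the WinError message, the sketch below compares it with the layout one would expect for a versioned file, assumed here to be <filepath>/<version>/<filename>. `expected_versioned_path` is a hypothetical helper written for this comparison, not kedro code, and `PurePosixPath` is used only so that the comparison behaves the same on any operating system.

```python
from pathlib import PurePosixPath


def expected_versioned_path(filepath: str, version: str) -> PurePosixPath:
    """Build <filepath>/<version>/<filename> (assumed layout, hypothetical helper)."""
    base = PurePosixPath(filepath)
    return base / version / base.name


filepath = "C:/Users/user/git/space/data/01_raw/preprocessed_shuttles.csv"
version = "2020-06-15T07.44.54.647Z"

expected = expected_versioned_path(filepath, version)
# Path copied from the WinError 123 message above.
observed = PurePosixPath(
    "C:/Users/user/git/space/data/01_raw/preprocessed_shuttles.csv/"
    "2020-06-15T07.44.54.647Z/C:"
)

print("same version folder:", observed.parent == expected.parent)
print("expected final component:", expected.name)
print("observed final component:", observed.name)
```

Under that assumption, the version folder is the same in both paths and only the final component differs: the drive prefix appears where the file name should be. That would also explain why Windows rejects the name, although the sketch does not show where in kedro this happens.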

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): 0.16.1
  • Python version used (python -V): 3.7.7
  • Operating system and version: Windows 10
@fdroessler fdroessler added the Issue: Bug Report 🐞 Bug that needs to be fixed label Jun 15, 2020
@idanov
Member

idanov commented Jun 18, 2020

@fdroessler Thank you for reporting this! We've been working through a backlog of Windows-specific issues and we'll add this one to the list. Once we have a solution, we'll share it and release a bugfix.

@idanov idanov changed the title Problems with relative paths under Windows [KED-1796] Problems with relative paths under Windows Jun 18, 2020
@idanov
Member

idanov commented Jun 18, 2020

Linking in issue #390 since it seems that it's reporting the same problem.

@andrii-ivaniuk
Contributor

@fdroessler Thank you for reporting this issue. It was fixed in commit 390c02f.


3 participants