Hopefully this error is not due to a change in kedro that I have missed but I had a couple of people reporting this to me.
Description
Switching from absolute paths to relative paths results in some (maybe cached) state where kedro tries to load a nonexistent version of a dataset.
Context
We developed an ML pipeline whose data was stored on a separate partition (D:/data), while the kedro project repo was at something like C:/code/project. Later, on a different machine, a colleague cloned the repo and copied the data into the data folder within the project repo. The code was now in C:/code/ and the data in C:/code/data/, and the paths in catalog.yml were adjusted to the new location using relative paths. Starting IPython via the kedro command (kedro ipython) seemed fine, but trying to load a versioned dataset resulted in an error saying that a specific version of the dataset could not be loaded from disk. Error: VersionNotFoundError
Trying to explicitly load a version that exists on the machine resulted in an error because for some reason a C:/ was appended to the file URI so that it looked like this: <path to versioned file>C:.
'01_raw/preprocessed_shuttles.csv/2020-06-15T07.44.54.647Z/C:'
If absolute paths (C:/..../kedro_project/data/dataset) are used instead of relative paths (data/dataset), everything works as expected.
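To make the malformed path easier to spot, here is a minimal sketch of the on-disk layout I would expect for a versioned dataset, assuming the <filepath>/<version>/<filename> convention. The helper names versioned_path and last_component are made up for illustration and are not part of kedro; the sketch does not reproduce kedro's internal path handling, it only compares the expected last path component with the one in the error.

```python
from pathlib import PurePosixPath


def versioned_path(filepath: str, version: str) -> PurePosixPath:
    """Hypothetical helper: build <filepath>/<version>/<filename>."""
    base = PurePosixPath(filepath)
    return base / version / base.name


def last_component(path: str) -> str:
    """Hypothetical helper: return the final component of a '/'-separated path."""
    return PurePosixPath(path).name


# Layout assumed for the relative catalog entry from this report.
expected = versioned_path(
    "data/01_raw/preprocessed_shuttles.csv", "2020-06-15T07.44.54.647Z"
)

# Path taken verbatim from the WinError 123 message in this report.
observed = (
    "C:/Users/user/git/space/data/01_raw/preprocessed_shuttles.csv/"
    "2020-06-15T07.44.54.647Z/C:"
)

print(expected.name)             # file name expected at the end of the path
print(last_component(observed))  # what the failing path ends with instead
```

In the failing path the final component is the drive prefix rather than the file name, which is why Windows rejects it as a syntactically invalid name.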
Steps to Reproduce using space flight tutorial
Start with a catalog that initially points to a different location.
At this point in the tutorial (https://kedro.readthedocs.io/en/stable/03_tutorial/04_create_pipelines.html#persisting-pre-processed-data), use versioned datasets.
Copy all files over to the project data directory and change catalog.yml to relative paths: data/...
Start ipython using kedro ipython and try loading catalog.load("preprocessed_shuttles") or catalog.load("preprocessed_shuttles", version='2020-06-15T07.44.54.647Z') to see both errors.
Expected Result
Kedro should load the latest dataset on disk independent of absolute and relative paths.
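As a way to check independently which versions actually exist on disk, something like the following could be run from the project root. It is only a sketch under the assumption that each version lives in <filepath>/<version>/<filename> and that the timestamp folder names sort chronologically as plain strings; list_versions and latest_version are hypothetical helpers, not kedro functions.

```python
from pathlib import Path
from typing import List, Optional


def list_versions(filepath: str) -> List[str]:
    """Hypothetical helper: version folders that contain the dataset file."""
    base = Path(filepath)
    if not base.is_dir():
        return []
    return sorted(
        child.name
        for child in base.iterdir()
        if child.is_dir() and (child / base.name).is_file()
    )


def latest_version(filepath: str) -> Optional[str]:
    """Hypothetical helper: newest version by name, or None if there is none."""
    versions = list_versions(filepath)
    return versions[-1] if versions else None


# Relative path as it would appear in catalog.yml after the switch.
print(latest_version("data/01_raw/preprocessed_shuttles.csv"))
```

If this prints a version while catalog.load still raises VersionNotFoundError, the files are present and the problem lies in how the path is resolved.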
Actual Result
Copying the data and changing the yaml to relative paths results in error messages about trying to load nonexistent versions.
VersionNotFoundError: Did not find any versions for CSVDataSet(filepath=C:\Users\user\git\space\data\01_raw\preprocessed_shuttles.csv, protocol=file, save_args={'index': False}, version=Version(load=None, save='2020-06-15T07.45.43.589Z'))
DataSetError: Failed while loading data from data set CSVDataSet(filepath=C:\Users\user\git\space\data\01_raw\preprocessed_shuttles.csv, protocol=file, save_args={'index': False}, version=Version(load='2020-06-15T07.44.54.647Z', save=None)).
[WinError 123] Die Syntax für den Dateinamen, Verzeichnisnamen oder die Datenträgerbezeichnung ist falsch: 'C:/Users/user/git/space/data/01_raw/preprocessed_shuttles.csv/2020-06-15T07.44.54.647Z/C:'
(In English: The filename, directory name, or volume label syntax is incorrect.)
Your Environment
Include as many relevant details about the environment in which you experienced the bug:
Kedro version used (pip show kedro or kedro -V): 0.16.1
Python version used (python -V): 3.7.7
Operating system and version: Windows 10
@fdroessler Thank you for reporting this! We've been working through a backlog of some Windows specific issues and we'll add this one to the list. Once we have a solution for this, we'll share what that is and release a bugfix.
idanov changed the title from "Problems with relative paths under Windows" to "[KED-1796] Problems with relative paths under Windows" on Jun 18, 2020