Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Error message for VersionNotFoundError to handle Permission related issues better #1881

Merged
merged 10 commits into from
Oct 3, 2022
2 changes: 1 addition & 1 deletion docs/source/deployment/databricks.md
Original file line number Diff line number Diff line change
Expand Up @@ -159,7 +159,7 @@ Then press `Confirm` button. Your cluster will be restarted to apply the changes

Congratulations, you are now ready to run your Kedro project from the Databricks!

[Create your Databricks notebook](https://docs.databricks.com/notebooks/notebooks-manage.html#create-a-notebook) and remember to [attach it to the cluster](https://docs.databricks.com/notebooks/notebooks-manage.html#attach) you have just configured.
[Create your Databricks notebook](https://docs.databricks.com/notebooks/notebooks-manage.html#create-a-notebook) and remember to [attach it to the cluster](https://docs.databricks.com/notebooks/notebooks-manage.html#attach-a-notebook-to-a-cluster) you have just configured.

In your newly-created notebook, put each of the below code snippets into a separate cell, then [run all cells](https://docs.databricks.com/notebooks/notebooks-use.html#run-notebooks):

Expand Down
11 changes: 9 additions & 2 deletions kedro/io/core.py
Original file line number Diff line number Diff line change
Expand Up @@ -538,9 +538,16 @@ def _fetch_latest_load_version(self) -> str:
most_recent = next(
(path for path in version_paths if self._exists_function(path)), None
)

protocol = getattr(self, "_protocol", None)
if not most_recent:
raise VersionNotFoundError(f"Did not find any versions for {self}")
if protocol in CLOUD_PROTOCOLS:
message = (
f"Did not find any versions for {self} This could be "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"Did not find any versions for {self} This could be "
f"Did not find any versions for {self}. This could be "

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MerelTheisenQB Correct, it's possible to get an error with some other functions , but all datasets rely on self._fs.glob and it won't give you any error with permission issues.

It seems that this is the expected behavior of glob. For example, if you actually ls the system, you will get a permission error. However, this is a very expensive operation especially with object stores, thus I didn't like it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the change in 211c449

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation @noklam 🙂

f"due to insufficient permission."
)
else:
message = f"Did not find any versions for {self}"
raise VersionNotFoundError(message)

return PurePath(most_recent).parent.name

Expand Down
13 changes: 12 additions & 1 deletion tests/io/test_data_catalog.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@
LambdaDataSet,
MemoryDataSet,
)
from kedro.io.core import VERSION_FORMAT, generate_timestamp
from kedro.io.core import VERSION_FORMAT, Version, generate_timestamp


@pytest.fixture
Expand Down Expand Up @@ -652,3 +652,14 @@ def test_replacing_nonword_characters(self):
assert "ds2_spark" in catalog.datasets.__dict__
assert "ds3__csv" in catalog.datasets.__dict__
assert "jalapeño" in catalog.datasets.__dict__

def test_no_versions_cloud_protocol(self):
"""Check the error if no versions are available for load from cloud storage"""
version = Version(load=None, save=None)
versioned_dataset = CSVDataSet("s3://bucket/file.csv", version=version)
pattern = re.escape(
f"Did not find any versions for {versioned_dataset} "
f"This could be due to insufficient permission."
)
with pytest.raises(DataSetError, match=pattern):
versioned_dataset.load()