
Commit

Fixed bug for double-uploading of unreleased wheels in air-gapped setups (#103)

With the changes from #99, we were no longer hitting the `if wheel.name == self._local_wheel.name` condition for unreleased wheels, resulting in undefined behavior:

```python
product_info = ProductInfo.from_class(WheelsV2)
with WheelsV2(new_installation, product_info) as whl:
    whl.upload_wheel_dependencies(["databricks"])
    installation_files = new_installation.files()
    # only Databricks SDK has to be uploaded
    assert len(installation_files) == 1
```
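For unreleased builds, `ProductInfo.version()` appears to generate a fresh timestamped version string on each call, so the wheel filename and the name checked by the guard above could disagree, and the local wheel was uploaded a second time as if it were a dependency. A minimal sketch of the caching fix, with `FakeProductInfo` as a hypothetical stand-in for `ProductInfo` (not the library's real class):

```python
from datetime import datetime, timezone
from functools import cached_property


class FakeProductInfo:
    """Toy stand-in: unreleased versions embed a timestamp, so each
    call to version() may return a different string."""

    def version(self) -> str:
        ts = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S%f")
        return f"0.0.1+{ts}"


class Wheels:
    """Minimal sketch of the fix: compute the version once and reuse it."""

    def __init__(self, product_info: FakeProductInfo):
        self._product_info = product_info

    @cached_property
    def _current_version(self) -> str:
        # cached, so the wheel filename and the version.json metadata agree
        return self._product_info.version()


w = Wheels(FakeProductInfo())
assert w._current_version == w._current_version  # stable across calls
```

Because `cached_property` stores the computed value in the instance's `__dict__` on first access, every later read within the same `Wheels` lifetime sees the identical version string.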
nfx authored May 12, 2024
1 parent 50b5474 commit 41e4aab
Showing 3 changed files with 25 additions and 6 deletions.
7 changes: 4 additions & 3 deletions README.md
@@ -23,7 +23,7 @@ Baseline for Databricks Labs projects written in Python. Sources are validated w
 * [Rendering on Dark Background](#rendering-on-dark-background)
 * [Rendering in Databricks Notebooks](#rendering-in-databricks-notebooks)
 * [Integration With Your App](#integration-with-your-app)
-* [Integration with `console_script` Entrypoints](#integration-with-consolescript-entrypoints)
+* [Integration with `console_script` Entrypoints](#integration-with-console_script-entrypoints)
 * [Parallel Task Execution](#parallel-task-execution)
 * [Collecting Results](#collecting-results)
 * [Collecting Errors from Background Tasks](#collecting-errors-from-background-tasks)
@@ -35,7 +35,7 @@ Baseline for Databricks Labs projects written in Python. Sources are validated w
 * [Saving `@dataclass` configuration](#saving-dataclass-configuration)
 * [Saving CSV files](#saving-csv-files)
 * [Loading `@dataclass` configuration](#loading-dataclass-configuration)
-* [Brute-forcing `SerdeError` with `as_dict()` and `from_dict()`](#brute-forcing-serdeerror-with-asdict-and-fromdict)
+* [Brute-forcing `SerdeError` with `as_dict()` and `from_dict()`](#brute-forcing-serdeerror-with-as_dict-and-from_dict)
 * [Configuration Format Evolution](#configuration-format-evolution)
 * [Uploading Untyped Files](#uploading-untyped-files)
 * [Listing All Files in the Install Folder](#listing-all-files-in-the-install-folder)
@@ -48,6 +48,7 @@ Baseline for Databricks Labs projects written in Python. Sources are validated w
 * [Application Name Detection](#application-name-detection)
 * [Using `ProductInfo` with integration tests](#using-productinfo-with-integration-tests)
 * [Publishing Wheels to Databricks Workspace](#publishing-wheels-to-databricks-workspace)
+* [Publishing upstream dependencies to workspaces without Public Internet access](#publishing-upstream-dependencies-to-workspaces-without-public-internet-access)
 * [Databricks CLI's `databricks labs ...` Router](#databricks-clis-databricks-labs--router)
 * [Account-level Commands](#account-level-commands)
 * [Commands with interactive prompts](#commands-with-interactive-prompts)
@@ -933,7 +934,7 @@ This will print something like:

 You can also do `wheels.upload_to_dbfs()`, though you're not able to set any access control over it.

-### Publishing upstream dependencies to Databricks Workspace without Public Internet access
+### Publishing upstream dependencies to workspaces without Public Internet access

 Python wheel may have dependencies that are not included in the wheel itself. These dependencies are usually other Python packages that your wheel relies on. During installation on regular Databricks Workspaces, these dependencies get automatically fetched from [Python Package Index](https://pypi.org/).

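As a toy illustration of the selection this README section implies (not the library's actual code, and the wheel filenames below are made up), `upload_wheel_dependencies(prefixes)` picks out the dependency wheels whose names match one of the given prefixes while skipping the product's own wheel — the skip being exactly the `wheel.name == self._local_wheel.name` guard this commit repairs:

```python
def select_dependency_wheels(wheel_names, prefixes, local_wheel_name):
    """Sketch of prefix-based dependency selection (hypothetical helper)."""
    selected = []
    for name in wheel_names:
        if name == local_wheel_name:  # never re-upload the product's own wheel
            continue
        if any(name.startswith(prefix) for prefix in prefixes):
            selected.append(name)
    return selected


wheels = [
    "databricks_sdk-0.27.0-py3-none-any.whl",
    "requests-2.31.0-py3-none-any.whl",
    "blueprint-0.4.0-py3-none-any.whl",
]
print(select_dependency_wheels(wheels, ["databricks"], "blueprint-0.4.0-py3-none-any.whl"))
# prints ['databricks_sdk-0.27.0-py3-none-any.whl']
```

With an unstable version string, the local wheel's name no longer equals `local_wheel_name`, so it slips past the guard and gets uploaded twice — matching the behavior the integration test below asserts against.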
10 changes: 8 additions & 2 deletions src/databricks/labs/blueprint/wheels.py
@@ -13,6 +13,7 @@
 from contextlib import AbstractContextManager
 from dataclasses import dataclass
 from datetime import datetime, timezone
+from functools import cached_property
 from pathlib import Path

 from databricks.sdk import WorkspaceClient
@@ -229,7 +230,7 @@ def upload_to_wsfs(self) -> str:
         """Uploads the wheel to WSFS location of installation and returns the remote path."""
         with self._local_wheel.open("rb") as f:
             remote_wheel = self._installation.upload(f"wheels/{self._local_wheel.name}", f.read())
-        self._installation.save(Version(self._product_info.version(), remote_wheel, self._now_iso()))
+        self._installation.save(Version(self._current_version, remote_wheel, self._now_iso()))
         return remote_wheel

     def upload_wheel_dependencies(self, prefixes: list[str]) -> list[str]:
@@ -250,6 +251,11 @@ def upload_wheel_dependencies(self, prefixes: list[str]) -> list[str]:
             remote_paths.append(remote_wheel)
         return remote_paths

+    @cached_property
+    def _current_version(self):
+        # addresses double-uploaded bug for unreleased versions uploaded to airgapped workspaces
+        return self._product_info.version()
+
     @staticmethod
     def _now_iso():
         """Returns the current time in ISO format."""
@@ -303,7 +309,7 @@ def _override_version_to_unreleased(self, tmp_dir_path: Path):
         relative_version_file = self._product_info.version_file().relative_to(checkout_root)
         version_file = tmp_dir_path / relative_version_file
         with version_file.open("w") as f:
-            f.write(f'__version__ = "{self._product_info.version()}"')
+            f.write(f'__version__ = "{self._current_version}"')

     def _copy_root_to(self, tmp_dir: str | Path, dirs_exist_ok: bool = False):
         """Copies the root to a temporary directory."""
14 changes: 13 additions & 1 deletion tests/integration/test_wheels.py
@@ -19,4 +19,16 @@ def test_upload_dbfs(ws, new_installation):
     ws.dbfs.get_status(remote_wheel)


-# TODO: to add an integration test for upload_wheel_dependencies (currently getting an access issue to the test environment)
+def test_upload_upstreams(ws, new_installation):
+    product_info = ProductInfo.from_class(WheelsV2)
+    with WheelsV2(new_installation, product_info) as whl:
+        whl.upload_wheel_dependencies(["databricks"])
+
+        installation_files = new_installation.files()
+        # only Databricks SDK has to be uploaded
+        assert len(installation_files) == 1
+
+        whl.upload_to_wsfs()
+        installation_files = new_installation.files()
+        # SDK, Blueprint and version.json metadata
+        assert len(installation_files) == 3
