Support dev and staging workspaces #514


Merged
merged 8 commits into from
Jan 22, 2024

Conversation

@mgyucht (Contributor) commented Jan 19, 2024

Changes

Copies the DatabricksEnvironment abstraction from the Go SDK to the Python SDK. This enables using the SDK for Azure development and staging workspaces, which use an alternate Azure login application ID.

Additionally, this PR changes azure_environment to accept the same values as the Go SDK and Terraform providers for both Databricks and Azure, i.e. PUBLIC, USGOVERNMENT, and CHINA.

The first commit of this PR is a refactor moving config & credentials provider code to separate files, as core.py is getting quite big and difficult to maintain. The second one is much smaller and has just the net new bits.
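
To illustrate the abstraction described above, here is a minimal sketch of an environment lookup keyed on DNS-zone suffix. The class and list names mirror the Go SDK abstraction the PR copies, but the entries, the Azure application ID value, and the lookup helper are illustrative assumptions, not the SDK's actual code:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Cloud(Enum):
    AWS = "AWS"
    AZURE = "AZURE"
    GCP = "GCP"


@dataclass
class DatabricksEnvironment:
    cloud: Cloud
    dns_zone: str
    # Dev/staging Azure workspaces use an alternate login application ID.
    azure_application_id: Optional[str] = None


# Matched in order: more specific zones (dev/staging) come before the default.
ALL_ENVS = [
    DatabricksEnvironment(Cloud.AWS, ".dev.databricks.com"),
    DatabricksEnvironment(Cloud.AWS, ".staging.cloud.databricks.com"),
    DatabricksEnvironment(Cloud.AZURE, ".staging.azuredatabricks.net",
                          azure_application_id="hypothetical-staging-app-id"),
    DatabricksEnvironment(Cloud.AWS, ".cloud.databricks.com"),
]


def environment_for_host(host: str) -> DatabricksEnvironment:
    """Pick the environment whose DNS zone is a suffix of the workspace host."""
    for env in ALL_ENVS:
        if host.endswith(env.dns_zone):
            return env
    raise ValueError(f"no known environment for host {host!r}")
```

With this sketch, `environment_for_host("myws.staging.cloud.databricks.com")` would select the AWS staging entry rather than the default production one.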

Tests

  • Unit test for the new environment lookup code.
  • Manually tested Azure CLI auth against a staging Azure workspace.

@mgyucht force-pushed the support-dev-and-staging branch 3 times, most recently from a700edb to f33cdd6 on January 19, 2024 at 12:05
@mgyucht force-pushed the support-dev-and-staging branch from f33cdd6 to 1efaea2 on January 19, 2024 at 12:10
Comment on lines +288 to +291
super().__init__()
self._content = iter(content)

-    def iter_content(self, chunk_size: int = 1) -> Iterator[bytes]:
+    def iter_content(self, chunk_size: int = 1, decode_unicode=False) -> Iterator[bytes]:
@mgyucht (Contributor, Author):

This fixes some yellow squigglies in IntelliJ.

nfx previously requested changes Jan 19, 2024
from .retries import retried
from .version import __version__

__all__ = ['Config', 'DatabricksError']
@nfx (Contributor):

I'm relying on credentials_provider and a couple of other classes from core; don't break it. Please verify that https://github.com/databrickslabs/ucx still works before this PR is merged. If you make a PR to accommodate these changes, that's also fine.

@mgyucht (Contributor, Author):

Just added imports in core.py to make sure all credentials_provider.py and config.py classes are re-exported. Manually tested and was able to import those definitions after installing.

@mgyucht mgyucht requested a review from nfx January 19, 2024 14:51
ALL_ENVS = [
DatabricksEnvironment(Cloud.AWS, ".dev.databricks.com"),
DatabricksEnvironment(Cloud.AWS, ".staging.cloud.databricks.com"),
DatabricksEnvironment(Cloud.AWS, ".cloud.databricks.us"), DEFAULT_ENVIRONMENT,
Contributor:

Nit: put DEFAULT_ENVIRONMENT on its own line (maybe as the first entry).

@mgyucht (Contributor, Author) commented Jan 22, 2024:

This was done by our formatter, so I think I will leave this as is.

@@ -15,19 +15,15 @@ class AzureEnvironment:
ARM_DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"

ENVIRONMENTS = dict(
-    PUBLIC=AzureEnvironment(name="AzurePublicCloud",
+    PUBLIC=AzureEnvironment(name="PUBLIC",
Contributor:

From what I see, the name is not really used. Correct?

@mgyucht (Contributor, Author) commented Jan 22, 2024:

In general, users don't need to specify this if they specify the host name, since we can derive the Azure environment from the hostname. However, if users only specify the workspace resource ID and are deploying in either China or Govcloud, they need to specify either ARM_ENVIRONMENT or the azure_environment parameter in WorkspaceClient/AccountClient/Config.

This PR does change this behavior: before, you would specify AzurePublicCloud; now, you specify PUBLIC (or public), aligning with the Azure TF provider and with the Go SDK.

I added a test to cover this case.
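
The lookup behavior described in this thread can be sketched roughly as follows. The three keys (PUBLIC, USGOVERNMENT, CHINA) come from the PR description; the endpoint URLs are the standard Azure public/US Government/China endpoints, included here as assumptions for illustration, and the helper name is hypothetical:

```python
from dataclasses import dataclass


@dataclass
class AzureEnvironment:
    name: str
    resource_manager_endpoint: str
    active_directory_endpoint: str


# Keys now align with the Go SDK and Terraform providers.
ENVIRONMENTS = {
    "PUBLIC": AzureEnvironment("PUBLIC",
                               "https://management.azure.com/",
                               "https://login.microsoftonline.com/"),
    "USGOVERNMENT": AzureEnvironment("USGOVERNMENT",
                                     "https://management.usgovcloudapi.net/",
                                     "https://login.microsoftonline.us/"),
    "CHINA": AzureEnvironment("CHINA",
                              "https://management.chinacloudapi.cn/",
                              "https://login.chinacloudapi.cn/"),
}


def resolve_azure_environment(value: str) -> AzureEnvironment:
    """Case-insensitive lookup, so both "PUBLIC" and "public" work."""
    env = ENVIRONMENTS.get(value.upper())
    if env is None:
        raise ValueError(f"unknown azure_environment: {value}")
    return env
```

Under this sketch, a user deploying to Govcloud with only a workspace resource ID would pass `azure_environment="USGOVERNMENT"` (or `"usgovernment"`) to get the right ARM and AAD endpoints.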

@mgyucht (Contributor, Author):

Follow-up: I added back compatibility for the old style of environment names just in case, so customers specifying these parameters now will not need to change.
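
The backward compatibility mentioned here could be sketched as a small normalization step. The legacy-name mapping below is an assumption based on this comment, not the SDK's exact table:

```python
# Legacy long-form names mapped to the new short names
# (assumed mapping, for illustration).
_LEGACY_AZURE_ENV_NAMES = {
    "AzurePublicCloud": "PUBLIC",
    "AzureUSGovernmentCloud": "USGOVERNMENT",
    "AzureChinaCloud": "CHINA",
}


def normalize_azure_environment(value: str) -> str:
    """Accept both old-style ("AzurePublicCloud") and new-style ("PUBLIC") names."""
    if value in _LEGACY_AZURE_ENV_NAMES:
        return _LEGACY_AZURE_ENV_NAMES[value]
    return value.upper()
```

With this shim in front of the environment lookup, customers who already set the old-style names would not need to change anything.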

resource_manager_endpoint="https://management.microsoftazure.de/",
active_directory_endpoint="https://login.microsoftonline.de/"),
-    USGOVERNMENT=AzureEnvironment(name="AzureUSGovernmentCloud",
+    USGOVERNMENT=AzureEnvironment(name="USGOVERNMENT",
Contributor:

Why do we remove the German cloud?

@mgyucht (Contributor, Author):

This was added unnecessarily. Databricks does not run in the German cloud, and we don't include it in the Go SDK either. It has been purely in maintenance mode since 2018: https://learn.microsoft.com/en-us/azure/germany/.

@codecov-commenter commented Jan 22, 2024

Codecov Report

Attention: 167 lines in your changes are missing coverage. Please review.

Comparison is base (26d9eaa) 57.69% compared to head (67d5fb8) 57.77%.

| File | Patch % | Missing lines |
|------|---------|---------------|
| databricks/sdk/credentials_provider.py | 68.22% | 122 ⚠️ |
| databricks/sdk/config.py | 90.11% | 35 ⚠️ |
| databricks/sdk/environments.py | 69.69% | 10 ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #514      +/-   ##
==========================================
+ Coverage   57.69%   57.77%   +0.07%     
==========================================
  Files          40       43       +3     
  Lines       27148    27222      +74     
==========================================
+ Hits        15664    15728      +64     
- Misses      11484    11494      +10     


@mgyucht mgyucht enabled auto-merge January 22, 2024 10:18
@mgyucht mgyucht dismissed nfx’s stale review January 22, 2024 10:21

Cloned ucx, ran make test, installed this version of the SDK in the test virtualenv, and reran unit tests, which passed.

@mgyucht mgyucht added this pull request to the merge queue Jan 22, 2024
Merged via the queue into main with commit f0b23a5 Jan 22, 2024
@mgyucht mgyucht deleted the support-dev-and-staging branch January 22, 2024 10:25
hectorcast-db added a commit that referenced this pull request Jan 23, 2024
Bugfixes:

* Fix Databricks OAuth M2M on Azure ([#513](#513)).

Other noteworthy changes:

* Use `[]` instead of `None` as default list value for deserialising responses ([#361](#361)).
* Support dev and staging workspaces ([#514](#514)).

API Changes:

 * Added `exists()` method for [w.tables](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/tables.html) workspace-level service.
 * Added [w.lakehouse_monitors](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/lakehouse_monitors.html) workspace-level service.
 * Added `databricks.sdk.service.catalog.CreateMonitor` dataclass.
 * Added `databricks.sdk.service.catalog.DeleteLakehouseMonitorRequest` dataclass.
 * Added `databricks.sdk.service.catalog.ExistsRequest` dataclass.
 * Added `databricks.sdk.service.catalog.GetLakehouseMonitorRequest` dataclass.
 * Added `databricks.sdk.service.catalog.MonitorCronSchedule` dataclass.
 * Added `databricks.sdk.service.catalog.MonitorCronSchedulePauseStatus` dataclass.
 * Added `databricks.sdk.service.catalog.MonitorCustomMetric` dataclass.
 * Added `databricks.sdk.service.catalog.MonitorCustomMetricType` dataclass.
 * Added `databricks.sdk.service.catalog.MonitorDataClassificationConfig` dataclass.
 * Added `databricks.sdk.service.catalog.MonitorDestinations` dataclass.
 * Added `databricks.sdk.service.catalog.MonitorInferenceLogProfileType` dataclass.
 * Added `databricks.sdk.service.catalog.MonitorInferenceLogProfileTypeProblemType` dataclass.
 * Added `databricks.sdk.service.catalog.MonitorInfo` dataclass.
 * Added `databricks.sdk.service.catalog.MonitorInfoStatus` dataclass.
 * Added `databricks.sdk.service.catalog.MonitorNotificationsConfig` dataclass.
 * Added `databricks.sdk.service.catalog.MonitorTimeSeriesProfileType` dataclass.
 * Added `databricks.sdk.service.catalog.TableExistsResponse` dataclass.
 * Added `databricks.sdk.service.catalog.UpdateMonitor` dataclass.
 * Changed `create_obo_token()` method for [w.token_management](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/token_management.html) workspace-level service with new required argument order.
 * Changed `get()` method for [w.token_management](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/token_management.html) workspace-level service to return `databricks.sdk.service.settings.GetTokenResponse` dataclass.
 * Changed `lifetime_seconds` field for `databricks.sdk.service.settings.CreateOboTokenRequest` to no longer be required.
 * Added `databricks.sdk.service.settings.GetTokenResponse` dataclass.

OpenAPI SHA: e05401ed5dd4974c5333d737ec308a7d451f749f, Date: 2024-01-23
@hectorcast-db hectorcast-db mentioned this pull request Jan 23, 2024
github-merge-queue bot pushed a commit that referenced this pull request Jan 23, 2024
(Same changelog as the commit above.)
4 participants