Extend service principal migration to create storage credentials for access connectors created for each storage account (#1426)

## Changes
Extend the service principal migration to create storage credentials for the access connectors created for each storage account.
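
In practice, the new step boils down to one SDK call per access connector: create a UC storage credential backed by the connector's managed identity. A minimal sketch using the Databricks SDK (the connector name and resource ID below are hypothetical placeholders; real values come from the Azure resource manager):

```
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import AzureManagedIdentityRequest

ws = WorkspaceClient()

# Hypothetical stand-ins: UCX derives these from the access connector it creates
# for each storage account (connectors are named "ac-<storage account>").
access_connector_name = "ac-mystorageaccount"
access_connector_id = (
    "/subscriptions/<subscription>/resourceGroups/<group>/providers/"
    "Microsoft.Databricks/accessConnectors/ac-mystorageaccount"
)

# Create a UC storage credential backed by the connector's managed identity,
# mirroring _create_storage_credentials_for_storage_accounts in the diff below.
storage_credential = ws.storage_credentials.create(
    access_connector_name,
    azure_managed_identity=AzureManagedIdentityRequest(access_connector_id),
    comment="Created by UCX",
    read_only=False,  # connectors are granted Storage Blob Data Contributor
)
```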

### Linked issues

Resolves #1384 
Resolves #875

### To do
- [x] Update docs on creating access connectors. 
- [x] First prompt for creating credentials for access connectors, then
for service principals
- [x] Use the storage credentials in [external
locations](https://github.com/databrickslabs/ucx/blob/main/src/databricks/labs/ucx/azure/locations.py#L65)

### Functionality 

- [ ] added relevant user documentation
- [ ] added new CLI command
- [x] modified existing command: `databricks labs ucx migrate-credentials`
- [ ] added a new workflow
- [ ] modified existing workflow: `...`
- [ ] added a new table
- [ ] modified existing table: `...`

### Tests

- [x] manually tested
- [x] added unit tests
- [x] added integration tests
- [ ] verified on staging environment (screenshot attached)

---------

Co-authored-by: Serge Smertin <259697+nfx@users.noreply.github.com>
Co-authored-by: vuong-nguyen <44292934+nkvuong@users.noreply.github.com>
3 people authored May 6, 2024
1 parent c344dae commit 93fb559
Showing 13 changed files with 634 additions and 165 deletions.
24 changes: 14 additions & 10 deletions README.md
@@ -697,16 +697,20 @@ This command is one of prerequisites for the [table migration workflow](#table-m
databricks labs ucx migrate-credentials
```

For Azure, this command migrates Azure Service Principals, which have `Storage Blob Data Contributor`,
`Storage Blob Data Reader`, `Storage Blob Data Owner` roles on ADLS Gen2 locations that are being used in
Databricks, to UC storage credentials. The Azure Service Principal to location mapping is listed
in `/Users/{user_name}/.ucx/azure_storage_account_info.csv` which is generated
by [`principal-prefix-access` command](#principal-prefix-access-command).
Please review the file and delete the Service Principals you do not want to be migrated.
The command will only migrate the Service Principals that have a client secret stored in a Databricks secret.

**Warning**: Service principals used to access storage accounts behind firewalls might cause connectivity issues. We
recommend using access connectors instead.
For Azure, this command prompts you to confirm the following credential migration steps:
1. [RECOMMENDED] For each storage account, create an access connector with a managed identity that has the
`Storage Blob Data Contributor` role on the respective storage account. A storage credential is created for each
access connector.
2. Migrate Azure Service Principals, which have `Storage Blob Data Contributor`,
`Storage Blob Data Reader`, or `Storage Blob Data Owner` roles on ADLS Gen2 locations that are being used in
Databricks, to UC storage credentials. The Azure Service Principal to location mapping is listed
in `/Users/{user_name}/.ucx/azure_storage_account_info.csv`, which is generated by the
[`principal-prefix-access` command](#principal-prefix-access-command). Please review the file and delete the Service
Principals you do not want migrated. The command will only migrate the Service Principals that have a client
secret stored in a Databricks secret.

**Warning**: Service principals used to access storage accounts behind firewalls might cause connectivity issues. We
recommend using access connectors instead.

Once you're done with this command, run the [`validate-external-locations` command](#validate-external-locations-command).
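
For example, a typical sequence of commands around this step might look as follows (`principal-prefix-access` generates the mapping file reviewed above):

```
databricks labs ucx principal-prefix-access        # generate azure_storage_account_info.csv
databricks labs ucx migrate-credentials            # create UC storage credentials
databricks labs ucx validate-external-locations    # validate the resulting external locations
```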

18 changes: 15 additions & 3 deletions src/databricks/labs/ucx/azure/access.py
@@ -198,7 +198,7 @@ def create_uber_principal(self, prompts: Prompts):

def _create_access_connector_for_storage_account(
self, storage_account: StorageAccount, role_name: str = "STORAGE_BLOB_DATA_READER"
) -> AccessConnector:
) -> tuple[AccessConnector, str]:
access_connector = self._azurerm.create_or_update_access_connector(
storage_account.id.subscription_id,
storage_account.id.resource_group,
@@ -208,9 +208,21 @@ def _create_access_connector_for_storage_account(
wait_for_provisioning=True,
)
self._apply_storage_permission(access_connector.principal_id, role_name, storage_account)
return access_connector

def create_access_connectors_for_storage_accounts(self) -> list[AccessConnector]:
container = next(self._azurerm.containers(storage_account.id), None)
if container is None:
url = f"abfss://{storage_account.name}.dfs.core.windows.net/"
else:
url = f"abfss://{container.container}@{container.storage_account}.dfs.core.windows.net/"

return access_connector, url

def create_access_connectors_for_storage_accounts(self) -> list[tuple[AccessConnector, str]]:
"""Create access connectors for storage accounts
Returns:
list[AccessConnector, str] : The access connectors with a storage url to which it has access.
"""
used_storage_accounts = self._get_storage_accounts()
if len(used_storage_accounts) == 0:
logger.warning(
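
The tuple returned above pairs each access connector with the storage URL later used to validate its credential. A standalone sketch of the URL derivation (hypothetical helper name; the logic mirrors the diff):

```
# Hypothetical helper mirroring the URL derivation above (not part of the diff).
def storage_root_url(storage_account_name: str, container_name: str | None) -> str:
    """Return the abfss:// URL the access connector's credential is validated against."""
    if container_name is None:
        # No container found: fall back to the account-level URL, as above.
        return f"abfss://{storage_account_name}.dfs.core.windows.net/"
    return f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/"

# e.g. storage_root_url("mystorageaccount", "landing")
# -> "abfss://landing@mystorageaccount.dfs.core.windows.net/"
```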
135 changes: 89 additions & 46 deletions src/databricks/labs/ucx/azure/credentials.py
@@ -1,11 +1,13 @@
import logging
from dataclasses import dataclass
from dataclasses import dataclass, field

from databricks.labs.blueprint.installation import Installation
from databricks.labs.blueprint.tui import Prompts
from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import BadRequest
from databricks.sdk.errors.platform import InvalidParameterValue
from databricks.sdk.service.catalog import (
AzureManagedIdentityRequest,
AzureServicePrincipal,
Privilege,
StorageCredentialInfo,
@@ -33,19 +35,42 @@ class ServicePrincipalMigrationInfo:
class StorageCredentialValidationResult:
name: str
application_id: str
read_only: bool
read_only: bool | None
validated_on: str
directory_id: str | None = None
failures: list[str] | None = None
directory_id: str | None = None # str when storage credential created for AzureServicePrincipal
failures: list[str] = field(default_factory=list)

@classmethod
def from_validation(cls, permission_mapping: StoragePermissionMapping, failures: list[str] | None):
def _get_application_and_directory_id(
cls, storage_credential_info: StorageCredentialInfo
) -> tuple[str, str | None]:
if storage_credential_info.azure_service_principal is not None:
application_id = storage_credential_info.azure_service_principal.application_id
directory_id = storage_credential_info.azure_service_principal.directory_id
return application_id, directory_id

if storage_credential_info.azure_managed_identity is not None:
if storage_credential_info.azure_managed_identity.managed_identity_id is not None:
return storage_credential_info.azure_managed_identity.managed_identity_id, None
return storage_credential_info.azure_managed_identity.access_connector_id, None

raise KeyError("Storage credential info is missing an application id.")

@classmethod
def from_storage_credential_info(
cls,
storage_credential_info: StorageCredentialInfo,
validated_on: str,
failures: list[str],
):
assert storage_credential_info.name is not None
application_id, directory_id = cls._get_application_and_directory_id(storage_credential_info)
return cls(
permission_mapping.principal,
permission_mapping.client_id,
permission_mapping.privilege == Privilege.READ_FILES.value,
permission_mapping.prefix,
permission_mapping.directory_id,
storage_credential_info.name,
application_id,
storage_credential_info.read_only,
validated_on,
directory_id,
failures,
)

@@ -104,30 +129,31 @@ def create_with_client_secret(self, spn: ServicePrincipalMigrationInfo) -> Stora
read_only=spn.permission_mapping.privilege == Privilege.READ_FILES.value,
)

def validate(self, permission_mapping: StoragePermissionMapping) -> StorageCredentialValidationResult:
def validate(self, storage_credential_info: StorageCredentialInfo, url: str) -> StorageCredentialValidationResult:
try:
validation = self._ws.storage_credentials.validate(
storage_credential_name=permission_mapping.principal,
url=permission_mapping.prefix,
read_only=permission_mapping.privilege == Privilege.READ_FILES.value,
storage_credential_name=storage_credential_info.name,
url=url,
read_only=storage_credential_info.read_only,
)
except InvalidParameterValue:
logger.warning(
"There is an existing external location overlaps with the prefix that is mapped to "
"the service principal and used for validating the migrated storage credential. "
"Skip the validation"
)
return StorageCredentialValidationResult.from_validation(
permission_mapping,
return StorageCredentialValidationResult.from_storage_credential_info(
storage_credential_info,
url,
[
"The validation is skipped because an existing external location overlaps "
"with the location used for validation."
],
)

if not validation.results:
return StorageCredentialValidationResult.from_validation(
permission_mapping, ["Validation returned no results."]
return StorageCredentialValidationResult.from_storage_credential_info(
storage_credential_info, url, ["Validation returned no results."]
)

failures = []
Expand All @@ -136,7 +162,8 @@ def validate(self, permission_mapping: StoragePermissionMapping) -> StorageCrede
continue
if result.result == ValidationResultResult.FAIL:
failures.append(f"{result.operation.value} validation failed with message: {result.message}")
return StorageCredentialValidationResult.from_validation(permission_mapping, None if not failures else failures)

return StorageCredentialValidationResult.from_storage_credential_info(storage_credential_info, url, failures)


class ServicePrincipalMigration(SecretsMixin):
@@ -238,53 +265,69 @@ def _migrate_service_principals(
f"'{spn.permission_mapping.prefix}' with non-Allow network configuration"
)

self._storage_credential_manager.create_with_client_secret(spn)
execution_result.append(self._storage_credential_manager.validate(spn.permission_mapping))

storage_credential_info = self._storage_credential_manager.create_with_client_secret(spn)
validation_results = self._storage_credential_manager.validate(
storage_credential_info,
spn.permission_mapping.prefix,
)
execution_result.append(validation_results)
return execution_result

def _create_access_connectors_for_storage_accounts(self) -> list[StorageCredentialValidationResult]:
self._resource_permissions.create_access_connectors_for_storage_accounts()
return []
def _create_storage_credentials_for_storage_accounts(self) -> list[StorageCredentialValidationResult]:
access_connectors = self._resource_permissions.create_access_connectors_for_storage_accounts()

execution_results = []
for access_connector, url in access_connectors:
storage_credential_info = self._ws.storage_credentials.create(
access_connector.name,
azure_managed_identity=AzureManagedIdentityRequest(str(access_connector.id)),
comment="Created by UCX",
read_only=False, # Access connectors get "STORAGE_BLOB_DATA_CONTRIBUTOR" permissions
)
try:
validation_results = self._storage_credential_manager.validate(storage_credential_info, url)
except BadRequest:
logger.warning(f"Could not validate storage credential {storage_credential_info.name} for url {url}")
else:
execution_results.append(validation_results)

return execution_results

def run(self, prompts: Prompts, include_names: set[str] | None = None) -> list[StorageCredentialValidationResult]:
plan_confirmed = prompts.confirm(
"Above Azure Service Principals will be migrated to UC storage credentials, please review and confirm."
"[RECOMMENDED] Please confirm to create an access connector with a managed identity for each storage "
"account."
)
sp_results = []
ac_results = []
if plan_confirmed:
sp_migration_infos = self._generate_migration_list(include_names)
plan_confirmed = True
if any(spn.permission_mapping.default_network_action != "Allow" for spn in sp_migration_infos):
plan_confirmed = prompts.confirm(
"At least one Azure Service Principal accesses a storage account with non-Allow default network "
"configuration, which might cause connectivity issues. We recommend using Databricks Access "
"Connectors instead (next prompt). Would you like to continue with migrating the service "
"principals?"
)
if plan_confirmed:
sp_results = self._migrate_service_principals(sp_migration_infos)
ac_results = self._create_storage_credentials_for_storage_accounts()

sp_migration_infos = self._generate_migration_list(include_names)
if any(spn.permission_mapping.default_network_action != "Allow" for spn in sp_migration_infos):
logger.warning(
"At least one Azure Service Principal accesses a storage account with non-Allow default network "
"configuration, which might cause connectivity issues. We recommend using Databricks Access "
"Connectors instead"
)
plan_confirmed = prompts.confirm(
"[RECOMMENDED] Please confirm to create an access connector with a managed identity for each storage "
"account."
"Above Azure Service Principals will be migrated to UC storage credentials, please review and confirm."
)
ac_results = []
sp_results = []
if plan_confirmed:
ac_results = self._create_access_connectors_for_storage_accounts()
sp_results = self._migrate_service_principals(sp_migration_infos)

execution_results = sp_results + ac_results
execution_results = ac_results + sp_results
if execution_results:
results_file = self.save(execution_results)
logger.info(
"Completed migration from Azure Service Principal to UC Storage credentials "
"and creation of Databricks Access Connectors for storage accounts "
"and creation of UC Storage credentials for storage access with Databricks Access Connectors. "
f"Please check {results_file} for validation results"
)
else:
logger.info(
"No Azure Service Principal migrated to UC Storage credentials "
"nor Databricks Access Connectors created for storage accounts"
"No UC Storage credentials created during Azure Service Principal migration "
"nor for storage access with Databricks Access Connectors."
)

return execution_results
27 changes: 27 additions & 0 deletions src/databricks/labs/ucx/azure/locations.py
@@ -79,6 +79,33 @@ def _prefix_credential_name_mapping(self) -> tuple[dict[str, str], dict[str, str
continue
if permission_mapping.client_id in app_id_mapping_read:
prefix_mapping_read[permission_mapping.prefix] = app_id_mapping_read[permission_mapping.client_id]

all_storage_accounts = list(self._azurerm.storage_accounts())
for storage_credential in self._ws.storage_credentials.list():
# Filter storage credentials for access connectors created by UCX
if not (
storage_credential.name is not None
and storage_credential.name.startswith("ac-")
and storage_credential.comment is not None
and storage_credential.comment == "Created by UCX"
):
continue

storage_account_name = storage_credential.name.removeprefix("ac-")
storage_accounts = [st for st in all_storage_accounts if st.name == storage_account_name]
if len(storage_accounts) == 0:
logger.warning(
f"Storage account {storage_account_name} for access connector {storage_credential.name} not found, "
"therefore, not able to create external locations for this storage account using the access "
"connector."
)
continue

for container in self._azurerm.containers(storage_accounts[0].id):
storage_url = f"abfss://{container.container}@{container.storage_account}.dfs.core.windows.net/"
# UCX assigns created access connectors the "STORAGE_BLOB_DATA_CONTRIBUTOR" role on the storage account
prefix_mapping_write[storage_url] = storage_credential.name

return prefix_mapping_write, prefix_mapping_read

def _create_location_name(self, location_url: str) -> str:
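
Note that the filter above recognizes UCX-created credentials purely by convention: the `ac-` name prefix and the `Created by UCX` comment. A standalone sketch of that check (hypothetical function name):

```
from databricks.sdk.service.catalog import StorageCredentialInfo

# Hypothetical predicate mirroring the filter above: UCX names these credentials
# "ac-<storage account>" and tags them with the comment "Created by UCX".
def is_ucx_access_connector_credential(credential: StorageCredentialInfo) -> bool:
    return (
        credential.name is not None
        and credential.name.startswith("ac-")
        and credential.comment == "Created by UCX"
    )

# The storage account is then recovered from the credential name:
# storage_account_name = credential.name.removeprefix("ac-")
```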
16 changes: 9 additions & 7 deletions src/databricks/labs/ucx/cli.py
@@ -301,13 +301,15 @@ def create_missing_principals(

@ucx.command
def migrate_credentials(w: WorkspaceClient, prompts: Prompts, ctx: WorkspaceContext | None = None, **named_parameters):
"""For Azure, this command migrates Azure Service Principals, which have Storage Blob Data Contributor,
Storage Blob Data Reader, Storage Blob Data Owner roles on ADLS Gen2 locations that are being used in
Databricks, to UC storage credentials.
The Azure Service Principals to location mapping are listed in
{install_folder}/.ucx/azure_storage_account_info.csv which is generated by principal_prefix_access command.
Please review the file and delete the Service Principals you do not want to be migrated.
The command will only migrate the Service Principals that have client secret stored in Databricks Secret.
"""For Azure, this command prompts to i) create UC storage credentials for the access connectors with a
managed identity created for each storage account present in the ADLS Gen2 locations, the access connectors are
granted Storage Blob Data Contributor permissions on their corresponding storage account, to prepare for adopting to
use Databricks' best practice for using access connectors to authenticate with external storage, and ii) to migrate
Azure Service Principals, which have Storage Blob Data Contributor, Storage Blob Data Reader, Storage Blob Data
Owner roles on ADLS Gen2 locations that are being used in Databricks, to UC storage credentials. The Azure Service
Principals to location mapping are listed in {install_folder}/.ucx/azure_storage_account_info.csv which is generated
by principal_prefix_access command. Please review the file and delete the Service Principals you do not want to be
migrated. The command will only migrate the Service Principals that have client secret stored in Databricks Secret.
For AWS, this command migrates AWS Instance Profiles that are being used in Databricks, to UC storage credentials.
The AWS Instance Profiles to location mapping are listed in
{install_folder}/.ucx/aws_instance_profile_info.csv which is generated by principal_prefix_access command.
24 changes: 24 additions & 0 deletions tests/integration/azure/conftest.py
@@ -0,0 +1,24 @@
import pytest


def delete_ucx_created_resources(api_client):
"""Delete UCX created resources"""
for resource in api_client.list():
if resource.name is not None and resource.comment is not None and resource.comment == "Created by UCX":
api_client.delete(resource.name, force=True)


@pytest.fixture
def clean_storage_credentials(az_cli_ctx):
"""Clean test generated storage credentials."""
delete_ucx_created_resources(az_cli_ctx.workspace_client.storage_credentials)
yield
delete_ucx_created_resources(az_cli_ctx.workspace_client.storage_credentials)


@pytest.fixture
def clean_external_locations(az_cli_ctx):
"""Clean test generated external locations."""
delete_ucx_created_resources(az_cli_ctx.workspace_client.external_locations)
yield
delete_ucx_created_resources(az_cli_ctx.workspace_client.external_locations)
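
A sketch of how these fixtures might be wired into an integration test (hypothetical test name and body; the fixture deletes resources commented `Created by UCX` both before and after the test body):

```
def test_no_leftover_ucx_storage_credentials(az_cli_ctx, clean_storage_credentials):
    ws = az_cli_ctx.workspace_client
    # The fixture already removed UCX-created credentials before this body runs,
    # so none should be present; any created during the test are removed afterwards.
    leftovers = [c for c in ws.storage_credentials.list() if c.comment == "Created by UCX"]
    assert not leftovers
```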