[FEATURE]: Create UC Schema and Table Grants based on Legacy Table ACLs #340

Closed
Tracked by #333
nfx opened this issue Sep 29, 2023 · 1 comment · Fixed by #1054
Assignees
Labels
enhancement New feature or request migrate/external go/uc/upgrade SYNC EXTERNAL TABLES step migrate/managed go/uc/upgrade Upgrade Managed Tables and Jobs

Comments

@nfx
Collaborator

nfx commented Sep 29, 2023

Background

Customers upgrading to Unity Catalog (UC) find that their legacy table ACLs are incompatible with UC, and they want those legacy ACLs migrated to UC.

Upstream dependencies:

Privilege Model Differences

Table ACLs are permissive by default, whereas UC is restrictive by default.

Specifically, if no grant exists on a table in UC, you do not have access to it. If a grant exists that matches your user, you do.

Table ACLs work the opposite way. By default, you have access to all objects unless a grant exists on that table that does not give you access.

Table ACLs also support DENY, which does not exist in UC.

Privilege Model Map Hive To UC

| Hive Metastore Privilege | Intended Functional Action | Object types | UC Metastore Privilege |
| --- | --- | --- | --- |
| SELECT | gives read access to an object. | Catalog, schema, table, view | SELECT |
| CREATE | gives ability to create an object (for example, a table in a schema). | schema, table, view | CREATE |
| MODIFY | gives ability to add, delete, and modify data to or from an object. | schema, table, view | MODIFY |
| USAGE | does not give any abilities, but is an additional requirement to perform any action on a schema object | schema, table, view | USAGE |
| READ_METADATA | gives ability to view an object and its metadata. | schema, table, view | BROWSE |
| CREATE_NAMED_FUNCTION | gives ability to create a named UDF in an existing catalog or schema. | function | CREATE FUNCTION |
| MODIFY_CLASSPATH | gives ability to add files to the Spark class path. | - | does not translate |
| ALL PRIVILEGES | gives all privileges (is translated into all the above privileges). | schema, table | ALL PRIVILEGES |

READ_METADATA translates to the BROWSE privilege in UC, which can be granted on all objects. BROWSE is in preview and must be enabled for any customer who relies on this mapping.
CREATE_NAMED_FUNCTION translates to CREATE FUNCTION in UC.
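
As a rough illustration, the mapping table above can be written as a plain lookup. This is only a sketch of the proposal in this issue, not the code in `grants.py`: the names `HIVE_TO_UC_PRIVILEGE` and `translate_privilege` are hypothetical, and `None` is an assumed way of marking a privilege that does not translate.

```python
from typing import Optional

# Hypothetical lookup built from the mapping table in this issue.
# None marks a Hive privilege that the table says does not translate.
HIVE_TO_UC_PRIVILEGE: dict[str, Optional[str]] = {
    "SELECT": "SELECT",
    "CREATE": "CREATE",
    "MODIFY": "MODIFY",
    "USAGE": "USAGE",
    "READ_METADATA": "BROWSE",
    "CREATE_NAMED_FUNCTION": "CREATE FUNCTION",
    "MODIFY_CLASSPATH": None,
    "ALL PRIVILEGES": "ALL PRIVILEGES",
}


def translate_privilege(hive_privilege: str) -> Optional[str]:
    """Return the UC privilege for a Hive privilege, or None if it does not translate."""
    key = hive_privilege.strip().upper()
    if key not in HIVE_TO_UC_PRIVILEGE:
        # Fail loudly on anything the table does not cover.
        raise ValueError(f"unknown Hive privilege: {hive_privilege!r}")
    return HIVE_TO_UC_PRIVILEGE[key]
```

Raising on unknown privileges, rather than skipping them silently, is a design choice of this sketch so that gaps in the mapping surface during migration.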

Port the mapping to https://github.com/databrickslabs/ucx/blob/main/src/databricks/labs/ucx/hive_metastore/grants.py#L93-L111

WIP

Dealing with 2-level to 3-level namespace changes

If a user has USAGE access to a schema, then they also need access to the translated catalog in UC.
All USAGE on schemas should be translated to USAGE on the schema in UC, as well as USAGE on the containing catalog.
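
A minimal sketch of this rule, assuming the schema keeps its name when it moves under the target catalog. The `UcGrant` dataclass and `translate_schema_usage` function are hypothetical names used for illustration only, and the privilege name `USAGE` is taken from this issue as written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UcGrant:
    """Hypothetical record describing one grant to create in UC."""
    principal: str
    privilege: str
    object_type: str
    object_name: str


def translate_schema_usage(principal: str, hms_schema: str, target_catalog: str) -> list[UcGrant]:
    """Expand one HMS schema-level USAGE grant into the two UC grants described above."""
    return [
        # USAGE on the containing catalog, so the schema is reachable at all.
        UcGrant(principal, "USAGE", "CATALOG", target_catalog),
        # USAGE on the schema itself (assumption: schema name is unchanged).
        UcGrant(principal, "USAGE", "SCHEMA", f"{target_catalog}.{hms_schema}"),
    ]
```

For example, `translate_schema_usage("analysts", "sales", "main")` yields one catalog-level grant on `main` and one schema-level grant on `main.sales`.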

Recommended Migration Approach

Collect all ACLs (GRANT and DENY) on tables, views, and schemas.
Generate a distinct list of the HMS objects that do not appear in that set of ACLs.

For each object in HMS that does not have an ACL (GRANT or DENY) on it directly:

  • If the object is contained in an HMS schema that has a USAGE grant, and that USAGE grant is not applied to all users, treat the table as hidden from everyone except the groups or users in that ACL.
    • Provide a BROWSE ACL on the schema for every user and group that has USAGE on it, since the two mean the same thing in this context.
    • Translate the remaining grants according to the map.
  • If the object is contained in a schema that has a DENY on it, grant USAGE on the schema only to the members identified.
  • If the object is contained in a schema that does not have a USAGE grant, treat the table as accessible by all users.
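
The three cases above could be sketched as a small classifier over the ACLs of the containing schema. Everything here is an assumption made for illustration: the `SchemaAcl` and `Visibility` types, the `"users"` name for the all-users group, evaluating the cases in the order they are listed, and reading "members identified" as the principals that hold a USAGE grant. The issue does not settle any of these points.

```python
from dataclasses import dataclass
from enum import Enum

ALL_USERS = "users"  # assumption: name of the group that contains every user


class Visibility(Enum):
    RESTRICTED = "restricted"    # hidden except for the listed principals
    DENY_SCOPED = "deny_scoped"  # schema carries a DENY
    OPEN = "open"                # accessible by all users


@dataclass(frozen=True)
class SchemaAcl:
    """Hypothetical record for one ACL entry on an HMS schema."""
    principal: str
    action: str     # "GRANT" or "DENY"
    privilege: str  # e.g. "USAGE", "SELECT"


def classify_object_without_acl(schema_acls: list[SchemaAcl]) -> tuple[Visibility, list[str]]:
    """Classify an object with no direct ACL from its schema's ACLs.

    Returns the case that applies and the principals that should keep access.
    """
    usage_principals = sorted(
        {a.principal for a in schema_acls if a.action == "GRANT" and a.privilege == "USAGE"}
    )
    # Case 1: USAGE exists but is not granted to all users.
    if usage_principals and ALL_USERS not in usage_principals:
        return Visibility.RESTRICTED, usage_principals
    # Case 2: the schema has a DENY on it.
    if any(a.action == "DENY" for a in schema_acls):
        return Visibility.DENY_SCOPED, usage_principals
    # Case 3: no restricting USAGE grant, so everyone has access.
    return Visibility.OPEN, []
```

In the `RESTRICTED` case the returned principals are the ones that would receive BROWSE on the schema, with their other grants translated through the privilege map.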

For each ACL on an object

@nfx nfx added migrate/external go/uc/upgrade SYNC EXTERNAL TABLES step migrate/managed go/uc/upgrade Upgrade Managed Tables and Jobs labels Sep 29, 2023
@nfx
Collaborator Author

nfx commented Oct 2, 2023

I think we need to introduce another step in the workflow here.

https://github.com/databrickslabs/ucx/blob/main/src/databricks/labs/ucx/hive_metastore/grants.py#L93-L111 holds the current version of the mapping, which may need to be adjusted.

@pritishpai pritishpai self-assigned this Oct 17, 2023
@nfx nfx added the enhancement New feature or request label Nov 7, 2023
@nfx nfx changed the title Migrate Legacy Table ACLs to UC Grants [FEATURE]: Create UC Schema and Table Grants based on Legacy Table ACLs Mar 13, 2024
@nfx nfx closed this as completed in #1054 Mar 21, 2024
nfx pushed a commit that referenced this issue Mar 21, 2024
## Changes
<!-- Summary of your changes that are easy to understand. Add
screenshots when necessary -->

### Linked issues
<!-- DOC: Link issue with a keyword: close, closes, closed, fix, fixes,
fixed, resolve, resolves, resolved. See
https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword
-->

Resolves #340
To be followed up with PRs for  #887 #907

### Functionality 

- [x] modified existing workflow: `migrate-tables`

### Tests
<!-- How is this tested? Please see the checklist below and also
describe any other relevant tests -->

- [x] manually tested
- [x] added unit tests
- [x] added integration tests
- [x] verified on staging environment (screenshot attached)
nfx added a commit that referenced this issue Mar 21, 2024
* Added Legacy Table ACL grants migration ([#1054](#1054)). This commit introduces a legacy table ACL grants migration to the `migrate-tables` workflow, resolving issue [#340](#340) and paving the way for follow-up PRs [#887](#887) and [#907](#907). A new `GrantsCrawler` class is added for crawling grants, along with a `GroupManager` class to manage groups during migration. The `TablesMigrate` class is updated to accept an instance of `GrantsCrawler` and `GroupManager` in its constructor. The migration process has been thoroughly tested with unit tests, integration tests, and manual testing on a staging environment. The changes include the addition of a new Enum class `AclMigrationWhat` and updates to the `Table` dataclass, and affect the way tables are selected for migration based on rules. The logging and error handling have been improved in the `skip_schema` function.
* Added `databricks labs ucx cluster-remap` command to remap legacy cluster configurations to UC-compatible ([#994](#994)). In this open-source library update, we have developed and added the `databricks labs ucx cluster-remap` command, which facilitates the remapping of legacy cluster configurations to UC-compatible ones. This new CLI command comes with user documentation to guide the cluster remapping process. Additionally, we have expanded the functionality of creating and managing UC external catalogs and schemas with the inclusion of `create-catalogs-schemas` and `revert-cluster-remap` commands. This change does not modify existing commands or workflows and does not introduce new tables. The `databricks labs ucx cluster-remap` command allows users to re-map and revert the re-mapping of clusters from Unity Catalog (UC) using the CLI, ensuring compatibility and streamlining the migration process. The new command and associated functions have been manually tested for functionality.
* Added `migrate-tables` workflow ([#1051](#1051)). The `migrate-tables` workflow has been added, which allows for more fine-grained control over the resources allocated to the workspace. This workflow includes two new instance variables `min_workers` and `max_workers` in the `WorkspaceConfig` class, with default values of 1 and 10 respectively. A new `trigger` function has also been introduced, which initializes a configuration, SQL backend, and WorkspaceClient based on the provided configuration file. The `run_task` function has been added, which looks up the specified task, logs relevant information, and runs the task's function with the provided arguments. The `Task` class's `fn` attribute now includes an `Installation` object as a parameter. Additionally, a new `migrate-tables` workflow has been added for migrating tables from the Hive Metastore to the Unity Catalog, along with new classes and methods for table mapping, migration status refreshing, and migrating tables. The `migrate_dbfs_root_delta_tables` and `migrate_external_tables_sync` methods perform migrations for Delta tables located in the DBFS root and synchronize external tables, respectively. These functions use the workspace client to access the catalogs and ensure proper migration. Integration tests have also been added for these new methods to ensure their correct operation.
* Added handling for `SYNC` command failures ([#1073](#1073)). This pull request introduces changes to improve handling of `SYNC` command failures during external table migrations in the Hive metastore. Previously, the `SYNC` command's result was not checked, and failures were not logged. Now, the `_migrate_external_table` method in `table_migrate.py` fetches the result of the `SYNC` command execution, logs a warning message for failures, and returns `False` if the command fails. A new integration test has been added to simulate a failed `SYNC` command due to a non-existent catalog and schema, ensuring the migration tool handles such failures. A new test case has also been added to verify the handling of `SYNC` command failures during external table migrations, using a mock backend to simulate failures and checking for appropriate log messages. These changes enhance the reliability and robustness of the migration process, providing clearer error diagnosis and handling for potential `SYNC` command failures.
* Added initial version of `databricks labs ucx migrate-local-code` command ([#1067](#1067)). A new `databricks labs ucx migrate-local-code` command has been added to facilitate migration of local code to a Databricks environment, specifically targeting Python and SQL files. This initial version is experimental and aims to help users and administrators manage code migration, maintain consistency across workspaces, and enhance compatibility with the Unity Catalog, a component of Databricks' data and AI offerings. The command introduces a new `Files` class for applying migrations to code files, considering their language. It also updates the `.gitignore` file and the pyproject.toml file to ensure appropriate version control management. Additionally, new classes and methods have been added to support code analysis, transformation, and linting for various programming languages. These improvements will aid in streamlining the migration process and ensuring compatibility with Databricks' environment.
* Added instance pool to cluster policy ([#1078](#1078)). A new field, `instance_pool_id`, has been added to the cluster policy configuration in `policy.py`, allowing users to specify the ID of an instance pool to be applied to all workflow clusters in the policy. This ID can be manually set or automatically retrieved by the system. A new private method, `_get_instance_pool_id()`, has been added to handle the retrieval of the instance pool ID. Additionally, a new test for table migration jobs has been added to `test_installation.py` to ensure the migration job is correctly configured with the specified parallelism, minimum and maximum number of workers, and instance pool ID. A new test case for creating a cluster policy with an instance pool has also been added to `tests/unit/installer/test_policy.py` to ensure the instance pool is added to the cluster policy during creation. These changes provide users with more control over instance pools and cluster policies, and improve the overall functionality of the library.
* Fixed `ucx move` logic for `MANAGED` & `EXTERNAL` tables ([#1062](#1062)). The `ucx move` command has been updated to allow for the movement of UC tables/views after the table upgrade process, providing flexibility in managing catalog structure. The command now supports moving multiple tables simultaneously, dropping managed tables/views upon confirmation, and deep-cloning managed tables while dropping and recreating external tables. A refactoring of the `TableMove` class has improved code organization and readability, and the associated unit tests have been updated to reflect these changes. This feature is targeted towards developers and administrators seeking to adjust their catalog structure after table upgrades, with the added ability to manage exceptional conditions gracefully.
* Fixed integration testing with random product names ([#1074](#1074)). In the recent update, the `trigger` function in the `tasks.py` module of the `ucx` framework has undergone modification to incorporate a new argument, `install_folder`, within the `Installation` object. This object is now generated locally within the `trigger` function and subsequently passed to the `run_task` function. The `install_folder` is determined by obtaining the parent directory of the `config_path` variable, transforming it into a POSIX-style path, and eliminating the leading "/Workspace" prefix. This enhancement guarantees that the `run_task` function acquires the correct installation folder for the `ucx` framework, thereby improving the overall functionality and precision of the framework. Furthermore, the `Installation.current` method has been supplanted with the newly formed `Installation` object, which now encompasses the `install_folder` argument.
* Refactor installer to separate workflows methods from the installer class ([#1055](#1055)). In this release, the installer in the `cli.py` file has been refactored to improve modularity and maintainability. The installation and workflow functionalities have been separated by importing a new class called `WorkflowsInstallation` from `databricks.labs.ucx.installer.workflows`. The `WorkspaceInstallation` class is no longer used in various functions, and the new `WorkflowsInstallation` class is used instead. Additionally, a new mixin class called `InstallationMixin` has been introduced, which includes methods for uninstalling UCX, removing jobs, and validating installation steps. The `WorkflowsInstallation` class now inherits from this mixin class. A new file, `workflows.py`, has been added to the `databricks/labs/ucx/installer` directory, which contains methods for managing Databricks jobs. The new `WorkflowsInstallation` class is responsible for deploying workflows, uploading wheels to DBFS or WSFS, and creating debug notebooks. The refactoring also includes the addition of new methods for handling specific workflows, such as `run_workflow`, `validate_step`, and `repair_run`, which are now contained in the `WorkflowsInstallation` class. The `test_install.py` file in the `tests/unit` directory has also been updated to include new imports and test functions to accommodate these changes.
* Skip unsupported locations while migrating to external location in Azure ([#1066](#1066)). In this release, we have updated the functionality of migrating to an external location in Azure. A new private method `_filter_unsupported_location` has been added to the `locations.py` file, which checks if the location URLs are supported and removes the unsupported ones from the list. Only locations starting with "abfss://" are considered supported. Unsupported locations are logged with a warning message. Additionally, a new test `test_skip_unsupported_location` has been introduced to verify that the `location_migration` function correctly skips unsupported locations during migration to external locations in Azure. The test checks if the correct log messages are generated for skipped unsupported locations, and it mocks various scenarios such as crawled HMS external locations, storage credentials, UC external locations, and installation with permission mapping. The mock crawled HMS external locations contain two unsupported locations: `adl://` and `wasbs://`. This ensures that the function handles unsupported locations correctly, avoiding any unnecessary errors or exceptions during migration.
* Triggering Assessment Workflow from Installer based on User Prompt ([#1007](#1007)). A new functionality has been added to the installer that allows users to trigger an assessment workflow based on a prompt during the installation process. The `_trigger_workflow` method has been implemented, which can be initiated with a step string argument. This method retrieves the job ID for the specified step from the `_state.jobs` dictionary, generates the job URL, and triggers the job using the `run_now` method from the `jobs` class of the Workspace object. Users will be asked to confirm triggering the assessment workflow and will have the option to open the job URL in a web browser after triggering it. A new unit test, `test_triggering_assessment_wf`, has been introduced to the `test_install.py` file to verify the functionality of triggering an assessment workflow based on user prompt. This test uses existing classes and functions, such as `MockBackend`, `MockPrompts`, `WorkspaceConfig`, and `WorkspaceInstallation`, to run the `WorkspaceInstallation.run` method with a mocked `WorkspaceConfig` object and a mock installation. The test also includes a user prompt to confirm triggering the assessment job and opening the assessment job URL. The new functionality and test improve the installation process by enabling users to easily trigger the assessment workflow based on their specific needs.
* Updated README.md for Service Principal Installation Limit ([#1076](#1076)). This release includes an update to the README.md file to clarify that installing UCX with a Service Principal is not supported. Previously, the file indicated that Databricks Workspace Administrator privileges were required for the user running the installation, but did not explicitly state that Service Principal installation is not supported. The updated text now includes this information, ensuring that users have a clear understanding of the requirements and limitations of the installation process. The rest of the file remains unchanged and continues to provide instructions for installing UCX, including required software and network access. No new methods or functionality have been added, and no existing functionality has been changed beyond the addition of this clarification. The changes in this release have been manually tested to ensure they are functioning as intended.
@nfx nfx mentioned this issue Mar 21, 2024
nfx added a commit that referenced this issue Mar 21, 2024
* Added Legacy Table ACL grants migration
([#1054](#1054)). This
commit introduces a legacy table ACL grants migration to the
`migrate-tables` workflow, resolving issue
[#340](#340) and paving the
way for follow-up PRs
[#887](#887) and
[#907](#907). A new
`GrantsCrawler` class is added for crawling grants, along with a
`GroupManager` class to manage groups during migration. The
`TablesMigrate` class is updated to accept an instance of
`GrantsCrawler` and `GroupManager` in its constructor. The migration
process has been thoroughly tested with unit tests, integration tests,
and manual testing on a staging environment. The changes include the
addition of a new Enum class `AclMigrationWhat` and updates to the
`Table` dataclass, and affect the way tables are selected for migration
based on rules. The logging and error handling have been improved in the
`skip_schema` function.
* Added `databricks labs ucx cluster-remap` command to remap legacy
cluster configurations to UC-compatible
([#994](#994)). In this
open-source library update, we have developed and added the `databricks
labs ucx cluster-remap` command, which facilitates the remapping of
legacy cluster configurations to UC-compatible ones. This new CLI
command comes with user documentation to guide the cluster remapping
process. Additionally, we have expanded the functionality of creating
and managing UC external catalogs and schemas with the inclusion of
`create-catalogs-schemas` and `revert-cluster-remap` commands. This
change does not modify existing commands or workflows and does not
introduce new tables. The `databricks labs ucx cluster-remap` command
allows users to re-map and revert the re-mapping of clusters from Unity
Catalog (UC) using the CLI, ensuring compatibility and streamlining the
migration process. The new command and associated functions have been
manually tested for functionality.
* Added `migrate-tables` workflow
([#1051](#1051)). The
`migrate-tables` workflow has been added, which allows for more
fine-grained control over the resources allocated to the workspace. This
workflow includes two new instance variables `min_workers` and
`max_workers` in the `WorkspaceConfig` class, with default values of 1
and 10 respectively. A new `trigger` function has also been introduced,
which initializes a configuration, SQL backend, and WorkspaceClient
based on the provided configuration file. The `run_task` function has
been added, which looks up the specified task, logs relevant
information, and runs the task's function with the provided arguments.
The `Task` class's `fn` attribute now includes an `Installation` object
as a parameter. Additionally, a new `migrate-tables` workflow has been
added for migrating tables from the Hive Metastore to the Unity Catalog,
along with new classes and methods for table mapping, migration status
refreshing, and migrating tables. The `migrate_dbfs_root_delta_tables`
and `migrate_external_tables_sync` methods perform migrations for Delta
tables located in the DBFS root and synchronize external tables,
respectively. These functions use the workspace client to access the
catalogs and ensure proper migration. Integration tests have also been
added for these new methods to ensure their correct operation.
* Added handling for `SYNC` command failures
([#1073](#1073)). This pull
request introduces changes to improve handling of `SYNC` command
failures during external table migrations in the Hive metastore.
Previously, the `SYNC` command's result was not checked, and failures
were not logged. Now, the `_migrate_external_table` method in
`table_migrate.py` fetches the result of the `SYNC` command execution,
logs a warning message for failures, and returns `False` if the command
fails. A new integration test has been added to simulate a failed `SYNC`
command due to a non-existent catalog and schema, ensuring the migration
tool handles such failures. A new test case has also been added to
verify the handling of `SYNC` command failures during external table
migrations, using a mock backend to simulate failures and checking for
appropriate log messages. These changes enhance the reliability and
robustness of the migration process, providing clearer error diagnosis
and handling for potential `SYNC` command failures.
* Added initial version of `databricks labs ucx migrate-local-code`
command ([#1067](#1067)). A
new `databricks labs ucx migrate-local-code` command has been added to
facilitate migration of local code to a Databricks environment,
specifically targeting Python and SQL files. This initial version is
experimental and aims to help users and administrators manage code
migration, maintain consistency across workspaces, and enhance
compatibility with the Unity Catalog, a component of Databricks' data
and AI offerings. The command introduces a new `Files` class for
applying migrations to code files, considering their language. It also
updates the `.gitignore` file and the pyproject.toml file to ensure
appropriate version control management. Additionally, new classes and
methods have been added to support code analysis, transformation, and
linting for various programming languages. These improvements will aid
in streamlining the migration process and ensuring compatibility with
Databricks' environment.
* Added instance pool to cluster policy
([#1078](#1078)). A new
field, `instance_pool_id`, has been added to the cluster policy
configuration in `policy.py`, allowing users to specify the ID of an
instance pool to be applied to all workflow clusters in the policy. This
ID can be manually set or automatically retrieved by the system. A new
private method, `_get_instance_pool_id()`, has been added to handle the
retrieval of the instance pool ID. Additionally, a new test for table
migration jobs has been added to `test_installation.py` to ensure the
migration job is correctly configured with the specified parallelism,
minimum and maximum number of workers, and instance pool ID. A new test
case for creating a cluster policy with an instance pool has also been
added to `tests/unit/installer/test_policy.py` to ensure the instance
pool is added to the cluster policy during creation. These changes
provide users with more control over instance pools and cluster
policies, and improve the overall functionality of the library.
* Fixed `ucx move` logic for `MANAGED` & `EXTERNAL` tables
([#1062](#1062)). The `ucx
move` command has been updated to allow for the movement of UC
tables/views after the table upgrade process, providing flexibility in
managing catalog structure. The command now supports moving multiple
tables simultaneously, dropping managed tables/views upon confirmation,
and deep-cloning managed tables while dropping and recreating external
tables. A refactoring of the `TableMove` class has improved code
organization and readability, and the associated unit tests have been
updated to reflect these changes. This feature is targeted towards
developers and administrators seeking to adjust their catalog structure
after table upgrades, with the added ability to manage exceptional
conditions gracefully.
* Fixed integration testing with random product names
([#1074](#1074)). In the
recent update, the `trigger` function in the `tasks.py` module of the
`ucx` framework has undergone modification to incorporate a new
argument, `install_folder`, within the `Installation` object. This
object is now generated locally within the `trigger` function and
subsequently passed to the `run_task` function. The `install_folder` is
determined by obtaining the parent directory of the `config_path`
variable, transforming it into a POSIX-style path, and eliminating the
leading "/Workspace" prefix. This enhancement guarantees that the
`run_task` function acquires the correct installation folder for the
`ucx` framework, thereby improving the overall functionality and
precision of the framework. Furthermore, the `Installation.current`
method has been supplanted with the newly formed `Installation` object,
which now encompasses the `install_folder` argument.
* Refactor installer to separate workflows methods from the installer
class ([#1055](#1055)). In
this release, the installer in the `cli.py` file has been refactored to
improve modularity and maintainability. The installation and workflow
functionalities have been separated by importing a new class called
`WorkflowsInstallation` from `databricks.labs.ucx.installer.workflows`.
The `WorkspaceInstallation` class is no longer used in various
functions, and the new `WorkflowsInstallation` class is used instead.
Additionally, a new mixin class called `InstallationMixin` has been
introduced, which includes methods for uninstalling UCX, removing jobs,
and validating installation steps. The `WorkflowsInstallation` class now
inherits from this mixin class. A new file, `workflows.py`, has been
added to the `databricks/labs/ucx/installer` directory, which contains
methods for managing Databricks jobs. The new `WorkflowsInstallation`
class is responsible for deploying workflows, uploading wheels to DBFS
or WSFS, and creating debug notebooks. The refactoring also includes the
addition of new methods for handling specific workflows, such as
`run_workflow`, `validate_step`, and `repair_run`, which are now
contained in the `WorkflowsInstallation` class. The `test_install.py`
file in the `tests/unit` directory has also been updated to include new
imports and test functions to accommodate these changes.
* Skip unsupported locations while migrating to external location in
Azure ([#1066](#1066)). In
this release, we have updated the functionality of migrating to an
external location in Azure. A new private method
`_filter_unsupported_location` has been added to the `locations.py`
file, which checks if the location URLs are supported and removes the
unsupported ones from the list. Only locations starting with "abfss://"
are considered supported. Unsupported locations are logged with a
warning message. Additionally, a new test
`test_skip_unsupported_location` has been introduced to verify that the
`location_migration` function correctly skips unsupported locations
during migration to external locations in Azure. The test checks if the
correct log messages are generated for skipped unsupported locations,
and it mocks various scenarios such as crawled HMS external locations,
storage credentials, UC external locations, and installation with
permission mapping. The mock crawled HMS external locations contain two
unsupported locations: `adl://` and `wasbs://`. This ensures that the
function handles unsupported locations correctly, avoiding any
unnecessary errors or exceptions during migration.
* Triggering Assessment Workflow from Installer based on User Prompt
([#1007](#1007)). A new
functionality has been added to the installer that allows users to
trigger an assessment workflow based on a prompt during the installation
process. The `_trigger_workflow` method has been implemented, which can
be initiated with a step string argument. This method retrieves the job
ID for the specified step from the `_state.jobs` dictionary, generates
the job URL, and triggers the job using the `run_now` method from the
`jobs` class of the Workspace object. Users will be asked to confirm
triggering the assessment workflow and will have the option to open the
job URL in a web browser after triggering it. A new unit test,
`test_triggering_assessment_wf`, has been introduced to the
`test_install.py` file to verify the functionality of triggering an
assessment workflow based on user prompt. This test uses existing
classes and functions, such as `MockBackend`, `MockPrompts`,
`WorkspaceConfig`, and `WorkspaceInstallation`, to run the
`WorkspaceInstallation.run` method with a mocked `WorkspaceConfig`
object and a mock installation. The test also includes a user prompt to
confirm triggering the assessment job and opening the assessment job
URL. The new functionality and test improve the installation process by
enabling users to easily trigger the assessment workflow based on their
specific needs.
* Updated README.md for Service Principal Installation Limit
([#1076](#1076)). The
README.md file now states explicitly that installing UCX with a Service
Principal is not supported. Previously, the file said only that the user
running the installation needs Databricks Workspace Administrator
privileges. The rest of the file is unchanged and continues to provide
instructions for installing UCX, including required software and network
access. This is a documentation-only change: no methods or functionality
have been added or modified. The change has been manually tested.
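
The trigger step described in the assessment-workflow entry above can be
sketched roughly as follows. This is a minimal illustration, not the UCX
implementation: `FakeJobs`, `trigger_workflow`, the example host, and the
URL format are assumptions made for this sketch. Only the overall flow
(look up the job ID by step name, build the job URL, call `run_now`)
comes from the entry itself.

```python
from dataclasses import dataclass, field


@dataclass
class FakeJobs:
    """Hypothetical stand-in for the `jobs` API of a workspace client."""

    triggered: list = field(default_factory=list)

    def run_now(self, job_id: int) -> None:
        # Record the call instead of contacting a real workspace.
        self.triggered.append(job_id)


def trigger_workflow(step: str, jobs_state: dict, jobs: FakeJobs, host: str) -> str:
    """Look up the job ID for `step`, trigger the job, and return its URL."""
    job_id = int(jobs_state[step])
    job_url = f"{host}#job/{job_id}"  # URL format is assumed for illustration
    jobs.run_now(job_id)
    return job_url


jobs = FakeJobs()
url = trigger_workflow("assessment", {"assessment": "123"}, jobs, "https://example.test/")
```

In the installer, the confirmation prompts would wrap this call; they are
left out here to keep the sketch focused on the lookup and trigger.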
dmoore247 pushed a commit that referenced this issue Mar 23, 2024
## Changes

### Linked issues

Resolves #340
To be followed up with PRs for #887 and #907

### Functionality 

- [x] modified existing workflow: `migrate-tables`

### Tests

- [x] manually tested
- [x] added unit tests
- [x] added integration tests
- [x] verified on staging environment (screenshot attached)
dmoore247 pushed a commit that referenced this issue Mar 23, 2024
* Added Legacy Table ACL grants migration
([#1054](#1054)). This
commit adds legacy table ACL grants migration to the `migrate-tables`
workflow, resolving issue
[#340](#340). Follow-up PRs
are planned for
[#887](#887) and
[#907](#907). A new
`GrantsCrawler` class is added for crawling grants, along with a
`GroupManager` class to manage groups during migration. The
`TablesMigrate` class is updated to accept an instance of
`GrantsCrawler` and `GroupManager` in its constructor. The changes also
add a new Enum class, `AclMigrationWhat`, update the `Table` dataclass,
and affect the way tables are selected for migration based on rules.
Logging and error handling have been improved in the `skip_schema`
function. The migration has been covered by unit tests, integration
tests, and manual testing on a staging environment.
* Added `databricks labs ucx cluster-remap` command to remap legacy
cluster configurations to UC-compatible
([#994](#994)). This change
adds the `databricks labs ucx cluster-remap` CLI command, which remaps
legacy cluster configurations to UC-compatible ones, together with user
documentation for the remapping process. It also extends the support for
creating and managing UC external catalogs and schemas with the
`create-catalogs-schemas` and `revert-cluster-remap` commands. With
these commands, users can remap clusters for Unity Catalog (UC) and
revert that remapping from the CLI. The change does not modify existing
commands or workflows and does not introduce new tables. The new command
and associated functions have been manually tested.
* Added `migrate-tables` workflow
([#1051](#1051)). This
change adds the `migrate-tables` workflow, which migrates tables from
the Hive Metastore to Unity Catalog, along with new classes and methods
for table mapping, migration status refreshing, and migrating tables.
Two new instance variables in the `WorkspaceConfig` class, `min_workers`
and `max_workers`, default to 1 and 10 respectively and allow more
fine-grained control over the allocated resources. A new `trigger`
function initializes a configuration, SQL backend, and WorkspaceClient
based on the provided configuration file. A new `run_task` function
looks up the specified task, logs relevant information, and runs the
task's function with the provided arguments. The function held in the
`Task` class's `fn` attribute now takes an `Installation` object as a
parameter. The `migrate_dbfs_root_delta_tables` and
`migrate_external_tables_sync` methods migrate Delta tables located in
the DBFS root and synchronize external tables, respectively. Both use
the workspace client to access the catalogs. Integration tests have been
added for these new methods.
* Added handling for `SYNC` command failures
([#1073](#1073)). This pull
request improves the handling of `SYNC` command failures during
external table migrations from the Hive metastore. Previously, the
`SYNC` command's result was not checked, and failures were not logged.
Now, the `_migrate_external_table` method in `table_migrate.py` fetches
the result of the `SYNC` command execution, logs a warning message for
failures, and returns `False` if the command fails. A new integration
test simulates a failed `SYNC` command caused by a non-existent catalog
and schema. A new unit test case uses a mock backend to simulate
failures and checks for the appropriate log messages. These changes make
`SYNC` failures visible and easier to diagnose.
* Added initial version of `databricks labs ucx migrate-local-code`
command ([#1067](#1067)). A
new `databricks labs ucx migrate-local-code` command has been added to
help migrate local code, specifically Python and SQL files, so that it
is compatible with Unity Catalog. This initial version is experimental.
The command introduces a new `Files` class that applies migrations to
code files according to their language. New classes and methods have
also been added to support code analysis, transformation, and linting
for various programming languages. The `.gitignore` and
`pyproject.toml` files have been updated as part of this change.
* Added instance pool to cluster policy
([#1078](#1078)). A new
field, `instance_pool_id`, has been added to the cluster policy
configuration in `policy.py`, allowing users to specify the ID of an
instance pool to be applied to all workflow clusters in the policy. This
ID can be manually set or automatically retrieved by the system. A new
private method, `_get_instance_pool_id()`, handles the retrieval of the
instance pool ID. A new test for table migration jobs in
`test_installation.py` checks that the migration job is configured with
the specified parallelism, minimum and maximum number of workers, and
instance pool ID. A new test case in
`tests/unit/installer/test_policy.py` checks that the instance pool is
added to the cluster policy during creation.
* Fixed `ucx move` logic for `MANAGED` & `EXTERNAL` tables
([#1062](#1062)). The `ucx
move` command has been updated to allow moving UC tables/views after
the table upgrade process, which gives flexibility in managing the
catalog structure. The command now supports moving multiple tables
simultaneously, dropping managed tables/views upon confirmation, and
deep-cloning managed tables while dropping and recreating external
tables. The `TableMove` class has been refactored for better code
organization and readability, and the associated unit tests have been
updated to reflect these changes. The feature is aimed at developers and
administrators who need to adjust their catalog structure after table
upgrades, and it handles exceptional conditions gracefully.
* Fixed integration testing with random product names
([#1074](#1074)). The
`trigger` function in the `tasks.py` module of `ucx` now creates an
`Installation` object locally, with a new `install_folder` argument,
and passes that object to the `run_task` function. The `install_folder`
is the parent directory of the `config_path` variable, converted to a
POSIX-style path with the leading "/Workspace" prefix removed. This
ensures that the `run_task` function receives the correct installation
folder. The locally created `Installation` object replaces the previous
use of the `Installation.current` method.
* Refactor installer to separate workflows methods from the installer
class ([#1055](#1055)). The
installer in the `cli.py` file has been refactored to improve modularity
and maintainability by separating installation from workflow
management. A new `WorkflowsInstallation` class, imported from
`databricks.labs.ucx.installer.workflows`, replaces the
`WorkspaceInstallation` class in various functions. It is defined in a
new file, `workflows.py`, in the `databricks/labs/ucx/installer`
directory, which contains methods for managing Databricks jobs. The
`WorkflowsInstallation` class is responsible for deploying workflows,
uploading wheels to DBFS or WSFS, and creating debug notebooks, and it
now contains the workflow-specific methods `run_workflow`,
`validate_step`, and `repair_run`. It inherits from a new mixin class,
`InstallationMixin`, which includes methods for uninstalling UCX,
removing jobs, and validating installation steps. The `test_install.py`
file in the `tests/unit` directory has been updated with new imports and
test functions to accommodate these changes.
* Skip unsupported locations while migrating to external location in
Azure ([#1066](#1066)).
Migration to external locations in Azure now skips unsupported
locations. A new private method, `_filter_unsupported_location`, in the
`locations.py` file checks whether each location URL is supported and
removes the unsupported ones from the list. Only locations starting with
"abfss://" are considered supported. Unsupported locations are logged
with a warning message. A new test, `test_skip_unsupported_location`,
verifies that the `location_migration` function skips unsupported
locations and emits the expected log messages. The test mocks crawled
HMS external locations, storage credentials, UC external locations, and
an installation with permission mapping. The mocked HMS external
locations include two unsupported locations, `adl://` and `wasbs://`, so
the test confirms that such locations are skipped without raising errors
during migration.
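
The `install_folder` derivation described in the "Fixed integration
testing with random product names" entry above can be illustrated with a
short sketch. The function name `install_folder_from_config` and the
example path are hypothetical; only the three steps (take the parent of
`config_path`, render it as a POSIX path, strip the leading "/Workspace"
prefix) come from the entry.

```python
from pathlib import PurePosixPath


def install_folder_from_config(config_path: str) -> str:
    """Return the folder that holds `config_path`, without a leading /Workspace."""
    # Parent directory of the config file, rendered as a POSIX-style path.
    folder = PurePosixPath(config_path).parent.as_posix()
    # str.removeprefix (Python 3.9+) leaves the string unchanged if the
    # prefix is absent.
    return folder.removeprefix("/Workspace")


folder = install_folder_from_config("/Workspace/Users/someone@example.com/.ucx/config.yml")
unchanged = install_folder_from_config("/Users/someone@example.com/.ucx/config.yml")
```

Both calls yield the same folder, which is the point of the fix: the
task receives a consistent installation folder whether or not the config
path carries the "/Workspace" prefix.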
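
The filtering described in the "Skip unsupported locations" entry above
can be sketched along these lines. This is an illustrative assumption,
not the code of `_filter_unsupported_location`: the function name, the
log message, and the sample URLs are made up for the example, while the
"abfss://" rule and the warning on skipped locations come from the
entry.

```python
import logging

logger = logging.getLogger(__name__)

SUPPORTED_PREFIX = "abfss://"


def filter_unsupported_locations(locations: list) -> list:
    """Keep only locations with the supported prefix; warn about the rest."""
    supported = []
    for location in locations:
        if location.startswith(SUPPORTED_PREFIX):
            supported.append(location)
        else:
            # Message wording is hypothetical.
            logger.warning("Skipping unsupported location: %s", location)
    return supported


kept = filter_unsupported_locations(
    [
        "abfss://container@account.dfs.core.windows.net/data",
        "adl://account.azuredatalakestore.net/data",
        "wasbs://container@account.blob.core.windows.net/data",
    ]
)
```

Only the `abfss://` location survives; the `adl://` and `wasbs://`
entries are logged and dropped, mirroring the two unsupported locations
used in the test described above.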