Refactor installer to separate workflows methods from the installer class #1055

Merged
merged 27 commits into from
Mar 21, 2024

@FastLee FastLee commented Mar 14, 2024

In this release, the installer class has been refactored to move workflow methods into a new class called WorkflowsInstallation. The change is visible in the import statements: WorkflowsInstallation is now imported from databricks.labs.ucx.installer.workflows. The WorkflowsInstallation class manages workflows in the databricks labs installation, handling the creation, updating, and monitoring of jobs for the various steps of the UCX installation process. Its constructor takes several parameters, including workspace_config, installation, ws, product_info, prompts, and timedelta objects. It is used alongside the existing WorkspaceInstallation class to provide a more complete installation and management solution.

The WorkflowsInstallation class is used to install workflows-related components, such as jobs, dashboards, and policies. It includes methods to create, repair, and validate jobs, and to get the latest job status, along with helper methods that check job status and handle exceptions raised while running tasks. It also creates the debug notebook.

Additionally, the WorkspaceInstallation class has been updated to include a workflows_installer parameter, which is an instance of the WorkflowsInstallation class. The wheels parameter has been removed from the WorkspaceInstallation constructor and is instead passed to the WorkflowsInstallation constructor. The code also includes an updated new_installation function, which is used to create a new installation of the databricks labs software and is now responsible for initializing both the WorkspaceInstallation and WorkflowsInstallation classes.
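The composition described above can be sketched as follows. This is a minimal illustration, not the actual UCX code: the class names match the PR, but the fields, the `create_jobs` and `run` bodies, the string-valued `wheels` placeholder, and the single-argument `new_installation` are all simplified assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowsInstallation:
    """Simplified stand-in: owns everything job-related (not the real UCX class)."""

    wheels: str  # hypothetical placeholder for the wheel-related dependency
    jobs: dict[str, int] = field(default_factory=dict)

    def create_jobs(self, steps: list[str]) -> None:
        # Pretend to deploy one job per workflow step and remember its ID.
        for job_id, step in enumerate(steps, start=1):
            self.jobs[step] = job_id


@dataclass
class WorkspaceInstallation:
    """Simplified stand-in: orchestrates the install and delegates job work."""

    config: dict[str, str]
    workflows_installer: WorkflowsInstallation  # takes the place of `wheels`

    def run(self, steps: list[str]) -> None:
        # Job creation is delegated instead of living in this class.
        self.workflows_installer.create_jobs(steps)


def new_installation(config: dict[str, str]) -> WorkspaceInstallation:
    # The factory builds both objects and wires them together.
    workflows = WorkflowsInstallation(wheels="ucx.whl")
    return WorkspaceInstallation(config, workflows)


installation = new_installation({"inventory_database": "ucx"})
installation.run(["assessment", "migrate-tables"])
```

Passing the workflows object in through the constructor is what lets the tests listed below swap in a differently configured `workflows_installer` without touching the installer itself.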

The test_install_cluster_override_jobs, test_write_protected_dbfs, test_remove_jobs, test_remove_secret_scope, test_create_cluster_policy, test_remove_warehouse, test_run_workflow, test_repair_run, test_latest_job_status, test_check_inventory_database_exists, and test_user_not_admin test functions have been updated to use the workflows_installer parameter instead of the wheels parameter in the WorkspaceInstallation constructor. The test_save_config and test_fresh_install test functions have been updated to include new parameters in the config object, such as min_workers and max_workers. The test_install_with_external_hms_conf test function has been removed and replaced with a new test function test_remove_jobs which tests the removal of jobs.

Overall, these changes provide a more modular and organized codebase, making it easier to maintain and extend the installation and management functionality of the databricks labs software.


codecov bot commented Mar 14, 2024

Codecov Report

Attention: Patch coverage is 90.65657%, with 37 lines in your changes missing coverage. Please review.

Project coverage is 89.52%. Comparing base (db099c6) to head (35dcb00).

| Files | Patch % | Lines |
|---|---|---|
| src/databricks/labs/ucx/installer/workflows.py | 90.16% | 18 Missing and 12 partials ⚠️ |
| src/databricks/labs/ucx/installer/mixins.py | 93.84% | 2 Missing and 2 partials ⚠️ |
| src/databricks/labs/ucx/install.py | 86.95% | 0 Missing and 3 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1055      +/-   ##
==========================================
+ Coverage   89.14%   89.52%   +0.38%     
==========================================
  Files          55       57       +2     
  Lines        6735     6780      +45     
  Branches     1215     1213       -2     
==========================================
+ Hits         6004     6070      +66     
+ Misses        481      462      -19     
+ Partials      250      248       -2     
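
The figures in the coverage diff can be cross-checked with a few lines of arithmetic. The sketch below assumes that total lines are the sum of hits, misses, and partials, and that the percentages are truncated rather than rounded to two decimals; the truncation is inferred from the numbers in this report, not from Codecov documentation, and `coverage_percent` is a hypothetical helper name.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    return (hits * 10_000 // lines) / 100


# Counts copied from the coverage diff above.
base = {"hits": 6004, "misses": 481, "partials": 250}
head = {"hits": 6070, "misses": 462, "partials": 248}

# Every tracked line is either hit, missed, or partially covered.
base_lines = sum(base.values())
head_lines = sum(head.values())

base_cov = coverage_percent(base["hits"], base_lines)
head_cov = coverage_percent(head["hits"], head_lines)
delta = round(head_cov - base_cov, 2)

# Missing plus partial lines per file, from the patch table.
uncovered_in_patch = (18 + 12) + (2 + 2) + (0 + 3)

print(base_lines, head_lines, base_cov, head_cov, delta, uncovered_in_patch)
```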


@@ -39,7 +40,7 @@
 @ucx.command
 def workflows(w: WorkspaceClient):
     """Show deployed workflows and their state"""
-    installation = WorkspaceInstallation.current(w)
+    installation = WorkflowsInstallation.current(w)

Suggested change
-    installation = WorkflowsInstallation.current(w)
+    installation = Workflows.current(w)

"run_id": "{{run_id}}",
"parent_run_id": "{{parent_run_id}}",
}
DEBUG_NOTEBOOK = """# Databricks notebook source

Debug notebook probably can stay here


github-actions bot commented Mar 20, 2024

✅ 112/112 passed, 7 flaky, 20 skipped, 1h57m7s total

Flaky tests:

  • 🤪 test_delete_ws_groups_should_delete_renamed_and_reflected_groups_only (1m0.098s)
  • 🤪 test_table_migration_job_cluster_override (6m2.677s)
  • 🤪 test_repair_run_workflow_job (7m12.07s)
  • 🤪 test_job_failure_propagates_correct_error_message_and_logs (10m21.136s)
  • 🤪 test_running_real_remove_backup_groups_job (10m6.325s)
  • 🤪 test_running_real_migrate_groups_job (10m15.019s)
  • 🤪 test_running_real_assessment_job (9m42.556s)

Running from acceptance #1695

@nfx nfx merged commit 6ab4ae2 into main Mar 21, 2024
7 checks passed
@nfx nfx deleted the refactor-installer-1053 branch March 21, 2024 05:58
nfx added a commit that referenced this pull request Mar 21, 2024
* Added Legacy Table ACL grants migration ([#1054](#1054)). This commit introduces a legacy table ACL grants migration to the `migrate-tables` workflow, resolving issue [#340](#340) and paving the way for follow-up PRs [#887](#887) and [#907](#907). A new `GrantsCrawler` class is added for crawling grants, along with a `GroupManager` class to manage groups during migration. The `TablesMigrate` class is updated to accept an instance of `GrantsCrawler` and `GroupManager` in its constructor. The migration process has been thoroughly tested with unit tests, integration tests, and manual testing on a staging environment. The changes include the addition of a new Enum class `AclMigrationWhat` and updates to the `Table` dataclass, and affect the way tables are selected for migration based on rules. The logging and error handling have been improved in the `skip_schema` function.
* Added `databricks labs ucx cluster-remap` command to remap legacy cluster configurations to UC-compatible ([#994](#994)). In this open-source library update, we have developed and added the `databricks labs ucx cluster-remap` command, which facilitates the remapping of legacy cluster configurations to UC-compatible ones. This new CLI command comes with user documentation to guide the cluster remapping process. Additionally, we have expanded the functionality of creating and managing UC external catalogs and schemas with the inclusion of `create-catalogs-schemas` and `revert-cluster-remap` commands. This change does not modify existing commands or workflows and does not introduce new tables. The `databricks labs ucx cluster-remap` command allows users to re-map and revert the re-mapping of clusters from Unity Catalog (UC) using the CLI, ensuring compatibility and streamlining the migration process. The new command and associated functions have been manually tested for functionality.
* Added `migrate-tables` workflow ([#1051](#1051)). This change adds the `migrate-tables` workflow for migrating tables from the Hive Metastore to the Unity Catalog, along with new classes and methods for table mapping, migration status refreshing, and migrating tables. To allow more fine-grained control over allocated resources, the `WorkspaceConfig` class gains two new instance variables, `min_workers` and `max_workers`, with default values of 1 and 10 respectively. A new `trigger` function initializes a configuration, SQL backend, and WorkspaceClient based on the provided configuration file. A new `run_task` function looks up the specified task, logs relevant information, and runs the task's function with the provided arguments. The `Task` class's `fn` attribute now includes an `Installation` object as a parameter. The `migrate_dbfs_root_delta_tables` and `migrate_external_tables_sync` methods migrate Delta tables located in the DBFS root and synchronize external tables, respectively, using the workspace client to access the catalogs. Integration tests have been added for these new methods.
* Added handling for `SYNC` command failures ([#1073](#1073)). This pull request introduces changes to improve handling of `SYNC` command failures during external table migrations in the Hive metastore. Previously, the `SYNC` command's result was not checked, and failures were not logged. Now, the `_migrate_external_table` method in `table_migrate.py` fetches the result of the `SYNC` command execution, logs a warning message for failures, and returns `False` if the command fails. A new integration test has been added to simulate a failed `SYNC` command due to a non-existent catalog and schema, ensuring the migration tool handles such failures. A new test case has also been added to verify the handling of `SYNC` command failures during external table migrations, using a mock backend to simulate failures and checking for appropriate log messages. These changes enhance the reliability and robustness of the migration process, providing clearer error diagnosis and handling for potential `SYNC` command failures.
* Added initial version of `databricks labs ucx migrate-local-code` command ([#1067](#1067)). A new `databricks labs ucx migrate-local-code` command has been added to facilitate migration of local code to a Databricks environment, specifically targeting Python and SQL files. This initial version is experimental and aims to help users and administrators manage code migration, maintain consistency across workspaces, and enhance compatibility with the Unity Catalog, a component of Databricks' data and AI offerings. The command introduces a new `Files` class for applying migrations to code files, considering their language. It also updates the `.gitignore` file and the pyproject.toml file to ensure appropriate version control management. Additionally, new classes and methods have been added to support code analysis, transformation, and linting for various programming languages. These improvements will aid in streamlining the migration process and ensuring compatibility with Databricks' environment.
* Added instance pool to cluster policy ([#1078](#1078)). A new field, `instance_pool_id`, has been added to the cluster policy configuration in `policy.py`, allowing users to specify the ID of an instance pool to be applied to all workflow clusters in the policy. This ID can be manually set or automatically retrieved by the system. A new private method, `_get_instance_pool_id()`, has been added to handle the retrieval of the instance pool ID. Additionally, a new test for table migration jobs has been added to `test_installation.py` to ensure the migration job is correctly configured with the specified parallelism, minimum and maximum number of workers, and instance pool ID. A new test case for creating a cluster policy with an instance pool has also been added to `tests/unit/installer/test_policy.py` to ensure the instance pool is added to the cluster policy during creation. These changes provide users with more control over instance pools and cluster policies, and improve the overall functionality of the library.
* Fixed `ucx move` logic for `MANAGED` & `EXTERNAL` tables ([#1062](#1062)). The `ucx move` command has been updated to allow for the movement of UC tables/views after the table upgrade process, providing flexibility in managing catalog structure. The command now supports moving multiple tables simultaneously, dropping managed tables/views upon confirmation, and deep-cloning managed tables while dropping and recreating external tables. A refactoring of the `TableMove` class has improved code organization and readability, and the associated unit tests have been updated to reflect these changes. This feature is targeted towards developers and administrators seeking to adjust their catalog structure after table upgrades, with the added ability to manage exceptional conditions gracefully.
* Fixed integration testing with random product names ([#1074](#1074)). The `trigger` function in the `tasks.py` module of the `ucx` framework now creates the `Installation` object locally, with a new `install_folder` argument, and passes it to the `run_task` function. The `install_folder` is determined by taking the parent directory of the `config_path` variable, converting it to a POSIX-style path, and removing the leading "/Workspace" prefix. This ensures that the `run_task` function receives the correct installation folder for the `ucx` framework. The locally created `Installation` object replaces the previous use of the `Installation.current` method.
* Refactor installer to separate workflows methods from the installer class ([#1055](#1055)). In this release, the installer in the `cli.py` file has been refactored to improve modularity and maintainability. The installation and workflow functionalities have been separated by importing a new class called `WorkflowsInstallation` from `databricks.labs.ucx.installer.workflows`. The `WorkspaceInstallation` class is no longer used in various functions, and the new `WorkflowsInstallation` class is used instead. Additionally, a new mixin class called `InstallationMixin` has been introduced, which includes methods for uninstalling UCX, removing jobs, and validating installation steps. The `WorkflowsInstallation` class now inherits from this mixin class. A new file, `workflows.py`, has been added to the `databricks/labs/ucx/installer` directory, which contains methods for managing Databricks jobs. The new `WorkflowsInstallation` class is responsible for deploying workflows, uploading wheels to DBFS or WSFS, and creating debug notebooks. The refactoring also includes the addition of new methods for handling specific workflows, such as `run_workflow`, `validate_step`, and `repair_run`, which are now contained in the `WorkflowsInstallation` class. The `test_install.py` file in the `tests/unit` directory has also been updated to include new imports and test functions to accommodate these changes.
* Skip unsupported locations while migrating to external location in Azure ([#1066](#1066)). In this release, we have updated the functionality of migrating to an external location in Azure. A new private method `_filter_unsupported_location` has been added to the `locations.py` file, which checks if the location URLs are supported and removes the unsupported ones from the list. Only locations starting with "abfss://" are considered supported. Unsupported locations are logged with a warning message. Additionally, a new test `test_skip_unsupported_location` has been introduced to verify that the `location_migration` function correctly skips unsupported locations during migration to external locations in Azure. The test checks if the correct log messages are generated for skipped unsupported locations, and it mocks various scenarios such as crawled HMS external locations, storage credentials, UC external locations, and installation with permission mapping. The mock crawled HMS external locations contain two unsupported locations: `adl://` and `wasbs://`. This ensures that the function handles unsupported locations correctly, avoiding any unnecessary errors or exceptions during migration.
* Triggering Assessment Workflow from Installer based on User Prompt ([#1007](#1007)). A new functionality has been added to the installer that allows users to trigger an assessment workflow based on a prompt during the installation process. The `_trigger_workflow` method has been implemented, which can be initiated with a step string argument. This method retrieves the job ID for the specified step from the `_state.jobs` dictionary, generates the job URL, and triggers the job using the `run_now` method from the `jobs` class of the Workspace object. Users will be asked to confirm triggering the assessment workflow and will have the option to open the job URL in a web browser after triggering it. A new unit test, `test_triggering_assessment_wf`, has been introduced to the `test_install.py` file to verify the functionality of triggering an assessment workflow based on user prompt. This test uses existing classes and functions, such as `MockBackend`, `MockPrompts`, `WorkspaceConfig`, and `WorkspaceInstallation`, to run the `WorkspaceInstallation.run` method with a mocked `WorkspaceConfig` object and a mock installation. The test also includes a user prompt to confirm triggering the assessment job and opening the assessment job URL. The new functionality and test improve the installation process by enabling users to easily trigger the assessment workflow based on their specific needs.
* Updated README.md for Service Principal Installation Limit ([#1076](#1076)). This release includes an update to the README.md file to clarify that installing UCX with a Service Principal is not supported. Previously, the file indicated that Databricks Workspace Administrator privileges were required for the user running the installation, but did not explicitly state that Service Principal installation is not supported. The updated text now includes this information, ensuring that users have a clear understanding of the requirements and limitations of the installation process. The rest of the file remains unchanged and continues to provide instructions for installing UCX, including required software and network access. No new methods or functionality have been added, and no existing functionality has been changed beyond the addition of this clarification. The changes in this release have been manually tested to ensure they are functioning as intended.
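
Two of the smaller behaviours described in the release notes above are easy to illustrate in isolation: deriving `install_folder` from `config_path` (#1074) and skipping non-`abfss://` locations (#1066). The sketch below is a hedged approximation written from those descriptions only. The function names `install_folder_from` and `filter_unsupported_locations`, the warning text, and the sample paths are assumptions, not the code in `tasks.py` or `locations.py`.

```python
import logging
from pathlib import PurePosixPath

logger = logging.getLogger(__name__)


def install_folder_from(config_path: str) -> str:
    """Parent directory of the config file, POSIX-style, without '/Workspace'."""
    folder = PurePosixPath(config_path).parent.as_posix()
    # Only a leading '/Workspace' is stripped; other paths pass through unchanged.
    return folder.removeprefix("/Workspace")


def filter_unsupported_locations(urls: list[str]) -> list[str]:
    """Keep only 'abfss://' locations and log a warning for the rest."""
    supported = []
    for url in urls:
        if url.startswith("abfss://"):
            supported.append(url)
        else:
            logger.warning("Skipping unsupported location: %s", url)
    return supported


folder = install_folder_from("/Workspace/Users/me@example.com/.ucx/config.yml")
kept = filter_unsupported_locations(
    [
        "abfss://container@account.dfs.core.windows.net/data",
        "adl://account.azuredatalakestore.net/data",
        "wasbs://container@account.blob.core.windows.net/data",
    ]
)
print(folder, kept)
```

`str.removeprefix` requires Python 3.9 or later.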
@nfx nfx mentioned this pull request Mar 21, 2024
nfx added a commit that referenced this pull request Mar 21, 2024
* Added Legacy Table ACL grants migration
([#1054](#1054)). This
commit introduces a legacy table ACL grants migration to the
`migrate-tables` workflow, resolving issue
[#340](#340) and paving the
way for follow-up PRs
[#887](#887) and
[#907](#907). A new
`GrantsCrawler` class is added for crawling grants, along with a
`GroupManager` class to manage groups during migration. The
`TablesMigrate` class is updated to accept an instance of
`GrantsCrawler` and `GroupManager` in its constructor. The migration
process has been thoroughly tested with unit tests, integration tests,
and manual testing on a staging environment. The changes include the
addition of a new Enum class `AclMigrationWhat` and updates to the
`Table` dataclass, and affect the way tables are selected for migration
based on rules. The logging and error handling have been improved in the
`skip_schema` function.
* Added `databricks labs ucx cluster-remap` command to remap legacy
cluster configurations to UC-compatible
([#994](#994)). In this
open-source library update, we have developed and added the `databricks
labs ucx cluster-remap` command, which facilitates the remapping of
legacy cluster configurations to UC-compatible ones. This new CLI
command comes with user documentation to guide the cluster remapping
process. Additionally, we have expanded the functionality of creating
and managing UC external catalogs and schemas with the inclusion of
`create-catalogs-schemas` and `revert-cluster-remap` commands. This
change does not modify existing commands or workflows and does not
introduce new tables. The `databricks labs ucx cluster-remap` command
allows users to re-map and revert the re-mapping of clusters from Unity
Catalog (UC) using the CLI, ensuring compatibility and streamlining the
migration process. The new command and associated functions have been
manually tested for functionality.
* Added `migrate-tables` workflow
([#1051](#1051)). The
`migrate-tables` workflow has been added, which allows for more
fine-grained control over the resources allocated to the workspace. This
workflow includes two new instance variables `min_workers` and
`max_workers` in the `WorkspaceConfig` class, with default values of 1
and 10 respectively. A new `trigger` function has also been introduced,
which initializes a configuration, SQL backend, and WorkspaceClient
based on the provided configuration file. The `run_task` function has
been added, which looks up the specified task, logs relevant
information, and runs the task's function with the provided arguments.
The `Task` class's `fn` attribute now includes an `Installation` object
as a parameter. Additionally, a new `migrate-tables` workflow has been
added for migrating tables from the Hive Metastore to the Unity Catalog,
along with new classes and methods for table mapping, migration status
refreshing, and migrating tables. The `migrate_dbfs_root_delta_tables`
and `migrate_external_tables_sync` methods perform migrations for Delta
tables located in the DBFS root and synchronize external tables,
respectively. These functions use the workspace client to access the
catalogs and ensure proper migration. Integration tests have also been
added for these new methods to ensure their correct operation.
* Added handling for `SYNC` command failures
([#1073](#1073)). This pull
request introduces changes to improve handling of `SYNC` command
failures during external table migrations in the Hive metastore.
Previously, the `SYNC` command's result was not checked, and failures
were not logged. Now, the `_migrate_external_table` method in
`table_migrate.py` fetches the result of the `SYNC` command execution,
logs a warning message for failures, and returns `False` if the command
fails. A new integration test has been added to simulate a failed `SYNC`
command due to a non-existent catalog and schema, ensuring the migration
tool handles such failures. A new test case has also been added to
verify the handling of `SYNC` command failures during external table
migrations, using a mock backend to simulate failures and checking for
appropriate log messages. These changes enhance the reliability and
robustness of the migration process, providing clearer error diagnosis
and handling for potential `SYNC` command failures.
* Added initial version of `databricks labs ucx migrate-local-code`
command ([#1067](#1067)). A
new `databricks labs ucx migrate-local-code` command has been added to
facilitate migration of local code to a Databricks environment,
specifically targeting Python and SQL files. This initial version is
experimental and aims to help users and administrators manage code
migration, maintain consistency across workspaces, and enhance
compatibility with the Unity Catalog, a component of Databricks' data
and AI offerings. The command introduces a new `Files` class for
applying migrations to code files, considering their language. It also
updates the `.gitignore` file and the pyproject.toml file to ensure
appropriate version control management. Additionally, new classes and
methods have been added to support code analysis, transformation, and
linting for various programming languages. These improvements will aid
in streamlining the migration process and ensuring compatibility with
Databricks' environment.
* Added instance pool to cluster policy
([#1078](#1078)). A new
field, `instance_pool_id`, has been added to the cluster policy
configuration in `policy.py`, allowing users to specify the ID of an
instance pool to be applied to all workflow clusters in the policy. This
ID can be manually set or automatically retrieved by the system. A new
private method, `_get_instance_pool_id()`, has been added to handle the
retrieval of the instance pool ID. Additionally, a new test for table
migration jobs has been added to `test_installation.py` to ensure the
migration job is correctly configured with the specified parallelism,
minimum and maximum number of workers, and instance pool ID. A new test
case for creating a cluster policy with an instance pool has also been
added to `tests/unit/installer/test_policy.py` to ensure the instance
pool is added to the cluster policy during creation. These changes
provide users with more control over instance pools and cluster
policies, and improve the overall functionality of the library.
* Fixed `ucx move` logic for `MANAGED` & `EXTERNAL` tables
([#1062](#1062)). The `ucx
move` command has been updated to allow for the movement of UC
tables/views after the table upgrade process, providing flexibility in
managing catalog structure. The command now supports moving multiple
tables simultaneously, dropping managed tables/views upon confirmation,
and deep-cloning managed tables while dropping and recreating external
tables. A refactoring of the `TableMove` class has improved code
organization and readability, and the associated unit tests have been
updated to reflect these changes. This feature is targeted towards
developers and administrators seeking to adjust their catalog structure
after table upgrades, with the added ability to manage exceptional
conditions gracefully.
* Fixed integration testing with random product names
([#1074](#1074)). In the
recent update, the `trigger` function in the `tasks.py` module of the
`ucx` framework has undergone modification to incorporate a new
argument, `install_folder`, within the `Installation` object. This
object is now generated locally within the `trigger` function and
subsequently passed to the `run_task` function. The `install_folder` is
determined by obtaining the parent directory of the `config_path`
variable, transforming it into a POSIX-style path, and eliminating the
leading "/Workspace" prefix. This enhancement guarantees that the
`run_task` function acquires the correct installation folder for the
`ucx` framework, thereby improving the overall functionality and
precision of the framework. Furthermore, the `Installation.current`
method has been supplanted with the newly formed `Installation` object,
which now encompasses the `install_folder` argument.
* Refactor installer to separate workflows methods from the installer
class ([#1055](#1055)). In
this release, the installer in the `cli.py` file has been refactored to
improve modularity and maintainability. The installation and workflow
functionalities have been separated by importing a new class called
`WorkflowsInstallation` from `databricks.labs.ucx.installer.workflows`.
The `WorkspaceInstallation` class is no longer used in various
functions, and the new `WorkflowsInstallation` class is used instead.
Additionally, a new mixin class called `InstallationMixin` has been
introduced, which includes methods for uninstalling UCX, removing jobs,
and validating installation steps. The `WorkflowsInstallation` class now
inherits from this mixin class. A new file, `workflows.py`, has been
added to the `databricks/labs/ucx/installer` directory, which contains
methods for managing Databricks jobs. The new `WorkflowsInstallation`
class is responsible for deploying workflows, uploading wheels to DBFS
or WSFS, and creating debug notebooks. The refactoring also includes the
addition of new methods for handling specific workflows, such as
`run_workflow`, `validate_step`, and `repair_run`, which are now
contained in the `WorkflowsInstallation` class. The `test_install.py`
file in the `tests/unit` directory has also been updated to include new
imports and test functions to accommodate these changes.
* Skip unsupported locations while migrating to external location in
Azure ([#1066](#1066)). In
this release, we have updated the functionality of migrating to an
external location in Azure. A new private method
`_filter_unsupported_location` has been added to the `locations.py`
file, which checks if the location URLs are supported and removes the
unsupported ones from the list. Only locations starting with "abfss://"
are considered supported. Unsupported locations are logged with a
warning message. Additionally, a new test
`test_skip_unsupported_location` has been introduced to verify that the
`location_migration` function correctly skips unsupported locations
during migration to external locations in Azure. The test checks if the
correct log messages are generated for skipped unsupported locations,
and it mocks various scenarios such as crawled HMS external locations,
storage credentials, UC external locations, and installation with
permission mapping. The mock crawled HMS external locations contain two
unsupported locations: `adl://` and `wasbs://`. This ensures that the
function handles unsupported locations correctly, avoiding any
unnecessary errors or exceptions during migration.
* Triggering Assessment Workflow from Installer based on User Prompt
([#1007](#1007)). A new
functionality has been added to the installer that allows users to
trigger an assessment workflow based on a prompt during the installation
process. The `_trigger_workflow` method has been implemented, which can
be initiated with a step string argument. This method retrieves the job
ID for the specified step from the `_state.jobs` dictionary, generates
the job URL, and triggers the job using the `run_now` method from the
`jobs` class of the Workspace object. Users will be asked to confirm
triggering the assessment workflow and will have the option to open the
job URL in a web browser after triggering it. A new unit test,
`test_triggering_assessment_wf`, has been introduced to the
`test_install.py` file to verify the functionality of triggering an
assessment workflow based on user prompt. This test uses existing
classes and functions, such as `MockBackend`, `MockPrompts`,
`WorkspaceConfig`, and `WorkspaceInstallation`, to run the
`WorkspaceInstallation.run` method with a mocked `WorkspaceConfig`
object and a mock installation. The test also includes a user prompt to
confirm triggering the assessment job and opening the assessment job
URL. The new functionality and test improve the installation process by
enabling users to easily trigger the assessment workflow based on their
specific needs.
* Updated README.md for Service Principal Installation Limit
([#1076](#1076)). This
release includes an update to the README.md file to clarify that
installing UCX with a Service Principal is not supported. Previously,
the file indicated that Databricks Workspace Administrator privileges
were required for the user running the installation, but did not
explicitly state that Service Principal installation is not supported.
The updated text now includes this information, ensuring that users have
a clear understanding of the requirements and limitations of the
installation process. The rest of the file remains unchanged and
continues to provide instructions for installing UCX, including required
software and network access. No new methods or functionality have been
added, and no existing functionality has been changed beyond the
addition of this clarification. The changes in this release have been
manually tested to ensure they are functioning as intended.
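
The `_trigger_workflow` behaviour described in the "Triggering Assessment Workflow" entry above can be sketched roughly as follows. This is a minimal illustration, not the UCX implementation: `FakeJobsAPI`, `trigger_workflow`, the `confirm` callback, and the `{host}#job/{id}` URL format are all assumptions made for the example. Only the overall flow (look up the job ID for a step, build the job URL, ask for confirmation, call `run_now`) comes from the notes.

```python
from dataclasses import dataclass, field


@dataclass
class FakeJobsAPI:
    """Hypothetical stand-in for the `jobs` API of a workspace client."""

    triggered: list = field(default_factory=list)

    def run_now(self, job_id: int) -> None:
        # A real client would start a job run; here we only record the call.
        self.triggered.append(job_id)


def trigger_workflow(step, jobs_state, jobs_api, host, confirm):
    """Trigger the job registered for `step` if the user confirms.

    Returns the job URL when the job was triggered, otherwise None.
    """
    job_id = int(jobs_state[step])      # job ID stored per step name
    job_url = f"{host}#job/{job_id}"    # assumed URL format, for illustration
    if not confirm(f"Trigger the {step} workflow?"):
        return None
    jobs_api.run_now(job_id)
    return job_url
```

A caller could then offer to open the returned URL in a browser, for example with `webbrowser.open(job_url)` from the standard library, once the user agrees.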
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024
* Added Legacy Table ACL grants migration
([#1054](#1054)). This
commit introduces a legacy table ACL grants migration to the
`migrate-tables` workflow, resolving issue
[#340](#340) and paving the
way for follow-up PRs
[#887](#887) and
[#907](#907). A new
`GrantsCrawler` class is added for crawling grants, along with a
`GroupManager` class to manage groups during migration. The
`TablesMigrate` class is updated to accept instances of
`GrantsCrawler` and `GroupManager` in its constructor. The migration
process has been thoroughly tested with unit tests, integration tests,
and manual testing on a staging environment. The changes include the
addition of a new Enum class `AclMigrationWhat` and updates to the
`Table` dataclass, and affect the way tables are selected for migration
based on rules. The logging and error handling have been improved in the
`skip_schema` function.
* Added `databricks labs ucx cluster-remap` command to remap legacy
cluster configurations to UC-compatible
([#994](#994)). This
change adds the `databricks labs ucx cluster-remap` command, which
remaps legacy cluster configurations to UC-compatible ones, together
with user documentation for the cluster remapping process. The
`create-catalogs-schemas` and `revert-cluster-remap` commands are also
included, expanding support for creating and managing UC external
catalogs and schemas. Using the CLI, users can remap clusters to
UC-compatible configurations and revert that remapping. This change
does not modify existing commands or workflows and does not introduce
new tables. The new command and associated functions have been manually
tested for functionality.
* Added `migrate-tables` workflow
([#1051](#1051)). The
`migrate-tables` workflow has been added for migrating tables from the
Hive Metastore to the Unity Catalog, along with new classes and methods
for table mapping, migration status refreshing, and migrating tables.
To allow more fine-grained control over the resources allocated to the
workspace, the `WorkspaceConfig` class gains two new instance variables,
`min_workers` and `max_workers`, with default values of 1 and 10
respectively. A new `trigger` function initializes a configuration, SQL
backend, and WorkspaceClient based on the provided configuration file.
The new `run_task` function looks up the specified task, logs relevant
information, and runs the task's function with the provided arguments.
The `Task` class's `fn` attribute now includes an `Installation` object
as a parameter. The `migrate_dbfs_root_delta_tables` and
`migrate_external_tables_sync` methods migrate Delta tables located in
the DBFS root and synchronize external tables, respectively. These
functions use the workspace client to access the catalogs. Integration
tests have also been added for these new methods.
* Added handling for `SYNC` command failures
([#1073](#1073)). This pull
request introduces changes to improve handling of `SYNC` command
failures during external table migrations in the Hive metastore.
Previously, the `SYNC` command's result was not checked, and failures
were not logged. Now, the `_migrate_external_table` method in
`table_migrate.py` fetches the result of the `SYNC` command execution,
logs a warning message for failures, and returns `False` if the command
fails. A new integration test has been added to simulate a failed `SYNC`
command due to a non-existent catalog and schema, ensuring the migration
tool handles such failures. A new test case has also been added to
verify the handling of `SYNC` command failures during external table
migrations, using a mock backend to simulate failures and checking for
appropriate log messages. These changes enhance the reliability and
robustness of the migration process, providing clearer error diagnosis
and handling for potential `SYNC` command failures.
* Added initial version of `databricks labs ucx migrate-local-code`
command ([#1067](#1067)). A
new `databricks labs ucx migrate-local-code` command has been added to
facilitate migration of local code to a Databricks environment,
specifically targeting Python and SQL files. This initial version is
experimental and aims to help users and administrators manage code
migration, maintain consistency across workspaces, and improve
compatibility with Unity Catalog. The command introduces a new `Files`
class for applying migrations to code files, considering their
language. It also
updates the `.gitignore` file and the pyproject.toml file to ensure
appropriate version control management. Additionally, new classes and
methods have been added to support code analysis, transformation, and
linting for various programming languages. These improvements will aid
in streamlining the migration process and ensuring compatibility with
Databricks' environment.
* Added instance pool to cluster policy
([#1078](#1078)). A new
field, `instance_pool_id`, has been added to the cluster policy
configuration in `policy.py`, allowing users to specify the ID of an
instance pool to be applied to all workflow clusters in the policy. This
ID can be manually set or automatically retrieved by the system. A new
private method, `_get_instance_pool_id()`, has been added to handle the
retrieval of the instance pool ID. Additionally, a new test for table
migration jobs has been added to `test_installation.py` to ensure the
migration job is correctly configured with the specified parallelism,
minimum and maximum number of workers, and instance pool ID. A new test
case for creating a cluster policy with an instance pool has also been
added to `tests/unit/installer/test_policy.py` to ensure the instance
pool is added to the cluster policy during creation. These changes
provide users with more control over instance pools and cluster
policies, and improve the overall functionality of the library.
* Fixed `ucx move` logic for `MANAGED` & `EXTERNAL` tables
([#1062](#1062)). The `ucx
move` command has been updated to allow for the movement of UC
tables/views after the table upgrade process, providing flexibility in
managing catalog structure. The command now supports moving multiple
tables simultaneously, dropping managed tables/views upon confirmation,
and deep-cloning managed tables while dropping and recreating external
tables. A refactoring of the `TableMove` class has improved code
organization and readability, and the associated unit tests have been
updated to reflect these changes. This feature is targeted towards
developers and administrators seeking to adjust their catalog structure
after table upgrades, with the added ability to manage exceptional
conditions gracefully.
* Fixed integration testing with random product names
([#1074](#1074)). The
`trigger` function in the `tasks.py` module now creates the
`Installation` object locally, with a new `install_folder` argument,
and passes it to the `run_task` function. The `install_folder` is
derived by taking the parent directory of `config_path`, converting it
to a POSIX-style path, and removing the leading "/Workspace" prefix.
This ensures that `run_task` receives the correct installation folder.
The `Installation.current` call has been replaced with this new
`Installation` object.
* Refactor installer to separate workflows methods from the installer
class ([#1055](#1055)). In
this release, the installer in the `cli.py` file has been refactored to
improve modularity and maintainability. The installation and workflow
functionalities have been separated by importing a new class called
`WorkflowsInstallation` from `databricks.labs.ucx.installer.workflows`.
The `WorkspaceInstallation` class is no longer used in various
functions, and the new `WorkflowsInstallation` class is used instead.
Additionally, a new mixin class called `InstallationMixin` has been
introduced, which includes methods for uninstalling UCX, removing jobs,
and validating installation steps. The `WorkflowsInstallation` class now
inherits from this mixin class. A new file, `workflows.py`, has been
added to the `databricks/labs/ucx/installer` directory, which contains
methods for managing Databricks jobs. The new `WorkflowsInstallation`
class is responsible for deploying workflows, uploading wheels to DBFS
or WSFS, and creating debug notebooks. The refactoring also includes the
addition of new methods for handling specific workflows, such as
`run_workflow`, `validate_step`, and `repair_run`, which are now
contained in the `WorkflowsInstallation` class. The `test_install.py`
file in the `tests/unit` directory has also been updated to include new
imports and test functions to accommodate these changes.
* Skip unsupported locations while migrating to external location in
Azure ([#1066](#1066)). In
this release, we have updated the functionality of migrating to an
external location in Azure. A new private method
`_filter_unsupported_location` has been added to the `locations.py`
file, which checks if the location URLs are supported and removes the
unsupported ones from the list. Only locations starting with "abfss://"
are considered supported. Unsupported locations are logged with a
warning message. Additionally, a new test
`test_skip_unsupported_location` has been introduced to verify that the
`location_migration` function correctly skips unsupported locations
during migration to external locations in Azure. The test checks if the
correct log messages are generated for skipped unsupported locations,
and it mocks various scenarios such as crawled HMS external locations,
storage credentials, UC external locations, and installation with
permission mapping. The mock crawled HMS external locations contain two
unsupported locations: `adl://` and `wasbs://`. This ensures that the
function handles unsupported locations correctly, avoiding any
unnecessary errors or exceptions during migration.
* Triggering Assessment Workflow from Installer based on User Prompt
([#1007](#1007)). A new
functionality has been added to the installer that allows users to
trigger an assessment workflow based on a prompt during the installation
process. The `_trigger_workflow` method has been implemented, which can
be initiated with a step string argument. This method retrieves the job
ID for the specified step from the `_state.jobs` dictionary, generates
the job URL, and triggers the job using the `run_now` method from the
`jobs` class of the Workspace object. Users will be asked to confirm
triggering the assessment workflow and will have the option to open the
job URL in a web browser after triggering it. A new unit test,
`test_triggering_assessment_wf`, has been introduced to the
`test_install.py` file to verify the functionality of triggering an
assessment workflow based on user prompt. This test uses existing
classes and functions, such as `MockBackend`, `MockPrompts`,
`WorkspaceConfig`, and `WorkspaceInstallation`, to run the
`WorkspaceInstallation.run` method with a mocked `WorkspaceConfig`
object and a mock installation. The test also includes a user prompt to
confirm triggering the assessment job and opening the assessment job
URL. The new functionality and test improve the installation process by
enabling users to easily trigger the assessment workflow based on their
specific needs.
* Updated README.md for Service Principal Installation Limit
([#1076](#1076)). This
release includes an update to the README.md file to clarify that
installing UCX with a Service Principal is not supported. Previously,
the file indicated that Databricks Workspace Administrator privileges
were required for the user running the installation, but did not
explicitly state that Service Principal installation is not supported.
The updated text now includes this information, ensuring that users have
a clear understanding of the requirements and limitations of the
installation process. The rest of the file remains unchanged and
continues to provide instructions for installing UCX, including required
software and network access. No new methods or functionality have been
added, and no existing functionality has been changed beyond the
addition of this clarification. The changes in this release have been
manually tested to ensure they are functioning as intended.
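
The location filtering described in the "Skip unsupported locations" entry above can be illustrated with a short sketch. It is not the code from `locations.py`: the function name, signature, and log message here are assumptions. The only rule taken from the notes is that locations starting with "abfss://" are kept and everything else is skipped with a warning.

```python
import logging

logger = logging.getLogger(__name__)

# Per the notes above, only abfss:// locations are considered supported.
SUPPORTED_PREFIX = "abfss://"


def filter_unsupported_locations(location_urls):
    """Return only the supported URLs, logging a warning for each skipped one."""
    supported = []
    for url in location_urls:
        if url.startswith(SUPPORTED_PREFIX):
            supported.append(url)
        else:
            # Hypothetical message text; the real log wording may differ.
            logger.warning("Skipping unsupported location: %s", url)
    return supported
```

Given one `abfss://` URL plus an `adl://` and a `wasbs://` URL, as in the mocked test data mentioned above, this sketch would keep the first and log warnings for the other two.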
@@ -278,26 +203,6 @@ def _configure_new_installation(self) -> WorkspaceConfig:

policy_id, instance_profile, spark_conf_dict = self._policy_installer.create(inventory_database)

# Save configurable spark_conf for table migration cluster
# parallelism will not be needed if backlog is fixed in https://databricks.atlassian.net/browse/ES-975874
parallelism = self._prompts.question(

@FastLee How do we make these parameters configurable if we remove those prompts?
