Extend code migration progress documentation #3588
Merged
+25
−93
✅ 2/2 passed, 26s total. Running from acceptance #8130
FastLee approved these changes on Jan 30, 2025
LGTM
gueniai added a commit that referenced this pull request on Feb 12, 2025
* Added documentation to use Delta Live Tables migration ([#3587](#3587)). In this release, we have introduced several new features and improvements to the Unity Catalog (UC) migration process, including group migration, table migration, data reconciliation, code migration, and Delta Live Table (DLT) pipeline migration. The DLT pipeline migration process involves cloning Hive Metastore DLT pipelines to the Unity Catalog, allowing both pipelines to run independently after the cloned pipeline reaches the `RUNNING` state. The cloned pipeline will copy over all data and checkpoints upon the first update. We have also added new pipeline migration commands for the DLT pipeline migration process, including `migrate-dlt-pipelines` with options to include or exclude a comma-separated list of pipeline IDs. Additionally, this release includes documentation for using Delta Live Tables migration and provides documentation for issue [#2065](#2065). The relevant tests for this release have been manually conducted. We recommend ensuring that all name-based references in the HMS pipeline are fully qualified and that the original pipeline is not running when requesting the clone. It is also essential to update permissions, refresh the catalog, and ensure all jobs and alerts are pointing to the new UC tables and views once the code migration is complete. * Added the CLI command for migrating DLT pipelines ([#3579](#3579)). A new CLI command has been added to migrate DLT pipelines from HMS to UC using the DLT Migration API. The command allows for limiting the pipelines to be migrated based on their IDs through the use of `include-pipeline-ids` and `exclude-pipeline-ids` flags. These flags accept a comma-separated list of pipeline IDs for inclusion or exclusion. This change, linked to issue [#3107](#3107), includes a new function for adding the CLI command, and has undergone manual testing. 
The functionality is part of issue [#3579](#3579), and enhances the UCX tool's functionality by providing more control over the pipeline migration process. Unit tests, integration tests, and verification on the staging environment are pending. * Addressed Bug with Dashboard migration ([#3663](#3663)). The recent update addresses a bug in Dashboard migration by modifying the `_list_dashboards` method in the `dashboards.py` file. Previously, the method only listed dashboards, unable to retrieve dashboard details if they were not available in the listing. This led to an incomplete list of dashboards. The updated method now fetches dashboard details using the `_get_dashboard` method and converts them to the required format using the `Dashboard.from_sdk_redash_dashboard` method. Moreover, the `_crawl` method now skips dashboard entries with `None` IDs. In addition, the RedashDashboardCrawler class in the `assessment/test_dashboards.py` file has been updated to improve its handling of dashboards without IDs, addressing bug [#3663](#3663). The get method of the WorkspaceClient mock has been added to return a dashboard object with the given ID, and the list and get methods for WorkspaceClient mock now return SdkRedashDashboard objects with appropriate IDs, facilitating the testing of various edge cases related to dashboard retrieval and processing. These changes aim to enhance the robustness of the RedashDashboardCrawler and its handling of various dashboard configurations, specifically those without an ID. * Extend code migration progress documentation ([#3588](#3588)). In this release, we have added comprehensive guidance for migrating code to Unity Catalog in two new sections: `Code Migration` and 'Final Details'. The `Code Migration` section outlines the process, advising users to initiate the procedure by utilizing a linter to detect compatibility issues. 
Users can run the `assessment` and `migration-progress` dashboards or use the `lint-local-code` command to execute the linter. The `Final Details` section instructs users on remapping clusters for compatibility with Unity Catalog, using the `cluster-remap` command. To facilitate these updates, we have also revised the code migration commands reference documentation, incorporating new and modified commands for linting local code, migrating local code, migrating DB SQL dashboards, and reverting DB SQL dashboards. Each command is accompanied by usage instructions, flags, and examples to ensure a smooth and efficient code migration experience for software engineers using the UCX tool. * Fixed Skip/Unskip schema functionality ([#3567](#3567)). In this release, we have addressed issues related to skip/unskip schema functionality in our open-source library. We have made modifications to the `skip_schema` and `unskip_schema` methods in the `mapping.py` file, which are now responsible for marking and unmarking a schema in the migration process by applying a table property. Additionally, we have updated the `_get_database_in_scope_task` and `_get_table_in_scope_task` methods to parse table properties and check if the specified schema or table is marked to be skipped. We also fixed an issue related to skipping and unskipping a schema in the database by including the catalog name `hive_metastore` in the `ALTER SCHEMA` statement. The 'Skip/Unskip' functionality is now working as intended, allowing users to skip or unskip a schema using the `databricks.labs.ucx.skip` property. These changes ensure that the correct commands are executed when skipping or unskipping a schema, enhancing the overall functionality and reliability of our library. * Make `MaybeTree` the main Python AST entrypoint for constructing the syntax tree ([#3550](#3550)). 
The commit modifies the Python linting related code to use `MaybeTree` as the main entry point for constructing the syntax tree, replacing the use of `Tree`. The `MaybeTree` class now has class methods for constructing a syntax tree from source code and for normalizing the source code before parsing. This change enforces consistent normalization of source code before parsing and resolves issues [#3457](#3457) and [#3213](#3213). The `normalized_parse` method has been renamed to `from_source_code` to align with the naming convention used in UCX. The `walk` and `first_statement` methods have been removed from `MaybeTree` as they were repetitions from `Tree`'s methods. Unit tests have been added to ensure the correctness of these changes. The `MaybeTree` class encapsulates a parse tree that may have failed to parse, and is used to construct the syntax tree in the Python AST handling. This change enhances the consistency and reliability of the Python AST syntax tree construction process. * Make fixer diagnostic codes unique ([#3582](#3582)). In this release, the diagnostic codes for the `databricks labs ucx migrate-local-code` command have been modified to ensure that each code is unique, allowing for the correct fixer to be identified and applied during code migration and fixing. This change includes modifications to the codebase and addition of unit and integration tests. The documentation has been updated to include new diagnostic codes for migrated tables specific to Python and SQL, enabling users to better understand the issues and take appropriate action. As a software engineer, it is recommended to review the code changes, new diagnostic codes, and added tests to ensure they cover all necessary scenarios and work as expected. This will improve the functionality of the code migration process and enhance user experience. * Removed the linting false positive for missing table format warning when using `spark.table` ([#3589](#3589)). 
In this change, the linting logic related to missing table format warnings when using `spark.table` has been modified to remove a false positive issue ([#3545](#3545)). A new linter class, `DBRv8d0PyLinter`, has been implemented to detect backwards incompatible changes in DBR version 8.0 for table-creation with implicit format. The linter checks for correct usage of `writeTo` and `saveAsTable` methods from the PySpark SQL DataFrameWriter API, ensuring the right number of arguments and format specifications are provided. Additionally, unit tests have been modified to reflect these changes, addressing a specific case triggering false positives and ensuring accurate identification of true positive cases for missing table format warnings. * Removed tree from `PythonSequentialLinter` ([#3535](#3535)). In this release, we have made significant changes to the code linting process in our open-source library. We have removed the tree manipulation logic from `PythonSequentialLinter` and introduced changes to `NotebookLinter`. The new `Advice` dataclass has been added to represent code-related advice, and `NotebookLinter` now handles early failure during code resolution and attaches subsequent cells and `%run` notebook trees as child trees. These changes improve the overall design and functionality of the code linting process, ensuring better handling of tree manipulation within the notebook linter. Additionally, unit tests and the `databricks labs ucx lint-local-code` command have been updated. * Updated sqlglot requirement from <26.3,>=25.5.0 to >=25.5.0,<26.4 ([#3572](#3572)). In this pull request, we have updated the `sqlglot` dependency requirement in the 'pyproject.toml' file from the range of >=25.5.0 and <26.3 to a new range of >=25.5.0 and <26.4. This change allows us to use the latest version of `sqlglot` while ensuring that it does not exceed the specified range. 
A reference to the `sqlglot` changelog has been provided in the pull request, and the commit message includes details of the update along with a list of commits and changelog entries. This update ensures that our project utilizes the latest `sqlglot` features and improvements while maintaining compatibility with our codebase.
Dependency updates:
* Updated sqlglot requirement from <26.3,>=25.5.0 to >=25.5.0,<26.4 ([#3572](#3572)).
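The release notes above describe `include-pipeline-ids` and `exclude-pipeline-ids` flags that each accept a comma-separated list of pipeline IDs. The following is a minimal sketch of how such values could be parsed and applied. The helper names `parse_id_list` and `select_pipeline_ids` are hypothetical, and the rule that an include list narrows the set before the exclude list is applied is an assumption; this is not UCX's actual implementation.

```python
from __future__ import annotations


def parse_id_list(raw: str | None) -> list[str]:
    """Split a comma-separated flag value into trimmed, non-empty IDs."""
    if not raw:
        return []
    return [part.strip() for part in raw.split(",") if part.strip()]


def select_pipeline_ids(
    all_ids: list[str],
    include: str | None = None,
    exclude: str | None = None,
) -> list[str]:
    """Hypothetical helper: pick the pipelines to migrate.

    Assumption: a non-empty include list restricts the candidates first,
    then any excluded IDs are dropped. Input order is preserved.
    """
    included = set(parse_id_list(include))
    excluded = set(parse_id_list(exclude))
    candidates = [pid for pid in all_ids if pid in included] if included else list(all_ids)
    return [pid for pid in candidates if pid not in excluded]
```

With this sketch, passing `include="a, b"` for pipelines `["a", "b", "c"]` keeps `a` and `b`, while passing only `exclude="b"` keeps `a` and `c`.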
gueniai added a commit that referenced this pull request on Feb 12, 2025
* Added documentation to use Delta Live Tables migration ([#3587](#3587)). In this release, we have extended the migration process for Delta Live Tables (DLT) to Unity Catalog (UC) in Databricks. After table migration, users can now perform a Delta Live Table pipeline migration process, which clones Hive Metastore DLT pipelines to the Unity Catalog. The cloned pipeline will copy over all the data and checkpoints during the first update and run normally thereafter. While migration is in progress, maintenance is automatically paused for both pipelines. Users should consider several factors, such as ensuring names are fully qualified and not editing pipeline-defining notebooks during cloning. Additionally, we have added documentation for using Delta Live Tables migration, including a detailed description, usage, and a link to issue [#2065](#2065). The new pipeline migration commands, such as `migrate-dlt-pipelines`, allow developers and administrators to migrate specific Delta Live Tables pipelines, specifying a list of pipeline IDs to include or exclude, providing greater flexibility during the migration process. * Added the CLI command for migrating DLT pipelines ([#3579](#3579)). A new CLI command, "migrate-pipelines", has been added to facilitate the migration of DLT pipelines from HMS to UC using the DLT Migration API. This command includes optional flags `include-pipeline-ids` and `exclude-pipeline-ids` to specify which pipeline IDs to include or exclude from the migration. The associated functionality is implemented in the newly introduced `PipelinesMigrator` class, which takes in `include_pipeline_ids` and `exclude_pipeline_ids` as arguments. The changes also include an update to the `table_ownership_grant_loader` method, although it is unrelated to the new CLI command. While the new command has been manually tested, unit tests, integration tests, and verification on the staging environment are yet to be completed. 
* Addressed Bug with Dashboard migration ([#3663](#3663)). The recent update addresses a bug in the migration of Dashboards by modifying the behavior of the WorkspaceClient object in the "tests/unit/assessment/test_dashboards.py" file. The bug, which affected the RedashDashboardCrawler's ability to snapshot and manage dashboard data, was caused by the lack of an implementation for the `dashboards.get` method in the mocked `dashboards.list` method. This issue has been resolved by adding a mocked implementation of the `dashboards.get` method, which returns a dashboard object with the specified ID. Additionally, the `_crawl` method now iterates through each dashboard and filters out those with no ID, and the `_list_dashboards` method checks if the dashboard ID is None, fetches the dashboard details if it is not, and appends them to the dashboards list. This ensures that the migration process proceeds smoothly and that only dashboards with valid IDs and details are included in the final list. * Extend code migration progress documentation ([#3588](#3588)). In this documentation update, we have added two new sections, `Code Migration` and "Final Details," to provide detailed instructions for the code migration process. The `Code Migration` section outlines the recommended approach for migrating code, including using the migration-progress dashboard, migrate- commands, and setting the default catalog to Unity Catalog. Users are advised to use the linter to identify compatibility issues before migrating the code. The `Final Details` section includes instructions for running the cluster-remap command to remap clusters to be UC compatible. Additionally, we have updated the link for code migration to the new section and renumbered the steps in the migration process. 
The documentation for UCX commands has been updated to provide more detailed information about code migration, including starting the migration process with linting, using dashboards, and employing local commands. A new section for finalizing the migration has been added, and the commands `lint-local-code`, `migrate-local-code`, `migrate-dbsql-dashboards`, and `revert-dbsql-dashboards` have been added to facilitate the code migration process. This update resolves issue [#2231](#2231) and includes relevant user documentation. * Fixed Skip/Unskip schema functionality ([#3567](#3567)). In this release, we have addressed issue [#3494](#3494) related to the Skip/Unskip schema functionality in the Hive Metastore. The commit with ID [#3567](#3567) modifies the `mapping.py` and `test_mapping.py` files to ensure that schemas are skipped and unskipped correctly during the migration process. In the `skip_schema` method, the schema is now explicitly set to be skipped by applying the `UCX_SKIP_PROPERTY` table property to the `hive_metastore` schema. The `unskip_schema` method removes the `UCX_SKIP_PROPERTY` table property from the `hive_metastore` schema if it exists. The `_get_database_in_scope_task` and `_get_table_in_scope_task` methods have also been updated to correctly check the `UCX_SKIP_PROPERTY` value when determining whether to include a database or table in the migration process. Additionally, tests for the CLI have been updated to ensure that the `ALTER SCHEMA` command works correctly with fully qualified schema names. The `test_skip_with_schema` function now includes the catalog and schema names for the `ALTER SCHEMA` command, while the `test_unskip_with_schema` function uses `SET DBPROPERTIES` instead of `UNSET DBPROPERTIES IF EXISTS`. These changes improve the reliability and predictability of the Skip/Unskip schema functionality in the Hive Metastore. * Make `MaybeTree` the main Python AST entrypoint for constructing the syntax tree ([#3550](#3550)). 
In this commit, the `MaybeTree` class has become the main entrypoint for constructing the Python AST syntax tree, replacing the previous usage of the `Tree` class. This change enforces normalization of source code before parsing and resolves issues [#3457](#3457) and [#3213](#3213). Relevant class methods have been moved from `Tree` to `MaybeTree`, which now includes a `from_source_code` method for normalizing and parsing source code. The `walk` and `first_statement` methods have been removed from `MaybeTree` as they were repetitions from `Tree`'s methods. Python linting related code has been modified, and unit tests have been added to ensure proper functionality of the new `MaybeTree` class and its methods. * Make fixer diagnostic codes unique ([#3582](#3582)). In this release, fixer diagnostic codes have been made unique for the `databricks labs ucx migrate-local-code` command to ensure the correct fixer is used for code migration and fixing. This change impacts the command by adding new diagnostic codes for migrated tables, specifically for Python, SQL, and SQL-related migrations, and updating the existing command to use these new codes. Additionally, unit tests and integration tests have been added and modified to cover the new functionality, and manual testing has been performed to ensure correct behavior. * Removed the linting false positive for missing table format warning when using `spark.table` ([#3589](#3589)). This change resolves a linting false positive issue related to missing table format warnings when using the `spark.table` command, specifically addressing a backwards incompatible change in DBR version 8.0 where table-creation with implicit format is no longer supported. The linter has been updated to include more precise match for method names and unit tests have been modified accordingly. 
This change ensures that the linter correctly identifies the absence of the table format warning in specific scenarios involving `spark.table`, without affecting the core functionalities of the system. * Removed tree from `PythonSequentialLinter` ([#3535](#3535)). In this release, the tree manipulation functionality has been removed from the `PythonSequentialLinter` and added to the `NotebookLinter`. The `PythonSequentialLinter` now focuses solely on sequential linting, while the `NotebookLinter` handles tree manipulation and management for notebooks. This includes early failure on code resolution failure, attachment of `%run` notebook trees as child trees to the calling cell, and modifications to the existing `databricks labs ucx lint-local-code` command. Additionally, new type hints, a `dataclass` named `Advice`, and new and modified unit tests have been included. These changes improve the separation of concerns between linters and tree manipulation, making the codebase easier to understand and maintain. * Updated sqlglot requirement from <26.3,>=25.5.0 to >=25.5.0,<26.4 ([#3572](#3572)). In this release, we have updated the required version range of the sqlglot library in the pyproject.toml file. The new version range (greater than or equal to 25.5.0 and less than 26.4) allows for the installation of the latest version of sqlglot while avoiding potential compatibility issues that may arise with versions 26.4 and above. This update includes several bug fixes and improvements, such as changes to how single VALUES clauses in CTEs are expanded and how LEVEL columns in CONNECT BY queries are treated. The API documentation and CHANGELOG.md file for various versions of sqlglot have also been updated as part of this pull request. We recommend all users to update to this version to take advantage of the latest features and improvements. Dependency updates: * Updated sqlglot requirement from <26.3,>=25.5.0 to >=25.5.0,<26.4 ([#3572](#3572)).
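To make the `sqlglot` requirement `>=25.5.0,<26.4` concrete, here is a small sketch that checks whether a plain numeric version string falls inside that range. The function names are hypothetical, and the parsing only handles dotted integer versions (no pre-release or local suffixes), so it is a simplification of what a real resolver such as pip does.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'X.Y' or 'X.Y.Z' into a 3-tuple, padding missing parts with 0."""
    parts = [int(p) for p in version.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"Unsupported version format: {version}")
    while len(parts) < 3:
        parts.append(0)
    return parts[0], parts[1], parts[2]


def satisfies_sqlglot_range(version: str) -> bool:
    """Return True if version is >=25.5.0 and <26.4 (simplified check)."""
    lower = (25, 5, 0)   # inclusive lower bound from the requirement
    upper = (26, 4, 0)   # exclusive upper bound from the requirement
    return lower <= parse_version(version) < upper
```

Under this sketch, every `26.3.x` release is accepted while `26.4.0` is rejected, which is the practical effect of moving the upper bound from `<26.3` to `<26.4`.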
gueniai added a commit that referenced this pull request on Feb 25, 2025
* Added documentation to use Delta Live Tables migration ([#3587](#3587)). In this documentation update, we introduce a new section for migrating Delta Live Table pipelines to the Unity Catalog as part of the migration process. This workflow allows for the original and cloned pipelines to run independently after the cloned pipeline reaches the `RUNNING` state. The update includes an example of stopping and renaming an existing HMS DLT pipeline, and creating a new cloned pipeline. Additionally, known issues and limitations are outlined, such as supported streaming sources, maintenance pausing, and querying by timestamp. To streamline the migration process, the `migrate-dlt-pipelines` command is introduced with optional parameters for including or excluding specific pipeline IDs. This feature is intended for developers and administrators managing data pipelines and handling table aliasing issues. Relevant user documentation has been added and the changes have been manually tested. * Added support for MSSQL and POSTGRESQL to HMS Federation ([#3701](#3701)). In this enhancement, the open-source library now supports Microsoft SQL Server (MSSQL) and PostgreSQL databases in the Hive Metastore Federation (HMS Federation) feature. This update introduces classes for handling external Hive Metastore instances and their versions, and refactors a regex pattern for better support of various JDBC URL formats. A new `supported_databases_port` class variable is added to map supported databases to default ports, allowing the code to handle SQL Server's distinct default port. Additionally, a `supported_hms_versions` class variable is created, outlining supported Hive Metastore versions. The `_external_hms` method is updated to extract HMS version information more accurately, and the `_split_jdbc_url` method is refactored for better URL format compatibility and parameter extraction. 
The test file `test_federation.py` has been updated with new unit tests for external catalog creation with MSSQL and PostgreSQL, further enhancing compatibility with various databases and expanding HMS Federation's capabilities. * Added the CLI command for migrating DLT pipelines ([#3579](#3579)). A new CLI command, "migrate-dlt-pipelines," has been added for migrating DLT pipelines from HMS to UC using the DLT Migration API. This command allows users to include or exclude specific pipeline IDs during migration using the `--include-pipeline-ids` and `--exclude-pipeline-ids` flags, respectively. The change impacts the `PipelinesMigrator` class, which has been updated to accept and use these new parameters. Currently, there is no information available about testing, but the changes are expected to be manually tested and accompanied by corresponding unit and integration tests in the future. The changes are isolated to the `PipelinesMigrator` class and related functionality, with no impact on existing methods or functionality. * Addressed Bug with Dashboard migration ([#3663](#3663)). In this release, the `_crawl` method in `dashboards.py` has been enhanced to exclude SDK dashboards that lack IDs during the dashboard migration process. This modification enhances migration efficiency by avoiding unnecessary processing of incomplete dashboards. Additionally, the `_list_dashboards` method now includes a check for dashboards with no IDs while iterating through the `dashboards_iterator`. If a dashboard with no ID is found, the method fetches the dashboard details using the `_get_dashboard` method and adds them to the `dashboards` list, ensuring proper processing. Furthermore, a bug fix for issue [#3663](#3663) has been implemented in the `RedashDashboardCrawler` class in `assessment/test_dashboards.py`. The `get` method has been added as a side effect to the `WorkspaceClient` mock's `dashboards` attribute, enabling the retrieval of individual dashboard objects by their IDs. 
This modification ensures that the `RedashDashboardCrawler` can correctly retrieve and process dashboard objects from the `WorkspaceClient` mock, preventing errors due to missing dashboard objects. * Broaden safe read text caught exception scope ([#3705](#3705)). In this release, the `safe_read_text` function has been enhanced to handle a broader range of exceptions that may occur while reading a text file, including `OSError` and `UnicodeError`, making it more robust and safe. The function previously caught specific exceptions such as `FileNotFoundError`, `UnicodeDecodeError`, and `PermissionError`. Additionally, the codebase has been improved with updated unit tests, ensuring that the new functionality works correctly. The linting parts of the code have also been updated, enhancing the readability and maintainability of the project for other software engineers. A new method, `safe_read_text`, has been added to the `source_code` module, with several new test cases designed to ensure that the method handles edge cases correctly, such as when the file does not exist, when the path is a directory, or when an OSError occurs. These changes make the open-source library more reliable and robust for various use cases. * Case sensitive/insensitive table validation ([#3580](#3580)). In this release, the library has been updated to enable more flexible and customizable metadata comparison for tables. A case sensitive flag has been introduced for metadata comparison, which allows for consideration or ignoring of column name case during validation. The `TableMetadataRetriever` abstract base class now includes a new parameter `column_name_transformer` in the `get_metadata` method, which is a callable that can be used to transform column names as needed for comparison. Additionally, a new `case_sensitive` parameter has been added to the `StandardSchemaComparator` constructor to determine whether column names should be compared case sensitively or not. 
A new parametrized test function `test_schema_comparison_case` has also been included to ensure that this functionality works as expected. These changes provide users with more control over the metadata comparison process and improve the library's handling of cases where column names in the source and target tables may have different cases. * Catch `AttributeError` in `InfferedValue._safe_infer_internal` ([#3684](#3684)). In this release, we have implemented a change to the `_safe_infer_internal` method in the `InferredValue` class to catch `AttributeError`. This change addresses an issue in the Astroid library reported in their GitHub repository (<pylint-dev/astroid#2683>) and resolves issue [#3659](#3659) in our project. By handling `AttributeError` during the inference process, we have made the code more robust and safer. When an exception occurs, an error message is logged with debug-level logging, and the method yields the `Uninferable` sentinel value to indicate that inference failed for the node. This enhancement strengthens the source code linting code through value inference in our open-source library. * Document to run `validate-groups-membership` before groups migration, not after ([#3631](#3631)). In this release, we have updated the order of executing the `validate-groups-membership` command in the group migration process. Previously, the command was recommended to be run after the groups migration, but it has been updated to be executed before the migration. This change ensures that the groups have the correct membership and the number of groups and users in the workspace and account are the same before migration, providing an extra level of safety. Additionally, we have updated the `remove-workspace-local-backup-groups` command to remove workspace-level backup groups and their permissions only after confirming the successful migration of all groups. 
We have also updated the spelling of the `validate-group-membership` command to `validate-groups-membership` in a documentation file. This release is aimed at software engineers who are adopting the project and looking to migrate their groups to the account level. * Extend code migration progress documentation ([#3588](#3588)). In this documentation update, we have added two new sections, `Code Migration` and "Final details," to the open-source library's migration process documentation. The `Code Migration` section provides a detailed walkthrough of the steps to migrate code after completing table migration and data reconciliation, including using the linter to investigate compatibility issues and linted workspace resources. The "[linter advices](/docs/reference/linter_codes)" provide codes and messages on detected issues and resolution methods. The migrated code can then be prioritized and tracked using the `migration-progress` dashboard, and migrated using the `migrate-` commands. The `Final details` section outlines the steps to take once code migration is complete, including running the `cluster-remap` command to remap clusters to be Unity Catalog compatible. This update resolves issue [#2231](#2231) and includes updated user documentation, with new methods for linting and migrating local code, managing dashboard migrations, and syncing workspace information. Additional commands for creating and validating table mappings, migrating locations, and assigning metastores are also included, with the aim of improving the code migration process by providing more detailed documentation and new commands for managing the migration. * Fixed Skip/Unskip schema functionality ([#3567](#3567)). In this release, we have addressed the improper handling of skip/unskip schema functionality in our open-source library. 
The `skip_schema` and `unskip_schema` methods in the `mapping.py` file have been updated to include the `hive_metastore` schema prefix while setting or unsetting the database property that determines whether a schema should be skipped. Additionally, the `_get_database_in_scope_task` and `_get_table_in_scope_task` methods have been modified to parse table properties as a dictionary, allowing for more straightforward lookup of the skip property for a table. The `test_skip_with_schema` and `test_unskip_with_schema` methods in the `tests/unit/test_cli.py` file have also been updated. The `test_skip_with_schema` method now includes the catalog name `hive_metastore` in the `ALTER SCHEMA` statement, ensuring that the schema is properly skipped. The `test_unskip_with_schema` method has been modified to use the `SET DBPROPERTIES` statement to set the value of the `databricks.labs.ucx.skip` property to `false`, effectively unskipping the schema. Furthermore, the `execute` method in the `sbe` module and the queries in the `mock_backend` module have been updated to match the new commands. These changes address the issue of improperly skipping schemas and ensure that the code functions as intended, allowing users to skip and unskip schemas as needed. Overall, these modifications improve the reliability and correctness of the skip/unskip schema functionality, ensuring that it behaves as expected in different scenarios. * Fixed `Total Tables` widget in assessment to only show table counts ([#3738](#3738)). In this release, we have addressed the issue with the `Total Tables` widget in the assessment dashboard as part of resolving [#3738](#3738) and in relation to [#3252](#3252). The revised `00_3_count_total_tables.sql` query in the `src/databricks/labs/ucx/queries/assessment/main/` directory now includes a WHERE clause to filter out views from the table count query. 
By excluding views and only displaying table counts in the `Total Tables` widget, the scope of changes is limited to the SQL query itself. The diff reflects the addition of the WHERE clause and necessary indentation. The commit has been manually tested as part of our quality assurance process, and the successful test results are documented in the `Tests` section of the commit message. * Fixed broken anchor for doc release ([#3720](#3720)). In this release, we have developed and implemented fixes to address issues with the Databricks workflows documentation used in the migration process. The previous version contained a broken anchor reference for the workflow process, which has now been corrected. This improvement includes the addition of a manual test to verify the fix. The revised documentation enables users to view the status of deployed workflows and rerun failed workflows using the `workflows` and `repair-run` commands, respectively. These updates simplify the management and troubleshooting of workflows, enhancing the overall user experience. * Fixed broken anchors in documentation ([#3712](#3712)). In this release, we have made significant improvements to the UCX process documentation, addressing issues related to broken anchors, outdated command names, and syntax. The commands `enable_hms_federation` and `create_federated_catalog` have been renamed to `enable-hms-federation` and `create-federated-catalog`, respectively. These updates include corresponding changes to the command syntax and have been manually tested to ensure accuracy. Additionally, we have added a new command, `validate-groups-membership`, which can be executed prior to the group migration workflow for added confidence. In case of no matching account group in the UCX-installed workspace, the `create-account-groups` command is now available. This release also includes updates to the section titles and links to enhance clarity and reflect current functionality. 
* Fixed notebook sources with `NotebookLinter.apply` ([#3693](#3693)). A new `Github.py` file has been added to the `databricks/labs/ucx/` directory, providing functionality for working with GitHub issues. It includes an `IssueType` enum, a `construct_new_issue_url` function, and constants for constructing URLs to the documentation and GitHub repository. The `NotebookLinter` class has been updated to include notebook fixing functionality, and the `PythonLinter` class has been introduced to run `apply` on an Abstract Syntax Tree (AST) tree. The `Notebook.apply` method has been implemented to apply changes to notebook sources and the legacy `NotebookMigrator` has been removed. These changes also include various unit and integration tests and modifications to the existing `databricks labs ucx migrate-local-code` command. The `DOCS_URL` method has been added to the `databricks.labs.ucx.github` module, and the error message for external metastore connectivity issues now includes a link to the UCX installation instruction in the documentation. * Fixed the broken documentation links in dashboards ([#3726](#3726)). This revision updates documentation links in various dashboards to correct broken links and enhance the user experience. Specifically, it addresses issues [#3725](#3725) and [#3726](#3726) by updating links in the "Assessment Overview," "Assessment Summary," and `Compute summary` dashboards, as well as the `group migration` and `table upgrade` documentation. The changes include replacing local Markdown file links with online documentation links and updating links to point to the correct documentation sections in the UCX GitHub repository. Although the changes have been manually tested, no unit or integration tests have been added, and staging environment verification has not been performed. Despite this, the revisions ensure accurate and up-to-date documentation links, improving the usability of the dashboards. 
* Force `MaybeDependency` to have a `Dependency` OR `list[Problem]`, not neither nor both ([#3635](#3635)). This commit enforces the `MaybeDependency` object to have either a `Dependency` or a `list[Problem]`, but not neither or both, in order to handle known libraries during import registration. It resolves issue [#3585](#3585), breaks up issue [#3626](#3626), and progresses issue [#1527](#1527), while modifying code linting logic and updating unit tests to accommodate these changes. Specifically, new classes like `KnownLoader`, `KnownDependency`, and `KnownProblem` have been introduced, and the `_resolve_allow_list` method has been updated to reflect the new enforcement. Additionally, tests have been added and modified to ensure the correct behavior of the modified logic, with a focus on handling directories, resolving children in context, and detecting known problems in imported libraries. * HMS Federation Documentation ([#3688](#3688)). The HMS Federation feature allows Hive Metastore (HMS) to be federated to a catalog, acting as a step towards migrating to Unity Catalog or as a hybrid solution where both HMS and UC access to the data is required. This feature provides an alternative to the table migration process, eliminating the need for table mapping, creating catalogs and schemas, and migrating Hive metastore data objects. The `enable_hms_federation` command enables the Hive Metastore federation process, while the `create_federated_catalog` command creates a UC catalog that mirrors all the schemas and tables in the source Hive Metastore. The `migrate-glue-credentials` command, which is AWS-only, creates a UC Service Credential for GLUE. These new commands are documented in the HMS Federation Documentation section and are now part of the migration process documentation with the data reconciliation step following it. To enable HMS Federation, use the `enable-hms-federation` and `create-federated-catalog` commands. 
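The either/or constraint that [#3635](#3635) above places on `MaybeDependency` can be pictured with a small dataclass. This is a minimal sketch, not the UCX implementation: the `Dependency` and `Problem` classes below are hypothetical stand-ins, and the field names and the `__post_init__` check are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Dependency:
    """Hypothetical stand-in for a resolved dependency."""
    path: str


@dataclass(frozen=True)
class Problem:
    """Hypothetical stand-in for a problem found while resolving."""
    code: str
    message: str


@dataclass(frozen=True)
class MaybeDependency:
    """Holds a dependency OR a non-empty list of problems, not neither nor both."""
    dependency: Optional[Dependency] = None
    problems: List[Problem] = field(default_factory=list)

    def __post_init__(self) -> None:
        has_dependency = self.dependency is not None
        has_problems = len(self.problems) > 0
        # Both flags equal means "neither" or "both", which are the two invalid states.
        if has_dependency == has_problems:
            raise ValueError("expected exactly one of: a dependency, a non-empty problem list")
```

Checking the invariant at construction time means callers never have to handle an ambiguous result: an instance that exists is known to carry exactly one of the two outcomes.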
* Make `MaybeTree` the main Python AST entrypoint for constructing the syntax tree ([#3550](#3550)). In this release, the main entry point for constructing the Python AST syntax tree has been changed from `Tree` to `MaybeTree` in the open-source library. This change involves moving class methods and static methods that construct a `MaybeTree` from the `Tree` class to the `MaybeTree` class, and making the class method that normalizes the source code before parsing the only entry point. The `normalized_parse` method has been renamed to `from_source_code` to match the commonly used naming for class methods within UCX. The `walk` and `first_statement` methods have been removed from `MaybeTree` as they were repetitions from `Tree`'s methods. These changes aim to enforce normalization and improve code consistency. Additionally, unit tests have been added and the Python linting related code has been modified to work with the new `MaybeTree` class. This change resolves issues [#3457](#3457) and [#3213](#3213). * Make fixer diagnostic codes unique ([#3582](#3582)). This commit modifies the `databricks labs ucx migrate-local-code` command to make fixer diagnostic codes unique, ensuring accurate code migration and fixing. Two new methods have been added for modifying and adding unit and integration tests. Diagnostic codes for the `table-migrated-to-uc` issue are now unique depending on the context where the table is referenced: SQL, Python, or Python-SQL. This ensures the appropriate fixer is applied when addressing code migration issues, improving overall functionality and user experience. Additionally, the commit updates the documentation to include the new postfixes for the `table-migrated-to-uc` linter code and their descriptions, making it clearer for developers to diagnose and resolve issues related to table migration. * Removed the linting false positive for missing table format warning when using `spark.table` ([#3589](#3589)). 
In this release, linting false positives related to missing table format warnings when using `spark.table` have been addressed, resolving issue [#3545](#3545). The linting logic and unit tests have been updated to handle changes in the default format for table references in Databricks Runtime 8.0, which now uses Delta as the default format. These changes improve the accuracy of the linting process, reducing unnecessary warnings and enhancing the overall developer experience. Additionally, the `test_linting_walker_populates_paths` unit test in the `test_jobs.py` file has been updated to use a different file path for testing. * Removed tree from `PythonSequentialLinter` ([#3535](#3535)). In this release, the `PythonSequentialLinter` has been refactored to no longer manipulate the code tree, and instead, the tree manipulation logic has been moved to `NotebookLinter`. This change improves the separation of concerns between the two components, resulting in a more modular and maintainable codebase. The `NotebookLinter` now handles early failure when resolving the code used by a notebook and attaches `%run` notebook trees as a child tree to the cell that calls the notebook. The code linting functionality has been modified, and the `databricks labs ucx lint-local-code` command has been updated. These changes resolve [#3543](#3543) and progress [#3514](#3514) and are dependent on PRs [#3529](#3529) and [#3550](#3550). The changes have been manually tested and include added and modified unit tests. Additionally, the `Advice` class has been updated to include a type variable `T`, which allows for more specific type hinting when creating instances of the class and its subclasses. * Rename file language helper function ([#3661](#3661)). In this code change, the helper function for determining the file language and checking its support by the linter has been renamed and refactored. 
The function, previously called `file_language`, is now named `infer_file_language_if_supported`. This change clarifies the function's purpose: it not only infers the file language but also checks whether the file is supported by the linter, acting as a filter. The function returns a `Language` object if the file is supported or `None` if it is not. The `infer_file_language_if_supported` function is used in other parts of the codebase, such as the `is_a_notebook` function. This change improves the codebase's readability and maintainability by making the helper function's purpose more explicit. The related code has been updated to use the new function accordingly. * Scope crawled jobs in `JobsCrawler` with `include_job_ids` ([#3658](#3658)). In this release, the `JobsCrawler` class in the `workflow_task.py` file has been updated to include a new optional parameter `include_job_ids` in the constructor. This parameter allows users to specify a list of job IDs to include in the crawling process, improving efficiency in large workspaces. Additionally, a check has been added to the `_assess_jobs` method to skip jobs whose IDs are not in the list of included IDs. Integration tests have been added to ensure the correct behavior of the new feature. This change resolves issue [#3656](#3656), which requested the ability to crawl jobs based on a specific list of job IDs. It is recommended to add a comment to the code explaining the purpose and usage of the `include_job_ids` parameter and update the documentation accordingly. * Support fixing `LocalFile`s with `FileLinter` ([#3660](#3660)). In this release, we have added new methods `write_text`, `safe_write_text`, `back_up_path`, and `revert_back_up_path` to the `base.py` file to support fixing files in `LocalFile` containers, along with unit tests and integration tests.
The `LocalFile` class in the "files.py" file has been extended to include new methods and properties, such as `apply`, `migrated_code`, `back_up_path`, and `back_up_original_and_flush_migrated_code`, enabling fixing files using linters and writing changes back to the container. The `databricks labs ucx migrate-local-code` command has also been updated to utilize the new functionality. These changes address issue [#3514](#3514), ensuring the proper handling of errors during file writing and providing automated fixing of code issues within `LocalFile` containers. * Updated `migrate-local-code` to use latest linter functionality ([#3700](#3700)). In this update, the `migrate-local-code` command has been enhanced by incorporating the latest linter functionality. The `LocalFileMigrator` and `LocalCodeLinter` classes have been merged, and the interfaces of `.fix` and `.apply` methods have been aligned. A new `FixerWalker` has been introduced to address dependencies in the dependency graph, and the existing `databricks labs ucx migrate-local-code` command has been updated accordingly. Relevant unit tests and integration tests have been added and modified to ensure the correctness of the changes, which resolve issue [#3514](#3514) and supersede issue [#3520](#3520). The `lint-local-code` command has also been updated with a flag to specify the path for linting. The `migrate-local-code` command now lints local code and generates advice on how to make it compatible with Unity Catalog, and can also apply fixes to make the code compatible. * Updated sqlglot requirement from <26.3,>=25.5.0 to >=25.5.0,<26.4 ([#3572](#3572)). In this pull request, we have updated the requirement for the `sqlglot` library in the 'pyproject.toml' file, changing it from `>=25.5.0,<26.3` to `>=25.5.0,<26.4`.
This change is part of issue [#3572](#3572) and was made to allow for the use of the latest version of 'sqlglot'. The pull request includes a changelog from the `sqlglot` repository, detailing the changes made in each version between 25.5.0 and 26.4. The commits relevant to this update include bumping the version of `sqlglotrs` to various versions between 0.3.7 and 0.3.14. This pull request was automatically generated by Dependabot, a tool that creates pull requests to update the dependencies in a project. It is now ready for review and merging. * Updated sqlglot requirement from <26.4,>=25.5.0 to >=25.5.0,<26.7 ([#3677](#3677)). In this release, we have updated the `sqlglot` dependency from version `>=25.5.0,<26.4` to `>=25.5.0,<26.7`. This change allows us to leverage the latest version of `sqlglot`, which includes various bug fixes and improvements, such as avoiding redundant casts in FROM/TO_UTC_TIMESTAMP and enhancing UUID support. Although there are some breaking changes introduced in the latest version, they should not affect our project's functionality. Additionally, this update includes several bug fixes and improvements for specific dialects such as Redshift, BigQuery, and TSQL. Overall, this update enhances the performance and functionality of the `sqlglot` library, ensuring compatibility with the latest version. * Use cached property for table migration index on local checkout context ([#3711](#3711)). In this release, we introduce a new cached property, `_migration_index`, to the `LocalCheckoutContext` class, designed to store the table migration index for the local checkout context. This change aims to prevent multiple recrawling when the migration index is empty. The `linter_context_factory` method has been refactored to utilize the new `_migration_index` property, and the `CurrentSessionState` parameter is removed. 
Additionally, the `local_code_linter` method has been updated to leverage the new `LinterContext` instance with the `_migration_index` property, instead of using the `linter_context_factory` method. The `LocalCodeLinter` object now accepts a new callable lambda function, returning a `LinterContext` instance with the `_migration_index` property. These enhancements improve code performance by reducing the migration index crawls in the local checkout context and simplify the code by eliminating the `CurrentSessionState` parameter. * [DOCS] Explain when to run `remove-workspace-local-backup-groups` workflow ([#3707](#3707)). In this release, the UCX component of the application has been enhanced with new Databricks workflows for orchestrating the group migration process. The `workflows` command displays the status of the workflows, and the `repair-run` command allows for rerunning failed workflows. The group migration workflow is specifically designed to be executed after a successful assessment workflow, and running it is followed by an optional `remove-workspace-local-backup-groups` workflow. This final step removes unnecessary workspace-level backup groups and their associated permissions, keeping the workspace clean and organized. The `remove-workspace-local-backup-groups` workflow should only be executed after confirming the successful migration of all groups involved. Dependency updates: * Updated sqlglot requirement from <26.3,>=25.5.0 to >=25.5.0,<26.4 ([#3572](#3572)). * Updated sqlglot requirement from <26.4,>=25.5.0 to >=25.5.0,<26.7 ([#3677](#3677)).
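The cached-property change described in [#3711](#3711) above can be illustrated with `functools.cached_property`. The sketch below is an assumption-laden illustration rather than the UCX code: the class name, the `crawl_count` counter, and the `_crawl_migration_index` helper are hypothetical, and only the caching pattern itself is the point.

```python
from functools import cached_property


class LocalCheckoutContextSketch:
    """Hypothetical stand-in for a context object that owns a migration index."""

    def __init__(self) -> None:
        self.crawl_count = 0  # Counts how often the expensive crawl runs.

    def _crawl_migration_index(self) -> dict:
        # Stand-in for an expensive crawl that may legitimately return nothing.
        self.crawl_count += 1
        return {}

    @cached_property
    def _migration_index(self) -> dict:
        # Computed on first access and stored on the instance, even when empty.
        return self._crawl_migration_index()

    def linter_context(self) -> dict:
        # Every context built from this object reuses the same cached index.
        return {"migration_index": self._migration_index}
```

A hand-rolled check such as `if not self._index: self._index = crawl()` would crawl again on every call when the index is empty, because an empty dictionary is falsy. `cached_property` stores the result regardless of its value, so the crawl runs once per instance.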
github-merge-queue bot pushed a commit that referenced this pull request Feb 25, 2025
* Added documentation to use Delta Live Tables migration ([#3587](#3587)). In this documentation update, we introduce a new section for migrating Delta Live Table pipelines to the Unity Catalog as part of the migration process. This workflow allows for the original and cloned pipelines to run independently after the cloned pipeline reaches the `RUNNING` state. The update includes an example of stopping and renaming an existing HMS DLT pipeline, and creating a new cloned pipeline. Additionally, known issues and limitations are outlined, such as supported streaming sources, maintenance pausing, and querying by timestamp. To streamline the migration process, the `migrate-dlt-pipelines` command is introduced with optional parameters for including or excluding specific pipeline IDs. This feature is intended for developers and administrators managing data pipelines and handling table aliasing issues. Relevant user documentation has been added and the changes have been manually tested. * Added support for MSSQL and POSTGRESQL to HMS Federation ([#3701](#3701)). In this enhancement, the open-source library now supports Microsoft SQL Server (MSSQL) and PostgreSQL databases in the Hive Metastore Federation (HMS Federation) feature. This update introduces classes for handling external Hive Metastore instances and their versions, and refactors a regex pattern for better support of various JDBC URL formats. A new `supported_databases_port` class variable is added to map supported databases to default ports, allowing the code to handle SQL Server's distinct default port. Additionally, a `supported_hms_versions` class variable is created, outlining supported Hive Metastore versions. The `_external_hms` method is updated to extract HMS version information more accurately, and the `_split_jdbc_url` method is refactored for better URL format compatibility and parameter extraction. 
The test file `test_federation.py` has been updated with new unit tests for external catalog creation with MSSQL and PostgreSQL, further enhancing compatibility with various databases and expanding HMS Federation's capabilities. * Added the CLI command for migrating DLT pipelines ([#3579](#3579)). A new CLI command, "migrate-dlt-pipelines," has been added for migrating DLT pipelines from HMS to UC using the DLT Migration API. This command allows users to include or exclude specific pipeline IDs during migration using the `--include-pipeline-ids` and `--exclude-pipeline-ids` flags, respectively. The change impacts the `PipelinesMigrator` class, which has been updated to accept and use these new parameters. Currently, there is no information available about testing, but the changes are expected to be manually tested and accompanied by corresponding unit and integration tests in the future. The changes are isolated to the `PipelinesMigrator` class and related functionality, with no impact on existing methods or functionality. * Addressed Bug with Dashboard migration ([#3663](#3663)). In this release, the `_crawl` method in `dashboards.py` has been enhanced to exclude SDK dashboards that lack IDs during the dashboard migration process. This modification enhances migration efficiency by avoiding unnecessary processing of incomplete dashboards. Additionally, the `_list_dashboards` method now includes a check for dashboards with no IDs while iterating through the `dashboards_iterator`. If a dashboard with no ID is found, the method fetches the dashboard details using the `_get_dashboard` method and adds them to the `dashboards` list, ensuring proper processing. Furthermore, a bug fix for issue [#3663](#3663) has been implemented in the `RedashDashboardCrawler` class in `assessment/test_dashboards.py`. The `get` method has been added as a side effect to the `WorkspaceClient` mock's `dashboards` attribute, enabling the retrieval of individual dashboard objects by their IDs. 
This modification ensures that the `RedashDashboardCrawler` can correctly retrieve and process dashboard objects from the `WorkspaceClient` mock, preventing errors due to missing dashboard objects. * Broaden safe read text caught exception scope ([#3705](#3705)). In this release, the `safe_read_text` function has been enhanced to handle a broader range of exceptions that may occur while reading a text file, including `OSError` and `UnicodeError`, making it more robust and safe. The function previously caught specific exceptions such as `FileNotFoundError`, `UnicodeDecodeError`, and `PermissionError`. Additionally, the codebase has been improved with updated unit tests, ensuring that the new functionality works correctly. The linting parts of the code have also been updated, enhancing the readability and maintainability of the project for other software engineers. A new method, `safe_read_text`, has been added to the `source_code` module, with several new test cases designed to ensure that the method handles edge cases correctly, such as when the file does not exist, when the path is a directory, or when an OSError occurs. These changes make the open-source library more reliable and robust for various use cases. * Case sensitive/insensitive table validation ([#3580](#3580)). In this release, the library has been updated to enable more flexible and customizable metadata comparison for tables. A case sensitive flag has been introduced for metadata comparison, which allows for consideration or ignoring of column name case during validation. The `TableMetadataRetriever` abstract base class now includes a new parameter `column_name_transformer` in the `get_metadata` method, which is a callable that can be used to transform column names as needed for comparison. Additionally, a new `case_sensitive` parameter has been added to the `StandardSchemaComparator` constructor to determine whether column names should be compared case sensitively or not. 
A new parametrized test function `test_schema_comparison_case` has also been included to ensure that this functionality works as expected. These changes provide users with more control over the metadata comparison process and improve the library's handling of cases where column names in the source and target tables may have different cases. * Catch `AttributeError` in `InferredValue._safe_infer_internal` ([#3684](#3684)). In this release, we have implemented a change to the `_safe_infer_internal` method in the `InferredValue` class to catch `AttributeError`. This change addresses an issue in the Astroid library reported in their GitHub repository (<pylint-dev/astroid#2683>) and resolves issue [#3659](#3659) in our project. By handling `AttributeError` during the inference process, we have made the code more robust and safer. When an exception occurs, a message is logged at debug level, and the method yields the `Uninferable` sentinel value to indicate that inference failed for the node. This enhancement makes value inference during source code linting more resilient. * Document to run `validate-groups-membership` before groups migration, not after ([#3631](#3631)). In this release, we have updated the order of executing the `validate-groups-membership` command in the group migration process. Previously, the command was recommended to be run after the groups migration, but it has been updated to be executed before the migration. This change ensures that the groups have the correct membership and that the number of groups and users in the workspace and account are the same before migration, providing an extra level of safety. Additionally, we have updated the `remove-workspace-local-backup-groups` command to remove workspace-level backup groups and their permissions only after confirming the successful migration of all groups.
We have also updated the spelling of the `validate-group-membership` command to `validate-groups-membership` in a documentation file. This release is aimed at software engineers who are adopting the project and looking to migrate their groups to the account level. * Extend code migration progress documentation ([#3588](#3588)). In this documentation update, we have added two new sections, `Code Migration` and "Final details," to the open-source library's migration process documentation. The `Code Migration` section provides a detailed walkthrough of the steps to migrate code after completing table migration and data reconciliation, including using the linter to investigate compatibility issues and linted workspace resources. The "[linter advices](/docs/reference/linter_codes)" provide codes and messages on detected issues and resolution methods. The migrated code can then be prioritized and tracked using the `migration-progress` dashboard, and migrated using the `migrate-` commands. The `Final details` section outlines the steps to take once code migration is complete, including running the `cluster-remap` command to remap clusters to be Unity Catalog compatible. This update resolves issue [#2231](#2231) and includes updated user documentation, with new methods for linting and migrating local code, managing dashboard migrations, and syncing workspace information. Additional commands for creating and validating table mappings, migrating locations, and assigning metastores are also included, with the aim of improving the code migration process by providing more detailed documentation and new commands for managing the migration. * Fixed Skip/Unskip schema functionality ([#3567](#3567)). In this release, we have addressed the improper handling of skip/unskip schema functionality in our open-source library. 
The `skip_schema` and `unskip_schema` methods in the `mapping.py` file have been updated to include the `hive_metastore` schema prefix while setting or unsetting the database property that determines whether a schema should be skipped. Additionally, the `_get_database_in_scope_task` and `_get_table_in_scope_task` methods have been modified to parse table properties as a dictionary, allowing for more straightforward lookup of the skip property for a table. The `test_skip_with_schema` and `test_unskip_with_schema` methods in the `tests/unit/test_cli.py` file have also been updated. The `test_skip_with_schema` method now includes the catalog name `hive_metastore` in the `ALTER SCHEMA` statement, ensuring that the schema is properly skipped. The `test_unskip_with_schema` method has been modified to use the `SET DBPROPERTIES` statement to set the value of the `databricks.labs.ucx.skip` property to `false`, effectively unskipping the schema. Furthermore, the `execute` method in the `sbe` module and the queries in the `mock_backend` module have been updated to match the new commands. These changes address the issue of improperly skipping schemas and ensure that the code functions as intended, allowing users to skip and unskip schemas as needed. Overall, these modifications improve the reliability and correctness of the skip/unskip schema functionality, ensuring that it behaves as expected in different scenarios. * Fixed `Total Tables` widget in assessment to only show table counts ([#3738](#3738)). In this release, we have addressed the issue with the `Total Tables` widget in the assessment dashboard as part of resolving [#3738](#3738) and in relation to [#3252](#3252). The revised `00_3_count_total_tables.sql` query in the `src/databricks/labs/ucx/queries/assessment/main/` directory now includes a WHERE clause to filter out views from the table count query. 
By excluding views and only displaying table counts in the `Total Tables` widget, the scope of changes is limited to the SQL query itself. The diff reflects the addition of the WHERE clause and necessary indentation. The commit has been manually tested as part of our quality assurance process, and the successful test results are documented in the `Tests` section of the commit message. * Fixed broken anchor for doc release ([#3720](#3720)). In this release, we have developed and implemented fixes to address issues with the Databricks workflows documentation used in the migration process. The previous version contained a broken anchor reference for the workflow process, which has now been corrected. This improvement includes the addition of a manual test to verify the fix. The revised documentation enables users to view the status of deployed workflows and rerun failed workflows using the `workflows` and `repair-run` commands, respectively. These updates simplify the management and troubleshooting of workflows, enhancing the overall user experience. * Fixed broken anchors in documentation ([#3712](#3712)). In this release, we have made significant improvements to the UCX process documentation, addressing issues related to broken anchors, outdated command names, and syntax. The commands `enable_hms_federation` and `create_federated_catalog` have been renamed to `enable-hms-federation` and `create-federated-catalog`, respectively. These updates include corresponding changes to the command syntax and have been manually tested to ensure accuracy. Additionally, we have added a new command, `validate-groups-membership`, which can be executed prior to the group migration workflow for added confidence. In case of no matching account group in the UCX-installed workspace, the `create-account-groups` command is now available. This release also includes updates to the section titles and links to enhance clarity and reflect current functionality. 
* Fixed notebook sources with `NotebookLinter.apply` ([#3693](#3693)). A new `Github.py` file has been added to the `databricks/labs/ucx/` directory, providing functionality for working with GitHub issues. It includes an `IssueType` enum, a `construct_new_issue_url` function, and constants for constructing URLs to the documentation and GitHub repository. The `NotebookLinter` class has been updated to include notebook fixing functionality, and the `PythonLinter` class has been introduced to run `apply` on an Abstract Syntax Tree (AST) tree. The `Notebook.apply` method has been implemented to apply changes to notebook sources and the legacy `NotebookMigrator` has been removed. These changes also include various unit and integration tests and modifications to the existing `databricks labs ucx migrate-local-code` command. The `DOCS_URL` method has been added to the `databricks.labs.ucx.github` module, and the error message for external metastore connectivity issues now includes a link to the UCX installation instruction in the documentation. * Fixed the broken documentation links in dashboards ([#3726](#3726)). This revision updates documentation links in various dashboards to correct broken links and enhance the user experience. Specifically, it addresses issues [#3725](#3725) and [#3726](#3726) by updating links in the "Assessment Overview," "Assessment Summary," and `Compute summary` dashboards, as well as the `group migration` and `table upgrade` documentation. The changes include replacing local Markdown file links with online documentation links and updating links to point to the correct documentation sections in the UCX GitHub repository. Although the changes have been manually tested, no unit or integration tests have been added, and staging environment verification has not been performed. Despite this, the revisions ensure accurate and up-to-date documentation links, improving the usability of the dashboards. 
* Force `MaybeDependency` to have a `Dependency` OR `list[Problem]`, not neither nor both ([#3635](#3635)). This commit requires a `MaybeDependency` object to have either a `Dependency` or a `list[Problem]`, not neither nor both, in order to handle known libraries during import registration. It resolves issue [#3585](#3585), breaks up issue [#3626](#3626), and progresses issue [#1527](#1527), while modifying code linting logic and updating unit tests to accommodate these changes. Specifically, new classes like `KnownLoader`, `KnownDependency`, and `KnownProblem` have been introduced, and the `_resolve_allow_list` method has been updated to reflect the new enforcement. Additionally, tests have been added and modified to ensure the correct behavior of the modified logic, with a focus on handling directories, resolving children in context, and detecting known problems in imported libraries. * HMS Federation Documentation ([#3688](#3688)). The HMS Federation feature allows Hive Metastore (HMS) to be federated to a catalog, acting as a step towards migrating to Unity Catalog or as a hybrid solution where both HMS and UC access to the data is required. This feature provides an alternative to the table migration process, eliminating the need for table mapping, creating catalogs and schemas, and migrating Hive metastore data objects. The `enable-hms-federation` command enables the Hive Metastore federation process, while the `create-federated-catalog` command creates a UC catalog that mirrors all the schemas and tables in the source Hive Metastore. The `migrate-glue-credentials` command, which is AWS-only, creates a UC Service Credential for AWS Glue. These new commands are documented in the HMS Federation Documentation section and are now part of the migration process documentation with the data reconciliation step following it. To enable HMS Federation, use the `enable-hms-federation` and `create-federated-catalog` commands. 
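The "either a `Dependency` or a `list[Problem]`, not neither nor both" rule from [#3635](#3635) above can be illustrated with a small sketch. This is not the UCX implementation: the `Dependency` and `Problem` classes below are hypothetical stand-ins with made-up fields, and checking the rule in `__post_init__` is an assumption about one reasonable way to enforce it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dependency:
    """Hypothetical stand-in for a resolved dependency."""

    path: str


@dataclass(frozen=True)
class Problem:
    """Hypothetical stand-in for a problem found while resolving."""

    code: str
    message: str


@dataclass(frozen=True)
class MaybeDependency:
    """Holds exactly one of: a dependency, or a non-empty list of problems."""

    dependency: Dependency | None = None
    problems: list[Problem] = field(default_factory=list)

    def __post_init__(self) -> None:
        has_dependency = self.dependency is not None
        has_problems = len(self.problems) > 0
        # Equal flags mean "neither" (False, False) or "both" (True, True).
        if has_dependency == has_problems:
            raise ValueError("Expected a dependency or problems, not neither nor both")


# Valid: a dependency without problems, or problems without a dependency.
resolved = MaybeDependency(dependency=Dependency("utils.py"))
failed = MaybeDependency(problems=[Problem("import-not-found", "Could not locate import")])
```

Validating at construction time means callers can rely on the invariant instead of re-checking both fields wherever a `MaybeDependency` is consumed.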
* Make `MaybeTree` the main Python AST entrypoint for constructing the syntax tree ([#3550](#3550)). In this release, the main entry point for constructing the Python AST syntax tree has been changed from `Tree` to `MaybeTree` in the open-source library. This change involves moving class methods and static methods that construct a `MaybeTree` from the `Tree` class to the `MaybeTree` class, and making the class method that normalizes the source code before parsing the only entry point. The `normalized_parse` method has been renamed to `from_source_code` to match the commonly used naming for class methods within UCX. The `walk` and `first_statement` methods have been removed from `MaybeTree` as they were repetitions from `Tree`'s methods. These changes aim to enforce normalization and improve code consistency. Additionally, unit tests have been added and the Python linting related code has been modified to work with the new `MaybeTree` class. This change resolves issues [#3457](#3457) and [#3213](#3213). * Make fixer diagnostic codes unique ([#3582](#3582)). This commit modifies the `databricks labs ucx migrate-local-code` command to make fixer diagnostic codes unique, ensuring accurate code migration and fixing. Two new methods have been added for modifying and adding unit and integration tests. Diagnostic codes for the `table-migrated-to-uc` issue are now unique depending on the context where the table is referenced: SQL, Python, or Python-SQL. This ensures the appropriate fixer is applied when addressing code migration issues, improving overall functionality and user experience. Additionally, the commit updates the documentation to include the new postfixes for the `table-migrated-to-uc` linter code and their descriptions, making it clearer for developers to diagnose and resolve issues related to table migration. * Removed the linting false positive for missing table format warning when using `spark.table` ([#3589](#3589)). 
In this release, linting false positives related to missing table format warnings when using `spark.table` have been addressed, resolving issue [#3545](#3545). The linting logic and unit tests have been updated to handle changes in the default format for table references in Databricks Runtime 8.0, which now uses Delta as the default format. These changes improve the accuracy of the linting process, reducing unnecessary warnings and enhancing the overall developer experience. Additionally, the `test_linting_walker_populates_paths` unit test in the `test_jobs.py` file has been updated to use a different file path for testing. * Removed tree from `PythonSequentialLinter` ([#3535](#3535)). In this release, the `PythonSequentialLinter` has been refactored to no longer manipulate the code tree, and instead, the tree manipulation logic has been moved to `NotebookLinter`. This change improves the separation of concerns between the two components, resulting in a more modular and maintainable codebase. The `NotebookLinter` now handles early failure when resolving the code used by a notebook and attaches `%run` notebook trees as a child tree to the cell that calls the notebook. The code linting functionality has been modified, and the `databricks labs ucx lint-local-code` command has been updated. These changes resolve [#3543](#3543) and progress [#3514](#3514) and are dependent on PRs [#3529](#3529) and [#3550](#3550). The changes have been manually tested and include added and modified unit tests. Additionally, the `Advice` class has been updated to include a type variable `T`, which allows for more specific type hinting when creating instances of the class and its subclasses. * Rename file language helper function ([#3661](#3661)). In this code change, the helper function for determining the file language and checking its support by the linter has been renamed and refactored. 
The function, previously called `file_language`, is now named `infer_file_language_if_supported`. This change clarifies the function's purpose as it not only infers the file language but also checks if the file is supported by the linter, acting as a filter. The function returns a `Language` object if the file is supported or `None` if it is not. The `infer_file_language_if_supported` function is used in other parts of the codebase, such as the `is_a_notebook` function. This change improves the codebase's readability and maintainability by making the helper function's purpose more explicit. The related code has been updated to use the new function accordingly. * Scope crawled jobs in `JobsCrawler` with `include_job_ids` ([#3658](#3658)). In this release, the `JobsCrawler` class in the `workflow_task.py` file has been updated to include a new optional parameter `include_job_ids` in the constructor. This parameter allows users to specify a list of job IDs to include in the crawling process, improving efficiency in large workspaces. Additionally, a check has been added to the `_assess_jobs` method to skip jobs whose IDs are not in the list of included IDs. Integration tests have been added to ensure the correct behavior of the new feature. This change resolves issue [#3656](#3656), which requested the ability to crawl jobs based on a specific list of job IDs. It is recommended to add a comment to the code explaining the purpose and usage of the `include_job_ids` parameter and update the documentation accordingly. * Support fixing `LocalFile`'s with `FileLinter` ([#3660](#3660)). In this release, we have added new methods `write_text`, `safe_write_text`, `back_up_path`, and `revert_back_up_path` to the `base.py` file to support fixing files in `LocalFile` containers and adding unit tests and integration tests. 
The `LocalFile` class in the `files.py` file has been extended to include new methods and properties, such as `apply`, `migrated_code`, `back_up_path`, and `back_up_original_and_flush_migrated_code`, enabling fixing files using linters and writing changes back to the container. The `databricks labs ucx migrate-local-code` command has also been updated to utilize the new functionality. These changes address issue [#3514](#3514), ensuring the proper handling of errors during file writing and providing automated fixing of code issues within `LocalFile` containers. * Updated `migrate-local-code` to use latest linter functionality ([#3700](#3700)). In this update, the `migrate-local-code` command has been enhanced by incorporating the latest linter functionality. The `LocalFileMigrator` and `LocalCodeLinter` classes have been merged, and the interfaces of `.fix` and `.apply` methods have been aligned. A new `FixerWalker` has been introduced to address dependencies in the dependency graph, and the existing `databricks labs ucx migrate-local-code` command has been updated accordingly. Relevant unit tests and integration tests have been added and modified to ensure the correctness of the changes, which resolve issue [#3514](#3514) and supersede issue [#3520](#3520). The `lint-local-code` command has also been updated with a flag to specify the path for linting. The `migrate-local-code` command now lints local code and generates advice on how to make it compatible with Unity Catalog, and can also apply fixes to local code to make it compatible. * Updated sqlglot requirement from <26.3,>=25.5.0 to >=25.5.0,<26.4 ([#3572](#3572)). In this pull request, we have updated the requirement for the `sqlglot` library in the 'pyproject.toml' file, changing it from being greater than or equal to version 25.5.0 and less than 26.3, to being greater than or equal to version 25.5.0 and less than 26.4. 
This change is part of issue [#3572](#3572) and was made to allow for the use of the latest version of 'sqlglot'. The pull request includes a changelog from the `sqlglot` repository, detailing the changes made in each version between 25.5.0 and 26.4. The commits relevant to this update include bumping the version of `sqlglotrs` to various versions between 0.3.7 and 0.3.14. This pull request was automatically generated by Dependabot, a tool that creates pull requests to update the dependencies in a project. It is now ready for review and merging. * Updated sqlglot requirement from <26.4,>=25.5.0 to >=25.5.0,<26.7 ([#3677](#3677)). In this release, we have updated the `sqlglot` dependency from version `>=25.5.0,<26.4` to `>=25.5.0,<26.7`. This change allows us to leverage the latest version of `sqlglot`, which includes various bug fixes and improvements, such as avoiding redundant casts in FROM/TO_UTC_TIMESTAMP and enhancing UUID support. Although there are some breaking changes introduced in the latest version, they should not affect our project's functionality. Additionally, this update includes several bug fixes and improvements for specific dialects such as Redshift, BigQuery, and TSQL. Overall, this update enhances the performance and functionality of the `sqlglot` library, ensuring compatibility with the latest version. * Use cached property for table migration index on local checkout context ([#3711](#3711)). In this release, we introduce a new cached property, `_migration_index`, to the `LocalCheckoutContext` class, designed to store the table migration index for the local checkout context. This change aims to prevent multiple recrawling when the migration index is empty. The `linter_context_factory` method has been refactored to utilize the new `_migration_index` property, and the `CurrentSessionState` parameter is removed. 
Additionally, the `local_code_linter` method has been updated to leverage the new `LinterContext` instance with the `_migration_index` property, instead of using the `linter_context_factory` method. The `LocalCodeLinter` object now accepts a new callable lambda function, returning a `LinterContext` instance with the `_migration_index` property. These enhancements improve code performance by reducing the migration index crawls in the local checkout context and simplify the code by eliminating the `CurrentSessionState` parameter. * [DOCS] Explain when to run `remove-workspace-local-backup-groups` workflow ([#3707](#3707)). In this release, the UCX component of the application has been enhanced with new Databricks workflows for orchestrating the group migration process. The `workflows` command displays the status of the workflows, and the `repair-run` command allows for rerunning failed workflows. The group migration workflow is specifically designed to be executed after a successful assessment workflow, and running it is followed by an optional `remove-workspace-local-backup-groups` workflow. This final step removes unnecessary workspace-level backup groups and their associated permissions, keeping the workspace clean and organized. The `remove-workspace-local-backup-groups` workflow should only be executed after confirming the successful migration of all groups involved. Dependency updates: * Updated sqlglot requirement from <26.3,>=25.5.0 to >=25.5.0,<26.4 ([#3572](#3572)). * Updated sqlglot requirement from <26.4,>=25.5.0 to >=25.5.0,<26.7 ([#3677](#3677)).
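The caching behaviour described for [#3711](#3711) above can be sketched with `functools.cached_property` from the standard library. This is an illustration only, not the UCX code: `LocalCheckoutContextSketch`, its `crawl` callable, and the dictionary returned by `linter_context` are hypothetical names chosen for the example.

```python
from functools import cached_property


class LocalCheckoutContextSketch:
    """Hypothetical context that builds a migration index at most once."""

    def __init__(self, crawl):
        # `crawl` stands in for an expensive call that builds the index.
        self._crawl = crawl

    @cached_property
    def _migration_index(self):
        # Runs on first access only; the result is stored on the instance,
        # even when that result is empty.
        return self._crawl()

    def linter_context(self):
        # Every linter context reuses the same cached index.
        return {"index": self._migration_index}


crawl_calls = []


def fake_crawl():
    """Records each invocation and returns an empty index."""
    crawl_calls.append("crawl")
    return []


context = LocalCheckoutContextSketch(fake_crawl)
context.linter_context()
context.linter_context()
print(len(crawl_calls))  # the crawl ran once despite two contexts being built
```

Because `cached_property` stores whatever the first call returns, an empty index is cached like any other value, so repeated lookups do not trigger another crawl.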
Labels
documentation
Improvements or additions to documentation
migrate/code
Abstract Syntax Trees and other dark magic
Changes
Extend code migration progress documentation by detailing:
Linked issues
Resolves #2231
Functionality