
[FEATURE] Add CLI command to migrate a folder of Python files and notebooks on local file system #3514

Closed · Tracked by #1398 · Fixed by #3700
JCZuurmond opened this issue Jan 13, 2025 · 0 comments
Assignees: JCZuurmond
Labels: migrate/code (Abstract Syntax Trees and other dark magic), migrate/python (Pull requests that update Python code)

Comments


JCZuurmond commented Jan 13, 2025

Description

Add CLI command to migrate a folder of files/notebooks on local file system

@github-project-automation github-project-automation bot moved this to Todo in UCX Jan 13, 2025
@JCZuurmond JCZuurmond added migrate/code Abstract Syntax Trees and other dark magic migrate/python Pull requests that update Python code labels Jan 13, 2025
@JCZuurmond JCZuurmond changed the title Add CLI command to migrate a folder of files/notebooks on local FS [FEATURE] Add CLI command to migrate a folder of files/notebooks on local file system Jan 13, 2025
@JCZuurmond JCZuurmond changed the title [FEATURE] Add CLI command to migrate a folder of files/notebooks on local file system [FEATURE] Add CLI command to migrate a folder of Python files and notebooks on local file system Jan 13, 2025
@JCZuurmond JCZuurmond self-assigned this Jan 13, 2025
@JCZuurmond JCZuurmond moved this from Todo to In Progress in UCX Jan 13, 2025
github-merge-queue bot pushed a commit that referenced this issue Jan 17, 2025
## Changes
Delete stale code: `NotebookLinter._load_source_from_run_cell`

### Linked issues

Progresses #3514

### Functionality

- [x] modified code linting related
- [x] modified existing command: `databricks labs ucx lint-local-code`

### Tests

- [x] manually tested
- [x] added and modified unit tests
github-merge-queue bot pushed a commit that referenced this issue Jan 17, 2025
## Changes
Rename Python AST's `Tree` methods for clarity

- Add docstrings
- Rename `append_` methods to `attach_`, and rename `extend_` methods, to
more precisely describe what the methods do
- Let `attach_` and `extend_` always return `None`
- Extend unit testing

### Linked issues

Progresses #3514
Precedes #3520

### Functionality

- [x] modified code linting related
- [x] modified existing command: `databricks labs ucx lint-local-code`

### Tests

- [x] manually tested
- [x] added and modified unit tests
gueniai added a commit that referenced this issue Jan 23, 2025
* Implement disposition field in SQL backend
([#3477](#3477)). In this
release, we've added a new `query_statement_disposition` configuration
option for the SQL backend used in the `databricks labs ucx`
command-line interface. This option allows users to choose the
disposition method for running large SQL queries during assessment
results export, preventing failures in cases of large workspaces with
high volumes of findings. The new option is included in the `config.yml`
file and used in the SqlBackend definition. The commit also includes
updates to the `workspace_cli.py` file and addresses issue
[#3447](#3447). The
`disposition` parameter has been added to the
`StatementExecutionBackend` method, and the `Disposition` enum from the
`databricks.sdk.service.sql` module has been added to the `config.py`
file. The changes have been manually tested and are included in the
modified `databricks labs install ucx` and `databricks labs ucx
export-assessment` commands.
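For illustration, here is a minimal sketch of how the `disposition` parameter
named above might be passed to the backend. The warehouse ID and the
`EXTERNAL_LINKS` choice are assumptions for the example; only the
`query_statement_disposition` option, the `Disposition` enum, and the
`disposition` parameter come from the commit itself.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sql import Disposition
from databricks.labs.lsql.backends import StatementExecutionBackend

ws = WorkspaceClient()  # authenticates from the environment

# EXTERNAL_LINKS streams large result sets via cloud-storage links instead
# of inlining them in the API response, which is what avoids failures when
# exporting assessment results from large workspaces.
backend = StatementExecutionBackend(
    ws,
    "<warehouse-id>",  # hypothetical SQL warehouse ID
    disposition=Disposition.EXTERNAL_LINKS,
)
```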
* AWS role issue with external locations pointing to the root of a
storage account
([#3510](#3510)). This
release includes a modification to enhance AWS role access for external
locations pointing to the root of a storage account, addressing issue
[#3510](#3510) and closing
issue [#3505](#3505). The
`aws.py` file in the `src/databricks/labs/ucx/assessment/` directory has
been updated to improve S3 bucket ARN pattern matching, now allowing
optional trailing slashes for greater flexibility. In the `access.py`
file within the `aws` directory of the `databricks/labs/ucx` package,
the `_identify_missing_paths` method now checks if the
`role.resource_path` is a parent of the external location path or if
they match exactly, allowing root-level external locations to be
recognized as compatible with AWS roles. A new class,
`AWSUCRoleCandidate`, has been added alongside the `AWSResources` class, and
several test cases have been updated or added to ensure proper
functionality with UC roles and AWS resources, including handling cases
with multiple role creations.
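As a hedged sketch of the pattern change (the exact regex in `aws.py` may
differ), the key idea is an optional trailing-slash group so that a
bucket-root ARN still matches:

```python
import re

# Bucket name followed by an optional "/" and anything after it, so
# "arn:aws:s3:::bucket4", "arn:aws:s3:::bucket4/" and
# "arn:aws:s3:::bucket4/folder" all resolve to the same bucket.
S3_BUCKET_ARN = re.compile(r"^arn:aws:s3:::([a-zA-Z0-9.\-_]+)(?:/.*)?$")

for arn in ("arn:aws:s3:::bucket4", "arn:aws:s3:::bucket4/", "arn:aws:s3:::bucket4/folder"):
    match = S3_BUCKET_ARN.match(arn)
    assert match is not None
    print(arn, "->", match.group(1))
```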
* Added assert to make sure installation is finished before
re-installation
([#3546](#3546)). In the
latest release, we've addressed an issue where the
reinstallation of a software component was starting before the initial
installation was complete, causing a warning message to be suppressed
and the test to fail. To rectify this, we have enhanced the integration
tests and added an assert to ensure that the installation is finished
before attempting reinstallation. A new function called
`wait_for_installation_to_finish` has been introduced to manage the
waiting process. Furthermore, we have updated the
`test_compare_remote_local_install_versions` function to accept
`installation_ctx` instead of `ws` as a parameter, ensuring proper
configuration and loading of the installation before test execution.
These changes guarantee that the test will pass if the installation is
finished before the reinstallation is attempted.
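A minimal sketch of such a waiting helper, assuming the `retried` decorator
from `databricks-sdk` and a blueprint-style `Installation.load`; only the
function name and the two-minute timeout come from the commit, the body is
illustrative:

```python
from datetime import timedelta

from databricks.sdk.errors import NotFound
from databricks.sdk.retries import retried

from databricks.labs.ucx.config import WorkspaceConfig  # assumed target type


@retried(on=[NotFound], timeout=timedelta(minutes=2))
def wait_for_installation_to_finish(installation_ctx) -> WorkspaceConfig:
    # Retries on NotFound until the installation state can be loaded,
    # giving up after the two-minute budget mentioned above.
    return installation_ctx.installation.load(WorkspaceConfig)
```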
* Added dashboards to migration progress dashboard
([#3314](#3314)). This
commit adds dashboards to track migration progress, with linting
resources added to ensure code quality. It also modifies the existing
dashboard "Migration [main]" and updates both unit and integration
tests. Specific new files and methods have been added to enhance
functionality, including the tracking of dashboard migration, and new
fixtures have been introduced to improve testing. The changes depend on
issue [#3424](#3424), progress issue [#3045](#3045), and break up issue
[#3112](#3112). Overall, this commit makes the migration progress
dashboard more efficient and reliable for tracking migration progress.
* Added history log encoder for dashboards
([#3424](#3424)). A history
log encoder for dashboards has been added, addressing issues
[#3368](#3368) and
[#3369](#3369), which
modifies the existing `experimental-migration-progress` workflow. This
enhancement introduces a `DashboardProgressEncoder` class that encodes
Dashboard objects into Historical records, appending inventory snapshots
to the history table. The changes include adding new methods for
handling object types such as directories, and updating the `is_delta`
property of the `Table` class. The changes are covered by manual, unit,
and integration tests. Specifically,
`test_table_progress_encoder_table_failures` has been updated to include
a new parameter, `is_migrated_table`, which, if set to False, adds
`Pending migration` to the list of failures. The `is_used_table`
parameter has been removed, as its functionality is no longer part of
this commit. Together, these tests ensure the proper encoding of
migration progress and the identification of relevant failures.
* Create specific failure for Python syntax error while parsing with
Astroid ([#3498](#3498)). In
this release, the Python linting-related code has been updated to
introduce a specific failure type for syntax errors that occur while
parsing code using Astroid. Previously, such errors resulted in a
generic `system-error` message, but with this change, a new failure type
called `python-parse-error` has been introduced. This new error type
includes the start and end line and column numbers of the error and is
accompanied by a new issue URL for reporting the error on the UCX
GitHub. The `system-error` failure type has been renamed to
`python-parse-error` to maintain consistency with the `sql-parse-error`
failure type. Additionally, a new method `Tree.maybe_parse()` has been
introduced to improve error detection and reporting during Python
linting. A unit test has been added to ensure the new failure type is
working as intended, and a generic failure is kept for directing users
to create GitHub issues for surfacing other issues.
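A hedged sketch of the idea behind `Tree.maybe_parse()`, using Astroid
directly; the failure shape below is illustrative, not UCX's actual
failure class:

```python
import astroid
from astroid.exceptions import AstroidSyntaxError


def maybe_parse(code: str):
    """Return (tree, None) on success, or (None, failure) on a syntax error."""
    try:
        return astroid.parse(code), None
    except AstroidSyntaxError as e:
        # The specific failure type replaces the old generic `system-error`.
        return None, {"code": "python-parse-error", "message": str(e)}


tree, failure = maybe_parse("def broken(:\n    pass")
print(failure["code"])  # -> python-parse-error
```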
* DBR 16 and later support
([#3481](#3481)). This
release adds support for Databricks Runtime (DBR) 16 and later, enabling
the optional conversion of Hive Metastore (HMS) tables to external
tables within the `migrate-tables` workflow. The change includes a new
static method `_get_entity_storage_locations` to check for the presence
of the `entityStorageLocations` property on table metadata. The existing
`_convert_hms_table_to_external` method has been updated to use this new
method and to include the `entityStorageLocations` constructor argument
if present. The changes have been manually tested for DBR 16, tested
with existing integration tests for DBR 15, and verified on the staging
environment with DBR 16. Additionally, the `skip_job_wait=True`
parameter has been added to specific test function calls to improve test
execution time. This release also resolves an issue with a failing test
on DBR 16 caused by a JDK update.
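The probe for the new property can be as small as the sketch below; the
method name comes from the commit, while the metadata object and the
`getattr` check are assumptions for illustration:

```python
def _get_entity_storage_locations(table_metadata):
    # entityStorageLocations only exists on DBR 16 and later; return None on
    # older runtimes so the caller can skip the constructor argument.
    return getattr(table_metadata, "entityStorageLocations", None)


# Hypothetical call site: only pass the argument when the property exists.
def build_constructor_kwargs(table_metadata) -> dict:
    kwargs = {}
    locations = _get_entity_storage_locations(table_metadata)
    if locations is not None:
        kwargs["entityStorageLocations"] = locations
    return kwargs
```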
* Delete stale code: `NotebookLinter._load_source_from_run_cell`
([#3529](#3529)). In this
release, we have improved the code linting functionality in the
NotebookLinter class of our open-source library by removing the
`_load_source_from_run_cell` method in the sources.py file. This method,
previously used to load source code from run cells in a notebook, has
been identified as stale code and is no longer required. Consequently,
this change affects the `databricks labs ucx lint-local-code` command
and results in cleaner and more maintainable code. Furthermore, updated
and added unit tests have been included in this commit, which have been
manually tested to ensure that the changes do not adversely impact
existing functionality, thus progressing issue
[#3514](#3514).
* Exclude ucx dashboards from Lakeview dashboard crawler
([#3450](#3450)). In this
release, the functionality of the `assessment` workflow has been
improved to exclude certain dashboard IDs from the Lakeview dashboard
crawler. This change has been made to address the issue of false
positive dashboards and affects the `_crawl` method in the
`dashboards.py` file. The excluded dashboard IDs are now obtained from
the `install_state.dashboards` object. Additionally, new methods have
been added to the `test_dashboards.py` file in the `unit/assessment`
directory to test the exclusion functionality, including a test to
ensure that the exclude parameter takes priority over the include
parameter. The commit also includes unit tests, manual tests, and
screenshots to verify the changes on the staging environment. Overall,
this modification enhances the accuracy of the dashboard crawler and
simplifies the process of identifying and assessing relevant dashboards.
* Fixed issue in installing UCX on UC enabled workspace
([#3501](#3501)). This pull
request introduces changes to the UCX installer to address an issue
([#3420](#3420)) with
installing UCX on UC-enabled workspaces. It updates the UCX policy by
changing the `spark_version` parameter from `fixed` to `allowlist` with
a default value, allowing the cluster definition to take `single_user`
and `user_isolation` values instead of `Legacy_Single_User` and
`Legacy_Table_ACL`. Additionally, the job definition has been updated to
use the default value when not explicitly provided. The changes are
implemented in the `test_policy.py` file and impact the
`test_job_cluster_policy` and `test_job_cluster_on_uc_enabled_workspace`
methods. The pull request also includes updates to unit tests and
integration tests to ensure the correct behavior of the updated UCX
policy and job definition. The target audience is software engineers
adopting this project, with changes involving adjusting policy
definitions and testing job cluster behavior under different
configurations. Together with the policy update, these changes resolve
issue [#3420](#3420).
* Fixed typo in workflow name (in error message)
([#3491](#3491)). This PR
fixes a minor typo in the error message of the
`validate_groups_permissions` method in the `workflows.py` file: the
workflow name in the message was misspelled as
`validate-group-permissions`. The fix restores the missing `s`, so the
message now refers to the correct `validate-groups-permissions`
workflow. The functionality of the code remains unaffected by this
change, and no new methods have been added. To clarify, the
`validate_groups_permissions` method verifies whether group permissions
have been migrated correctly, and if not, raises a ValueError with an
error message suggesting the `validate-groups-permissions` workflow for
validation after the API has caught up. This fix resolves the typo and
maintains the expected behavior of the code.
* Make link to issue template url safe
([#3508](#3508)). In this
commit, the `_definitely_failure` function in the `python_ast.py` file
has been modified to make the link to the issue template URL safe using
Python's `urllib`. This change ensures that any special characters in
the source code passed to the function will be properly displayed in the
issue template. If the source code cannot be parsed, the function
creates a link to the issue template for reporting a bug in the UCX
library, including the source code as part of the issue body. With this
commit, the source code is now passed through the
`urllib.parse.quote_plus` function before being added to the issue body,
making it url-safe and improving the robustness and user-friendliness of
the library. This change has been introduced in issue
[#3498](#3498) and has been
manually tested.
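For illustration, a sketch of the encoding step; the template URL and query
parameters are assumptions, while `urllib.parse.quote_plus` is the function
named in the commit:

```python
from urllib.parse import quote_plus

source_code = 'df = spark.sql("SELECT * FROM hive_metastore.schema.t WHERE a & b")'

# quote_plus percent-encodes special characters (&, ?, #, spaces, newlines),
# so the source survives being embedded in a prefilled-issue URL.
issue_url = (
    "https://github.com/databrickslabs/ucx/issues/new"
    "?title=" + quote_plus("[BUG]: Python code failed to parse")
    + "&body=" + quote_plus(f"Source:\n{source_code}")
)
print(issue_url)
```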
* Refactor `PipelineMigrator`'s to add `include_pipeline_ids`
([#3495](#3495)). In this
refactoring, the `PipelineMigrator` has been updated to introduce an
`include_pipeline_ids` option, replacing the previous
`skip_pipeline_ids` flag. This change allows users to specify the list
of pipelines to migrate, providing better control over the migration
process. The `PipelinesMigrator` constructor,
`_get_pipelines_to_migrate`, and `migrate_pipelines` methods have been
modified to accommodate this new flag. The `_migrate_pipeline` method
now accepts the pipeline ID instead of a `PipelineInfo` object.
Additionally, the unit tests have been updated to include the new
`include_flag` parameter, which facilitates testing various scenarios
with different pipeline lists. Although the commit does not show changes
to test files, integration tests should be updated to reflect the new
`include-pipeline-ids` flag functionality. This improvement resolves
issue [#3492](#3492) and
enhances the overall flexibility of the `PipelineMigrator`.
* Rename Python AST's `Tree` methods for clarity
([#3524](#3524)). In this
release, the `Tree` class in the Python AST library has been updated for
improved code clarity and functionality. The `append_` methods have been
renamed to `attach_` for better accuracy, and now include docstrings for
increased understanding. These methods have been updated to always
return `None`. A new method, `attach_child_tree`, has been added,
allowing for traversal from both parent and child and propagating any
module references. Several new methods and functionalities have been
introduced to improve the class, while extensive unit testing has been
conducted to ensure functionality. Additionally, the diff includes test
cases for various functionalities, such as inferring values when
attaching trees and verifying spark module propagation, as well as tests
to ensure that certain operations are not supported. This change, linked
to issues [#3514](#3514) and
[#3520](#3520), may affect
any code that calls these methods and relies on their return values.
However, the added docstrings and unit tests will help ensure your code
continues to function correctly.
* Schedule the migration progress workflow to run daily
([#3485](#3485)). This PR
introduces changes to the UCX installation process to schedule the
migration progress workflow to run automatically once a day, with the
default schedule set to run at 5 a.m. UTC. It includes refactoring the
plumbing used for managing and installing workflows, enabling them to
have a Cron-based schedule. The relevant user documentation has been
updated, and the existing `migration-progress-experimental` workflow has
been modified. Additionally, unit and integration tests have been
added/modified to ensure the proper functioning of the updated code, and
new functions have been added to verify the workflow's schedule and task
detection.
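In Jobs API terms, the daily schedule described above could look like the
sketch below, assuming UCX attaches a standard `CronSchedule` to the
workflow's job:

```python
from databricks.sdk.service.jobs import CronSchedule, PauseStatus

# Quartz cron fields: second minute hour day-of-month month day-of-week
schedule = CronSchedule(
    quartz_cron_expression="0 0 5 * * ?",  # every day at 05:00
    timezone_id="Etc/UTC",
    pause_status=PauseStatus.UNPAUSED,  # installed unpaused, per the tests
)
```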
* Scope crawled pipelines in PipelineCrawler
([#3513](#3513)). In this
release, the `PipelineCrawler` class in the `pipelines.py` file has been
updated to include a new optional argument `include_pipeline_ids` in its
constructor. This argument allows users to filter the pipelines that are
crawled by specifying a list of pipeline IDs. The `_crawl` method has
been modified to check if `include_pipeline_ids` is not `None` and to
filter the list of pipelines accordingly. The class now also checks if
each pipeline exists before getting its configuration, and logs a
warning message if the pipeline is not found. Previously, a `NotFound`
exception was raised. Additionally, the code has been updated to use
`pipeline.spec.configuration` instead of
`pipeline_response.spec.configuration` to get the pipeline
configuration. These changes have been tested through new and updated
unit tests, including a test for handling creators' user names. Overall,
these updates provide improved functionality and flexibility for
crawling pipelines.
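A hedged sketch of the scoping behaviour, written against the public SDK
pipelines API; the real `_crawl` method differs in structure:

```python
import logging

from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import NotFound

logger = logging.getLogger(__name__)


def crawl_pipelines(ws: WorkspaceClient, include_pipeline_ids: list[str] | None = None):
    # Honor the include-list when given; otherwise crawl every pipeline.
    if include_pipeline_ids is not None:
        pipeline_ids = include_pipeline_ids
    else:
        pipeline_ids = [p.pipeline_id for p in ws.pipelines.list_pipelines()]
    for pipeline_id in pipeline_ids:
        try:
            pipeline = ws.pipelines.get(pipeline_id)
        except NotFound:
            logger.warning(f"Pipeline not found: {pipeline_id}")  # log, don't raise
            continue
        configuration = pipeline.spec.configuration if pipeline.spec else {}
        yield pipeline_id, configuration
```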
* Updated databricks-labs-blueprint requirement from <0.10,>=0.9.1 to
>=0.9.1,<0.11
([#3519](#3519)). In this
release, we have updated the version requirement of the
`databricks-labs-blueprint` package to be greater than or equal to 0.9.1
and less than 0.11. This change allows us to use the latest version of
the package and includes bug fixes and dependency updates. The hosted
runner has been patched in version 0.10.1 to address issues with
publishing artifacts in the release workflow. Release notes for previous
versions are also provided in the commit. These updates are intended to
improve the overall functionality and stability of the library.
* Updated databricks-sdk requirement from <0.41,>=0.40 to >=0.40,<0.42
([#3553](#3553)). In this
release, the `databricks-sdk` package requirement has been updated to
version 0.41.0, which brings new features, improvements, bug fixes, and
API changes. Among the new features are the addition of
'serving.http_request' for calling external functions, and recovery on
download failures in the Files API client. Although the specifics of the
functionality added and changed are not detailed, the focus of this
release appears to be on bug fixes and internal enhancements.
Additionally, the API has undergone changes, including added and altered
methods and fields, however, specific information about these changes
has not been provided in the release notes.
* Updated sqlglot requirement from <26.1,>=25.5.0 to >=25.5.0,<26.2
([#3500](#3500)). The
version constraint for the `sqlglot` package has been relaxed to allow
version 25.5.0 or higher, but less than 26.2. This change makes the
latest sqlglot, including version 26.1, available to the project while
capping the range below 26.2. Version 26.1 includes several breaking
changes, new features, bug fixes, and modifications to various dialects
such as hive, postgres, tsql, and sqlite. Moreover, the tokenizer has
been updated to accept
underscore-separated number literals. However, the specific impact of
these changes on the project is not detailed in the commit message, and
software engineers should thoroughly test and review the changes to
ensure seamless functionality.
* Updated sqlglot requirement from <26.2,>=25.5.0 to >=25.5.0,<26.3
([#3528](#3528)). In this
update, we have modified the version constraint for the `sqlglot`
dependency from `>=25.5.0,<26.2` to `>=25.5.0,<26.3` in the
`pyproject.toml` file. Sqlglot is a Python-based SQL parser and
optimizer, and this change allows us to adopt the latest version of
sqlglot within the specified version range. This update addresses
potential security vulnerabilities and incorporates performance
enhancements and bug fixes, ensuring that our library remains up-to-date
and secure.
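As a quick illustration of what the widened range admits (using the
`packaging` library, which is not part of UCX itself):

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

spec = SpecifierSet(">=25.5.0,<26.3")
for candidate in ("25.5.0", "26.2.1", "26.3.0"):
    print(candidate, Version(candidate) in spec)
# 25.5.0 True, 26.2.1 True, 26.3.0 False
```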
* Updated table-migration workflows to also capture updated migration
progress into the history log
([#3239](#3239)). This pull
request updates the table-migration workflows to log not only the tables
that still need to be migrated, but also the progress of the migration.
The affected workflows include `migrate-tables`,
`migrate-external-hiveserde-tables-in-place-experimental`,
`migrate-external-tables-ctas`, `scan-tables-in-mounts-experimental`,
and `migrate-tables-in-mounts-experimental`. The encoder for
table-history has been refactored to improve control over when the
`TableMigrationStatus` data is refreshed. The documentation has been
updated to reflect the changes in each workflow. Additionally, both unit
and integration tests have been added and updated to ensure the changes
work as intended and resolve any conflicts. A new
`ProgressTrackingInstallation` class has been added to support this
functionality. The changes have been manually tested and include
modifications to the existing workflows, new methods, and a renamed
method. In the updated unit tests, the `mock_workspace_client` function
has been replaced, assertions verify that
`external_locations.resolve_mount` and other methods are not called, and
the `TablesCrawler` object's `snapshot` method is called once to
retrieve the list of tables in the Hive metastore. The
migration record workflow run is also updated to include the workflow
run information in the `workflow_runs` table. These changes are expected
to improve the accuracy and reliability of the table-migration
workflows.

Dependency updates:

* Updated sqlglot requirement from <26.1,>=25.5.0 to >=25.5.0,<26.2
([#3500](#3500)).
* Updated databricks-labs-blueprint requirement from <0.10,>=0.9.1 to
>=0.9.1,<0.11
([#3519](#3519)).
* Updated databricks-sdk requirement from <0.41,>=0.40 to >=0.40,<0.42
([#3553](#3553)).
gueniai added a commit that referenced this issue Jan 23, 2025
*  Implement disposition field in SQL backend ([#3477](#3477)). This commit adds a `query_statement_disposition` configuration option for the SQL backend in the UCX tool, allowing users to specify the disposition of SQL statements during assessment results export and preventing failures when dealing with large workspaces and a large number of findings. The new configuration option is added to the `config.yml` file and used by the `SqlBackend` definition. The `databricks labs install ucx` and `databricks labs ucx export-assessment` commands have been modified to support this new functionality. A new `Disposition` enum has been added to the `databricks.sdk.service.sql` module. This change resolves issue [#3447](#3447) and is related to pull request [#3455](#3455). The functionality has been manually tested.
* AWS role issue with external locations pointing to the root of a storage account ([#3510](#3510)). The `AWSResources` class in the `aws.py` file has been updated to enhance the regular expression pattern for matching S3 bucket names, now including an optional group for trailing slashes and any subsequent characters. This allows for recognition of external locations pointing to the root of a storage account, addressing issue [#3505](#3505). The `access.py` file within the AWS module has also been updated, introducing a new `path` variable and updating a for loop condition to accurately identify missing paths in external locations referencing the root of a storage account. New unit tests have been added to `tests/unit/aws/test_access.py`, including a `test_uc_roles_create_all_roles` method that checks the creation of all possible UC roles when none exist and external locations with and without folders. Additionally, the `backend` fixture has been updated to include a new external location `s3://BUCKET4`, and various tests have been updated to incorporate this location and handle errors appropriately.
* Added assert to make sure installation is finished before re-installation ([#3546](#3546)). In this release, we have added an assertion to ensure that the installation process is completed before attempting to reinstall, addressing a previous issue where the reinstallation was starting before the first installation was finished, causing a warning to not be raised and resulting in a test failure. We have introduced a new function `wait_for_installation_to_finish()`, which retries loading the installation if it is not found, with a timeout of 2 minutes. This function is utilized in the `test_compare_remote_local_install_versions` test to ensure that the installation is finished before proceeding. Furthermore, we have extracted the warning message to a variable `error_message` for better readability. This change enhances the reliability of the installation process.
* Added dashboards to migration progress dashboard ([#3314](#3314)). This commit introduces significant updates to the migration progress dashboard, adding dashboards, linting resources, and modifying existing components. The changes include a new dashboard displaying the number of dashboards pending migration, with the data sourced from the `ucx_catalog.multiworkspace.objects_snapshot` table. The existing 'Migration [main]' dashboard has been updated, and unit and integration tests have been adapted accordingly. The commit also renames several SQL files, updates the percentage UDF, grant, job, cluster, table, and pipeline migration progress queries, and resolves linting compatibility issues related to Unity Catalog. The changes depend on issue [#3424](#3424), progress issue [#3045](#3045), and break up issue [#3112](#3112). The new dashboard aims to enhance the migration process and ensure a smooth transition to the Unity Catalog.
* Added history log encoder for dashboards ([#3424](#3424)). A new history log encoder for dashboards has been added, addressing issues [#3368](#3368) and [#3369](#3369), and modifying the existing `experimental-migration-progress` workflow. This update includes the addition of the `DashboardOwnership` class, used to generate ownership information for dashboards, and the `DashboardProgressEncoder` class, responsible for encoding progress data related to dashboards. The new functionality is tested through manual, unit, and integration testing. In the `Table` class, the `from_table_info` and `from_historical_data` methods have been added, allowing for the creation of `Table` instances from `TableInfo` objects and historical data dictionaries with more flexibility and safety. The `test_tables.py` file in the `integration/progress` directory has also been updated to include a new test function for checking table failures. These changes improve the tracking and management of dashboard IDs, enhance user name retrieval, and ensure the accurate determination of object ownership.
* Create specific failure for Python syntax error while parsing with Astroid ([#3498](#3498)). This commit enhances the Python linting functionality in our open-source library by introducing a specific failure message, `python-parse-error`, for syntax errors encountered during code parsing using Astroid. Previously, a generic `system-error` message was used, which has been renamed to maintain consistency with the existing `sql-parse-error` message. This change provides clearer failure indicators and includes more detailed information about the error location. Additionally, modifications to Python linting-related code, unit test additions, and updates to the README guide users on handling these new error types have been implemented. A new method, `Tree.maybe_parse()`, has been introduced to parse Python code and detect syntax errors, ensuring more precise error handling for users.
* DBR 16 and later support ([#3481](#3481)). This pull request introduces support for Databricks Runtime (DBR) 16 and later in the code that converts Hive Metastore (HMS) tables to external tables within the `migrate-tables` workflow. The changes include the addition of a new static method `_get_entity_storage_locations` to handle the new `entityStorageLocations` property in DBR16 and the modification of the `_convert_hms_table_to_external` method to account for this property. Additionally, the `run_workflow` function in the `assessment` workflow now has the `skip_job_wait` parameter set to `True`, which allows the workflow to continue running even if a job within it fails. The changes have been manually tested for DBR16, verified in a staging environment, and existing integration tests have been run for DBR 15. The diff also includes updates to the `test_table_migration_convert_manged_to_external` method to skip job waiting during testing, enabling the test to run successfully on DBR 16.
* Delete stale code: `NotebookLinter._load_source_from_run_cell` ([#3529](#3529)). In this update, we have removed the stale code `NotebookLinter._load_source_from_run_cell`, which was responsible for loading the source code from a run cell in a notebook. This change is a part of the ongoing effort to address issue [#3514](#3514) and enhances the overall codebase. Additionally, we have modified the existing `databricks labs ucx lint-local-code` command to update the code linting functionality. We have conducted manual testing to ensure that the changes function as intended and have added and modified several unit tests. The `_load_source_from_run_cell` method is no longer needed, as it was part of a deprecated functionality. The modifications to the `databricks labs ucx lint-local-code` command impact the way code linting is performed, ultimately improving the efficiency and maintainability of the codebase.
* Exclude ucx dashboards from Lakeview dashboard crawler ([#3450](#3450)). In this release, we have enhanced the `lakeview_crawler` method in the open-source library to exclude UCX dashboards and prevent false positives. This has been achieved by adding a new optional argument, `exclude_dashboard_ids`, to the `__init__` method, which takes a list of dashboard IDs to exclude from the crawler. The `_crawl` method has been updated to skip dashboards whose IDs match the ones in the `exclude_dashboard_ids` list. The change includes unit tests and manual testing to ensure proper functionality and has been verified on the staging environment. These updates improve the accuracy and reliability of the dashboard crawler, providing better results for software engineers utilizing this library.
* Fixed issue in installing UCX on UC enabled workspace ([#3501](#3501)). This PR introduces changes to the `ClusterPolicyInstaller` class, updating the `spark_version` policy definition from a fixed value to an allowlist with a default value. This resolves an issue where, when UC is enabled on a workspace, the cluster definition takes on `single_user` and `user_isolation` values instead of `Legacy_Single_User` and `Legacy_Table_ACL`. The job definition is also updated to use the default value when not explicitly provided. These changes improve compatibility with UC-enabled workspaces, ensuring the correct values for `spark_version` in the cluster definition. The PR includes updates to unit tests and installation tests, addressing issue [#3420](#3420).
* Fixed typo in workflow name (in error message) ([#3491](#3491)). This PR addresses a minor typo in the error message displayed by the `validate_groups_permissions` method in the `workflows.py` file. The typo occurred in the workflow name mentioned in the error message, which was misspelled as `validate-group-permissions`. The corrected spelling is now `validate-groups-permissions`. This change does not introduce any new methods or modify any existing functionality; it focuses on enhancing the clarity and accuracy of the error message. Keeping error messages free of typos is essential for usability, as it enables users to more easily troubleshoot any issues that arise during usage.
* HMS Federation Glue Support ([#3526](#3526)). This commit introduces support for HMS Federation Glue in the open-source library, resolving issue [#3011](#3011). The changes include adding a new command, `migrate-glue-credentials`, to migrate Glue credentials to UC storage credentials in the federation glue for HMS. The `AWSResourcePermissions` class has been updated to include a new parameter `config` for HMS Federation Glue configuration and the `load_uc_compatible_roles` method now accepts an optional `resource_type` parameter for filtering compatible roles based on the provided type. Additionally, the `ExternalLocations` class has been updated to handle S3 resource type when identifying missing external locations. The commit also includes several bug fixes, new classes, methods, and changes to the existing methods to handle AWS Glue resources, and updates to the integration tests. Overall, these changes add significant functionality for AWS Glue support in the HMS Federation Glue feature.
* Make link to issue template url safe ([#3508](#3508)). In this release, we have updated the `python_ast.py` file to enhance the encoding of the link to the issue template for bug reports. By utilizing the `urllib.parse.quote_plus()` function from Python's standard library, we have ensured that any special characters in the provided source code will be properly encoded. This eliminates the risk of issues arising from incorrectly interpreted characters, enhancing the reliability of the bug reporting process. This change, initially introduced in issue [#3498](#3498), has been thoroughly tested to guarantee its correct functioning. The rest of the file remains unaffected, preserving its original functionality.
* Refactor `PipelineMigrator`'s to add `include_pipeline_ids` ([#3495](#3495)). In this release, the `PipelineMigrator` class has been refactored to enhance pipeline migration functionality. The `skip-pipeline-ids` flag has been replaced with `include-pipeline-ids`, allowing users to specify a list of pipelines to migrate, rather than listing pipelines to skip. Additionally, the `exclude_pipeline_ids` functionality has been added to provide even more granularity in pipeline selection. The `migrate_pipelines` method now prioritizes `include_pipeline_ids` and `exclude_pipeline_ids` parameters to determine the list of pipelines for migration. The `_migrate_pipeline` method has been updated to accept a string pipeline ID and now checks if the pipeline has already been migrated. Several support methods, such as `_clone_pipeline`, have also been refactored for improved functionality. Although no new methods were added, the behavior of the `migrate_pipelines` method has changed. While unit tests have been updated to cover the changes, integration tests have not been modified yet. Ensure thorough testing to prevent any new issues or breaks in existing functionality.
* Release v0.54.0 ([#3530](#3530)). 0.54.0 brings several enhancements and bug fixes to the UCX library. A `query_statement_disposition` option is added to the SQL backend to handle large SQL queries during assessment results export, preventing potential failures in large workspaces with high volumes of findings. AWS role compatibility checks are improved for external locations pointing to the root of a storage account. Dashboards are enhanced with added migration progress dashboards and a history log encoder. New failure types are introduced for Python syntax errors during parsing and SQL parsing errors. The library now supports DBR 16 and later versions, with optional conversion of Hive Metastore tables to external tables in the `migrate-tables` workflow. The `PipelineMigrator` functionality is refactored to add an `include_pipeline_ids` parameter for better control over the migration process. Various dependency updates, including `databricks-labs-blueprint`, `databricks-sdk`, and `sqlglot`, are included in this release, which bring new features, improvements, and bug fixes, as well as API changes. Please thoroughly test and review the changes to ensure seamless functionality.
* Rename Python AST's `Tree` methods for clarity ([#3524](#3524)). In this release, we have made significant improvements to the clarity of the Python AST's `Tree` methods in the `python_analyzer.py` file. The `append_` and `extend_` methods have been renamed to `attach_` to better reflect their functionality. These methods now always return `None`. New methods such as `attach_child_tree`, `attach_nodes`, and `extend_globals` have been introduced to enhance the functionality of the library. The `attach_child_tree` method allows for attaching one tree as a child of another tree, propagating module references and enabling traversal from both the parent and child trees. The `attach_nodes` method sets the parent of the attached nodes and adds them to the body of the tree. Additionally, docstrings have been added, and unit testing has been expanded. The changes include modifications to code linting, existing command functionalities, and manual testing to ensure compatibility. These enhancements improve the clarity, functionality, and flexibility of the Python AST's `Tree` methods.
* Revert "Release v0.54.0" ([#3569](#3569)). In version 0.53.1, we have reverted changes from 0.54.0 to address issues with the previous release and ensure proper propagation to PyPI. This version includes various updates such as implementing a disposition field in the SQL backend, improving ARN pattern matching for AWS roles, adding dashboards to migration progress, enhancing Python linting functionality, and adding support for DBR 16 in converting Hive Metastore tables to external tables. We have also excluded UCX dashboards from the Lakeview dashboard crawler, refactored PipelineMigrator's to add include_pipeline_ids, and updated the sqlglot and databricks-labs-blueprint requirements. Additionally, several issues related to installation, typo in workflow name, and table-migration workflows have been fixed. The sqlglot requirement has been updated from <26.1,>=25.5.0 to >=25.5.0,<26.2, and databricks-labs-blueprint from <0.10,>=0.9.1 to >=0.9.1,<0.11. This release does not introduce any new methods or change existing functionality, but focuses on addressing bugs and improving functionality.
* Schedule the migration progress workflow to run daily ([#3485](#3485)). This PR introduces a daily scheduling mechanism for the UCX installation's migration progress workflow, allowing it to run automatically once per day at 5 a.m. UTC. It includes refactoring the plumbing for managing and installing workflows, enabling them to have a Cron-based schedule. Relevant user documentation has been updated, and existing unit and integration tests have been added to ensure the changes function as intended. A new test has been added to verify the migration-progress workflow is installed with a schedule attached, checking the workflow schedule's quartz cron expression, time zone, and pause status, as well as confirming that the workflow is unpaused upon installation. The PR also introduces new methods to manage workflow scheduling and configure cron-based schedules.
* Scope crawled pipelines in PipelineCrawler ([#3513](#3513)). In the latest release, we have introduced a new optional argument, 'include_pipeline_ids', in the constructor of the PipelinesCrawler class located in the 'databricks/labs/ucx/assessment' module. This argument allows users to filter pipelines based on a list of pipeline IDs, improving the crawler's flexibility and efficiency in processing pipelines. In the `_crawl` method of the PipelinesCrawler class, a new behavior has been implemented based on the value of 'include_pipeline_ids'. If the argument is not None, then the method uses the pipeline IDs from this list instead of retrieving all pipelines. Additionally, two unit tests have been added to verify the functionality of this new argument and ensure that the crawler handles cases where a pipeline is not found or its specification is missing. A new parameter, 'force_refresh', has also been added to the `snapshot` function. This release aims to provide a more efficient and customizable pipeline crawling experience for users.
* Updated databricks-labs-blueprint requirement from <0.10,>=0.9.1 to >=0.9.1,<0.11 ([#3519](#3519)). In this update, the requirement for the `databricks-labs-blueprint` library has been changed from version range '<0.10,>=0.9.1>' to a new range of '>=0.9.1,<0.11'. This change allows for the use of the latest version of the library while maintaining compatibility with the current project setup, and is based on information from the library's releases and changelog. The commit includes a list of commits and dependencies for the updated library. This update was automatically implemented by Dependabot, a tool that handles dependency updates and conflict resolution, ensuring a seamless integration process for engineers adopting the project.
* Updated databricks-sdk requirement from <0.41,>=0.40 to >=0.40,<0.42 ([#3553](#3553)). In this release, we have updated the `databricks-sdk` package requirement to permit version 0.41 while excluding version 0.42. This update includes several improvements and new features in version 0.41, such as the addition of the `serving.http_request` method for calling external functions and enhancements to the Files API client to recover from download failures. The commit also includes bug fixes, internal changes, and updates to the API for better functionality and compatibility. It is essential to note that these changes have been made to ensure compatibility with the latest features and improvements in the `databricks-sdk` package.
* Updated sqlglot requirement from <26.1,>=25.5.0 to >=25.5.0,<26.2 ([#3500](#3500)). In this release, we have updated the version requirement for the sqlglot package. The minimum version required is now 25.5.0 and less than 26.2, previously it was 25.5.0 and less than 26.1. This change allows for the most recent version of sqlglot to be installed, while still maintaining compatibility with the current codebase. The update is necessary due to breaking changes introduced in version 26.1.0 of sqlglot, including normalizing before qualifying tables, requiring the `AS` token in CTEs for all dialects except spark and databricks, supporting Unicode in sqlite, mysql, tsql, postgres, and oracle, parsing ASCII into Unicode to facilitate transpilation, and improving transpilation of CHAR[ACTER]_LENGTH. Additionally, several bug fixes and new features have been added in this update.
* Updated sqlglot requirement from <26.2,>=25.5.0 to >=25.5.0,<26.3 ([#3528](#3528)). In this release, we have updated the version constraint for the `sqlglot` dependency in our project's "pyproject.toml" file. The previous constraint allowed versions between 25.5.0 and 26.2, while the new constraint allows versions between 25.5.0 and 26.3. This change was made to ensure that we can use the latest version of sqlglot while also preventing the version from exceeding 26.3. Additionally, the commit includes detailed information about the specific commits and changes made in the updated version of sqlglot, providing valuable insights for software engineers working with this open-source library.
* Updated table-migration workflows to also capture updated migration progress into the history log ([#3239](#3239)). The table-migration workflows have been updated to log not only the tables that still need to be migrated, but also the updated progress information into the history log, ensuring a more comprehensive record of migration progress. The affected workflows include `migrate-tables`, `migrate-external-hiveserde-tables-in-place-experimental`, `migrate-external-tables-ctas`, `scan-tables-in-mounts-experimental`, and `migrate-tables-in-mounts-experimental`. The encoder for table-history has been updated to prevent implicit refresh of `TableMigrationStatus` data during initialization. Additionally, the documentation has been updated to reflect which workflows update which tables. New and updated unit and integration tests, as well as manual testing, have been conducted to ensure the functionality of the changes.

Dependency updates:

 * Updated sqlglot requirement from <26.1,>=25.5.0 to >=25.5.0,<26.2 ([#3500](#3500)).
 * Updated databricks-labs-blueprint requirement from <0.10,>=0.9.1 to >=0.9.1,<0.11 ([#3519](#3519)).
 * Updated databricks-sdk requirement from <0.41,>=0.40 to >=0.40,<0.42 ([#3553](#3553)).
gueniai added a commit that referenced this issue Jan 23, 2025
* Implement disposition field in SQL backend
([#3477](#3477)). This
commit adds a `query_statement_disposition` configuration option for the
SQL backend in the UCX tool, allowing users to specify the disposition
of SQL statements during assessment results export and preventing
failures when dealing with large workspaces and a large number of
findings. The new configuration option is added to the `config.yml` file
and used by the `SqlBackend` definition. The `databricks labs install
ucx` and `databricks labs ucx export-assessment` commands have been
modified to support this new functionality. A new `Disposition` enum has
been added to the `databricks.sdk.service.sql` module. This change
resolves issue
[#3447](#3447) and is
related to pull request
[#3455](#3455). The
functionality has been manually tested.
* AWS role issue with external locations pointing to the root of a
storage account
([#3510](#3510)). The
`AWSResources` class in the `aws.py` file has been updated to enhance
the regular expression pattern for matching S3 bucket names, now
including an optional group for trailing slashes and any subsequent
characters. This allows for recognition of external locations pointing
to the root of a storage account, addressing issue
[#3505](#3505). The
`access.py` file within the AWS module has also been updated,
introducing a new `path` variable and updating a for loop condition to
accurately identify missing paths in external locations referencing the
root of a storage account. New unit tests have been added to
`tests/unit/aws/test_access.py`, including a
`test_uc_roles_create_all_roles` method that checks the creation of all
possible UC roles when none exist and external locations with and
without folders. Additionally, the `backend` fixture has been updated to
include a new external location `s3://BUCKET4`, and various tests have
been updated to incorporate this location and handle errors
appropriately.
* Added assert to make sure installation is finished before
re-installation
([#3546](#3546)). In this
release, we have added an assertion to ensure that the installation
process is completed before attempting to reinstall, addressing a
previous issue where the reinstallation was starting before the first
installation was finished, causing a warning to not be raised and
resulting in a test failure. We have introduced a new function
`wait_for_installation_to_finish()`, which retries loading the
installation if it is not found, with a timeout of 2 minutes. This
function is utilized in the `test_compare_remote_local_install_versions`
test to ensure that the installation is finished before proceeding.
Furthermore, we have extracted the warning message to a variable
`error_message` for better readability. This change enhances the
reliability of the installation process.
* Added dashboards to migration progress dashboard
([#3314](#3314)). This
commit introduces significant updates to the migration progress
dashboard, adding dashboards, linting resources, and modifying existing
components. The changes include a new dashboard displaying the number of
dashboards pending migration, with the data sourced from the
`ucx_catalog.multiworkspace.objects_snapshot` table. The existing
'Migration [main]' dashboard has been updated, and unit and integration
tests have been adapted accordingly. The commit also renames several SQL
files, updates the percentage UDF, grant, job, cluster, table, and
pipeline migration progress queries, and resolves linting compatibility
issues related to Unity Catalog. The changes depend on issue
[#3424](#3424), progress
issue [#3045](#3045), and
break up issue
[#3112](#3112). The new
dashboard aims to enhance the migration process and ensure a smooth
transition to the Unity Catalog.
* Added history log encoder for dashboards
([#3424](#3424)). A new
history log encoder for dashboards has been added, addressing issues
[#3368](#3368) and
[#3369](#3369), and
modifying the existing `experimental-migration-progress` workflow. This
update includes the addition of the `DashboardOwnership` class, used to
generate ownership information for dashboards, and the
`DashboardProgressEncoder` class, responsible for encoding progress data
related to dashboards. The new functionality is tested through manual,
unit, and integration testing. In the `Table` class, the
`from_table_info` and `from_historical_data` methods have been added,
allowing for the creation of `Table` instances from `TableInfo` objects
and historical data dictionaries with more flexibility and safety. The
`test_tables.py` file in the `integration/progress` directory has also
been updated to include a new test function for checking table failures.
These changes improve the tracking and management of dashboard IDs,
enhance user name retrieval, and ensure the accurate determination of
object ownership.
* Create specific failure for Python syntax error while parsing with
Astroid ([#3498](#3498)).
This commit enhances the Python linting functionality in our open-source
library by introducing a specific failure message, `python-parse-error`,
for syntax errors encountered during code parsing using Astroid.
Previously, a generic `system-error` message was used, which has been
renamed to maintain consistency with the existing `sql-parse-error`
message. This change provides clearer failure indicators and includes
more detailed information about the error location. Additionally,
modifications to Python linting-related code, unit test additions, and
updates to the README guide users on handling these new error types have
been implemented. A new method, `Tree.maybe_parse()`, has been
introduced to parse Python code and detect syntax errors, ensuring more
precise error handling for users.
* DBR 16 and later support
([#3481](#3481)). This pull
request introduces support for Databricks Runtime (DBR) 16 and later in
the code that converts Hive Metastore (HMS) tables to external tables
within the `migrate-tables` workflow. The changes include the addition
of a new static method `_get_entity_storage_locations` to handle the new
`entityStorageLocations` property in DBR16 and the modification of the
`_convert_hms_table_to_external` method to account for this property.
Additionally, the `run_workflow` function in the `assessment` workflow
now has the `skip_job_wait` parameter set to `True`, which allows the
workflow to continue running even if a job within it fails. The changes
have been manually tested for DBR16, verified in a staging environment,
and existing integration tests have been run for DBR 15. The diff also
includes updates to the
`test_table_migration_convert_manged_to_external` method to skip job
waiting during testing, enabling the test to run successfully on DBR 16.
* Delete stale code: `NotebookLinter._load_source_from_run_cell`
([#3529](#3529)). In this
update, we have removed the stale code
`NotebookLinter._load_source_from_run_cell`, which was responsible for
loading the source code from a run cell in a notebook. This change is a
part of the ongoing effort to address issue
[#3514](#3514) and enhances
the overall codebase. Additionally, we have modified the existing
`databricks labs ucx lint-local-code` command to update the code linting
functionality. We have conducted manual testing to ensure that the
changes function as intended and have added and modified several unit
tests. The `_load_source_from_run_cell` method is no longer needed, as
it was part of a deprecated functionality. The modifications to the
`databricks labs ucx lint-local-code` command impact the way code
linting is performed, ultimately improving the efficiency and
maintainability of the codebase.
* Exclude ucx dashboards from Lakeview dashboard crawler
([#3450](#3450)). In this
release, we have enhanced the `lakeview_crawler` method in the
open-source library to exclude Ucx dashboards and prevent false
positives. This has been achieved by adding a new optional argument,
`exclude_dashboard_ids`, to the `__init__` method, which takes a list of
dashboard IDs to exclude from the crawler. The `_crawl` method has been
updated to skip dashboards whose IDs match the ones in the
`exclude_dashboard_ids` list. The change includes unit tests and manual
testing to ensure proper functionality and has been verified on the
staging environment. These updates improve the accuracy and reliability
of the dashboard crawler, providing better results for software
engineers utilizing this library.
* Fixed issue in installing UCX on UC enabled workspace
([#3501](#3501)). This PR
introduces changes to the `ClusterPolicyInstaller` class, updating the
`spark_version` policy definition from a fixed value to an allowlist
with a default value. This resolves an issue where, when UC is enabled
on a workspace, the cluster definition takes on `single_user` and
`user_isolation` values instead of `Legacy_Single_User` and
'Legacy_Table_ACL'. The job definition is also updated to use the
default value when not explicitly provided. These changes improve
compatibility with UC-enabled workspaces, ensuring the correct values
for `spark_version` in the cluster definition. The PR includes updates
to unit tests and installation tests, addressing issue
[#3420](#3420).
* Fixed typo in workflow name (in error message)
([#3491](#3491)). This PR
(Pull Request) addresses a minor typo in the error message displayed by
the `validate_groups_permissions` method in the `workflows.py` file. The
typo occurred in the workflow name mentioned in the error message, where
`group` was incorrectly spelled as "groups." The corrected spelling is
now `validate-groups-permissions`. This change does not introduce any
new methods or modify any existing functionality, but instead focuses on
enhancing the clarity and accuracy of the error message. Ensuring that
error messages are free from typos and other inaccuracies is essential
for maintaining the usability and effectiveness of the code, as it
enables users to more easily troubleshoot any issues that may arise
during its usage.
* HMS Federation Glue Support
([#3526](#3526)). This
commit introduces support for HMS Federation Glue in the open-source
library, resolving issue
[#3011](#3011). The changes
include adding a new command, `migrate-glue-credentials`, to migrate
Glue credentials to UC storage credentials in the federation glue for
HMS. The `AWSResourcePermissions` class has been updated to include a
new parameter `config` for HMS Federation Glue configuration and the
`load_uc_compatible_roles` method now accepts an optional
`resource_type` parameter for filtering compatible roles based on the
provided type. Additionally, the `ExternalLocations` class has been
updated to handle S3 resource type when identifying missing external
locations. The commit also includes several bug fixes, new classes,
methods, and changes to the existing methods to handle AWS Glue
resources, and updates to the integration tests. Overall, these changes
add significant functionality for AWS Glue support in the HMS Federation
Glue feature.
* Make link to issue template url safe
([#3508](#3508)). In this
release, we have updated the `python_ast.py` file to enhance the
encoding of the link to the issue template for bug reports. By utilizing
the `urllib.parse.quote_plus()` function from Python's standard library,
we have ensured that any special characters in the provided source code
will be properly encoded. This eliminates the risk of issues arising
from incorrectly interpreted characters, enhancing the reliability of
the bug reporting process. This change, initially introduced in issue
[#3498](#3498), has been
thoroughly tested to guarantee its correct functioning. The rest of the
file remains unaffected, preserving its original functionality.
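
  As a reference point, `urllib.parse.quote_plus` is part of the
  standard library; a minimal sketch of the encoding step (the URL and
  query fields are illustrative, not the exact ones UCX uses):

  ```python
  from urllib.parse import quote_plus

  def construct_issue_url(source_code: str) -> str:
      # quote_plus percent-encodes special characters and maps spaces to '+',
      # so arbitrary source code survives inclusion in a URL query string.
      base = "https://github.com/databrickslabs/ucx/issues/new"
      return f"{base}?title={quote_plus('[BUG]: linter failure')}&body={quote_plus(source_code)}"
  ```
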
* Refactor `PipelineMigrator` to add `include_pipeline_ids`
([#3495](#3495)). In this
release, the `PipelineMigrator` class has been refactored to enhance
pipeline migration functionality. The `skip-pipeline-ids` flag has been
replaced with `include-pipeline-ids`, allowing users to specify a list
of pipelines to migrate, rather than listing pipelines to skip.
Additionally, the `exclude_pipeline_ids` functionality has been added to
provide even more granularity in pipeline selection. The
`migrate_pipelines` method now prioritizes `include_pipeline_ids` and
`exclude_pipeline_ids` parameters to determine the list of pipelines for
migration. The `_migrate_pipeline` method has been updated to accept a
string pipeline ID and now checks if the pipeline has already been
migrated. Several support methods, such as `_clone_pipeline`, have also
been refactored for improved functionality. Although no new methods were
added, the behavior of the `migrate_pipelines` method has changed. While
unit tests have been updated to cover the changes, integration tests
have not been modified yet. Ensure thorough testing to prevent any new
issues or breaks in existing functionality.
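
  A minimal sketch of the selection logic described above, assuming
  plain lists of pipeline IDs (the real `migrate_pipelines` also clones
  and validates pipelines):

  ```python
  def select_pipelines(
      all_ids: list[str],
      include_pipeline_ids: list[str] | None = None,
      exclude_pipeline_ids: list[str] | None = None,
  ) -> list[str]:
      known = set(all_ids)
      if include_pipeline_ids is not None:
          # An explicit include list takes priority over exclusion.
          return [pid for pid in include_pipeline_ids if pid in known]
      if exclude_pipeline_ids is not None:
          excluded = set(exclude_pipeline_ids)
          return [pid for pid in all_ids if pid not in excluded]
      return list(all_ids)
  ```
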
* Release v0.54.0
([#3530](#3530)). 0.54.0
brings several enhancements and bug fixes to the UCX library. A
`query_statement_disposition` option is added to the SQL backend to
handle large SQL queries during assessment results export, preventing
potential failures in large workspaces with high volumes of findings.
AWS role compatibility checks are improved for external locations
pointing to the root of a storage account. Dashboards are enhanced with
added migration progress dashboards and a history log encoder. New
failure types are introduced for Python syntax errors during parsing and
SQL parsing errors. The library now supports DBR 16 and later versions,
with optional conversion of Hive Metastore tables to external tables in
the `migrate-tables` workflow. The `PipelineMigrator` functionality is
refactored to add an `include_pipeline_ids` parameter for better control
over the migration process. Various dependency updates, including
`databricks-labs-blueprint`, `databricks-sdk`, and `sqlglot`, are
included in this release, which bring new features, improvements, and
bug fixes, as well as API changes. Please thoroughly test and review the
changes to ensure seamless functionality.
* Rename Python AST's `Tree` methods for clarity
([#3524](#3524)). In this
release, we have made significant improvements to the clarity of the
Python AST's `Tree` methods in the `python_analyzer.py` file. The
`append_` methods have been renamed to `attach_` to better reflect
their functionality, and the `attach_` and `extend_` methods now always
return `None`. New
methods such as `attach_child_tree`, `attach_nodes`, and
`extend_globals` have been introduced to enhance the functionality of
the library. The `attach_child_tree` method allows for attaching one
tree as a child of another tree, propagating module references and
enabling traversal from both the parent and child trees. The
`attach_nodes` method sets the parent of the attached nodes and adds
them to the body of the tree. Additionally, docstrings have been added,
and unit testing has been expanded. The changes include modifications to
code linting, existing command functionalities, and manual testing to
ensure compatibility. These enhancements improve the clarity,
functionality, and flexibility of the Python AST's `Tree` methods.
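
  A simplified sketch of the `attach_child_tree` semantics described
  above, using plain classes rather than the real astroid-backed `Tree`
  (signatures are assumptions):

  ```python
  class Tree:
      def __init__(self, name: str):
          self.name = name
          self.parent: "Tree | None" = None
          self.children: list["Tree"] = []

      def attach_child_tree(self, child: "Tree") -> None:
          # Returns None by design: attaching mutates the tree in place and
          # enables traversal from both the parent and the child.
          child.parent = self
          self.children.append(child)
  ```
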
* Revert "Release v0.54.0"
([#3569](#3569)). In version
0.53.1, we have reverted changes from 0.54.0 to address issues with the
previous release and ensure proper propagation to PyPI. This version
includes various updates such as implementing a disposition field in the
SQL backend, improving ARN pattern matching for AWS roles, adding
dashboards to migration progress, enhancing Python linting
functionality, and adding support for DBR 16 in converting Hive
Metastore tables to external tables. We have also excluded UCX
dashboards from the Lakeview dashboard crawler, refactored
`PipelineMigrator` to add `include_pipeline_ids`, and updated the sqlglot
and databricks-labs-blueprint requirements. Additionally, several issues
related to installation, typo in workflow name, and table-migration
workflows have been fixed. The sqlglot requirement has been updated from
<26.1,>=25.5.0 to >=25.5.0,<26.2, and databricks-labs-blueprint from
<0.10,>=0.9.1 to >=0.9.1,<0.11. This release does not introduce any new
methods or change existing functionality, but focuses on addressing bugs
and improving functionality.
* Schedule the migration progress workflow to run daily
([#3485](#3485)). This PR
introduces a daily scheduling mechanism for the UCX installation's
migration progress workflow, allowing it to run automatically once per
day at 5 a.m. UTC. It includes refactoring the plumbing for managing and
installing workflows, enabling them to have a Cron-based schedule.
Relevant user documentation has been updated, and existing unit and
integration tests have been added to ensure the changes function as
intended. A new test has been added to verify the migration-progress
workflow is installed with a schedule attached, checking the workflow
schedule's quartz cron expression, time zone, and pause status, as well
as confirming that the workflow is unpaused upon installation. The PR
also introduces new methods to manage workflow scheduling and configure
cron-based schedules.
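
  For illustration, a daily 5 a.m. UTC schedule expressed with the
  Databricks SDK's job-schedule types (how UCX wires this into its
  installer may differ):

  ```python
  from databricks.sdk.service.jobs import CronSchedule, PauseStatus

  # Quartz cron fields: second minute hour day-of-month month day-of-week
  daily_5am_utc = CronSchedule(
      quartz_cron_expression="0 0 5 * * ?",
      timezone_id="Etc/UTC",
      pause_status=PauseStatus.UNPAUSED,  # unpaused upon installation
  )
  ```
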
* Scope crawled pipelines in PipelineCrawler
([#3513](#3513)). In the
latest release, we have introduced a new optional argument,
`include_pipeline_ids`, in the constructor of the `PipelinesCrawler`
class located in the `databricks/labs/ucx/assessment` module. This
argument allows users to filter pipelines based on a list of pipeline
IDs, improving the crawler's flexibility and efficiency in processing
pipelines. In the `_crawl` method of the `PipelinesCrawler` class, a new
behavior has been implemented based on the value of
`include_pipeline_ids`: if the argument is not `None`, the method uses
the pipeline IDs from this list instead of retrieving all pipelines.
Additionally, two unit tests have been added to verify the
functionality of this new argument and ensure that the crawler handles
cases where a pipeline is not found or its specification is missing. A
new parameter, `force_refresh`, has also been added to the `snapshot`
function. This release aims to provide a more efficient and customizable
pipeline crawling experience for users.
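
  Putting the two new parameters together, a hedged usage sketch
  (argument order and the warehouse wiring are assumptions):

  ```python
  from databricks.labs.lsql.backends import StatementExecutionBackend
  from databricks.labs.ucx.assessment.pipelines import PipelinesCrawler
  from databricks.sdk import WorkspaceClient

  ws = WorkspaceClient()
  sql_backend = StatementExecutionBackend(ws, "<warehouse-id>")
  crawler = PipelinesCrawler(ws, sql_backend, "ucx", include_pipeline_ids=["pipe-1"])
  pipelines = crawler.snapshot(force_refresh=True)  # re-crawl instead of serving the cache
  ```
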
* Updated databricks-labs-blueprint requirement from <0.10,>=0.9.1 to
>=0.9.1,<0.11
([#3519](#3519)). In this
update, the requirement for the `databricks-labs-blueprint` library has
been changed from the version range `<0.10,>=0.9.1` to a new range of
`>=0.9.1,<0.11`. This change allows for the use of the latest version of
the library while maintaining compatibility with the current project
setup, and is based on information from the library's releases and
changelog. The commit includes a list of commits and dependencies for
the updated library. This update was automatically implemented by
Dependabot, a tool that handles dependency updates and conflict
resolution, ensuring a seamless integration process for engineers
adopting the project.
* Updated databricks-sdk requirement from <0.41,>=0.40 to >=0.40,<0.42
([#3553](#3553)). In this
release, we have updated the `databricks-sdk` package requirement to
permit version 0.41 while excluding version 0.42. This update includes
several improvements and new features in version 0.41, such as the
addition of the `serving.http_request` method for calling external
functions and enhancements to the Files API client to recover from
download failures. The commit also includes bug fixes, internal changes,
and updates to the API for better functionality and compatibility. It is
essential to note that these changes have been made to ensure
compatibility with the latest features and improvements in the
`databricks-sdk` package.
* Updated sqlglot requirement from <26.1,>=25.5.0 to >=25.5.0,<26.2
([#3500](#3500)). In this
release, we have updated the version requirement for the sqlglot
package. The required range is now `>=25.5.0,<26.2`; previously it was
`>=25.5.0,<26.1`. This change allows for the
most recent version of sqlglot to be installed, while still maintaining
compatibility with the current codebase. The update is necessary due to
breaking changes introduced in version 26.1.0 of sqlglot, including
normalizing before qualifying tables, requiring the `AS` token in CTEs
for all dialects except spark and databricks, supporting Unicode in
sqlite, mysql, tsql, postgres, and oracle, parsing ASCII into Unicode to
facilitate transpilation, and improving transpilation of
CHAR[ACTER]_LENGTH. Additionally, several bug fixes and new features
have been added in this update.
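
  A quick way to sanity-check a bumped `sqlglot` pin against the
  dialects this project cares about (illustrative snippet, not part of
  UCX):

  ```python
  import sqlglot

  # Transpile a Spark-dialect query; per the notes above, sqlglot >= 26.1
  # requires the AS token in CTEs for all dialects except spark/databricks.
  sql = "WITH t AS (SELECT CHAR_LENGTH(name) AS n FROM people) SELECT * FROM t"
  print(sqlglot.transpile(sql, read="spark", write="databricks")[0])
  ```
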
* Updated sqlglot requirement from <26.2,>=25.5.0 to >=25.5.0,<26.3
([#3528](#3528)). In this
release, we have updated the version constraint for the `sqlglot`
dependency in our project's "pyproject.toml" file. The previous
constraint allowed versions between 25.5.0 and 26.2, while the new
constraint allows versions between 25.5.0 and 26.3. This change was made
to ensure that we can use the latest version of sqlglot while also
preventing the version from exceeding 26.3. Additionally, the commit
includes detailed information about the specific commits and changes
made in the updated version of sqlglot, providing valuable insights for
software engineers working with this open-source library.
* Updated table-migration workflows to also capture updated migration
progress into the history log
([#3239](#3239)). The
table-migration workflows have been updated to log not only the tables
that still need to be migrated, but also the updated progress
information into the history log, ensuring a more comprehensive record
of migration progress. The affected workflows include `migrate-tables`,
`migrate-external-hiveserde-tables-in-place-experimental`,
`migrate-external-tables-ctas`, `scan-tables-in-mounts-experimental`,
and `migrate-tables-in-mounts-experimental`. The encoder for
table-history has been updated to prevent implicit refresh of
`TableMigrationStatus` data during initialization. Additionally, the
documentation has been updated to reflect which workflows update which
tables. New and updated unit and integration tests, as well as manual
testing, have been conducted to ensure the functionality of the changes.

Dependency updates:

* Updated sqlglot requirement from <26.1,>=25.5.0 to >=25.5.0,<26.2
([#3500](#3500)).
* Updated databricks-labs-blueprint requirement from <0.10,>=0.9.1 to
>=0.9.1,<0.11
([#3519](#3519)).
* Updated databricks-sdk requirement from <0.41,>=0.40 to >=0.40,<0.42
([#3553](#3553)).
github-merge-queue bot pushed a commit that referenced this issue Jan 31, 2025
## Changes
During graph building for linting, `DependencyProblem`s are converted
to `LocatedAdvice`s. This PR aligns the two classes.

### Linked issues

Progresses #3514
Breaks up #3520

### Functionality

- [x] modified existing command: `databricks labs ucx
migrate-local-code`

### Tests

- [ ] manually tested
- [x] modified and added unit tests
- [x] modified and added integration tests
github-merge-queue bot pushed a commit that referenced this issue Feb 3, 2025
## Changes
Make fixer diagnostic codes unique so that the right fixer can be found
for code migration/fixing.
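
A sketch of what context-unique diagnostic codes can look like (the
exact postfixes may differ):

```python
# One fixer per code: context-specific codes make the lookup unambiguous.
DIAGNOSTIC_CODES = {
    "sql": "table-migrated-to-uc-sql",
    "python": "table-migrated-to-uc-python",
    "python-sql": "table-migrated-to-uc-python-sql",
}

def find_fixer(code: str, fixers: dict[str, object]):
    return fixers.get(code)  # each unique code maps to exactly one fixer
```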

### Linked issues

Progresses #3514
Breaks up #3520

### Functionality

- [x] modified existing command: `databricks labs ucx
migrate-local-code`

### Tests

- [ ] manually tested
- [x] modified and added unit tests
- [x] modified and added integration tests
github-merge-queue bot pushed a commit that referenced this issue Feb 4, 2025
## Changes
Move all graph walkers into a separate module, `graph_walkers.py`, under
the `linters` module to centralize them: they are used for linting only,
and centralizing them allows reusing them instead of recreating them.

### Linked issues

Progresses #3514
Breaks up #3520

### Functionality

- [x] modified existing command: `databricks labs ucx
migrate-local-code`

### Tests

- [ ] manually tested
- [x] modified and added unit tests
- [x] modified and added integration tests
github-merge-queue bot pushed a commit that referenced this issue Feb 4, 2025
## Changes
Remove the tree from `PythonSequentialLinter`, as the sequential linter
should just sequence linting, not be used as an intermediate for
manipulating the code tree (a minimal sketch follows the list below).

- Remove tree manipulation related logic from `PythonSequentialLinter`
- Rewrite `NotebookLinter` to do the (notebook) tree manipulation
instead:
  - Let `_load_tree_from_notebook` return early on `Failure`, similar to
    dependency graph building: if we cannot resolve the code used by a
    notebook, then fail early
  - Attach each subsequent cell as a child tree of the cell before it
  - Attach a `%run` notebook's tree as a child tree of the cell that
    calls the notebook
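
A minimal sketch of the pure-sequencing role left to
`PythonSequentialLinter` (names assumed):

```python
class PythonSequentialLinter:
    """Run linters one after another; no tree manipulation happens here."""

    def __init__(self, *linters):
        self._linters = linters

    def lint(self, code: str):
        for linter in self._linters:
            yield from linter.lint(code)  # each linter yields its own advices
```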

### Linked issues

Resolves #3543
Progresses #3514

### Linked PRs

Stacked on :
- [x] #3524

Requires :
- [x] #3529
- [x] #3550

### Functionality

- [x] modified code linting related
- [x] modified existing command: `databricks labs ucx lint-local-code`

### Tests

- [x] manually tested
- [x] added and modified unit tests
github-merge-queue bot pushed a commit that referenced this issue Feb 5, 2025
…ing back on its path (#3641)

## Changes

Let `LocalFileLinter` reuse the `Dependency` instead of falling back on
its path, as the fallback introduces duplicate ways of loading files.

### Linked issues

Progresses #3514
Breaks up #3520
Stacked on:
- [x] #3640
- [x] #3639
- [x] #3638

### Functionality

- [x] modified existing command: `databricks labs ucx
migrate-local-code`

### Tests

- [ ] manually tested
- [x] modified and added unit tests
- [x] modified and added integration tests
github-merge-queue bot pushed a commit that referenced this issue Feb 13, 2025
## Changes
Add file fixing to `LocalFileLinter` (a rough sketch follows the list):
- Add `apply` method to `LocalFileLinter` to fix files using the linters
- Support writing back to a `LocalFile` container
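
A rough sketch of writing fixed code back to a `LocalFile` (method
names and the backup convention are assumptions):

```python
from pathlib import Path

class LocalFile:
    def __init__(self, path: Path):
        self.path = path
        self.migrated_code = path.read_text()  # fixers rewrite this attribute

    def back_up_original_and_flush_migrated_code(self) -> None:
        backup = self.path.with_suffix(self.path.suffix + ".bak")
        backup.write_text(self.path.read_text())  # keep the original source
        self.path.write_text(self.migrated_code)  # write the fixed code back
```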

### Linked issues

Progresses #3514

### Functionality

- [x] modified existing command: `databricks labs ucx
migrate-local-code`

### Tests

- [ ] manually tested
- [x] modified and added unit tests
- [x] modified and added integration tests
github-merge-queue bot pushed a commit that referenced this issue Feb 19, 2025
## Changes
Update `migrate-local-code` to use the latest linter functionality (a
rough sketch of the walker follows the list):
- Merge `LocalFileMigrator` with `LocalCodeLinter`
- Align the `.fix` and `.apply` interfaces
- Introduce `FixerWalker` to fix dependencies in the dependency graph
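
A hypothetical shape for such a walker (the real `FixerWalker` may
differ):

```python
class FixerWalker:
    def __init__(self, graph, context):
        self._graph = graph      # dependency graph rooted at the lint target
        self._context = context  # supplies the linters/fixers per language

    def walk(self) -> None:
        for dependency in self._graph.all_dependencies:
            container = dependency.load()
            if container is not None:
                container.apply(self._context)  # fix each source in place
```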

### Linked issues

Resolves #3514
Supersedes: #3520

Stacked on:
- [x] #3535

### Functionality

- [x] modified existing command: `databricks labs ucx
migrate-local-code`

### Tests

- [ ] manually tested
- [x] modified and added unit tests
- [x] modified and added integration tests
@github-project-automation github-project-automation bot moved this from In Progress to Done in UCX Feb 19, 2025
github-merge-queue bot pushed a commit that referenced this issue Feb 20, 2025
## Changes
Add notebook fixing to `NotebookLinter`
- Implement `Notebook.apply` 
- Call `Notebook.apply` from `FileLinter.apply` for `Notebook` source
containers
- Remove the legacy `NotebookMigrator`
- Introduce `PythonLinter` to run `apply` on an AST tree

### Linked issues

Progresses #3514
Breaks up #3520

### Functionality

- [x] modified existing command: `databricks labs ucx
migrate-local-code`

### Tests

- [x] manually tested
- [x] modified and added unit tests
- [x] modified and added integration tests
gueniai added a commit that referenced this issue Feb 25, 2025
* Added documentation to use Delta Live Tables migration ([#3587](#3587)). In this documentation update, we introduce a new section for migrating Delta Live Table pipelines to the Unity Catalog as part of the migration process. This workflow allows for the original and cloned pipelines to run independently after the cloned pipeline reaches the `RUNNING` state. The update includes an example of stopping and renaming an existing HMS DLT pipeline, and creating a new cloned pipeline. Additionally, known issues and limitations are outlined, such as supported streaming sources, maintenance pausing, and querying by timestamp. To streamline the migration process, the `migrate-dlt-pipelines` command is introduced with optional parameters for including or excluding specific pipeline IDs. This feature is intended for developers and administrators managing data pipelines and handling table aliasing issues. Relevant user documentation has been added and the changes have been manually tested.
* Added support for MSSQL and POSTGRESQL to HMS Federation ([#3701](#3701)). In this enhancement, the open-source library now supports Microsoft SQL Server (MSSQL) and PostgreSQL databases in the Hive Metastore Federation (HMS Federation) feature. This update introduces classes for handling external Hive Metastore instances and their versions, and refactors a regex pattern for better support of various JDBC URL formats. A new `supported_databases_port` class variable is added to map supported databases to default ports, allowing the code to handle SQL Server's distinct default port. Additionally, a `supported_hms_versions` class variable is created, outlining supported Hive Metastore versions. The `_external_hms` method is updated to extract HMS version information more accurately, and the `_split_jdbc_url` method is refactored for better URL format compatibility and parameter extraction. The test file `test_federation.py` has been updated with new unit tests for external catalog creation with MSSQL and PostgreSQL, further enhancing compatibility with various databases and expanding HMS Federation's capabilities.
* Added the CLI command for migrating DLT pipelines ([#3579](#3579)). A new CLI command, "migrate-dlt-pipelines," has been added for migrating DLT pipelines from HMS to UC using the DLT Migration API. This command allows users to include or exclude specific pipeline IDs during migration using the `--include-pipeline-ids` and `--exclude-pipeline-ids` flags, respectively. The change impacts the `PipelinesMigrator` class, which has been updated to accept and use these new parameters. Currently, there is no information available about testing, but the changes are expected to be manually tested and accompanied by corresponding unit and integration tests in the future. The changes are isolated to the `PipelinesMigrator` class and related functionality, with no impact on existing methods or functionality.
* Addressed Bug with Dashboard migration ([#3663](#3663)). In this release, the `_crawl` method in `dashboards.py` has been enhanced to exclude SDK dashboards that lack IDs during the dashboard migration process. This modification enhances migration efficiency by avoiding unnecessary processing of incomplete dashboards. Additionally, the `_list_dashboards` method now includes a check for dashboards with no IDs while iterating through the `dashboards_iterator`. If a dashboard with no ID is found, the method fetches the dashboard details using the `_get_dashboard` method and adds them to the `dashboards` list, ensuring proper processing. Furthermore, a bug fix for issue [#3663](#3663) has been implemented in the `RedashDashboardCrawler` class in `assessment/test_dashboards.py`. The `get` method has been added as a side effect to the `WorkspaceClient` mock's `dashboards` attribute, enabling the retrieval of individual dashboard objects by their IDs. This modification ensures that the `RedashDashboardCrawler` can correctly retrieve and process dashboard objects from the `WorkspaceClient` mock, preventing errors due to missing dashboard objects.
* Broaden safe read text caught exception scope ([#3705](#3705)). In this release, the `safe_read_text` function has been enhanced to handle a broader range of exceptions that may occur while reading a text file, including `OSError` and `UnicodeError`, making it more robust and safe. The function previously caught specific exceptions such as `FileNotFoundError`, `UnicodeDecodeError`, and `PermissionError`. Additionally, the codebase has been improved with updated unit tests, ensuring that the new functionality works correctly. The linting parts of the code have also been updated, enhancing the readability and maintainability of the project for other software engineers. A new method, `safe_read_text`, has been added to the `source_code` module, with several new test cases designed to ensure that the method handles edge cases correctly, such as when the file does not exist, when the path is a directory, or when an OSError occurs. These changes make the open-source library more reliable and robust for various use cases.
* Case sensitive/insensitive table validation ([#3580](#3580)). In this release, the library has been updated to enable more flexible and customizable metadata comparison for tables. A case sensitive flag has been introduced for metadata comparison, which allows for consideration or ignoring of column name case during validation. The `TableMetadataRetriever` abstract base class now includes a new parameter `column_name_transformer` in the `get_metadata` method, which is a callable that can be used to transform column names as needed for comparison. Additionally, a new `case_sensitive` parameter has been added to the `StandardSchemaComparator` constructor to determine whether column names should be compared case sensitively or not. A new parametrized test function `test_schema_comparison_case` has also been included to ensure that this functionality works as expected. These changes provide users with more control over the metadata comparison process and improve the library's handling of cases where column names in the source and target tables may have different cases.
* Catch `AttributeError` in `InferredValue._safe_infer_internal` ([#3684](#3684)). In this release, we have implemented a change to the `_safe_infer_internal` method in the `InferredValue` class to catch `AttributeError`. This change addresses an issue in the Astroid library reported in their GitHub repository (<pylint-dev/astroid#2683>) and resolves issue [#3659](#3659) in our project. By handling `AttributeError` during the inference process, we have made the code more robust and safer. When an exception occurs, an error message is logged with debug-level logging, and the method yields the `Uninferable` sentinel value to indicate that inference failed for the node. This enhancement strengthens the source code linting that relies on value inference in our open-source library.
* Document to run `validate-groups-membership` before groups migration, not after ([#3631](#3631)). In this release, we have updated the order of executing the `validate-groups-membership` command in the group migration process. Previously, the command was recommended to be run after the groups migration, but it has been updated to be executed before the migration. This change ensures that the groups have the correct membership and the number of groups and users in the workspace and account are the same before migration, providing an extra level of safety. Additionally, we have updated the `remove-workspace-local-backup-groups` command to remove workspace-level backup groups and their permissions only after confirming the successful migration of all groups. We have also updated the spelling of the `validate-group-membership` command to `validate-groups-membership` in a documentation file. This release is aimed at software engineers who are adopting the project and looking to migrate their groups to the account level.
* Extend code migration progress documentation ([#3588](#3588)). In this documentation update, we have added two new sections, `Code Migration` and "Final details," to the open-source library's migration process documentation. The `Code Migration` section provides a detailed walkthrough of the steps to migrate code after completing table migration and data reconciliation, including using the linter to investigate compatibility issues and linted workspace resources. The "[linter advices](/docs/reference/linter_codes)" provide codes and messages on detected issues and resolution methods. The migrated code can then be prioritized and tracked using the `migration-progress` dashboard, and migrated using the `migrate-` commands. The `Final details` section outlines the steps to take once code migration is complete, including running the `cluster-remap` command to remap clusters to be Unity Catalog compatible. This update resolves issue [#2231](#2231) and includes updated user documentation, with new methods for linting and migrating local code, managing dashboard migrations, and syncing workspace information. Additional commands for creating and validating table mappings, migrating locations, and assigning metastores are also included, with the aim of improving the code migration process by providing more detailed documentation and new commands for managing the migration.
* Fixed Skip/Unskip schema functionality ([#3567](#3567)). In this release, we have addressed the improper handling of skip/unskip schema functionality in our open-source library. The `skip_schema` and `unskip_schema` methods in the `mapping.py` file have been updated to include the `hive_metastore` schema prefix while setting or unsetting the database property that determines whether a schema should be skipped. Additionally, the `_get_database_in_scope_task` and `_get_table_in_scope_task` methods have been modified to parse table properties as a dictionary, allowing for more straightforward lookup of the skip property for a table. The `test_skip_with_schema` and `test_unskip_with_schema` methods in the `tests/unit/test_cli.py` file have also been updated. The `test_skip_with_schema` method now includes the catalog name `hive_metastore` in the `ALTER SCHEMA` statement, ensuring that the schema is properly skipped. The `test_unskip_with_schema` method has been modified to use the `SET DBPROPERTIES` statement to set the value of the `databricks.labs.ucx.skip` property to `false`, effectively unskipping the schema. Furthermore, the `execute` method in the `sbe` module and the queries in the `mock_backend` module have been updated to match the new commands. These changes address the issue of improperly skipping schemas and ensure that the code functions as intended, allowing users to skip and unskip schemas as needed. Overall, these modifications improve the reliability and correctness of the skip/unskip schema functionality, ensuring that it behaves as expected in different scenarios.
* Fixed `Total Tables` widget in assessment to only show table counts ([#3738](#3738)). In this release, we have addressed the issue with the `Total Tables` widget in the assessment dashboard as part of resolving [#3738](#3738) and in relation to [#3252](#3252). The revised `00_3_count_total_tables.sql` query in the `src/databricks/labs/ucx/queries/assessment/main/` directory now includes a WHERE clause to filter out views from the table count query. By excluding views and only displaying table counts in the `Total Tables` widget, the scope of changes is limited to the SQL query itself. The diff reflects the addition of the WHERE clause and necessary indentation. The commit has been manually tested as part of our quality assurance process, and the successful test results are documented in the `Tests` section of the commit message.
* Fixed broken anchor for doc release ([#3720](#3720)). In this release, we have developed and implemented fixes to address issues with the Databricks workflows documentation used in the migration process. The previous version contained a broken anchor reference for the workflow process, which has now been corrected. This improvement includes the addition of a manual test to verify the fix. The revised documentation enables users to view the status of deployed workflows and rerun failed workflows using the `workflows` and `repair-run` commands, respectively. These updates simplify the management and troubleshooting of workflows, enhancing the overall user experience.
* Fixed broken anchors in documentation ([#3712](#3712)). In this release, we have made significant improvements to the UCX process documentation, addressing issues related to broken anchors, outdated command names, and syntax. The commands `enable_hms_federation` and `create_federated_catalog` have been renamed to `enable-hms-federation` and `create-federated-catalog`, respectively. These updates include corresponding changes to the command syntax and have been manually tested to ensure accuracy. Additionally, we have added a new command, `validate-groups-membership`, which can be executed prior to the group migration workflow for added confidence. In case of no matching account group in the UCX-installed workspace, the `create-account-groups` command is now available. This release also includes updates to the section titles and links to enhance clarity and reflect current functionality.
* Fixed notebook sources with `NotebookLinter.apply` ([#3693](#3693)). A new `github.py` file has been added to the `databricks/labs/ucx/` directory, providing functionality for working with GitHub issues. It includes an `IssueType` enum, a `construct_new_issue_url` function, and constants for constructing URLs to the documentation and GitHub repository. The `NotebookLinter` class has been updated to include notebook fixing functionality, and the `PythonLinter` class has been introduced to run `apply` on an abstract syntax tree (AST). The `Notebook.apply` method has been implemented to apply changes to notebook sources, and the legacy `NotebookMigrator` has been removed. These changes also include various unit and integration tests and modifications to the existing `databricks labs ucx migrate-local-code` command. The `DOCS_URL` constant has been added to the `databricks.labs.ucx.github` module, and the error message for external metastore connectivity issues now includes a link to the UCX installation instruction in the documentation.
* Fixed the broken documentation links in dashboards ([#3726](#3726)). This revision updates documentation links in various dashboards to correct broken links and enhance the user experience. Specifically, it addresses issues [#3725](#3725) and [#3726](#3726) by updating links in the "Assessment Overview," "Assessment Summary," and `Compute summary` dashboards, as well as the `group migration` and `table upgrade` documentation. The changes include replacing local Markdown file links with online documentation links and updating links to point to the correct documentation sections in the UCX GitHub repository. Although the changes have been manually tested, no unit or integration tests have been added, and staging environment verification has not been performed. Despite this, the revisions ensure accurate and up-to-date documentation links, improving the usability of the dashboards.
* Force `MaybeDependency` to have a `Dependency` OR `list[Problem]`, not neither nor both ([#3635](#3635)). This commit enforces the `MaybeDependency` object to have either a `Dependency` or a `list[Problem]`, but not neither or both, in order to handle known libraries during import registration. It resolves issue [#3585](#3585), breaks up issue [#3626](#3626), and progresses issue [#1527](#1527), while modifying code linting logic and updating unit tests to accommodate these changes. Specifically, new classes like `KnownLoader`, `KnownDependency`, and `KnownProblem` have been introduced, and the `_resolve_allow_list` method has been updated to reflect the new enforcement. Additionally, tests have been added and modified to ensure the correct behavior of the modified logic, with a focus on handling directories, resolving children in context, and detecting known problems in imported libraries.
* HMS Federation Documentation ([#3688](#3688)). The HMS Federation feature allows Hive Metastore (HMS) to be federated to a catalog, acting as a step towards migrating to Unity Catalog or as a hybrid solution where both HMS and UC access to the data is required. This feature provides an alternative to the table migration process, eliminating the need for table mapping, creating catalogs and schemas, and migrating Hive metastore data objects. The `enable_hms_federation` command enables the Hive Metastore federation process, while the `create_federated_catalog` command creates a UC catalog that mirrors all the schemas and tables in the source Hive Metastore. The `migrate-glue-credentials` command, which is AWS-only, creates a UC Service Credential for GLUE. These new commands are documented in the HMS Federation Documentation section and are now part of the migration process documentation with the data reconciliation step following it. To enable HMS Federation, use the `enable-hms-federation` and `create-federated-catalog` commands.
* Make `MaybeTree` the main Python AST entrypoint for constructing the syntax tree ([#3550](#3550)). In this release, the main entry point for constructing the Python AST syntax tree has been changed from `Tree` to `MaybeTree` in the open-source library. This change involves moving class methods and static methods that construct a `MaybeTree` from the `Tree` class to the `MaybeTree` class, and making the class method that normalizes the source code before parsing the only entry point. The `normalized_parse` method has been renamed to `from_source_code` to match the commonly used naming for class methods within UCX. The `walk` and `first_statement` methods have been removed from `MaybeTree` as they were repetitions from `Tree`'s methods. These changes aim to enforce normalization and improve code consistency. Additionally, unit tests have been added and the Python linting related code has been modified to work with the new `MaybeTree` class. This change resolves issues [#3457](#3457) and [#3213](#3213).
* Make fixer diagnostic codes unique ([#3582](#3582)). This commit modifies the `databricks labs ucx migrate-local-code` command to make fixer diagnostic codes unique, ensuring accurate code migration and fixing. Two new methods have been added for modifying and adding unit and integration tests. Diagnostic codes for the `table-migrated-to-uc` issue are now unique depending on the context where the table is referenced: SQL, Python, or Python-SQL. This ensures the appropriate fixer is applied when addressing code migration issues, improving overall functionality and user experience. Additionally, the commit updates the documentation to include the new postfixes for the `table-migrated-to-uc` linter code and their descriptions, making it clearer for developers to diagnose and resolve issues related to table migration.
* Removed the linting false positive for missing table format warning when using `spark.table` ([#3589](#3589)). In this release, linting false positives related to missing table format warnings when using `spark.table` have been addressed, resolving issue [#3545](#3545). The linting logic and unit tests have been updated to handle changes in the default format for table references in Databricks Runtime 8.0, which now uses Delta as the default format. These changes improve the accuracy of the linting process, reducing unnecessary warnings and enhancing the overall developer experience. Additionally, the `test_linting_walker_populates_paths` unit test in the `test_jobs.py` file has been updated to use a different file path for testing.
* Removed tree from `PythonSequentialLinter` ([#3535](#3535)). In this release, the `PythonSequentialLinter` has been refactored to no longer manipulate the code tree, and instead, the tree manipulation logic has been moved to `NotebookLinter`. This change improves the separation of concerns between the two components, resulting in a more modular and maintainable codebase. The `NotebookLinter` now handles early failure when resolving the code used by a notebook and attaches `%run` notebook trees as a child tree to the cell that calls the notebook. The code linting functionality has been modified, and the `databricks labs ucx lint-local-code` command has been updated. These changes resolve [#3543](#3543) and progress [#3514](#3514) and are dependent on PRs [#3529](#3529) and [#3550](#3550). The changes have been manually tested and include added and modified unit tests. Additionally, the `Advice` class has been updated to include a type variable `T`, which allows for more specific type hinting when creating instances of the class and its subclasses.
* Rename file language helper function ([#3661](#3661)). In this code change, the helper function for determining the file language and checking its support by the linter has been renamed and refactored. The function, previously called `file_language`, has been updated and now named `infer_file_language_if_supported`. This change clarifies the function's purpose as it not only infers the file language but also checks if the file is supported by the linter, acting as a filter. The function returns a `Language` object if the file is supported or `None` if it is not. The `infer_file_language_if_supported` function has been used in other parts of the codebase, such as the `is_a_notebook` function. This change improves the codebase's readability and maintainability by making the helper function's purpose more explicit. The related code has been updated to use the new function accordingly.
* Scope crawled jobs in `JobsCrawler` with `include_job_ids` ([#3658](#3658)). In this release, the `JobsCrawler` class in the `workflow_task.py` file has been updated to include a new optional parameter `include_job_ids` in the constructor. This parameter allows users to specify a list of job IDs to include in the crawling process, improving efficiency in large workspaces. Additionally, a check has been added to the `_assess_jobs` method to skip jobs whose IDs are not in the list of included IDs. Integration tests have been added to ensure the correct behavior of the new feature. This change resolves issue [#3656](#3656), which requested the ability to crawl jobs based on a specific list of job IDs. It is recommended to add a comment to the code explaining the purpose and usage of the `include_job_ids` parameter and update the documentation accordingly.
* Support fixing `LocalFile`'s with `FileLinter` ([#3660](#3660)). In this release, we have added new methods `write_text`, `safe_write_text`, `back_up_path`, and `revert_back_up_path` to the `base.py` file to support fixing files in `LocalFile` containers and adding unit tests and integration tests. The `LocalFile` class in the "files.py" file has been extended to include new methods and properties, such as `apply`, `migrated_code`, `back_up_path`, and `back_up_original_and_flush_migrated_code`, enabling fixing files using linters and writing changes back to the container. The `databricks labs ucx migrate-local-code` command has also been updated to utilize the new functionality. These changes address issue [#3514](#3514), ensuring the proper handling of errors during file writing and providing automated fixing of code issues within LocalFiles.
* Updated `migrate-local-code` to use the latest linter functionality ([#3700](#3700)). In this update, the `migrate-local-code` command has been enhanced by incorporating the latest linter functionality. The `LocalFileMigrator` and `LocalCodeLinter` classes have been merged, and the interfaces of `.fix` and `.apply` methods have been aligned. A new `FixerWalker` has been introduced to address dependencies in the dependency graph, and the existing `databricks labs ucx migrate-local-code` command has been updated accordingly. Relevant unit tests and integration tests have been added and modified to ensure the correctness of the changes, which resolve issue [#3514](#3514) and supersede issue [#3520](#3520). The `lint-local-code` command has also been updated with a flag to specify the path for linting. The `migrate-local-code` command now lints local code and generates advice on how to make it compatible with the Unity Catalog, and can also apply local code fixes to make them compatible.
* Updated sqlglot requirement from <26.3,>=25.5.0 to >=25.5.0,<26.4 ([#3572](#3572)). In this pull request, we have updated the requirement for the `sqlglot` library in the 'pyproject.toml' file, changing it from being greater than or equal to version 25.5.0 and less than 26.3, to being greater than or equal to version 25.5.0 and less than 26.4. This change is part of issue [#3572](#3572) and was made to allow for the use of the latest version of 'sqlglot'. The pull request includes a changelog from the `sqlglot` repository, detailing the changes made in each version between 25.5.0 and 26.4. The commits relevant to this update include bumping the version of `sqlglotrs` to various versions between 0.3.7 and 0.3.14. This pull request was automatically generated by Dependabot, a tool that creates pull requests to update the dependencies in a project. It is now ready for review and merging.
* Updated sqlglot requirement from <26.4,>=25.5.0 to >=25.5.0,<26.7 ([#3677](#3677)). In this release, we have updated the `sqlglot` dependency from version `>=25.5.0,<26.4` to `>=25.5.0,<26.7`. This change allows us to leverage the latest version of `sqlglot`, which includes various bug fixes and improvements, such as avoiding redundant casts in FROM/TO_UTC_TIMESTAMP and enhancing UUID support. Although there are some breaking changes introduced in the latest version, they should not affect our project's functionality. Additionally, this update includes several bug fixes and improvements for specific dialects such as Redshift, BigQuery, and TSQL. Overall, this update enhances the performance and functionality of the `sqlglot` library, ensuring compatibility with the latest version.
* Use cached property for table migration index on local checkout context ([#3711](#3711)). In this release, we introduce a new cached property, `_migration_index`, to the `LocalCheckoutContext` class, designed to store the table migration index for the local checkout context. This change aims to prevent multiple recrawling when the migration index is empty. The `linter_context_factory` method has been refactored to utilize the new `_migration_index` property, and the `CurrentSessionState` parameter is removed. Additionally, the `local_code_linter` method has been updated to leverage the new `LinterContext` instance with the `_migration_index` property, instead of using the `linter_context_factory` method. The `LocalCodeLinter` object now accepts a new callable lambda function, returning a `LinterContext` instance with the `_migration_index` property. These enhancements improve code performance by reducing the migration index crawls in the local checkout context and simplify the code by eliminating the `CurrentSessionState` parameter.
* [DOCS] Explain when to run `remove-workspace-local-backup-groups` workflow ([#3707](#3707)). In this release, the UCX component of the application has been enhanced with new Databricks workflows for orchestrating the group migration process. The `workflows` command displays the status of the workflows, and the `repair-run` command allows for rerunning failed workflows. The group migration workflow is specifically designed to be executed after a successful assessment workflow, and running it is followed by an optional `remove-workspace-local-backup-groups` workflow. This final step removes unnecessary workspace-level backup groups and their associated permissions, keeping the workspace clean and organized. The `remove-workspace-local-backup-groups` workflow should only be executed after confirming the successful migration of all groups involved.

Dependency updates:

 * Updated sqlglot requirement from <26.3,>=25.5.0 to >=25.5.0,<26.4 ([#3572](#3572)).
 * Updated sqlglot requirement from <26.4,>=25.5.0 to >=25.5.0,<26.7 ([#3677](#3677)).
github-merge-queue bot pushed a commit that referenced this issue Feb 25, 2025
* Added documentation to use Delta Live Tables migration
([#3587](#3587)). In this
documentation update, we introduce a new section for migrating Delta
Live Table pipelines to the Unity Catalog as part of the migration
process. This workflow allows for the original and cloned pipelines to
run independently after the cloned pipeline reaches the `RUNNING` state.
The update includes an example of stopping and renaming an existing HMS
DLT pipeline, and creating a new cloned pipeline. Additionally, known
issues and limitations are outlined, such as supported streaming
sources, maintenance pausing, and querying by timestamp. To streamline
the migration process, the `migrate-dlt-pipelines` command is introduced
with optional parameters for including or excluding specific pipeline
IDs. This feature is intended for developers and administrators managing
data pipelines and handling table aliasing issues. Relevant user
documentation has been added and the changes have been manually tested.
* Added support for MSSQL and POSTGRESQL to HMS Federation
([#3701](#3701)). In this
enhancement, the open-source library now supports Microsoft SQL Server
(MSSQL) and PostgreSQL databases in the Hive Metastore Federation (HMS
Federation) feature. This update introduces classes for handling
external Hive Metastore instances and their versions, and refactors a
regex pattern for better support of various JDBC URL formats. A new
`supported_databases_port` class variable is added to map supported
databases to default ports, allowing the code to handle SQL Server's
distinct default port. Additionally, a `supported_hms_versions` class
variable is created, outlining supported Hive Metastore versions. The
`_external_hms` method is updated to extract HMS version information
more accurately, and the `_split_jdbc_url` method is refactored for
better URL format compatibility and parameter extraction. The test file
`test_federation.py` has been updated with new unit tests for external
catalog creation with MSSQL and PostgreSQL, further enhancing
compatibility with various databases and expanding HMS Federation's
capabilities.
* Added the CLI command for migrating DLT pipelines
([#3579](#3579)). A new CLI
command, "migrate-dlt-pipelines," has been added for migrating DLT
pipelines from HMS to UC using the DLT Migration API. This command
allows users to include or exclude specific pipeline IDs during
migration using the `--include-pipeline-ids` and
`--exclude-pipeline-ids` flags, respectively. The change impacts the
`PipelinesMigrator` class, which has been updated to accept and use
these new parameters. Currently, there is no information available about
testing, but the changes are expected to be manually tested and
accompanied by corresponding unit and integration tests in the future.
The changes are isolated to the `PipelinesMigrator` class and related
functionality, with no impact on existing methods or functionality.
* Addressed Bug with Dashboard migration
([#3663](#3663)). In this
release, the `_crawl` method in `dashboards.py` has been enhanced to
exclude SDK dashboards that lack IDs during the dashboard migration
process. This modification enhances migration efficiency by avoiding
unnecessary processing of incomplete dashboards. Additionally, the
`_list_dashboards` method now includes a check for dashboards with no
IDs while iterating through the `dashboards_iterator`. If a dashboard
with no ID is found, the method fetches the dashboard details using the
`_get_dashboard` method and adds them to the `dashboards` list, ensuring
proper processing. Furthermore, a bug fix for issue
[#3663](#3663) has been
implemented in the `RedashDashboardCrawler` class in
`assessment/test_dashboards.py`. The `get` method has been added as a
side effect to the `WorkspaceClient` mock's `dashboards` attribute,
enabling the retrieval of individual dashboard objects by their IDs.
This modification ensures that the `RedashDashboardCrawler` can
correctly retrieve and process dashboard objects from the
`WorkspaceClient` mock, preventing errors due to missing dashboard
objects.
* Broaden safe read text caught exception scope
([#3705](#3705)). In this
release, the `safe_read_text` function has been enhanced to handle a
broader range of exceptions that may occur while reading a text file,
including `OSError` and `UnicodeError`, making it more robust and safe.
The function previously caught specific exceptions such as
`FileNotFoundError`, `UnicodeDecodeError`, and `PermissionError`.
Additionally, the codebase has been improved with updated unit tests,
ensuring that the new functionality works correctly. The linting parts
of the code have also been updated, enhancing the readability and
maintainability of the project for other software engineers. A new
method, `safe_read_text`, has been added to the `source_code` module,
with several new test cases designed to ensure that the method handles
edge cases correctly, such as when the file does not exist, when the
path is a directory, or when an OSError occurs. These changes make the
open-source library more reliable and robust for various use cases.
* Case sensitive/insensitive table validation
([#3580](#3580)). In this
release, the library has been updated to enable more flexible and
customizable metadata comparison for tables. A case sensitive flag has
been introduced for metadata comparison, which allows for consideration
or ignoring of column name case during validation. The
`TableMetadataRetriever` abstract base class now includes a new
parameter `column_name_transformer` in the `get_metadata` method, which
is a callable that can be used to transform column names as needed for
comparison. Additionally, a new `case_sensitive` parameter has been
added to the `StandardSchemaComparator` constructor to determine whether
column names should be compared case sensitively or not. A new
parametrized test function `test_schema_comparison_case` has also been
included to ensure that this functionality works as expected. These
changes provide users with more control over the metadata comparison
process and improve the library's handling of cases where column names
in the source and target tables may have different cases.
* Catch `AttributeError` in `InfferedValue._safe_infer_internal`
([#3684](#3684)). In this
release, we have implemented a change to the `_safe_infer_internal`
method in the `InferredValue` class to catch `AttributeError`. This
change addresses an issue in the Astroid library reported in their
GitHub repository (<pylint-dev/astroid#2683>)
and resolves issue
[#3659](#3659) in our
project. By handling `AttributeError` during the inference process, we
have made the code more robust and safer. When an exception occurs, an
error message is logged with debug-level logging, and the method yields
the `Uninferable` sentinel value to indicate that inference failed for
the node. This enhancement strengthens the source code linting code
through value inference in our open-source library.
* Document to run `validate-groups-membership` before groups migration,
not after ([#3631](#3631)).
In this release, we have updated the order of executing the
`validate-groups-membership` command in the group migration process.
Previously, the command was recommended to be run after the groups
migration, but it has been updated to be executed before the migration.
This change ensures that the groups have the correct membership and the
number of groups and users in the workspace and account are the same
before migration, providing an extra level of safety. Additionally, we
have updated the `remove-workspace-local-backup-groups` command to
remove workspace-level backup groups and their permissions only after
confirming the successful migration of all groups. We have also updated
the spelling of the `validate-group-membership` command to
`validate-groups-membership` in a documentation file. This release is
aimed at software engineers who are adopting the project and looking to
migrate their groups to the account level.
* Extend code migration progress documentation
([#3588](#3588)). In this
documentation update, we have added two new sections, `Code Migration`
and "Final details," to the open-source library's migration process
documentation. The `Code Migration` section provides a detailed
walkthrough of the steps to migrate code after completing table
migration and data reconciliation, including using the linter to
investigate compatibility issues and linted workspace resources. The
"[linter advices](/docs/reference/linter_codes)" provide codes and
messages on detected issues and resolution methods. The migrated code
can then be prioritized and tracked using the `migration-progress`
dashboard, and migrated using the `migrate-` commands. The `Final
details` section outlines the steps to take once code migration is
complete, including running the `cluster-remap` command to remap
clusters to be Unity Catalog compatible. This update resolves issue
[#2231](#2231) and includes
updated user documentation, with new methods for linting and migrating
local code, managing dashboard migrations, and syncing workspace
information. Additional commands for creating and validating table
mappings, migrating locations, and assigning metastores are also
included, with the aim of improving the code migration process by
providing more detailed documentation and new commands for managing the
migration.
* Fixed Skip/Unskip schema functionality
([#3567](#3567)). In this
release, we have addressed the improper handling of skip/unskip schema
functionality in our open-source library. The `skip_schema` and
`unskip_schema` methods in the `mapping.py` file have been updated to
include the `hive_metastore` schema prefix while setting or unsetting
the database property that determines whether a schema should be
skipped. Additionally, the `_get_database_in_scope_task` and
`_get_table_in_scope_task` methods have been modified to parse table
properties as a dictionary, allowing for more straightforward lookup of
the skip property for a table. The `test_skip_with_schema` and
`test_unskip_with_schema` methods in the `tests/unit/test_cli.py` file
have also been updated. The `test_skip_with_schema` method now includes
the catalog name `hive_metastore` in the `ALTER SCHEMA` statement,
ensuring that the schema is properly skipped. The
`test_unskip_with_schema` method has been modified to use the `SET
DBPROPERTIES` statement to set the value of the
`databricks.labs.ucx.skip` property to `false`, effectively unskipping
the schema. Furthermore, the `execute` method in the `sbe` module and
the queries in the `mock_backend` module have been updated to match the
new commands. These changes address the issue of improperly skipping
schemas and ensure that the code functions as intended, allowing users
to skip and unskip schemas as needed. Overall, these modifications
improve the reliability and correctness of the skip/unskip schema
functionality, ensuring that it behaves as expected in different
scenarios.
* Fixed `Total Tables` widget in assessment to only show table counts
([#3738](#3738)). In this
release, we have addressed the issue with the `Total Tables` widget in
the assessment dashboard as part of resolving
[#3738](#3738) and in
relation to [#3252](#3252).
The revised `00_3_count_total_tables.sql` query in the
`src/databricks/labs/ucx/queries/assessment/main/` directory now
includes a WHERE clause to filter out views from the table count query.
By excluding views and only displaying table counts in the `Total
Tables` widget, the scope of changes is limited to the SQL query itself.
The diff reflects the addition of the WHERE clause and necessary
indentation. The commit has been manually tested as part of our quality
assurance process, and the successful test results are documented in the
`Tests` section of the commit message.
* Fixed broken anchor for doc release
([#3720](#3720)). In this
release, we fixed the Databricks workflows documentation used in the
migration process: the previous version contained a broken anchor
reference for the workflow process, which has now been corrected and
verified with a manual test. The revised
documentation enables users to view the status of deployed workflows and
rerun failed workflows using the `workflows` and `repair-run` commands,
respectively. These updates simplify the management and troubleshooting
of workflows, enhancing the overall user experience.
* Fixed broken anchors in documentation
([#3712](#3712)). In this
release, we have made significant improvements to the UCX process
documentation, addressing issues related to broken anchors, outdated
command names, and syntax. The commands `enable_hms_federation` and
`create_federated_catalog` have been renamed to `enable-hms-federation`
and `create-federated-catalog`, respectively. These updates include
corresponding changes to the command syntax and have been manually
tested to ensure accuracy. Additionally, we have added a new command,
`validate-groups-membership`, which can be executed prior to the group
migration workflow for added confidence. When no matching account groups
exist for the UCX-installed workspace, the `create-account-groups`
command can be used to create them. This release also includes updates to the
section titles and links to enhance clarity and reflect current
functionality.
* Fixed notebook sources with `NotebookLinter.apply`
([#3693](#3693)). A new
`github.py` module has been added to the `databricks/labs/ucx/` directory,
providing functionality for working with GitHub issues. It includes an
`IssueType` enum, a `construct_new_issue_url` function, and constants
for constructing URLs to the documentation and GitHub repository. The
`NotebookLinter` class has been updated to include notebook fixing
functionality, and the `PythonLinter` class has been introduced to run
`apply` on an abstract syntax tree (AST). The `Notebook.apply`
method has been implemented to apply changes to notebook sources and the
legacy `NotebookMigrator` has been removed. These changes also include
various unit and integration tests and modifications to the existing
`databricks labs ucx migrate-local-code` command. The `DOCS_URL` constant
has been added to the `databricks.labs.ucx.github` module, and the error
message for external metastore connectivity issues now includes a link
to the UCX installation instruction in the documentation.
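
For illustration, a minimal sketch of what such a module could look like; the URL values, enum members, and function signature are assumptions rather than the actual implementation:

```python
from enum import Enum
from urllib.parse import urlencode

GITHUB_URL = "https://github.com/databrickslabs/ucx"  # assumed value
DOCS_URL = "https://databrickslabs.github.io/ucx/docs"  # assumed value


class IssueType(Enum):
    """Kind of GitHub issue to open (members assumed)."""

    BUG = "bug"
    FEATURE = "feature"


def construct_new_issue_url(issue_type: IssueType, title: str, body: str) -> str:
    """Build a prefilled new-issue URL for the UCX repository (sketch)."""
    query = urlencode({"labels": issue_type.value, "title": title, "body": body})
    return f"{GITHUB_URL}/issues/new?{query}"


print(construct_new_issue_url(IssueType.BUG, "Example title", "Example details"))
```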
* Fixed the broken documentation links in dashboards
([#3726](#3726)). This
revision updates documentation links in various dashboards to correct
broken links and enhance the user experience. Specifically, it addresses
issues [#3725](#3725) and
[#3726](#3726) by updating
links in the "Assessment Overview," "Assessment Summary," and `Compute
summary` dashboards, as well as the `group migration` and `table
upgrade` documentation. The changes include replacing local Markdown
file links with online documentation links and updating links to point
to the correct documentation sections in the UCX GitHub repository.
The changes have been manually tested, although no unit or integration
tests were added and no staging-environment verification was performed.
The revisions nevertheless ensure accurate and up-to-date documentation
links, improving the usability of the dashboards.
* Force `MaybeDependency` to have a `Dependency` OR `list[Problem]`, not
neither nor both
([#3635](#3635)). This
commit enforces the `MaybeDependency` object to have either a
`Dependency` or a `list[Problem]`, but not neither or both, in order to
handle known libraries during import registration. It resolves issue
[#3585](#3585), breaks up
issue [#3626](#3626), and
progresses issue
[#1527](#1527), while
modifying code linting logic and updating unit tests to accommodate
these changes. Specifically, new classes like `KnownLoader`,
`KnownDependency`, and `KnownProblem` have been introduced, and the
`_resolve_allow_list` method has been updated to reflect the new
enforcement. Additionally, tests have been added and modified to ensure
the correct behavior of the modified logic, with a focus on handling
directories, resolving children in context, and detecting known problems
in imported libraries.
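
As a rough illustration, the invariant could be enforced along these lines; the class shapes and messages are assumptions, not the actual UCX code:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dependency:  # stand-in for the real Dependency class
    path: str


@dataclass(frozen=True)
class Problem:  # stand-in for the real Problem class
    code: str
    message: str


@dataclass
class MaybeDependency:
    """Either a resolved dependency or the problems preventing resolution."""

    dependency: Dependency | None = None
    problems: list[Problem] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce: exactly one of `dependency` or `problems` must be present.
        if self.dependency and self.problems:
            raise ValueError("Cannot have both a dependency and problems")
        if not self.dependency and not self.problems:
            raise ValueError("Must have either a dependency or problems")
```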
* HMS Federation Documentation
([#3688](#3688)). The HMS
Federation feature allows Hive Metastore (HMS) to be federated to a
catalog, acting as a step towards migrating to Unity Catalog or as a
hybrid solution where both HMS and UC access to the data is required.
This feature provides an alternative to the table migration process,
eliminating the need for table mapping, creating catalogs and schemas,
and migrating Hive metastore data objects. The `enable-hms-federation`
command enables the Hive Metastore federation process, while the
`create-federated-catalog` command creates a UC catalog that mirrors all
the schemas and tables in the source Hive Metastore. The
`migrate-glue-credentials` command, which is AWS-only, creates a UC
Service Credential for GLUE. These new commands are documented in the
HMS Federation Documentation section and are now part of the migration
process documentation with the data reconciliation step following it. To
enable HMS Federation, use the `enable-hms-federation` and
`create-federated-catalog` commands.
* Make `MaybeTree` the main Python AST entrypoint for constructing the
syntax tree
([#3550](#3550)). In this
release, the main entry point for constructing the Python AST syntax
tree has been changed from `Tree` to `MaybeTree` in the open-source
library. This change involves moving class methods and static methods
that construct a `MaybeTree` from the `Tree` class to the `MaybeTree`
class, and making the class method that normalizes the source code
before parsing the only entry point. The `normalized_parse` method has
been renamed to `from_source_code` to match the commonly used naming for
class methods within UCX. The `walk` and `first_statement` methods have
been removed from `MaybeTree` as they duplicated `Tree`'s methods. These
changes aim to enforce normalization and improve code
consistency. Additionally, unit tests have been added and the Python
linting related code has been modified to work with the new `MaybeTree`
class. This change resolves issues
[#3457](#3457) and
[#3213](#3213).
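
By way of illustration, a usage sketch of the new entry point; the import path and the `failure`/`tree` attribute names are assumptions:

```python
# Import path assumed; MaybeTree lives in UCX's Python AST module.
from databricks.labs.ucx.source_code.python.python_ast import MaybeTree

maybe_tree = MaybeTree.from_source_code('df = spark.table("sales.orders")')
if maybe_tree.failure:  # parse problems surface as a failure, not an exception
    print(maybe_tree.failure)
else:
    tree = maybe_tree.tree  # the normalized, parsed syntax tree for linting
```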
* Make fixer diagnostic codes unique
([#3582](#3582)). This
commit modifies the `databricks labs ucx migrate-local-code` command to
make fixer diagnostic codes unique, ensuring accurate code migration and
fixing. Two new methods have been added for modifying and adding unit
and integration tests. Diagnostic codes for the `table-migrated-to-uc`
issue are now unique depending on the context where the table is
referenced: SQL, Python, or Python-SQL. This ensures the appropriate
fixer is applied when addressing code migration issues, improving
overall functionality and user experience. Additionally, the commit
updates the documentation to include the new postfixes for the
`table-migrated-to-uc` linter code and their descriptions, making it
clearer for developers to diagnose and resolve issues related to table
migration.
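
For example, the context-dependent codes could look like the following; the exact strings are assumptions based on the description above:

```python
# Hypothetical mapping from linting context to the unique diagnostic code.
TABLE_MIGRATED_CODES = {
    "sql": "table-migrated-to-uc-sql",
    "python": "table-migrated-to-uc-python",
    "python-sql": "table-migrated-to-uc-python-sql",
}


def fixer_context(diagnostic_code: str) -> str | None:
    """Select the fixer context unambiguously from the diagnostic code."""
    for context, code in TABLE_MIGRATED_CODES.items():
        if diagnostic_code == code:
            return context
    return None
```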
* Removed the linting false positive for missing table format warning
when using `spark.table`
([#3589](#3589)). In this
release, linting false positives related to missing table format
warnings when using `spark.table` have been addressed, resolving issue
[#3545](#3545). The linting
logic and unit tests have been updated to handle changes in the default
format for table references in Databricks Runtime 8.0, which now uses
Delta as the default format. These changes improve the accuracy of the
linting process, reducing unnecessary warnings and enhancing the overall
developer experience. Additionally, the
`test_linting_walker_populates_paths` unit test in the `test_jobs.py`
file has been updated to use a different file path for testing.
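
To illustrate the class of code affected, here is the kind of snippet that previously drew the warning; the table name is made up:

```python
# `spark` is the ambient SparkSession in a Databricks notebook.
# Since Databricks Runtime 8.0, a table referenced without an explicit
# format defaults to Delta, so no missing-format warning should be raised.
df = spark.table("hive_metastore.sales.orders")
df.show()
```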
* Removed tree from `PythonSequentialLinter`
([#3535](#3535)). In this
release, the `PythonSequentialLinter` has been refactored to no longer
manipulate the code tree, and instead, the tree manipulation logic has
been moved to `NotebookLinter`. This change improves the separation of
concerns between the two components, resulting in a more modular and
maintainable codebase. The `NotebookLinter` now handles early failure
when resolving the code used by a notebook and attaches `%run` notebook
trees as a child tree to the cell that calls the notebook. The code
linting functionality has been modified, and the `databricks labs ucx
lint-local-code` command has been updated. These changes resolve
[#3543](#3543) and progress
[#3514](#3514) and are
dependent on PRs
[#3529](#3529) and
[#3550](#3550). The changes
have been manually tested and include added and modified unit tests.
Additionally, the `Advice` class has been updated to include a type
variable `T`, which allows for more specific type hinting when creating
instances of the class and its subclasses.
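
The `Advice` change could look roughly like the following self-typed pattern; the field names, `replace` helper, and subclass are assumptions:

```python
import dataclasses
from dataclasses import dataclass
from typing import TypeVar

T = TypeVar("T", bound="Advice")


@dataclass(frozen=True)
class Advice:
    code: str
    message: str

    def replace(self: T, **changes) -> T:
        # Returning T rather than Advice preserves the subclass type.
        return dataclasses.replace(self, **changes)


@dataclass(frozen=True)
class Deprecation(Advice):  # subclass name assumed
    pass


# `replace` on a Deprecation yields a Deprecation, not a plain Advice.
deprecation = Deprecation(code="direct-filesystem-access", message="old")
updated = deprecation.replace(message="new")
```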
* Rename file language helper function
([#3661](#3661)). In this
code change, the helper function for determining the file language and
checking its support by the linter has been renamed and refactored. The
function, previously called `file_language`, is now named
`infer_file_language_if_supported`. This change clarifies the
function's purpose as it not only infers the file language but also
checks if the file is supported by the linter, acting as a filter. The
function returns a `Language` object if the file is supported or `None`
if it is not. The `infer_file_language_if_supported` function has been
used in other parts of the codebase, such as the `is_a_notebook`
function. This change improves the codebase's readability and
maintainability by making the helper function's purpose more explicit.
The related code has been updated to use the new function accordingly.
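
A minimal sketch of the renamed helper's contract, assuming a suffix-based mapping; the real detection logic may differ:

```python
from enum import Enum
from pathlib import Path


class Language(Enum):  # stand-in for the SDK's Language enum
    PYTHON = "python"
    SQL = "sql"


def infer_file_language_if_supported(path: Path) -> Language | None:
    """Infer the file's language; return None when the linter cannot lint it."""
    suffixes = {".py": Language.PYTHON, ".sql": Language.SQL}  # assumed mapping
    return suffixes.get(path.suffix.lower())


# Unsupported files are filtered out by the None return value.
assert infer_file_language_if_supported(Path("job.py")) is Language.PYTHON
assert infer_file_language_if_supported(Path("notes.txt")) is None
```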
* Scope crawled jobs in `JobsCrawler` with `include_job_ids`
([#3658](#3658)). In this
release, the `JobsCrawler` class in the `workflow_task.py` file has been
updated to include a new optional parameter `include_job_ids` in the
constructor. This parameter allows users to specify a list of job IDs to
include in the crawling process, improving efficiency in large
workspaces. Additionally, a check has been added to the `_assess_jobs`
method to skip jobs whose IDs are not in the list of included IDs.
Integration tests have been added to ensure the correct behavior of the
new feature. This change resolves issue
[#3656](#3656), which
requested the ability to crawl jobs based on a specific list of job IDs.
As a follow-up, the purpose and usage of the `include_job_ids` parameter
should be explained in a code comment and reflected in the documentation.
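
A hedged sketch of the described constructor and filter; the surrounding class members are assumptions:

```python
from dataclasses import dataclass


@dataclass
class Job:  # stand-in for the SDK's job object
    job_id: int
    name: str


class JobsCrawler:  # simplified; the real constructor takes more arguments
    def __init__(self, include_job_ids: list[int] | None = None):
        # None means "crawl every job"; a list restricts the crawl scope.
        self._include_job_ids = include_job_ids

    def _assess_jobs(self, all_jobs: list[Job]):
        for job in all_jobs:
            if self._include_job_ids is not None and job.job_id not in self._include_job_ids:
                continue  # skip jobs whose IDs were not explicitly included
            yield job  # the real implementation assesses the job here
```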
* Support fixing `LocalFile`s with `FileLinter`
([#3660](#3660)). In this
release, we have added new methods `write_text`, `safe_write_text`,
`back_up_path`, and `revert_back_up_path` to the `base.py` file to
support fixing files in `LocalFile` containers and adding unit tests and
integration tests. The `LocalFile` class in the "files.py" file has been
extended to include new methods and properties, such as `apply`,
`migrated_code`, `back_up_path`, and
`back_up_original_and_flush_migrated_code`, enabling fixing files using
linters and writing changes back to the container. The `databricks labs
ucx migrate-local-code` command has also been updated to utilize the new
functionality. These changes address issue
[#3514](#3514), ensuring the
proper handling of errors during file writing and providing automated
fixing of code issues within LocalFiles.
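
A rough sketch of the backup-then-write pattern the new helpers suggest; the signatures and error handling are assumptions:

```python
import shutil
from pathlib import Path


def back_up_path(path: Path) -> Path | None:
    """Copy the original next to itself before flushing migrated code (sketch)."""
    backup = path.with_name(path.name + ".bak")
    try:
        shutil.copyfile(path, backup)
    except OSError:
        return None  # caller treats None as "could not back up, do not write"
    return backup


def safe_write_text(path: Path, contents: str) -> int | None:
    """Write text, returning None instead of raising on I/O errors (sketch)."""
    try:
        return path.write_text(contents)
    except OSError:
        return None
```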
* Updated `migrate-local-code` to use latest linter functionality
([#3700](#3700)). In this
update, the `migrate-local-code` command has been enhanced by
incorporating the latest linter functionality. The `LocalFileMigrator`
and `LocalCodeLinter` classes have been merged, and the interfaces of
`.fix` and `.apply` methods have been aligned. A new `FixerWalker` has
been introduced to address dependencies in the dependency graph, and the
existing `databricks labs ucx migrate-local-code` command has been
updated accordingly. Relevant unit tests and integration tests have been
added and modified to ensure the correctness of the changes, which
resolve issue [#3514](#3514)
and supersede issue
[#3520](#3520). The
`lint-local-code` command has also been updated with a flag to specify
the path for linting. The `migrate-local-code` command now lints local
code, generates advice on how to make it Unity Catalog compatible, and
can apply fixes to the local code directly.
* Updated sqlglot requirement from <26.3,>=25.5.0 to >=25.5.0,<26.4
([#3572](#3572)). In this
pull request, we have updated the requirement for the `sqlglot` library
in the 'pyproject.toml' file, changing it from being greater than or
equal to version 25.5.0 and less than 26.3, to being greater than or
equal to version 25.5.0 and less than 26.4. This change is part of issue
[#3572](#3572) and was made
to allow for the use of the latest version of 'sqlglot'. The pull
request includes a changelog from the `sqlglot` repository, detailing
the changes made in each version between 25.5.0 and 26.4. The commits
relevant to this update include bumping the version of `sqlglotrs` to
various versions between 0.3.7 and 0.3.14. This pull request was
generated automatically by Dependabot, which keeps the project's
dependencies up to date.
* Updated sqlglot requirement from <26.4,>=25.5.0 to >=25.5.0,<26.7
([#3677](#3677)). In this
release, we have updated the `sqlglot` dependency from version
`>=25.5.0,<26.4` to `>=25.5.0,<26.7`. This change allows us to leverage
the latest version of `sqlglot`, which includes various bug fixes and
improvements, such as avoiding redundant casts in FROM/TO_UTC_TIMESTAMP
and enhancing UUID support. Although there are some breaking changes
introduced in the latest version, they should not affect our project's
functionality. Additionally, this update includes several bug fixes and
improvements for specific dialects such as Redshift, BigQuery, and TSQL.
Overall, this update enhances the performance and functionality of the
`sqlglot` library, ensuring compatibility with the latest version.
* Use cached property for table migration index on local checkout
context ([#3711](#3711)). In
this release, we introduce a new cached property, `_migration_index`, to
the `LocalCheckoutContext` class, designed to store the table migration
index for the local checkout context. This change prevents repeated
recrawling when the migration index is empty. The
`linter_context_factory` method has been refactored to utilize the new
`_migration_index` property, and the `CurrentSessionState` parameter is
removed. Additionally, the `local_code_linter` method has been updated
to leverage the new `LinterContext` instance with the `_migration_index`
property, instead of using the `linter_context_factory` method. The
`LocalCodeLinter` object now accepts a new callable lambda function,
returning a `LinterContext` instance with the `_migration_index`
property. These enhancements improve code performance by reducing the
migration index crawls in the local checkout context and simplify the
code by eliminating the `CurrentSessionState` parameter.
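
A minimal sketch of the caching pattern described; the collaborator and member names are assumptions:

```python
from functools import cached_property


class LocalCheckoutContext:  # simplified; other members omitted
    def __init__(self, tables_migrator):
        self._tables_migrator = tables_migrator

    @cached_property
    def _migration_index(self):
        # Computed once per context: repeated linting reuses the cached index
        # instead of recrawling, even when the index turns out to be empty.
        return self._tables_migrator.index()
```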
* [DOCS] Explain when to run `remove-workspace-local-backup-groups`
workflow ([#3707](#3707)).
In this release, the UCX component of the application has been enhanced
with new Databricks workflows for orchestrating the group migration
process. The `workflows` command displays the status of the workflows,
and the `repair-run` command allows for rerunning failed workflows. The
group migration workflow is specifically designed to be executed after a
successful assessment workflow, and it can be followed by the optional
`remove-workspace-local-backup-groups` workflow. This final
step removes unnecessary workspace-level backup groups and their
associated permissions, keeping the workspace clean and organized. The
`remove-workspace-local-backup-groups` workflow should only be executed
after confirming the successful migration of all groups involved.

Dependency updates:

* Updated sqlglot requirement from <26.3,>=25.5.0 to >=25.5.0,<26.4
([#3572](#3572)).
* Updated sqlglot requirement from <26.4,>=25.5.0 to >=25.5.0,<26.7
([#3677](#3677)).