Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: unskip command does not work with schemas/databases #3494

Closed
1 task done
maruppel opened this issue Jan 7, 2025 · 1 comment · Fixed by #3567
Closed
1 task done

[BUG]: unskip command does not work with schemas/databases #3494

maruppel opened this issue Jan 7, 2025 · 1 comment · Fixed by #3567

Comments

@maruppel
Copy link

maruppel commented Jan 7, 2025

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When attempting to run unskip on a schema, the following error is shown:

"[REDACTED]databricks\labs\ucx\lib\src\databricks\labs\ucx\hive_metastore\mapping.py", line 192, in unskip_schema
    self._sql_backend.execute(
    ~~~~~~~~~~~~~~~~~~~~~~~~~^
        f"ALTER SCHEMA hive_metastore.{escape_sql_identifier(schema)} UNSET DBPROPERTIES IF EXISTS('{self.UCX_SKIP_PROPERTY}');"
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "[REDACTED].databricks\labs\ucx\state\venv\Lib\site-packages\databricks\sdk\retries.py", line 54, in wrapper
    raise err
  File "[REDACTED].databricks\labs\ucx\state\venv\Lib\site-packages\databricks\sdk\retries.py", line 33, in wrapper
    return func(*args, **kwargs)
  File "[REDACTED].databricks\labs\ucx\state\venv\Lib\site-packages\databricks\labs\lsql\backends.py", line 220, in execute
    self._sql.execute(sql, catalog=catalog, schema=schema)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[REDACTED].databricks\labs\ucx\state\venv\Lib\site-packages\databricks\labs\lsql\core.py", line 268, in execute
    self._raise_if_needed(status)
    ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "[REDACTED].databricks\labs\ucx\state\venv\Lib\site-packages\databricks\labs\lsql\core.py", line 505, in _raise_if_needed
    raise error_class(error_message)
databricks.sdk.errors.platform.BadRequest:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'UNSET'. SQLSTATE: 42601 (line 1, pos 38)
 
== SQL ==
ALTER SCHEMA hive_metastore.`[REDACTED]` UNSET DBPROPERTIES IF EXISTS('databricks.labs.ucx.skip');`

Expected Behavior

The unskip command removes the databricks.labs.ucx.skip property from the schema.

Steps To Reproduce

No response

Cloud

Azure

Operating System

Windows

Version

latest via Databricks CLI

Relevant log output

`"[REDACTED]databricks\labs\ucx\lib\src\databricks\labs\ucx\hive_metastore\mapping.py", line 192, in unskip_schema
    self._sql_backend.execute(
    ~~~~~~~~~~~~~~~~~~~~~~~~~^
        f"ALTER SCHEMA hive_metastore.{escape_sql_identifier(schema)} UNSET DBPROPERTIES IF EXISTS('{self.UCX_SKIP_PROPERTY}');"
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "[REDACTED].databricks\labs\ucx\state\venv\Lib\site-packages\databricks\sdk\retries.py", line 54, in wrapper
    raise err
  File "[REDACTED].databricks\labs\ucx\state\venv\Lib\site-packages\databricks\sdk\retries.py", line 33, in wrapper
    return func(*args, **kwargs)
  File "[REDACTED].databricks\labs\ucx\state\venv\Lib\site-packages\databricks\labs\lsql\backends.py", line 220, in execute
    self._sql.execute(sql, catalog=catalog, schema=schema)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[REDACTED].databricks\labs\ucx\state\venv\Lib\site-packages\databricks\labs\lsql\core.py", line 268, in execute
    self._raise_if_needed(status)
    ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "[REDACTED].databricks\labs\ucx\state\venv\Lib\site-packages\databricks\labs\lsql\core.py", line 505, in _raise_if_needed
    raise error_class(error_message)
databricks.sdk.errors.platform.BadRequest:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'UNSET'. SQLSTATE: 42601 (line 1, pos 38)
 
== SQL ==
ALTER SCHEMA hive_metastore.`[REDACTED]` UNSET DBPROPERTIES IF EXISTS('databricks.labs.ucx.skip');`
@JoelMorton
Copy link

JoelMorton commented Jan 8, 2025

Adding related info if helpful. migrate-tables only checks for the existence of property databricks.labs.ucx.skip. Even if this property is set to false, a schema will be skipped in the migrate-tables job.

migrate-tables job skips "uc_test_2" schema:
INFO [d.l.u.hive_metastore.mapping][checking_databases_for_skip_property_4] Database uc_test_2 is marked to be skipped

That schema's property is set to false:
describe schema extended uc_test_2
((databricks.labs.ucx.skip,false))

@gueniai gueniai removed the bug label Jan 23, 2025
@github-project-automation github-project-automation bot moved this to Todo in UCX Jan 23, 2025
github-merge-queue bot pushed a commit that referenced this issue Jan 27, 2025
@github-project-automation github-project-automation bot moved this from Todo to Done in UCX Jan 27, 2025
gueniai added a commit that referenced this issue Feb 12, 2025
* Added documentation to use Delta Live Tables migration ([#3587](#3587)). In this release, we have extended the migration process for Delta Live Tables (DLT) to Unity Catalog (UC) in Databricks. After table migration, users can now perform a Delta Live Table pipeline migration process, which clones Hive Metastore DLT pipelines to the Unity Catalog. The cloned pipeline will copy over all the data and checkpoints during the first update and run normally thereafter. While migration is in progress, maintenance is automatically paused for both pipelines. Users should consider several factors, such as ensuring names are fully qualified and not editing pipeline-defining notebooks during cloning. Additionally, we have added documentation for using Delta Live Tables migration, including a detailed description, usage, and a link to issue [#2065](#2065). The new pipeline migration commands, such as `migrate-dlt-pipelines`, allow developers and administrators to migrate specific Delta Live Tables pipelines, specifying a list of pipeline IDs to include or exclude, providing greater flexibility during the migration process.
* Added the CLI command for migrating DLT pipelines ([#3579](#3579)). A new CLI command, "migrate-pipelines", has been added to facilitate the migration of DLT pipelines from HMS to UC using the DLT Migration API. This command includes optional flags `include-pipeline-ids` and `exclude-pipeline-ids` to specify which pipeline IDs to include or exclude from the migration. The associated functionality is implemented in the newly introduced `PipelinesMigrator` class, which takes in `include_pipeline_ids` and `exclude_pipeline_ids` as arguments. The changes also include an update to the `table_ownership_grant_loader` method, although it is unrelated to the new CLI command. While the new command has been manually tested, unit tests, integration tests, and verification on the staging environment are yet to be completed.
* Addressed Bug with Dashboard migration ([#3663](#3663)). The recent update addresses a bug in the migration of Dashboards by modifying the behavior of the WorkspaceClient object in the "tests/unit/assessment/test_dashboards.py" file. The bug, which affected the RedashDashboardCrawler's ability to snapshot and manage dashboard data, was caused by the lack of an implementation for the `dashboards.get` method in the mocked `dashboards.list` method. This issue has been resolved by adding a mocked implementation of the `dashboards.get` method, which returns a dashboard object with the specified ID. Additionally, the `_crawl` method now iterates through each dashboard and filters out those with no ID, and the `_list_dashboards` method checks if the dashboard ID is None, fetches the dashboard details if it is not, and appends them to the dashboards list. This ensures that the migration process proceeds smoothly and that only dashboards with valid IDs and details are included in the final list.
* Extend code migration progress documentation ([#3588](#3588)). In this documentation update, we have added two new sections, `Code Migration` and "Final Details," to provide detailed instructions for the code migration process. The `Code Migration` section outlines the recommended approach for migrating code, including using the migration-progress dashboard, migrate- commands, and setting the default catalog to Unity Catalog. Users are advised to use the linter to identify compatibility issues before migrating the code. The `Final Details` section includes instructions for running the cluster-remap command to remap clusters to be UC compatible. Additionally, we have updated the link for code migration to the new section and renumbered the steps in the migration process. The documentation for UCX commands has been updated to provide more detailed information about code migration, including starting the migration process with linting, using dashboards, and employing local commands. A new section for finalizing the migration has been added, and the commands `lint-local-code`, `migrate-local-code`, `migrate-dbsql-dashboards`, and `revert-dbsql-dashboards` have been added to facilitate the code migration process. This update resolves issue [#2231](#2231) and includes relevant user documentation.
* Fixed Skip/Unskip schema functionality ([#3567](#3567)). In this release, we have addressed issue [#3494](#3494) related to the Skip/Unskip schema functionality in the Hive Metastore. The commit with ID [#3567](#3567) modifies the `mapping.py` and `test_mapping.py` files to ensure that schemas are skipped and unskipped correctly during the migration process. In the `skip_schema` method, the schema is now explicitly set to be skipped by applying the `UCX_SKIP_PROPERTY` table property to the `hive_metastore` schema. The `unskip_schema` method removes the `UCX_SKIP_PROPERTY` table property from the `hive_metastore` schema if it exists. The `_get_database_in_scope_task` and `_get_table_in_scope_task` methods have also been updated to correctly check the `UCX_SKIP_PROPERTY` value when determining whether to include a database or table in the migration process. Additionally, tests for the CLI have been updated to ensure that the `ALTER SCHEMA` command works correctly with fully qualified schema names. The `test_skip_with_schema` function now includes the catalog and schema names for the `ALTER SCHEMA` command, while the `test_unskip_with_schema` function uses `SET DBPROPERTIES` instead of `UNSET DBPROPERTIES IF EXISTS`. These changes improve the reliability and predictability of the Skip/Unskip schema functionality in the Hive Metastore.
* Make `MaybeTree` the main Python AST entrypoint for constructing the syntax tree ([#3550](#3550)). In this commit, the `MaybeTree` class has become the main entrypoint for constructing the Python AST syntax tree, replacing the previous usage of the `Tree` class. This change enforces normalization of source code before parsing and resolves issues [#3457](#3457) and [#3213](#3213). Relevant class methods have been moved from `Tree` to `MaybeTree`, which now includes a `from_source_code` method for normalizing and parsing source code. The `walk` and `first_statement` methods have been removed from `MaybeTree` as they were repetitions from `Tree`'s methods. Python linting related code has been modified, and unit tests have been added to ensure proper functionality of the new `MaybeTree` class and its methods.
* Make fixer diagnostic codes unique ([#3582](#3582)). In this release, fixer diagnostic codes have been made unique for the `databricks labs ucx migrate-local-code` command to ensure the correct fixer is used for code migration and fixing. This change impacts the command by adding new diagnostic codes for migrated tables, specifically for Python, SQL, and SQL-related migrations, and updating the existing command to use these new codes. Additionally, unit tests and integration tests have been added and modified to cover the new functionality, and manual testing has been performed to ensure correct behavior.
* Removed the linting false positive for missing table format warning when using `spark.table` ([#3589](#3589)). This change resolves a linting false positive issue related to missing table format warnings when using the `spark.table` command, specifically addressing a backwards incompatible change in DBR version 8.0 where table-creation with implicit format is no longer supported. The linter has been updated to include more precise match for method names and unit tests have been modified accordingly. This change ensures that the linter correctly identifies the absence of the table format warning in specific scenarios involving `spark.table`, without affecting the core functionalities of the system.
* Removed tree from `PythonSequentialLinter` ([#3535](#3535)). In this release, the tree manipulation functionality has been removed from the `PythonSequentialLinter` and added to the `NotebookLinter`. The `PythonSequentialLinter` now focuses solely on sequential linting, while the `NotebookLinter` handles tree manipulation and management for notebooks. This includes early failure on code resolution failure, attachment of `%run` notebook trees as child trees to the calling cell, and modifications to the existing `databricks labs ucx lint-local-code` command. Additionally, new type hints, a `dataclass` named `Advice`, and new and modified unit tests have been included. These changes improve the separation of concerns between linters and tree manipulation, making the codebase easier to understand and maintain.
* Updated sqlglot requirement from <26.3,>=25.5.0 to >=25.5.0,<26.4 ([#3572](#3572)). In this release, we have updated the required version range of the sqlglot library in the pyproject.toml file. The new version range (greater than or equal to 25.5.0 and less than 26.4) allows for the installation of the latest version of sqlglot while avoiding potential compatibility issues that may arise with versions 26.4 and above. This update includes several bug fixes and improvements, such as changes to how single VALUES clauses in CTEs are expanded and how LEVEL columns in CONNECT BY queries are treated. The API documentation and CHANGELOG.md file for various versions of sqlglot have also been updated as part of this pull request. We recommend all users to update to this version to take advantage of the latest features and improvements.

Dependency updates:

 * Updated sqlglot requirement from <26.3,>=25.5.0 to >=25.5.0,<26.4 ([#3572](#3572)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants