Release v0.20.0 #1154

nfx · 2024-03-28T13:33:46Z

Added ACL migration to migrate-tables workflow (#1135).
Added AVRO to supported format to be upgraded by SYNC (#1134). The hive_metastore package's tables.py file now lists AVRO among the table formats supported for the SYNC upgrade. The is_format_supported_for_sync method checks that the table format is not None and that its uppercase value is one of the supported formats, so AVRO tables can now be upgraded with SYNC. A new format, BINARYFILE, has also been introduced; it is not supported for SYNC upgrade.
Added is_partitioned column (#1130). A new column, is_partitioned, has been added to the ucx.tables table in the assessment module; it indicates whether the table is partitioned, with values "Yes" or "No". This change addresses issue #871 and has been manually tested. The documentation for the modified table has been updated. No new methods, CLI commands, workflows, or unit or integration tests were introduced.
Added assessment of interactive cluster usage compared to UC compute limitations (#1123).
Added external location validation when creating catalogs with create-catalogs-schemas command (#1110).
Added flag to Job to identify Job submitted by jar (#1088).
Bump databricks-sdk from 0.22.0 to 0.23.0 (#1121). databricks-sdk 0.23.0 replaces the AwsIamRole class in the databricks.sdk.service.catalog module with AwsIamRoleRequest, which affects the creation of AWS storage credentials from IAM roles. The create function in src/databricks/labs/ucx/aws/credentials.py and the create function of fixtures.py in the databricks/labs/ucx/mixins directory now take AwsIamRoleRequest instead of AwsIamRole. The tests in tests/integration/aws/test_access.py are updated to use AwsIamRoleRequest, and StorageCredentialInfo in tests/unit/azure/test_credentials.py now uses AwsIamRoleResponse instead of AwsIamRole. Code that constructs AwsIamRole directly needs the same adjustment when moving to this SDK version.
Deploy static views needed by #1123 interactive dashboard (#1139). In this update, we have added two new views, misc_patterns_vw and code_patterns_vw, to the install.py script in the databricks/labs/ucx directory. These views were originally intended to be deployed with a previous update (#1123) but were inadvertently overlooked. The addition of these views addresses issues with queries in the interactive dashboard. The deploy_schema function has been updated with two new lines, deployer.deploy_view("misc_patterns", "queries/views/misc_patterns.sql") and deployer.deploy_view("code_patterns", "queries/views/code_patterns.sql"), to deploy the new views using their respective SQL files from the queries/views directory. No other modifications have been made to the file.
Fixed Table ACL migration logic (#1149).
Fixed AssertionError: assert '14.3.x-scala2.12' == '15.0.x-scala2.12' from nightly integration tests (#1120).
Increase code coverage by 1 percent (#1125).
Skip installation if remote and local version is the same, provide prompt to override (#1084). The new_installation workflow now handles the case where the remote and local versions of UCX are identical: the user is prompted, and if no override is requested, a RuntimeWarning is raised. Users are also prompted to update an existing installation, and if they confirm, the installation proceeds. The change was tested manually and is covered by new unit tests.
Updated databricks-labs-lsql requirement from ~=0.2.2 to >=0.2.2,<0.4.0 (#1137).
[Experimental] Add support for permission migration API (#1080).
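
The format check described in the AVRO entry above (#1134) can be sketched roughly as follows. This is a minimal illustration and not the code in tables.py: the class name TableSketch and the constant SYNC_SUPPORTED_FORMATS are hypothetical, and apart from AVRO (supported) and BINARYFILE (not supported) the members of the set are assumptions made so the example runs.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of formats. Only AVRO's membership and BINARYFILE's
# absence come from the release notes; the other entries are assumptions.
SYNC_SUPPORTED_FORMATS = frozenset(
    {"DELTA", "PARQUET", "CSV", "JSON", "ORC", "TEXT", "AVRO"}
)


@dataclass
class TableSketch:
    """Hypothetical stand-in for a table record; not the real UCX class."""

    name: str
    table_format: Optional[str] = None

    def is_format_supported_for_sync(self) -> bool:
        # A missing format can never be upgraded by SYNC.
        if self.table_format is None:
            return False
        # Compare case-insensitively by upper-casing the stored format.
        return self.table_format.upper() in SYNC_SUPPORTED_FORMATS
```

With this sketch, a table whose format is stored as "avro" passes the check, while a BINARYFILE table or a table with no recorded format does not.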

Dependency updates:

Updated databricks-labs-lsql requirement from ~=0.2.2 to >=0.2.2,<0.4.0 (#1137).
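
To make the effect of the specifier change concrete, here is a small standard-library sketch of which versions each requirement accepts. It relies on the PEP 440 reading of ~=0.2.2 as ">=0.2.2, ==0.2.*". The helper names are hypothetical, and the parser only handles plain numeric versions such as 0.3.1 (no pre-releases or local tags); real tooling should use a proper version library instead.

```python
def parse(version):
    """Turn '0.3.1' into (0, 3, 1), padding short versions with zeros."""
    parts = tuple(int(p) for p in version.split("."))
    return parts + (0,) * (3 - len(parts))


def old_spec_allows(version):
    """Approximates ~=0.2.2: at least 0.2.2 and still within the 0.2 series."""
    v = parse(version)
    return v >= (0, 2, 2) and v[:2] == (0, 2)


def new_spec_allows(version):
    """Approximates >=0.2.2,<0.4.0."""
    v = parse(version)
    return (0, 2, 2) <= v < (0, 4, 0)


# Hypothetical version numbers, used only to exercise the two ranges.
for candidate in ("0.2.1", "0.2.5", "0.3.0", "0.4.0"):
    print(candidate, old_spec_allows(candidate), new_spec_allows(candidate))
```

The practical difference is that the new range also admits the 0.3 series, which the old compatible-release specifier excluded.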


pritishpai

LGTM!

nfx requested review from a team and priyal-c March 28, 2024 13:33

pritishpai approved these changes Mar 28, 2024

nfx merged commit f445a3d into main Mar 28, 2024
5 checks passed

nfx deleted the prepare/0.20.0 branch March 28, 2024 13:35
