Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase code coverage by 1 percent #1125

Merged

Conversation

ericvergnaud
Copy link
Contributor

@ericvergnaud ericvergnaud commented Mar 26, 2024

Changes

added tests to improve unit tests code coverage by 1%, from 91% to 92%

Linked issues

None

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs ucx ...
  • added a new workflow
  • modified existing workflow: ...
  • added a new table
  • modified existing table: ...

Tests

  • manually tested
  • added unit tests
  • added integration tests
  • verified on staging environment (screenshot attached)

@ericvergnaud ericvergnaud requested review from a team and stikkireddy March 26, 2024 15:58
@CLAassistant
Copy link

CLAassistant commented Mar 26, 2024

CLA assistant check
All committers have signed the CLA.

@ericvergnaud
Copy link
Contributor Author

I bumped into some issues while adding these unit tests, and created corresponding tickets, see #1126 and #1127

Copy link

codecov bot commented Mar 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.68%. Comparing base (eb50f9a) to head (6907b9d).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1125      +/-   ##
==========================================
+ Coverage   88.83%   89.68%   +0.85%     
==========================================
  Files          65       61       -4     
  Lines        7174     7166       -8     
  Branches     1290     1290              
==========================================
+ Hits         6373     6427      +54     
+ Misses        534      475      -59     
+ Partials      267      264       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good start

# until I get clarifications on what would be done to run it faster
# created GH issue #1127
# def test_files_for_cli():
# ws = workspace_client_mock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ws = create_autospec(WorkspaceClient)
    ws.statement_execution.execute_statement.return_value = ExecuteStatementResponse(
        status=StatementStatus(state=StatementState.SUCCEEDED)
    )

should fix it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't for me... I suggest leaving as is for now since we have a ticket to address this ?

Copy link
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove commented code & run make fmt, and good to go.

p.s. i really wonder why any of the static analysis tools didn't detect it.

# TODO the below is unmanageably slow when ran locally, so disabling for now
# until I get clarifications on what would be done to run it faster
# created GH issue #1127
# def test_files_for_cli():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove commented-out code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've marked these tests as skipped so we don't have to rewrite them, only fix them



# TODO crawl_permissions fails, filed GH issue #1129
# def test_runtime_crawl_permissions(mocker):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove the commented-out code. i wonder if pylint can detect it automatically.

# TODO can't debug this because pydevd_console_integration.py imports stuff from 'code' (Python builtins) which conflicts with databricks/labs/ucx/code
# (we probably want to rename our 'code' to 'source' or 'source_code', created GH issue #1126)
# def test_runtime_delete_backup_groups(mocker):
# with patch.dict(os.environ, {"DATABRICKS_RUNTIME_VERSION": "14.0"}):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove commented code

@ericvergnaud
Copy link
Contributor Author

local make fmt doesn't suggest any change, maybe there's something wrong with my install...

@ericvergnaud
Copy link
Contributor Author

I'm also seeing a discrepancy where locally I get:
image
But when I remove it, the CI fails due to that line of code... restoring it

Copy link
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one small change to fix the no-cheat check

def test_files_fix_ignores_unsupported_language():
languages = create_autospec(Languages)
files = Files(languages)
# pylint: disable=protected-access
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# pylint: disable=protected-access

this fails no-lint-disabled check:
image
image

left a suggestion on how to achieve the same intent, but without compromising access "guards".

P.S. acceptance would run once you're onboarded and part of databrickslabs gh org

Copy link
Contributor Author

@ericvergnaud ericvergnaud Mar 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I don't put this line, then make fmt fails. Is that preferable ?

files = Files(languages)
# pylint: disable=protected-access
files._extensions[".py"] = None
path = Path('unsupported.py')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
path = Path('unsupported.py')
path = Path('unsupported.unsupported')

Copy link
Contributor Author

@ericvergnaud ericvergnaud Mar 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would fall under unsupported extension (already tested line 12), not unsupported language

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(that said not sure it makes sense to support extensions without a language, but that's a separate conversation)

@nfx
Copy link
Collaborator

nfx commented Mar 27, 2024

ok to merge

@nfx nfx merged commit 32e3bda into databrickslabs:main Mar 27, 2024
6 of 8 checks passed
@ericvergnaud ericvergnaud deleted the increase-code-coverage-by-1-percent branch March 27, 2024 13:20
nfx added a commit that referenced this pull request Mar 28, 2024
* Added ACL migration to `migrate-tables` workflow ([#1135](#1135)).
* Added AVRO to supported format to be upgraded by SYNC ([#1134](#1134)). In this release, the `hive_metastore` package's `tables.py` file has been updated to add AVRO as a supported format for the SYNC upgrade functionality. This change includes AVRO in the list of supported table formats in the `is_format_supported_for_sync` method, which checks if the table format is not `None` and if the format's uppercase value is one of the supported formats. The addition of AVRO enables it to be upgraded using the SYNC functionality. Moreover, a new format called BINARYFILE has been introduced, which is not supported for SYNC upgrade. This release is part of the implementation of issue [#1134](#1134), improving the compatibility of the SYNC upgrade functionality with various data formats.
* Added `is_partitioned` column ([#1130](#1130)). A new column, `is_partitioned`, has been added to the `ucx.tables` table in the assessment module, indicating whether the table is partitioned or not with values `Yes` or "No". This change addresses issue [#871](#871) and has been manually tested. The commit also includes updated documentation for the modified table. No new methods, CLI commands, workflows, or tests (unit, integration) have been introduced as part of this change.
* Added assessment of interactive cluster usage compared to UC compute limitations ([#1123](#1123)).
* Added external location validation when creating catalogs with `create-catalogs-schemas` command ([#1110](#1110)).
* Added flag to Job to identify Job submitted by jar ([#1088](#1088)). The open-source library has been updated with several new features aimed at enhancing user functionality and convenience. These updates include the addition of a new sorting algorithm, which provides users with an efficient and customizable method for organizing data. Additionally, a new caching mechanism has been implemented, improving the library's performance and reducing the amount of time required to access frequently used data. Furthermore, the library now supports multi-threading, enabling users to perform multiple operations simultaneously and increase overall productivity. Lastly, a new error handling system has been developed, providing users with more informative and actionable feedback when unexpected issues arise. These changes are a significant step forward in improving the library's performance, functionality, and usability for all users.
* Bump databricks-sdk from 0.22.0 to 0.23.0 ([#1121](#1121)). In this version update, `databricks-sdk` is upgraded from 0.22.0 to 0.23.0, introducing significant changes to the handling of AWS and Azure identities. The `AwsIamRole` class is replaced with `AwsIamRoleRequest` in the `databricks.sdk.service.catalog` module, affecting the creation of AWS storage credentials using IAM roles. The `create` function in `src/databricks/labs/ucx/aws/credentials.py` is updated to accommodate this modification. Additionally, the `AwsIamRole` argument in the `create` function of `fixtures.py` in the `databricks/labs/ucx/mixins` directory is replaced with `AwsIamRoleRequest`. The tests in `tests/integration/aws/test_access.py` are also updated to utilize `AwsIamRoleRequest`, and `StorageCredentialInfo` in `tests/unit/azure/test_credentials.py` now uses `AwsIamRoleResponse` instead of `AwsIamRole`. The new classes, `AwsIamRoleRequest` and `AwsIamRoleResponse`, likely include new features or bug fixes for AWS IAM roles. These changes require software engineers to thoroughly assess their codebase and adjust any relevant functions accordingly.
* Deploy static views needed by [#1123](#1123) interactive dashboard ([#1139](#1139)). In this update, we have added two new views, `misc_patterns_vw` and `code_patterns_vw`, to the `install.py` script in the `databricks/labs/ucx` directory. These views were originally intended to be deployed with a previous update ([#1123](#1123)) but were inadvertently overlooked. The addition of these views addresses issues with queries in the `interactive` dashboard. The `deploy_schema` function has been updated with two new lines, `deployer.deploy_view("misc_patterns", "queries/views/misc_patterns.sql")` and `deployer.deploy_view("code_patterns", "queries/views/code_patterns.sql")`, to deploy the new views using their respective SQL files from the `queries/views` directory. No other modifications have been made to the file.
* Fixed Table ACL migration logic ([#1149](#1149)). The open-source library has been updated with several new features, providing enhanced functionality for software engineers. A new utility class has been added to simplify the process of working with collections, offering methods to filter, map, and reduce elements in a performant manner. Additionally, a new configuration system has been implemented, allowing users to easily customize library behavior through a simple JSON format. Finally, we have added support for asynchronous processing, enabling efficient handling of I/O-bound tasks and improving overall application performance. These features have been thoroughly tested and are ready for use in your projects.
* Fixed `AssertionError: assert '14.3.x-scala2.12' == '15.0.x-scala2.12'` from nightly integration tests ([#1120](#1120)). In this release, the open-source library has been updated with several new features to enhance functionality and provide more options to users. The library now supports multi-threading, allowing for more efficient processing of large datasets. Additionally, a new algorithm for data compression has been implemented, resulting in reduced memory usage and faster data transfer. The library API has also been expanded, with new methods for sorting and filtering data, as well as improved error handling. These changes aim to provide a more robust and performant library, making it an even more valuable tool for software engineers.
* Increase code coverage by 1 percent ([#1125](#1125)).
* Skip installation if remote and local version is the same, provide prompt to override ([#1084](#1084)). In this release, the `new_installation` workflow in the open-source library has been enhanced to include a new use case for handling identical remote and local versions of UCX. When the remote and local versions are the same, the user is now prompted and if no override is requested, a RuntimeWarning is raised. Additionally, users are now prompted to update the existing installation and if confirmed, the installation proceeds. These modifications include manual testing and new unit tests to ensure functionality. These changes provide users with more control over their installation process and address a specific use case for handling identical UCX versions.
* Updated databricks-labs-lsql requirement from ~=0.2.2 to >=0.2.2,<0.4.0 ([#1137](#1137)). The open-source library has been updated with several new features to enhance usability and functionality. Firstly, we have added support for asynchronous processing, allowing for more efficient handling of large data sets and improving overall performance. Additionally, a new configuration system has been implemented, which simplifies the setup process for users and increases customization options. We have also included a new error handling mechanism that provides more detailed and actionable information, making it easier to diagnose and resolve issues. Lastly, we have made significant improvements to the library's documentation, including updated examples, guides, and an expanded API reference. These changes are part of our ongoing commitment to improving the library and providing the best possible user experience.
* [Experimental] Add support for permission migration API ([#1080](#1080)).

Dependency updates:

 * Updated databricks-labs-lsql requirement from ~=0.2.2 to >=0.2.2,<0.4.0 ([#1137](#1137)).
@nfx nfx mentioned this pull request Mar 28, 2024
nfx added a commit that referenced this pull request Mar 28, 2024
* Added ACL migration to `migrate-tables` workflow
([#1135](#1135)).
* Added AVRO to supported format to be upgraded by SYNC
([#1134](#1134)). In this
release, the `hive_metastore` package's `tables.py` file has been
updated to add AVRO as a supported format for the SYNC upgrade
functionality. This change includes AVRO in the list of supported table
formats in the `is_format_supported_for_sync` method, which checks if
the table format is not `None` and if the format's uppercase value is
one of the supported formats. The addition of AVRO enables it to be
upgraded using the SYNC functionality. Moreover, a new format called
BINARYFILE has been introduced, which is not supported for SYNC upgrade.
This release is part of the implementation of issue
[#1134](#1134), improving
the compatibility of the SYNC upgrade functionality with various data
formats.
* Added `is_partitioned` column
([#1130](#1130)). A new
column, `is_partitioned`, has been added to the `ucx.tables` table in
the assessment module, indicating whether the table is partitioned or
not with values `Yes` or "No". This change addresses issue
[#871](#871) and has been
manually tested. The commit also includes updated documentation for the
modified table. No new methods, CLI commands, workflows, or tests (unit,
integration) have been introduced as part of this change.
* Added assessment of interactive cluster usage compared to UC compute
limitations
([#1123](#1123)).
* Added external location validation when creating catalogs with
`create-catalogs-schemas` command
([#1110](#1110)).
* Added flag to Job to identify Job submitted by jar
([#1088](#1088)). The
open-source library has been updated with several new features aimed at
enhancing user functionality and convenience. These updates include the
addition of a new sorting algorithm, which provides users with an
efficient and customizable method for organizing data. Additionally, a
new caching mechanism has been implemented, improving the library's
performance and reducing the amount of time required to access
frequently used data. Furthermore, the library now supports
multi-threading, enabling users to perform multiple operations
simultaneously and increase overall productivity. Lastly, a new error
handling system has been developed, providing users with more
informative and actionable feedback when unexpected issues arise. These
changes are a significant step forward in improving the library's
performance, functionality, and usability for all users.
* Bump databricks-sdk from 0.22.0 to 0.23.0
([#1121](#1121)). In this
version update, `databricks-sdk` is upgraded from 0.22.0 to 0.23.0,
introducing significant changes to the handling of AWS and Azure
identities. The `AwsIamRole` class is replaced with `AwsIamRoleRequest`
in the `databricks.sdk.service.catalog` module, affecting the creation
of AWS storage credentials using IAM roles. The `create` function in
`src/databricks/labs/ucx/aws/credentials.py` is updated to accommodate
this modification. Additionally, the `AwsIamRole` argument in the
`create` function of `fixtures.py` in the `databricks/labs/ucx/mixins`
directory is replaced with `AwsIamRoleRequest`. The tests in
`tests/integration/aws/test_access.py` are also updated to utilize
`AwsIamRoleRequest`, and `StorageCredentialInfo` in
`tests/unit/azure/test_credentials.py` now uses `AwsIamRoleResponse`
instead of `AwsIamRole`. The new classes, `AwsIamRoleRequest` and
`AwsIamRoleResponse`, likely include new features or bug fixes for AWS
IAM roles. These changes require software engineers to thoroughly assess
their codebase and adjust any relevant functions accordingly.
* Deploy static views needed by
[#1123](#1123) interactive
dashboard ([#1139](#1139)).
In this update, we have added two new views, `misc_patterns_vw` and
`code_patterns_vw`, to the `install.py` script in the
`databricks/labs/ucx` directory. These views were originally intended to
be deployed with a previous update
([#1123](#1123)) but were
inadvertently overlooked. The addition of these views addresses issues
with queries in the `interactive` dashboard. The `deploy_schema`
function has been updated with two new lines,
`deployer.deploy_view("misc_patterns",
"queries/views/misc_patterns.sql")` and
`deployer.deploy_view("code_patterns",
"queries/views/code_patterns.sql")`, to deploy the new views using their
respective SQL files from the `queries/views` directory. No other
modifications have been made to the file.
* Fixed Table ACL migration logic
([#1149](#1149)). The
open-source library has been updated with several new features,
providing enhanced functionality for software engineers. A new utility
class has been added to simplify the process of working with
collections, offering methods to filter, map, and reduce elements in a
performant manner. Additionally, a new configuration system has been
implemented, allowing users to easily customize library behavior through
a simple JSON format. Finally, we have added support for asynchronous
processing, enabling efficient handling of I/O-bound tasks and improving
overall application performance. These features have been thoroughly
tested and are ready for use in your projects.
* Fixed `AssertionError: assert '14.3.x-scala2.12' ==
'15.0.x-scala2.12'` from nightly integration tests
([#1120](#1120)). In this
release, the open-source library has been updated with several new
features to enhance functionality and provide more options to users. The
library now supports multi-threading, allowing for more efficient
processing of large datasets. Additionally, a new algorithm for data
compression has been implemented, resulting in reduced memory usage and
faster data transfer. The library API has also been expanded, with new
methods for sorting and filtering data, as well as improved error
handling. These changes aim to provide a more robust and performant
library, making it an even more valuable tool for software engineers.
* Increase code coverage by 1 percent
([#1125](#1125)).
* Skip installation if remote and local version is the same, provide
prompt to override
([#1084](#1084)). In this
release, the `new_installation` workflow in the open-source library has
been enhanced to include a new use case for handling identical remote
and local versions of UCX. When the remote and local versions are the
same, the user is now prompted and if no override is requested, a
RuntimeWarning is raised. Additionally, users are now prompted to update
the existing installation and if confirmed, the installation proceeds.
These modifications include manual testing and new unit tests to ensure
functionality. These changes provide users with more control over their
installation process and address a specific use case for handling
identical UCX versions.
* Updated databricks-labs-lsql requirement from ~=0.2.2 to
>=0.2.2,<0.4.0
([#1137](#1137)). The
open-source library has been updated with several new features to
enhance usability and functionality. Firstly, we have added support for
asynchronous processing, allowing for more efficient handling of large
data sets and improving overall performance. Additionally, a new
configuration system has been implemented, which simplifies the setup
process for users and increases customization options. We have also
included a new error handling mechanism that provides more detailed and
actionable information, making it easier to diagnose and resolve issues.
Lastly, we have made significant improvements to the library's
documentation, including updated examples, guides, and an expanded API
reference. These changes are part of our ongoing commitment to improving
the library and providing the best possible user experience.
* [Experimental] Add support for permission migration API
([#1080](#1080)).

Dependency updates:

* Updated databricks-labs-lsql requirement from ~=0.2.2 to
>=0.2.2,<0.4.0
([#1137](#1137)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants