Fixed databricks labs ucx repair-run command to execute correctly #801

Merged (11 commits) on Jan 19, 2024

Conversation

prajin-29
Contributor

Changes

Fixes the repair-run CLI command, databricks labs ucx repair-run. When the CLI tried to repair a job run before the run's response JSON had been updated to either FAILED or SUCCESS, it failed with a NoneType exception.

Added a check in repair_run inside install.py that inspects the status of the response and waits for 20 seconds for it to be updated.

Enhanced the code so that an already repaired job can be repaired again.
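
The failure mode described above can be illustrated with a minimal sketch. It assumes hypothetical stand-in types (`ResultState`, `RunState`) rather than the Databricks SDK classes, and only shows why reading `.value` from a result state that has not been set yet fails, and how a guard avoids it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ResultState(Enum):
    # Hypothetical stand-in for the result state enum of a job run.
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"


@dataclass
class RunState:
    # Assumption: result_state stays None until the run has finished.
    result_state: Optional[ResultState] = None


def read_result_state(state: RunState) -> str:
    # Unguarded access: raises AttributeError while result_state is None.
    return state.result_state.value


def read_result_state_guarded(state: RunState) -> Optional[str]:
    # Guarded access: report None while the run has no result yet.
    if state.result_state is None:
        return None
    return state.result_state.value
```

The unguarded variant raises `AttributeError` on a `NoneType` object for a run that is still in progress, which matches the symptom reported in this PR.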

Linked issues

closes #787

Resolves #787

Functionality

  • Modified the CLI command databricks labs ucx repair-run, which was failing in regression testing

Tests

  • This has been manually tested
  • Added unit test test_repair_run_result_state in test_install.py
  • Tested with an integration test
  • Screenshot 2024-01-17 at 11 08 42 AM

codecov bot commented Jan 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (af80620) 84.07% compared to head (add2404) 84.13%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #801      +/-   ##
==========================================
+ Coverage   84.07%   84.13%   +0.05%     
==========================================
  Files          39       39              
  Lines        4872     4890      +18     
  Branches      913      916       +3     
==========================================
+ Hits         4096     4114      +18     
  Misses        564      564              
  Partials      212      212              

Collaborator

@nfx nfx left a comment

Rewrite this to use the retried decorator.


        while not state.result_state and (time.time() - start_time < timeout):
            logger.info("Waiting for the result_state to update the state")
            time.sleep(10)
Collaborator

This is not unit testable; see how we use the retried() decorator in the workspace access package (DBSQL permissions, secret ACLs, etc.).
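
To illustrate why a retry decorator is easier to unit test than an inline `while` loop with `time.sleep`, here is a minimal sketch. `retried_sketch` is a hypothetical stand-in written for this example and is not the decorator used in the project; its parameters (`timeout_seconds`, `interval_seconds`, `sleep`, `clock`) are assumptions chosen so that a test can replace sleeping with a no-op.

```python
import functools
import time
from typing import Callable, Sequence, Type


def retried_sketch(
    *,
    on: Sequence[Type[BaseException]],
    timeout_seconds: float = 20.0,
    interval_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
):
    """Hypothetical retry decorator: retry on the listed exceptions until timeout."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            deadline = clock() + timeout_seconds
            while True:
                try:
                    return func(*args, **kwargs)
                except tuple(on) as err:
                    # Give up once the deadline has passed, keeping the last error as cause.
                    if clock() >= deadline:
                        raise TimeoutError(f"timed out after {timeout_seconds}s") from err
                    sleep(interval_seconds)

        return wrapper

    return decorator


# Usage: the polled function raises until a result becomes available.
attempts = []


@retried_sketch(on=[AttributeError], timeout_seconds=5, sleep=lambda _: None)
def poll_result_state():
    attempts.append(1)
    if len(attempts) < 3:
        raise AttributeError("no result state in job run")
    return "SUCCESS"
```

Because the sleep function is injected, a test can exercise the retry path instantly, which is not possible when `time.sleep(10)` is hard-coded inside the loop.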

Contributor Author

@nfx Updated the code with retried logic.

@@ -893,13 +895,28 @@ def repair_run(self, workflow):
            return
        latest_job_run = job_runs[0]
        state = latest_job_run.state

        while not state.result_state and (time.time() - start_time < timeout):
Collaborator

Refactor this into a private method and decorate it with retried.

Contributor Author

@nfx Refactored it with retried logic.

Comment on lines 890 to 897
    def _get_result_state(self, job_id):
        job_runs = list(self._ws.jobs.list_runs(job_id=job_id, limit=1))
        latest_job_run = job_runs[0]
        if not latest_job_run.state.result_state:
            logger.info("Waiting for the result_state to update the state")
            time.sleep(10)
        job_state = latest_job_run.state.result_state.value
        return job_state
Collaborator

@nfx nfx Jan 18, 2024

Suggested change
-    def _get_result_state(self, job_id):
-        job_runs = list(self._ws.jobs.list_runs(job_id=job_id, limit=1))
-        latest_job_run = job_runs[0]
-        if not latest_job_run.state.result_state:
-            logger.info("Waiting for the result_state to update the state")
-            time.sleep(10)
-        job_state = latest_job_run.state.result_state.value
-        return job_state
+    def _get_result_state(self, job_id):
+        job_runs = list(self._ws.jobs.list_runs(job_id=job_id, limit=1))
+        if len(job_runs) == 0:
+            raise AttributeError("no job runs found")
+        latest_job_run = job_runs[0]
+        if not latest_job_run.state.result_state:
+            raise AttributeError("no result state in job run")
+        job_state = latest_job_run.state.result_state.value
+        return job_state
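
A rough sketch of how the suggested method could be exercised in a unit test follows. `InstallerSketch`, `make_ws`, and `make_run` are hypothetical names introduced only for this example; the workspace client is replaced by a `MagicMock`, so nothing here calls the real Databricks API.

```python
from unittest.mock import MagicMock


class InstallerSketch:
    """Hypothetical minimal holder for the method; not the real installer class."""

    def __init__(self, ws):
        self._ws = ws

    def _get_result_state(self, job_id):
        job_runs = list(self._ws.jobs.list_runs(job_id=job_id, limit=1))
        if len(job_runs) == 0:
            raise AttributeError("no job runs found")
        latest_job_run = job_runs[0]
        if not latest_job_run.state.result_state:
            raise AttributeError("no result state in job run")
        return latest_job_run.state.result_state.value


def make_ws(runs):
    # Mocked workspace client whose list_runs yields the given runs.
    ws = MagicMock()
    ws.jobs.list_runs.return_value = iter(runs)
    return ws


def make_run(result_value):
    # Mocked job run; result_value=None models a run without a result state yet.
    run = MagicMock()
    if result_value is None:
        run.state.result_state = None
    else:
        run.state.result_state.value = result_value
    return run
```

Since both "not ready" cases raise `AttributeError` explicitly, a retry wrapper configured for that exception can handle them, and each branch can be asserted on without sleeping.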

Collaborator

You have retried(on=[AttributeError], but you don't raise it anywhere.

Contributor Author

If latest_job_run.state is None, then latest_job_run.state.result_state.value will throw AttributeError. However, I have now rewritten the code to raise the exception explicitly.

As for job runs: at the initial stage we already exit immediately, with a proper message, if there is no job run for the job_id.

# Conflicts:
#	src/databricks/labs/ucx/install.py
@nfx nfx changed the title from "Fixing the Issue for Repair Run databricks labs ucx repair-run" to "Fixed databricks labs ucx repair-run command to execute correctly" on Jan 19, 2024
@nfx nfx merged commit b45fa41 into main Jan 19, 2024
7 checks passed
@nfx nfx deleted the feature/fix_repai_run branch January 19, 2024 09:14
nfx added a commit that referenced this pull request Jan 19, 2024
* Added `databricks labs ucx validate-groups-membership` command to validate groups to see if they have same membership across account and workspace level ([#772](#772)).
* Added baseline for getting Azure Resource Role Assignments ([#764](#764)).
* Added issue and pull request templates ([#791](#791)).
* Added linked issues to PR template ([#793](#793)).
* Added optional `debug_truncate_bytes` parameter to the config and extend the default log truncation limit ([#782](#782)).
* Added support for crawling grants and applying Hive Metastore UDF ACLs ([#812](#812)).
* Changed Python requirement from 3.10.6 to 3.10 ([#805](#805)).
* Extend error handling of delta issues in crawlers and hive metastore ([#795](#795)).
* Fixed `databricks labs ucx repair-run` command to execute correctly ([#801](#801)).
* Fixed handling of `DELTASHARING` table format ([#802](#802)).
* Fixed listing of workflows via CLI ([#811](#811)).
* Fixed logger import path for DEBUG notebook ([#792](#792)).
* Fixed move table command to delete table/view regardless if permissions are present, skipping corrupted tables when crawling table size and making existing tests more stable ([#777](#777)).
* Fixed the issue of `databricks labs ucx installations` and `databricks labs ucx manual-workspace-info` ([#814](#814)).
* Increase the unit test coverage for cli.py ([#800](#800)).
* Mount Point crawler lists /Volume with four variations which is confusing ([#779](#779)).
* Updated README.md to remove mention of deprecated install.sh ([#781](#781)).
* Updated `bug` issue template ([#797](#797)).
* Fixed writing log readme in multiprocess safe way ([#794](#794)).
@nfx nfx mentioned this pull request Jan 19, 2024
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024
Development

Successfully merging this pull request may close these issues.

Cannot repair failed assessment job with databricks labs ucx repair-run