Remove WorkflowLinter as it is part of the Assessment workflow #3036

Merged · 3 commits · Oct 23, 2024
24 changes: 3 additions & 21 deletions README.md
@@ -401,9 +401,8 @@ which can be used for further analysis and decision-making through the [assessme
9. `assess_pipelines`: This task scans through all the Pipelines and identifies those pipelines that have Azure Service Principals embedded in their configurations. A list of all the pipelines with matching configurations is stored in the `$inventory.pipelines` table.
10. `assess_azure_service_principals`: This task scans through all the cluster configurations, cluster policies, job cluster configurations, pipeline configurations, and warehouse configurations, and identifies all the Azure Service Principals that have been given access to the Azure storage accounts via Spark configurations referenced in those entities. The list of all the Azure Service Principals referenced in those configurations is saved in the `$inventory.azure_service_principals` table.
11. `assess_global_init_scripts`: This task scans through all the global init scripts and identifies any Azure Service Principals that have been given access to the Azure storage accounts via Spark configurations referenced in those scripts.
12. `assess_dashboards`: This task scans through all the dashboards and analyzes embedded queries for migration problems. It also collects direct filesystem access patterns that require attention.
13. `assess_workflows`: This task scans through all the jobs and tasks and analyzes notebooks and files for migration problems. It also collects direct filesystem access patterns that require attention.

12. `assess_dashboards`: This task scans through all the dashboards and analyzes embedded queries for migration problems, which it persists in `$inventory_database.query_problems`. It also collects direct filesystem access patterns that require attention, which it persists in `$inventory_database.directfs_in_queries`.
13. `assess_workflows`: This task scans through all the jobs and tasks and analyzes notebooks and files for migration problems, which it persists in `$inventory_database.workflow_problems`. It also collects direct filesystem access patterns that require attention, which it persists in `$inventory_database.directfs_in_paths`.
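The four result tables listed above can be summarized with a small helper. This is a hypothetical illustration, not part of UCX; only the table names and their purposes come from the documentation above, while the helper itself and the example inventory database name `ucx` are assumptions:

```python
# Hypothetical helper, not part of UCX: maps the fully qualified names of the
# tables that the assessment workflow persists linting results into, as
# documented above, to a short description of their contents.
def assessment_result_tables(inventory_database: str) -> dict[str, str]:
    results = {
        "query_problems": "migration problems found in dashboard queries",
        "workflow_problems": "migration problems found in notebooks and files",
        "directfs_in_queries": "direct filesystem access patterns in queries",
        "directfs_in_paths": "direct filesystem access patterns in workflow code",
    }
    return {f"{inventory_database}.{table}": purpose for table, purpose in results.items()}

# Tables to inspect after the workflow completes, assuming an inventory
# database named "ucx":
for table in assessment_result_tables("ucx"):
    print(table)
```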

![report](docs/assessment-report.png)

@@ -726,27 +725,10 @@ in the Migration dashboard.

[[back to top](#databricks-labs-ucx)]

## Jobs Static Code Analysis Workflow

> Please note that this is an experimental workflow.

The `experimental-workflow-linter` workflow lints accessible code from two sources:
- all workflows/jobs present in the workspace
- all dashboards/queries present in the workspace
The linting emits problems indicating what to resolve to make the code Unity Catalog compatible.
It also locates direct filesystem accesses that need to be migrated.

Once the workflow completes:
- problems are stored in the `$inventory_database.workflow_problems` and `$inventory_database.query_problems` tables
- direct filesystem accesses are stored in the `$inventory_database.directfs_in_paths` and `$inventory_database.directfs_in_queries` tables
- all of the above are displayed in the Migration dashboard

![code compatibility problems](docs/code_compatibility_problems.png)

[[back to top](#databricks-labs-ucx)]

### Linter message codes

Here's the detailed explanation of the linter message codes:

#### `cannot-autofix-table-reference`
56 changes: 28 additions & 28 deletions docs/table_persistence.md
@@ -4,34 +4,34 @@ List of all UCX objects and their respective metadata.

## Overview

Table Utilization:

| Table | Generate Assessment | Update Migration Progress | Migrate Groups | Migrate External Tables | Upgrade Jobs | Migrate tables | Migrate Data Reconciliation | Workflow linter |
|--------------------------|---------------------|---------------------------|----------------|-------------------------|--------------|----------------|-----------------------------|-----------------|
| tables | RW | RW | | RO | | RO | | |
| grants | RW | RW | | RW | | RW | | |
| mounts | RW | | | RO | RO | RO | | |
| permissions | RW | | RW | RO | | RO | | |
| jobs | RW | RW | | | RO | | | |
| clusters | RW | RW | | | | | | |
| directfs_in_paths | RW | RW | | | | | | RW |
| directfs_in_queries | RW | RW | | | | | | RW |
| external_locations | RW | | | RO | | | | |
| workspace | RW | | RO | | RO | | | |
| workspace_objects | RW | | | | | | | |
| azure_service_principals | RW | | | | | | | |
| global_init_scripts | RW | | | | | | | |
| pipelines | RW | RW | | | | | | |
| groups | RW | | RO | | | | | |
| table_size | RW | | | | | | | |
| submit_runs | RW | | | | | | | |
| policies | RW | RW | | | | | | |
| migration_status | | RW | | RW | | RW | | |
| query_problems | RW | RW | | | | | | RW |
| workflow_problems | RW | RW | | | | | | RW |
| udfs | RW | RW | RO | | | | | |
| logs | RW | | RW | RW | | RW | RW | |
| recon_results | | | | | | | RW | |
Table utilization per workflow:

| Table | Generate Assessment | Update Migration Progress | Migrate Groups | Migrate External Tables | Upgrade Jobs | Migrate tables | Migrate Data Reconciliation |
|--------------------------|---------------------|---------------------------|----------------|-------------------------|--------------|----------------|-----------------------------|
| tables | RW | RW | | RO | | RO | |
| grants | RW | RW | | RW | | RW | |
| mounts | RW | | | RO | RO | RO | |
| permissions | RW | | RW | RO | | RO | |
| jobs | RW | RW | | | RO | | |
| clusters | RW | RW | | | | | |
| directfs_in_paths | RW | RW | | | | | |
| directfs_in_queries | RW | RW | | | | | |
| external_locations | RW | | | RO | | | |
| workspace | RW | | RO | | RO | | |
| workspace_objects | RW | | | | | | |
| azure_service_principals | RW | | | | | | |
| global_init_scripts | RW | | | | | | |
| pipelines | RW | RW | | | | | |
| groups | RW | | RO | | | | |
| table_size | RW | | | | | | |
| submit_runs | RW | | | | | | |
| policies | RW | RW | | | | | |
| migration_status | | RW | | RW | | RW | |
| query_problems | RW | RW | | | | | |
| workflow_problems | RW | RW | | | | | |
| udfs | RW | RW | RO | | | | |
| logs | RW | | RW | RW | | RW | RW |
| recon_results | | | | | | | RW |

**RW** - Read/Write, the job generates or updates the table.<br/>
**RO** - Read Only
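The access modes above lend themselves to a quick programmatic check. The sketch below is hypothetical and not part of UCX; it encodes only a few representative rows of the table as data, with the semantics from the legend (RW generates or updates, RO only reads):

```python
# Hypothetical sketch, not part of UCX: a few rows of the table above encoded
# as (table, workflow) -> access mode, so access modes can be queried in code.
ACCESS = {
    ("tables", "Generate Assessment"): "RW",
    ("tables", "Migrate External Tables"): "RO",
    ("directfs_in_paths", "Generate Assessment"): "RW",
    ("directfs_in_paths", "Update Migration Progress"): "RW",
    ("recon_results", "Migrate Data Reconciliation"): "RW",
}

def writes(table: str, workflow: str) -> bool:
    """True when the workflow generates or updates the table (RW)."""
    return ACCESS.get((table, workflow)) == "RW"
```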
10 changes: 7 additions & 3 deletions src/databricks/labs/ucx/assessment/workflows.py
@@ -186,13 +186,17 @@ def crawl_groups(self, ctx: RuntimeContext):
@job_task
def assess_dashboards(self, ctx: RuntimeContext):
"""Scans all dashboards for migration issues in SQL code of embedded widgets.
Also stores direct filesystem accesses for display in the migration dashboard."""

Also stores direct filesystem accesses for display in the migration dashboard.
"""
ctx.query_linter.refresh_report(ctx.sql_backend, ctx.inventory_database)

@job_task
def assess_workflows(self, ctx: RuntimeContext):
"""Scans all jobs for migration issues in notebooks.
Also stores direct filesystem accesses for display in the migration dashboard."""
"""Scans all jobs for migration issues in notebooks and files.

Also stores direct filesystem accesses for display in the migration dashboard.
"""
ctx.workflow_linter.refresh_report(ctx.sql_backend, ctx.inventory_database)


2 changes: 0 additions & 2 deletions src/databricks/labs/ucx/runtime.py
@@ -20,7 +20,6 @@
)
from databricks.labs.ucx.progress.workflows import MigrationProgress
from databricks.labs.ucx.recon.workflows import MigrationRecon
from databricks.labs.ucx.source_code.workflows import ExperimentalWorkflowLinter
from databricks.labs.ucx.workspace_access.workflows import (
GroupMigration,
PermissionsMigrationAPI,
@@ -58,7 +57,6 @@ def all(cls):
ScanTablesInMounts(),
MigrateTablesInMounts(),
PermissionsMigrationAPI(),
ExperimentalWorkflowLinter(),
MigrationRecon(),
Failing(),
]
19 changes: 0 additions & 19 deletions src/databricks/labs/ucx/source_code/workflows.py

This file was deleted.

20 changes: 0 additions & 20 deletions tests/integration/source_code/test_jobs.py
@@ -32,26 +32,6 @@
from tests.unit.source_code.test_graph import _TestDependencyGraph


@retried(on=[NotFound], timeout=timedelta(minutes=5))
def test_running_real_workflow_linter_job(installation_ctx, make_job) -> None:
# Deprecated file system path in call to: /mnt/things/e/f/g
job = make_job(content="spark.read.table('a_table').write.csv('/mnt/things/e/f/g')\n")
ctx = installation_ctx.replace(config_transform=lambda wc: replace(wc, include_job_ids=[job.job_id]))
ctx.workspace_installation.run()
ctx.deployed_workflows.run_workflow("experimental-workflow-linter")
ctx.deployed_workflows.validate_step("experimental-workflow-linter")

# This test merely checks that the workflow produces records of the expected types; record content is not checked.
cursor = ctx.sql_backend.fetch(f"SELECT COUNT(*) AS count FROM {ctx.inventory_database}.workflow_problems")
result = next(cursor)
if result['count'] == 0:
installation_ctx.deployed_workflows.relay_logs("experimental-workflow-linter")
assert False, "No workflow problems found"
dfsa_records = installation_ctx.directfs_access_crawler_for_paths.snapshot()
used_table_records = installation_ctx.used_tables_crawler_for_paths.snapshot()
assert dfsa_records and used_table_records


@retried(on=[NotFound], timeout=timedelta(minutes=2))
def test_linter_from_context(simple_ctx, make_job) -> None:
# This code is similar to test_running_real_workflow_linter_job, but it's executed on the caller side and is easier