52 commits
a55645c
Add WorkspaceTablesLinter for extracting tables used in notebooks
pritishpai Sep 25, 2025
780f309
Add function to get language from path
pritishpai Sep 25, 2025
84bb967
Simple integration test
pritishpai Sep 26, 2025
0fdfddb
used_tables_for_workspace_crawler
pritishpai Sep 26, 2025
eb8332d
Add used_tables_for_workspace crawler
pritishpai Sep 26, 2025
5a6d857
Add scan_workspace_for_tables entrypoint for WorkspaceTablesLinter an…
pritishpai Sep 26, 2025
f333e34
Add _discover_workspace_objects and cleanup for integration tests
pritishpai Sep 26, 2025
579d0fa
Add functions to extract tables from path, parallelize the process pe…
pritishpai Sep 26, 2025
bb02a5e
Logging
pritishpai Sep 26, 2025
e08e983
Add more logging for integration test
pritishpai Sep 26, 2025
c702d74
Fix the WorkspaceObjectInfo constructor call
pritishpai Sep 26, 2025
8451aa2
Add funtions to extract tables from notebooks and files
pritishpai Sep 26, 2025
774f57c
Remove unused imports
pritishpai Sep 26, 2025
7b73283
Upload notebook content properly
pritishpai Sep 26, 2025
ff5f046
Add _extract_tables_from_notebook
pritishpai Sep 26, 2025
8f28f97
Succesfully crawl read tables
pritishpai Sep 29, 2025
8fbc9d5
Add verification of tables crawled are fetched from the inventory table
pritishpai Sep 29, 2025
53059a4
Insert crawled tables in the inventory database
pritishpai Sep 29, 2025
df44d79
Comment
pritishpai Sep 29, 2025
2fde3b1
Use empty TableMigrationIndex
pritishpai Sep 29, 2025
165d6a4
Add a function to detect if it is a dataframe
pritishpai Sep 29, 2025
2ee7405
Add some unit tests
pritishpai Sep 29, 2025
e0113aa
Fmt changes
pritishpai Sep 29, 2025
c304da9
Fmt changes
pritishpai Sep 29, 2025
20449fa
Create common function _get_str_content_from_path
pritishpai Sep 29, 2025
a67ab22
Remove generated unit tests for now
pritishpai Sep 29, 2025
302c743
Fmt
pritishpai Sep 29, 2025
11892b2
Fmt changes and cleanup for integration test
pritishpai Sep 29, 2025
0e07783
Remove unused code
pritishpai Sep 29, 2025
bf2908b
Fmt changes and refactoring
pritishpai Sep 30, 2025
ae2442f
Basic unit testing
pritishpai Sep 30, 2025
3334234
fix unit tests
pritishpai Oct 1, 2025
468a132
Add a new workflow
pritishpai Oct 1, 2025
a07e0ad
Change workflow name
pritishpai Oct 1, 2025
af312ff
Add additional integration test
pritishpai Oct 1, 2025
728584b
Change workflow name
pritishpai Oct 1, 2025
c6527e0
Add workflow to deploy list
pritishpai Oct 1, 2025
3049d4d
Strip '%sql' for proper linting
pritishpai Oct 1, 2025
f4c17d6
Fmt
pritishpai Oct 1, 2025
68694c1
Fmt
pritishpai Oct 1, 2025
c8a30b5
Add a placeholder readme
pritishpai Oct 1, 2025
a4345ef
Refactoring
pritishpai Oct 8, 2025
1bd17dd
Refactor workflow name and code structure
pritishpai Oct 8, 2025
9ae2e2d
Refactor class name
pritishpai Oct 9, 2025
763ec10
Refactor test name
pritishpai Oct 9, 2025
a95ccf0
Add a cli command for workspace code scanner
pritishpai Oct 9, 2025
a5bb05a
Add cli command to labs.yml
pritishpai Oct 9, 2025
fd5266f
fmt changes
pritishpai Oct 9, 2025
0bc5cb0
docs enhance
pritishpai Oct 10, 2025
5e67645
refactoring and adding cli command to docs
pritishpai Oct 10, 2025
6dce482
fmt
pritishpai Oct 10, 2025
9a6dacb
Fix sample query for per workspace area results
pritishpai Oct 13, 2025
1 change: 1 addition & 0 deletions docs/ucx/docs/reference/index.mdx
@@ -20,3 +20,4 @@ It includes the following:
- [Table Upgrade](/docs/reference/table_upgrade)
- [Troubleshooting Guide](/docs/reference/troubleshooting)
- [Workflows](/docs/reference/workflows)
- [Workspace Table Scanning](/docs/reference/workspace-table-scanning)
5 changes: 5 additions & 0 deletions docs/ucx/docs/reference/workflows/index.mdx
@@ -251,6 +251,11 @@ The output is processed and displayed in the migration dashboard using the in `r
- run the [`create-table-mapping` command](/docs/reference/commands#create-table-mapping)
- or manually create a `mapping.csv` file in Workspace -> Applications -> ucx

## [EXPERIMENTAL] Workspace Code Scanner Workflow

The [`workspace-code-scanner-experimental`](/docs/reference/workspace-table-scanning) workflow scans all notebooks and files in the workspace for the tables they use. It performs a static analysis of the code to identify the tables and views referenced, which helps identify the schemas in use so that the assessment can be focused on those schemas. The results are stored in the `used_tables_in_workspace` table in the inventory database.


## [EXPERIMENTAL] Migration Progress Workflow

The `migration-progress-experimental` workflow populates the tables visualized in the
161 changes: 161 additions & 0 deletions docs/ucx/docs/reference/workspace-table-scanning.md
@@ -0,0 +1,161 @@
# Workspace Table Scanning

UCX now supports comprehensive table usage detection across your entire Databricks workspace, beyond just workflows and dashboards. This expanded capability allows you to discover all table references in notebooks and files within specified workspace paths.

## Overview

Previously, UCX detected table usage only in workflows and dashboards. The workspace scanning feature expands this coverage to:
- **Workspace**: tables used in any notebook or file within the specified workspace paths

**Key Benefits:**
- **Discovery-first approach**: Runs as standalone workflow before assessment
- **Scope optimization**: Can limit Hive Metastore scanning to only databases that are referenced
- **Complete coverage**: Finds table usage beyond just workflows and dashboards
- **Independent execution**: Run on-demand without full assessment cycle

## How It Works

The workspace table scanner:

1. **Discovers Objects**: Recursively scans workspace paths to find all notebooks and supported files
2. **Analyzes Content**: Uses UCX's linting framework to extract table usage from each object
3. **Tracks Lineage**: Maintains detailed source lineage information for each table reference
4. **Stores Results**: Saves findings to the `used_tables_in_workspace` inventory table
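The pipeline above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the in-memory `FAKE_WORKSPACE` dict stands in for the Workspace API, and the regex-based `extract_tables` stands in for UCX's real linting framework, which does full static analysis rather than pattern matching:

```python
import re
from dataclasses import dataclass


@dataclass
class TableReference:
    table_name: str
    source_id: str  # path of the notebook/file the reference was found in


# Stand-in for the Workspace API: path -> source text
FAKE_WORKSPACE = {
    "/Users/alice/etl.py": "df = spark.read.table('sales.orders')",
    "/Shared/report.sql": "SELECT * FROM sales.customers",
    "/Shared/readme.md": "not scanned",  # unsupported extension
}


def discover_objects(paths):
    """Step 1: find all supported notebooks/files under the given paths."""
    for source_id in FAKE_WORKSPACE:
        if any(source_id.startswith(p) for p in paths) and source_id.endswith((".py", ".sql")):
            yield source_id


def extract_tables(source_id):
    """Step 2: crude regex stand-in for UCX's linting framework."""
    code = FAKE_WORKSPACE[source_id]
    pattern = r"(?:FROM\s+|read\.table\(')([\w.]+)"
    return [TableReference(m, source_id) for m in re.findall(pattern, code, re.IGNORECASE)]


def scan(paths):
    """Steps 3-4: collect results (UCX persists these to the inventory table)."""
    results = []
    for source_id in discover_objects(paths):
        results.extend(extract_tables(source_id))
    return results


print([r.table_name for r in scan(["/"])])  # → ['sales.orders', 'sales.customers']
```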

## Supported File Types

The scanner supports:
- **Notebooks**: Python, SQL
- **Files**: Python (.py), SQL (.sql)

## Configuration

### Via Standalone Workflow

UCX now includes a dedicated `workspace-code-scanner-experimental` workflow that runs independently of the assessment:

**Workflow Parameters:**
- `paths`: comma-separated list of workspace paths to scan (required; the task logs an error if no paths are provided)

### Via CLI command
You can also run the scanner via the UCX CLI:

```bash
databricks labs ucx run-workspace-code-scanner --paths "/Users,/Shared"
```

### Programmatic Usage

```python
from databricks.labs.ucx.source_code.linters.workspace import WorkspaceCodeLinter
from databricks.labs.ucx.source_code.used_table import UsedTablesCrawler

# Initialize components
workspace_linter = WorkspaceCodeLinter(
    ws=workspace_client,
    sql_backend=sql_backend,
    inventory_database="ucx_inventory",
    path_lookup=path_lookup,
    used_tables_crawler=UsedTablesCrawler.for_workspace(sql_backend, "ucx_inventory"),
)

# Scan specific paths
workspace_paths = ["/Users/data_team", "/Shared/analytics"]
workspace_linter.scan_workspace_for_tables(workspace_paths)
```

## Typical Workflow Sequence

For optimal UCX assessment with scope optimization:

```bash
# 1. Run the workspace code scanner first (standalone)
databricks labs ucx run-workspace-code-scanner --paths "/"

# 2. Use the results to configure a scope-limited assessment
#    (the scanner workflow logs a suggested include_databases configuration)

# 3. Update your UCX config with the discovered databases
#    include_databases: ["database1", "database2", "database3"]

# 4. Run the assessment with the optimized scope
databricks labs ucx ensure-assessment-run
```

**Scope Optimization Example:**
```sql
-- Query to get databases for config
SELECT DISTINCT schema_name
FROM ucx_inventory.used_tables_in_workspace
WHERE catalog_name = 'hive_metastore'
ORDER BY schema_name;
```
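Once you have the schema names from the query above, a tiny helper can render the `include_databases` line for your config file. This is illustrative only — `render_include_databases` is not part of UCX, and you should adapt the key name to your own `config.yml`:

```python
def render_include_databases(schemas):
    """Render a sorted, de-duplicated include_databases YAML line."""
    unique = sorted(set(schemas))
    quoted = ", ".join(f'"{s}"' for s in unique)
    return f"include_databases: [{quoted}]"


# Duplicates are dropped and names are sorted for a stable config diff
print(render_include_databases(["sales", "finance", "sales"]))
# → include_databases: ["finance", "sales"]
```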

## Results and Analysis

### Inventory Table

Results are stored in `{inventory_database}.used_tables_in_workspace` with the following schema:

| Column | Type | Description |
|--------|------|-------------|
| `catalog_name` | string | Catalog containing the table |
| `schema_name` | string | Schema containing the table |
| `table_name` | string | Name of the table |
| `source_id` | string | Path to the workspace object |
| `source_lineage` | array | Detailed lineage information |
| `is_write` | boolean | Whether this is a write operation |
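The row layout can be modeled roughly as the following dataclass. This is a sketch of the stored shape, not UCX's actual class, and `source_lineage` is simplified here to a list of strings:

```python
from dataclasses import dataclass, field


@dataclass
class UsedTableRow:
    catalog_name: str
    schema_name: str
    table_name: str
    source_id: str  # path to the workspace object
    source_lineage: list[str] = field(default_factory=list)
    is_write: bool = False

    @property
    def full_name(self) -> str:
        """Fully-qualified three-part table name."""
        return f"{self.catalog_name}.{self.schema_name}.{self.table_name}"


row = UsedTableRow("hive_metastore", "sales", "orders", "/Users/alice/etl.py")
print(row.full_name)  # → hive_metastore.sales.orders
```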

### Example Queries

**Most used tables across workspace:**
```sql
SELECT
catalog_name,
schema_name,
table_name,
COUNT(*) as usage_count
FROM ucx_inventory.used_tables_in_workspace
GROUP BY catalog_name, schema_name, table_name
ORDER BY usage_count DESC;
```

**Table usage by workspace area:**
```sql
SELECT
CASE
WHEN source_id LIKE '%/Users/%' THEN 'User Notebooks'
WHEN source_id LIKE '%/Shared/%' THEN 'Shared Notebooks'
WHEN source_id LIKE '%/Repos/%' THEN 'Repository Code'
ELSE 'Other'
END as workspace_area,
COUNT(DISTINCT CONCAT(catalog_name, '.', schema_name, '.', table_name)) as unique_tables,
COUNT(*) as total_references
FROM ucx_inventory.used_tables_in_workspace
GROUP BY workspace_area;
```
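The same area classification can be reproduced client-side when post-processing query results in Python; the buckets mirror the `CASE` expression above (`workspace_area` is an illustrative helper, not part of UCX):

```python
def workspace_area(source_id: str) -> str:
    """Classify a workspace path into the same buckets as the SQL CASE above."""
    if "/Users/" in source_id:
        return "User Notebooks"
    if "/Shared/" in source_id:
        return "Shared Notebooks"
    if "/Repos/" in source_id:
        return "Repository Code"
    return "Other"


print(workspace_area("/Repos/team/pipeline.py"))  # → Repository Code
```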

**Files with the most table dependencies:**
```sql
SELECT
source_id,
COUNT(DISTINCT CONCAT(catalog_name, '.', schema_name, '.', table_name)) as table_count
FROM ucx_inventory.used_tables_in_workspace
GROUP BY source_id
ORDER BY table_count DESC
LIMIT 20;
```

## Best Practices

### Path Selection
- Start with critical paths like `/Shared/production` or specific team directories
- Avoid scanning entire workspace initially to gauge performance impact
- Exclude test/scratch directories to focus on production code
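These guidelines can be applied mechanically before kicking off a scan. A sketch, with an example exclusion list you would tune to your own workspace conventions:

```python
# Example exclusions — adjust to your workspace's test/scratch conventions
EXCLUDED_PREFIXES = ("/Users/scratch", "/tmp", "/Shared/experiments")


def select_scan_paths(candidates):
    """Drop test/scratch directories so the scan focuses on production code."""
    return [p for p in candidates if not p.startswith(EXCLUDED_PREFIXES)]


print(select_scan_paths(["/Shared/production", "/tmp/demo", "/Users/scratch/x"]))
# → ['/Shared/production']
```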

### Regular Scanning
- Run workspace scans weekly or monthly to track evolving dependencies
- Compare results over time to identify new table dependencies

### Result Analysis
- Combine workspace results with workflow and dashboard results for complete picture
- Use the lineage information to understand code relationships
10 changes: 10 additions & 0 deletions labs.yml
@@ -98,6 +98,16 @@ commands:
description: (Optional) Whether to run the assess-workflows for the collection of workspaces with ucx
installed. Default is False.

  - name: run-workspace-code-scanner
    description: (Experimental) Trigger the `workspace-code-scanner-experimental` job to scan workspace code and
      record the tables referenced in it.
    flags:
      - name: paths
        description: Comma-separated workspace paths to scan.
      - name: run-as-collection
        description: (Optional) Whether to run the workspace-code-scanner for the collection of workspaces with ucx
          installed. Default is False.


- name: update-migration-progress
description: trigger the `migration-progress-experimental` job to refresh the inventory that tracks the workspace
resources and their migration status.
22 changes: 22 additions & 0 deletions src/databricks/labs/ucx/assessment/workflows.py
@@ -247,3 +247,25 @@ def failing_task(self, ctx: RuntimeContext):
logger.warning("This is a test warning message.")
logger.error("This is a test error message.")
raise ValueError("This task is supposed to fail.")


class WorkspaceCodeScanner(Workflow):
    def __init__(self):
        super().__init__('workspace-code-scanner-experimental', [JobParameterDefinition(name="paths", default="")])

    @job_task
    def scan_workspace_code(self, ctx: RuntimeContext):
        """Scan workspace for table usage using the workspace code linter."""
        logger.info("Starting workspace table scanning")

        # The "paths" parameter is a comma-separated list of workspace paths
        path_param = ctx.named_parameters.get("paths", "")
        if not path_param:
            logger.error("No path parameter provided. Please provide a comma-separated list of paths to scan.")
            return
        paths = [p.strip() for p in path_param.split(",") if p.strip()]

        # Create and use the workspace linter
        workspace_linter = ctx.workspace_tables_linter
        workspace_linter.scan_workspace_for_tables(paths)
        logger.info("Workspace table scanning completed and results stored in inventory database")
19 changes: 19 additions & 0 deletions src/databricks/labs/ucx/cli.py
@@ -257,6 +257,25 @@ def run_assess_workflows(
deployed_workflows.run_workflow("assess-workflows", skip_job_wait=run_as_collection)


@ucx.command
def run_workspace_code_scanner_experimental(
    w: WorkspaceClient, run_as_collection: bool = False, a: AccountClient | None = None, paths: str | None = None
):
    """Manually trigger the workspace-code-scanner-experimental job."""
    if paths is None:
        logger.error("--paths is a required parameter.")
        return

    workspace_contexts = _get_workspace_contexts(w, a, run_as_collection)
    for ctx in workspace_contexts:
        workspace_id = ctx.workspace_client.get_workspace_id()
        deployed_workflows = ctx.deployed_workflows
        logger.info(f"Starting 'workspace-code-scanner-experimental' workflow in workspace: {workspace_id}")
        deployed_workflows.run_workflow(
            "workspace-code-scanner-experimental", named_parameters={"paths": paths}, skip_job_wait=run_as_collection
        )


@ucx.command
def update_migration_progress(
w: WorkspaceClient,
15 changes: 15 additions & 0 deletions src/databricks/labs/ucx/contexts/application.py
@@ -66,6 +66,7 @@
from databricks.labs.ucx.progress.install import VerifyProgressTracking
from databricks.labs.ucx.source_code.graph import DependencyResolver
from databricks.labs.ucx.source_code.linters.jobs import WorkflowLinter
from databricks.labs.ucx.source_code.linters.workspace import WorkspaceCodeLinter
from databricks.labs.ucx.source_code.known import KnownList
from databricks.labs.ucx.source_code.folders import FolderLoader
from databricks.labs.ucx.source_code.files import FileLoader, ImportFileResolver
@@ -610,6 +611,16 @@ def query_linter(self) -> QueryLinter:
self.config.debug_listing_upper_limit,
)

@cached_property
def workspace_tables_linter(self) -> WorkspaceCodeLinter:
    return WorkspaceCodeLinter(
        self.workspace_client,
        self.sql_backend,
        self.inventory_database,
        self.path_lookup,
        self.used_tables_crawler_for_workspace,
    )

@cached_property
def directfs_access_crawler_for_paths(self) -> DirectFsAccessCrawler:
return DirectFsAccessCrawler.for_paths(self.sql_backend, self.inventory_database)
@@ -626,6 +637,10 @@ def used_tables_crawler_for_paths(self):
def used_tables_crawler_for_queries(self):
return UsedTablesCrawler.for_queries(self.sql_backend, self.inventory_database)

@cached_property
def used_tables_crawler_for_workspace(self):
    return UsedTablesCrawler.for_workspace(self.sql_backend, self.inventory_database)

@cached_property
def redash(self) -> Redash:
return Redash(
3 changes: 2 additions & 1 deletion src/databricks/labs/ucx/runtime.py
@@ -6,7 +6,7 @@
from databricks.sdk.config import with_user_agent_extra

from databricks.labs.ucx.__about__ import __version__
from databricks.labs.ucx.assessment.workflows import Assessment, Failing, AssessWorkflows
from databricks.labs.ucx.assessment.workflows import Assessment, Failing, AssessWorkflows, WorkspaceCodeScanner
from databricks.labs.ucx.contexts.workflow_task import RuntimeContext
from databricks.labs.ucx.framework.tasks import Workflow, parse_args
from databricks.labs.ucx.installer.logs import TaskLogger
@@ -52,6 +52,7 @@ def all(cls):
ConvertWASBSToADLSGen2(),
PermissionsMigrationAPI(),
MigrationRecon(),
WorkspaceCodeScanner(),
Failing(),
]
)