Remove tree from `PythonSequentialLinter` #3535

JCZuurmond · 2025-01-17T10:07:24Z

Changes

Remove tree from PythonSequentialLinter as the sequential linter should just sequence linting, not be used as an intermediate for manipulating the code tree.

Remove tree manipulation related logic from PythonSequentialLinter
Rewrite NotebookLinter to do the (notebook) tree manipulation instead:
- Let _load_tree_from_notebook return early on Failure, similar to dependency graph building: if we cannot resolve the code used by a notebook, then fail early
- Attach each subsequent cell as a child tree to the cell before it
- Attach %run notebook trees as a child tree to the cell that calls the notebook
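
The fail-early and attach steps listed above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `Failure`, `CellTree`, `parse_cell`, and `load_trees` are hypothetical stand-ins, and the built-in `compile` takes the place of the real parser.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Failure:
    """Hypothetical stand-in for a failure result."""

    message: str


@dataclass(eq=False)
class CellTree:
    """Hypothetical stand-in for the parsed tree of one notebook cell."""

    source: str
    parent: CellTree | None = None
    children: list[CellTree] = field(default_factory=list)

    def attach_child(self, child: CellTree) -> None:
        # Link both directions so the child can reach its parent and vice versa.
        child.parent = self
        self.children.append(child)


def parse_cell(source: str) -> CellTree | Failure:
    """Parse one cell; report a Failure instead of raising."""
    try:
        compile(source, "<cell>", "exec")  # only used here to detect syntax errors
    except SyntaxError as e:
        return Failure(f"Cannot parse cell: {e.msg}")
    return CellTree(source)


def load_trees(cell_sources: list[str]) -> list[CellTree] | Failure:
    """Parse cells in order; stop at the first cell that cannot be parsed."""
    trees: list[CellTree] = []
    for source in cell_sources:
        maybe_tree = parse_cell(source)
        if isinstance(maybe_tree, Failure):
            return maybe_tree  # fail early: later cells are not processed
        if trees:
            trees[-1].attach_child(maybe_tree)  # chain to the cell before
        trees.append(maybe_tree)
    return trees


ok = load_trees(["a = 1", "b = a + 1"])
broken = load_trees(["a = 1", "b = (", "c = 3"])
```

Here `ok` is a list of two chained trees, while `broken` is a single `Failure` for the second cell; the third cell is never parsed.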

Linked issues

Resolves #3543
Progresses #3514

Linked PRs

Stacked on :

Rename Python AST's Tree methods for clarity #3524

Requires :

Functionality

modified code linting related
modified existing command: databricks labs ucx lint-local-code

Tests

manually tested
added and modified unit tests

JCZuurmond

@asnare and @pritishpai: Please review this PR after #3550. I expect to create more PRs to continue improving the linter.

JCZuurmond · 2025-01-21T07:29:11Z

src/databricks/labs/ucx/source_code/notebooks/sources.py

            return
        for cell in self._notebook.cells:
-            if not self._context.is_supported(cell.language.language):
+            try:


This is why: #3544

JCZuurmond · 2025-01-22T08:21:30Z

src/databricks/labs/ucx/source_code/notebooks/sources.py

@@ -129,163 +125,162 @@ class NotebookLinter:
    to the code cells according to the language of the cell.
    """

-    @classmethod
-    def from_source(


This was only used by tests; it should not be the way to create the notebook linter in UCX.

JCZuurmond · 2025-01-22T09:52:52Z

src/databricks/labs/ucx/source_code/python/python_ast.py

        """
        if not isinstance(self.node, Module):
            raise NotImplementedError(f"Cannot attach nodes to: {type(self.node).__name__}")
        self_module: Module = cast(Module, self.node)
        for node in nodes:
            node.parent = self_module
-            self_module.body.append(node)


This is a notable change! It avoids duplicates: appending nodes to the body of the parent tree duplicates the nodes (across all trees).
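
One way to picture the duplication is the sketch below. It assumes that nodes are collected by walking the body of every tree; `Node`, `Module`, and the helper functions are hypothetical stand-ins for the AST types used in the PR.

```python
class Node:
    """Hypothetical stand-in for an AST node."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.parent = None


class Module:
    """Hypothetical stand-in for a module node that owns a body of nodes."""

    def __init__(self, body: list) -> None:
        self.body = list(body)
        for node in self.body:
            node.parent = self


def attach_nodes_appending(module: Module, nodes: list) -> None:
    """Old behaviour: re-parent the nodes and also append them to the body."""
    for node in nodes:
        node.parent = module
        module.body.append(node)


def attach_nodes(module: Module, nodes: list) -> None:
    """New behaviour: only re-parent the nodes."""
    for node in nodes:
        node.parent = module


def collect_names(modules: list) -> list:
    """Walk the body of every module, as a linter visiting all trees might."""
    return [node.name for module in modules for node in module.body]


# Old behaviour: "b" now lives in both bodies and is visited twice.
parent_old, child_old = Module([Node("a")]), Module([Node("b")])
attach_nodes_appending(parent_old, child_old.body)
names_old = collect_names([parent_old, child_old])  # ['a', 'b', 'b']

# New behaviour: "b" stays in one body but can still reach the parent module.
parent_new, child_new = Module([Node("a")]), Module([Node("b")])
attach_nodes(parent_new, child_new.body)
names_new = collect_names([parent_new, child_new])  # ['a', 'b']
```

In both variants the attached node's `parent` points at the parent module, so lookups that go "up" keep working; only the body of the parent tree differs.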

JCZuurmond · 2025-01-22T09:53:09Z

src/databricks/labs/ucx/source_code/python/python_ast.py

@@ -668,7 +675,8 @@ def collect_dfsas(self, source_code: str) -> Iterable[DirectFsAccess]:
    def collect_dfsas_from_tree(self, tree: Tree) -> Iterable[DirectFsAccessNode]: ...


-class PythonSequentialLinter(Linter, DfsaCollector, TableCollector):
+class PythonSequentialLinter(PythonLinter, DfsaPyCollector, TablePyCollector):


Big clean up in this class!

JCZuurmond · 2025-01-22T09:54:39Z

tests/unit/source_code/samples/simple_notebook.py

+
+# COMMAND ----------
+
+# MAGIC %run ./test


The %run magic was missing from the example notebook test; it is added here together with the expected output in the YAML below.

JCZuurmond · 2025-01-22T09:55:59Z

tests/unit/source_code/test_functional.py

@@ -244,7 +244,6 @@ def test_functional(sample: Functional, mock_path_lookup, simple_dependency_reso
        ("_child_that_uses_missing_value.py", "parent_that_dbutils_runs_child_that_misses_value_from_parent.py"),
        ("_child_that_uses_value_from_parent.py", "grand_parent_that_dbutils_runs_parent_that_magic_runs_child.py"),
        ("_child_that_uses_missing_value.py", "parent_that_imports_child_that_misses_value_from_parent.py"),
-        ("_child_that_uses_value_from_parent.py", "grand_parent_that_imports_parent_that_magic_runs_child.py"),


This test is removed because it tests a non-existent situation: a file imports a notebook that runs another notebook. It does not work because the imported notebook is treated as a file: our import resolver always returns files and not notebooks (as it should).

JCZuurmond · 2025-01-22T09:56:17Z

tests/unit/source_code/test_notebook_linter.py

-def test_notebook_linter_name(mock_path_lookup) -> None:
-    source = """-- Databricks notebook source"""
-    linter = NotebookLinter.from_source(index, mock_path_lookup, CurrentSessionState(), source, Language.SQL)
-    assert linter.name() == "notebook-linter"


This method/attribute is unused outside this unit test.

asnare

Looks mostly fine, although there are some minor things I'd like clarified. (Not requesting changes because I don't want to block merging if someone else approves, but I also don't think it's ready to merge just yet.)

asnare · 2025-01-23T17:56:11Z

src/databricks/labs/ucx/source_code/notebooks/sources.py

+        code_path_nodes = self._list_magic_lines_with_run_command(tree) + SysPathChange.extract_from_tree(
+            self._session_state, tree
+        )
+        maybe_tree = MaybeTree(None, None)


This should also be:

Suggested change

-        maybe_tree = MaybeTree(None, None)
+        maybe_tree: MaybeTree

That will then let the linter expose a problem: if there are no code_path_nodes to iterate over then we return an invalid instance.

What should we return in that case? (Can it be ruled out… do we always have nodes to iterate over?)

Finally, why do we return the last tree? Is it somehow more important than any earlier ones? (If we don't care, why return it? In that case the return type should probably be Failure | None).
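
The effect of the suggested annotation-only declaration can be illustrated with a small standalone sketch. The function names and the doubling logic are hypothetical; only the pattern (a variable assigned inside a loop and returned afterwards) mirrors the code under review.

```python
from __future__ import annotations


def last_doubled_prebound(items: list[int]) -> int | None:
    """Pre-binding a placeholder hides the empty case from type checkers."""
    result = None
    for item in items:
        result = item * 2
    return result  # silently returns the placeholder when items is empty


def last_doubled_declared(items: list[int]) -> int:
    """An annotation without a value declares the type but binds nothing."""
    result: int
    for item in items:
        result = item * 2
    # With an empty list nothing was ever assigned, so this line raises
    # UnboundLocalError; static checkers can report the possibly-unbound use.
    return result


def last_doubled_explicit(items: list[int]) -> int | None:
    """Handle the empty case explicitly and say so in the return type."""
    if not items:
        return None
    return items[-1] * 2
```

With the annotation-only form the empty case is no longer masked by a placeholder instance, which is what makes the open question about the return value visible.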

On your first point, I updated the signature to be MaybeTree | None, which works, for now.

On the "why do we return the last tree": that logic is still to be determined. At least this PR resolves a bug with the notebook linting and adds (many) more unit tests. At the same time, I do not know exactly (yet) how the Python trees should be linked/merged for everything to work as expected.

So, currently, it returns the last tree because the trees are more or less chained: the second cell can go "up" to the first cell, and the third cell can go "up" to the second (and, via the second cell, "find" the first cell). It becomes more complicated when new notebooks are introduced with the run magic. That is the part I am not sure about yet, but I am more confident than before this PR, as it introduces more tests.
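
The chaining described in this reply can be sketched as a simple parent-linked lookup. `CellScope` and its `lookup` method are hypothetical and only illustrate why the last tree can "see" everything defined before it; they do not represent the tree API in UCX.

```python
from __future__ import annotations


class CellScope:
    """Hypothetical stand-in for a cell tree that knows the cell before it."""

    def __init__(self, names: dict, parent: CellScope | None = None) -> None:
        self.names = names
        self.parent = parent

    def lookup(self, name: str):
        """Search this cell first, then walk "up" through the earlier cells."""
        scope: CellScope | None = self
        while scope is not None:
            if name in scope.names:
                return scope.names[name]
            scope = scope.parent
        raise KeyError(name)


first = CellScope({"a": 1})
second = CellScope({"b": 2}, parent=first)
third = CellScope({"c": 3}, parent=second)

value_from_first = third.lookup("a")  # found via second, then first
```

The lookup only goes "up": `first.lookup("c")` raises `KeyError`, so under this assumption the last tree in the chain is the only one from which every earlier definition is reachable.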

JCZuurmond added the migrate/code, migrate/python, and python labels Jan 17, 2025

JCZuurmond requested review from asnare and pritishpai January 17, 2025 10:07

JCZuurmond self-assigned this Jan 17, 2025

JCZuurmond requested a review from a team as a code owner January 17, 2025 10:07

JCZuurmond had a problem deploying to account-admin January 17, 2025 10:07 — with GitHub Actions Error

JCZuurmond had a problem deploying to account-admin January 17, 2025 10:07 — with GitHub Actions Failure

JCZuurmond mentioned this pull request Jan 17, 2025

Rename Python AST's Tree methods for clarity #3524

Merged

4 tasks

JCZuurmond added 18 commits January 17, 2025 11:53

Let append tree return None

b53e0fd

Test bidirectionality of appended trees

108dd7b

Rename append_tree to attach tree

8a22d90

Clean test for attaching child tree

6c0ebcf

Rewrite test for module propagation

fd2c761

Rewrite test for not implemented error

2c46a01

Test append globals

ca30dcf

Narrow not implemented test

1d22084

Test appending globals during attach tree

48ddf4e

Refactor append_globals to extend_globals

1211dda

Test appending nodes sets parent on node

043fc6b

Test appending nodes adds nodes to end of body

b0e39ef

Move append_nodes method up

51358f2

Rename append_nodes to attach_nodes

460e88f

Narrow raising not implemented error test

357a2e5

Add docstring for attach nodes

d9ffd91

Change defining sources in test

a3c023b

Update constructing sources

1e7a4b9

JCZuurmond force-pushed the fix/python-ast-tree-unclarities branch from 60bcf2e to 1e7a4b9 Compare January 17, 2025 10:53

Test PythonLinter with dummy advices

249857e

JCZuurmond added 10 commits January 21, 2025 11:26

Test using variable from ran child notebook

9701f17

Test infer from parent using extend globals

02e8cb3

Test infer from grand parent using extend globals

6f8a819

Fix test name

47e3273

Test inferring from sibling tree

e800b7e

Test simulate using value from child notebook

b648b2f

Test simulate using value from parent notebook

25549ec

Test propagating module with extend globals

09470e6

Let NotebookLinter fail early while parsing

2a16d90

Rewrite notebook linter to only extend globals

5bebad4

JCZuurmond temporarily deployed to account-admin January 21, 2025 16:00 — with GitHub Actions Inactive

JCZuurmond added 2 commits January 22, 2025 09:31

Add test showing unresolvable node issue

f25174a

Pass tree globals to next cells tree

0c67f3e

JCZuurmond had a problem deploying to account-admin January 22, 2025 09:46 — with GitHub Actions Error

JCZuurmond commented Jan 22, 2025

Merge branch 'main' into fix/remove-tree-from-python-sequential-linter

a273997

JCZuurmond temporarily deployed to account-admin January 22, 2025 09:58 — with GitHub Actions Inactive

Add assumption to docstring

6360e7b

JCZuurmond mentioned this pull request Jan 23, 2025

Fix cli command to migrate local folder #3520

Draft

5 tasks

asnare reviewed Jan 23, 2025

JCZuurmond added 3 commits January 28, 2025 16:29

Merge branch 'main' into fix/remove-tree-from-python-sequential-linter

54607a6

Add new line to test sources

4afb7e9

Get first advice with next

2c0f32b

JCZuurmond temporarily deployed to account-admin January 28, 2025 15:38 — with GitHub Actions Inactive

JCZuurmond requested a review from FastLee January 28, 2025 15:41

Let process code node return failure

c5a7f28

JCZuurmond temporarily deployed to account-admin January 28, 2025 16:17 — with GitHub Actions Inactive

JCZuurmond requested a review from asnare January 28, 2025 16:21

Merge branch 'main' into fix/remove-tree-from-python-sequential-linter

02e2254

JCZuurmond temporarily deployed to account-admin January 31, 2025 09:03 — with GitHub Actions Inactive
