Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove tree from PythonSequentialLinter #3535

Open
wants to merge 98 commits into
base: main
Choose a base branch
from

Conversation

JCZuurmond
Copy link
Member

@JCZuurmond JCZuurmond commented Jan 17, 2025

Changes

Remove tree from PythonSequentialLinter as the sequential linter should just sequence linting, not be used as an intermediate for manipulating the code tree.

  • Remove tree manipulation related logic from PythonSequentialLinter
  • Rewrite NotebookLinter to do the (notebook) tree manipulation instead:
    • Let _load_tree_from_notebook early return on Failure similarly to dependency graph building: if we cannot resolve the code used by a notebook then fail early
    • Attach subsequent cell as child tree to the cell before
    • Attach %run notebook trees a child tree to the cell that calls the notebook

Linked issues

Resolves #3543
Progresses #3514

Linked PRs

Stacked on :

Requires :

Functionality

  • modified code linting related
  • modified existing command: databricks labs ucx lint-local-code

Tests

  • manually tested
  • added and modified unit tests

@JCZuurmond JCZuurmond added migrate/code Abstract Syntax Trees and other dark magic migrate/python Pull requests that update Python code python Pull requests that update Python code labels Jan 17, 2025
@JCZuurmond JCZuurmond self-assigned this Jan 17, 2025
@JCZuurmond JCZuurmond requested a review from a team as a code owner January 17, 2025 10:07
@JCZuurmond JCZuurmond force-pushed the fix/python-ast-tree-unclarities branch from 60bcf2e to 1e7a4b9 Compare January 17, 2025 10:53
Copy link
Member Author

@JCZuurmond JCZuurmond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asnare and @pritishpai : Please review this PR after #3550. I expect to create more PRs to continue to improve the linter

return
for cell in self._notebook.cells:
if not self._context.is_supported(cell.language.language):
try:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why: #3544

@@ -129,163 +125,162 @@ class NotebookLinter:
to the code cells according to the language of the cell.
"""

@classmethod
def from_source(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was only used by tests, it should not be the way to create the notebook linter in UCX

"""
if not isinstance(self.node, Module):
raise NotImplementedError(f"Cannot attach nodes to: {type(self.node).__name__}")
self_module: Module = cast(Module, self.node)
for node in nodes:
node.parent = self_module
self_module.body.append(node)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a notable change! It avoids duplicates as adding nodes to the body of the parent tree duplicates the nodes (over all trees)

@@ -668,7 +675,8 @@ def collect_dfsas(self, source_code: str) -> Iterable[DirectFsAccess]:
def collect_dfsas_from_tree(self, tree: Tree) -> Iterable[DirectFsAccessNode]: ...


class PythonSequentialLinter(Linter, DfsaCollector, TableCollector):
class PythonSequentialLinter(PythonLinter, DfsaPyCollector, TablePyCollector):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big clean up in this class!


# COMMAND ----------

# MAGIC %run ./test
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic run was missing in example notebook test, combined with the expected output in the yaml below

@@ -244,7 +244,6 @@ def test_functional(sample: Functional, mock_path_lookup, simple_dependency_reso
("_child_that_uses_missing_value.py", "parent_that_dbutils_runs_child_that_misses_value_from_parent.py"),
("_child_that_uses_value_from_parent.py", "grand_parent_that_dbutils_runs_parent_that_magic_runs_child.py"),
("_child_that_uses_missing_value.py", "parent_that_imports_child_that_misses_value_from_parent.py"),
("_child_that_uses_value_from_parent.py", "grand_parent_that_imports_parent_that_magic_runs_child.py"),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is removed as it tests a non-existing situation: a file imports a notebook that runs another notebook. It does not work because the imported notebook is considered to be a file because our import resolver always returns files and not notebooks (as it should)

def test_notebook_linter_name(mock_path_lookup) -> None:
source = """-- Databricks notebook source"""
linter = NotebookLinter.from_source(index, mock_path_lookup, CurrentSessionState(), source, Language.SQL)
assert linter.name() == "notebook-linter"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused method/attribute outside this unit test

Copy link
Contributor

@asnare asnare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly fine, although some minor things I'd like clarified. (Not requesting changes because I don't want to block merging if someone else approves, but also don't think it's ready to merge just yet.)

src/databricks/labs/ucx/source_code/notebooks/sources.py Outdated Show resolved Hide resolved
src/databricks/labs/ucx/source_code/notebooks/sources.py Outdated Show resolved Hide resolved
code_path_nodes = self._list_magic_lines_with_run_command(tree) + SysPathChange.extract_from_tree(
self._session_state, tree
)
maybe_tree = MaybeTree(None, None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also be:

Suggested change
maybe_tree = MaybeTree(None, None)
maybe_tree: MaybeTree

That will then let the linter expose a problem: if there are no code_path_nodes to iterate over then we return an invalid instance.

What should we return in that case? (Can it be ruled out… do we always have nodes to iterate over?)

Finally, why do we return the last tree? Is it somehow more important than any earlier ones? (If we don't care, why return it? In that case the return type should probably be Failure | None).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On your first point, I updated the signature to be MaybeTree | None, which works - for now.

On the "why do we return the last tree", that logic is t.b.d.. At least, this PR resolves a bug with the notebook linting and adds (much) more unit tests. At the same time, I do not know exactly (yet) how the Python trees should be linked/merged for all to work as expected.

So, currently, it returns the last tree as the trees are sort of chained, e.g. the second notebook can go "up" to the first cell and the third cell can go "up" to the second (and via the second cell "find" the first cell). It becomes more complicated when introducing new notebooks with the run magic. That is the part I am not sure about yet, but I am more confident than before this PR as it introduces more tests

src/databricks/labs/ucx/source_code/notebooks/sources.py Outdated Show resolved Hide resolved
tests/unit/source_code/notebooks/test_sources.py Outdated Show resolved Hide resolved
tests/unit/source_code/notebooks/test_sources.py Outdated Show resolved Hide resolved
tests/unit/source_code/notebooks/test_sources.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
migrate/code Abstract Syntax Trees and other dark magic migrate/python Pull requests that update Python code python Pull requests that update Python code
Projects
Status: Ready for Review
Development

Successfully merging this pull request may close these issues.

[BUG]: Notebook linter includes lower cells within linting context
2 participants