Remove tree from PythonSequentialLinter #3535

Merged 99 commits on Feb 4, 2025
Commits
b53e0fd
Let append tree return None
JCZuurmond Jan 15, 2025
108dd7b
Test bidirectionality of appended trees
JCZuurmond Jan 15, 2025
8a22d90
Rename append_tree to attach tree
JCZuurmond Jan 15, 2025
6c0ebcf
Clean test for attaching child tree
JCZuurmond Jan 15, 2025
fd2c761
Rewrite test for module propagation
JCZuurmond Jan 15, 2025
2c46a01
Rewrite test for not implemented error
JCZuurmond Jan 15, 2025
ca30dcf
Test append globals
JCZuurmond Jan 15, 2025
1d22084
Narrow not implemented test
JCZuurmond Jan 15, 2025
48ddf4e
Test appending globals during attach tree
JCZuurmond Jan 15, 2025
1211dda
Refactor `append_globals` to `extend_globals`
JCZuurmond Jan 15, 2025
043fc6b
Test appending nodes sets parent on node
JCZuurmond Jan 15, 2025
b0e39ef
Test appending nodes adds nodes to end of body
JCZuurmond Jan 15, 2025
51358f2
Move append_nodes method up
JCZuurmond Jan 15, 2025
460e88f
Rename append_nodes to attach_nodes
JCZuurmond Jan 15, 2025
357a2e5
Narrow raising not implemented error test
JCZuurmond Jan 15, 2025
d9ffd91
Add docstring for attach nodes
JCZuurmond Jan 15, 2025
a3c023b
Change defining sources in test
JCZuurmond Jan 16, 2025
1e7a4b9
Update constructing sources
JCZuurmond Jan 17, 2025
249857e
Test PythonLinter with dummy advices
JCZuurmond Jan 16, 2025
ba04281
Test linting unparsable python code
JCZuurmond Jan 16, 2025
18e9c98
Test sequential linter with dummy advices
JCZuurmond Jan 16, 2025
fb8d6ee
Test linting print(1) sets no globals
JCZuurmond Jan 16, 2025
96ef1ec
Test linting with one global
JCZuurmond Jan 16, 2025
e5365ef
Test linting with two globals
JCZuurmond Jan 16, 2025
aed3b12
Test linting separate code sources separates globals
JCZuurmond Jan 16, 2025
ee148eb
Test appending globals sets global
JCZuurmond Jan 16, 2025
84b5f69
Remove SquentialLinter.make_tree
JCZuurmond Jan 16, 2025
d142f7c
Refactor globals linter to fetch globals from body nodes
JCZuurmond Jan 16, 2025
00aad28
Sort globals for consistent testing
JCZuurmond Jan 16, 2025
4c1e79e
Test dummy DFSA Python collector
JCZuurmond Jan 16, 2025
f860a67
Test dummy used table Python collector
JCZuurmond Jan 16, 2025
906ba87
Delete dead code `PythonSequentialLinter.process_child_cell`
JCZuurmond Jan 16, 2025
feb0a8c
Format imports
JCZuurmond Jan 16, 2025
73991ae
Remove Tree from python sequential linter
JCZuurmond Jan 16, 2025
4cf6d8a
Fix type hinting for classmethod with child classes
JCZuurmond Jan 16, 2025
04fda6f
Let tree loading return failure
JCZuurmond Jan 16, 2025
7e16589
Connect cells using parents
JCZuurmond Jan 16, 2025
d2db4b6
Pass inherited tree to notebook linter
JCZuurmond Jan 17, 2025
14b8c45
Format
JCZuurmond Jan 17, 2025
5c28fc9
Disable test that does not reflect a realistic scenario
JCZuurmond Jan 17, 2025
1b8f4b5
Pass run cell's tree as parent to the notebook it is running
JCZuurmond Jan 17, 2025
b7998bc
Do not append child nodes to parents body
JCZuurmond Jan 17, 2025
70aa5bf
Use type over Type from type hinting
JCZuurmond Jan 17, 2025
c81e6b1
Rename attach_nodes to attach_child_nodes in Python analyzer
JCZuurmond Jan 17, 2025
3fb49a7
Delete test for unrealistic scenario
JCZuurmond Jan 17, 2025
1ff9891
Rename method to parse trees
JCZuurmond Jan 17, 2025
273a40d
Add tests for notebook linter
JCZuurmond Jan 17, 2025
7728719
Test for a table migration deprecation advice to be given
JCZuurmond Jan 17, 2025
772684c
Test for notebook cells to consider only code above
JCZuurmond Jan 17, 2025
2f4c7fa
Test inverse of previous commit
JCZuurmond Jan 17, 2025
5441ec5
Test inverse of reading table from other cell
JCZuurmond Jan 17, 2025
d6c0afc
Format
JCZuurmond Jan 17, 2025
435f6c8
Let PythonSequentialLinter inherit correctly
JCZuurmond Jan 17, 2025
fd8f04d
Remove PythonSequentialLinter initialization from NotebookLinter init
JCZuurmond Jan 17, 2025
7209aac
Test NotebookLinter to lint parse failure
JCZuurmond Jan 17, 2025
f2dbc56
Remove redundant if statement
JCZuurmond Jan 18, 2025
2603d14
Format
JCZuurmond Jan 20, 2025
23e5a49
Rewrite load children from tree
JCZuurmond Jan 20, 2025
8cef0ef
Remove redundant for-loops
JCZuurmond Jan 20, 2025
82f3ad1
Remove unused name method
JCZuurmond Jan 20, 2025
ce23e4f
Move load tree from run cell up
JCZuurmond Jan 20, 2025
7176276
Rename methods for consistency
JCZuurmond Jan 20, 2025
768706c
Rename Python tree cache for clarity
JCZuurmond Jan 20, 2025
1f44031
Always cache Python trees
JCZuurmond Jan 20, 2025
b80f2d3
Rename methods for clarity
JCZuurmond Jan 20, 2025
54895fb
Rename variable
JCZuurmond Jan 20, 2025
3a8bb37
Add docstrings
JCZuurmond Jan 20, 2025
f46955e
Remove redundant tree initialization
JCZuurmond Jan 20, 2025
be03c3b
Return failures for each notebook cell
JCZuurmond Jan 20, 2025
3646140
Fix expected start and end line
JCZuurmond Jan 20, 2025
b799076
Fix type hint
JCZuurmond Jan 20, 2025
da3eb20
Move from_source_code class method to tester class
JCZuurmond Jan 20, 2025
cf5679d
Change elif to if
JCZuurmond Jan 20, 2025
63a5d22
Merge branch 'main' into fix/remove-tree-from-python-sequential-linter
JCZuurmond Jan 21, 2025
8113828
Rename inherited tree to parent tree
JCZuurmond Jan 21, 2025
ae206b8
Test creating run cell from notebook
JCZuurmond Jan 21, 2025
4b1528f
Test infer value from parent's child
JCZuurmond Jan 21, 2025
5b98b44
Test Python trees simulating notebook running other notebook
JCZuurmond Jan 21, 2025
2e7663f
Test inferring value from grand parent
JCZuurmond Jan 21, 2025
9701f17
Test using variable from ran child notebook
JCZuurmond Jan 21, 2025
02e8cb3
Test infer from parent using extend globals
JCZuurmond Jan 21, 2025
6f8a819
Test infer from grand parent using extend globals
JCZuurmond Jan 21, 2025
47e3273
Fix test name
JCZuurmond Jan 21, 2025
e800b7e
Test inferring from sibling tree
JCZuurmond Jan 21, 2025
b648b2f
Test simulate using value from child notebook
JCZuurmond Jan 21, 2025
25549ec
Test simulate using value from parent notebook
JCZuurmond Jan 21, 2025
09470e6
Test propagating module with extend globals
JCZuurmond Jan 21, 2025
2a16d90
Let NotebookLinter fail early while parsing
JCZuurmond Jan 21, 2025
5bebad4
Rewrite notebook linter to only extend globals
JCZuurmond Jan 21, 2025
f25174a
Add test showing unresolvable node issue
JCZuurmond Jan 22, 2025
0c67f3e
Pass tree globals to next cells tree
JCZuurmond Jan 22, 2025
a273997
Merge branch 'main' into fix/remove-tree-from-python-sequential-linter
JCZuurmond Jan 22, 2025
6360e7b
Add assumption to docstring
JCZuurmond Jan 23, 2025
54607a6
Merge branch 'main' into fix/remove-tree-from-python-sequential-linter
JCZuurmond Jan 28, 2025
4afb7e9
Add new line to test sources
JCZuurmond Jan 28, 2025
2c0f32b
Get first advice with next
JCZuurmond Jan 28, 2025
c5a7f28
Let process code node return failure
JCZuurmond Jan 28, 2025
02e2254
Merge branch 'main' into fix/remove-tree-from-python-sequential-linter
JCZuurmond Jan 31, 2025
35285e3
Merge branch 'main' into fix/remove-tree-from-python-sequential-linter
JCZuurmond Feb 4, 2025
7 changes: 5 additions & 2 deletions src/databricks/labs/ucx/source_code/base.py
@@ -10,7 +10,7 @@
 from dataclasses import dataclass, field
 from datetime import datetime
 from pathlib import Path
-from typing import Any, BinaryIO, TextIO
+from typing import Any, BinaryIO, TextIO, TypeVar
 
 from astroid import NodeNG # type: ignore
 from sqlglot import Expression, parse as parse_sql
@@ -40,6 +40,9 @@
 logger = logging.getLogger(__name__)
 
 
+T = TypeVar("T", bound="Advice")
+
+
 @dataclass
 class Advice:
     code: str
@@ -66,7 +69,7 @@ def for_path(self, path: Path) -> LocatedAdvice:
         return LocatedAdvice(self, path)
 
     @classmethod
-    def from_node(cls, *, code: str, message: str, node: NodeNG) -> Advice:
+    def from_node(cls: type[T], *, code: str, message: str, node: NodeNG) -> T:
         # Astroid lines are 1-based.
         return cls(
             code=code,
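The `cls: type[T]` annotation in the hunk above lets a type checker infer that a classmethod called on a subclass returns that subclass rather than the base `Advice`. A minimal sketch of the pattern follows; the reduced `Advice`, the `from_parts` method, and the `Deprecation` subclass are hypothetical names used only for illustration, not code from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import TypeVar

# Bound TypeVar: T stands for Advice or any subclass of it.
T = TypeVar("T", bound="Advice")


@dataclass
class Advice:
    code: str
    message: str

    @classmethod
    def from_parts(cls: type[T], *, code: str, message: str) -> T:
        # `cls` is the class the method was called on, so a subclass
        # gets an instance of its own type back.
        return cls(code=code, message=message)


@dataclass
class Deprecation(Advice):
    """Hypothetical subclass, only here to show the inferred return type."""


deprecation = Deprecation.from_parts(code="demo", message="example")
print(type(deprecation).__name__)
```

With the previous `-> Advice` annotation the runtime behaviour would be the same; only the statically inferred type of `deprecation` changes.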
253 changes: 119 additions & 134 deletions src/databricks/labs/ucx/source_code/notebooks/sources.py

Large diffs are not rendered by default.

@@ -73,7 +73,7 @@ def build_inherited_context(self, child_path: Path) -> InheritedContext:
             # append nodes
             node_line = base_node.node.lineno
             nodes = tree.nodes_between(last_line + 1, node_line - 1)
-            context.tree.attach_nodes(nodes)
+            context.tree.attach_child_nodes(nodes)
             globs = tree.globals_between(last_line + 1, node_line - 1)
             context.tree.extend_globals(globs)
             last_line = node_line
@@ -86,7 +86,7 @@ def build_inherited_context(self, child_path: Path) -> InheritedContext:
         assert context.tree is not None, "Tree should be initialized"
         if last_line < line_count:
             nodes = tree.nodes_between(last_line + 1, line_count)
-            context.tree.attach_nodes(nodes)
+            context.tree.attach_child_nodes(nodes)
             globs = tree.globals_between(last_line + 1, line_count)
             context.tree.extend_globals(globs)
         return context
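The two hunks above slice a tree by line range and merge the matching globals into the inherited context. The sketch below illustrates that idea with the standard library `ast` module instead of astroid. Both helper functions are hypothetical stand-ins with assumed semantics (inclusive 1-based ranges, definitions appended per name); they are not the UCX `Tree` methods.

```python
from __future__ import annotations

import ast


def nodes_between(module: ast.Module, first_line: int, last_line: int) -> list[ast.stmt]:
    # Assumed semantics: top-level statements that start in the inclusive range.
    return [node for node in module.body if first_line <= node.lineno <= last_line]


def extend_globals(target: dict[str, list[ast.stmt]], nodes: list[ast.stmt]) -> None:
    # Assumed semantics: append definitions per name rather than overwrite them.
    for node in nodes:
        if isinstance(node, ast.Assign):
            for assigned in node.targets:
                if isinstance(assigned, ast.Name):
                    target.setdefault(assigned.id, []).append(node)


source = "a = 1\nb = 2\nrun_child()\na = 3\n"
module = ast.parse(source)

inherited: dict[str, list[ast.stmt]] = {}
# Pretend line 3 runs a child notebook: the child only inherits lines 1-2.
extend_globals(inherited, nodes_between(module, 1, 2))
names_before_run = sorted(inherited)
# After the child returns, the remaining lines are merged as well.
extend_globals(inherited, nodes_between(module, 4, 4))
```

Here `names_before_run` holds the names visible to the child, and `inherited["a"]` ends up with both assignments in source order.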
91 changes: 20 additions & 71 deletions src/databricks/labs/ucx/source_code/python/python_ast.py
@@ -20,6 +20,7 @@
     Import,
     ImportFrom,
     Instance,
+    JoinedStr,
     Module,
     Name,
     NodeNG,
@@ -220,30 +221,31 @@ def __repr__(self):
     def attach_child_tree(self, tree: Tree) -> None:
         """Attach a child tree.
 
-        Attaching a child tree is a **stateful** operation for both the parent and child tree. After attaching a child
-        tree, a tree can be traversed starting from the parent or child tree. From both starting points all nodes in
-        both trees can be reached, though, the order of nodes will be different as that is relative to the starting
-        point.
+        1. Make parent tree of the nodes in the child tree
+        2. Extend parents globals with child globals
+
+        Attaching a child tree is a **stateful** operation for the child tree. After attaching a child
+        tree, the tree can be traversed starting from the child tree as a child knows its parent. However, the tree can
+        not be traversed from the parent tree as that node object does not contain a list with children trees.
         """
         if not isinstance(tree.node, Module):
             raise NotImplementedError(f"Cannot attach child tree: {type(tree.node).__name__}")
         tree_module: Module = cast(Module, tree.node)
-        self.attach_nodes(tree_module.body)
+        self.attach_child_nodes(tree_module.body)
         self.extend_globals(tree_module.globals)
 
-    def attach_nodes(self, nodes: list[NodeNG]) -> None:
-        """Attach nodes.
+    def attach_child_nodes(self, nodes: list[NodeNG]) -> None:
+        """Attach child nodes.
 
-        Attaching nodes is a **stateful** operation for both this tree's node, the parent node, and the child nodes.
-        After attaching the nodes, the parent node has the nodes in its body and the child nodes have this tree's node
-        as parent node.
+        Attaching a child tree is a **stateful** operation for the child tree. After attaching a child
+        tree, the tree can be traversed starting from the child tree as a child knows its parent. However, the tree can
+        not be traversed from the parent tree as that node object does not contain a list with children trees.
         """
         if not isinstance(self.node, Module):
             raise NotImplementedError(f"Cannot attach nodes to: {type(self.node).__name__}")
         self_module: Module = cast(Module, self.node)
         for node in nodes:
             node.parent = self_module
-            self_module.body.append(node)
Comment from the PR author:

This is a notable change! Appending the child nodes to the body of the parent tree duplicated those nodes across all trees; no longer appending them avoids the duplicates.
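To make the effect of dropping `self_module.body.append(node)` concrete, here is a small sketch with a hypothetical `Node` class standing in for astroid nodes (it is not the astroid API). The child statements learn their new parent, the parent's body stays untouched, and traversal therefore only works upwards.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for an astroid node; not the astroid API."""

    name: str
    parent: Node | None = None
    body: list[Node] = field(default_factory=list)


def attach_child_nodes(parent: Node, nodes: list[Node]) -> None:
    # Only the children are mutated: each one learns its parent. The
    # parent's body is left alone, so no node is listed in two bodies.
    for node in nodes:
        node.parent = parent


def ancestors(node: Node) -> list[str]:
    # Walk upwards through the parent pointers.
    names = []
    current = node.parent
    while current is not None:
        names.append(current.name)
        current = current.parent
    return names


parent_module = Node("parent")
child_module = Node("child")
statement = Node("x = 1", parent=child_module)
child_module.body.append(statement)

attach_child_nodes(parent_module, child_module.body)
```

After the call, `ancestors(statement)` reaches the parent module, while `parent_module.body` is still empty and `child_module.body` still holds the single statement.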


     def extend_globals(self, globs: dict[str, list[NodeNG]]) -> None:
         """Extend globals by extending the global values for each global key.
@@ -559,6 +561,11 @@ def visit_importfrom(self, node: ImportFrom) -> None:
             return
         self._matched_nodes.append(node)
 
+    def visit_joinedstr(self, node: JoinedStr) -> None:
+        if self._node_type is not JoinedStr:
+            return
+        self._matched_nodes.append(node)
+
     def _matches(self, node: NodeNG, depth: int) -> bool:
         if depth >= len(self._match_nodes):
             return False
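The added `visit_joinedstr` follows the same guard-then-collect shape as the other visitor methods: skip the node unless its type was requested. As a rough analogue using the standard library `ast.NodeVisitor` (not the astroid-based visitor from this file; `TypeCollector` is a hypothetical name), the pattern looks like this:

```python
import ast


class TypeCollector(ast.NodeVisitor):
    """Collects nodes of one requested type; a stdlib analogue, not the UCX visitor."""

    def __init__(self, node_type: type) -> None:
        self._node_type = node_type
        self.matched: list = []

    def visit_JoinedStr(self, node: ast.JoinedStr) -> None:
        # Same guard as in the diff: ignore f-strings unless they were requested.
        if self._node_type is not ast.JoinedStr:
            return
        self.matched.append(node)


source = 'table = "sales"\nquery = f"SELECT * FROM {table}"\n'

fstring_collector = TypeCollector(ast.JoinedStr)
fstring_collector.visit(ast.parse(source))

name_collector = TypeCollector(ast.Name)
name_collector.visit(ast.parse(source))
```

Only the collector created for `ast.JoinedStr` records the f-string; the one created for `ast.Name` records nothing because this sketch defines no other `visit_*` method.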
@@ -674,7 +681,8 @@ def collect_dfsas(self, source_code: str) -> Iterable[DirectFsAccess]:
     def collect_dfsas_from_tree(self, tree: Tree) -> Iterable[DirectFsAccessNode]: ...
 
 
-class PythonSequentialLinter(Linter, DfsaCollector, TableCollector):
+class PythonSequentialLinter(PythonLinter, DfsaPyCollector, TablePyCollector):
Comment from the PR author:

Big cleanup in this class!

+    """A linter for sequencing python linters and collectors."""
 
     def __init__(
         self,
@@ -685,74 +693,15 @@ def __init__(
         self._linters = linters
         self._dfsa_collectors = dfsa_collectors
         self._table_collectors = table_collectors
-        self._tree: Tree | None = None
 
-    def lint(self, code: str) -> Iterable[Advice]:
-        maybe_tree = self._parse_and_append(code)
-        if maybe_tree.failure:
-            yield maybe_tree.failure
-            return
-        assert maybe_tree.tree is not None
-        yield from self.lint_tree(maybe_tree.tree)
-
     def lint_tree(self, tree: Tree) -> Iterable[Advice]:
         for linter in self._linters:
             yield from linter.lint_tree(tree)
 
-    def _parse_and_append(self, code: str) -> MaybeTree:
-        maybe_tree = MaybeTree.from_source_code(code)
-        if maybe_tree.failure:
-            return maybe_tree
-        assert maybe_tree.tree is not None
-        self.append_tree(maybe_tree.tree)
-        return maybe_tree
-
-    def append_tree(self, tree: Tree) -> None:
-        self._make_tree().attach_child_tree(tree)
-
-    def append_nodes(self, nodes: list[NodeNG]) -> None:
-        self._make_tree().attach_nodes(nodes)
-
-    def append_globals(self, globs: dict) -> None:
-        self._make_tree().extend_globals(globs)
-
-    def process_child_cell(self, code: str) -> None:
-        this_tree = self._make_tree()
-        maybe_tree = MaybeTree.from_source_code(code)
-        if maybe_tree.failure:
-            # TODO: bubble up this error
-            logger.warning(maybe_tree.failure.message)
-            return
-        assert maybe_tree.tree is not None
-        this_tree.attach_child_tree(maybe_tree.tree)
-
-    def collect_dfsas(self, source_code: str) -> Iterable[DirectFsAccess]:
-        maybe_tree = self._parse_and_append(source_code)
-        if maybe_tree.failure:
-            logger.warning(maybe_tree.failure.message)
-            return
-        assert maybe_tree.tree is not None
-        for dfsa_node in self.collect_dfsas_from_tree(maybe_tree.tree):
-            yield dfsa_node.dfsa
-
     def collect_dfsas_from_tree(self, tree: Tree) -> Iterable[DirectFsAccessNode]:
         for collector in self._dfsa_collectors:
             yield from collector.collect_dfsas_from_tree(tree)
 
-    def collect_tables(self, source_code: str) -> Iterable[UsedTable]:
-        maybe_tree = self._parse_and_append(source_code)
-        if maybe_tree.failure:
-            logger.warning(maybe_tree.failure.message)
-            return
-        assert maybe_tree.tree is not None
-        for table_node in self.collect_tables_from_tree(maybe_tree.tree):
-            yield table_node.table
-
     def collect_tables_from_tree(self, tree: Tree) -> Iterable[UsedTableNode]:
         for collector in self._table_collectors:
             yield from collector.collect_tables_from_tree(tree)
-
-    def _make_tree(self) -> Tree:
-        if self._tree is None:
-            self._tree = Tree.new_module()
-        return self._tree
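Going by the hunk header (15 lines remain on the new side), the class keeps only the `*_from_tree` methods and no longer owns a tree, so it reduces to fanning one tree out to each wrapped linter or collector. The sketch below shows that delegation shape under simplifying assumptions: the "tree" is just a list of source lines, and `PrintLinter`, `TodoLinter`, and `SequentialLinter` are hypothetical classes, not the UCX ones.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from dataclasses import dataclass


@dataclass
class Advice:
    code: str
    message: str


class PrintLinter:
    """Hypothetical linter working on already-parsed source lines."""

    def lint_tree(self, tree: list[str]) -> Iterator[Advice]:
        for line in tree:
            if line.startswith("print("):
                yield Advice("print-call", line)


class TodoLinter:
    """Hypothetical linter flagging TODO markers."""

    def lint_tree(self, tree: list[str]) -> Iterator[Advice]:
        for line in tree:
            if "TODO" in line:
                yield Advice("todo", line)


class SequentialLinter:
    def __init__(self, linters: list) -> None:
        self._linters = linters

    def lint_tree(self, tree: list[str]) -> Iterable[Advice]:
        # No tree is stored between calls; the caller owns the tree
        # and every wrapped linter sees the same one, in order.
        for linter in self._linters:
            yield from linter.lint_tree(tree)


tree = ["print(1)", "x = 1  # TODO: rename"]
advices = list(SequentialLinter([PrintLinter(), TodoLinter()]).lint_tree(tree))
```

Because the wrapper keeps no state, linting two unrelated sources through the same instance cannot leak nodes or globals from one into the other, which is the kind of coupling the removed `_tree` attribute introduced.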