-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove tree from PythonSequentialLinter
#3535
base: main
Are you sure you want to change the base?
Conversation
60bcf2e
to
1e7a4b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asnare and @pritishpai : Please review this PR after #3550. I expect to create more PRs to continue to improve the linter
return | ||
for cell in self._notebook.cells: | ||
if not self._context.is_supported(cell.language.language): | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is why: #3544
@@ -129,163 +125,162 @@ class NotebookLinter: | |||
to the code cells according to the language of the cell. | |||
""" | |||
|
|||
@classmethod | |||
def from_source( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was only used by tests, it should not be the way to create the notebook linter in UCX
""" | ||
if not isinstance(self.node, Module): | ||
raise NotImplementedError(f"Cannot attach nodes to: {type(self.node).__name__}") | ||
self_module: Module = cast(Module, self.node) | ||
for node in nodes: | ||
node.parent = self_module | ||
self_module.body.append(node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a notable change! It avoids duplicates as adding nodes to the body of the parent tree duplicates the nodes (over all trees)
@@ -668,7 +675,8 @@ def collect_dfsas(self, source_code: str) -> Iterable[DirectFsAccess]: | |||
def collect_dfsas_from_tree(self, tree: Tree) -> Iterable[DirectFsAccessNode]: ... | |||
|
|||
|
|||
class PythonSequentialLinter(Linter, DfsaCollector, TableCollector): | |||
class PythonSequentialLinter(PythonLinter, DfsaPyCollector, TablePyCollector): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Big clean up in this class!
|
||
# COMMAND ---------- | ||
|
||
# MAGIC %run ./test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Magic run was missing in example notebook test, combined with the expected output in the yaml below
@@ -244,7 +244,6 @@ def test_functional(sample: Functional, mock_path_lookup, simple_dependency_reso | |||
("_child_that_uses_missing_value.py", "parent_that_dbutils_runs_child_that_misses_value_from_parent.py"), | |||
("_child_that_uses_value_from_parent.py", "grand_parent_that_dbutils_runs_parent_that_magic_runs_child.py"), | |||
("_child_that_uses_missing_value.py", "parent_that_imports_child_that_misses_value_from_parent.py"), | |||
("_child_that_uses_value_from_parent.py", "grand_parent_that_imports_parent_that_magic_runs_child.py"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is removed as it tests a non-existing situation: a file imports a notebook that runs another notebook. It does not work because the imported notebook is considered to be a file because our import resolver always returns files and not notebooks (as it should)
def test_notebook_linter_name(mock_path_lookup) -> None: | ||
source = """-- Databricks notebook source""" | ||
linter = NotebookLinter.from_source(index, mock_path_lookup, CurrentSessionState(), source, Language.SQL) | ||
assert linter.name() == "notebook-linter" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused method/attribute outside this unit test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly fine, although some minor things I'd like clarified. (Not requesting changes because I don't want to block merging if someone else approves, but also don't think it's ready to merge just yet.)
code_path_nodes = self._list_magic_lines_with_run_command(tree) + SysPathChange.extract_from_tree( | ||
self._session_state, tree | ||
) | ||
maybe_tree = MaybeTree(None, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also be:
maybe_tree = MaybeTree(None, None) | |
maybe_tree: MaybeTree |
That will then let the linter expose a problem: if there are no code_path_nodes
to iterate over then we return an invalid instance.
What should we return in that case? (Can it be ruled out… do we always have nodes to iterate over?)
Finally, why do we return the last tree? Is it somehow more important than any earlier ones? (If we don't care, why return it? In that case the return type should probably be Failure | None
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On your first point, I updated the signature to be MaybeTree | None
, which works - for now.
On the "why do we return the last tree", that logic is t.b.d.. At least, this PR resolves a bug with the notebook linting and adds (much) more unit tests. At the same time, I do not know exactly (yet) how the Python trees should be linked/merged for all to work as expected.
So, currently, it returns the last tree as the trees are sort of chained, e.g. the second notebook can go "up" to the first cell and the third cell can go "up" to the second (and via the second cell "find" the first cell). It becomes more complicated when introducing new notebooks with the run
magic. That is the part I am not sure about yet, but I am more confident than before this PR as it introduces more tests
Changes
Remove tree from
PythonSequentialLinter
as the sequential linter should just sequence linting, not be used as an intermediate for manipulating the code tree.PythonSequentialLinter
NotebookLinter
to do the (notebook) tree manipulation instead:_load_tree_from_notebook
early return onFailure
similarly to dependency graph building: if we cannot resolve the code used by a notebook then fail early%run
notebook trees a child tree to the cell that calls the notebookLinked issues
Resolves #3543
Progresses #3514
Linked PRs
Stacked on :
Tree
methods for clarity #3524Requires :
NotebookLinter._load_source_from_run_cell
#3529MaybeTree
the main Python AST entrypoint for constructing the syntax tree #3550Functionality
databricks labs ucx lint-local-code
Tests