Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AddImportsVisitor: add imports before the first non-import statement #1024

Merged
merged 5 commits into from
Oct 1, 2023

Conversation

andrecsilva
Copy link
Contributor

Summary

Proposed fix for #1023. It changes AddImportsVisitor so it only considers adding new imports after/at the first import block before any other statements, if any.

Test Plan

Added a few tests to test_add_imports.py

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 22, 2023
@codecov-commenter
Copy link

codecov-commenter commented Sep 23, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.05%. Comparing base (37277e5) to head (80dcbf8).
Report is 107 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1024      +/-   ##
==========================================
+ Coverage   91.04%   91.05%   +0.01%     
==========================================
  Files         255      255              
  Lines       26366    26414      +48     
==========================================
+ Hits        24004    24052      +48     
  Misses       2362     2362              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@zsol zsol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current behavior is useful in cases like:

from a import side_effect
a()

from b import foo
...
bar()

When we request the import for b.bar, the current behavior extends the second ImportFrom statement, but with this PR, we would end up with:

from a import side_effect
from b import bar
a()

from b import foo
...
bar()

Which is at least a bit unfortunate. It's also unfortunate that we don't capture this in any test case.

I think my main concern stems from the fact that we'd be always inserting imports before the first non-import statement even if it isn't necessary. If you could change the PR to only apply the new behavior when the old would obviously break code, I'd be happy to accept it

Comment on lines 20 to 32
class _GatherTopImportsBeforeStatements(GatherImportsVisitor):
"""
Works similarly to GatherImportsVisitor, but only considers imports
declared before any other statements of the module with the exception
of docstrings and __strict__ flag.
"""

def leave_Module(self, original_node: libcst.Module) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a great use of inheritance, _GatherTopImportsBeforeStatements doesn't really specialize the behavior of GatherImportsVisitor, it completely changes it.

A better way to model this would be to extract the two _handle_... methods into a mixin class and let both visitors inherit from that.

Then _GatherTopImportsBeforeStatements won't need to override visit_ImportFrom and visit_Import with dummy implementations (and potentially others in the future). Instead you could just implement visit_Module and return False at the end. Bonus points for making it significantly faster for large documents.

libcst/codemod/visitors/_add_imports.py Outdated Show resolved Hide resolved
libcst/codemod/visitors/_add_imports.py Outdated Show resolved Hide resolved
libcst/codemod/visitors/_add_imports.py Outdated Show resolved Hide resolved
@andrecsilva
Copy link
Contributor Author

I think the current behavior is useful in cases like:

from a import side_effect
a()

from b import foo
...
bar()

When we request the import for b.bar, the current behavior extends the second ImportFrom statement, but with this PR, we would end up with:

from a import side_effect
from b import bar
a()

from b import foo
...
bar()

Which is at least a bit unfortunate. It's also unfortunate that we don't capture this in any test case.

I think my main concern stems from the fact that we'd be always inserting imports before the first non-import statement even if it isn't necessary. If you could change the PR to only apply the new behavior when the old would obviously break code, I'd be happy to accept it

Thanks for pointing the issues. I've hopefully fixed most of them with the last commit.

I do have some questions about what is the expected behavior of AddImportsVisitor. I understand we want to avoid "duplicated" imports as you've pointed out, but the issue seems a bit more complex.

First, note that the current behavior is a bit more problematic than it seems. Consider the following and suppose we want to add from x import bar:

foo()
def f():
    from x import a
    pass
from x import b
bar()
from x import c

The current behavior would add bar to from x import a. It does so because it finds the first from x import ... from the visit_ImportFrom perspective and adds the new import there.

Now consider:

foo()
def f():
    from x import a
    bar()
from x import b
foo()
from x import c
bar()

What do you feel is the best place(s) to add bar import(s) here?

Do we want it to always add it to the "nearest" import block? or do we want a single import that covers all references to bar()?

Nevertheless, I feel that a general solution that cover these properties would be far more complex than what we have and the proposed solution in this PR is a good compromise.

Of course, if you have a good/simple solution for this, I'd be happy to accept suggestions.

@zsol
Copy link
Member

zsol commented Sep 30, 2023

I agree, AddImportsVisitor won't be able to do a perfect job all the time unless we include expensive scope analysis into it (and even then, there's only so much we can do). It's a balance. I do think the current behavior strikes a pretty good balance, and we should only add support for edge cases if they don't compromise on that balance.

And to directly answer your question: no, I have no solution for the cases you point out that wouldn't make AddImportsVisitor significantly more expensive.

The changes in this PR look good at a quick glance, but give me a bit more time to properly review them again, before merging.

@zsol zsol changed the title Add imports fix AddImportsVisitor: add imports before the first non-import statement Oct 1, 2023
@zsol
Copy link
Member

zsol commented Oct 1, 2023

Thanks for putting this together, @andrecsilva!

@zsol zsol merged commit f81cc8d into Instagram:main Oct 1, 2023
23 of 24 checks passed
manmartgarc pushed a commit to manmartgarc/LibCST that referenced this pull request Oct 3, 2023
…nstagram#1024)

* AddImportsVisitor will now only add at the top of module

- Also added new tests to cover these cases

* Fixed an issue with from imports

* Added a couple tests for AddImportsVisitor

* Refactoring of GatherImportsVisitor

* Refactors, typos and typing changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants