-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement uninit_vec lint #7682
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @xFrednet (or someone else) soon. Please see the contribution instructions for more information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Welcome to the project! I've done a first review of the changes. The general logic is already checking for the correct conditions. 👍 My comments are mostly related to improvements of this logic and refactorings to catch more cases 🙃
Let me know if anything is unclear, or you need help with anything. 🙃
Note: You can also assign yourself to the linked issue by commenting @rustbot claim
on it. That's just to make sure that no one else also starts to work on the issue. :)
clippy_lints/src/uninit_vec.rs
Outdated
|
||
impl<'tcx> LateLintPass<'tcx> for UninitVec { | ||
fn check_block(&mut self, cx: &LateContext<'tcx>, block: &'tcx Block<'_>) { | ||
for w in block.stmts.windows(2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only catches cases where Vec::with_capacity(..);
is directly followed by vec.set_len(..);
. This will miss cases where the user has some additional statements in between. In Clippy we try to handle such lints by first finding the value initialization (in this case let vec = Vec::with_capacity(..);
) and then inspecting how the value is used in the rest of the block.
The initialization would be found by checking each local with check_local
, similar to how you're current doing it. We then have two options to check for the usage of the value:
- A
Visitor
can be used to check each expression inside a given block. Here you could get the containing block of the value declaration and walk each expression. You could then lint, if the first usage of the value is actually a call tovec.set_len(..);
- The second option is to store the ID of the value in
Self
and then check for the usage in this lint pass, which is essentially also just a visitor.
A great and hopefully not too confusing example is thevec_init_then_push
lint. You can find the code inclippy_lints/src/vec_init_then_push.rs
Option 2 would be a bit better performance wise, but can be a bit more complex. You can also try to implement this using a Visitor
. Let me know if you want to also have some example code for this 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only catches cases where
Vec::with_capacity(..);
is directly followed byvec.set_len(..);
This was an intentional decision to reduce false positives. The statements between them might be a legit initialization:
self.vec.reserve(1000);
self.init_vec(); // correctly initializes `self.vec`, e.g., by using `mem::write()`
self.vec.set_len(1000); // this call is now correct
For statistics, checking directly adjacent statements are enough to detect most of incorrect usage of set_len()
. We have a list of 30 soundness bugs that fall into this category, and 28 of them can be caught with the current implementation. The other two patterns that are not detected are:
let mut a = Vec::with_capacity(len);
let mut b = Vec::with_capacity(len);
unsafe {
a.set_len(len);
b.set_len(len);
}
and
let mut vec = Vec::with_capacity(len);
unsafe {
vec.set_len(len);
another_unsafe_fn(&mut vec);
}
I think I can slightly modify the block handling logic to detect the second case. Then, it would detect 29 out of 30 cases. Would that be good enough, or would you still prefer me to try detecting the first case?
My concern is the implementation complexity. (1) The block handling logic would become complicated because with_capacity()/reserve()
and set_len()
might not be in the same block due to unsafe
blocks. (2) The visitor needs to detect whether the target Vec
is saved to a local variable or to the field of a struct. Only the first case would be safe to detect with the visitor due to the false positives mentioned above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, from the example, I see what you mean. Cases where the Vec
is stored inside a struct or tuple would not be caught, that's correct. Most lints related to sequential usage only work for locals, as far as I know. Do you know how many of the mentioned cases, used the vector as a part of a struct or tuple?
(1) The block handling logic would become complicated because
with_capacity()/reserve()
andset_len()
might not be in the same block due tounsafe
blocks.
The Visitor or LintPass both also visit nested blocks. They would therefore also work with unsafe blocks or more complicated logic.
(2) The visitor needs to detect whether the target
Vec
is saved to a local variable or to the field of a struct. Only the first case would be safe to detect with the visitor due to the false positives mentioned above.
Hmm, I think the best way to handle this would be to track the local that the vector is stored in and check for the first usage. This would mean the vector itself if it's a local value or the struct that the vector is stored in. This shouldn't add too much additional complexity (I believe...).
However, I'm not totally sure on this. @flip1995 could you maybe take a look at this? 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know how many of the mentioned cases, used the vector as a part of a struct or tuple?
They were roughly 15% of the bugs.
The Visitor or LintPass both also visit nested blocks. They would therefore also work with unsafe blocks or more complicated logic.
Hmm, I think the best way to handle this would be to track the local that the vector is stored in and check for the first usage.
Oh, now I see what you were suggesting by "The second option is to store the ID of the value in Self
". I was initially confused by vec_init_then_push
example because it handles neither nested blocks nor in-between statements. Is it correct that you were providing it as an example of how to use a lint pass as a visitor while maintaining the state in self
, not necessarily as an example for implementing the visitor logic itself?
I understand what I need to do in that case. However, I think I might still prefer the current version over the version that checks in-between statements due to the simplicity and possible false positives. Would you mind if I open a separate PR for the visitor version that checks in-between statements? Then maybe we can compare the two versions and merge the one that looks better. I'll start working on it after addressing the comments about the lint messages below.
In addition, could you recommend a lint example that handles in-between statements? I was thinking of using HashMap
that maps the HirId
of Local
to its tracked status, and I was wondering if there is something that I can reuse from the existing lints. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know how many of the mentioned cases, used the vector as a part of a struct or tuple?
They were roughly 15% of the bugs.
In that case, we should definitely take care to also cover those cases. We can later use our lintcheck
tool to run Clippy on them to make sure that they are all caught by this lint 🙃
Oh, now I see what you were suggesting by "The second option is to store the ID of the value in
Self
". I was initially confused byvec_init_then_push
example because it handles neither nested blocks nor in-between statements. Is it correct that you were providing it as an example of how to use a lint pass as a visitor while maintaining the state inself
, not necessarily as an example for implementing the visitor logic itself?
Correct, sorry if the wording was a bit confusing. Clippy uses the internal Visitor
API from rustc. I found two examples that both check for the usage of a value (example 1, example 2). Note that both use path_to_local_id
as they are only tracking local values. You'll need to make sure that the tracked value is set to the base where the vector is stored. This will most likely require some additional logic to handle Field
expressions.
Would you mind if I open a separate PR for the visitor version that checks in-between statements?
I wouldn't mind at all, that would be a nice way to compare the two versions. If you do, you can request me as a reviewer by including r? @xFrednet
in your PR description 🙃
In addition, could you recommend a lint example that handles in-between statements? I was thinking of using
HashMap
that maps theHirId
ofLocal
to its tracked status, and I was wondering if there is something that I can reuse from the existing lints. Thanks!
If we handle in-between statements, we do so in the described fashion with either a Visitor
in example 1 and example 2 or a bit more advances by using the LintPass
directly, as can be seen in the vec_init_then_push
example. In case of Visitors, we usually use one per local value that could potentially trigger the lint.
I'm not totally sure where you want to use a HashMap
. Could you maybe describe what you were thinking of? 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use a separate visitor struct, we can create a visitor for each local variable (e.g., the use of id
field in needless_collect lint) so HashMap
wouldn't be needed in this case. However, it would have a quadratic performance in terms of # locals/block size. This seems fine since a visitor will be only created for each with_capacity()/reserve()
. I would probably implement another PR in this way.
If we use LintPass
itself as a visitor and handle all variables in a single loop, it would need to manage a state for each variable and might need a data structure like HashMap
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. It would be interesting to do some profiling to compare a Visitor
implementation to a LintPass
implementation. This lint can choose either way, due to the fact that the actual logic for each expression is relatively simple. 🙃
I would recommend that you choose the method that seems more readable to you 🙃
Hmm, the CI failure seems like a false positive in the internal lint. I can't collapse |
True, that seems to be a false positive, you can simply allow the internal lint, with an |
I handled the PR comments, and this one ready to be reviewed again. I'll create another PR for an alternative implementation of this lint that checks in-between statements as discussed in the comment. |
Thank you for the update, I'll review it over the week. It's also awesome that you've already created an issue for the FP of our internal lint 👍 Small note regarding your last commit message: we usually try to avoid linking issues in commit messages, as that'll add a new reference when you would need to rebase your branch. And it'll also reference a Rust issue once we sync commits between this repo and the compiler repo. You can leave it in, that's just a note for future commits 🙃 |
☔ The latest upstream changes (presumably #7673) made this pull request unmergeable. Please resolve the merge conflicts. |
I thought a bit about the Visitor/LintPass vs You can prevent FP by checking if the assignee is used in between the So even though I like the Visitor version more, I'd be fine with the |
There are some use cases like recursive parsing, where the outer function allocates a buffer for the inner function and the inner function initializes it. I think preventing such FP is necessary when we take the visitor-based implementation (especially because the suggested category is "correctness").
I was thinking about using the originating place of projections and being conservative. For instance,
Merging this PR first and working on the visitor version based on it might be better than having two separate PRs for me as well. How about that @xFrednet? |
This conservative approach sounds good to me and was also the one I was thinking of 👍
Sure, that sounds good to me 👍. I want to take a final look before merging it. I should be able to review it on Monday 🙃
Could you add this documentation to the code as a preparation? 🙃 @rustbot label -S-waiting-on-author +S-waiting-on-review |
SGTM 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some smaller NITs that should be easy to fix. Please also add a note to the docs that the lint currently only supports statements that are next to each other. Then it's ready to be merged IMO. Thank you for the contributions 🙃
Addressed PR comments, need feedback on #7682 (comment) |
☔ The latest upstream changes (presumably #7773) made this pull request unmergeable. Please resolve the merge conflicts. |
Added testcases for |
Forgot to address the merge conflict 😅 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, there is one last NIT, and then I think it's ready to be merged! 👍
Description updated! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks good to me, thank you very much for all the work you put into this lint! I really appreciate it 🙃
@bors r+ 🎉 |
📌 Commit 4ed3a4f has been approved by |
☀️ Test successful - checks-action_dev_test, checks-action_remark_test, checks-action_test |
Thank you as well for the helpful and detailed reviews :) |
changelog: add the new lint [
uninit_vec
]Fix #7681