`unnecessary_join` lint #8579

yoav-lavi · 2022-03-24T12:19:10Z

Adds a lint called unnecessary_join that detects cases of .collect::<Vec<String>>.join("") or .collect::<Vec<_>>.join("") on an iterator, suggesting .collect::<String>() instead

Fixes: #8570

This is a reopen of #8573

changelog: add lint [unnecessary_join]

rust-highfive · 2022-03-24T12:19:13Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Manishearth (or someone else) soon.

Please see the contribution instructions for more information.

yoav-lavi · 2022-03-24T12:19:25Z

r? @flip1995

yoav-lavi · 2022-03-24T12:21:51Z

@flip1995 this is the same PR only with a squashed commit

flip1995 · 2022-03-24T12:49:15Z

Thanks! @bors r+

flip1995 · 2022-03-24T12:57:02Z

@bors r+

Seems like bors lost track of this PR. I re-synced bors, let's hope it works this time.

bors · 2022-03-24T12:57:05Z

📌 Commit b60a7fb has been approved by flip1995

bors · 2022-03-24T12:57:11Z

⌛ Testing commit b60a7fb with merge e29b3b5...

bors · 2022-03-24T13:11:05Z

☀️ Test successful - checks-action_dev_test, checks-action_remark_test, checks-action_test
Approved by: flip1995
Pushing e29b3b5 to master...

flip1995 · 2022-03-24T13:13:54Z

Great first contribution!

I noticed today that I never asked first-time contributors about our dev experience. So:

How was your experience as a first time contributor to Clippy? Did you find the docs helpful? How was your experience with our tooling? Is there anything you found in the docs/tooling that was unclear? Any other things we could improve on?

It would be great if you could give me/us a bit of feedback on (some of) those questions. You can also write me on Zulip/Discord/email, if you prefer. However, don't feel obligated to give us feedback, if you don't have time / don't want to :)

yoav-lavi · 2022-03-24T15:43:05Z

Thank you @flip1995!

It was a good experience! The docs were helpful for getting the project working and working with VSCode / Clion, and cargo dev was very nice to work with.

Here's a list of some things I had to gather from context, some might be due to being relatively new to Rust / compilers (some of these might be in the docs and I missed them):

Naming conventions in the project, e.g. "recv" (recursive? receive? if so, what's recursive or being received?), "ty" (type), "tcx" (T context? the lifetime of the context), "bless" (basically snapshot / approve the current stdout / stderr files)
The types of contexts and which to use (I recall seeing a general definition in the docs but not why you'd chose one or the other when creating a lint)
Where things are (rustc_hir / rustc_ast / rustc_middle etc.)
Method lints aren't normal lints and as such aren't covered by the lint generator directly (might be worth it to make an interactive step there and ask which kind of lint the user wants to create)
The suggestion in the lint implementation is also what's used for automatic fixing
Finding which span is the correct one to use (might be easier if there was some sort of automatic debug / pretty print showing the current match for the lint + which spans cover what within the match, would especially be helpful for with_hi / with_lo situations)
- This point might be related to the fact that debugging mostly involved printing, the tests I ran (I'm assuming) used a binary and didn't stop on breakpoints
Which fields are what when destructing a node (e.g. ExprKind::MethodCall(_path, [join_self_arg, join_arg], _span))
Creating a new test (where, what it should include, etc.)
Not being able to use a raw string in the lint description might be something others run into
When matching a method in method/mod.rs, the name you're matching is last rather than first (e.g. "join" for .collect().join())
The fact that a method isn't being handled in methods/mod.rs doesn't mean it shouldn't be there / it's is specifically for iterators rather than vectors / etc. but rather that it wasn't needed until now
Knowing whether my lint already existed / was relevant

General suggestions:

Example implementations for common lints / operations

Hopefully this helps! I had fun contributing and this is mostly a list of what came to mind and the items aren't in a particular order or were all necessarily large hurdles. I might be able to expand on the list later

flip1995 · 2022-03-24T16:27:41Z

Thanks for the summary!

I will include some of your review once I finally finish the Clippy book. I copied your comment to #6628 to not forget about it.

I can give you some answers now though:

recv = receiver and means the self argument of a method. We have a table of general abbreviations here, but it looks like we should extend this. 'tcx is the lifetime of everything that is connected with type information in rustc. Basically the lifetime of THIR.
Where things are: I guess this is just a matter of getting familiar with the compiler. the rustc_dev_guide is a great place to get a high level overview.

Knowing whether my lint already existed / was relevant

I guess we should monitor and respond to issues better. I guess we don't have the capacity to triage our issues as well as I would like it.

Example implementations for common lints / operations

Common Tools for writing lints should contain common operations for lints. "Common lints" is basically looking at existing lints.

yoav-lavi · 2022-03-24T16:38:31Z

You're welcome @flip1995!

Thanks for the answers!

I will include some of your review once I finally finish the Clippy book. I copied your comment to #6628 to not forget about it.

In what context would it be included?

I guess we should monitor and respond to issues better. I guess we don't have the capacity to triage our issues as well as I would like it.

Didn't mean this as an accusation, more in the direction of possibly creating an index of sort of existing lints (rather than just a list of names, with which items they relate to etc.) and possibly a playground to check if a piece of code is linted using all rules. The Rust playground has clippy but I'm not sure it turns everything on by default.

Edit: The clippy lint list seems to work with keywords like "iterator" so this might already be covered

Common Tools for writing lints should contain common operations for lints. "Common lints" is basically looking at existing lints.

Not sure I saw this when working on the lint, will take a deeper look. Regarding common lints - You can look at existing code, but having a few examples that are indexed by what they're doing would probably be helpful (rather than trying to find lints which do what you want) (the previous point might tie into this using the indexed list of lints).

flip1995 · 2022-03-24T16:48:11Z

In what context would it be included?

I plan to rewrite some of our dev decumentation to make it easier finding things. I will use your feedback as some pointers where I might want to start with that.

Didn't mean this as an accusation, more in the direction of possibly creating an index of sort of existing lints

No worries, I didn't took it that way.

We have the Lint documentation where searching for some keywords like "join" would show you that such a lint didn't exist. Maybe we could improve the search, but I'm not sure how further improve that.

Not sure I saw this when working on the lint, will take a deeper look.

If you find things are missing, that you feel would have been useful there, feel free to open a PR adding those or leave a comment in #6628

yoav-lavi · 2022-03-24T16:50:04Z

Sounds good!

We have the Lint documentation where searching for some keywords like "join" would show you that such a lint didn't exist. Maybe we could improve the search, but I'm not sure how further improve that.

I just noticed that that works and edited the original, that does cover the general usecase!

yoav-lavi · 2022-03-25T19:20:53Z

@flip1995 something else I ran into today and might be worth considering:

It seems like the pedantic category isn't always known to users

flip1995 · 2022-03-25T19:29:33Z

Thanks for pointing this out! We have that kind of on the radar and will also improve on that with the Clippy book. (If only the day had more than 24 hours, I would've already finished that 😐)

It's done by adding #![warn(clippy::pedantic)] to your lib.rs/main.rs file, btw. :)

yoav-lavi · 2022-03-25T19:51:05Z

Yeah the first comment on that question had the answer (which has eerily similar wording to your comment here haha):

I ran into https://reddit.com/r/rust/comments/a4wblu/how_to_configure_clippy_to_be_as_annoying_as/ a while back when looking up how to add more lint categories (this isn't my comment), but thought to share that with you since it may indicate a certain lack of visibility for non-default lints.

(If only the day had more than 24 hours, I would've already finished that 😐)

Not at all meant as criticism! Open source is hard, often thankless work done for free and from good intentions, my point is to bring new information to help the project rather than imply anything

flip1995 · 2022-03-25T19:57:27Z

Not at all meant as criticism! Open source is hard, often thankless work done for free and from good intentions, my point is to bring new information to help the project rather than imply anything

Thanks a lot for this, that really helps, even though I can't act on it right away! I definitely do not take your comments as if they were bad intended. I just have a 10 hour work day behind me and don't have the patience to put more time in formulating my replies better. 😅 Thanks again for your feedback. That will really help! 🚀

yoav-lavi · 2022-03-25T20:03:35Z

That's completely fine, just wanted to make sure I'm not sending across the wrong message.

Have a great weekend!

matthiasbeyer · 2022-03-26T09:57:24Z

I found this via discu.eu - very useful lint indeed.

May I ask: This seems to only be applicable if the join() argument is an empty string, right?
If I want to join a string by something else, say ", ", what would I rather write:

iter.collect::<Vec<String>>().join(", ");

// or
iter.intersperse(", ").collect::<String>()

I like to believe that the second version is more performant, but who I am to just guess this? Either way: Maybe there's potential for another lint?

yoav-lavi · 2022-03-26T10:50:14Z

Interesting! You may be onto something. I'm not next to a computer right now but I would benchmark the two using a few different variations to check. Notice that intersperse is nightly though so it may make sense to wait for it to be stable. We should let the maintainers weigh in though (cc @flip1995)

Edit: regarding your question, the current version supports "" specifically

matthiasbeyer · 2022-03-26T11:06:41Z

Notice that intersperse is nightly though so it may make sense to wait for it to be stable.

Ah, I didn't even notice this. Maybe though even some other iterator-chaining is more performant. I will also do some benchmarks (will be a nice learning experience) and post them here.

matthiasbeyer · 2022-03-26T11:23:20Z

So writing the benchmark was less complex than I thought. Maybe I did something wrong, because I never wrote benchmarks for rust code before. Here's the playground, it is implemented on nightly.

I get varying results though:

    Finished bench [optimized] target(s) in 0.01s
     Running unittests src/lib.rs (target/release/deps/bench_join-5b81b7a4eb2ee675)

running 4 tests
test bench_collect_join             ... bench:         146 ns/iter (+/- 7)
test bench_intersperce_collect_join ... bench:         200 ns/iter (+/- 7)
test bench_intersperce_manual_join  ... bench:         301 ns/iter (+/- 8)
test bench_manual_join              ... bench:         103 ns/iter (+/- 2)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured; 0 filtered out; finished in 12.13s

On my machine, the "manual join" implementation (the one with collect().join("") seems to produce the fastest code. When using the intersperse() function, the collect::<String>() implementation is way faster. Of course the problem size is really small here, maybe I need to provide a larger problem to the implementations.

matthiasbeyer · 2022-03-26T11:32:31Z

Ah yes, a bigger problem yields better visibility in the results:

running 4 tests
test bench_collect_join             ... bench:     570,037 ns/iter (+/- 64,868)
test bench_intersperce_collect_join ... bench:     699,265 ns/iter (+/- 39,398)
test bench_intersperce_manual_join  ... bench:   1,205,594 ns/iter (+/- 49,033)
test bench_manual_join              ... bench:     823,865 ns/iter (+/- 23,774)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured; 0 filtered out; finished in 7.41s

with (1..10_000) as problem on my machine.

yoav-lavi · 2022-03-26T11:50:37Z

The first result might be due to loop unrolling, does it work differently with iter or with a slightly longer input? Or when using a vec rather than an array?

matthiasbeyer · 2022-03-26T11:57:17Z

Might be, but a quick test (using vec![...] instead of [...]) showed similiar results.

yoav-lavi · 2022-03-26T12:09:38Z

I'll be able to check this later on but we did run into cases where the compiler unrolls the manual join but not the collect join, makes sense to me that this is the case here

yoav-lavi · 2022-03-26T19:52:26Z

@matthiasbeyer I get correct results:

use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn criterion_benchmark(criterion: &mut Criterion) {
    let mut benchmark_group = criterion.benchmark_group("unnecessary_join");

    benchmark_group.bench_function("collect and join", |bencher| {
        bencher.iter(|| {
            black_box(["1", "2", "3", "4", "5"])
                .into_iter()
                .map(String::from)
                .collect::<Vec<String>>()
                .join("")
        })
    });

    benchmark_group.bench_function("collect only", |bencher| {
        bencher.iter(|| {
            black_box(["1", "2", "3", "4", "5"])
                .into_iter()
                .map(String::from)
                .collect::<String>()
        })
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

Benchmarking unnecessary_join/collect and join: Collecting 100 samples in estimated                                                                                     unnecessary_join/collect and join                        
                        time:   [198.64 ns 199.32 ns 199.96 ns]
Benchmarking unnecessary_join/collect only: Collecting 100 samples in estimated 5.00                                                                                    unnecessary_join/collect only                        
                        time:   [157.65 ns 157.98 ns 158.32 ns]

yoav-lavi · 2022-03-26T20:05:28Z

@matthiasbeyer with the other benchmarks:

(MacBook Pro (14-inch, 2021), 8 core M1 Pro, 16 GB RAM)

use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn criterion_benchmark(criterion: &mut Criterion) {
    let mut benchmark_group = criterion.benchmark_group("unnecessary_join");

    benchmark_group.bench_function("array_into_iter/collect_and_join", |bencher| {
        bencher.iter(|| {
            black_box(["1", "2", "3", "4", "5"])
                .into_iter()
                .map(String::from)
                .collect::<Vec<String>>()
                .join("")
        })
    });

    benchmark_group.bench_function("array_into_iter/collect_only", |bencher| {
        bencher.iter(|| {
            black_box(["1", "2", "3", "4", "5"])
                .into_iter()
                .map(String::from)
                .collect::<String>()
        })
    });

    benchmark_group.bench_function("vec_iter/collect_and_join", |bencher| {
        bencher.iter(|| {
            let vector = black_box(vec!["hello", "world"]);
            let _output = vector
                .iter()
                .map(|item| item.to_uppercase())
                .collect::<Vec<String>>()
                .join("");
        })
    });

    benchmark_group.bench_function("vec_iter/collect_only", |bencher| {
        bencher.iter(|| {
            let vector = black_box(vec!["hello", "world"]);
            let _output = vector
                .iter()
                .map(|item| item.to_uppercase())
                .collect::<String>();
        })
    });

    benchmark_group.bench_function("deno/collect_and_join", |bencher| {
        bencher.iter(|| {
            let url = black_box("https://google.com".to_owned());
            let split_specifier = url.as_str().split(':');
            let _ = split_specifier.skip(1).collect::<Vec<_>>().join("");
        })
    });

    benchmark_group.bench_function("deno/collect_only", |bencher| {
        bencher.iter(|| {
            let url = black_box("https://google.com".to_owned());
            let split_specifier = url.as_str().split(':');
            let _ = split_specifier.skip(1).collect::<String>();
        })
    });

    benchmark_group.bench_function("not_unrolled/collect_and_join", |bencher| {
        bencher.iter(|| {
            let _ = black_box(std::iter::repeat("'a, "))
                .take(10)
                .collect::<Vec<_>>()
                .join("");
        })
    });

    benchmark_group.bench_function("not_unrolled/collect_only", |bencher| {
        bencher.iter(|| {
            let _ = black_box(std::iter::repeat("'a, "))
                .take(10)
                .collect::<String>();
        })
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

unnecessary_join/array_into_iter/collect_and_join                        
                        time:   [197.77 ns 197.92 ns 198.06 ns]    
unnecessary_join/array_into_iter/collect_only                        
                        time:   [160.49 ns 160.86 ns 161.24 ns]                                                                            

unnecessary_join/vec_iter/collect_and_join                        
                        time:   [156.61 ns 156.72 ns 156.83 ns]       
unnecessary_join/vec_iter/collect_only                        
                        time:   [118.48 ns 118.53 ns 118.59 ns]                                                                           

unnecessary_join/deno/collect_and_join                        
                        time:   [93.969 ns 94.001 ns 94.041 ns]      
unnecessary_join/deno/collect_only                        
                        time:   [70.695 ns 70.762 ns 70.835 ns]                                                                            

unnecessary_join/not_unrolled/collect_and_join                        
                        time:   [90.768 ns 90.830 ns 90.920 ns]   
unnecessary_join/not_unrolled/collect_only                        
                        time:   [139.73 ns 139.87 ns 140.02 ns]

The last benchmark being an example where the loop doesn't unroll when using collect::<String>()

yoav-lavi · 2022-03-26T20:15:52Z

It should yield a ~22% improvement in normal cases according to this benchmark

yoav-lavi · 2022-03-26T20:38:32Z

@matthiasbeyer It seems that when compiling these to x86_64-apple-darwin and running under Rosetta 2 the first two benchmarks are slightly in favor of collect and join, I think that's what you're seeing:

unnecessary_join/array_into_iter/collect_and_join                        
                        time:   [326.13 ns 326.82 ns 327.52 ns]
unnecessary_join/array_into_iter/collect_only                        
                        time:   [347.46 ns 347.93 ns 348.39 ns]
      
unnecessary_join/vec_iter/collect_and_join
                        time:   [261.55 ns 262.66 ns 263.74 ns]
unnecessary_join/vec_iter/collect_only                        
                        time:   [285.76 ns 286.24 ns 286.70 ns]     

unnecessary_join/deno/collect_and_join                        
                        time:   [169.84 ns 170.09 ns 170.32 ns]   
unnecessary_join/deno/collect_only                        
                        time:   [134.34 ns 134.47 ns 134.60 ns]

unnecessary_join/not_unrolled/collect_and_join                        
                        time:   [185.72 ns 185.91 ns 186.10 ns]
unnecessary_join/not_unrolled/collect_only                        
                        time:   [281.17 ns 281.30 ns 281.48 ns]

This is the assembly diff for an example that behaves like the first benchmark on x86_64: https://rust.godbolt.org/z/GKqhq5o3a

@flip1995 let me know if you think we should change the wording

matthiasbeyer · 2022-03-27T08:01:48Z

Thanks for helping me understanding what is going on here! 👍

flip1995 · 2022-04-06T08:08:38Z

@flip1995 let me know if you think we should change the wording

Maybe "usually more performant" -> "might be more performant" and "in most cases" -> "sometimes".

yoav-lavi · 2022-04-06T09:31:33Z

@flip1995 should I create a new PR?

flip1995 · 2022-04-06T09:32:20Z

That would be great!

yoav-lavi · 2022-04-06T09:45:24Z

@flip1995 I know some that people / projects use the pedantic category entirely rather than on a case basis, would it be a concern that those people / projects would have this change made automatically?

It's sort of contrary to the description of the category but I want to make sure we don't need to do anything specific regarding those cases

update unnecessary_join documentation changelog: none Updates the description of `unnecessary_join` in accordance with #8579 (comment). I've also added a line regarding differences in assembly output, please let me know if it should also make it in.

unnecessary_join lint

b60a7fb

rust-highfive assigned Manishearth Mar 24, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Mar 24, 2022

rust-highfive assigned flip1995 and unassigned Manishearth Mar 24, 2022

bors merged commit e29b3b5 into rust-lang:master Mar 24, 2022

flip1995 mentioned this pull request Mar 24, 2022

[Roadmap] Improve Clippy documentation #6628

Open

4 tasks

This was referenced Apr 6, 2022

unnecessary_join documentation update #8640

Closed

update unnecessary_join documentation #8644

Merged

blyxyas mentioned this pull request Jan 13, 2023

Compiler optimization suggestion - Iter<String>.collect::<Vec<String>>().join("") rust-lang/rust#95165

Closed

unnecessary_join lint #8579

unnecessary_join lint #8579

Uh oh!

Conversation

yoav-lavi commented Mar 24, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rust-highfive commented Mar 24, 2022

Uh oh!

yoav-lavi commented Mar 24, 2022

Uh oh!

yoav-lavi commented Mar 24, 2022

Uh oh!

flip1995 commented Mar 24, 2022

Uh oh!

flip1995 commented Mar 24, 2022

Uh oh!

bors commented Mar 24, 2022

Uh oh!

bors commented Mar 24, 2022

Uh oh!

bors commented Mar 24, 2022

Uh oh!

flip1995 commented Mar 24, 2022

Uh oh!

yoav-lavi commented Mar 24, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

flip1995 commented Mar 24, 2022

Uh oh!

yoav-lavi commented Mar 24, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

flip1995 commented Mar 24, 2022

Uh oh!

yoav-lavi commented Mar 24, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yoav-lavi commented Mar 25, 2022

Uh oh!

flip1995 commented Mar 25, 2022

Uh oh!

yoav-lavi commented Mar 25, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

flip1995 commented Mar 25, 2022

Uh oh!

yoav-lavi commented Mar 25, 2022

Uh oh!

matthiasbeyer commented Mar 26, 2022

Uh oh!

yoav-lavi commented Mar 26, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

matthiasbeyer commented Mar 26, 2022

Uh oh!

matthiasbeyer commented Mar 26, 2022

Uh oh!

matthiasbeyer commented Mar 26, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yoav-lavi commented Mar 26, 2022

Uh oh!

matthiasbeyer commented Mar 26, 2022

Uh oh!

yoav-lavi commented Mar 26, 2022

Uh oh!

yoav-lavi commented Mar 26, 2022

Uh oh!

yoav-lavi commented Mar 26, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yoav-lavi commented Mar 26, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yoav-lavi commented Mar 26, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

matthiasbeyer commented Mar 27, 2022

Uh oh!

`unnecessary_join` lint #8579

`unnecessary_join` lint #8579

yoav-lavi commented Mar 24, 2022 •

edited

Loading

yoav-lavi commented Mar 24, 2022 •

edited

Loading

yoav-lavi commented Mar 24, 2022 •

edited

Loading

yoav-lavi commented Mar 24, 2022 •

edited

Loading

yoav-lavi commented Mar 25, 2022 •

edited

Loading

yoav-lavi commented Mar 26, 2022 •

edited

Loading

matthiasbeyer commented Mar 26, 2022 •

edited

Loading

yoav-lavi commented Mar 26, 2022 •

edited

Loading

yoav-lavi commented Mar 26, 2022 •

edited

Loading

yoav-lavi commented Mar 26, 2022 •

edited

Loading

yoav-lavi commented Apr 6, 2022 •

edited

Loading