Skip to content

[Tracking] Rollout of new lint clippy::needless_pass_by_value in all datafusion crates #18503

@2010YOUY01

Description

@2010YOUY01

Is your feature request related to a problem or challenge?

clippy::needless_pass_by_value is a non-default clippy lint rule, however this rule is quite useful for preventing certain unnecessary clones.
See more about supporting more non-default Clippy rules for datafusion in #18467
The example PR to enforce this rule in datafusion-optimizer crate: #18533

Task: Enforce needless_pass_by_value rule to all crates

Rationale

clippy::needless_pass_by_value's description can be found in https://rust-lang.github.io/rust-clippy/master/index.html#needless_pass_by_value. It is non-default lint rule probably it's too annoying for the general cases. Specifically, it has the Clippy category pedantic, and its description is lints which are rather strict or have occasional false positives` from https://doc.rust-lang.org/nightly/clippy

However, the violation of this rule might further introduce unnecessary clones, and datafusion has been suffering from the performance issue caused by redundant clones for quite some time. See EPIC for slow planning issue: #5637

Example: violation of needless_pass_by_value that leads to an unnecessary clone

#[allow(clippy::needless_pass_by_value)]
fn print_len(v: Vec<i32>) {
    println!("{}", v.len());
}

fn main() {
    let data = vec![1, 2, 3, 4];
    // We need to CLONE because `print_len` takes ownership.
    print_len(data.clone());
    println!("Still need data: {:?}", data);
}

If the linter can catch such issues, there will be no unnecessary copies

fn print_len(v: &[i32]) {
    println!("{}", v.len());
}

fn main() {
    let data = vec![1, 2, 3, 4];
    print_len(&data); // no clone, just borrow
    println!("Still need data: {:?}", data);
}

Describe the solution you'd like

Initial/example PR: #18468

  1. For a certain package, add this extra lint rule in lib.rs
// https://github.com/apache/datafusion/issues/18503
// Performance-related lint; safe to disable in tests to avoid noise.
#![deny(clippy::needless_pass_by_value)]
#![cfg_attr(test, allow(clippy::needless_pass_by_value))]

If a package is too large, we can carry it out module by module.
2. Fix violations if they might be on the performance critical path. For the following cases, it's okay to suppress lint violations using #[expect(clippy::needless_pass_by_value)]:

  • Tests. (the above macro already skips tests)
  • Intentional moves
  • Public APIs
    • For example, when modifying the datafusion-common crate, if a pub fn is externally visible (i.e., accessible from other crates), it should not be changed in order to maintain API compatibility.
  • Clearly not performance critical and also very hard to fix
  1. Ensure ./dev/rust_lint.sh passes

Describe alternatives you've considered

No response

Additional context

The individual tasks are quite AI-friendly, but make sure you understand the rationale and do a self-review before sending a PR; otherwise, the code review process can become painful.
See datafusion AI PR policy for details: https://datafusion.apache.org/contributor-guide/index.html#ai-assisted-contributions

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requesthelp wantedExtra attention is needed

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions