Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blacken source code? #683

Closed
jamesmyatt opened this issue Jul 8, 2019 · 6 comments · Fixed by #3722
Closed

Blacken source code? #683

jamesmyatt opened this issue Jul 8, 2019 · 6 comments · Fixed by #3722
Milestone

Comments

@jamesmyatt
Copy link
Contributor

Black is emerging as the standard code formatter for Python (https://github.com/python/black, https://www.youtube.com/watch?v=esZLCuWs_2Y)

Are you open to reformatting the code using this? If so, I can submit a PR for it.

@detrin
Copy link

detrin commented Aug 26, 2023

@jamesmyatt That is great idea! Would you be interested in making PR? It should be added to pyproject.toml and maybe into pre-commit.

@thatlittleboy Could you confirm that there is no black formatted used at the moment?

@thatlittleboy
Copy link
Collaborator

thatlittleboy commented Aug 26, 2023

@detrin We discussed that before over at dsgibbons#21, our conclusion was to not use black for the time being.

I think we can repoen the discussion for it at some stage, but I would say it's not a huge priority.

@detrin
Copy link

detrin commented Aug 26, 2023

@thatlittleboy Okay so I read the other issue and all the points are valid.

My habit is that when I join new project that wasn't maintained properly using black and any fancy tools can make the whole developer experience a bit easier. I see it as motor oil.

One concern though is that reformatting with black might make it very difficult to merge any changes between forks, as the diff would be enormous and merge conflicts would likely arise everywhere. It would be great if the original repo were already formatted with black, but we have to be pragmatic.

Agree.

but I would say it's not a huge priority.

Also, agree. I think splitting the repository would also help to make it more maintainable.

@connortann connortann added this to the 0.44.0 milestone Oct 3, 2023
@connortann connortann modified the milestones: 0.44.0, 0.45.0 Dec 6, 2023
@connortann connortann modified the milestones: 0.45.0, 0.46.0 Mar 4, 2024
@CloseChoice
Copy link
Collaborator

@connortann Is this already solved by #3550?

@connortann
Copy link
Collaborator

No, although it's related: that PR only related to docstrings.

We have applied the black code style (implemented via ruff format) to the notebooks, but not yet to the code. We could indeed extend the autoformatting to the code base, although I've been a bit hesitant so far as the diff is huge and would be a bit of a pain to review. However, I think now is probably a good a time as any, so I'll have a look on Monday.

@connortann connortann modified the milestones: 0.46.0, 0.47.0 Jun 26, 2024
@connortann
Copy link
Collaborator

connortann commented Jun 28, 2024

I made a quick script to see what the effect of the formatter for a range of line lengths, counting the lines changed in the resulting git diff.

A line length of 120 seems like a reasonable choice that would keep the diff to a minimum. FYI @CloseChoice

image

Script to reproduce:

import subprocess
import sys
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt

REPO = Path("~/projects/shap").expanduser().resolve()
RUFF = "/home/connor/miniforge3/envs/shap/bin/ruff"


def run_repo_command(command) -> str:
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        shell=True,
        text=True,
        cwd=REPO,
    )
    if result.returncode != 0:
        print(f"Error running command '{command}': {result.stderr}")
        sys.exit(1)
    return result.stdout.strip()


def evaluate_line_length_effect(line_length) -> dict[str, int]:
    print("Evaluating line length effect for length:", line_length)
    # Nb. on my local "format" branch, the .py files are included by the ruff formatter.
    run_repo_command("git checkout format")
    run_repo_command("git reset --hard")
    run_repo_command(f"{RUFF} format shap tests --line-length {line_length}")
    diff = run_repo_command("git diff --shortstat")
    # E.g."181 files changed, 13157 insertions(+), 5131 deletions(-)"
    metrics = [int(s) for s in diff.split() if s.isdigit()]
    assert len(metrics) == 3, "Unexpected output from git diff --shortstat: " + diff
    return dict(
        length=line_length,
        changed=metrics[0],
        insertions=metrics[1],
        deletions=metrics[2],
    )


def scrape():
    data = [evaluate_line_length_effect(length) for length in range(50, 200)]
    df = pd.DataFrame(data)
    df.to_csv("line_length_effect.csv", index=False)


def plot():
    df = pd.read_csv("line_length_effect.csv")
    fig, ax = plt.subplots()
    df.plot(x="length", y="changed", ax=ax)
    ax2 = df.plot(x="length", y="insertions", ax=ax, secondary_y=True)
    ax2.set_ylim(bottom=0)
    plt.show()


if __name__ == "__main__":
    scrape()
    plot()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants