Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

default isort && default black #3322

Closed
wants to merge 1 commit into from
Closed

default isort && default black #3322

wants to merge 1 commit into from

Conversation

dariocurr
Copy link

@dariocurr dariocurr commented Apr 10, 2022

Hi, I am Dario, a developer at BuildNN.

I tried to give a look at the code to solve #3321 but i found the code hard to read. I just gave a default run of isort and black to ease contributions

I would like to fix the plenty of flake8 inconsistencies as well but I don't know if it's appreciated

I guess it's easier and safer to run them by yourself instead of checking thousands of lines, but if you would like to solve all flake8's inconsistencies just say it

@piskvorky
Copy link
Owner

piskvorky commented Apr 10, 2022

Hi, thanks. I'm not opposed in principle, seems like a good idea (although TBH I'm surprised seeing " instead of ' in strings, and imports not sorted alphabetically, gave you enough headache to prompt this PR).

This PR doesn't seem to match the coding style we find practical though. E.g. I see no end trailing comma at the end of lists. Or the > 80 char line length. Is it possible to fix this?

We run automated flake8 tests as part of CI and this PR seems to trigger a lot of errors. I'm not sure whether it's related to your changes or not, but we'd need a squeaking clean CI run before accepting such massive changeset. Its advantage is mild / cosmetic, so its risk must be low / non-existent to match.

@gojomo @mpenkov WDYT?

@piskvorky piskvorky added documentation Current issue related to documentation housekeeping internal tasks and processes labels Apr 10, 2022
@gojomo
Copy link
Collaborator

gojomo commented Apr 10, 2022

The trend towards opinionated auto-formatting, and black in Python, is interesting. I was considering raising it as a future possibility, to align Gensim with larger trends, & forestall some of the digressions/learning-curves involved with getting new devs sync'd to project-style.

But, there's a bit of up-front cost in adjusting habits, and the big reformat diffs in the history that might make other changes hard to follow (like any big change-of-style transition). The biggest factor is how comfortable the existing devs are to the new practices.

Like @piskvorky, the default-to-double-quotes is a bit different than I'd pick, & initially disorienting, but I understand the reasons for it & it's far from a deal-breaker. Line-lengths are actually one of the areas that, despite black's strong opinions, it lets you change. (Though, rigorous application of some of its other line-breaking conventions rules may make its default 88-char limit less confining.) And black will respect, & even add, trailing commas when such series are already, or necessarily, multiline.

So as a matter of going-with-the-overall-open-source-Python-community-practices, I'd be mildly in favor of moving towards black-enforced formattting - but not necessarily ASAP, or in one big jump.

It'd be best to start right after a fairly major, & stable, release – so that there's more time for devs to adjust to the altered-style before any other fixes/release are urgent. While there's a good case to "get it all over with" in one leap, we could also consider "dipping our toes in" with an incremental, module-by-module switchover - perhaps starting with modules that have historically received fewer updates, to iron out any unexpected interactions with flake/CI/doc-comments/etc. Then, with comfort, set a target for a full (& applied-on-commit, enforced-by-CI) switchover.

Some of the edge-case auto-reformatting might be better with a human touch, or review, that still reaches black-conformance. For example, at...

https://github.com/RaRe-Technologies/gensim/blob/79a8d09f6f0720d359a9663207274829801c60b9/gensim/models/keyedvectors.py#L749-L752

...a comment that was just barely over the 88-char limit broke things onto 3 lines, with extra parens - whereas changing the line-ending comment to a line-before comment, sparing the parens, keeping the code on 1 line would strike me as a better style. Still, that's pretty minor (and most such cases might disappear if we alloed a longer line-length), so might not justify the extra effort of human review before going forward.

@piskvorky
Copy link
Owner

piskvorky commented Apr 11, 2022

That's a good point. The strict line length discourages inline comments. Especially if black inserts the comment (weirdly) at the closing bracket – I saw a lot of that in the diff. We'd need to switch all inline comments into standalone-comment-line-before-the-code-line.

No big deal IMO.

So if the black settings allow for it, and if we can reasonably automate the style checks / formatting (git hooks?) in CI, we can go ahead.

But seeing how much time we all "have" lately (I had another kid), I'd probably be in favour of a single full PR switch = task done. I doubt any of us will have the capacity / attention span for incremental tinkering and checking and re-checking, even if that would be ideal.

@dariocurr
Copy link
Author

dariocurr commented Apr 11, 2022

I understand all of your points and agree with you. I am glad that my PR has started an internal discussion.
As I said it is easier and safer for you to change on your own, either step by step or in one big jump.

Yes, the main point is.

The biggest factor is how comfortable the current developers are with the new practices.

Also, I opened this PR and raised this discussion because:

  1. I haven't looked at the code-style wiki section (my bad) (maybe mentioning it in the main README, over here might be helpful)
  2. I couldn't find any specification in the setup.cfg file.

I guess the latter could be something useful for, say, at least:

  • specify the line-length you use
  • specify the flake8's inconsistencies that you want to ignore (e.g E402)

@piskvorky
Copy link
Owner

piskvorky commented Apr 11, 2022

Let's see what @mpenkov thinks but I believe the discussion has reached its conclusion :-)

@dariocurr If you can make black a) split lines at 120 chars instead of 80, b) insert trailing commas, we're good to go.

Flake8 I'm not sure about, I think we have a few exceptions like "Yes this return value is unused but we still assign it and give it a name, because that makes the code easier to read / understand". But that's on a case-by-case basis, I don't know that we have any general "flake8 exceptions".

@mpenkov
Copy link
Collaborator

mpenkov commented Apr 11, 2022

This discussion about code formatting has come up multiple times before, like here.

Personally, I dislike automatic code formatters like black and never use them, for two reasons. First, the cost-to-benefit ratio is not worth it. Sure, with black, your code looks consistent, but it consistently looks like crap (sure, you can tweak some parameters, but I still don't like the output). To me, the main benefit of black seems to end flame wars about code style, but gensim never suffered from that particular problem. Like @piskvorky, it's also difficult for me to see how cosmetic issues made the code so difficult to work with that it prompted this PR. If the code is that crap, then it's unlikely that formatting alone will fix it. So, I don't see the practical benefit of reformatting our entire code base, especially at this stage in the project's lifecycle.

Second, I typically put effort into formatting my code to make it readable, and I don't appreciate some program stomping over that it reformatting it due to some arbitrary rules. That said, the vast majority of the gensim code wasn't written by me, so this is more of a personal point than an argument against applying black to gensim.

In the end, it's @piskvorky's call. If you do go ahead with this, my recommendation would be to try it out on a reasonably-sized module like keyedvectors.py first, and then see if you like the results before applying it to the entire repo. Like @gojomo said, definitely do this after the next major release, because there'll be havoc and it'll take time to bring things back to equilibrium. On top of that, you probably want to merge existing PRs first, because it may be difficult to do once the develop branch has been reformatted.

@dariocurr
Copy link
Author

dariocurr commented Apr 11, 2022

I don't know what to say. This is certainly your call. I raised this discussion because since gensim is an open-source project, in my humble opinion, it would be nice to have a common standard that is easy to use,

@piskvorky black obviously supports custom line length, I should look into the issue of trailing commas instead. I could try to meet your needs, but I guess you should decide what the standards are first.

By the way, what about import sorting?

Anyway, thank you for your time and attention

@mpenkov
Copy link
Collaborator

mpenkov commented Apr 11, 2022

By the way, what about import sorting?

Sure, that makes sense. I think we're currently splitting the imports into three sections:

  1. Standard library
  2. Non-standard library
  3. Gensim submodules

and sorting each section, but this isn't enforced anywhere by linters or otherwise.

@dariocurr dariocurr closed this May 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Current issue related to documentation housekeeping internal tasks and processes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants