-
Notifications
You must be signed in to change notification settings - Fork 532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Candidate clang-format config file #603
Conversation
To run it, in the root of the source tree: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To approve painful changes, it would be nice to see a well-articulated proposal covering the obvious concerns and any known caveats: Why do you think that clang-format advantages are worth the large amount of conflicts the switch will create across branches? What are those advantages, and why are they worth the cost? Can clang-format do much better than mimicking our current (very limited) Code Style? Etc.
As you know from my squid-dev email, I only looked at a small portion of the previous incarnation of this branch but, FWIW, I saw:
- many small improvements (e.g., [1]),
- some small regressions (e.g., [2,3]),
- more complex #include ordering rules (e.g., [4]),
- numerous bad comment movements (e.g., [5]),
- numerous pointless(?) last empty line removals (e.g., [6]).
I suspect some of the problems mentioned above can be reduced via clang-format configuration adjustments, but I did not study this issue.
The old references below were posted on squid-dev earlier but no longer work. I did not invest the time necessary to refresh them for this PR. Their English descriptions can be used to navigate the diff.
-
Better tertiary operator spacing near master...kinkie:clang-format#diff-94b81725fceba313922066c3ce6071a7R19
-
Broken xgetnameinfo() formatting near master...kinkie:clang-format#diff-31086c4678ad7442b4024e422f6e4351R318
-
Extra ip6_parsenumeric() function name indent near master...kinkie:clang-format#diff-31086c4678ad7442b4024e422f6e4351R337
-
This was discussed in a few comments of another PR, starting with Maintenance: Move sort-includes.pl to scripts/maintenance/ #599 (comment)
-
Manual comment alignment for related code lines is a Bad Idea, but even if we do automate all such alignments, Squid code is not written to help clang identify related lines correctly as illustrated by the
jobs_ member declaration changes near master...kinkie:clang-format#diff-57b6a8db44fb379395f9be80a4c44c5aR82 -
Virtually any modified source file can be used as an example, but here is one (compat/inet_pton.h) where nothing else has changed: master...kinkie:clang-format#diff-563683e32b22291b9d5622652ec544d9L33
Moving the conversation over from PR599:
I think a rule makes sense to avoid circular dependencies. Sure, we could chase them but what's the benefit?
In terms of translation units, it's not too difficult. In the end it's a convention and no more.
There is for sure space for iterations; the configuration is a first attempt.
Some high visibility projects are using it (among others, chromium, webkit, qt, linux). Of course this is not to say that we need to, just that it has been tested in large codebases and found adequate. If we adopt it, it is likely there will be aspects of our coding style we'll need to change if we want to adopt the tool. The question is if it's worth it. I am quite convinced it is, but my level of acceptance of large changes is notoriously higher than average
From my experiment, this risk is very low. It is also possible to use git commit hooks to ensure that new PRs are already conformant when they are submitted. Our current ad hoc solution makes this less feasible.
Sure |
FYI: I do not recall seeing a coherent set of arguments convincing others that this change is worth it. Perhaps the reasons are obvious to you. They are not to me, and I cannot get the answers myself without investing a lot of time. The current lack of a well-articulated proposal does not mean we should not switch, of course. It means that we are operating in the dark, discussing without structure, and that is clearly a bad way to make decisions.
In what way does an astyle-based solution make anything less feasible? I see no serious feasibility differences between astyle and clang-format as far as formatting automation is concerned. What am I missing? |
[2] doesn't seem like a regression but it's not what we want either. [3] We can probably get rid of SQUIDCEXTERN by now, I suppose that is what's confusing the parser; there is not a lot of C calling into c++ anymore, is there?
I tested your idea of using negative weights to enforce our rules to override the "main include" concept and it works.
We can try to switch the AlignTrailingComments to false
This cannot be configured that I could see. Is it a dealbreaker?
I've added them as code blocks below.
I think you're referring to compat/GnuRegex.c:
|
Unfortunately, your guesses for old references 1 and 2 were wrong. I cannot quickly supply correct snippets because the PR branch code they were based on is no longer easily available on GitHub (one more reason to avoid force-pushes in publicly shared branches and to provide stable diffs folks can annotate). I doubt it is critically important to restore all of those references though -- I am sure you can find a lot more improvements and regressions than I did if you carefully inspect the current diff. You have correctly restored the old reference 3, thank you! I doubt that regression is related to Please do not misinterpret my six items as a comprehensive list. I just wanted to provide a glimpse at what issues a proper "formatter replacement" proposal should investigate/cover/address IMO. I am not working on such a proposal. |
Good point. I will build a PR for SQUIDCEXTERN removal tonight. All the src/ ones can simply erase, the include and compat ones need converting to C++ compatible sub-scope form. |
I'll try to focus on providing a reasonable set of arguments to try and argue why clang-format is better for us than the existing astyle + formatter.pl + other custom solutions as we are using now One thing that is I think not under contention is that using some tool to enforce some coding style is beneficial; this reduces friction and formatting changes due to individual contributors' own styles and tooling configuration differences. To get this benefit, it doesn't really matter what the style is as long as it's consistent.
For a post explaining why and how mongodb switched from astyle to clang-format , please see https://engineering.mongodb.com/post/succeeding-with-clangformat-part-1-pitfalls-and-planning/ I hope these arguments will be enough to convince that there is benefit in going ahead; if so, we can discuss about what style to adopt, where the tool supports (or mandates) something which is not covered in our existing guidelines |
Thank you. FTR, I have been asking for analysis not because I have serious specific/substantiated doubts about clang-format advantages, but because I think we should make well-informed decisions, especially when it comes to large-cost changes. Please keep this overall goal in mind when reading my responses below. We need good information to make a decision. This is not an exersize of convincing me (or anybody else) that you are right. This is an exercise in presenting a high-quality analysis.
Yes, of course. I would even go one step further -- the more [reasonable] styling we can automatically apply/enforce, the better. Ideally, all styling should be automatically applied/enforced.
FYI: AFAICT, the posted experiment does not demonstrate what you are describing because it does not compare clang-format changes across recent releases (those are shown) with astyle changes across recent releases (not shown AFAICT).
The last part, which is crucial to your argument, has not been fairly demonstrated AFAICT. To be fair, we have to show lots of changes beyond our control when upgrading from Astyle v3.0 to v3.1. The current ancient Astyle version we use is a red herring here IMO -- clearly, we could have upgraded it much earlier. This is less important, but I can speculate that clang-format releases led to significant formatting changes in the earlier days of that tool as well.
Yes, although I would rephrase this assertion to emphasize the long-term potential of a clang-based tool rather than our confidence in the absence of clang-format bugs: Having access to higher quality input does not necessarily imply fewer output bugs. It only means that clang-format can (potentially) do more. Other factors decrease the probability of clang-format bugs a lot more than seeing a detailed AST IMO.
Can you quickly estimate the number of configuration options that astyle has? That clang-format has? I do not think we need an in-depth analysis here. I hope that an estimate or two would either support this point well or remove it from the consideration.
Can you make the opposite argument about astyle? If not, this is not a relevant claim (or, to be more precise, this is a necessary precondition for using a tool).
Yes, this is valuable, assuming there are no (or fewer) astyle integrations.
Astyle supports all that as well AFAICT so this first part is unimportant for this analysis IMO.
This may be important. If we switch, we will go for a flag day, but the ability to reformat patches may be very valuable anyway.
If I ignore invalid and incomplete arguments, then I am left with "higher long-term potential", "better integration with editors", and "patch reformatting support". Out of those three, only the "better integration with editors" argument can be honestly used as a partial justification for switching "today". I do not think that is enough to warrant such a costly switch. However, I hope you can demonstrate that the clang-format offers a lot more control over the sources, which could (with all the other factors being equal or better for clang-format) be enough to warrant a switch. Also, if possible, please see if you can find other examples of largish projects switching from astyle to clang-format (and confirm that you cannot find examples of the opposite switches!). This information can boost our confidence that we are on the right path (even if those projects switch for wrong or unrelated reasons, like the mongodb example you linked to). |
I agree.
It is indeed the case that astyle version 3.1 has the same results as 2.04; we can probably remove the version check. @yadij , what do you think?
I have counted 54 options in astyle and 107 in clang-format. I have skimmed through the manuals, and tried to group options influencing the same behavior in different ways.
A quick search on google has not found any major project mentioning astyle in their check-in instructions.
I agree.
Is the comparison in number of option sufficient to convince you of that? In order to go more in depth I would have to manually compare the semantic of each of these 100 different options, you can imagine how I am not really keen to do it
astyle -> clang-format: I've found open-quantum-safe (open-quantum-safe/liboqs#56), openthread (openthread/openthread#2533) . Interestigly, I've stumbled upon whatastyle (https://pypi.org/project/whatstyle/), which claims to try and match a style format best matching the style a source is written in. |
Thank you for reporting this. These stats do not raise any red flags, but the difference alone is not substantial enough to claim a proof of a significant advantage IMO.
No, that would be rather irrational of me. It is just a good stepping stone towards a convincing demonstration. After all, it is not about the raw number of options but what they control. I was hoping for a more drastic difference in raw numbers (e.g., an order of magnitude) or a completely different approach to specifying the desirable format.
FWIW, I do not think that comparing the semantics of each option is necessary or even helpful for those reviewing the "let's switch to clang-format" proposal. It would simply overwhelm the reviewers with low-level information. If we are going to switch, then I would expect somebody to go through those 107 options and set them the best we can (including leaving some at their defaults), but that activity can probably wait and is rather different from the comparative option-by-option analysis you rightfully want to avoid. The proposal needs to demonstrate that clang-format is substantially better and, hence, worth the switching costs. If I were to make that proposal, I would try to show several specific areas of significant improvements over what modern astyle can do and demonstrate that there are no serious regressions. With 100 configuration options, I suspect both demonstrations are possible, but past demonstrations did neither. With your updates in mind, I will re-state the current pros and cons in the next comment. |
AFAICT, here are the current switching arguments, with a few invalid ones thrown away and a few adjustments based on Francesco's research and the discussion: Pros:
Cons:
Conclusion: Clang-format has several important advantages (see pros1 and pros2) and causes no important formatting regressions (see cons4 for a list of minor ones). Switching is likely so we should start enjoying a better tool sooner rather than later. Let's work on the best format/configuration file that we can agree on and switch. All that is left is filling out X, Y, Z as well as A, B, C... |
From our code style guidelines:
|
IIRC the delay in moving to std::regex was due to CentOS compiler being outdated. With CentOS 7 hitting EOL end of this year I hope we can resolve all these problems in v6 anyway. |
Centos 6 is hitting EOL on 2020-11-30; Centos 7 is EOL'ed on 2024-06-30. But yes, I agree. |
@kinkie, thank you for providing verbal descriptions of anticipated improvement examples. To be honest, I expected improvements that are a lot more valuable than the ones currently listed, but perhaps I was overly optimistic about current clang-format abilities. Or perhaps clang-format primary advantages are not in the individual formatting options. I have a related question: Is there a way to configure clang-format so that a given large piece of non-trivial C++ code will always be formatted exactly the same way, no matter how the developer formats that code? I am hoping for a "yes" answer. A "no" answer to that question would imply that current clang-format just cannot control some C++ code formatting aspects; in other words, there exist two different inputs (of the same C++ code) such that their formatted versions will remain different. The formatting regressions list feels inadequate to me because it does not contain previously discussed formatting regressions. There absence makes the reader wonder whether there are a lot more problems that we are currently unaware of. An honest comprehensive disclosure of formatting regressions is an important prerequisite for going forward IMO. Please note that unlike the X, Y, Z examples in my template, the list of known regressions (A, B, C, ...) has to be as comprehensive as we can make it. We do not need a link to every regression of the same kind, but we do need a list item for every regression kind that we know about. The current list also seems to focus on the amount of changes, which is already covered in another "cons" bullet. When building the formatting regressions list, the amount of changes is irrelevant. In this context, we should welcome any change that is improving formatting or, at the very least, makes it more consistent. This list should focus on formatting that became worse after applying clang-format. |
One advantage I see is we could move away from our current descriptive code guidelines and onto "format with clang-format version X or later and this configuration file".
The answer is "no". I've ran an experiment: take client_side.cc, arguably our most complicated source file, format it with clang-format-10, then run it through a random astyle style, and then clang-format-10 again and compare the differences.
or
Maybe there is an opportunity to make the configuration more prescritptive, or we can accept this
getnameinfo.cc is quite messy, PR#619 is meant to clean it up, and it takes care of the noted regressions there (xgetnameinfo and ip6_parsenumeric). header sorting: negative priorities work in keeping headers sorted according to our specifications, although with some oddities as we have noted when subdirectories are involved. The ordering is however consistent, so I'd leave this be trailing comments alignment: I'm afraid there's only two options: either align them or force them to be a fixed number of characters before the last statement. I don't want to downplay your concerns, but I fail to see this as a showstopper. If you believe it is a showstopper (no need to explain why), then say so, and I'll just drop the PR.
The patch is 121k lines long . Going over it is going to be a lot of work. For instance, is
a regression? Sure, it uses more lines, but to me it is more readable. A lot of positive changes I see are about enforcing the rule of having a function's return type be on its own line - we have been inconsistent in enforcing it A pretty important regression can be
debugs() is hard to undertand, and as such the formatter gets confused and either creates very long lines or aligns to the wrong point
I'm at 2% of the difference, and it's rather late in the day. Please let me know if there is any showstopper for you. IMO the regressions are worth the extra rigour this tool enables us, but I understand it's subjective so it carries limited weight. |
Such a move, as we discussed before, is a good approach (under certain conditions), but I do not see a unique clang-format advantage here. The same move can, in principle, be done while still using astyle AFAICT. Yes, clang-format should be able to control more aspects, but that nearly orthogonal advantage is already counted elsewhere (pros item 1).
AFAICT, before we make any conclusions from this experiment, we have to make sure that no clang-format configuration options can control the currently-changing output. For example, perhaps clang-format can split all one-liners longer than N characters and merge all shorter multi-liners? And control the position of the opening
I believe that the more uniform code formatting is, the better, but that is a somewhat secondary concern. A "yes" answer to my question would allow us to leave all C++ formatting to the tool (with the exception of the text inside comments), which would be a significant improvement because it will squash review arguments over C++ code formatting.
Please note that our current primary goal is to catalogue regressions/problems, not decide what to do about them. We can move to the decision/mitigation phases while all the regressions are known. I will try to resist discussing what problems we should leave unaddressed until those phases.
Yes, but it is the work that has to be done to make an informed decision: We should not switch until we understand what happens when we do (and until we know how to increase the benefits and/or reduce the pain from the switch). Needless to say, nobody is forcing you to do this tedious work! However, if you want us to switch, the onus is on you to prepare a quality proposal, which includes, IMO, itemizing regression kinds. IIRC, after spending less than an hour looking at the initial diff, I found regressions that you have not warned us about, and that scares me -- how many more different regressions are out there? Also, please note that, AFAICT, you are looking at a diff produced with a handful of clang-format options. If we switch, we will probably want to use a lot more options, which will probably change the diff. I do not know what the most efficient way to navigate this process is, but I bet it will require spending many hours working with large diffs.
I agree that assessing some formatting changes require judgement calls. Use your best judgement and err on the side of inclusion. It does not hurt to warn about a few change categories that everybody is going to be OK with.
FWIW, I consider this an improvement. I would also prefer clang-format to initialize each member on its own line (except for legitimate one-liners where everything fits on one line). BTW, "more lines" is not a very good criteria for judging formats IMO.
Yes, this change makes formatting worse. Hopefully, we can find a way to address this problem. For example, I would argue that alignment should use four spaces for each nesting level change, not opening parens or other alignment markers. It would also be great to leave unimportant
I am not sure I understand the question. I hope we can find solutions (or decide to ignore) all the problems seen so far. Right now, the unknowns are the only known showstopper.
Right now, this is not so much about subjective decisions as it is about decisions in the dark. Here is how it feels:
If your claim is that there are no regressions worse discussing (for whatever reason), you can, of course, make that claim (and explain your reasoning). You are in the driver seat. |
abandoning, we're staying on astyle for the time being |
Just to avoid misunderstanding: The above fact does not imply that future attempts to improve formatting, including attempts using clang tools, are not welcomed per se. When somebody works on formatting improvements, they should learn the lessons this PR offers, of course, but "do not do it" is not one of those lessons. |
Thanks for the clarification, there was no ill intent perceived.
While I thnk that clang-format is superior for what we need to do, astyle
works well enough and we simply have bigger fishes to fry :)
…On Thu, May 20, 2021 at 3:39 PM Alex Rousskov ***@***.***> wrote:
we're staying on astyle for the time being
Just to avoid misunderstanding: The above fact does not imply that future
attempts to improve formatting, including attempts using clang tools, are
not welcomed per se. When somebody works on formatting improvements, they
should learn the lessons this PR offers, of course, but "do not do it" is
*not* one of those lessons.
—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
<#603 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABHPVDB42G4XVQHVCWQHEZLTOUGK5ANCNFSM4MIANDNA>
.
--
Francesco
|
Right. I actually left that comment for future developers. I think you and I are more-or-less on the same page here. |
This is a candidate to have a discussion about whether
to adopt clang-format. This configuration is compatible
with clang-format-9 and later