Style change proposal: target and file lists should never be put on one line #9

Closed
petamas opened this issue Feb 3, 2023 · 42 comments

Comments

@petamas

petamas commented Feb 3, 2023

First of all, I was really happy when I found gersemi, it looks pretty great, and it is maintained, unlike cmake-format, yay! I personally prefer less opinionated formatters, but the default style of gersemi fits our use case fairly well. However, there is one specific case where I feel like it could be improved, so I'm proposing a style enhancement.

The README shows this example of formatting:

target_link_libraries(
    foobar
    PUBLIC example::dependency_one example::dependency_two
    PRIVATE
        example::some_util
        external::some_lib
        external::another_lib
        Boost::Boost
)

I'd argue that target_link_libraries() and similar functions should always be formatted this way:

target_link_libraries(foobar
    PUBLIC
        example::dependency_one
        example::dependency_two
    PRIVATE
        example::some_util
        external::some_lib
        external::another_lib
        Boost::Boost
)

add_library() should also always be formatted this way:

add_library(lib SHARED
    source_one.cpp
    source_two.cpp
)

The two key changes:

  1. Even though the PUBLIC section would've fit on one line, every dependency was broken into a separate line.
  2. The target name and any valueless flags were moved to the same line as the function call.

Rationale for nr. 1:

  • At least in our codebase, file lists change quite often as people add, remove and move around code. This often makes dependency lists change too. Breaking these lists into separate lines makes these diffs cleaner, which leads to fewer merge conflicts and easier-to-review pull requests, as git understands adding/removing lines better than changing a list in one line.
  • Building on that, if a particular target/file list oscillates around the limit for what can fit in one line, that can further obfuscate the change, as adding one file/target may trigger gersemi to break stuff into multiple lines, then removing something else may trigger it to put everything in one line again, causing unnecessarily large diffs.
  • Especially in a case like the first example where there's both a PUBLIC and PRIVATE section, it's more pleasing aesthetically for them to be formatted consistently.
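The diff argument above can be illustrated with a minimal Python sketch. It uses only the standard-library difflib module; the target names (foo, bar, baz, qux) and the helper changed_lines are made up for the illustration and are not part of gersemi.

```python
import difflib


def changed_lines(before: str, after: str) -> int:
    """Count added/removed lines in a unified diff, ignoring file headers and context."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    return sum(
        1
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


# The same edit (adding "qux") applied to an inlined and to an expanded list.
inline_before = "target_link_libraries(foo PUBLIC bar baz)\n"
inline_after = "target_link_libraries(foo PUBLIC bar baz qux)\n"

expanded_before = (
    "target_link_libraries(\n"
    "    foo\n"
    "    PUBLIC\n"
    "        bar\n"
    "        baz\n"
    ")\n"
)
expanded_after = expanded_before.replace("        baz\n", "        baz\n        qux\n")

print(changed_lines(inline_before, inline_after))      # the whole line is rewritten
print(changed_lines(expanded_before, expanded_after))  # only the new entry is added
```

In the inlined form the existing line is removed and re-added, so two concurrent additions touch the same line and are likely to conflict. In the expanded form each addition is its own line.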

Rationale for nr. 2:

  • I feel like these functions are tied more strongly to the (first) target argument than to the others, similarly to how method calls treat the object argument specially in OOP languages. (I.e. it's object.func(arg1, arg2) instead of func(object, arg1, arg2).) I'd consider the target_whatever() functions to be "methods" on the target object, and express this by putting them on the same line.
  • In case of add_library() / add_executable() without any flags, there is no clear separation between the target name and the list of files; putting the target name next to the function call and the files in separate lines with different indenting makes it more distinct.
  • Similarly, if there are any flags (STATIC/SHARED, EXCLUDE_FROM_ALL, etc.), those would look out of place "mixed in" with the file list.

I'd apply the above changes to add_library(), add_executable(), target_link_libraries(), target_sources(), target_precompile_headers(), and possibly target_include_directories(), target_link_directories(), target_compile_definitions(), target_compile_features(), target_compile_options(), target_link_options().

What do you think? Nr. 1 is the really important part for me for the 5 highlighted functions (the first five in the list above); the other functions and nr. 2 are more of an "if we're already talking about this" thing. I'm also torn on whether function calls with only one provided list argument containing one value (eg. target_include_directories(Foo PUBLIC include)) should warrant an exception to the "always multiline" rule.

@BlankSpruce
Owner

I'll have to think a little bit longer about your proposals but here is my view on the topic and I hope you'll challenge the points you find weak.

On proposal nr. 1

I find this proposal quite reasonable, since what you're proposing is to lean more towards readability of diffs at little to no cost to readability of the code as it's viewed in the file editor. While this would slightly break my assumption about limited configuration, would you be happy with an extra parameter "list-expansion" (temporary name, I don't know what to call it better) with these strategies:
list-expansion: favour-inlining (current)

target_link_libraries(foo PUBLIC bar)
target_link_libraries(foo PUBLIC bar baz)
target_link_libraries(foo PUBLIC bar PRIVATE baz)
target_link_libraries(
    foo 
    PUBLIC bar 
    PRIVATE
        example::some_util
        external::some_lib
        external::another_lib
        Boost::Boost
)

list-expansion: favour-smaller-diffs (?)

target_link_libraries(foo PUBLIC bar)
target_link_libraries(
    foo 
    PUBLIC 
        bar
        baz
)
target_link_libraries(
    foo 
    PUBLIC 
        bar 
    PRIVATE 
        baz
)
target_link_libraries(
    foo 
    PUBLIC 
        bar 
    PRIVATE
        example::some_util
        external::some_lib
        external::another_lib
        Boost::Boost
)

list-expansion: always

target_link_libraries(
    foo 
    PUBLIC 
        bar
)
target_link_libraries(
    foo 
    PUBLIC 
        bar
        baz
)
target_link_libraries(
    foo 
    PUBLIC 
        bar 
    PRIVATE 
        baz
)
target_link_libraries(
    foo 
    PUBLIC 
        bar 
    PRIVATE
        example::some_util
        external::some_lib
        external::another_lib
        Boost::Boost
)

On proposal nr. 2

I'm against that proposal due to these reasons:

  • first of all I'd rather avoid special casing for particular sublist of builtin commands because
  1. while the similarity to OOP is understandable in my opinion I'd have to offer that for any command that resembles this syntax somehow, not only to target_* ones.

  2. I couldn't offer the same for your project's custom command unless I'd introduce some special comment like # gersemi: first-argument-is-an-object (temporary name)

  3. How would I deal with commands that fall under the same OOP impression except that the "object" isn't exactly first argument? Example with file(CONFIGURE OUTPUT output-file ...) which is pretty much something akin to output-file.configure_output(...).

    To sum up: a little bit too much attention would be needed from me to do it properly.

  • it'd introduce separate, non-repeatable vertical line of focus which I think hurts readability.
  1. It's separate because the way lists are expanded currently your next vertical line of focus is predictable, since every indentation level introduces more detailed information on the given command call. Zeroth level (let's ignore IF blocks for the sake of discussion) says what is called, first level says what is covered by the call, as in "does library X link only to PRIVATE or perhaps also to PUBLIC entities", and second level deals with details of the "argument", as in "PRIVATE links are X, Y and Z". On the second level I refer to an argument as I'd view it, for instance, in Python:
target_link_libraries(
    "foo", # if it was named argument it would be `target="foo"`
    public=["bar"],
    private=[
        "example::some_util",
        "external::some_lib",
        "external::another_lib",
        "Boost::Boost",
    ],
)

add_library(
    "foo",
    kind="SHARED",  # note that cmake doesn't call this option `kind` but I suppose this would be API in python
    sources=["one.cpp", "two.cpp"],  # again lack of keyword in cmake
)
  2. It's non-repeatable because it depends on the command name length.

Current state (lines of focus are shown here with symbol | and gaps between command invocations are enlarged for the sake of demonstration):

|    |    |
|    |    |
|    |    |
target_link_libraries(
|    |foobar
|    |PRIVATE
|    |    |example::some_util
|    |    |external::some_lib
|    |    |external::another_lib
|    |    |Boost::Boost
)    |
|    |
|    |
|    |
add_library(
|    |foo
|    |SHARED
|    |source_one.cpp
|    |source_two.cpp
)  

After the proposal:

|    |    |           |
|    |    |           |
|    |    |           |
target_link_libraries(|foobar 
|    |PRIVATE
|    |    |example::some_util
|    |    |external::some_lib
|    |    |external::another_lib
|    |    |Boost::Boost
)    |
|    |      |    | 
|    |      |    | 
|    |      |    | 
add_library(|foo |SHARED
|    |source_one.cpp
|    |source_two.cpp
)
  • the issue with no clear distinction (on which I agree with you) can be mitigated by using modern target_sources:
# before
add_library(lib SHARED
    source_one.cpp
    source_two.cpp
)

# after
add_library(lib SHARED)

target_sources(
    lib
    PRIVATE # I've assumed your preference to expanding lists
        source_one.cpp
        source_two.cpp
)

@BlankSpruce
Owner

Another note: when I first started implementing the formatter I didn't know what to call this principle of structured and repeatable visual cues, which I now call "lines of focus". I was inspired by this presentation by Kevlin Henney, where he calls this property "lines of attention".

@petamas
Author

petamas commented Feb 6, 2023

Thank you for your quick and detailed reply! I haven't watched Kevlin's talk (yet), but tried to react to all your thoughts regarding prop. nr. 2.

On proposal nr. 1: if you're willing to add a configuration option, I'd be absolutely delighted. I especially like that it has three options, not just two.

On proposal nr. 2:

first of all I'd rather avoid special casing for particular sublist of builtin commands

Isn't there already some special casing? I don't think the COMMAND arguments of eg. add_custom_command() are treated the same way as other list arguments, at least based on the results of running gersemi on our codebase. I haven't checked the code, but I assume the various arguments of builtin functions are classified in some way already ("flag", "positional", "list", "command", etc.), this would only add a new category ("object argument").

while the similarity to OOP is understandable in my opinion I'd have to offer that for any command that resembles this syntax somehow, not only to target_* ones.

True, although most commands that are similar are named target_*. The only ones I haven't mentioned in my original proposal that I can remember off the top of my head are add_custom_target() and add_dependencies().

I couldn't offer the same for your project's custom command unless I'd introduce some special comment like # gersemi: first-argument-is-an-object (temporary name)

I think "function that has a single positional argument, i.e. it uses cmake_parse_arguments(PARSE_ARGV 1 ...)" would be a good criterion to identify these functions. (Apart from some of the builtin ones, eg. add_library, which has a badly designed interface IMO.)

How would I deal with commands that fall under the same OOP impression except that the "object" isn't exactly first argument? Example with file(CONFIGURE OUTPUT output-file ...) which is pretty much something akin to output-file.configure_output(...).

I wouldn't deal with them, because if the "object" argument is named, then there's already no readability issue, and if it's an unnamed non-first object argument, then the signature is simply bad IMO. (Which is the case for many builtin functions, but that's the beauty of CMake.)

it'd introduce separate, non-repeatable vertical line of focus which I think hurts readability.

I think I get what you're talking about here, although I don't feel like it introduces a separate line of focus: my eyes parse target_link_libraries(foobar as one thing, i.e. "add new dependencies to foobar", similarly to how an if or while does not introduce a line of focus. But this may be simply because I'm used to this formatting.

the issue with no clear distinction (on which I agree with you) can be mitigated by using modern target_sources

True, although this would incur training my team on using target_sources, and why they should use its PRIVATE flavour 99% of the time, which may cause some confusion.

All in all, I still feel like prop. 2 would improve readability, but if you feel like it would require too much work / special casing / etc., I can easily live without it, especially since I have multiple possible workarounds:

  • Use target_sources() as you suggested
  • Run a fairly simple postprocessor on gersemi's output that removes the extra (for us) whitespace
  • Wrap add_library() into a my_add_library(target STATIC SOURCES ...) custom function, which will be clearly formatted by gersemi (although this would incur the same kind of training like target_sources(), except for the footgun of PUBLIC)
  • Combine custom wrappers with a postprocessor for even clearer output
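As a rough illustration of the postprocessor workaround, here is a minimal Python sketch. The allow-list COMMANDS and the function name hoist_target are hypothetical, and the sketch only handles the simple case where the target name sits alone on the line right after the opening parenthesis. It does not move flags such as SHARED, and it does not understand comments or quoted names containing spaces.

```python
import re

# Hypothetical allow-list: commands whose first argument is treated as the "object".
COMMANDS = ("add_library", "add_executable", "target_link_libraries", "target_sources")

# Matches "<indent><command>(" followed by a line that holds only the target name.
_PATTERN = re.compile(
    r"^(?P<indent>[ \t]*)(?P<name>" + "|".join(COMMANDS) + r")\(\n"
    r"[ \t]+(?P<target>\S+)[ \t]*\n",
    re.MULTILINE,
)


def hoist_target(text: str) -> str:
    """Move the target name up onto the line with the command name."""
    return _PATTERN.sub(r"\g<indent>\g<name>(\g<target>\n", text)


formatted = (
    "target_link_libraries(\n"
    "    foobar\n"
    "    PRIVATE\n"
    "        example::some_util\n"
    "        Boost::Boost\n"
    ")\n"
)
print(hoist_target(formatted))
```

Calls that the formatter kept on one line contain no newline after the opening parenthesis, so the pattern leaves them untouched.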

So, if you're still not convinced of prop. 2, I'd still be really happy with the acceptance/implementation of prop. 1 only. (Regardless of whether the default style is changed or an option is introduced.) Proposal 2 was more of a combination of "while we're talking about these functions" and "try and make gersemi's style more similar to ours" anyway. :)

@petamas
Author

petamas commented Feb 6, 2023

Question that popped into my head while writing my previous answer: list-expansion: favour-smaller-diffs and always would not impact the COMMAND arguments of add_custom_command(), add_custom_target(), add_test() etc, right? Because some of our commands would look really weird in a vertical layout. :D

@BlankSpruce
Owner

No hard deadlines but I'll try to make something for you next week once I find enough time to work a little bit with implementation.

You're right about COMMAND being specially treated and to be honest I'm not too happy about how I've done it. In my defense, to do it well, that is as people would probably do it manually, it'd require embedding some kind of "shell invocation"-kind-of-thing formatter. That being said I'll try to exempt it from the rules of expansion for these additional knobs to keep relatively reasonable oneliners as oneliners.

@petamas
Author

petamas commented Feb 6, 2023

No hard deadlines but I'll try to make something for you next week once I find enough time to work a little bit with implementation.

Wow, thanks, that would be great! I can test the prototype whenever you need.

You're right about COMMAND being specially treated and to be honest I'm not too happy about how I've done it. In my defense, to do it well, that is as people would probably do it manually, it'd require embedding some kind of "shell invocation"-kind-of-thing formatter. That being said I'll try to exempt it from the rules of expansion for these additional knobs to keep relatively reasonable oneliners as oneliners.

Yeah, I have no idea how to format COMMAND right in an automated way. Eg. I often write stuff like this:

add_custom_command(
    ...
    COMMAND
        "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_BUILD_DIR}/whatever.py"
            --flag
            --option foo # default "option" is "bar"
            --verbose
            --dirs
                "${CMAKE_CURRENT_SOURCE_DIR}"
                "${CMAKE_CURRENT_BUILD_DIR}"
    ...
)

For a human, this grouping/indenting/commenting style improves readability, but good luck figuring it out automatically. I think the best solution in gersemi may be to put the COMMAND keyword at an indentation level that comes from the other formatting logic, then keep the "value" of it indented/wrapped exactly as it was written, except for indenting/unindenting all lines to align the first line 1 "tab" right from where COMMAND is.
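The "keep the value as written, only shift it" idea could look roughly like the Python sketch below. This is not how gersemi handles COMMAND; the function reindent_command_body and its parameters are hypothetical, and the sketch assumes indentation uses spaces only.

```python
def reindent_command_body(body: str, command_indent: int, tab: int = 4) -> str:
    """Shift every line of a COMMAND value by the same amount.

    The first non-blank line ends up one `tab` to the right of the COMMAND
    keyword (which sits at column `command_indent`); relative indentation
    between the lines is preserved exactly as the author wrote it.
    """
    lines = body.splitlines()
    first = next(line for line in lines if line.strip())
    original_indent = len(first) - len(first.lstrip(" "))
    shift = command_indent + tab - original_indent

    result = []
    for line in lines:
        if not line.strip():
            result.append("")  # keep blank lines blank
            continue
        stripped = line.lstrip(" ")
        current = len(line) - len(stripped)
        # Never produce a negative indentation if a line was less indented than the first.
        result.append(" " * max(current + shift, 0) + stripped)
    return "\n".join(result)


body = (
    '  "${Python3_EXECUTABLE}" whatever.py\n'
    "      --flag\n"
    "      --dirs\n"
    '          "${CMAKE_CURRENT_SOURCE_DIR}"'
)
# COMMAND itself is placed at column 4 by the rest of the formatting logic.
print(reindent_command_body(body, command_indent=4))
```

Wrapping and comments inside the value are left alone, so the hand-made grouping survives reformatting.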

@BlankSpruce
Owner

@petamas Please be invited to test out this version which introduces --list-expansion switch.

To see the expected effects you can compare how files are formatted under two strategies here (for the sake of demonstration line length is 50 instead of usual 80):
https://github.com/BlankSpruce/gersemi/blob/list-expansion/tests/formatter/issue_0009_list_expansion_favour_inlining.out.cmake
https://github.com/BlankSpruce/gersemi/blob/list-expansion/tests/formatter/issue_0009_list_expansion_favour_expansion.out.cmake

@petamas
Author

petamas commented Feb 14, 2023

Thanks, I'll take a look tomorrow!

@petamas
Author

petamas commented Feb 14, 2023

@BlankSpruce
I haven't tested the code yet, just took a look at your examples, and something caught my eye. This line should not be formatted this way in either "favour-smaller-diffs" or "always" mode (I don't know which one is modeled by "favour-expansion"):

target_link_libraries(foo PUBLIC bar baz)

It should be formatted this way in either of those cases to support merges on addition:

target_link_libraries(
    foo
    PUBLIC
        bar
        baz
)

Or am I missing something?

@BlankSpruce
Copy link
Owner

The favour-expansion mode would model what was earlier called favour-smaller-diffs. I've changed that name because it gave false promise that it actually targets small diffs - expanded code might sometimes introduce a bit larger line count, even for simple command invocations.

I'd argue that the way you expect formatting in the new example fits the description only for always case which isn't supported on that testing branch.

I'm wondering whether we could formulate somewhat simple rule that would consistently (within reason since CMake language isn't exactly friendly here) describe desired formatting that wouldn't be something along the lines "and if the command is called THIS and the argument section is THAT then do [...]".

@petamas
Author

petamas commented Feb 15, 2023

The favour-expansion mode would model what was earlier called favour-smaller-diffs. I've changed that name because it gave false promise that it actually targets small diffs - expanded code might sometimes introduce a bit larger line count, even for simple command invocations.

I agree with the name change, I just did not know which option you implemented under that name because it kinda fits both. :)

I'd argue that the way you expect formatting in the new example fits the description only for always case which isn't supported on that testing branch.

In this comment, you formatted both always and favour-smaller-diffs in the way I expect it, that's why I am confused: #9 (comment)

I'd also say that the original issue (frequently changing lists introducing unnecessary conflicts) applies to this example too.

I'm wondering whether we could formulate somewhat simple rule that would consistently (within reason since CMake language isn't exactly friendly here) describe desired formatting that wouldn't be something along the lines "and if the command is called THIS and the argument section is THAT then do [...]".

All my proposals will make decisions on the function call level, not the individual list argument level i.e. either all provided list arguments are allowed to be inlined, or none of them are. (Based on your example formatting, this already seems to be the case, i.e. when you expanded one list argument, you expanded all of them. I like this because it makes the arguments consistent with each other.)

Proposal A: always expand lists for function calls with more than one provided list argument, or with a provided list argument containing more than one element

  • target_link_libraries(foo PUBLIC bar) is allowed to be inlined (but can be expanded if the line would be too long etc. as part of "normal" formatting)
    • This makes sense to me because target_include_directories(Foo INTERFACE include) is a frequent enough pattern in our codebase that it deserves a oneliner. Also, as an added bonus, if someone's doing something more unusual (like adding multiple directories), the different layout will make it more noticeable. (This might only apply to our codebase though.)
  • target_link_libraries(foo PUBLIC bar baz) must be expanded, because while there's only one list provided (PUBLIC), it has two elements (bar and baz).
    • Rationale for this is the reason I opened the issue originally: lists of two or more items are likely to be extended/modified frequently, so putting the items on separate lines makes diffs easier to understand and merge.
  • target_link_libraries(foo PUBLIC bar PRIVATE baz) must be expanded, because two list arguments are provided (PUBLIC and PRIVATE), even though they only have one element each.
    • Multiple provided list arguments usually indicate that this function call is doing multiple things; expanding it makes the second "thing" harder to miss.
    • In case of the target_* family, the PUBLIC/PRIVATE/INTERFACE lists are actually "one list" in the sense that they describe eg. "list of dependencies (private or public)" and similar concepts, so the "lists of two elements are prone to be extended/modified" argument applies. However, this is not necessarily true for all functions with multiple list arguments.

Proposal B: always expand lists for function calls with a provided list argument containing more than one element

I feel like the rationale for the third bullet point in proposal A is weaker than the other two, so I tried to make the rule less constraining, allowing (but not mandating) a bit more inlining.

  • target_link_libraries(foo PUBLIC bar) and target_link_libraries(foo PUBLIC bar baz) behave the same as the previous option.
  • target_link_libraries(foo PUBLIC bar PRIVATE baz) is now allowed to be inlined, because the two provided list arguments (PUBLIC and PRIVATE) have only one element each.
    • This case is the only difference in behaviour to the previous option
  • target_link_libraries(foo PUBLIC bar PRIVATE baz xyz) still has to be expanded, because the provided PRIVATE list has more than one element.
    • The "lists of two elements are prone to be extended/modified" argument applies here.
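The two rules can be written down as predicates. The Python sketch below is only an illustration of the wording of proposals A and B; the function names and the dictionary representation (multi-value keyword mapped to its provided values) are assumptions, not gersemi's internal model.

```python
from typing import Dict, List

# A call is modelled as: multi-value keyword -> values provided for it.
ListArguments = Dict[str, List[str]]


def must_expand_a(lists: ListArguments) -> bool:
    """Proposal A: expand when more than one list is provided,
    or when any provided list has more than one element."""
    provided = [values for values in lists.values() if values]
    return len(provided) > 1 or any(len(values) > 1 for values in provided)


def must_expand_b(lists: ListArguments) -> bool:
    """Proposal B: expand only when some provided list has more than one element."""
    return any(len(values) > 1 for values in lists.values())


examples = {
    "target_link_libraries(foo PUBLIC bar)": {"PUBLIC": ["bar"]},
    "target_link_libraries(foo PUBLIC bar baz)": {"PUBLIC": ["bar", "baz"]},
    "target_link_libraries(foo PUBLIC bar PRIVATE baz)": {"PUBLIC": ["bar"], "PRIVATE": ["baz"]},
    "target_link_libraries(foo PUBLIC bar PRIVATE baz xyz)": {"PUBLIC": ["bar"], "PRIVATE": ["baz", "xyz"]},
}

for call, lists in examples.items():
    print(f"{call}: A={must_expand_a(lists)} B={must_expand_b(lists)}")
```

A result of False means inlining is allowed, not required: the normal line-length logic can still expand the call. The only example where the two predicates disagree is the PUBLIC bar PRIVATE baz call.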

What do you think? Personally, I prefer proposal A, but if you find it too restrictive, I can live with proposal B.

(I was more swamped today than I expected, so I couldn't run the testing version on our code yet, sorry.)

@BlankSpruce
Owner

It seems that proposal A is quite nice but I need to make sure I understand it well. Could you show how you would format examples for the made-up command described below? Assume whatever line length you want but please state what it is for the sake of completeness.

Signature:

try_to_win_best_picture_academy_award(
    <title>
    [<alternative title>]
    DIRECTORS <director>...
    [CAST <cast_member>...]
    [GENRE <genre>]
    [YEAR <year>]
    [FOREIGN_LANGUAGE]   
)
                                               

Code:

try_to_win_best_picture_academy_award("Harry Potter and the Philosopher's Stone" "Harry Potter and the Sorcerer's Stone" DIRECTORS "Chris Columbus" GENRE "Fantasy" YEAR 2001 CAST "Daniel Radcliffe" "Rupert Grint" "Emma Watson")

try_to_win_best_picture_academy_award("The King's Speech" DIRECTORS "Tom Hooper")

try_to_win_best_picture_academy_award("The Shape of Water" DIRECTORS "Guillermo del Toro" CAST "Sally Hawkins" "Michael Shannon" "Richard Jenkins" "Doug Jones" "Michael Stuhlbarg" "Octavia Spencer" GENRE "Romantic fantasy" YEAR 2017)

try_to_win_best_picture_academy_award(Parasite DIRECTORS "Bong Joon-ho" CAST "Song Kang-ho" "Lee Sun-kyun" "Cho Yeo-jeong" "Choi Woo-shik" GENRE "Black comedy thriller" YEAR 2019 FOREIGN_LANGUAGE)

try_to_win_best_picture_academy_award("Everything Everywhere All at Once" DIRECTORS "Daniel Scheinert" "Daniel Kwan" CAST "Michelle Yeoh" GENRE "Absurdist comedy-drama" YEAR 2022)

@petamas
Author

petamas commented Feb 16, 2023

Before I show my formatting, I want to note some stuff:

  • When I say one-value argument, I mean arguments that cannot take multiple values, only one, in contrast to multi-value/list arguments. If a named argument can take a list, but has only one item in the list in the call, it does NOT count as a one-value argument. I.e. only GENRE and YEAR are one-value arguments.
  • My understanding of how gersemi works by default is this:
    • If the full function call can be put in one line, then it will put it in one line
    • If it cannot put it in one line (because of line length limit, --list-expansion, comment, etc.), it'll put each positional and named parameter on its own line (i.e. it gives its own line to <title>, <alternative title>, DIRECTORS <director>..., [CAST <cast_member>...], etc.). This is all-or-nothing, so it never puts multiple arguments on the same line while also putting some on their own.
    • If even that is not valid, it will expand the arguments that make the layout invalid one-by-one by putting the values on separate lines from the keyword, and indenting them.
    • As we talked about before, COMMAND arguments are handled specially
    • I have no idea how gersemi chooses which alternative to use if even the fully expanded layout is invalid. I assume it tries to make as many lines fit as it can, i.e. expand anything that is longer than the max line length, even if that version is still longer or other lines are breaking the length limit.
  • I don't care much about the formatting of one-valued named arguments, flags, and unnamed arguments, i.e. I accept the results of the above process (and my first experiment with formatting our codebase) for them.
  • I'll call maximum line length len.

Harry Potter and the Philosopher's Stone

Depending on len, I'd expect one of these three layouts:

try_to_win_best_picture_academy_award(
    "Harry Potter and the Philosopher's Stone"
    "Harry Potter and the Sorcerer's Stone"
    DIRECTORS
        "Chris Columbus"
    GENRE "Fantasy"
    YEAR 2001
    CAST
        "Daniel Radcliffe"
        "Rupert Grint"
        "Emma Watson"
)
try_to_win_best_picture_academy_award(
    "Harry Potter and the Philosopher's Stone"
    "Harry Potter and the Sorcerer's Stone"
    DIRECTORS
        "Chris Columbus"
    GENRE
        "Fantasy"
    YEAR 2001
    CAST
        "Daniel Radcliffe"
        "Rupert Grint"
        "Emma Watson"
)
try_to_win_best_picture_academy_award(
    "Harry Potter and the Philosopher's Stone"
    "Harry Potter and the Sorcerer's Stone"
    DIRECTORS
        "Chris Columbus"
    GENRE
        "Fantasy"
    YEAR
        2001
    CAST
        "Daniel Radcliffe"
        "Rupert Grint"
        "Emma Watson"
)

As you can see, line length only ever impacts the one-value arguments, because proposal A forces BOTH lists to be expanded, because there are two of them (DIRECTORS and CAST). Based on my understanding described above, I think the cutoffs would be at 19 (length of GENRE "Fantasy") and 13 (length of YEAR 2001), but since in those cases both the function name and the title itself are way longer than len, I don't know for sure.

The Shape of Water

Similarly to Harry Potter, I'd expect one of these three:

try_to_win_best_picture_academy_award(
    "The Shape of Water"
    DIRECTORS
        "Guillermo del Toro"
    CAST
        "Sally Hawkins"
        "Michael Shannon"
        "Richard Jenkins"
        "Doug Jones"
        "Michael Stuhlbarg"
        "Octavia Spencer"
    GENRE "Romantic fantasy"
    YEAR 2017
)
try_to_win_best_picture_academy_award(
    "The Shape of Water"
    DIRECTORS
        "Guillermo del Toro"
    CAST
        "Sally Hawkins"
        "Michael Shannon"
        "Richard Jenkins"
        "Doug Jones"
        "Michael Stuhlbarg"
        "Octavia Spencer"
    GENRE
        "Romantic fantasy"
    YEAR 2017
)
try_to_win_best_picture_academy_award(
    "The Shape of Water"
    DIRECTORS
        "Guillermo del Toro"
    CAST
        "Sally Hawkins"
        "Michael Shannon"
        "Richard Jenkins"
        "Doug Jones"
        "Michael Stuhlbarg"
        "Octavia Spencer"
    GENRE
        "Romantic fantasy"
    YEAR
        2017
)

Again, two lists were provided in the call -> both lists must be expanded.

Parasite

I'd expect this to be formatted like this, with the usual variations around GENRE and YEAR based on len:

try_to_win_best_picture_academy_award(
    Parasite
    DIRECTORS
        "Bong Joon-ho"
    CAST
        "Song Kang-ho"
        "Lee Sun-kyun"
        "Cho Yeo-jeong"
        "Choi Woo-shik"
    GENRE "Black comedy thriller"
    YEAR 2019
    FOREIGN_LANGUAGE
)

Everything Everywhere All at Once

Again, same principle (two lists -> expand lists) with varying GENRE and YEAR:

try_to_win_best_picture_academy_award(
    "Everything Everywhere All at Once"
    DIRECTORS
        "Daniel Scheinert"
        "Daniel Kwan"
    CAST
        "Michelle Yeoh"
    GENRE "Absurdist comedy-drama"
    YEAR 2022
)

The King's Speech

I'd expect one of these three:

try_to_win_best_picture_academy_award("The King's Speech" DIRECTORS "Tom Hooper")
try_to_win_best_picture_academy_award(
    "The King's Speech"
    DIRECTORS "Tom Hooper"
)
try_to_win_best_picture_academy_award(
    "The King's Speech"
    DIRECTORS
        "Tom Hooper"
)

Naturally, the first option would need len >= 81, and in that case, it would/should be preferred. For smaller lens, I assume len < 26 would be the condition to choose the third one instead of the second one, but I'm fine either way.
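The thresholds quoted above follow from plain character counts. Here is a small Python sketch that reproduces them, assuming a 4-space indent and assuming (as in the comment) that only the width of the DIRECTORS line decides between the second and third layout; pick_layout is a hypothetical helper, not gersemi's algorithm.

```python
NAME = "try_to_win_best_picture_academy_award"
TITLE = '"The King\'s Speech"'
DIRECTORS = 'DIRECTORS "Tom Hooper"'
INDENT = " " * 4  # assumed indentation width

one_line = f"{NAME}({TITLE} {DIRECTORS})"
keyword_line = INDENT + DIRECTORS


def pick_layout(max_len: int) -> str:
    """Pick the least expanded layout whose deciding line fits into max_len."""
    if len(one_line) <= max_len:
        return "one line"
    if len(keyword_line) <= max_len:
        return "one argument per line"
    return "fully expanded"


print(len(one_line), len(keyword_line))
for max_len in (81, 80, 26, 25):
    print(max_len, pick_layout(max_len))
```

The same counting gives the other cutoffs mentioned in this comment, for example 19 for an indented GENRE "Fantasy" and 13 for an indented YEAR 2001.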

Everything Everywhere All at Once (again)

I want to highlight this specific case you did not ask about, but which I had a problem with for the previous implementation (and we have talked about it, I just want to make sure there's no misunderstanding): try_to_win_best_picture_academy_award("Everything Everywhere All at Once" DIRECTORS "Daniel Scheinert" "Daniel Kwan" GENRE "Absurdist comedy-drama" YEAR 2022) (same as yours, without CAST).

I'd expect this to be formatted this way because of the second half of proposal A ("expand lists ... with a provided list argument containing more than one element"), with the usual variations around GENRE and YEAR based on line length:

try_to_win_best_picture_academy_award(
    "Everything Everywhere All at Once"
    DIRECTORS
        "Daniel Scheinert"
        "Daniel Kwan"
    GENRE "Absurdist comedy-drama"
    YEAR 2022
)

In particular, I'd like these to be disallowed:

try_to_win_best_picture_academy_award(
    "Everything Everywhere All at Once"
    DIRECTORS "Daniel Scheinert" "Daniel Kwan" # NOPE
    GENRE "Absurdist comedy-drama"
    YEAR 2022
)
try_to_win_best_picture_academy_award("Everything Everywhere All at Once" DIRECTORS "Daniel Scheinert" "Daniel Kwan" GENRE "Absurdist comedy-drama" YEAR 2022) # NOPE

The Shape of Water (again)

Similarly, I'd like to talk about a "stripped" version of this example, where only one director and cast member are provided: try_to_win_best_picture_academy_award("The Shape of Water" DIRECTORS "Guillermo del Toro" CAST "Sally Hawkins" GENRE "Romantic fantasy" YEAR 2017)

I'd expect this to be formatted this way (with the usual GENRE and YEAR variations):

try_to_win_best_picture_academy_award(
    "The Shape of Water"
    DIRECTORS
        "Guillermo del Toro"
    CAST
        "Sally Hawkins"
    GENRE "Romantic fantasy"
    YEAR 2017
)

Note that even though both the DIRECTORS and CAST lists have only one list item, they were expanded because two lists were provided, even if len would allow DIRECTORS "Guillermo del Toro" to be put on one line. This is where proposal B would be less restrictive, allowing (and preferring) this layout for len >= 34:

try_to_win_best_picture_academy_award(
    "The Shape of Water"
    DIRECTORS "Guillermo del Toro" # proposal B only
    CAST "Sally Hawkins" # proposal B only
    GENRE "Romantic fantasy"
    YEAR 2017
)

and this for len >= 24:

try_to_win_best_picture_academy_award(
    "The Shape of Water"
    DIRECTORS
        "Guillermo del Toro"
    CAST "Sally Hawkins" # proposal B only
    GENRE "Romantic fantasy"
    YEAR 2017
)

All other examples would be formatted exactly the same way with proposals A and B because all of them either involve lists with two or more items, triggering expansion in both proposals, or (in case of The King's Speech) not triggering expansion in any of them.
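The two thresholds quoted for proposal B (len >= 34 and len >= 24) can be checked with a little arithmetic. The sketch below assumes a 4-space indent and counts the surrounding double quotes as part of each value; `line_width` is a hypothetical helper, not part of gersemi.

```python
INDENT_WIDTH = 4  # assumption: the 4-space indent used in the examples above


def line_width(keyword, value, indent=INDENT_WIDTH):
    """Width of an inlined `KEYWORD value` line at one indentation level."""
    return indent + len(keyword) + 1 + len(value)


# The quotes are part of the CMake source, so they count towards the width.
directors_width = line_width("DIRECTORS", '"Guillermo del Toro"')
cast_width = line_width("CAST", '"Sally Hawkins"')

print(directors_width, cast_width)
```

Under these assumptions, the widths come out as exactly the two limits named above: the DIRECTORS line is the wider one, so it is the first to be expanded as the limit shrinks.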


Again, the handling of GENRE/YEAR/FOREIGN_LANGUAGE and the positional arguments w.r.t. line length are based on my understanding from examples and my experiments with the current version of gersemi, and may be wrong.

@BlankSpruce
Owner

Excellent work. It seems that my initial understanding was correct. I think these should be the formatting principles for your proposed style:

  • command invocation might be inlined with at most 2 "items"
## line_length: 100
# 2 "items"
try_to_win_best_picture_academy_award("Edge of Tomorrow" "Live Die Repeat")

# 3 "items", 2 titles and 1 DIRECTORS "item"
try_to_win_best_picture_academy_award(
    "Edge of Tomorrow"
    "Live Die Repeat"
    DIRECTORS
        "Doug Liman"
)

# 2 "items", 1 title and 1 DIRECTORS "item"
try_to_win_best_picture_academy_award("The Shape of Water" DIRECTORS "Guillermo del Toro")

# line_length: 80
try_to_win_best_picture_academy_award(
    "The Shape of Water"
    DIRECTORS
        "Guillermo del Toro"
)
  • keyworded "item" with "one value" semantics might be inlined during expansion if it fits the line
## line_length: 100
# 2 "items", 1 title and 1 YEAR
try_to_win_best_picture_academy_award("Everything Everywhere All at Once" YEAR 2022)

# 3 "items", 1 title, 1 GENRE and 1 YEAR
try_to_win_best_picture_academy_award(
    "Everything Everywhere All at Once"
    GENRE "Absurdist comedy-drama"
    YEAR 2022
)

# line_length: 30 (kind of absurd but you get the point)
try_to_win_best_picture_academy_award(
    "Everything Everywhere All at Once"
    YEAR 2022
)
try_to_win_best_picture_academy_award(
    "Everything Everywhere All at Once"
    GENRE
        "Absurdist comedy-drama"
    YEAR 2022
)
  • command invocation can be inlined only if keyworded "item" with "multi value" semantics has one value (this also naturally extends to "item" with "one value" semantics):
## line_length: 100
try_to_win_best_picture_academy_award("The Shape of Water" DIRECTORS "Guillermo del Toro")
try_to_win_best_picture_academy_award(
    "Everything Everywhere All at Once"
    DIRECTORS
        "Daniel Scheinert"
        "Daniel Kwan"
)

## line_length: 80
try_to_win_best_picture_academy_award(
    "The Shape of Water"
    DIRECTORS
        "Guillermo del Toro"
)
try_to_win_best_picture_academy_award(
    "Everything Everywhere All at Once"
    DIRECTORS
        "Daniel Scheinert"
        "Daniel Kwan"
)
  • keyworded "item" with "multi value" semantics never gets inlined during expansion:
## line_length: 40
# CAST and DIRECTORS have "multi value" semantics, GENRE and YEAR have "one value" semantics
try_to_win_best_picture_academy_award(
    "Everything Everywhere All at Once"
    DIRECTORS
        "Daniel Scheinert"
        "Daniel Kwan"
    CAST
        "Michelle Yeoh"
    GENRE "Absurdist comedy-drama"
    YEAR 2022
)

This should lead to the following in the usual codebase:

## line_length: 80
add_library(foo SHARED)

add_library(
    foo
    SHARED
    EXCLUDE_FROM_ALL
)

add_library(
    foo
    SHARED
    a.cpp
    b.cpp
    c.cpp
)

target_link_libraries(foo PUBLIC bar)

target_link_libraries(
    foo
    PUBLIC
        bar
        baz
)

target_link_libraries(
    foo
    PUBLIC
        bar
    PRIVATE
        baz
)

I think with this set of rules I'll be able to cook something up soon.
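The four principles listed above can be sketched as a small layout function. This is only an illustration of the rules as stated in this comment, not gersemi's implementation; `Item`, `render` and the `kind` strings are hypothetical names introduced for the sketch.

```python
from dataclasses import dataclass, field

INDENT = "    "


@dataclass
class Item:
    # kind is one of: "positional", "one_value", "multi_value" (hypothetical labels)
    kind: str
    keyword: str = ""
    values: list = field(default_factory=list)

    def tokens(self):
        head = [self.keyword] if self.keyword else []
        return head + list(self.values)


def render(name, items, line_length):
    """Lay out one command invocation following the four principles above."""
    inline = f"{name}({' '.join(t for item in items for t in item.tokens())})"
    may_inline = (
        len(items) <= 2  # at most 2 "items"
        and all(len(i.values) <= 1 for i in items if i.kind == "multi_value")
        and len(inline) <= line_length
    )
    if may_inline:
        return [inline]

    lines = [f"{name}("]
    for item in items:
        if item.kind == "positional":
            lines.append(INDENT + item.values[0])
            continue
        one_liner = INDENT + " ".join(item.tokens())
        if item.kind == "one_value" and len(one_liner) <= line_length:
            lines.append(one_liner)  # "one value" items may stay inline
        else:
            # "multi value" items are never inlined during expansion
            lines.append(INDENT + item.keyword)
            lines.extend(INDENT * 2 + value for value in item.values)
    lines.append(")")
    return lines


NAME = "try_to_win_best_picture_academy_award"
shape_of_water = [
    Item("positional", values=['"The Shape of Water"']),
    Item("multi_value", "DIRECTORS", ['"Guillermo del Toro"']),
]
print("\n".join(render(NAME, shape_of_water, 80)))
```

With a limit of 100 the same call stays on one line, and with 80 it is expanded with the director nested under DIRECTORS, matching the examples given for the first principle.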

I have no idea how gersemi decides which alternative to choose if even the fully expanded layout is invalid. I assume it tries to make as many lines fit as it can, i.e. expand anything that is longer than the max line length, even if that version is still too long or other lines break the length limit.

Basically the rule is: once you've expanded there's no going back. Hence the last layout is the one it produces even if it violates line length limit.
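One way to read the "no going back" rule is as a search over candidate layouts ordered from most compact to most expanded. The sketch below assumes that ordering and a hypothetical `pick_layout` helper; it only illustrates the stated behaviour and is not taken from gersemi's source.

```python
def pick_layout(candidates, line_length):
    """Return the first layout whose lines all fit, else the most expanded one.

    `candidates` is assumed to be ordered from most compact to most expanded,
    each layout being a list of lines.
    """
    for layout in candidates:
        if all(len(line) <= line_length for line in layout):
            return layout
    # Nothing fits: keep the last (fully expanded) layout even though it
    # still violates the line length limit.
    return candidates[-1]


layouts = [
    ["f(aaaa bbbb)"],
    ["f(", "    aaaa", "    bbbb", ")"],
]
print(pick_layout(layouts, 5))
```

Here even the expanded layout has lines wider than 5 characters, so it is returned anyway rather than falling back to the one-liner.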

@BlankSpruce
Owner

@petamas
I'll wait until next Saturday for your feedback. If you don't provide any, I'll assume the latest attempt is okay and release a new version with that solution.

@petamas
Author

petamas commented Mar 1, 2023

Sorry for disappearing, I'll test & reply tomorrow.

@petamas
Author

petamas commented Mar 4, 2023

Hi,
I'd like to ask for a bit more patience - I was more swamped this week than I expected. I will find time for detailed feedback by Tuesday at the latest. (Likely on Sunday.)
Sorry again for not replying sooner.

@BlankSpruce
Owner

Not a problem. Take your time. :)

@petamas
Author

petamas commented Mar 6, 2023

(I'll write a separate comment about why I'm not 100% sure the approach described above matches what I need.)

Hi,

I've tried testing it, but something isn't right.

I've installed the package in a venv from your branch:

> py -m venv .venv
> .venv\Scripts\activate.bat
> pip install https://github.com/BlankSpruce/gersemi/archive/refs/heads/list-expansion-second-attempt.zip

Then, I ran it this way:

gersemi - -l 120 --list-expansion favour-expansion < issue_0009_list_expansion_favour_expansion.in.cmake

Result was:

### {line_length: 50, list_expansion: favour-expansion}
add_library(foo SHARED)

add_library(foo SHARED EXCLUDE_FROM_ALL)

add_library(
    foo
    SHARED
    a.cpp
    b.cpp
    c.cpp
)

target_link_libraries(foo PUBLIC bar)

target_link_libraries(foo PUBLIC bar baz)

target_link_libraries(foo PUBLIC bar PRIVATE baz)

target_link_libraries(foo PUBLIC bar baz PRIVATE dependency_with_very_long_name another_dependency)

if(TRUE)
    target_link_libraries(foo PUBLIC bar baz)

    target_link_libraries(foo PUBLIC bar baz PRIVATE dependency_with_very_long_name another_dependency)
endif()

if(TRUE)
    if(TRUE)
        target_link_libraries(foo PUBLIC bar baz)

        target_link_libraries(foo PUBLIC bar baz PRIVATE dependency_with_very_long_name another_dependency)
    endif()
endif()

if(
    long_arg__________________________________________________
    AND long_arg__________________________________________________
    AND long_arg__________________________________________________
)
    target_link_libraries(foo PUBLIC bar baz)

    target_link_libraries(foo PUBLIC bar baz PRIVATE dependency_with_very_long_name another_dependency)
endif()

if(long_arg__________________________________________________ AND (FOO AND BAR AND BAZ) AND (FOO AND BAR AND BAZ))
    target_link_libraries(foo PUBLIC bar baz)

    target_link_libraries(foo PUBLIC bar baz PRIVATE dependency_with_very_long_name another_dependency)
endif()

add_custom_command(
    OUTPUT FOOBAR
    COMMAND clang-format -length=1000 -sort-includes -style=some_kind_of_style -verbose -output-replacements-xml
)

add_custom_command(
    OUTPUT FOOBAR
    COMMAND
        clang-format -length=1000 -sort-includes -style=some_kind_of_style -verbose -output-replacements-xml
        "multiline
string"
        -some-flag with_argument -another
)

if(TRUE)
    add_custom_command(
        OUTPUT FOOBAR
        COMMAND clang-format -length=1000 -sort-includes -style=some_kind_of_style -verbose -output-replacements-xml
    )

    add_custom_command(
        OUTPUT FOOBAR
        COMMAND
            clang-format -length=1000 -sort-includes -style=some_kind_of_style -verbose -output-replacements-xml
            "multiline
string"
            -some-flag with_argument -another
    )
endif()

add_custom_command(
    OUTPUT
        FOO
        # first line comment
        # second line comment
        some_other_output
        another_output
    COMMAND
        FOO
        # first line comment
        # second line comment
        some_arg_to_foo_command another_arg_to_foo_command
    COMMAND BAZ
)

This differs significantly from issue_0009_list_expansion_favour_expansion.out.cmake, e.g. around every target_link_libraries(foo PUBLIC bar baz). Am I running it wrong?

@BlankSpruce
Owner

You've done everything correctly. Somehow it doesn't work with arguments passed in command line. I'll push the correction soon.

@BlankSpruce
Owner

I've pushed code to the same branch. It should work now.

I'm eager to see your comments about this style.

@petamas
Author

petamas commented Mar 6, 2023

Note: this has been written before I checked the output

Re: #9 (comment)

First of all, what do you consider an item? In my mind, there are the following types of items:

  • keyworded items:
    • keyworded item with "option" semantic: a fixed string without extra values, e.g. EXCLUDE_FROM_ALL in add_library()
    • keyworded item with "one value" semantic: a fixed string and a single value following it, e.g. NAME in add_test()
    • keyworded item with "multi value" semantic: a fixed string and one or more values following it, e.g. PUBLIC in target_link_libraries()
  • single-value positional item: one value that has a specific position (for user-defined functions, they are declared in the function() statement), e.g. the target name in target_link_libraries()
  • multi-value positional item: the remaining values that are neither part of any keyworded item nor a single-value positional item, e.g. the file list in add_library() (for user-defined functions, this is usually achieved by using ARGN or <prefix>_UNPARSED_ARGUMENTS).

I'd consider each of these 5 argument types an "item", i.e. add_library(Foo STATIC bar.cpp baz.cpp) has 3 "items":

  • a positional one-value argument (called <name> in the documentation, consisting of the Foo argument)
  • a keyworded item with option semantic (consisting of the STATIC argument)
  • a multi-value positional item (called <source>... in the documentation, consisting of the bar.cpp and baz.cpp arguments)

  • command invocation might be inlined with at most 2 "items"

Why is this requirement needed? Neither of my proposals (A and B) says anything about the number of keyworded items with option or one value semantics, or single-value positional items. They are only addressing the specific formatting issues related to keyworded items with multi value semantics, and to multi-value positional items. The difference is that proposal B forces expansion of the command if any of these kinds of items have at least two values assigned to them, while proposal A forces expansion on the mere presence of two "multi-value semantics" items in addition to the situations expanded by proposal B.

On more concrete examples:

  • this rule would force add_library(Foo STATIC bar.cpp) to be expanded as it has 3 items, while both proposal A and B would allow this to be inlined.
  • Similarly, target_link_options(Foo BEFORE PUBLIC "--bar") (3 items) would be expanded by your rule, while both proposals would allow it to be inlined, as I don't care about non-multi-valued items.
  • target_link_options(Foo BEFORE PUBLIC "--bar" PRIVATE "--baz") (4 items) would be expanded by your rule and proposal A, but allowed to be inlined by proposal B.
  • target_link_options(Foo BEFORE PUBLIC "--bar" "--baz") (3 items) would be expanded by your rule and both of my proposals.

  • command invocation can be inlined only if keyworded "item" with "multi value" semantics has one value (this also naturally extends to "item" with "one value" semantics)
  • keyworded "item" with "multi value" semantics never gets inlined during expansion

These two together are basically equivalent to what I intended proposal B to be, at least for keyworded items with multi-value semantics. I'd propose dropping the "keyworded" part, and extend these rules to positional items with multi-value semantics too. (I'm fine with only implementing this for builtins, as detecting multi-value positional items may be nontrivial for user-defined functions.)

  • keyworded "item" with "one value" semantics might be inlined during expansion if it fits the line

This is a useful clarification, but I think it is not a change compared to how gersemi works by default. (Or is it?)


In conclusion, I'd drop the "command invocation might be inlined with at most 2 "items"" rule, and make the other three apply to positional multi-value arguments of builtin functions.
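The difference between the "at most 2 items" rule and proposals A and B can be written down as three predicates over the item taxonomy above. This is a minimal sketch under that taxonomy; the `Item` class, the `kind` labels and the function names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the item types described above.
MULTI_KINDS = {"multi_value_keyword", "multi_value_positional"}


@dataclass
class Item:
    kind: str  # "positional", "option", "one_value_keyword" or one of MULTI_KINDS
    values: list = field(default_factory=list)


def two_items_rule_expands(items):
    """The rule under discussion: more than 2 items forces expansion."""
    return len(items) > 2


def proposal_b_expands(items):
    """Proposal B: any multi-value item holding two or more values."""
    return any(i.kind in MULTI_KINDS and len(i.values) >= 2 for i in items)


def proposal_a_expands(items):
    """Proposal A: proposal B, plus the mere presence of two multi-value items."""
    multi = [i for i in items if i.kind in MULTI_KINDS]
    return len(multi) >= 2 or proposal_b_expands(items)


# target_link_options(Foo BEFORE PUBLIC "--bar" PRIVATE "--baz")
call = [
    Item("positional", ["Foo"]),
    Item("option", ["BEFORE"]),
    Item("multi_value_keyword", ['"--bar"']),
    Item("multi_value_keyword", ['"--baz"']),
]
print(two_items_rule_expands(call), proposal_a_expands(call), proposal_b_expands(call))
```

For this call the first two predicates force expansion while proposal B still allows inlining, which is the third case in the list of concrete examples above.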

Next comment will cite examples from our codebase that I found problematic with the current implementation of favour-expansion (so far they are almost exclusively about your first rule unnecessarily forcing expansion on command invocations that would be inlined both by gersemi's default style and my preferences, but I haven't completed my review yet).

@petamas
Author

petamas commented Mar 6, 2023

(It is implied by my last paragraph, but not said explicitly: with your last fix it works as you intended, so I could run it on the codebase, and collect examples for my next comment).

@BlankSpruce
Owner

My first impression is that I'm probably fine with dropping the rule command invocation might be inlined with at most 2 "items", especially when I consider your understanding of non-keyworded arguments and the possible distinction between single-value and multi-value positional items. Since this distinction is currently not recognized in the implementation, my understanding of your proposal has been unnecessarily warped.

That being said I'd like to see your perspective on yet another example. I wonder if this would "feel right" regardless of the fact that you could easily reason it is consistent with the rules:

set(OPTIONS LIBRARY_TYPE)
set(ONE_VALUE_KEYWORDS LIBRARY_NAME)
set(MULTI_VALUE_KEYWORDS DEPENDENCIES)
cmake_parse_arguments(PREFIX "${OPTIONS}" "${ONE_VALUE_KEYWORDS}" "${MULTI_VALUE_KEYWORDS}" ${ARGN})

add_library(${PREFIX_LIBRARY_NAME} ${PREFIX_LIBRARY_TYPE})
target_link_libraries(${PREFIX_LIBRARY_NAME} PRIVATE ${PREFIX_DEPENDENCIES} Boost::boost) # adding boost for the sake of example

This invocation of cmake_parse_arguments has 4 single-value positional arguments <prefix> <options> <one_value_keywords> <multi_value_keywords> and one multi-value positional argument <args>..., while the invocation of target_link_libraries has 1 single-value positional argument <target> and 1 keyworded multi-value argument with a list of 2 elements.

With line_length: 100, list_expansion: favour-expansion I would be allowed to format it like this:

set(OPTIONS LIBRARY_TYPE)
set(ONE_VALUE_KEYWORDS LIBRARY_NAME)
set(MULTI_VALUE_KEYWORDS DEPENDENCIES)
cmake_parse_arguments(PREFIX "${OPTIONS}" "${ONE_VALUE_KEYWORDS}" "${MULTI_VALUE_KEYWORDS}" ${ARGN})

add_library(${PREFIX_LIBRARY_NAME} ${PREFIX_LIBRARY_TYPE})
target_link_libraries(
    ${PREFIX_LIBRARY_NAME}
    PRIVATE
        ${PREFIX_DEPENDENCIES}
        Boost::boost
) # adding boost for the sake of example

An example like that made me think that perhaps the number of positional arguments should play a part in the "let's expand now" decision.

@petamas
Author

petamas commented Mar 6, 2023

Personally, I'm fine with this formatting - while a function with five positional arguments is bad interface design in general, I don't feel that it should be expanded unless the line gets too long. Also, specifically the cmake_parse_arguments() call usually does not contain any useful information to the reader apart from the prefix because it always has the same arguments (the three variables defined in the previous lines, and ${ARGN}), so I'd even argue for always keeping it on one line, even if it's longer than the line limit. (I'm not asking you to implement this, because it would be horrible special-casing, but an argument could be made for it.) A line like this was actually on my "does not feel right" list I'm compiling right now.

@petamas
Author

petamas commented Mar 6, 2023

As promised, I collected a bunch of examples from our codebase where the command invocation might be inlined with at most 2 "items" rule needlessly expands the line:

find_package(Qt5 COMPONENTS Core REQUIRED)
get_target_property(fooLocation Foo IMPORTED_LOCATION)
find_program(foo_EXECUTABLE foo HINTS "${fooDirectory}")
file(COPY foo DESTINATION "${CMAKE_CURRENT_BINARY_DIR}/foo")
configure_file(Foo.cpp.in Foo.cpp @ONLY)
list(APPEND foo "BAR")
cmake_policy(SET CMP0077 NEW)
set("${outputVar}" "${result}" PARENT_SCOPE)
math(EXPR pos1 "${pos}+1")

foreach(foo IN LISTS bar)
endforeach()

if(FOO STREQUAL "BAR")
endif()

gersemi - -l 120 --list-expansion favour-expansion results in the following:

find_package(
    Qt5
    COMPONENTS
        Core
    REQUIRED
)

get_target_property(
    fooLocation
    Foo
    IMPORTED_LOCATION
)
find_program(
    foo_EXECUTABLE
    foo
    HINTS
        "${fooDirectory}"
)
file(
    COPY
        foo
    DESTINATION "${CMAKE_CURRENT_BINARY_DIR}/foo"
)
configure_file(
    Foo.cpp.in
    Foo.cpp
    @ONLY
)
list(
    APPEND
    foo
    "BAR"
)
cmake_policy(
    SET
    CMP0077
    NEW
)
set("${outputVar}"
    "${result}"
    PARENT_SCOPE
)
math(
    EXPR
        pos1
        "${pos}+1"
)

foreach(
    foo
    IN
    LISTS
        bar
)
endforeach()

if(
    FOO
        STREQUAL
        "BAR"
)
endif()

I noticed that you nest properties as if they were keyworded arguments when they are all part of a set_target_properties() call using favour-expansion:

set_target_properties(
    Foo
    PROPERTIES
        RUNTIME_OUTPUT_DIRECTORY
            "${CMAKE_CURRENT_BINARY_DIR}/output"
        XCODE_ATTRIBUTE_ENABLE_HARDENED_RUNTIME
            YES
)

However, this is not the case for set_property:

set_property(
    TARGET
        Foo
    PROPERTY
        RUNTIME_OUTPUT_DIRECTORY
        "${CMAKE_CURRENT_BINARY_DIR}/output"
)

This becomes even messier-looking if the property has multiple values (this is not supported by set_target_properties(), only set_property()):

set_property(
    TARGET
        Foo
    PROPERTY
        INTERFACE_LINK_DEPENDS
        bar.i
        baz.i
)

I'd prefer these to be formatted this way if gersemi decides to expand:

set_property(
    TARGET
        Foo
    PROPERTY
        RUNTIME_OUTPUT_DIRECTORY
            "${CMAKE_CURRENT_BINARY_DIR}/output"
)

set_property(
    TARGET
        Foo
    PROPERTY
        INTERFACE_LINK_DEPENDS
            bar.i
            baz.i
)

Or I'd allow a slightly terser syntax too:

set_target_properties(
    Foo
    PROPERTIES
        RUNTIME_OUTPUT_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/output"
        XCODE_ATTRIBUTE_ENABLE_HARDENED_RUNTIME YES
)

set_property(
    TARGET Foo
    PROPERTY
        RUNTIME_OUTPUT_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/output"
)

set_property(
    TARGET Foo
    PROPERTY
        INTERFACE_LINK_DEPENDS
            bar.i
            baz.i
)

Or, in case of the middle one, I'd also allow inlining the whole call if the line length limit allows it:

set_property(TARGET Foo PROPERTY RUNTIME_OUTPUT_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/output")

I understand this would involve some special-casing around set_property() and set_target_properties(), but from the output I assume there already is some special-casing around the latter because you seem to handle basically a list of pairs.

Note that this is much less of a concern on my side, it is just something I've spotted; these calls are rare enough in our codebase that I'm fine with wrapping them / leaving their formatting as gersemi does them.


Our codebase managed to crash gersemi - -l 120 --list-expansion favour-expansion with this line:

set(${_VARIABLE_NAME} ${_VALUE} CACHE ${_TYPE} ${_DOCSTRING} ${_FORCE})

This seems to be present in the "stock" version of gersemi too with -l 30, so it's likely unrelated. Should I open a separate issue for this?


I noticed that long set() calls are actually using the "member function" style of my original proposal 2, i.e. the variable name is kept on the same line as set( even during expansion. Is set() treated specially in some way?

It seems like the "stock" version behaves the same, so this has nothing to do with favour-expansion, and I don't wanna reopen discussion on prop. 2, it's simply that I've just noticed this.


It seems like whatever special-casing you have for the COMMAND argument in add_custom_command(), it does not get applied if someone uses the ARGS keyword, and the arguments will be formatted as if ARGS was a "regular" multi-valued keyword argument. Not a problem for me (only a single invocation uses this keyword, and I can easily remove it), just wanted to mention it so you are aware.

@petamas
Author

petamas commented Mar 6, 2023

My first impression is that I'm probably fine with dropping the rule command invocation might be inlined with at most 2 "items", especially when I consider your understanding of non-keyworded arguments and the possible distinction between single-value and multi-value positional items. Since this distinction is currently not recognized in the implementation, my understanding of your proposal has been unnecessarily warped.

Could it be made a recognized distinction? If not, do you have an idea for handling add_library() & co.? (Apart from the already discussed target_sources() and my_add_library(Foo SOURCES ...)). I.e. how could we make add_library(Foo STATIC bar.cpp baz.cpp) always expand?

@BlankSpruce
Owner

Our codebase managed to crash gersemi - -l 120 --list-expansion favour-expansion with this line:

set(${_VARIABLE_NAME} ${_VALUE} CACHE ${_TYPE} ${_DOCSTRING} ${_FORCE})

This seems to be present in the "stock" version of gersemi too with -l 30, so it's likely unrelated. Should I open a separate issue for this?

It's not necessary, I'll fix it without an issue. I'm guessing you use variables to introduce options which is something I haven't seen yet.

I noticed that long set() calls are actually using the "member function" style of my original proposal 2, i.e. the variable name is kept on the same line as set( even during expansion. Is set() treated specially in some way?

It seems like the "stock" version behaves the same, so this has nothing to do with favour-expansion, and I don't wanna reopen discussion on prop. 2, it's simply that I've just noticed this.

No, it's just that the name plus the parenthesis fits into 4 characters, exactly the same width as the indentation, and I vividly remember deciding to treat it like that. Not a very good example of consistency, is it?

Could it be made a recognized distinction? If not, do you have an idea for handling add_library() & co.? (Apart from the already discussed target_sources() and my_add_library(Foo SOURCES ...)). I.e. how could we make add_library(Foo STATIC bar.cpp baz.cpp) always expand?

Yes, it can be made (and I think I should have done it earlier), it'll just take a little more time.

Nice observation about set_target_properties vs set_property. It's a good time to revisit these functions.

@petamas
Author

petamas commented Mar 6, 2023

It's not necessary, I'll fix it without an issue. I'm guessing you use variables to introduce options which is something I haven't seen yet.

Yeah, it is in a file I haven't ever seen, and in a function that is used a single time, so is not a regular thing. :D We have some weird stuff lurking around in the dark corners of our codebase...

No, it's just that the name plus the parenthesis fits into 4 characters, exactly the same width as the indentation, and I vividly remember deciding to treat it like that. Not a very good example of consistency, is it?

Well, I actually like the result, but it's not very consistent, yeah. :D I also understand why you made the decision, the 4-wide line followed by a 4-wide indent would have looked weird.

Yes, it can be made (and I think I should have done it earlier), it'll just take a little more time.

Thanks!

Nice observation about set_target_properties vs set_property. It's a good time to revisit these functions.

Happy to help! Feel free to ping me if you need more examples / wanna validate expected layouts with a second set of eyeballs.

BlankSpruce added a commit that referenced this issue Mar 22, 2023
Introduced strategies:
- 'favour-inlining' delays expansion of the code as long as possible
- 'favour-expansion' starts expansion when list of arguments has more
than two items or keyworded item has more than one element

Fix all the issues found out in discussion for issue #9
@BlankSpruce
Owner

I've been busy lately hence the delay.

Here's the third attempt. Let's hope that third time's the charm indeed.

All the issues mentioned in your comments, @petamas, are addressed there.

@petamas
Author

petamas commented Mar 22, 2023

Hey, don't apologize, you're the one doing me a favour. :) Thank you for your work so far!

I'm also quite busy this week, I have a bunch of deadlines, but I'll take the new version for a spin early next week.

@petamas
Author

petamas commented Apr 7, 2023

Sorry for disappearing again - I did test the new version last week, but did not have the time to type up my findings until now. In general, the expansion algorithm seems to work as intended, there are only minor troubles with the handling of specific commands.

As before, I'll be using gersemi -l 120 --list-expansion favour-expansion for my examples.

Control-flow statements

In case of certain control flow statements and other "language" commands, favour-expansion seems over-eager.

Given the following:

function(foo bar baz)
endfunction()

the output is this:

function(
    foo
    bar
    baz
)
endfunction()

I assume this happens because function() is modeled as function(<function-name> [<arg>...]), and when we declare two or more arguments, the "positional multi-value argument has more than one value provided in the call" rule kicks in, and forces expansion of the line. However, I'd argue that function() is not a regular command, so an exception should be made from this expansion rule even in favour-expansion mode.

Similarly, foreach(<variable> IN LISTS <list-variable>...) gets too eagerly expanded:

foreach(var IN LISTS list1 list2)
endforeach()

gets turned into

foreach(
    var
    IN
    LISTS
        list1
        list2
)
endforeach()

I also assume something similar is the cause for changing this:

if(foo AND bar AND baz)
endif()

to this:

if(
    foo
    AND bar
    AND baz
)
endif()

I think these should be special-cased to be exempt from the favour-expansion rule. I also think while() and macro() may have the same problems, but I haven't tested them. I also assume foreach(<var> IN ITEMS <item>...) and foreach(<var>... IN ZIP_LISTS <list>...) would be affected as well. I think special-casing is acceptable for these commands, as they're not run-of-the-mill function-like commands, but actually control flow statements.
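The requested exemption could look roughly like the sketch below. It assumes a formatter that decides expansion from the number of values held by each multi-value item; the set of exempt commands contains only the ones named above, and `forces_expansion` is a hypothetical name rather than a gersemi function.

```python
# Commands named above as candidates for the exemption.
CONTROL_FLOW_COMMANDS = frozenset({"if", "while", "foreach", "function", "macro"})


def forces_expansion(command, multi_value_sizes):
    """Decide whether favour-expansion should force a multi-line layout.

    `multi_value_sizes` holds the number of values provided for each
    multi-value item of the call (keyworded or positional).
    """
    # CMake command names are case-insensitive, so normalise before the lookup.
    if command.lower() in CONTROL_FLOW_COMMANDS:
        return False  # exempt: only the line length limit should matter here
    return any(size >= 2 for size in multi_value_sizes)


# function(foo bar baz): <arg>... holds two values, but the call is exempt.
print(forces_expansion("function", [2]))
# target_link_libraries(foo PUBLIC bar baz): PUBLIC holds two values.
print(forces_expansion("target_link_libraries", [2]))
```

An exempt command would still be expanded when it does not fit within the line length limit; that part is handled elsewhere and is left out of the sketch.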

Unrecognized functions

It seems to me that when gersemi encounters a function it does not recognize, it treats it as if all its arguments are part of a single, multi-value argument. This behaviour seems OK to me, but it also seems like gersemi doesn't "know" cmake_minimum_required() and cmake_parse_arguments() because this:

cmake_minimum_required(VERSION 3.20)

function(fooo)
    set(OPTIONS "QUIET")
    set(ONE_VALUE_ARGS "NAME")
    set(MULTI_VALUE_ARGS "SOURCES")
    cmake_parse_arguments("FOOO" ${OPTIONS} ${ONE_VALUE_ARGS} ${MULTI_VALUE_ARGS} ${ARGN})
endfunction()

gets turned into this:

cmake_minimum_required(
    VERSION
    3.20
)

function(fooo)
    set(OPTIONS "QUIET")
    set(ONE_VALUE_ARGS "NAME")
    set(MULTI_VALUE_ARGS "SOURCES")
    cmake_parse_arguments(
        "FOOO"
        ${OPTIONS}
        ${ONE_VALUE_ARGS}
        ${MULTI_VALUE_ARGS}
        ${ARGN}
    )
endfunction()

I think both should be kept as one-liners, because they're under the limit (120 characters), and there are no multi-valued arguments present.
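The guess above (an unrecognized command is modelled as one multi-value positional item) would explain the output. The sketch below illustrates that guess only; it is not confirmed gersemi behaviour, and `forces_expansion` and its `one_value_keywords` parameter are hypothetical.

```python
def forces_expansion(args, one_value_keywords=frozenset()):
    """Return True if the leftover positional list holds two or more values.

    Known one-value keywords consume the argument that follows them; everything
    else is collected into a single multi-value positional item.
    """
    leftover = []
    i = 0
    while i < len(args):
        if args[i] in one_value_keywords and i + 1 < len(args):
            i += 2  # the keyword and its single value
        else:
            leftover.append(args[i])
            i += 1
    return len(leftover) >= 2


args = ["VERSION", "3.20"]
# Command not recognized: both arguments land in one multi-value item.
print(forces_expansion(args))
# VERSION modelled as a one-value keyword: nothing is left to force expansion.
print(forces_expansion(args, one_value_keywords={"VERSION"}))
```

Once the keyword is known to the formatter, the call no longer contains a multi-value item with two values, so it can stay on one line as long as it fits.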

Unrecognized arguments

It seems like gersemi is unaware of the OUTPUT and TARGET keyworded arguments of add_custom_command(), as this:

add_custom_command(
    OUTPUT
        foo.txt
    COMMAND
        "${CMAKE_COMMAND}" -E touch foo.txt
    DEPENDS
        bar.txt
        baz.txt
    VERBATIM
)

add_custom_command(
    TARGET
        Foo
    POST_BUILD
    COMMAND
        "${CMAKE_COMMAND}" -E touch "$<TARGET_FILE_DIR:Foo>/bar.txt"
    VERBATIM
)

gets turned into this:

add_custom_command(
    OUTPUT
    foo.txt
    COMMAND
        "${CMAKE_COMMAND}" -E touch foo.txt
    DEPENDS
        bar.txt
        baz.txt
    VERBATIM
)

add_custom_command(
    TARGET
    Foo
    POST_BUILD
    COMMAND
        "${CMAKE_COMMAND}" -E touch "$<TARGET_FILE_DIR:Foo>/bar.txt"
    VERBATIM
)

However, foo.txt and Foo should be either inlined or nested under OUTPUT and TARGET, respectively, because they belong to keyworded arguments.

set(CACHE) is formatted weirdly

This line:

set(foo "bar" CACHE STRING "docstring")

gets turned into this:

set(foo
    "bar"
    CACHE
        STRING
        "docstring"
)

There are multiple things I don't understand here. Why did it get expanded? It's way shorter than 120 characters, and there's no multi-value argument there. Also, why are STRING and "docstring" nested under CACHE? Is CACHE modeled as a multi-value keyworded argument that has the type and docstring as values?

The phantom COMPONENTS

find_package(<package> REQUIRED COMPONENTS <component>...) gets formatted as I'd expect, i.e. this:

find_package(Qt5 REQUIRED COMPONENTS Qml)
find_package(Qt5 REQUIRED COMPONENTS Qml QuickCompiler)

turns into this, as it should:

find_package(Qt5 REQUIRED COMPONENTS Qml)
find_package(
    Qt5
    REQUIRED
    COMPONENTS
        Qml
        QuickCompiler
)

However, the COMPONENTS keyword is optional, but if we skip it, then the components will decide to nest under REQUIRED in case of expansion, which looks weird to me:

find_package(Qt5 REQUIRED Qml)
find_package(
    Qt5
    REQUIRED
        Qml
        QuickCompiler
)

If we also skip REQUIRED (which is an option), and go with the COMPONENTS-less variant, gersemi simply lets all components be inlined, which it shouldn't allow for the second call with two components:

find_package(Qt5 Qml)
find_package(Qt5 Qml QuickCompiler)

This issue is not really important to me, as I can just use COMPONENTS everywhere, which leads to a more readable syntax and better formatting (and which is used in 99% of the calls in our codebase already), I just wanted to point it out.

set_target_properties vs set_property(TARGET)

These functions are still not formatted consistently with each other. Based on my experiments, it seems like:

  • Both calls always get expanded; inlining them is prohibited
  • The property value(s) in set_target_properties() always get expanded under the property name
  • The property value(s) in set_property(TARGET) always get inlined next to the property name

I.e., the calls are formatted this way in general:

set_target_properties(
    Target
    PROPERTIES
        FOO
            bar
)

set_target_properties(
    Target
    PROPERTIES
        FOO
            bar
        BAZ
            goo
)

set_property(
    TARGET
        Target
    PROPERTY
        FOO bar
)

set_property(
    TARGET
        Target
    PROPERTY
        FOO bar baz
)

This does not seem like the best choice to me for multiple reasons:

  • The first and third calls are semantically equivalent, but they are formatted differently (one inlines the value, the other expands it).
  • If multiple values are provided for a property in set_property(TARGET), they should be expanded under the property name to be consistent with multi-value keyworded arguments.

I'd argue for one of the following options:

  • Both forms should always expand property values under the property names, regardless of the number of values
  • Both forms should inline a single property value next to the property name, but set_property(TARGET) should still expand multiple values under the property name. Number of properties in set_target_properties() does not matter, inlining/expansion should be decided on a property-by-property basis.

Optionally, I'd prefer that calls setting a single property to a single value be allowed to stay inlined, i.e. set_target_properties(Target PROPERTIES FOO bar) and set_property(TARGET Target PROPERTY FOO bar) could stay on one line, but set_target_properties(Target PROPERTIES FOO bar BAZ goo) and set_property(TARGET Target PROPERTY FOO bar baz) should not.

The second option and allowing the whole thing to be a single line would be consistent with what we're doing with multi-value keyworded arguments in favour-expansion mode.
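
For illustration, here is a hand-written sketch of how the second option, combined with the optional single-line allowance, might look. This is not gersemi output; the layout is only one reading of the proposal above, and Target, FOO, BAZ and the values are the same placeholder names used earlier (Target is assumed to be an existing target).

```cmake
# Single property with a single value: the whole call may stay on one line.
set_target_properties(Target PROPERTIES FOO bar)
set_property(TARGET Target PROPERTY FOO bar)

# Two properties: the call expands, but each single value stays
# inline next to its property name (decided property by property).
set_target_properties(
    Target
    PROPERTIES
        FOO bar
        BAZ goo
)

# One property with two values: the values expand under the property name,
# like any other multi-value keyworded argument.
set_property(
    TARGET
        Target
    PROPERTY
        FOO
            bar
            baz
)
```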

Comments messing up formatting

I noticed that a comment can mess up the formatting of add_test(). This input:

add_test(
    NAME test_foo
    COMMAND
        "${CMAKE_COMMAND}" -E true
    WORKING_DIRECTORY
        # we have to run this in the binary dir
        "${CMAKE_CURRENT_BINARY_DIR}"
)

gets incorrectly formatted as:

add_test(
    NAME test_foo
    COMMAND
        "${CMAKE_COMMAND}" -E true
    WORKING_DIRECTORY
        # we have to run this in the binary dir
    "${CMAKE_CURRENT_BINARY_DIR}"
)

If I remove the comment line, it gets formatted as it should:

add_test(
    NAME test_foo
    COMMAND
        "${CMAKE_COMMAND}" -E true
    WORKING_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}"
)

The issue seems to be present in "stock" gersemi too, so it's not something this change brought in. It can be easily worked around by putting the comment either before WORKING_DIRECTORY or on the same line as ${CMAKE_CURRENT_BINARY_DIR}, but I wanted you to know about it.
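
The two workarounds mentioned above can be sketched as follows. The inputs are hand-written; how gersemi lays each of them out is not shown in this thread, so take the exact line breaks as an assumption. The two calls are alternatives, and the test names test_foo_a and test_foo_b are hypothetical, chosen only so that both calls can coexist in one file.

```cmake
# Workaround A: put the comment before the WORKING_DIRECTORY keyword.
add_test(
    NAME test_foo_a
    COMMAND
        "${CMAKE_COMMAND}" -E true
    # we have to run this in the binary dir
    WORKING_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}"
)

# Workaround B: put the comment on the same line as the value.
add_test(
    NAME test_foo_b
    COMMAND
        "${CMAKE_COMMAND}" -E true
    WORKING_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}" # we have to run this in the binary dir
)
```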

@BlankSpruce
Owner

BlankSpruce commented Apr 18, 2023

Brief answers:

  • control-flow statements to be exempted from favour-expansion rule: ok, I'm fine with that.
  • set(CACHE) weird: yes, I've modelled it as multi-value since that's how I'd read it based on the documentation I've revisited
    https://cmake.org/cmake/help/latest/command/set.html?highlight=set
    You can argue that it's a very specific exactly-two-value argument. Nevertheless, I'll revisit it.
  • COMPONENTS: yeah, I'll fix that
  • set_*_properties: I'll revisit it, that outcome wasn't intended.
  • comments issue: yeah, the comments are the usual suspects when it comes to bugs. I'll fix that.
  • other issues: I'll fix them.

@petamas
Author

petamas commented Apr 18, 2023

Thanks for the follow-up!

I can accept this formatting as it is: the description is usually quite long anyway, which triggers expansion, and the nesting makes sense. It was just surprising to see at first, is all. So from my PoV, changing this can even be dropped if you don't have time for it or don't want to add special handling. (Most of our cache variables are option()s anyway.)

@BlankSpruce
Copy link
Owner

BlankSpruce commented May 2, 2023

Not really funny thing about:

find_package(Qt5 Qml)
find_package(Qt5 Qml QuickCompiler)

It turns out that unless I add some heuristic for the <version> positional argument, it won't always format as intended. The basic signature described here: https://cmake.org/cmake/help/latest/command/find_package.html#basic-signature makes it ambiguous whether Qml is <version> or the first element of components. I guess I'll leave it as a "known bug".
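
To make the ambiguity concrete, here is a small hand-written sketch. The version number 5.15 is an arbitrary example and not taken from this thread. A formatter that only sees positional tokens cannot tell from position alone which role the second argument plays.

```cmake
# Second argument is a version; Qml is the only component.
find_package(Qt5 5.15 Qml)

# Same shape, but here the second argument is already a component.
find_package(Qt5 Qml QuickCompiler)

# An explicit COMPONENTS keyword removes the ambiguity for the formatter.
find_package(Qt5 COMPONENTS Qml QuickCompiler)
```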

BlankSpruce added a commit that referenced this issue May 2, 2023
Introduced strategies:
- 'favour-inlining' delays expansion of the code as long as possible
- 'favour-expansion' starts expansion when list of arguments has more
than two items or keyworded item has more than one element

Fix all the issues found out in discussion for issue #9
@BlankSpruce
Owner

Here's the fourth attempt. That single find_package case is not fixed, as explained in the comment above. set_property and set_target_properties are now formatted in such a way that one can be replaced by the other without affecting the git blame log for property values. I hope that second third time is the charm.

@petamas
Author

petamas commented May 2, 2023

@BlankSpruce: Thank you! I'm still testing & evaluating in general, but wanted to notify you that this crashes gersemi:

function(foo)
    cmake_parse_arguments("" "" "FIXTURE;SETUP" "" ${ARGN})
endfunction()

I'm simply running gersemi - for the above input, and it fails with exit code 123 and this message: <stdin>: runtime error, list index out of range. I haven't tested the "stock" version yet.

BlankSpruce added a commit that referenced this issue May 2, 2023
Introduced strategies:
- 'favour-inlining' delays expansion of the code as long as possible
- 'favour-expansion' starts expansion when list of arguments has more
than two items or keyworded item has more than one element

Fix all the issues found out in discussion for issue #9
@BlankSpruce
Owner

@petamas I've updated the branch to fix that error.

@petamas
Author

petamas commented May 2, 2023

@BlankSpruce: I only managed to find a single (very minor) inconsistency: set_property(TARGET) and set_property(DIRECTORY) are formatted differently:

set_property(
    TARGET
        Foo
    PROPERTY
        RUNTIME_OUTPUT_DIRECTORY
            "${CMAKE_CURRENT_BINARY_DIR}/output"
)

vs

set_property(
    DIRECTORY Foo
    APPEND
    PROPERTY
        CMAKE_CONFIGURE_DEPENDS
            "whatever.py"
)

Is this because you can only supply one directory, while (in theory) you can supply multiple targets? I'm fine with it as-is, just want to understand.


Apart from that, this latest version is perfect for my goals, thanks for implementing it for us! I plan to integrate the released version into our codebase in May, and refer back if any issues come up over time.

Thanks again!

@BlankSpruce
Owner

BlankSpruce commented May 2, 2023

That's exactly the reason. The signature allows that:

set_property(<GLOBAL                      |
              DIRECTORY [<dir>]           |
              TARGET    [<target1> ...]   |
              SOURCE    [<src1> ...]
                        [DIRECTORY <dirs> ...]
                        [TARGET_DIRECTORY <targets> ...] |
              INSTALL   [<file1> ...]     |
              TEST      [<test1> ...]     |
              CACHE     [<entry1> ...]    >
             [APPEND] [APPEND_STRING]
             PROPERTY <name> [<value1> ...])
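
As a small illustration of what the signature permits (a sketch only: Foo and Bar are hypothetical targets assumed to exist, and the one-line layout is not gersemi output), TARGET can take a list of targets while DIRECTORY takes at most one directory:

```cmake
# TARGET accepts several targets, so it behaves like a multi-value keyword.
set_property(TARGET Foo Bar PROPERTY RUNTIME_OUTPUT_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/output")

# DIRECTORY accepts at most one directory, so it behaves like a single-value keyword.
set_property(DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}" APPEND PROPERTY CMAKE_CONFIGURE_DEPENDS "whatever.py")
```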

With that concluding this major milestone, I'll try to release version 0.9 today, which has this new style available. For any further issues after the release, just open a new issue.

@petamas
Author

petamas commented May 2, 2023

Will do, and thanks again! It's really cool that you were willing to work through this with me, I cannot express enough how grateful I am.

BlankSpruce added a commit that referenced this issue May 2, 2023
Introduced strategies:
- 'favour-inlining' delays expansion of the code as long as possible
- 'favour-expansion' starts expansion when list of arguments has more
than two items or keyworded item has more than one element

Fix all the issues found out in discussion for issue #9