Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alignment blocks #170

Open
Xophmeister opened this issue Jan 12, 2023 · 10 comments
Open

Alignment blocks #170

Xophmeister opened this issue Jan 12, 2023 · 10 comments
Labels
P2 major: an upcoming release type: feature request

Comments

@Xophmeister
Copy link
Member

"Indentation blocks", in Topiary -- demarcated by the @{append,prepend}_indent_{start,end} capture names -- increase the indentation level for all targeted nodes. A similar concept is, what I'm calling, "alignment blocks", which do not affect the indentation level, but where subsequent sibling nodes are aligned to the same column as the tagged sibling.

Is your feature request related to a problem? Please describe.

Besides the usual indentation, it is often desirable to align semantically similar syntactic elements over multiple lines. For example:

docker run --tty \
           --rm \
           --volume=foo:bar \
           image:tag

AIUI, this cannot currently be done in Topiary.

Describe the solution you'd like

This would require (at most: four) new capture names: @{append,prepend}_align_{start,end}.

When the *_align_start capture name is applied to a node, that node's offset from the current indent level would be recorded -- say, with a unique identifier -- and "tagged" against every sibling node until the *_align_end capture name (or no siblings remain; see "Inference", below). When formatting happens, that offset and alignment is simply applied to each node, appropriately.

For example:

(parent
  .
  (_first_child) @prepend_align_start
) @append_align_end

(parent
  (_child) @append_spaced_softline
)

Applied to the following example syntax tree:

  • root
    • parent
      • child
      • parent
        • child
        • child
      • child

...might mark up like so:

  • root
    • parent
      • <align_start(SOME_UUID)> child <spaced_softline>
      • <align_to(SOME_UUID)> parent
        • <align_start(NEW_UUID)> child <spaced_softline>
        • <align_to(NEW_UUID)> child <spaced_softline>
      • <align_end(NEW_UUID)> <spaced_softline> <align_to(SOME_UUID)> child <spaced_softline>
  • <align_end(SOME_UUID)> <EOF>

...finally rendering as:

parent child
       parent child
              child
       child

Describe alternatives you've considered

In the given example, it wouldn't be unreasonable (nor unattested) to use an indentation block to achieve a similar end. Something like:

docker run \
  --tty \
  --rm \
  --volume=foo:bar \
  image:tag

It may not always be appropriate to do this. However, it's a trade-off against how complex it would be to implement alignment blocks.

(Note: This particular example may not be strictly realistic. AFAIK, the Tree Sitter Bash grammar doesn't distinguish between subcommands and command line arguments; line continuations may also be tricky to deal with... This is purely illustrative!)

Additional context

Inference
The *_align_end capture names may not be necessary. Instead, the end of the alignment block could be inferred once all siblings have been exhausted. (Of course, having explicit *_align_end capture names gives finer control and allows you to do things that can't be done when considering all siblings. However, this is likely to be a very rare need.)

Childlike Siblings
The alignment block would apply to all siblings as though they were equal. However, there are times when siblings are, semantically speaking, more like child nodes. This would be at the mercy of the grammar.

For example, both --long-opt=value and --long-opt value are commonly attested command line argument formats. The first instance may be considered a single sibling, whereas the second might be parsed as two siblings (i.e., (option) (option) rather than (option (value))). This would then get uglily rendered like so:

do_something --some-flag \
             --long-opt \
             value

Tabs vs. Spaces
Alignment should always be done with spaces. Indentation is currently done with spaces, but may change in the future to support tabs, etc. (see #105).

Wide Characters
Alignment involving wide characters (e.g., CJK, emojis, etc.) may be tricky.

Post-Processing
It may not be easy/possible to calculate the tagged node's final resting place (i.e., indent level + required alignment padding). This may be further exacerbated by post-processing that eliminates runs of whitespace.

@aspiwack
Copy link
Member

@torhovland didn't reply here because he new I would rant 😛 .

There is no technical obstacle to doing something like this. However, I contend that alignment with current tools is a miserable experience and you don't want to do it ever. People who know me are probably bored to death, already but my position is that there is only one known reasonable way of doing code alignment today, and it's elastic tabstops. There is a second advantage for me: it saves me time because the website already contains most of my rant.

A point that isn't mentioned on the Elastic Tabstops website, by the way, is that alignment with spaces is very much in opposition to the guiding principle that we have to strive and minimize diffs produced by formatting.

The problem is that elastic tabstops require editor support. And as long as it isn't standard, we can't use it. And as long as we can't use elastic tabstops, I much prefer that we don't try and align things.

@Xophmeister
Copy link
Member Author

I'd completely forgotten about elastic tabstops. I'm not completely convinced by them, besides the obvious lack of rendering support... I guess the tab vs. spaces holy war continues 😅

@aspiwack
Copy link
Member

This is a very different tab vs spaces holy war though. Should we have different names for them? Asking the big questions today.

@ErinvanderVeen ErinvanderVeen added P3 minor: not priorized P2 major: an upcoming release and removed P3 minor: not priorized labels Jul 5, 2023
@treuherz
Copy link

I know this is a quiet topic, but I want to point out that the proposal as written wouldn’t be sufficient to support other attested applications of alignment, specifically in Go. There’s an example in gofmt’s tests of the before and after of reformatting struct literals but it applies the same logic to var blocks or struct declarations. I believe it uses an elastic tabstops implementation to do this, funnily enough.

In these cases the sibling nodes are conventionally indented and the “cousin” nodes, nodes at a particular field within each sibling, are the ones being aligned. I don’t think this would be supported by what’s been set out here.

@Xophmeister
Copy link
Member Author

Thanks, @treuherz.

In the example you gave, we see this alignment:

Before                          After

{                               {
	S: "Hello World",             	S:       "Hello World",
	Integer: 42,                  	Integer: 42,
},                              }

In Tree-sitter (depending on the grammar, of course), those record fields would be considered siblings. I suppose by "cousins" you mean to imply that, in the case where there are multiple records in an array, then each record should be aligned uniformly. The example doesn't demonstrate that, because each record is the same (i.e., even if cousin-alignment is happening, it would be indistinguishable from sibling-alignment). Consider this example, instead:

Before                   Sibling                  Cousin

[                        [                        [
  {                        {                        {
    "something": 123         "something": 123         "something": 123
    "else": 123              "else":      123         "else":      123
  },                       },                       },
  {                        {                        {
    "another": 123           "another": 123           "another":   123
    "thing": 123             "thing":   123           "thing":     123
  ]                        ]                        ]
}                        }                        }

What this issue is proposing is the middle option (sibling-alignment). Is what you mean by "cousin"-alignment shown by the right column? Does gofmt do this, in the case of heterogeneous elements?

(AFAIK, without looking into it too much, I believe sibling-alignment is the best that can be achieved. However, cousin-alignment would be interesting to look into...)

@treuherz
Copy link

treuherz commented Nov 13, 2023

I did mean the middle option from your example. I've used words unclearly here, let me try again

Gofmt would take this input

var v = []Struct{
	{
		a: "a string",
		foo: 42,
		longerName: false,
	},
	{
		a: "",
	},
	{
		a: "blah",
		foo: -999,
	},
}

var (
	a int = 1
	b = 2
	c = 3

	d WrappedInt = 4
	e int8 = 5
	f WrappedInt = 6
	g = 7

	f = 8
	g = 9
	hijk = 10
)

and produce

var v = []Struct{
	{
		a:          "a string",
		foo:        42,
		longerName: false,
	},
	{
		a: "",
	},
	{
		a:   "blah",
		foo: -999,
	},
}

var (
	a int = 1
	b     = 2
	c     = 3

	d WrappedInt = 4
	e int8       = 5
	f WrappedInt = 6
	g            = 7

	f    = 8
	g    = 9
	hijk = 10
)

I'll begin with the first block, the struct literal. The point of alignment is the beginning of the field values within each struct, which are not not siblings within the grammar. Instead each one is a child of a sibling keyed_element, which is why I called them cousins, as distinct from the bash example where every element is just an argument of the command on the same level of nesting.

In the second section of the example, the var block, the point of alignment is the equals sign within each var_spec sibling. The alignment groups here are delimited not by any sematic element but by vertical whitespace.

@Xophmeister
Copy link
Member Author

Xophmeister commented Nov 13, 2023

Thanks, @treuherz; now I understand and I completely see your point. So what I was calling "sibling-alignment" is actually "cousin-alignment" and, FWIW, my "cousin-alignment" is "second-cousin-alignment" to stretch the metaphor.

You are (of course) correct. The initial proposal was for siblings in, e.g., Bash commands or Lisp s-expressions, etc. However, alignment of essentially key-value pairs is both common -- gofmt, as you point out, and terraform fmt immediately come to mind -- and, I think, important. Therefore, it ought to be covered by this issue.

(Now I wonder if the generalisation of "Nth-cousin-alignment", for $N\ge 0$ (where a zeroth-cousin is a sibling 😅), might be worth pursuing. Perhaps not from the outset, but as a long-term goal...)

@josharian
Copy link

A case in which alignment blocks turn out to be somewhat important is templating languages. (I'm toying around with getting topiary to work on the Go templating language right now.)

Consider this (you can guess enough of the semantics):

<div>
  {{ if .X }}
    <div>
      {{ template "other" (Combine
            (A .A)
            (B .B)
         )
      }}
    </div>
  {{ end }}
</div>

Using any alignment for the (A .A) and (B .B) that is only aware of the templating language syntax won't work out well, because there are "external" alignment considerations from the intervening HTML. (And it isn't necessarily HTML. The templating language supports arbitrary text, including other indented content like Go and SQL and the like.)

@josharian
Copy link

I know this is a quiet topic, but I want to point out that the proposal as written wouldn’t be sufficient to support other attested applications of alignment, specifically in Go.

I should perhaps add here that gofmt uses even fancier heuristics. See golang/go#10392 (comment) for an old example. The heuristics have gotten even more complicated since then.

But (IMHO), topiary can let gofmt be gofmt, and still provide tons of value by making it easy to write "good enough" formatters for the long tail of inputs for which there is not enough incentive to write a bespoke, hand-tuned formatter.

@Xophmeister
Copy link
Member Author

(Another "vote" for this feature via #679)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 major: an upcoming release type: feature request
Projects
None yet
Development

No branches or pull requests

6 participants