Alignment blocks #170
Comments
@torhovland didn't reply here because he knew I would rant 😛. There is no technical obstacle to doing something like this. However, I contend that alignment with current tools is a miserable experience and you don't want to do it ever. People who know me are probably bored to death already, but my position is that there is only one known reasonable way of doing code alignment today, and it's elastic tabstops. There is a second advantage for me: it saves me time because the website already contains most of my rant. A point that isn't mentioned on the Elastic Tabstops website, by the way, is that alignment with spaces is very much in opposition to the guiding principle that we have to strive to minimize the diffs produced by formatting. The problem is that elastic tabstops require editor support. And as long as it isn't standard, we can't use it. And as long as we can't use elastic tabstops, I much prefer that we don't try and align things.
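To make the point about diffs concrete, here is a minimal Go sketch. The helpers `alignPairs` and `changedLines` are hypothetical names made up for this illustration and are not part of Topiary. It pads keys with spaces so the values line up, then counts how many existing lines change when one longer key is added.

```go
package main

import (
	"fmt"
	"strings"
)

// alignPairs renders key/value pairs one per line, padding each key with
// spaces so that every value starts in the same column.
func alignPairs(pairs [][2]string) []string {
	width := 0
	for _, p := range pairs {
		if len(p[0]) > width {
			width = len(p[0])
		}
	}
	lines := make([]string, 0, len(pairs))
	for _, p := range pairs {
		pad := strings.Repeat(" ", width-len(p[0]))
		lines = append(lines, p[0]+":"+pad+" "+p[1])
	}
	return lines
}

// changedLines counts the lines of before that do not appear verbatim in
// after, i.e. the lines a line-based diff would report as removed.
func changedLines(before, after []string) int {
	seen := make(map[string]bool, len(after))
	for _, l := range after {
		seen[l] = true
	}
	n := 0
	for _, l := range before {
		if !seen[l] {
			n++
		}
	}
	return n
}

func main() {
	before := [][2]string{{"a", "1"}, {"foo", "2"}}
	after := append(append([][2]string{}, before...), [2]string{"longerName", "3"})

	// Adding one longer key re-pads every line that was already there.
	fmt.Println(changedLines(alignPairs(before), alignPairs(after)))
}
```

Without the padding, the same change would touch only the added line.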
I'd completely forgotten about elastic tabstops. I'm not completely convinced by them, besides the obvious lack of rendering support... I guess the tab vs. spaces holy war continues 😅
This is a very different tab vs spaces holy war though. Should we have different names for them? Asking the big questions today.
I know this is a quiet topic, but I want to point out that the proposal as written wouldn’t be sufficient to support other attested applications of alignment, specifically in Go. There’s an example in gofmt’s tests of the before and after of reformatting struct literals, but it applies the same logic to var blocks or struct declarations. I believe it uses an elastic tabstops implementation to do this, funnily enough. In these cases the sibling nodes are conventionally indented and the “cousin” nodes, nodes at a particular field within each sibling, are the ones being aligned. I don’t think this would be supported by what’s been set out here.
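For reference, Go's standard library ships an elastic tabstops implementation as `text/tabwriter`, which `go/printer` builds on. The following is only a sketch of the general idea, not of gofmt's actual heuristics: `alignColumns` is a hypothetical helper, and the tab characters in the input are placed by hand where gofmt would insert them itself.

```go
package main

import (
	"bytes"
	"fmt"
	"text/tabwriter"
)

// alignColumns treats each tab in the input as a cell terminator and pads
// the cells of consecutive lines to a common width, in the manner of
// elastic tabstops.
func alignColumns(lines []string) string {
	var buf bytes.Buffer
	// minwidth=0, tabwidth=0, padding=1, padchar=' ', no flags.
	w := tabwriter.NewWriter(&buf, 0, 0, 1, ' ', 0)
	for _, l := range lines {
		fmt.Fprintln(w, l)
	}
	w.Flush()
	return buf.String()
}

func main() {
	fmt.Print(alignColumns([]string{
		"a:\t\"a string\",",
		"foo:\t42,",
		"longerName:\tfalse,",
	}))
}
```

Each key cell is padded to the width of the widest key plus one space, so the values start in the same column.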
Thanks, @treuherz. In the example you gave, we see this alignment:
In Tree-sitter (depending on the grammar, of course), those record fields would be considered siblings. I suppose by "cousins" you mean to imply that, in the case where there are multiple records in an array, then each record should be aligned uniformly. The example doesn't demonstrate that, because each record is the same (i.e., even if cousin-alignment is happening, it would be indistinguishable from sibling-alignment). Consider this example, instead:
What this issue is proposing is the middle option (sibling-alignment). Is what you mean by "cousin"-alignment shown by the right column? (AFAIK, without looking into it too much, I believe sibling-alignment is the best that can be achieved. However, cousin-alignment would be interesting to look into...)
I did mean the middle option from your example. I've used words unclearly here, so let me try again. Gofmt would take this input:

    var v = []Struct{
        {
            a: "a string",
            foo: 42,
            longerName: false,
        },
        {
            a: "",
        },
        {
            a: "blah",
            foo: -999,
        },
    }
    var (
        a int = 1
        b = 2
        c = 3
        d WrappedInt = 4
        e int8 = 5
        f WrappedInt = 6
        g = 7
        f = 8
        g = 9
        hijk = 10
    )

and produce:

    var v = []Struct{
        {
            a:          "a string",
            foo:        42,
            longerName: false,
        },
        {
            a: "",
        },
        {
            a:   "blah",
            foo: -999,
        },
    }
    var (
        a    int        = 1
        b               = 2
        c               = 3
        d    WrappedInt = 4
        e    int8       = 5
        f    WrappedInt = 6
        g               = 7
        f               = 8
        g               = 9
        hijk            = 10
    )

I'll begin with the first block, the struct literal. The point of alignment is the beginning of the field values within each struct, which are not siblings within the grammar. Instead, each one is a child of a sibling node. In the second section of the example, the var block, the point of alignment is the equals sign within each declaration.
Thanks, @treuherz; now I understand and I completely see your point. So what I was calling "sibling-alignment" is actually "cousin-alignment" and, FWIW, my "cousin-alignment" is "second-cousin-alignment" to stretch the metaphor. You are (of course) correct. The initial proposal was for siblings in, e.g., Bash commands or Lisp s-expressions, etc. However, alignment of essentially key-value pairs is both common -- (Now I wonder if the generalisation of "Nth-cousin-alignment", for |
A case in which alignment blocks turn out to be somewhat important is templating languages. (I'm toying around with getting topiary to work on the Go templating language right now.) Consider this (you can guess enough of the semantics):
Using any alignment for the |
I should perhaps add here that gofmt uses even fancier heuristics. See golang/go#10392 (comment) for an old example. The heuristics have gotten even more complicated since then. But (IMHO), topiary can let gofmt be gofmt, and still provide tons of value by making it easy to write "good enough" formatters for the long tail of inputs for which there is not enough incentive to write a bespoke, hand-tuned formatter.
(Another "vote" for this feature via #679)
"Indentation blocks", in Topiary -- demarcated by the
@{append,prepend}_indent_{start,end} capture names -- increase the indentation level for all targeted nodes. A similar concept is what I'm calling "alignment blocks", which do not affect the indentation level, but where subsequent sibling nodes are aligned to the same column as the tagged sibling.
Is your feature request related to a problem? Please describe.
Besides the usual indentation, it is often desirable to align semantically similar syntactic elements over multiple lines. For example:
AIUI, this cannot currently be done in Topiary.
Describe the solution you'd like
This would require (at most) four new capture names: @{append,prepend}_align_{start,end}.
When the *_align_start capture name is applied to a node, that node's offset from the current indent level would be recorded -- say, with a unique identifier -- and "tagged" against every sibling node until the *_align_end capture name (or no siblings remain; see "Inference", below). When formatting happens, that offset and alignment is simply applied to each node, appropriately.
For example:
Applied to the following example syntax tree:
root
  parent
    child
    parent
      child
      child
    child
...might mark up like so:
root
  parent
    <align_start(SOME_UUID)>
    child
    <spaced_softline>
    <align_to(SOME_UUID)>
    parent
      <align_start(NEW_UUID)>
      child
      <spaced_softline>
      <align_to(NEW_UUID)>
      child
      <spaced_softline>
      <align_end(NEW_UUID)>
    <spaced_softline>
    <align_to(SOME_UUID)>
    child
    <spaced_softline>
    <align_end(SOME_UUID)>
<EOF>
...finally rendering as:
Describe alternatives you've considered
In the given example, it wouldn't be unreasonable (nor unattested) to use an indentation block to achieve a similar end. Something like:
It may not always be appropriate to do this. However, it's a trade-off against how complex it would be to implement alignment blocks.
(Note: This particular example may not be strictly realistic. AFAIK, the Tree-sitter Bash grammar doesn't distinguish between subcommands and command line arguments; line continuations may also be tricky to deal with... This is purely illustrative!)
Additional context
Inference
The *_align_end capture names may not be necessary. Instead, the end of the alignment block could be inferred once all siblings have been exhausted. (Of course, having explicit *_align_end capture names gives finer control and allows you to do things that can't be done when considering all siblings. However, this is likely to be a very rare need.)
Childlike Siblings
The alignment block would apply to all siblings as though they were equal. However, there are times when siblings are, semantically speaking, more like child nodes. This would be at the mercy of the grammar.
For example, both --long-opt=value and --long-opt value are commonly attested command line argument formats. The first instance may be considered a single sibling, whereas the second might be parsed as two siblings (i.e., (option) (option) rather than (option (value))). This would then get uglily rendered like so:
Tabs vs. Spaces
Alignment should always be done with spaces. Indentation is currently done with spaces, but may change in the future to support tabs, etc. (see #105).
Wide Characters
Alignment involving wide characters (e.g., CJK, emojis, etc.) may be tricky.
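As a small illustration of why this is tricky, the Go sketch below (with a hypothetical `widths` helper) shows that the two easily computed lengths of a string, bytes and runes, disagree for CJK text. Neither is the number of columns the text occupies in a monospace rendering, where each of these characters typically takes two cells. Computing the display width would need East Asian width data, which this sketch deliberately does not attempt.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// widths reports two easily computed measures of a string's length. Neither
// is the number of terminal columns the string occupies when it contains
// wide characters.
func widths(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{"key", "日本語"} {
		b, r := widths(s)
		fmt.Printf("%q: %d bytes, %d runes\n", s, b, r)
	}
}
```

Padding computed from either number would misalign any line that mixes ASCII and wide characters.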
Post-Processing
It may not be easy/possible to calculate the tagged node's final resting place (i.e., indent level + required alignment padding). This may be further exacerbated by post-processing that eliminates runs of whitespace.
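To show where the difficulty lies, here is a minimal single-pass sketch in Go. Everything in it is hypothetical: the `Atom` type, the `Kind` constants, and `render` mirror the markers used earlier in this proposal, not Topiary's actual internals. It assumes ASCII input and that the column of the tagged node is already final when the start marker is reached, which is exactly what later indentation or whitespace-collapsing passes would invalidate.

```go
package main

import (
	"fmt"
	"strings"
)

// Kind enumerates the hypothetical atoms of this sketch; the names mirror
// the markers in the proposal but are not Topiary's actual types.
type Kind int

const (
	Text Kind = iota
	Newline
	AlignStart
	AlignTo
	AlignEnd
)

type Atom struct {
	Kind Kind
	Arg  string // text for Text atoms, block identifier for Align* atoms
}

// render walks the atoms once, tracking the current column. AlignStart
// records that column under the block's identifier; AlignTo pads with
// spaces up to the recorded column; AlignEnd forgets it.
func render(atoms []Atom) string {
	var out strings.Builder
	col := 0
	columns := map[string]int{}
	for _, a := range atoms {
		switch a.Kind {
		case Text:
			out.WriteString(a.Arg)
			col += len(a.Arg) // byte length: assumes ASCII input
		case Newline:
			out.WriteByte('\n')
			col = 0
		case AlignStart:
			columns[a.Arg] = col
		case AlignTo:
			if target, ok := columns[a.Arg]; ok && target > col {
				out.WriteString(strings.Repeat(" ", target-col))
				col = target
			}
		case AlignEnd:
			delete(columns, a.Arg)
		}
	}
	return out.String()
}

func main() {
	fmt.Println(render([]Atom{
		{Text, "cmd "},
		{AlignStart, "A"},
		{Text, "--first"},
		{Newline, ""},
		{AlignTo, "A"},
		{Text, "--second"},
		{Newline, ""},
		{AlignTo, "A"},
		{Text, "--third"},
		{AlignEnd, "A"},
	}))
}
```

If whitespace before the tagged node is trimmed or re-indented after this pass, the recorded column no longer matches the node's final position, so the padding would have to be recomputed or the alignment resolved last.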