caddyfile: Normalize & flatten all unmarshalers #6037

francislavoie · 2024-01-12T10:31:01Z

This is something I've been noticing over the years; I was always confused as to why we were using a for d.Next() loop around all the Caddyfile unmarshaling code, it never seemed useful.

In practice, the only time when the loop is useful is when we intentionally want to merge multiple consecutive segments, and this only happens for request matchers. For all other module types, we never need this kind of behaviour.

So to improve code readability and make it feel less arcane, it's time to flatten all the unmarshalers to only have a single d.Next() line with no loop. Removes one level of indentation in almost all unmarshalers.

~~Also, I changed all for d.NextBlock(0) { to for nesting := d.Nesting(); d.NextBlock(nesting); { which avoids a hard-coded number, making better use of our provided APIs.~~ This was a bad idea after thinking about it for a few more minutes.

I strongly recommend reviewing with the whitespace collapse option turned on in github for this one.

mholt · 2024-01-17T02:04:23Z

This is something I've been noticing over the years; I was always confused as to why we were using a for d.Next() loop around all the Caddyfile unmarshaling code, it never seemed useful.

In Caddy 1, it was not uncommon for directives to occur multiple times and all their tokens were combined and the parser handled it in one go. This pattern carried over into Caddy 2 for the flexibility it provides in case we do that.

In practice, the only time when the loop is useful is when we intentionally want to merge multiple consecutive segments, and this only happens for request matchers. For all other module types, we never need this kind of behaviour.

That may be true. I hope we don't ever (unintentionally or otherwise) re-introduce the pattern of grouping segments then.

Even though our docs demonstrate using d.Next() at the beginning of the Unmarshaler, it seems pointless that we include the directive name at all IMO. It made more sense to do that when there was the potential to loop over multiple instances of the directive (for d.Next()).

I dunno. I still think the loop is an elegant, robust way to consume tokens, that gives us more flexibility, but at the same time we don't document this pattern that I can recall...

This change is probably fine. I'm slightly sad but not for any technical reason 😂

mholt · 2024-01-17T02:05:32Z

Oh -- actually! We did document this:

// Implementations must be able to support
// multiple segments (instances of their
// directive or batch of tokens); typically
// this means wrapping all token logic in
// a loop: `for d.Next() { ... }`.

So now I wonder if removing it is really beneficial. There will be many plugins that still use the loop.

francislavoie · 2024-01-17T03:57:39Z

No, I'm quite sure nothing uses that pattern except for matchers. It's super unintuitive and isn't something we ever mention as possible except for matchers, so I see no reason plugins would have experimented to the point where they found that they could do this.

Remember that the caller to the unmarshaler must have actually passed a dispenser which includes multiple segments for it to be possible to parse multiple copies of a directive. That never happens, we never pass more than one directive segment to an unmarshaler. It only happens with the named matcher unmarshaler.

The reason it works with matchers is because we do this (httptype.go:1432):

		// in case there are multiple instances of the same matcher, concatenate
		// their tokens (we expect that UnmarshalCaddyfile should be able to
		// handle more than one segment); otherwise, we'd overwrite other
		// instances of the matcher in this set
		tokensByMatcherName := make(map[string][]caddyfile.Token)
		for nesting := d.Nesting(); d.NextArg() || d.NextBlock(nesting); {
			matcherName := d.Val()
			tokensByMatcherName[matcherName] = append(tokensByMatcherName[matcherName], d.NextSegment()...)
		}

We collect tokens by their matcher name together (essentially grouping them sequentially) and then pass that entire set of tokens to the matcher's unmarshaler with err = unm.UnmarshalCaddyfile(caddyfile.NewDispenser(tokens))

We never do this kind of thing anywhere else. We never collect tokens ahead of time by name then feed them to an unmarshaler. We always just pass a single segment in.

One reason we never do it for directives is because of matchers. Since there may be an inline matcher token, we can never group directives that have a matcher with any other with either no matcher or a different matcher. They need to be separate. Also, directives are by design ordered because they run as a middleware chain. It's very rare that a directive might be okay with running nondeterministically at the same time as another. You might think vars, but even that, users may want to set one variable with the value of another variable, so ordering matters there too and it can't be in a single handler instance.

I dunno. I still think the loop is an elegant, robust way to consume tokens

I would agree with you, if the loops ever iterated more than once, which they don't 😂

mholt

Alrighty, we can try this.

As mentioned in Slack, it's a bummer that the unmarshalers now have an unnecessary call to Next() at the beginning. (There's no point in passing in the directive name if there's no loop.) That's my bad, I guess. A carry-over from v1.

Thanks for doing this!!

francislavoie requested a review from mholt January 12, 2024 10:31

francislavoie added the optimization 📉 Performance or cost improvements label Jan 12, 2024

francislavoie added this to the v2.8.0 milestone Jan 12, 2024

francislavoie force-pushed the normalize-caddyfile-unmarshal branch 4 times, most recently from 53daadf to fcfcff6 Compare January 13, 2024 21:08

francislavoie mentioned this pull request Jan 17, 2024

tls: modularize trusted CA providers #5784

Merged

3 tasks

francislavoie force-pushed the normalize-caddyfile-unmarshal branch from fcfcff6 to 2a9b7ef Compare January 17, 2024 06:50

mholt approved these changes Jan 24, 2024

View reviewed changes

caddyfile: Normalize & flatten all unmarshalers

b5a71b0

francislavoie force-pushed the normalize-caddyfile-unmarshal branch from 2a9b7ef to b5a71b0 Compare January 24, 2024 00:24

francislavoie merged commit 750d0b8 into master Jan 24, 2024
24 checks passed

francislavoie deleted the normalize-caddyfile-unmarshal branch January 24, 2024 00:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

caddyfile: Normalize & flatten all unmarshalers #6037

caddyfile: Normalize & flatten all unmarshalers #6037

francislavoie commented Jan 12, 2024 •

edited

Loading

mholt commented Jan 17, 2024

mholt commented Jan 17, 2024 •

edited

Loading

francislavoie commented Jan 17, 2024 •

edited

Loading

mholt left a comment

caddyfile: Normalize & flatten all unmarshalers #6037

caddyfile: Normalize & flatten all unmarshalers #6037

Conversation

francislavoie commented Jan 12, 2024 • edited Loading

mholt commented Jan 17, 2024

mholt commented Jan 17, 2024 • edited Loading

francislavoie commented Jan 17, 2024 • edited Loading

mholt left a comment

Choose a reason for hiding this comment

francislavoie commented Jan 12, 2024 •

edited

Loading

mholt commented Jan 17, 2024 •

edited

Loading

francislavoie commented Jan 17, 2024 •

edited

Loading