Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

caddyfile: Normalize & flatten all unmarshalers #6037

Merged
merged 1 commit into from
Jan 24, 2024

Conversation

francislavoie
Copy link
Member

@francislavoie francislavoie commented Jan 12, 2024

This is something I've been noticing over the years; I was always confused as to why we were using a for d.Next() loop around all the Caddyfile unmarshaling code, it never seemed useful.

In practice, the only time when the loop is useful is when we intentionally want to merge multiple consecutive segments, and this only happens for request matchers. For all other module types, we never need this kind of behaviour.

So to improve code readability and make it feel less arcane, it's time to flatten all the unmarshalers to only have a single d.Next() line with no loop. Removes one level of indentation in almost all unmarshalers.

Also, I changed all for d.NextBlock(0) { to for nesting := d.Nesting(); d.NextBlock(nesting); { which avoids a hard-coded number, making better use of our provided APIs. This was a bad idea after thinking about it for a few more minutes.

I strongly recommend reviewing with the whitespace collapse option turned on in github for this one.

@francislavoie francislavoie requested a review from mholt January 12, 2024 10:31
@francislavoie francislavoie added the optimization 📉 Performance or cost improvements label Jan 12, 2024
@francislavoie francislavoie added this to the v2.8.0 milestone Jan 12, 2024
@francislavoie francislavoie force-pushed the normalize-caddyfile-unmarshal branch 4 times, most recently from 53daadf to fcfcff6 Compare January 13, 2024 21:08
@mholt
Copy link
Member

mholt commented Jan 17, 2024

This is something I've been noticing over the years; I was always confused as to why we were using a for d.Next() loop around all the Caddyfile unmarshaling code, it never seemed useful.

In Caddy 1, it was not uncommon for directives to occur multiple times and all their tokens were combined and the parser handled it in one go. This pattern carried over into Caddy 2 for the flexibility it provides in case we do that.

In practice, the only time when the loop is useful is when we intentionally want to merge multiple consecutive segments, and this only happens for request matchers. For all other module types, we never need this kind of behaviour.

That may be true. I hope we don't ever (unintentionally or otherwise) re-introduce the pattern of grouping segments then.

Even though our docs demonstrate using d.Next() at the beginning of the Unmarshaler, it seems pointless that we include the directive name at all IMO. It made more sense to do that when there was the potential to loop over multiple instances of the directive (for d.Next()).

I dunno. I still think the loop is an elegant, robust way to consume tokens, that gives us more flexibility, but at the same time we don't document this pattern that I can recall...

This change is probably fine. I'm slightly sad but not for any technical reason 😂

@mholt
Copy link
Member

mholt commented Jan 17, 2024

Oh -- actually! We did document this:

// Implementations must be able to support
// multiple segments (instances of their
// directive or batch of tokens); typically
// this means wrapping all token logic in
// a loop: `for d.Next() { ... }`.

So now I wonder if removing it is really beneficial. There will be many plugins that still use the loop.

@francislavoie
Copy link
Member Author

francislavoie commented Jan 17, 2024

No, I'm quite sure nothing uses that pattern except for matchers. It's super unintuitive and isn't something we ever mention as possible except for matchers, so I see no reason plugins would have experimented to the point where they found that they could do this.

Remember that the caller to the unmarshaler must have actually passed a dispenser which includes multiple segments for it to be possible to parse multiple copies of a directive. That never happens, we never pass more than one directive segment to an unmarshaler. It only happens with the named matcher unmarshaler.

The reason it works with matchers is because we do this (httptype.go:1432):

		// in case there are multiple instances of the same matcher, concatenate
		// their tokens (we expect that UnmarshalCaddyfile should be able to
		// handle more than one segment); otherwise, we'd overwrite other
		// instances of the matcher in this set
		tokensByMatcherName := make(map[string][]caddyfile.Token)
		for nesting := d.Nesting(); d.NextArg() || d.NextBlock(nesting); {
			matcherName := d.Val()
			tokensByMatcherName[matcherName] = append(tokensByMatcherName[matcherName], d.NextSegment()...)
		}

We collect tokens by their matcher name together (essentially grouping them sequentially) and then pass that entire set of tokens to the matcher's unmarshaler with err = unm.UnmarshalCaddyfile(caddyfile.NewDispenser(tokens))

We never do this kind of thing anywhere else. We never collect tokens ahead of time by name then feed them to an unmarshaler. We always just pass a single segment in.

One reason we never do it for directives is because of matchers. Since there may be an inline matcher token, we can never group directives that have a matcher with any other with either no matcher or a different matcher. They need to be separate. Also, directives are by design ordered because they run as a middleware chain. It's very rare that a directive might be okay with running nondeterministically at the same time as another. You might think vars, but even that, users may want to set one variable with the value of another variable, so ordering matters there too and it can't be in a single handler instance.

I dunno. I still think the loop is an elegant, robust way to consume tokens

I would agree with you, if the loops ever iterated more than once, which they don't 😂

@francislavoie francislavoie force-pushed the normalize-caddyfile-unmarshal branch from fcfcff6 to 2a9b7ef Compare January 17, 2024 06:50
Copy link
Member

@mholt mholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alrighty, we can try this.

As mentioned in Slack, it's a bummer that the unmarshalers now have an unnecessary call to Next() at the beginning. (There's no point in passing in the directive name if there's no loop.) That's my bad, I guess. A carry-over from v1.

Thanks for doing this!!

@francislavoie francislavoie force-pushed the normalize-caddyfile-unmarshal branch from 2a9b7ef to b5a71b0 Compare January 24, 2024 00:24
@francislavoie francislavoie merged commit 750d0b8 into master Jan 24, 2024
25 checks passed
@francislavoie francislavoie deleted the normalize-caddyfile-unmarshal branch January 24, 2024 00:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimization 📉 Performance or cost improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants