fatal error: concurrent map read and map write after upgrading 2.7.6 to 2.8.4 and/or adding github.com/caddyserver/cache-handler #6683

Open
christophcemper opened this issue Nov 10, 2024 · 41 comments
Labels
bug 🐞 Something isn't working needs info 📭 Requires more information

Comments

@christophcemper

1. What version of Caddy are you running (caddy -version)?

Now 2.8.4, built with

xcaddy build --output bin/LRTcaddy --with github.com/christophcemper/caddy-netlify-redirects@v0.2.4 --with github.com/caddyserver/transform-encoder --with github.com/pteich/caddy-tlsconsul --with github.com/caddyserver/cache-handler

after 10 months of stable operation on 2.7.6, built with

xcaddy build --output bin/LRTcaddy --with github.com/christophcemper/caddy-netlify-redirects@v0.2.4 --with github.com/caddyserver/transform-encoder --with github.com/pteich/caddy-tlsconsul

which did not experience this very rare, previously unseen crash

2. What are you trying to do?

Multiple production hosts, reverse proxies, etc.

We added the module github.com/caddyserver/cache-handler for a project
and (unfortunately) also upgraded from 2.7.6 to 2.8.4 in the same step.

Since then, Caddy has crashed twice, on Oct 28 and Nov 10.
Each crash caused a total outage that could only be remedied with a manual restart.

3. What is your entire Caddyfile?

Sorry we cannot share it in public, happy to send directly to mholt or other key people.

4. How did you run Caddy (give the full command and describe the execution environment)?

/usr/bin/caddy run --environ --config /etc/caddy/Caddyfile


via a systemd service

systemctl status caddy
● caddy.service - Caddy
     Loaded: loaded (/lib/systemd/system/caddy.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2024-11-10 11:45:13 UTC; 7h ago
       Docs: https://caddyserver.com/docs/
    Process: 1929477 ExecReload=/usr/bin/caddy reload --config /etc/caddy/Caddyfile --force (code=exited, status=0/SUCCESS)
   Main PID: 1879405 (caddy)
      Tasks: 39 (limit: 154411)
     Memory: 2.2G
     CGroup: /system.slice/caddy.service
             └─1879405 /usr/bin/caddy run --environ --config /etc/caddy/Caddyfile

5. What did you expect to see?

A stable Caddy, as it had been all the time before.

6. What did you see instead (give full error messages and/or log)?

Oct 28 (intro for 1st goroutine - longer log attached)

Oct 28 15:05:08 web01 caddy[2621539]: fatal error: concurrent map read and map write

Oct 28 15:05:08 web01 caddy[2621539]: goroutine 3873976361 [running]:

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.GetVar({0x1f9bd08?, 0xc015b73290?}, {0x1a68ef6, 0xe})

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/vars.go:323 +0x57

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f3291d534b0, 0xc049806a80}, 0xc045f00480)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:257 +0x1c6

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f3291d534b0?, 0xc049806a80?}, 0xe?)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f3291d534b0, 0xc049806a80}, 0xc045f00480)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f3291d534b0?, 0xc049806a80?}, 0xe?)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f3291d534b0, 0xc049806a80}, 0xc045f00480)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f3291d534b0?, 0xc049806a80?}, 0xe?)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f3291d534b0, 0xc049806a80}, 0xc045f00480)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f3291d534b0?, 0xc049806a80?}, 0xe?)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f3291d534b0, 0xc049806a80}, 0xc045f00480)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f3291d534b0?, 0xc049806a80?}, 0xe?)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f3291d534b0, 0xc049806a80}, 0xc045f00480)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f3291d534b0?, 0xc049806a80?}, 0xe?)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f3291d534b0, 0xc049806a80}, 0xc045f00480)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f3291d534b0?, 0xc049806a80?}, 0xe?)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f3291d534b0, 0xc049806a80}, 0xc045f00480)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f3291d534b0?, 0xc049806a80?}, 0xe?)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f3291d534b0, 0xc049806a80}, 0xc045f00480)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f3291d534b0?, 0xc049806a80?}, 0xe?)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f3291d534b0, 0xc049806a80}, 0xc045f00480)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f3291d534b0?, 0xc049806a80?}, 0xe?)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f3291d534b0, 0xc049806a80}, 0xc045f00480)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f3291d534b0?, 0xc049806a80?}, 0xe?)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f3291d534b0, 0xc049806a80}, 0xc045f00480)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f3291d534b0?, 0xc049806a80?}, 0xe?)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f3291d534b0, 0xc049806a80}, 0xc045f00480)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f86720?, {0x7f3291d534b0?, 0xc049806a80?}, 0x2c2acc0?)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/modules/caddyhttp.(*Server).ServeHTTP(0xc0029f4908, {0x1f98160, 0xc012dee000}, 0xc000c76480)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/server.go:384 +0x111e

Oct 28 15:05:08 web01 caddy[2621539]: net/http.serverHandler.ServeHTTP({0xc0414a2f80?}, {0x1f98160?, 0xc012dee000?}, 0xc03c2f0ce8?)

Oct 28 15:05:08 web01 caddy[2621539]:         net/http/server.go:3142 +0x8e

Oct 28 15:05:08 web01 caddy[2621539]: net/http.initALPNRequest.ServeHTTP({{0x1f9bd08?, 0xc010a6b9e0?}, 0xc02ec19508?, {0xc0004b5770?}}, {0x1f98160, 0xc012dee000}, 0xc000c76480)

Oct 28 15:05:08 web01 caddy[2621539]:         net/http/server.go:3750 +0x231

Oct 28 15:05:08 web01 caddy[2621539]: golang.org/x/net/http2.(*serverConn).runHandler(0x448ebd?, 0x0?, 0x0?, 0xc0490e41c0?)

Oct 28 15:05:08 web01 caddy[2621539]:         golang.org/x/net@v0.29.0/http2/server.go:2414 +0x113

Oct 28 15:05:08 web01 caddy[2621539]: created by golang.org/x/net/http2.(*serverConn).scheduleHandler in goroutine 3874036131

Oct 28 15:05:08 web01 caddy[2621539]:         golang.org/x/net@v0.29.0/http2/server.go:2348 +0x21d

Oct 28 15:05:08 web01 caddy[2621539]: goroutine 1 [select (no cases), 18227 minutes]:

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/cmd.cmdRun({0x0?})

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/cmd/commandfuncs.go:283 +0xbfc

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/cmd.init.1.func2.WrapCommandFuncForCobra.1(0xc0005a8608, {0x1a4fb72?, 0x4?, 0x1a4fb3a?})

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/cmd/cobra.go:137 +0x2f

Oct 28 15:05:08 web01 caddy[2621539]: github.com/spf13/cobra.(*Command).execute(0xc0005a8608, {0xc00069a630, 0x3, 0x3})

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/spf13/cobra@v1.8.0/command.go:983 +0xaca

Oct 28 15:05:08 web01 caddy[2621539]: github.com/spf13/cobra.(*Command).ExecuteC(0x2c61ba0)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/spf13/cobra@v1.8.0/command.go:1115 +0x3ff

Oct 28 15:05:08 web01 caddy[2621539]: github.com/spf13/cobra.(*Command).Execute(...)

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/spf13/cobra@v1.8.0/command.go:1039

Oct 28 15:05:08 web01 caddy[2621539]: github.com/caddyserver/caddy/v2/cmd.Main()

Oct 28 15:05:08 web01 caddy[2621539]:         github.com/caddyserver/caddy/v2@v2.8.4/cmd/main.go:75 +0x1d8

Oct 28 15:05:08 web01 caddy[2621539]: main.main()

Oct 28 15:05:08 web01 caddy[2621539]:         caddy/main.go:15 +0xf

Oct 28 15:05:08 web01 caddy[2621539]: goroutine 12 [select, 18227 minutes]:

Nov 10 (intro for 1st goroutine - longer log attached)

Nov 10 11:29:32 web01 caddy[2314484]: fatal error: concurrent map read and map write

Nov 10 11:29:32 web01 caddy[2314484]: goroutine 3800443627 [running]:

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.GetVar({0x1f9bd08?, 0xc03724ac90?}, {0x1a68ef6, 0xe})

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/vars.go:323 +0x57

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:257 +0x1c6

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f9bd08?, {0x7f430fb2afd0?, 0xc01424db80?}, 0xe?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.RouteList.Compile.wrapRoute.func1.1({0x7f430fb2afd0, 0xc01424db80}, 0xc006df2b40)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/routes.go:268 +0x244

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.HandlerFunc.ServeHTTP(0x1f86720?, {0x7f430fb2afd0?, 0xc01424db80?}, 0x2c2acc0?)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/caddyhttp.go:58 +0x29

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/modules/caddyhttp.(*Server).ServeHTTP(0xc0003cdb08, {0x1f98160, 0xc0046985b8}, 0xc0209e3c20)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/server.go:384 +0x111e

Nov 10 11:29:32 web01 caddy[2314484]: net/http.serverHandler.ServeHTTP({0x1a4fc00?}, {0x1f98160?, 0xc0046985b8?}, 0xc033cc79e0?)

Nov 10 11:29:32 web01 caddy[2314484]:         net/http/server.go:3142 +0x8e

Nov 10 11:29:32 web01 caddy[2314484]: net/http.initALPNRequest.ServeHTTP({{0x1f9bd08?, 0xc02b94c900?}, 0xc0211ef888?, {0xc01d9165a0?}}, {0x1f98160, 0xc0046985b8}, 0xc0209e3c20)

Nov 10 11:29:32 web01 caddy[2314484]:         net/http/server.go:3750 +0x231

Nov 10 11:29:32 web01 caddy[2314484]: golang.org/x/net/http2.(*serverConn).runHandler(0x448ebd?, 0x0?, 0x0?, 0xc00048f778?)

Nov 10 11:29:32 web01 caddy[2314484]:         golang.org/x/net@v0.29.0/http2/server.go:2414 +0x113

Nov 10 11:29:32 web01 caddy[2314484]: created by golang.org/x/net/http2.(*serverConn).scheduleHandler in goroutine 3800442958

Nov 10 11:29:32 web01 caddy[2314484]:         golang.org/x/net@v0.29.0/http2/server.go:2348 +0x21d

Nov 10 11:29:32 web01 caddy[2314484]: goroutine 1 [select (no cases), 18469 minutes]:

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/cmd.cmdRun({0x0?})

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/cmd/commandfuncs.go:283 +0xbfc

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/cmd.init.1.func2.WrapCommandFuncForCobra.1(0xc0005e2308, {0x1a4fb72?, 0x4?, 0x1a4fb3a?})

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/cmd/cobra.go:137 +0x2f

Nov 10 11:29:32 web01 caddy[2314484]: github.com/spf13/cobra.(*Command).execute(0xc0005e2308, {0xc000307d70, 0x3, 0x3})

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/spf13/cobra@v1.8.0/command.go:983 +0xaca

Nov 10 11:29:32 web01 caddy[2314484]: github.com/spf13/cobra.(*Command).ExecuteC(0x2c61ba0)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/spf13/cobra@v1.8.0/command.go:1115 +0x3ff

Nov 10 11:29:32 web01 caddy[2314484]: github.com/spf13/cobra.(*Command).Execute(...)

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/spf13/cobra@v1.8.0/command.go:1039

Nov 10 11:29:32 web01 caddy[2314484]: github.com/caddyserver/caddy/v2/cmd.Main()

Nov 10 11:29:32 web01 caddy[2314484]:         github.com/caddyserver/caddy/v2@v2.8.4/cmd/main.go:75 +0x1d8

Nov 10 11:29:32 web01 caddy[2314484]: main.main()

Nov 10 11:29:32 web01 caddy[2314484]:         caddy/main.go:15 +0xf

7. How can someone who is starting from scratch reproduce this behavior as minimally as possible?

Unfortunately, we have not yet been able to reproduce the issue in any way, nor do we know how to narrow it down to the host/domain that is causing it.

We have not yet tried downgrading to 2.7.6 to see whether it happens again with the cache module,
but if it only occurs randomly every 3 or so weeks, we are not even sure that makes sense at this point.

We would kindly ask for help enabling more helpful logging details for this than the raw, unqualified Go stack trace.

caddy-crash-2024-11-10.log
caddy-crash-2024-10-28.log

@mholt
Member

mholt commented Nov 10, 2024

Thanks; a quick preliminary look suggests this may be a simple fix in the vars middleware. Will get back to this soon 🙂

@christophcemper
Author

christophcemper commented Nov 10, 2024

Wow, thanks so much for the quick response.

Maybe unrelated, but caught my eye: the xcaddy switched implicitly from our company standard golang 1.21.8 to 1.22.0, 1.22.8 or 1.22.9 depending on which log line I look at in our logs 🤔


I find that highly confusing atm tbh,
and we have no problem relaxing that standard to a "recommended golang version for caddy",
but unfortunately we have found some crazy performance penalties in 1.22.x and 1.23.x for some of our cases, so we are very cautious with those new and shiny versions full of features we don't need (yet).

@francislavoie
Member

Go itself now uses the latest version of the runtime when building. It downloads the latest patch version that matches the minimum minor version requirement. See https://tip.golang.org/doc/toolchain
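
To confirm which toolchain actually produced a given binary, the Go standard library exposes this at runtime. The following is a minimal sketch, not part of Caddy; the helper name `toolchainInfo` is made up for illustration, and it only assumes the standard `runtime` and `runtime/debug` packages:

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// toolchainInfo reports the Go version the running binary was compiled with
// and the main module path recorded at build time (empty if unavailable).
// The function name is hypothetical and used only for this example.
func toolchainInfo() (goVersion, mainPath string) {
	goVersion = runtime.Version()
	if info, ok := debug.ReadBuildInfo(); ok {
		mainPath = info.Main.Path
	}
	return goVersion, mainPath
}

func main() {
	v, p := toolchainInfo()
	fmt.Printf("built with %s (main module: %q)\n", v, p)
}
```

Running this inside a binary built by a newer auto-selected toolchain should print that newer version, independent of which `go` command is first on the PATH.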

@francislavoie
Member

Without a config we cannot minimally reproduce this, and therefore can't fix this. Please share a minimally reproducing config.

@christophcemper
Author

christophcemper commented Nov 10, 2024

Sorry, as mentioned, we cannot reproduce it either, which is also evident from how rarely it occurs,
and @mholt mentioned there is possibly a low-hanging fix.

If you want to review the full production config in private, please let me know how to share.

@christophcemper
Author

christophcemper commented Nov 10, 2024

In addition to that, it's beyond my imagination how we would want to be able to reproduce some random web traffic pattern of X production web sites for an obvious timing/race condition problem that occurs every 3+ weeks... but maybe enlighten me how we could approach that, as mentioned.

@mohammed90
Member

You're describing a race in map access. The odd thing is, the subject map is scoped per HTTP request:

// GetVar gets a value out of the context's variable table by key.
// If the key does not exist, the return value will be nil.
func GetVar(ctx context.Context, key string) any {
	varMap, ok := ctx.Value(VarsCtxKey).(map[string]any)
	if !ok {
		return nil
	}
	return varMap[key]
}

I, for one, cannot imagine 2 HTTP requests sharing the same context, so maybe we're missing something. If you can share minimal config, maybe we can all look at reproducing it with, perhaps, Vegeta.
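
One detail worth keeping in mind: a map stored in a context is stored by reference, so every context derived from the request's context (and every goroutine holding one) reaches the same map. A minimal sketch of that behavior, using a hypothetical key `varsKey` and helpers `newVarsContext` / `lookupVar` as stand-ins rather than Caddy's actual code:

```go
package main

import (
	"context"
	"fmt"
)

type ctxKey string

// varsKey is a hypothetical stand-in for the real context key.
const varsKey ctxKey = "vars"

// newVarsContext attaches the given map to a fresh context.
func newVarsContext(vars map[string]any) context.Context {
	return context.WithValue(context.Background(), varsKey, vars)
}

// lookupVar mirrors the shape of the GetVar snippet quoted above.
func lookupVar(ctx context.Context, key string) any {
	varMap, ok := ctx.Value(varsKey).(map[string]any)
	if !ok {
		return nil
	}
	return varMap[key]
}

func main() {
	vars := map[string]any{"a": 1}
	ctx := newVarsContext(vars)

	// A derived context still resolves to the very same map.
	child, cancel := context.WithCancel(ctx)
	defer cancel()

	vars["b"] = 2                     // write through the original reference
	fmt.Println(lookupVar(child, "b")) // the write is visible via the child context
}
```

So two requests do not need to share a context for a race to occur; one request whose context is handed to a second goroutine is enough.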

@christophcemper
Author

hmmm... maybe late, but that code snippet is just a read.

Where is the write that conflicts?

And how is correct concurrency ensured? Are there mutexes? Is the context's variable table "thread safe" already?

I found @francislavoie 's mail in his profile, will mail our prod config there, for investigation,
but NOT to be published in public here.

@francislavoie
Member

Configs are not private information. Post your config here if you want to get help.

@christophcemper
Author

Since we disagree on this, I guess we'll just keep counting how many times Caddy takes down a dozen production websites and services and backend-APIs in the next days & weeks based on this "policy".

@christophcemper
Author

In addition to that, I want to re-confirm, again,

Unfortunately, we could not reproduce the issue yet in any way, nor do we know how to isolate this down to the host/domain that is causing this.

We have not yet tried to down-grade to 2.7.6. and see if that happens again with the cache module,
but then if it only occurs every 3 or so weeks randomly we're not even sure if that makes sense at this point.

We would kindly ask for help to enable/change to more helpful logging details for this than the raw unqualified go stack-trace

it is NOT reproducible, and I've not read any idea or approach for how a probably data-driven race should become reproducible,
if we don't add anything from the payload to some dozen pages of stack trace 🤷

any idea how the issue could be isolated?

IMO the crash handler/stacktrace should get at least minimal payload data,
i.e. which domain, which url, which redirect etc. otherwise we'll never find it I fear

@christophcemper
Author

christophcemper commented Nov 11, 2024

Configs are not private information.

half of the config refers to internal hosts, ips, names, etc. that are nobody's business, for example,
and any such knowledge can help a threat actor puzzle together an attack vector 🤷

this is true for our config, and most other companies' internal system service configs.

@francislavoie
Member

francislavoie commented Nov 11, 2024

Since we disagree on this, I guess we'll just keep counting how many times Caddy takes down a dozen production websites and services and backend-APIs in the next days & weeks based on this "policy".

That tone is insulting.

I have that policy because I'm a volunteer. I offer my time to the project in the open so that anyone who has similar issues may find their answers from the discussions and troubleshooting that happens in the open, in searchable/indexable parts of the internet. I have to draw the line somewhere to limit things and to keep a healthy open-source/life balance.

If you want private support then you should consider sponsoring the project https://caddyserver.com/support and Matt will be able to prioritize helping you.

@christophcemper
Author

Good morning @francislavoie

Thanks for your fast responses, much appreciated (it was past midnight here in Austria).

I want to apologize if you found my response insulting or somehow related to your personal efforts.
Both are much appreciated, and especially the quick response.
I wasn't aware that you were referring to your own personal policy, but assumed a too limited general project policy of Caddy.

I of course agree that the idea to keep as much as possible in the open helps everyone.

I still don't agree that revealing production config is something you can or should expect, always,
or that "(production) system service config is not private".

In this case, especially, we don't even know how to isolate & minimize the config down, which is another issue where I'll share my ideas.

As it stands now we're not even guessing, we're looking at a random stack trace together,
and the next 2 occurrences may be end of November and at Xmas :-) great!

Next steps for us:

  1. Rollback to the stable 2.7.6+ Caddy we had that didn't completely take down prod repeatedly. Caddy must not become the next "Apache white screen", which it did, and that's very concerning to me as a Caddy fan.

  2. Find ways to make that naked stack trace more useful (with enrichment of logs)

  3. Possibly split up production services to more separate Caddy instances

Thanks also for the reminder on the sponsoring and paid support offers, we'll review those, too!

@christophcemper
Author

christophcemper commented Nov 11, 2024

You're describing a race in map access. The odd thing is, the subject map is scoped per HTTP request:

I, for one, cannot imagine 2 HTTP requests sharing the same context, so maybe we're missing something.

I thought about this more @mohammed90 and have the following ideas/questions:

Q1: Even if the map is really always used only by 1 HTTP request and not shared, is the memory safe?

Q2: Are multiple go routines of handlers possibly accessing the map concurrently?

Q3: Can we ensure that we synchronize the map access?

Currently it is not synchronized. Concurrent map access without proper synchronization is a serious issue that can lead to memory corruption or at least outages that we manually need to cure.

We could perhaps use sync.Mutex, sync.RWMutex (Better for Read-Heavy Workloads), or sync.Map (Built-in Concurrent Map).

Q4: Can we get more detailed info on where (which URL, host, etc) this is happening on?

I have some more suggestions on that here #6684 as it's a general problem with all panics/crashes of Caddy.
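
To illustrate Q4, here is a rough sketch of a plain `net/http` wrapper that logs host, method and URI when a handler panics. It is not Caddy's implementation; `withRequestDetails`, `panickyHandler` and `serveStatus` are hypothetical names for this example. An important caveat: the Go runtime reports "concurrent map read and map write" as a fatal error rather than a panic, so `recover()` cannot intercept it. This pattern would only enrich ordinary panics, not the crash discussed in this issue.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// withRequestDetails wraps next and, if it panics, logs which request was
// being served before answering with a 500. (Hypothetical helper.)
func withRequestDetails(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			rec := recover()
			if rec == nil {
				return
			}
			if rec == http.ErrAbortHandler {
				panic(rec) // leave net/http's own abort sentinel alone
			}
			log.Printf("panic serving %s %s%s: %v", r.Method, r.Host, r.URL.RequestURI(), rec)
			http.Error(w, "internal server error", http.StatusInternalServerError)
		}()
		next.ServeHTTP(w, r)
	})
}

// panickyHandler always panics; it exists only to exercise the wrapper.
func panickyHandler() http.Handler {
	return http.HandlerFunc(func(http.ResponseWriter, *http.Request) {
		panic("boom")
	})
}

// serveStatus runs one in-memory GET through h and returns the status code.
func serveStatus(h http.Handler, url string) int {
	rw := httptest.NewRecorder()
	h.ServeHTTP(rw, httptest.NewRequest(http.MethodGet, url, nil))
	return rw.Code
}

func main() {
	status := serveStatus(withRequestDetails(panickyHandler()), "http://example.test/boom?x=1")
	fmt.Println("status:", status)
}
```

For the fatal-error case, the request details would have to be logged before the crash (for example via access logs) and correlated by timestamp.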

@christophcemper
Author

christophcemper commented Nov 11, 2024

#6683 (comment)

The issue here is that multiple goroutines could be accessing the same context's vars map concurrently.

This can happen in several scenarios (I am not yet aware of all of them personally, nor have I verified whether they are already synchronized):

  • Middleware Chains: Multiple middleware could be reading variables concurrently
  • Async Handlers: Handlers that spawn goroutines might access vars from different goroutines
  • Route Matching: Multiple route matchers might read variables simultaneously
  • Common places where GetVar might be called concurrently:
    • Route matchers checking conditions
    • Middleware accessing shared variables
    • Error handlers accessing request context
    • Rewrite rules evaluating path variables
    • Authentication middleware checking user data

To fix this, we should modify the var map implementation to be thread-safe.
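
As a rough illustration of what "thread-safe" could look like, here is a minimal sketch of a variable table guarded by a `sync.RWMutex`. The type `safeVars` and its methods are hypothetical and not taken from Caddy's vars.go:

```go
package main

import (
	"fmt"
	"sync"
)

// safeVars is a hypothetical per-request variable table whose map is only
// touched while holding the mutex.
type safeVars struct {
	mu sync.RWMutex
	m  map[string]any
}

func newSafeVars() *safeVars {
	return &safeVars{m: make(map[string]any)}
}

// Get returns the value for key, or nil if it is not set.
func (v *safeVars) Get(key string) any {
	v.mu.RLock()
	defer v.mu.RUnlock()
	return v.m[key]
}

// Set stores val under key.
func (v *safeVars) Set(key string, val any) {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.m[key] = val
}

func main() {
	vars := newSafeVars()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			vars.Set(fmt.Sprintf("k%d", i), i) // concurrent writers
			_ = vars.Get("k0")                 // concurrent readers
		}(i)
	}
	wg.Wait()
	fmt.Println(vars.Get("k3"))
}
```

Note that this only makes each single Get or Set safe. A read-modify-write sequence across two calls can still interleave with another goroutine, so a mutex does not remove the underlying question of why two goroutines touch the same request state.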

@francislavoie
Member

The middleware chain should never be concurrent. It's synchronous. Request matchers, being part of the middleware chain, should always be synchronous. Error handlers run only after the main route middleware chain returns an error, and are therefore also synchronous. Rewrites, being part of the middleware chain, are also synchronous. Same with authentication.

The only possible place is a spawned goroutine by a handler. I'm not aware of any in Caddy itself that does this and allows concurrent access to vars. You're also using plugins, one of those plugins may be doing that.
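
For plugin authors, the usual way to avoid this is to copy whatever the background goroutine needs while still on the handler goroutine, instead of passing the live map along. A minimal sketch with plain maps (the helper `snapshot` is hypothetical, and this is not code from any of the plugins mentioned):

```go
package main

import (
	"fmt"
	"sync"
)

// snapshot returns a shallow copy of vars so a background goroutine can read
// it without touching the map the handler chain keeps mutating.
func snapshot(vars map[string]any) map[string]any {
	cp := make(map[string]any, len(vars))
	for k, v := range vars {
		cp[k] = v
	}
	return cp
}

func main() {
	vars := map[string]any{"user": "alice"}

	// Copy on the handler goroutine, before starting background work.
	frozen := snapshot(vars)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		fmt.Println("background sees:", frozen["user"])
	}()

	// The handler goroutine is free to keep writing to the original map.
	vars["user"] = "bob"
	wg.Wait()
	fmt.Println("handler sees:", vars["user"])
}
```

The copy is shallow: values that are themselves maps, slices or pointers would still be shared and need their own treatment.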

@christophcemper
Author

The only possible place is a spawned goroutine by a handler. I'm not aware of any in Caddy itself that does this and allows concurrent access to vars. You're also using plugins, one of those plugins may be doing that.

Thanks for your insights and estimation, I agree, that's certainly possible.

On our QA system we have now already reverted back to

CADDY_VERSION=v2.7.6
TLS_CONSUL_VERSION=v1.4.0
CACHE_HANDLER_VERSION=v0.12.0

because those plugins also have had quite some changes, in addition to a few changes we've identified already in the core.

For the core, as far as we understand, the reported vars.go also changed in 4 revisions between 2.7.6 and 2.8.4:

  • changing/improving structure or SetVar / GetVar handling (still unsync'ed, not thread safe)
  • a new SetVar-like setter in ServeHTTP for special case http.auth.user.id but which actually uses a mapMutex
  • a new aliasing of "" and nil "Make nil values act as empty string instead of ""

@mholt
Member

mholt commented Nov 11, 2024

I'm mostly away from my desk on the weekends, so I have yet to catch up on the whole discussion as it's only Monday morning here, but:

  • It's fairly obvious from inspection that what Francis said is correct: a handler is concurrently accessing the vars map during a request. (Each request has its own vars map.)
  • A simple "fix" is indeed possible as I mentioned, assuming the use of the proper APIs GetVar() and SetVar(), to use a mutex to sync access to the map.
  • However, I'm not sure that is most correct. A handler performing concurrent work on the request is usually the sign of a bug, thus concurrent access is not generally supported in HTTP handler chains.
  • The most correct thing to do would be to eliminate concurrency on shared state while handling the request.
  • Seeing your config and, if possible, the characteristics of the request (URI, headers, etc) would be helpful to know what other handlers are in the chain and which ones are being invoked.
  • Francis is a volunteer, as he says, and prefers that help given benefits the public community, so receiving information in private isn't helpful there.
  • As a full-time maintainer, I'm happy to offer help and receive through private venues, but I'm so busy that I can only afford to do this for sponsors (at least the Indie Pro tier, which offers some basic occasional email support). If you or your company are already a sponsor, I apologize, GitHub isn't showing the Sponsor badge for me with your account. You can sign up to be a sponsor at the tier that best fits your company, or post your config here in public. You can redact credentials though; and remember that hostnames are considered addresses, not secrets (we often need the actual domain strings to troubleshoot and trace code execution because there is logic depending on the characteristics of the actual hostnames).
  • You can enable the race detector when you build Caddy to pinpoint what is actually going on next time this happens. Use the -race flag, or, if using xcaddy, set XCADDY_RACE_DETECTOR=1. Note that this will be less performant but will print exactly where the concurrency is happening. I think it's unlikely to be happening in Caddy core modules (we don't use concurrency during a request handler chain, except technically in the reverse proxy for full duplex streaming; but anyway, I would be surprised if our code is triggering this since we haven't seen this before -- but I can't rule it out). The race detector will show us.

Anyway, I'm mostly repeating what Francis already said, but I hope this offers some clarity. If you can post the config (and ideally the request too) here, even if we can't reliably reproduce it, we can likely have a better sense of the problem and offer a viable solution rather than guessing. (We try not to guess here when possible.)

What is the full output of caddy build-info and caddy list-modules?

@mholt mholt added the needs info 📭 Requires more information label Nov 11, 2024
@christophcemper
Author

christophcemper commented Nov 12, 2024

Thanks so much @mholt and @francislavoie for clarifications and context,
very helpful and I very much appreciate your fast responses, despite being so busy,
reconfirming our decision to go with Caddy.

  • Again, there was no intention to ask for something you would not want to or cannot provide, especially as a volunteer, Francis. Big thanks, and apologies for misunderstanding "policy" as something generic vs. personal.
  • Especially inner workings e.g. explained here
  • I fully understand the need to be able to replicate/identify the defect at hand here, and cannot stop thinking about how to collect that best.
  • OTOH, in an audit of our Caddyfile(s) we've identified multiple very sensitive details/credentials, so I'll do more research on how to make them "less private" by Caddy methods that we maybe just don't use yet (like Ansible Vault etc.), or write an issue with ideas/suggestions to "hide private/sensitive" info from Caddyfiles
  • We've already started testing with XCADDY_RACE_DETECTOR=1
  • We'll add a simple mutex sync and more logging details to our fork and report on findings, if any

we often need the actual domain strings to troubleshoot and trace code execution because there is logic depending on the characteristics of the actual hostnames

Yes, makes total sense.

RE:

(and ideally the request too)
I thought about a "HTTP Request Recorder & Replay" module/feature that would just write all requests of the last 24-48 days to a (powerful) database like Clickhouse cluster.

We could easily operate such a "Recorder" and be able to re-run requests, not just for this case. Combined with point-in-time (PIT) backup/recovery of the system(s), it could possibly even create "real world black box tests" of reference systems in the future.

Does such a "HTTP Request Recorder & Replay" module/feature exist? If not we may build one actually.
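
To make the idea concrete, here is a very small sketch of the recording half as a plain `net/http` wrapper that keeps the most recent requests in an in-memory ring buffer. It is not an existing Caddy module; the names `recorder`, `recordedRequest` and `feed` are made up, and persistence to a database as well as replay are left out:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// recordedRequest holds the subset of a request kept for later inspection.
type recordedRequest struct {
	Time   time.Time
	Method string
	Host   string
	URI    string
}

// recorder keeps the most recent requests in a fixed-size ring buffer.
type recorder struct {
	mu   sync.Mutex
	buf  []recordedRequest
	next int
	full bool
}

func newRecorder(size int) *recorder {
	if size < 1 {
		size = 1
	}
	return &recorder{buf: make([]recordedRequest, size)}
}

// wrap records every request before passing it on to next.
func (rec *recorder) wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rec.mu.Lock()
		rec.buf[rec.next] = recordedRequest{
			Time: time.Now(), Method: r.Method, Host: r.Host, URI: r.URL.RequestURI(),
		}
		rec.next = (rec.next + 1) % len(rec.buf)
		if rec.next == 0 {
			rec.full = true
		}
		rec.mu.Unlock()
		next.ServeHTTP(w, r)
	})
}

// snapshot returns a copy of the recorded requests, oldest first.
func (rec *recorder) snapshot() []recordedRequest {
	rec.mu.Lock()
	defer rec.mu.Unlock()
	if !rec.full {
		return append([]recordedRequest(nil), rec.buf[:rec.next]...)
	}
	out := append([]recordedRequest(nil), rec.buf[rec.next:]...)
	return append(out, rec.buf[:rec.next]...)
}

// feed sends one in-memory GET per URL through the recorder (no network).
func feed(rec *recorder, urls ...string) {
	h := rec.wrap(http.HandlerFunc(func(http.ResponseWriter, *http.Request) {}))
	for _, u := range urls {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, u, nil))
	}
}

func main() {
	rec := newRecorder(2)
	feed(rec, "http://a.test/1", "http://a.test/2", "http://b.test/3")
	for _, r := range rec.snapshot() {
		fmt.Println(r.Method, r.Host, r.URI)
	}
}
```

With a buffer of two entries, the first request is overwritten and only the last two remain. A production version would also have to decide what to do with request bodies and sensitive headers.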

Next,

  • I'll take the isolated crashing binary and give you the caddy build-info and caddy list-modules details.
    I love how well prepared Caddy already is for reporting support data for defects like this one.
  • We'll downgrade PROD also and rollout a custom/fork build to production as mentioned here

@christophcemper
Author

christophcemper commented Nov 12, 2024

We've also just signed up for the Sponsor Startup plan @mholt with our startup AIPRM now (almost empty repo)
thanks for all you do and have done so far with your teams, and so glad to have escaped the Apache/Nginx mess :-)


@christophcemper
Author

caddy version

v2.8.4 h1:q3pe0wpBj1OcHFZ3n/1nl4V4bxBrYoSoab7rL9BMYNk=

What is the full output of caddy build-info and caddy list-modules?

here we go

caddy build-info

go	go1.22.8
path	caddy
mod	caddy	(devel)	
dep	filippo.io/edwards25519	v1.1.0	h1:FNf4tywRC1HmFuKW5xopWpigGjJKiJSV0Cqo0cJWDaA=
dep	github.com/BurntSushi/toml	v1.3.2	h1:o7IhLm0Msx3BaB+n3Ag7L8EVlByGnpq14C4YWiu/gL8=
dep	github.com/Masterminds/goutils	v1.1.1	h1:5nUrii3FMTL5diU80unEVvNevw1nH4+ZV4DSLVJLSYI=
dep	github.com/Masterminds/semver/v3	v3.2.0	h1:3MEsd0SM6jqZojhjLWWeBY+Kcjy9i6MQAeY7YgDP83g=
dep	github.com/Masterminds/sprig/v3	v3.2.3	h1:eL2fZNezLomi0uOLqjQoN6BfsDD+fyLtgbJMAj9n6YA=
dep	github.com/alecthomas/chroma/v2	v2.13.0	h1:VP72+99Fb2zEcYM0MeaWJmV+xQvz5v5cxRHd+ooU1lI=
dep	github.com/antlr4-go/antlr/v4	v4.13.0	h1:lxCg3LAv+EUK6t1i0y1V6/SLeUi0eKEKdhQAlS8TVTI=
dep	github.com/armon/go-metrics	v0.4.1	h1:hR91U9KYmb6bLBYLQjyM+3j+rcd/UhE+G78SFnF8gJA=
dep	github.com/aryann/difflib	v0.0.0-20210328193216-ff5ff6dc229b	h1:uUXgbcPDK3KpW29o4iy7GtuappbWT0l5NaMo9H9pJDw=
dep	github.com/beorn7/perks	v1.0.1	h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM=
dep	github.com/buger/jsonparser	v1.1.1	h1:2PnMjfWD7wBILjqQbt530v576A/cAbQvEW9gGIpYMUs=
dep	github.com/caddyserver/cache-handler	v0.14.0	h1:AVHqBcbvpiWbM3DRnnDHkHDC7t5JP+KCV6uc358NVkA=
dep	github.com/caddyserver/caddy/v2	v2.8.4	h1:q3pe0wpBj1OcHFZ3n/1nl4V4bxBrYoSoab7rL9BMYNk=
dep	github.com/caddyserver/certmagic	v0.21.3	h1:pqRRry3yuB4CWBVq9+cUqu+Y6E2z8TswbhNx1AZeYm0=
dep	github.com/caddyserver/transform-encoder	v0.0.0-20240312163748-f627fc4f7633	h1:RDs0Ef8EI4+NnQMlIrQNM38u7+6n3M3VOnamC2vZG0k=
dep	github.com/caddyserver/zerossl	v0.1.3	h1:onS+pxp3M8HnHpN5MMbOMyNjmTheJyWRaZYwn+YTAyA=
dep	github.com/cenkalti/backoff/v4	v4.2.1	h1:y4OZtCnogmCPw98Zjyt5a6+QwPLGkiQsYW5oUqylYbM=
dep	github.com/cespare/xxhash/v2	v2.3.0	h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs=
dep	github.com/christophcemper/caddy-netlify-redirects	v0.2.4	h1:rdLR8E5D7jc0jujxhbZIdbCjsmC9AgeeyUQBLu0hYFY=
dep	github.com/chzyer/readline	v1.5.1	h1:upd/6fQk4src78LMRzh5vItIt361/o4uq553V8B5sGI=
dep	github.com/cpuguy83/go-md2man/v2	v2.0.3	h1:qMCsGGgs+MAzDFyp9LpAe1Lqy/fY/qCovCm0qnXZOBM=
dep	github.com/darkweak/go-esi	v0.0.5	h1:b9LHI8Tz46R+i6p8avKPHAIBRQUCZDebNmKm5w/Zrns=
dep	github.com/darkweak/souin	v1.7.0	h1:QeSxwHECzZPlYHTGYDw4xQ6EBJY94f/nfqW4BLc3YQ0=
dep	github.com/darkweak/storages/core	v0.0.8	h1:9e7rOxHiJwnvADDVCZ7LFRnUnOHGT+UMpNOFlR8BOiw=
dep	github.com/dlclark/regexp2	v1.11.0	h1:G/nrcoOa7ZXlpoa/91N3X7mM3r8eIlMBBJZvsz/mxKI=
dep	github.com/dustin/go-humanize	v1.0.1	h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
dep	github.com/fatih/color	v1.17.0	h1:GlRw1BRJxkpqUCBKzKOw098ed57fEsKeNjpTe3cSjK4=
dep	github.com/felixge/httpsnoop	v1.0.4	h1:NFTV2Zj1bL4mc9sqWACXbQFVBBg2W3GPvqp8/ESS2Wg=
dep	github.com/fxamacker/cbor/v2	v2.6.0	h1:sU6J2usfADwWlYDAFhZBQ6TnLFBHxgesMrQfQgk1tWA=
dep	github.com/go-chi/chi/v5	v5.0.12	h1:9euLV5sTrTNTRUU9POmDUvfxyj6LAABLUcEWO+JJb4s=
dep	github.com/go-jose/go-jose/v3	v3.0.3	h1:fFKWeig/irsp7XD2zBxvnmA/XaRWp5V3CBsZXJF7G7k=
dep	github.com/go-kit/kit	v0.13.0	h1:OoneCcHKHQ03LfBpoQCUfCluwd2Vt3ohz+kvbJneZAU=
dep	github.com/go-kit/log	v0.2.1	h1:MRVx0/zhvdseW+Gza6N9rVzU/IVzaeE1SFI4raAhmBU=
dep	github.com/go-logfmt/logfmt	v0.6.0	h1:wGYYu3uicYdqXVgoYbvnkrPVXkuLM1p1ifugDMEdRi4=
dep	github.com/go-logr/logr	v1.4.2	h1:6pFjapn8bFcIbiKo3XT4j/BhANplGihG6tvd+8rYgrY=
dep	github.com/go-logr/stdr	v1.2.2	h1:hSWxHoqTgW2S2qGc0LTAI563KZ5YKYRhT3MFKZMbjag=
dep	github.com/go-sql-driver/mysql	v1.7.1	h1:lUIinVbN1DY0xBg0eMOzmmtGoHwWBbvnWubQUrtU8EI=
dep	github.com/google/cel-go	v0.20.1	h1:nDx9r8S3L4pE61eDdt8igGj8rf5kjYR3ILxWIpWNi84=
dep	github.com/google/certificate-transparency-go	v1.1.8-0.20240110162603-74a5dd331745	h1:heyoXNxkRT155x4jTAiSv5BVSVkueifPUm+Q8LUXMRo=
dep	github.com/google/go-tpm	v0.9.0	h1:sQF6YqWMi+SCXpsmS3fd21oPy/vSddwZry4JnmltHVk=
dep	github.com/google/go-tspi	v0.3.0	h1:ADtq8RKfP+jrTyIWIZDIYcKOMecRqNJFOew2IT0Inus=
dep	github.com/google/uuid	v1.6.0	h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
dep	github.com/grpc-ecosystem/grpc-gateway/v2	v2.18.0	h1:RtRsiaGvWxcwd8y3BiRZxsylPT8hLWZ5SPcfI+3IDNk=
dep	github.com/hashicorp/consul/api	v1.29.4	h1:P6slzxDLBOxUSj3fWo2o65VuKtbtOXFi7TSSgtXutuE=
dep	github.com/hashicorp/errwrap	v1.1.0	h1:OxrOeh75EUXMY8TBjag2fzXGZ40LB6IKw45YeGUDY2I=
dep	github.com/hashicorp/go-cleanhttp	v0.5.2	h1:035FKYIWjmULyFRBKPs8TBQoi0x6d9G4xc9neXJWAZQ=
dep	github.com/hashicorp/go-hclog	v1.6.3	h1:Qr2kF+eVWjTiYmU7Y31tYlP1h0q/X3Nl3tPGdaB11/k=
dep	github.com/hashicorp/go-immutable-radix	v1.3.1	h1:DKHmCUm2hRBK510BaiZlwvpD40f8bJFeZnpfm2KLowc=
dep	github.com/hashicorp/go-multierror	v1.1.1	h1:H5DkEtf6CXdFp0N0Em5UCwQpXMWke8IA0+lD48awMYo=
dep	github.com/hashicorp/go-rootcerts	v1.0.2	h1:jzhAVGtqPKbwpyCPELlgNWhE1znq+qwJtW5Oi2viEzc=
dep	github.com/hashicorp/golang-lru	v1.0.2	h1:dV3g9Z/unq5DpblPpw+Oqcv4dU/1omnb4Ok8iPY6p1c=
dep	github.com/hashicorp/serf	v0.10.1	h1:Z1H2J60yRKvfDYAOZLd2MU0ND4AH/WDz7xYHDWQsIPY=
dep	github.com/huandu/xstrings	v1.3.3	h1:/Gcsuc1x8JVbJ9/rlye4xZnVAbEkGauT8lbebqcQws4=
dep	github.com/imdario/mergo	v0.3.13	h1:lFzP57bqS/wsqKssCGmtLAb8A0wKjLGrve2q3PPVcBk=
dep	github.com/jackc/chunkreader/v2	v2.0.1	h1:i+RDz65UE+mmpjTfyz0MoVTnzeYxroil2G82ki7MGG8=
dep	github.com/jackc/pgconn	v1.14.3	h1:bVoTr12EGANZz66nZPkMInAV/KHD2TxH9npjXXgiB3w=
dep	github.com/jackc/pgio	v1.0.0	h1:g12B9UwVnzGhueNavwioyEEpAmqMe1E/BN9ES+8ovkE=
dep	github.com/jackc/pgpassfile	v1.0.0	h1:/6Hmqy13Ss2zCq62VdNG8tM1wchn8zjSGOBJ6icpsIM=
dep	github.com/jackc/pgproto3/v2	v2.3.3	h1:1HLSx5H+tXR9pW3in3zaztoEwQYRC9SQaYUHjTSUOag=
dep	github.com/jackc/pgservicefile	v0.0.0-20221227161230-091c0ba34f0a	h1:bbPeKD0xmW/Y25WS6cokEszi5g+S0QxI/d45PkRi7Nk=
dep	github.com/jackc/pgtype	v1.14.0	h1:y+xUdabmyMkJLyApYuPj38mW+aAIqCe5uuBB51rH3Vw=
dep	github.com/jackc/pgx/v4	v4.18.3	h1:dE2/TrEsGX3RBprb3qryqSV9Y60iZN1C6i8IrmW9/BA=
dep	github.com/klauspost/compress	v1.17.9	h1:6KIumPrER1LHsvBVuDa0r5xaG0Es51mhhB9BQB2qeMA=
dep	github.com/klauspost/cpuid/v2	v2.2.8	h1:+StwCXwm9PdpiEkPyzBXIy+M9KUb4ODm0Zarf1kS5BM=
dep	github.com/libdns/libdns	v0.2.2	h1:O6ws7bAfRPaBsgAYt8MDe2HcNBGC29hkZ9MX2eUSX3s=
dep	github.com/manifoldco/promptui	v0.9.0	h1:3V4HzJk1TtXW1MTZMP7mdlwbBpIinw3HztaIlYthEiA=
dep	github.com/mattn/go-colorable	v0.1.13	h1:fFA4WZxdEF4tXPZVKMLwD8oUnCTTo08duU7wxecdEvA=
dep	github.com/mattn/go-isatty	v0.0.20	h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY=
dep	github.com/mgutz/ansi	v0.0.0-20200706080929-d51e80ef957d	h1:5PJl274Y63IEHC+7izoQE9x6ikvDFZS2mDVS3drnohI=
dep	github.com/mholt/acmez/v2	v2.0.2	h1:OmK6xckte2JfKGPz4OAA8aNHTiLvGp8tLzmrd/wfSyw=
dep	github.com/miekg/dns	v1.1.62	h1:cN8OuEF1/x5Rq6Np+h1epln8OiyPWV+lROx9LxcGgIQ=
dep	github.com/mitchellh/copystructure	v1.2.0	h1:vpKXTN4ewci03Vljg/q9QvCGUDttBOGBIa15WveJJGw=
dep	github.com/mitchellh/go-ps	v1.0.0	h1:i6ampVEEF4wQFF+bkYfwYgY+F/uYJDktmvLPf7qIgjc=
dep	github.com/mitchellh/mapstructure	v1.5.0	h1:jeMsZIYE/09sWLaz43PL7Gy6RuMjD2eJVyuac5Z2hdY=
dep	github.com/mitchellh/reflectwalk	v1.0.2	h1:G2LzWKi524PWgd3mLHV8Y5k7s6XUvT0Gef6zxSIeXaQ=
dep	github.com/munnerz/goautoneg	v0.0.0-20191010083416-a7dc8b61c822	h1:C3w9PqII01/Oq1c1nUAm88MOHcQC9l5mIlSMApZMrHA=
dep	github.com/pierrec/lz4/v4	v4.1.21	h1:yOVMLb6qSIDP67pl/5F7RepeKYu/VmTyEXvuMI5d9mQ=
dep	github.com/pires/go-proxyproto	v0.7.0	h1:IukmRewDQFWC7kfnb66CSomk2q/seBuilHBYFwyq0Hs=
dep	github.com/pkg/errors	v0.9.1	h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4=
dep	github.com/pquerna/cachecontrol	v0.2.0	h1:vBXSNuE5MYP9IJ5kjsdo8uq+w41jSPgvba2DEnkRx9k=
dep	github.com/prometheus/client_golang	v1.20.3	h1:oPksm4K8B+Vt35tUhw6GbSNSgVlVSBH0qELP/7u83l4=
dep	github.com/prometheus/client_model	v0.6.1	h1:ZKSh/rekM+n3CeS952MLRAdFwIKqeY8b62p8ais2e9E=
dep	github.com/prometheus/common	v0.59.1	h1:LXb1quJHWm1P6wq/U824uxYi4Sg0oGvNeUm1z5dJoX0=
dep	github.com/prometheus/procfs	v0.15.1	h1:YagwOFzUgYfKKHX6Dr+sHT7km/hxC76UB0learggepc=
dep	github.com/pteich/caddy-tlsconsul	v1.5.0	h1:APUZlnlvsuYn3JylwJYavJ6+71HW7pX84c2TRgF90vc=
dep	github.com/quic-go/qpack	v0.5.1	h1:giqksBPnT/HDtZ6VhtFKgoLOWmlyo9Ei6u9PqzIMbhI=
dep	github.com/quic-go/quic-go	v0.47.0	h1:yXs3v7r2bm1wmPTYNLKAAJTHMYkPEsfYJmTazXrCZ7Y=
dep	github.com/rs/xid	v1.5.0	h1:mKX4bl4iPYJtEIxp6CYiUuLQ/8DYMoz0PUdtGgMFRVc=
dep	github.com/russross/blackfriday/v2	v2.1.0	h1:JIOH55/0cWyOuilr9/qlrm0BSXldqnqwMsf35Ld67mk=
dep	github.com/shopspring/decimal	v1.2.0	h1:abSATXmQEYyShuxI4/vyW3tV1MrKAJzCZ/0zLUXYbsQ=
dep	github.com/shurcooL/sanitized_anchor_name	v1.0.0	h1:PdmoCO6wvbs+7yrJyMORt4/BmY5IYyJwS/kOiWx8mHo=
dep	github.com/sirupsen/logrus	v1.9.3	h1:dueUQJ1C2q9oE3F7wvmSGAaVtTmUizReu6fjN8uqzbQ=
dep	github.com/slackhq/nebula	v1.6.1	h1:/OCTR3abj0Sbf2nGoLUrdDXImrCv0ZVFpVPP5qa0DsM=
dep	github.com/smallstep/certificates	v0.26.1	h1:FIUliEBcExSfJJDhRFA/s8aZgMIFuorexnRSKQd884o=
dep	github.com/smallstep/go-attestation	v0.4.4-0.20240109183208-413678f90935	h1:kjYvkvS/Wdy0PVRDUAA0gGJIVSEZYhiAJtfwYgOYoGA=
dep	github.com/smallstep/nosql	v0.6.1	h1:X8IBZFTRIp1gmuf23ne/jlD/BWKJtDQbtatxEn7Et1Y=
dep	github.com/smallstep/pkcs7	v0.0.0-20231024181729-3b98ecc1ca81	h1:B6cED3iLJTgxpdh4tuqByDjRRKan2EvtnOfHr2zHJVg=
dep	github.com/smallstep/scep	v0.0.0-20231024192529-aee96d7ad34d	h1:06LUHn4Ia2X6syjIaCMNaXXDNdU+1N/oOHynJbWgpXw=
dep	github.com/smallstep/truststore	v0.13.0	h1:90if9htAOblavbMeWlqNLnO9bsjjgVv2hQeQJCi/py4=
dep	github.com/spf13/cast	v1.4.1	h1:s0hze+J0196ZfEMTs80N7UlFt0BDuQ7Q+JDnHiMWKdA=
dep	github.com/spf13/cobra	v1.8.0	h1:7aJaZx1B85qltLMc546zn58BxxfZdR/W22ej9CFoEf0=
dep	github.com/spf13/pflag	v1.0.5	h1:iy+VFUOCP1a+8yFto/drg2CJ5u0yRoB7fZw3DKv/JXA=
dep	github.com/stoewer/go-strcase	v1.2.0	h1:Z2iHWqGXH00XYgqDmNgQbIBxf3wrNq0F3feEy0ainaU=
dep	github.com/tailscale/tscert	v0.0.0-20240517230440-bbccfbf48933	h1:pV0H+XIvFoP7pl1MRtyPXh5hqoxB5I7snOtTHgrn6HU=
dep	github.com/tj/go-redirects	v0.0.0-20200911105812-fd1ba1020b37	h1:K11tjwz8zTTSZkz4TUjfLN+y8uJWP38BbyPqZ2yB/Yk=
dep	github.com/ucarion/urlpath	v0.0.0-20200424170820-7ccc79b76bbb	h1:Ywfo8sUltxogBpFuMOFRrrSifO788kAFxmvVw31PtQQ=
dep	github.com/urfave/cli	v1.22.14	h1:ebbhrRiGK2i4naQJr+1Xj92HXZCrK7MsyTS/ob3HnAk=
dep	github.com/x448/float16	v0.8.4	h1:qLwI1I70+NjRFUR3zs1JPUCgaCXSh3SW62uAKT1mSBM=
dep	github.com/yuin/goldmark	v1.7.1	h1:3bajkSilaCbjdKVsKdZjZCLBNPL9pYzrCakKaf4U49U=
dep	github.com/yuin/goldmark-highlighting/v2	v2.0.0-20230729083705-37449abec8cc	h1:+IAOyRda+RLrxa1WC7umKOZRsGq4QrFFMYApOeHzQwQ=
dep	github.com/zeebo/blake3	v0.2.4	h1:KYQPkhpRtcqh0ssGYcKLG1JYvddkEA8QwCM/yBqhaZI=
dep	go.etcd.io/bbolt	v1.3.9	h1:8x7aARPEXiXbHmtUwAIv7eV2fQFHrLLavdiJ3uzJXoI=
dep	go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp	v0.49.0	h1:jq9TW8u3so/bN+JPT166wjOI6/vQPF6Xe7nMNIltagk=
dep	go.opentelemetry.io/contrib/propagators/autoprop	v0.42.0	h1:s2RzYOAqHVgG23q8fPWYChobUoZM6rJZ98EnylJr66w=
dep	go.opentelemetry.io/contrib/propagators/aws	v1.17.0	h1:IX8d7l2uRw61BlmZBOTQFaK+y22j6vytMVTs9wFrO+c=
dep	go.opentelemetry.io/contrib/propagators/b3	v1.17.0	h1:ImOVvHnku8jijXqkwCSyYKRDt2YrnGXD4BbhcpfbfJo=
dep	go.opentelemetry.io/contrib/propagators/jaeger	v1.17.0	h1:Zbpbmwav32Ea5jSotpmkWEl3a6Xvd4tw/3xxGO1i05Y=
dep	go.opentelemetry.io/contrib/propagators/ot	v1.17.0	h1:ufo2Vsz8l76eI47jFjuVyjyB3Ae2DmfiCV/o6Vc8ii0=
dep	go.opentelemetry.io/otel	v1.24.0	h1:0LAOdjNmQeSTzGBzduGe/rU4tZhMwL5rWgtp9Ku5Jfo=
dep	go.opentelemetry.io/otel/exporters/otlp/otlptrace	v1.21.0	h1:cl5P5/GIfFh4t6xyruOgJP5QiA1pw4fYYdv6nc6CBWw=
dep	go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc	v1.21.0	h1:tIqheXEFWAZ7O8A7m+J0aPTmpJN3YQ7qetUAdkkkKpk=
dep	go.opentelemetry.io/otel/metric	v1.24.0	h1:6EhoGWWK28x1fbpA4tYTOWBkPefTDQnb8WSGXlc88kI=
dep	go.opentelemetry.io/otel/sdk	v1.21.0	h1:FTt8qirL1EysG6sTQRZ5TokkU8d0ugCj8htOgThZXQ8=
dep	go.opentelemetry.io/otel/trace	v1.24.0	h1:CsKnnL4dUAr/0llH9FKuc698G04IrpWV0MQA/Y1YELI=
dep	go.opentelemetry.io/proto/otlp	v1.0.0	h1:T0TX0tmXU8a3CbNXzEKGeU5mIVOdf0oykP+u2lIVU/I=
dep	go.step.sm/cli-utils	v0.9.0	h1:55jYcsQbnArNqepZyAwcato6Zy2MoZDRkWW+jF+aPfQ=
dep	go.step.sm/crypto	v0.45.0	h1:Z0WYAaaOYrJmKP9sJkPW+6wy3pgN3Ija8ek/D4serjc=
dep	go.step.sm/linkedca	v0.20.1	h1:bHDn1+UG1NgRrERkWbbCiAIvv4lD5NOFaswPDTyO5vU=
dep	go.uber.org/automaxprocs	v1.5.3	h1:kWazyxZUrS3Gs4qUpbwo5kEIMGe/DAvi5Z4tl2NW4j8=
dep	go.uber.org/multierr	v1.11.0	h1:blXXJkSxSSfBVBlC76pxqeO+LN3aDfLQo+309xJstO0=
dep	go.uber.org/zap	v1.27.0	h1:aJMhYGrd5QSmlpLMr2MftRKl7t8J8PTZPA732ud/XR8=
dep	go.uber.org/zap/exp	v0.2.0	h1:FtGenNNeCATRB3CmB/yEUnjEFeJWpB/pMcy7e2bKPYs=
dep	golang.org/x/crypto	v0.27.0	h1:GXm2NjJrPaiv/h1tb2UH8QfgC/hOf/+z0p6PT8o1w7A=
dep	golang.org/x/crypto/x509roots/fallback	v0.0.0-20240507223354-67b13616a595	h1:TgSqweA595vD0Zt86JzLv3Pb/syKg8gd5KMGGbJPYFw=
dep	golang.org/x/exp	v0.0.0-20240904232852-e7e105dedf7e	h1:I88y4caeGeuDQxgdoFPUq097j7kNfw6uvuiNxUBfcBk=
dep	golang.org/x/net	v0.29.0	h1:5ORfpBpCs4HzDYoodCDBbwHzdR5UrLBZ3sOnUJmFoHo=
dep	golang.org/x/sync	v0.8.0	h1:3NFvSEYkUoMifnESzZl15y791HH1qU2xm6eCJU5ZPXQ=
dep	golang.org/x/sys	v0.25.0	h1:r+8e+loiHxRqhXVl6ML1nO3l1+oFoWbnlu2Ehimmi34=
dep	golang.org/x/term	v0.24.0	h1:Mh5cbb+Zk2hqqXNO7S1iTjEphVL+jb8ZWaqh/g+JWkM=
dep	golang.org/x/text	v0.18.0	h1:XvMDiNzPAl0jr17s6W9lcaIhGUfUORdGCNsuLmPG224=
dep	golang.org/x/time	v0.6.0	h1:eTDhh4ZXt5Qf0augr54TN6suAUudPcawVZeIAPU7D4U=
dep	google.golang.org/genproto/googleapis/api	v0.0.0-20240506185236-b8a5c65736ae	h1:AH34z6WAGVNkllnKs5raNq3yRq93VnjBG6rpfub/jYk=
dep	google.golang.org/genproto/googleapis/rpc	v0.0.0-20240429193739-8cf5692501f6	h1:DujSIu+2tC9Ht0aPNA7jgj23Iq8Ewi5sgkQ++wdvonE=
dep	google.golang.org/grpc	v1.63.2	h1:MUeiw1B2maTVZthpU5xvASfTh3LDbxHd6IJ6QQVU+xM=
dep	google.golang.org/protobuf	v1.34.2	h1:6xV6lTsCfpGD21XK49h7MhtcApnLqkfYgPcdHftf6hg=
dep	gopkg.in/natefinch/lumberjack.v2	v2.2.1	h1:bBRl1b0OH9s/DuPhuXpNl+VtCaJXFZ5/uEFST95x9zc=
dep	gopkg.in/yaml.v3	v3.0.1	h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
build	-buildmode=exe
build	-compiler=gc
build	-tags=nobadger
build	-trimpath=true
build	CGO_ENABLED=0
build	GOARCH=amd64
build	GOOS=linux
build	GOAMD64=v1

caddy list-modules

admin.api.load
admin.api.metrics
admin.api.pki
admin.api.reverse_proxy
caddy.adapters.caddyfile
caddy.config_loaders.http
caddy.filesystems
caddy.listeners.http_redirect
caddy.listeners.proxy_protocol
caddy.listeners.tls
caddy.logging.encoders.append
caddy.logging.encoders.console
caddy.logging.encoders.filter
caddy.logging.encoders.filter.cookie
caddy.logging.encoders.filter.delete
caddy.logging.encoders.filter.hash
caddy.logging.encoders.filter.ip_mask
caddy.logging.encoders.filter.query
caddy.logging.encoders.filter.regexp
caddy.logging.encoders.filter.rename
caddy.logging.encoders.filter.replace
caddy.logging.encoders.json
caddy.logging.writers.discard
caddy.logging.writers.file
caddy.logging.writers.net
caddy.logging.writers.stderr
caddy.logging.writers.stdout
caddy.storage.file_system
events
http
http.authentication.hashes.bcrypt
http.authentication.providers.http_basic
http.encoders.gzip
http.encoders.zstd
http.handlers.acme_server
http.handlers.authentication
http.handlers.copy_response
http.handlers.copy_response_headers
http.handlers.encode
http.handlers.error
http.handlers.file_server
http.handlers.headers
http.handlers.intercept
http.handlers.invoke
http.handlers.log_append
http.handlers.map
http.handlers.metrics
http.handlers.push
http.handlers.request_body
http.handlers.reverse_proxy
http.handlers.rewrite
http.handlers.static_response
http.handlers.subroute
http.handlers.templates
http.handlers.tracing
http.handlers.vars
http.ip_sources.static
http.matchers.client_ip
http.matchers.expression
http.matchers.file
http.matchers.header
http.matchers.header_regexp
http.matchers.host
http.matchers.method
http.matchers.not
http.matchers.path
http.matchers.path_regexp
http.matchers.protocol
http.matchers.query
http.matchers.remote_ip
http.matchers.vars
http.matchers.vars_regexp
http.precompressed.br
http.precompressed.gzip
http.precompressed.zstd
http.reverse_proxy.selection_policies.client_ip_hash
http.reverse_proxy.selection_policies.cookie
http.reverse_proxy.selection_policies.first
http.reverse_proxy.selection_policies.header
http.reverse_proxy.selection_policies.ip_hash
http.reverse_proxy.selection_policies.least_conn
http.reverse_proxy.selection_policies.query
http.reverse_proxy.selection_policies.random
http.reverse_proxy.selection_policies.random_choose
http.reverse_proxy.selection_policies.round_robin
http.reverse_proxy.selection_policies.uri_hash
http.reverse_proxy.selection_policies.weighted_round_robin
http.reverse_proxy.transport.fastcgi
http.reverse_proxy.transport.http
http.reverse_proxy.upstreams.a
http.reverse_proxy.upstreams.multi
http.reverse_proxy.upstreams.srv
pki
tls
tls.ca_pool.source.file
tls.ca_pool.source.http
tls.ca_pool.source.inline
tls.ca_pool.source.pki_intermediate
tls.ca_pool.source.pki_root
tls.ca_pool.source.storage
tls.certificates.automate
tls.certificates.load_files
tls.certificates.load_folders
tls.certificates.load_pem
tls.certificates.load_storage
tls.client_auth.verifier.leaf
tls.get_certificate.http
tls.get_certificate.tailscale
tls.handshake_match.local_ip
tls.handshake_match.remote_ip
tls.handshake_match.sni
tls.issuance.acme
tls.issuance.internal
tls.issuance.zerossl
tls.leaf_cert_loader.file
tls.leaf_cert_loader.folder
tls.leaf_cert_loader.pem
tls.leaf_cert_loader.storage
tls.permission.http
tls.stek.distributed
tls.stek.standard

  Standard modules: 121

cache
caddy.logging.encoders.formatted
caddy.logging.encoders.transform
caddy.storage.consul
http.handlers.cache
http.handlers.netlify_redirects

  Non-standard modules: 6

  Unknown modules: 0

@christophcemper
Author

christophcemper commented Nov 12, 2024

we don't use concurrency during a request handler chain, except technically in the reverse proxy for full duplex streaming

This is a very good detail to know.

We use Caddy for a lot of reverse proxy work to internal services; probably 60% or more of all hosts use it.

We started using http.handlers.cache via a custom module build on top of the latest 2.8.4 on Oct 14, before we pinned it back to 2.7.6 now.

So that's worth diving deeper on:

1/ 60% or more of all hosts use reverse proxy
2/ a new addition to a host uses handle_path to apply both cache and reverse_proxy combined, for the first and only time so far, to achieve a caching reverse proxy for a 3rd party service.

It took ~14 days until the first crash, but my current bet is on this specific addition,
and we'll try to isolate that into an independent config. It is still unclear how to reproduce the crash, but we can provide setup details.

Possibly the same crash would even arise using the backported build... 🤔

@mholt
Member

mholt commented Nov 12, 2024

@christophcemper Thanks for your sponsorship and the additional details. If you want to email your config to matt at dyanim dot com that could be helpful.

Does such a "HTTP Request Recorder & Replay" module/feature exist? If not we may build one actually.

Nothing turnkey... that I know of.

I did a search in the non-standard modules listed there for GetVar and SetVar and VarsCtxKey, all clues that they might be using the vars middleware, but couldn't find any results.

This is a very good detail to know.

To clarify, the goroutines (concurrency) we use in the reverse proxy is literally only io.Copy() calls -- no use of vars middleware. I was just saying that because I was making the point that concurrency in handling HTTP requests is not typical and is often incorrect (with the exception of reverse proxy streaming, etc). Indeed, a module that is handling a request asynchronously (like, "let me run my logic in a goroutine while I continue to pass the request up the chain") is likely incorrect.

I just don't know where that's happening. A config could be helpful.
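
To illustrate the point about asynchronous handlers, here is a minimal sketch using only the Go standard library. It is not Caddy code: `varsKey`, `withVars`, `tagSync` and `final` are hypothetical names that only stand in for a per-request variable map passed through the request context. The safe handler finishes its write before calling the next handler; the commented-out variant shows the pattern that would let a write overlap with reads further down the chain.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// varsKey is a hypothetical context key; it only stands in for the
// per-request variable map discussed above and is not Caddy's API.
type varsKey struct{}

// withVars attaches a fresh, unsynchronized map to each request.
func withVars(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		vars := map[string]string{"client_ip": r.RemoteAddr}
		ctx := context.WithValue(r.Context(), varsKey{}, vars)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// tagSync mutates the map and only then calls the next handler, so the
// map is never touched by two goroutines at once.
func tagSync(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		vars := r.Context().Value(varsKey{}).(map[string]string)
		vars["tag"] = "done"
		// The racy variant would be:
		//   go func() { vars["tag"] = "done" }()
		// followed by next.ServeHTTP, which lets the write overlap
		// with reads further down the chain.
		next.ServeHTTP(w, r)
	})
}

// final reads from the map, as a downstream handler would.
func final(w http.ResponseWriter, r *http.Request) {
	vars := r.Context().Value(varsKey{}).(map[string]string)
	fmt.Fprintf(w, "tag=%s", vars["tag"])
}

// serveOnce runs one request through the chain and returns the body.
func serveOnce() string {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	withVars(tagSync(http.HandlerFunc(final))).ServeHTTP(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Println(serveOnce())
}
```

With the synchronous version, a plain map is sufficient because each request's map is only ever touched by one goroutine at a time.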

@christophcemper
Author

christophcemper commented Nov 12, 2024

@mholt thanks for the insights and clarifications!

the goroutines (concurrency) we use in the reverse proxy is literally only io.Copy() calls

Turns out few minutes after I read this I managed to find a DATA RACE locally on my dev/fork of 2.7.6 in caddyhttp/responsewriter.go; will report separately.

Regarding the mentioned case, a combo of cache module + reverse proxy + header modification, I have an isolated config and test setup, as well as the build commands and binaries used.

I'm posting it here as well, but will attach/mail a ZIP with everything (ca. 40MB).
I have the full report, binaries, and example logs in a folder structure here

Theory 1 - unusual combo of caching reverse proxy

CaddyfileDEV-defect-6683-map-read-write-crash-simple-theory1

# isolation of special use of cache + reverse_proxy 
# that is _maybe_ related to or the cause of defect issue 6683 "fatal error: concurrent map read and map write"
# occurred first Oct 28, second Nov 10 on a very busy production caddy instance with MANY MORE hosts 
# -- 
# global config
{
	https_port 443
	http_port 80

	# on PROD we use the consul storage backend
	# storage consul {
	# 	address      "127.0.0.1:8500"
	# 	token        ""
	# 	timeout      "10"
	# 	prefix       "caddytls"
	# 	value_prefix "NNN"
	# 	aes_key      "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN"
	# 	tls_enabled  "false"
	# 	tls_insecure "true"
	# }

	admin 127.0.0.1:2019

	order netlify_redirects before redir
	order netlify_redirects before handle

	order handle_errors after handle

	grace_period 5s

	order cache before handle

	# dev only
	order log after handle_path

	cache
}

## dfsd


# re-usable snippet to modify HTTP headers for all reverse_proxy blocks
(reverse-proxy-headers) {
	header_down -Server
	header_down -X-Powered-By
	header_down X-XSS-Protection "1; mode=block"
	header_down X-Content-Type-Options "nosniff"

	# requires configuration in Cloudflare to sanitize X-Forwarded-For - https://www.authelia.com/integration/proxies/fowarded-headers/#cloudflare
	# based on current list of Cloudflare IPs - https://www.cloudflare.com/en-gb/ips/ (2022-07-15)
	trusted_proxies 103.21.244.0/22 103.22.200.0/22 103.31.4.0/22 104.16.0.0/13 104.24.0.0/14 108.162.192.0/18 131.0.72.0/22 141.101.64.0/18 162.158.0.0/15 172.64.0.0/13 173.245.48.0/20 188.114.96.0/20 190.93.240.0/20 197.234.240.0/22 198.41.128.0/17 2400:cb00::/32 2606:4700::/32 2803:f800::/32 2405:b500::/32 2405:8100::/32 2a06:98c0::/29 2c0f:f248::/32
}

(remove-headers) {
	# remove X-Powered-By and Server response headers
	header -X-Powered-By
	header -Server

	# remove X-Powered-By and Server error response headers
	handle_errors {
		header -X-Powered-By
		header -Server
	}
}

(logging) {
	roll_size 1gb
	roll_keep 10
	roll_keep_for 720h

	level DEBUG
}

(common-file-server) {
	file_server
	encode zstd gzip

	try_files {path}

	# turn // into /
	uri path_regexp /{2,} /
}

localhost {
	# on prod we don't use internal tls
	tls internal

	log {
		# using hardcoded file here to avoid issues with relative paths or some placeholders, ENV expansions etc.
		output file ./log/defect-6683-map-read-write-crash-simple/theory1/access.log {
			import logging
		}
	}

	# only on DEV
	log debug {
		# using hardcoded file here to avoid issues with relative paths or some placeholders, ENV expansions etc.
		output file ./log/defect-6683-map-read-write-crash-simple/theory1/debug.log {
			level DEBUG
		}
	}

	# OAuth2 routes for Ory Hydra
	handle /oauth2/login {
		reverse_proxy localhost:8000
	}

	handle_path /caching-reverse-proxy/* {
		# needs new xCaddy LRTcaddy build
		cache {
			allowed_http_verbs GET POST
			log_level debug
			stale 200s
			ttl 1000s
			default_cache_control store
		}

		header -server
		header -x-powered-by

		reverse_proxy https://snippet.pricewell.io {
			header_up Host {upstream_hostport}
			# header_down -cache-control  # Remove any existing cache-control headers
			# header_down -pragma  # Remove any pragma headers
			# header_down -expires  # Remove any expires headers
			header_down -server
			header_down Cache-Control "public, max-age=300, must-revalidate"
			header_down Pragma "cache"
			header_down expect-ct "max-age=300"
		}
	}

	handle /oauth2/consent {
		reverse_proxy localhost:8000
	}

	handle_path /oauth2/admin/* {
		# only allow access to the admin interface from the internal network
		reverse_proxy * localhost:47144
	}

	handle_path /oauth2/* {
		reverse_proxy * localhost:48242
	}


	## reverse proxy the local Vite dev server
	reverse_proxy localhost:5173

	# on prod we use the load balancer like this
	# reverse_proxy local-view-node019:3100 local-view-node018:3100 local-view-node017:3100 local-view-node016:3100 local-view-node015:3100 local-view-node014:3100 local-view-node013:3100 local-view-node012:3100 local-view-node011:3100 local-view-node010:3100 {
	# 	import reverse-proxy-headers
	# 	lb_policy cookie XXXXXX NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
	# 	fail_duration 30s
	# }
}

dev-server.example.com {
	tls internal
	encode zstd gzip

	## the first API is not supported anymore
	respond /endpoint1/* "Gone" 410 {
		close
	}

	## reverse proxy the actual API
	handle_path /endpoint4/* {
		reverse_proxy 127.0.0.1:8099 {
			fail_duration 30s
		}
	}

	# for complete /csv/ and /json/ subfolders allow requests from any origin
	header /csv* Access-Control-Allow-Origin "*"
	header /json* Access-Control-Allow-Origin "*"

	import common-file-server
}

Makefile to run/build (but we also used the original amd64 binary)


ISSUE=defect-6683-map-read-write-crash-simple
THEORY=theory1

CADDY_VERSION=v2.8.4
TLS_CONSUL_VERSION=v1.5.0
CACHE_HANDLER_VERSION=v0.14.0

RACE_DETECTOR ?= 0
RACE_FLAG=$(if $(filter 1,$(RACE_DETECTOR)),-race,)
RACE_FILENAME=$(if $(filter 1,$(RACE_DETECTOR)),-race,)
BIN_FILENAME_WITH_VERSIONS_AND_DATE=LRTcaddy-$(CADDY_VERSION)-$(TLS_CONSUL_VERSION)-$(CACHE_HANDLER_VERSION)$(RACE_FILENAME)-$(DATE)
BUILD_DIR=build_tmp

PLATFORM=mac
ifeq ($(shell uname -m),x86_64)
	PLATFORM=amd64
endif	

CADDYBIN=bin/$(BIN_FILENAME_WITH_VERSIONS_AND_DATE)-$(PLATFORM)

LOG_DIR=log/$(ISSUE)/$(THEORY)-$(PLATFORM)

all: clean info 
	@echo "run caddy$(THEORY) in background"
	make run &
	sleep 3	 
	ps aux | grep "$(CADDYBIN)"
	@echo "run tests after 10sec startup delay"
	@echo "waiting..."
	sleep 10 
	@echo "running tests..."
	make tests 
	ps aux | grep "$(CADDYBIN)"
	make kill-caddy

.PHONY: zip
zip:
	7z a $(ISSUE)-$(THEORY).zip .


.PHONY: kill-caddy
kill-caddy:
	pkill -f "$(CADDYBIN)"

run: caddy$(THEORY) 

.PHONY: clean
clean: 
	rm -rf ./log/

info:
	@echo "make caddy$(THEORY)"
	@echo "On    PLATFORM=$(PLATFORM)"
	@echo "Using CADDYBIN=$(CADDYBIN)"
	@echo "Building $(LOG_DIR)"
	mkdir -p ./$(LOG_DIR)
	@echo "Logging caddy version"
	$(CADDYBIN) version > ./$(LOG_DIR)/caddy-version-$(ISSUE).log
	@echo "Logging caddy list-modules"
	$(CADDYBIN) list-modules > ./$(LOG_DIR)/caddy-list-modules-$(ISSUE).log
	@echo "Logging caddy build-info"
	$(CADDYBIN) build-info > ./$(LOG_DIR)/caddy-build-info-$(ISSUE).log


build: build-mac

## for local testing only		
.PHONY: build-mac
build-mac:
	GOOS=darwin GOARCH=arm64 xcaddy build $(CADDY_VERSION) \
		--output bin/$(BIN_FILENAME_WITH_VERSIONS_AND_DATE)-mac \
		--with github.com/christophcemper/caddy-netlify-redirects@v0.2.4 \
		--with github.com/caddyserver/transform-encoder \
		--with github.com/pteich/caddy-tlsconsul@$(TLS_CONSUL_VERSION) \
		--with github.com/caddyserver/cache-handler@$(CACHE_HANDLER_VERSION)


.PHONY: caddy$(THEORY)
caddy$(THEORY):
	$(CADDYBIN) \
	run --config ./CaddyfileDEV-$(ISSUE)-$(THEORY) \
	--watch 2>&1 | tee ./$(LOG_DIR)/caddyDEV-stdouterr-$(ISSUE)-$(THEORY)-$(PLATFORM).log


tests: caddy$(THEORY)-case1 caddy$(THEORY)-case2 caddy$(THEORY)-case3

.PHONY: caddy$(THEORY)-case1
caddy$(THEORY)-case1:
	curl -X POST -I -v https://localhost/caching-reverse-proxy/8a4899c7-03f5-4dca-86f2-3249e5c33ca2?cache=false | tee ./$(LOG_DIR)/caddy$(THEORY)-case1-$(PLATFORM).log

.PHONY: caddy$(THEORY)-case2
caddy$(THEORY)-case2:
	curl -X POST -I -v https://localhost/caching-reverse-proxy/8a4899c7-03f5-4dca-86f2-3249e5c33ca2?cache=true | tee ./$(LOG_DIR)/caddy$(THEORY)-case2-$(PLATFORM).log

.PHONY: caddy$(THEORY)-case3
caddy$(THEORY)-case3:
	curl -X POST -I -v https://localhost/caching-reverse-proxy/8a4899c7-03f5-4dca-86f2-3249e5c33ca2?cache=true | tee ./$(LOG_DIR)/caddy$(THEORY)-case3-$(PLATFORM).log

The test results of the simple cases 1-3 should be 200s, successfully reverse-proxied,
and look like this in the curl output

HTTP/2 200 
access-control-allow-origin: *
age: 1
alt-svc: h3=":443"; ma=2592000
cache-control: public, max-age=300, must-revalidate
cache-status: Souin; hit; ttl=299; key=POST-https-localhost-/8a4899c7-03f5-4dca-86f2-3249e5c33ca2?cache=true; detail=DEFAULT
content-type: text/html; charset=utf-8
date: Tue, 12 Nov 2024 15:38:12 GMT
etag: W/"594d-bjtQXe335eTTFcR34hk8SwnjRLg"
expect-ct: max-age=300
pragma: cache
referrer-policy: no-referrer
strict-transport-security: max-age=15552000; includeSubDomains
vary: Accept-Encoding
x-content-type-options: nosniff
x-dns-prefetch-control: off
x-download-options: noopen
x-frame-options: SAMEORIGIN
x-permitted-cross-domain-policies: none
x-xss-protection: 0
content-length: 22861

The successful curl output also contains some more detail, e.g. like this

curl -X POST -I -v https://localhost/caching-reverse-proxy/8a4899c7-03f5-4dca-86f2-3249e5c33ca2?cache=true | tee ./log/defect-6683-map-read-write-crash-simple/theory1-amd64/caddytheory1-case3-amd64.log

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 127.0.0.1:443...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [15 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [927 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [80 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [36 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [36 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: [NONE]
*  start date: Nov 12 17:02:07 2024 GMT
*  expire date: Nov 13 05:02:07 2024 GMT
*  subjectAltName: host "localhost" matched cert's "localhost"
*  issuer: CN=Caddy Local Authority - ECC Intermediate
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* Using Stream ID: 1 (easy handle 0x55584a8c3610)
} [5 bytes data]
> POST /caching-reverse-proxy/8a4899c7-03f5-4dca-86f2-3249e5c33ca2?cache=true HTTP/2
> Host: localhost
> user-agent: curl/7.68.0
> accept: */*
>
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [122 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
} [5 bytes data]
< HTTP/2 200
< access-control-allow-origin: *
< age: 2
< alt-svc: h3=":443"; ma=2592000
< cache-control: public, max-age=300, must-revalidate
< cache-status: Souin; hit; ttl=298; key=POST-https-localhost-/8a4899c7-03f5-4dca-86f2-3249e5c33ca2?cache=true; detail=DEFAULT
< content-type: text/html; charset=utf-8
< date: Tue, 12 Nov 2024 17:02:21 GMT
< etag: W/"594d-bjtQXe335eTTFcR34hk8SwnjRLg"
< expect-ct: max-age=300
< pragma: cache
< referrer-policy: no-referrer
< strict-transport-security: max-age=15552000; includeSubDomains
< vary: Accept-Encoding
< x-content-type-options: nosniff
< x-dns-prefetch-control: off
< x-download-options: noopen
< x-frame-options: SAMEORIGIN
< x-permitted-cross-domain-policies: none
< x-xss-protection: 0
< content-length: 22861
<
  0 22861    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Connection #0 to host localhost left intact

I hope this helps

@christophcemper
Author

christophcemper commented Nov 12, 2024

FWIW, I failed to upload "defect-6683-map-read-write-crash-simple-theory1.zip", which has it all with reference logs from the mac and amd64 runs, at only 42MB,
so I uploaded it all here

@francislavoie
Member

Turns out few minutes after I read this I managed to find a DATA RACE locally on my dev/fork of 2.7.6.

Honestly, that's not that interesting. We're at 2.9.0-beta.3 now. If you're not testing against latest, your tests results are likely against stale code (you might be re-finding bugs that are already fixed).

@mholt
Member

mholt commented Nov 12, 2024

Thanks; what is the output of the race detector though?

And... yeah, unfortunately a really old version like 2.7.6 might not be super helpful, but with the output of the race detector we can confirm either way.

@christophcemper
Author

christophcemper commented Nov 13, 2024

I've spent quite some time today to dive deeper.

Yeah, old versions are less useful - unless they are more stable.

Remember we downgraded due to the unstable 2.8.4? Yes, I understand where you are coming from here, from the viewpoint of ongoing development. However, besides ongoing development, which is great, there's an operational reality.

How stable will future versions be if the responsibility for stability is also passed on to add-on modules? I believe still not as stable as we would like.

We are at version 2.7.6 again for now,
until it's clarified how to run 2.8.4 stably
and the defect(s) reported/discussed here are fixed.

So the 2.9.0-beta.3 couldn't be less interesting for us at the current moment.
I'd be happy to run a "stable" version of the "stable" 2.8.4 first instead of downgrading.

Blindly trusting the "newer version being better"-idea with 2.8.4. got us into the troubles we're in... OK, and adding that cache module, which gets more and more suspicious tbh.

re-finding bugs that are already fixed

I would hope so, so we can backport.

But I believe we need hardening all the way up with mutexes and timeouts thereof.

So it doesn't seem solved so far in either 2.8 or 2.9, but I haven't tested those, as I'm still focusing on the "last known good one"; as I understand it, neither of the new releases ensures sync/lock of the Vars or Replacer middleware.

New things found out so far in my 2.7.6 fork

1/ Vars middleware now got an InstrumentedRWMutex - with timeout logs etc. using a new lib I wrote today

2/ Vars middleware seems to be used wrongly by the reverse proxy, or in the combo reverse proxy + cache module; some of the reverse proxy goroutines seem to remain and not get cleaned up.

3/ Replacer middleware is also suffering from DATA RACEs - and is not yet sync'ed by me

4/ Reverse Proxy / Streamer are involved in DATA RACEs. It appears that when http.handlers.reverse_proxy gives us "msg":"streaming error" then this is more likely

5/ Caddy Module: http.handlers.cache is based on Souin cache - but when you look at the code and ServeHTTP there, you find a lot of goroutines, channels, parallelism... I have not finished the review, but it contradicts what I understood is expected, namely no concurrency per HTTP request.

It appears this is where more troubles with unsynced/not locked Vars/Replacer middlewares and/or issues in Reverse Proxy / Streamer could combine.

I received multiple DATA RACEs only by manual 1-user browsing, with references to local source folders only... but I expect a lot more, and ones that make more sense to share, once I deploy all that to the QA system.

It's 2am again, so I end here, will provide more insights tomorrow.

@mohammed90
Member

3/ Replacer middleware is also suffering from DATA RACEs - and is not yet sync'ed by me

I believe this was fixed in v2.8.x. as of commit e7336cc.

@christophcemper
Author

I believe this was fixed in v2.8.x. as of commit e7336cc.

Thank you @mohammed90 - that's certainly worth investigating.

I spent the whole day rewriting my rwmutexplus lib for this situation, so I can trace lock contention/timeouts etc. much better now.

Everything I found was just re-confirmed, but in nicer, more readable logs (except when you turn up VerboseLevel to 4, then the Stdout/err explodes, too).

I don't understand why but even on pre-prod hardware the Vars Middleware is really slow, and certainly suffers from concurrency that wasn't handled so far.

It turns out that many shared read locks hang in there for 50+ ms (not micro, milli, like network pings!) for really trivial vars to get, like "client_ip", many times over and over, even when only a few humans operate on that webserver.

Image

Also the matchers.error module is similarly slow.

We have in there

Average hold time: 1.367592ms
Max hold time: 1.70286ms
Average wait time: 1.146163ms

and the client_ip one is similar... all these millis add up, and using 20 x 1+ milli to get the client_ip for 1 request really leaves room to optimize

Purpose: GetVar client_ip
  Total acquired: 20
  Total hold time: 22.287346ms
  Total wait time: 18.610971ms
  Total wait events: 20
  Total timeouts read lock: 0
  Total timeouts write lock: 0
  Average hold time: 1.114367ms
  Max hold time: 1.859056ms
  Average wait time: 930.548µs
  Max wait time: 1.305851ms

All these map accesses of course can cause that fatal if not coordinated... but apparently not very often, luckily 😅

Image

here's how I build currently, in case you're interested

I'll check out that hint for the replacer, thanks!

@christophcemper
Author

christophcemper commented Nov 14, 2024

@mohammed90 I looked into your commit and see how you implemented a mutex for that library (not reviewed/tested for conflicts, memory leaks, etc..)

However, the problems are still elsewhere, where there's unsync'ed code accessing the old way via maps pulled directly from the Context object without any protection/sync.

repl := r.Context().Value(caddy.ReplacerCtxKey).(*caddy.Replacer)

I know this place well, because just 1 line above we have a very expensive/slow access to the Vars map actually (but done unsync'ed here)

That is already changed to rwsyncplus in my fork here.

...and it shows how the timings are off across the code, same for the repl I guess.

@christophcemper
Author

christophcemper commented Nov 14, 2024

Last words: I merged the improvements in Var Middleware up to 2.8.4.
(to private fork branch v2.8.4-rwmutexplus2)
and can confirm we still have that same issue there

{"level":"info","ts":1731550679.250164,"msg":"[VarsRWMutex] WARNING #1: HOLDING ACTIVE WriteLock took 50 - 1x exceeding timeout of 50ms - for 'ContextWithVars map[client_ip:127.0.0.1 trusted_proxy:false]' (goroutine 1161)"}
{"level":"info","ts":1731550679.254308,"msg":"[VarsRWMutex] WARNING #1: HOLDING ReadLock (1 readers)\n ACTIVE ReadLock took 50 - 1x exceeding timeout of 50ms - for 'GetVar client_ip' (goroutine 1161)"}
{"level":"info","ts":1731550679.254915,"msg":"[VarsRWMutex] WARNING #1: HOLDING ReadLock (1 readers)\n ACTIVE ReadLock took 50 - 1x exceeding timeout of 50ms - for 'GetVar client_ip' (goroutine 1161)"}
{"level":"info","ts":1731550679.255154,"msg":"[VarsRWMutex] WARNING #1: HOLDING ReadLock (1 readers)\n ACTIVE ReadLock took 50 - 1x exceeding timeout of 50ms - for 'GetVar client_ip' (goroutine 1161)"}
{"level":"info","ts":1731550679.25701,"msg":"[VarsRWMutex] WARNING #1: HOLDING ReadLock (1 readers)\n ACTIVE ReadLock took 51 - 1x exceeding timeout of 50ms - for 'GetVar client_ip' (goroutine 1161)"}
{"level":"info","ts":1731550679.26002,"msg":"[VarsRWMutex] WARNING #1: HOLDING ReadLock (1 readers)\n ACTIVE ReadLock took 50 - 1x exceeding timeout of 50ms - for 'GetVar matchers.error' (goroutine 1161)"}
{"level":"info","ts":1731550679.260961,"msg":"[VarsRWMutex] WARNING #1: HOLDING ReadLock (1 readers)\n ACTIVE ReadLock took 50 - 1x exceeding timeout of 50ms - for 'GetVar matchers.error' (goroutine 1161)"}
{"level":"info","ts":1731550679.265171,"msg":"[VarsRWMutex] WARNING #1: HOLDING ReadLock (1 readers)\n ACTIVE ReadLock took 50 - 1x exceeding timeout of 50ms - for 'GetVar matchers.error' (goroutine 1161)"}
{"level":"info","ts":1731550679.2679381,"msg":"[VarsRWMutex] WARNING #1: HOLDING ReadLock (1 readers)\n ACTIVE ReadLock took 50 - 1x exceeding timeout of 50ms - for 'GetVar matchers.error' (goroutine 1161)"}

On init the map is set up with a write lock, for client_ip and that proxy bool, and takes over 50ms for it.

All the others reading the client_ip have to wait for the X lock, hence 50+ ms.

In all other versions out there, they access that map with whatever data it has, or not.

If THAT is not a concurrency issue within just this goroutine 1161 that needs mutexes, I don't know what is.

The other question however is - what takes 50,000 micros to initialize that map there? for every request? 🤔

@christophcemper
Author

This merge to v2.8.4-rwmutexplus2 is WIP and unreviewed, but runs at least on local dev

https://github.com/christophcemper/caddy/tree/v2.8.4-rwmutexplus2

@christophcemper
Author

Here's one cache-handler/souin related DATA RACE where the binary is from a stable/released fork running on QA here already @mholt @mohammed90

darkweak/souin#576

@christophcemper
Author

After having the Vars Middleware rwmutex'ed now in my fork, I could not help but wonder why we wouldn't just use a standard sync.Map, made for this, made for the "in-memory cache" problem, specific to the 1 Write/Many Reads problem @mholt @francislavoie @mohammed90 ?

Yeah, it uses more memory, but so does manual lock management + code complexity/risk, and those are still inferior locks, like SQLite full table locks on the full map, while sync.Map seems to support cache/hash-based ("row level") locking and should have much better overall latency performance, I believe, just like normal databases have over full-table-locking SQLite.

Can you maybe point me to some discussions/tests/insights from maybe many years ago to understand that background?

(don't get me wrong - I don't like locks, as any locks are the natural enemy of performance, but hour long outages are not a price to pay for performance)

@mholt
Member

mholt commented Nov 14, 2024

The newer logs look nice, thanks for the updates -- but I'm still looking for the output of the race detector. I saw the one in souin, but it sounds like you're saying there is still a race in Caddy itself, and I can't diagnose it without the output of the race detector.

The reason I hesitate to add more mutexing to the vars handler is because handlers generally shouldn't be accessing that concurrently during a request. I'd like to pin down exactly what's happening using the race detector output.

@mohammed90
Member

Adding to what Matt says and probably mentioned earlier, the output of the race detector, i.e. with -race, will say exactly where the concurrent read/write are happening. It eliminates the guesswork you're currently doing.

@christophcemper
Author

christophcemper commented Nov 14, 2024

Yes, I agree - here's one DATA RACE I just got when starting up on my Mac Studio @mholt

from this build version based on v2.8.4

{"level":"info","ts":1731625446.609313,"msg":"ENV: RWMutexPlus (VarsRWMutex, 134ms) - RWMUTEXTPLUS_VERBOSELEVEL (2)"}
{"level":"info","ts":1731625446.609357,"msg":"ENV: RWMutexPlus (VarsRWMutex, 134ms) - RWMUTEXTPLUS_DEBUGLEVEL (1)"}
{"level":"info","ts":1731625446.609385,"msg":"ENV: RWMutexPlus (VarsRWMutex, 134ms) - RWMUTEXTPLUS_TIMEOUT (50ms)"}
{"level":"info","ts":1731625446.6094022,"msg":"ENV: RWMutexPlus (VarsRWMutex, 50ms) - RWMUTEXTPLUS_CALLERINFOLINES (4)"}
{"level":"info","ts":1731625446.609427,"msg":"ENV: RWMutexPlus (VarsRWMutex, 50ms) - RWMUTEXTPLUS_CALLERINFOSKIP (4)"}
{"level":"info","ts":1731625446.609448,"msg":"ENV: RWMutexPlus (VarsRWMutex, 50ms) - RWMUTEXTPLUS_DEBUGLEVEL (1)"}
{"level":"info","ts":1731625446.609466,"msg":"ENV: RWMutexPlus (VarsRWMutex, 50ms) - RWMUTEXTPLUS_VERBOSELEVEL (2)"}
{"level":"info","ts":1731625446.610605,"msg":"using config from file","file":"./CaddyfileDEV"}
{"level":"info","ts":1731625446.7271528,"msg":"adapted config to JSON","adapter":"caddyfile"}
{"level":"warn","ts":1731625446.727189,"msg":"Caddyfile input is not formatted; run 'caddy fmt --overwrite' to fix inconsistencies","adapter":"caddyfile","file":"./CaddyfileDEV","line":185}
{"level":"info","ts":1731625446.818974,"msg":"redirected default logger","from":"stderr","to":"/Users/christophc/Workspace/AIPRM2/App/log/debug.log"}
==================
WARNING: DATA RACE
Read at 0x00c0009ae1b0 by goroutine 105:
  github.com/caddyserver/caddy/v2/modules/caddytls.(*TLS).CaddyModule()
      <autogenerated>:1 +0x38
  github.com/caddyserver/caddy/v2/modules/caddyevents.(*App).Emit()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/modules/caddyevents/app.go:228 +0x4cc
  github.com/caddyserver/caddy/v2/modules/caddytls.(*TLS).onEvent()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/modules/caddytls/tls.go:691 +0xe4
  github.com/caddyserver/caddy/v2/modules/caddytls.(*TLS).onEvent-fm()
      <autogenerated>:1 +0x28
  github.com/caddyserver/certmagic.(*Config).emit()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/config.go:1258 +0x6c4
  github.com/caddyserver/certmagic.(*Config).renewCert.func2()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/config.go:827 +0x454
  github.com/caddyserver/certmagic.doWithRetry()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/async.go:104 +0x214
  github.com/caddyserver/certmagic.(*Config).renewCert()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/config.go:974 +0x57c
  github.com/caddyserver/certmagic.(*Config).RenewCertAsync()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/config.go:762 +0x59c
  github.com/caddyserver/certmagic.(*Config).manageOne.func2()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/config.go:463 +0x5a0
  github.com/caddyserver/certmagic.(*jobManager).worker()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/async.go:73 +0x164
  github.com/caddyserver/certmagic.(*jobManager).Submit.func2()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/async.go:50 +0x34

Previous write at 0x00c0009ae1b0 by main goroutine:
  github.com/caddyserver/caddy/v2/modules/caddytls.(*TLS).keepStorageClean()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/modules/caddytls/tls.go:603 +0xcc
  github.com/caddyserver/caddy/v2/modules/caddytls.(*TLS).Start()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/modules/caddytls/tls.go:329 +0x234
  github.com/caddyserver/caddy/v2.run.func4()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/caddy.go:513 +0x138
  github.com/caddyserver/caddy/v2.run()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/caddy.go:529 +0x804
  github.com/caddyserver/caddy/v2.unsyncedDecodeAndRun()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/caddy.go:343 +0x234
  github.com/caddyserver/caddy/v2.changeConfig()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/caddy.go:234 +0x800
  github.com/caddyserver/caddy/v2.Load()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/caddy.go:133 +0xac
  github.com/caddyserver/caddy/v2/cmd.cmdRun()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/cmd/commandfuncs.go:231 +0x71c
  github.com/caddyserver/caddy/v2/cmd.init.1.func2.WrapCommandFuncForCobra.func1()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/cmd/cobra.go:137 +0x3c
  github.com/spf13/cobra.(*Command).execute()
      /Users/christophc/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:983 +0xb18
  github.com/spf13/cobra.(*Command).ExecuteC()
      /Users/christophc/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115 +0x4b8
  github.com/spf13/cobra.(*Command).Execute()
      /Users/christophc/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039 +0x1bc
  github.com/caddyserver/caddy/v2/cmd.Main()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/cmd/main.go:75 +0x1a8
  main.main()
      /Users/christophc/Workspace/common/CaddyLRT/build_tmp/buildenv_2024-11-14-0312.1171402475/main.go:16 +0x20

Goroutine 105 (running) created at:
  github.com/caddyserver/certmagic.(*jobManager).Submit()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/async.go:50 +0x33c
  github.com/caddyserver/certmagic.(*Config).manageOne()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/config.go:481 +0x6d0
  github.com/caddyserver/certmagic.(*Config).manageAll()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/config.go:381 +0x21c
  github.com/caddyserver/certmagic.(*Config).ManageAsync()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/config.go:338 +0x2d8
  github.com/caddyserver/caddy/v2/modules/caddytls.(*TLS).Manage()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/modules/caddytls/tls.go:432 +0x268
  github.com/caddyserver/caddy/v2/modules/caddyhttp.(*App).automaticHTTPSPhase2()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/modules/caddyhttp/autohttps.go:743 +0x1ac
  github.com/caddyserver/caddy/v2/modules/caddyhttp.(*App).Start()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/modules/caddyhttp/app.go:535 +0x9b0
  github.com/caddyserver/caddy/v2.run.func4()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/caddy.go:513 +0x138
  github.com/caddyserver/caddy/v2.run()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/caddy.go:529 +0x804
  github.com/caddyserver/caddy/v2.unsyncedDecodeAndRun()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/caddy.go:343 +0x234
  github.com/caddyserver/caddy/v2.changeConfig()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/caddy.go:234 +0x800
  github.com/caddyserver/caddy/v2.Load()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/caddy.go:133 +0xac
  github.com/caddyserver/caddy/v2/cmd.cmdRun()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/cmd/commandfuncs.go:231 +0x71c
  github.com/caddyserver/caddy/v2/cmd.init.1.func2.WrapCommandFuncForCobra.func1()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/cmd/cobra.go:137 +0x3c
  github.com/spf13/cobra.(*Command).execute()
      /Users/christophc/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:983 +0xb18
  github.com/spf13/cobra.(*Command).ExecuteC()
      /Users/christophc/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115 +0x4b8
  github.com/spf13/cobra.(*Command).Execute()
      /Users/christophc/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039 +0x1bc
  github.com/caddyserver/caddy/v2/cmd.Main()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/cmd/main.go:75 +0x1a8
  main.main()
      /Users/christophc/Workspace/common/CaddyLRT/build_tmp/buildenv_2024-11-14-0312.1171402475/main.go:16 +0x20
==================
==================
WARNING: DATA RACE
Read at 0x00c0009ae1b8 by goroutine 105:
  github.com/caddyserver/caddy/v2/modules/caddytls.(*TLS).CaddyModule()
      <autogenerated>:1 +0x38
  github.com/caddyserver/caddy/v2/modules/caddyevents.(*App).Emit()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/modules/caddyevents/app.go:271 +0xb80
  github.com/caddyserver/caddy/v2/modules/caddytls.(*TLS).onEvent()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/modules/caddytls/tls.go:691 +0xe4
  github.com/caddyserver/caddy/v2/modules/caddytls.(*TLS).onEvent-fm()
      <autogenerated>:1 +0x28
  github.com/caddyserver/certmagic.(*Config).emit()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/config.go:1258 +0x6c4
  github.com/caddyserver/certmagic.(*Config).renewCert.func2()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/config.go:827 +0x454
  github.com/caddyserver/certmagic.doWithRetry()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/async.go:104 +0x214
  github.com/caddyserver/certmagic.(*Config).renewCert()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/config.go:974 +0x57c
  github.com/caddyserver/certmagic.(*Config).RenewCertAsync()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/config.go:762 +0x59c
  github.com/caddyserver/certmagic.(*Config).manageOne.func2()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/config.go:463 +0x5a0
  github.com/caddyserver/certmagic.(*jobManager).worker()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/async.go:73 +0x164
  github.com/caddyserver/certmagic.(*jobManager).Submit.func2()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/async.go:50 +0x34

Previous write at 0x00c0009ae1b8 by main goroutine:
  github.com/caddyserver/caddy/v2/modules/caddytls.(*TLS).keepStorageClean()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/modules/caddytls/tls.go:604 +0x124
  github.com/caddyserver/caddy/v2/modules/caddytls.(*TLS).Start()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/modules/caddytls/tls.go:329 +0x234
  github.com/caddyserver/caddy/v2.run.func4()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/caddy.go:513 +0x138
  github.com/caddyserver/caddy/v2.run()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/caddy.go:529 +0x804
  github.com/caddyserver/caddy/v2.unsyncedDecodeAndRun()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/caddy.go:343 +0x234
  github.com/caddyserver/caddy/v2.changeConfig()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/caddy.go:234 +0x800
  github.com/caddyserver/caddy/v2.Load()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/caddy.go:133 +0xac
  github.com/caddyserver/caddy/v2/cmd.cmdRun()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/cmd/commandfuncs.go:231 +0x71c
  github.com/caddyserver/caddy/v2/cmd.init.1.func2.WrapCommandFuncForCobra.func1()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/cmd/cobra.go:137 +0x3c
  github.com/spf13/cobra.(*Command).execute()
      /Users/christophc/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:983 +0xb18
  github.com/spf13/cobra.(*Command).ExecuteC()
      /Users/christophc/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115 +0x4b8
  github.com/spf13/cobra.(*Command).Execute()
      /Users/christophc/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039 +0x1bc
  github.com/caddyserver/caddy/v2/cmd.Main()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/cmd/main.go:75 +0x1a8
  main.main()
      /Users/christophc/Workspace/common/CaddyLRT/build_tmp/buildenv_2024-11-14-0312.1171402475/main.go:16 +0x20

Goroutine 105 (running) created at:
  github.com/caddyserver/certmagic.(*jobManager).Submit()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/async.go:50 +0x33c
  github.com/caddyserver/certmagic.(*Config).manageOne()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/config.go:481 +0x6d0
  github.com/caddyserver/certmagic.(*Config).manageAll()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/config.go:381 +0x21c
  github.com/caddyserver/certmagic.(*Config).ManageAsync()
      /Users/christophc/go/pkg/mod/github.com/caddyserver/certmagic@v0.21.3/config.go:338 +0x2d8
  github.com/caddyserver/caddy/v2/modules/caddytls.(*TLS).Manage()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/modules/caddytls/tls.go:432 +0x268
  github.com/caddyserver/caddy/v2/modules/caddyhttp.(*App).automaticHTTPSPhase2()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/modules/caddyhttp/autohttps.go:743 +0x1ac
  github.com/caddyserver/caddy/v2/modules/caddyhttp.(*App).Start()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/modules/caddyhttp/app.go:535 +0x9b0
  github.com/caddyserver/caddy/v2.run.func4()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/caddy.go:513 +0x138
  github.com/caddyserver/caddy/v2.run()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/caddy.go:529 +0x804
  github.com/caddyserver/caddy/v2.unsyncedDecodeAndRun()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/caddy.go:343 +0x234
  github.com/caddyserver/caddy/v2.changeConfig()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/caddy.go:234 +0x800
  github.com/caddyserver/caddy/v2.Load()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/caddy.go:133 +0xac
  github.com/caddyserver/caddy/v2/cmd.cmdRun()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/cmd/commandfuncs.go:231 +0x71c
  github.com/caddyserver/caddy/v2/cmd.init.1.func2.WrapCommandFuncForCobra.func1()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/cmd/cobra.go:137 +0x3c
  github.com/spf13/cobra.(*Command).execute()
      /Users/christophc/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:983 +0xb18
  github.com/spf13/cobra.(*Command).ExecuteC()
      /Users/christophc/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115 +0x4b8
  github.com/spf13/cobra.(*Command).Execute()
      /Users/christophc/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039 +0x1bc
  github.com/caddyserver/caddy/v2/cmd.Main()
      /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/cmd/main.go:75 +0x1a8
  main.main()
      /Users/christophc/Workspace/common/CaddyLRT/build_tmp/buildenv_2024-11-14-0312.1171402475/main.go:16 +0x20
==================

@mholt
Member

mholt commented Nov 15, 2024

That's very interesting, thanks for posting that. I'll look into it tomorrow. In the meantime, it's pretty clear from the referenced lines that this isn't the same as the fatal concurrent map read/write originally posted about, unfortunately.

Based on the original post, I would expect a data race with one of the reads or writes at github.com/caddyserver/caddy/v2@v2.8.4/modules/caddyhttp/vars.go:323 (v2.8.4). Do you have any data races involving the file vars.go?
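For readers following along, the difference matters. "fatal error: concurrent map read and map write" is raised by the Go runtime itself when it catches unsynchronized access to a plain map, and it cannot be recovered. "WARNING: DATA RACE" is a report from the race detector about any unsynchronized memory access. Below is a minimal sketch of the usual guard for the map case, assuming a shared map protected by a `sync.RWMutex`. The `safeVars` type and the `hammer` helper are hypothetical names for illustration only; this is not Caddy's code and not a proposed patch.

```go
package main

import (
	"fmt"
	"sync"
)

// safeVars is a hypothetical stand-in for a shared variable map.
// It is NOT Caddy's actual type; it only illustrates the locking pattern.
type safeVars struct {
	mu sync.RWMutex
	m  map[string]any
}

func newSafeVars() *safeVars {
	return &safeVars{m: make(map[string]any)}
}

// Set takes the write lock, so no reader can observe the map mid-update.
func (v *safeVars) Set(key string, val any) {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.m[key] = val
}

// Get takes the read lock; many readers may hold it at the same time.
func (v *safeVars) Get(key string) (any, bool) {
	v.mu.RLock()
	defer v.mu.RUnlock()
	val, ok := v.m[key]
	return val, ok
}

// hammer starts n writers and n readers concurrently and returns the
// number of keys stored once all goroutines have finished.
func hammer(n int) int {
	vars := newSafeVars()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(2)
		go func(i int) {
			defer wg.Done()
			vars.Set(fmt.Sprintf("key-%d", i), i)
		}(i)
		go func(i int) {
			defer wg.Done()
			vars.Get(fmt.Sprintf("key-%d", i))
		}(i)
	}
	wg.Wait()

	vars.mu.RLock()
	defer vars.mu.RUnlock()
	return len(vars.m)
}

func main() {
	fmt.Println("keys stored:", hammer(100))
}
```

Without the two lock calls, the same program may terminate with the fatal map error or be flagged when run with `go run -race`. With them, neither should occur.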

mholt added the bug 🐞 (Something isn't working) label Nov 15, 2024
@mholt
Member

mholt commented Nov 15, 2024

@christophcemper Since you are running a modified fork of 2.8.4, can you verify something for me so I know I'm looking at the right lines of code? What is this line of code for you: /Users/christophc/Workspace/common/CaddyLRT/caddy-fork/modules/caddytls/tls.go:603
