
Issues with large plans hanging on 0.2.0 #658

Closed
adamlc opened this issue Dec 6, 2023 · 5 comments · Fixed by #660


@adamlc

adamlc commented Dec 6, 2023

After upgrading to 0.2.0 I'm having issues with some plans running; from what I've discovered so far, it seems to affect the larger plans.

A few minutes into the plan it hangs, and then eventually otf crashes with some nil pointer errors. They all look to be the same error:

2023/12/06 16:20:08 runtime error: invalid memory address or nil pointer dereference
2023/12/06 16:20:08 goroutine 9870 [running]:
runtime/debug.Stack()
        /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/debug/stack.go:24 +0x5e
github.com/gorilla/handlers.recoveryHandler.log({{0x1966c60, 0xc014f69e90}, {0x0, 0x0}, 0x1}, {0xc02d72d170?, 0x0?, 0x2?})
        /home/runner/go/pkg/mod/github.com/gorilla/handlers@v1.5.1/recovery.go:89 +0xb8
github.com/gorilla/handlers.recoveryHandler.ServeHTTP.func1()
        /home/runner/go/pkg/mod/github.com/gorilla/handlers@v1.5.1/recovery.go:74 +0xbe
panic({0x1455ea0?, 0x24579f0?})
        /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/panic.go:914 +0x21f
github.com/leg100/otf/internal/organization.(*OrganizationToken).CanAccessOrganization(0x1976660?, 0x13acb40?, {0xc01482dd61?, 0x2?})
        /home/runner/work/otf/otf/internal/organization/token.go:66 +0x13
github.com/leg100/otf/internal/organization.(*Authorizer).CanAccess(0xc000bbc378, {0x1976660, 0xc014b2bd10}, 0xc04d53a790?, {0xc01482dd61, 0x7})
        /home/runner/work/otf/otf/internal/organization/authorizer.go:24 +0xb5
github.com/leg100/otf/internal/variable.(*service).getVariableSet(0xc00043c4d0, {0x1976660, 0xc014b2bd10}, {0xc04d53a790, 0x17})
        /home/runner/work/otf/otf/internal/variable/service.go:366 +0x16a
github.com/leg100/otf/internal/variable.(*tfe).getVariableSet(0xc000bbc3a8, {0x1972090, 0xc04d878680}, 0xc014505100)
        /home/runner/work/otf/otf/internal/variable/tfe.go:245 +0x93
net/http.HandlerFunc.ServeHTTP(0x15a7d80?, {0x1972090?, 0xc04d878680?}, 0xf?)
        /opt/hostedtoolcache/go/1.21.4/x64/src/net/http/server.go:2136 +0x29
github.com/leg100/otf/internal/tfeapi.(*Handlers).AddHandlers.func2.1({0x1972090, 0xc04d878680}, 0xc014505100)
        /home/runner/work/otf/otf/internal/tfeapi/api.go:54 +0x1d7
net/http.HandlerFunc.ServeHTTP(0xc014505000?, {0x1972090?, 0xc04d878680?}, 0x0?)
        /opt/hostedtoolcache/go/1.21.4/x64/src/net/http/server.go:2136 +0x29
github.com/leg100/otf/internal/tokens.NewService.newMiddleware.func1.1({0x1972090, 0xc04d878680}, 0xc014505000)
        /home/runner/work/otf/otf/internal/tokens/middleware.go:112 +0x277
net/http.HandlerFunc.ServeHTTP(0x1976660?, {0x1972090?, 0xc04d878680?}, 0x1964280?)
        /opt/hostedtoolcache/go/1.21.4/x64/src/net/http/server.go:2136 +0x29
github.com/leg100/otf/internal/http.NewServer.func3.1({0x1972090, 0xc04d878680}, 0xc014504f00)
        /home/runner/work/otf/otf/internal/http/server.go:105 +0x14b
net/http.HandlerFunc.ServeHTTP(0xc000034b38?, {0x1972090?, 0xc04d878680?}, 0x100?)
        /opt/hostedtoolcache/go/1.21.4/x64/src/net/http/server.go:2136 +0x29
github.com/gorilla/handlers.recoveryHandler.ServeHTTP({{0x1966c60, 0xc014f69e90}, {0x0, 0x0}, 0x1}, {0x1972090?, 0xc04d878680?}, 0x195cd10?)
        /home/runner/go/pkg/mod/github.com/gorilla/handlers@v1.5.1/recovery.go:78 +0xd9
github.com/gorilla/mux.(*Router).ServeHTTP(0xc0147c4600, {0x1972090, 0xc04d878680}, 0xc014504d00)
        /home/runner/go/pkg/mod/github.com/gorilla/mux@v1.8.0/mux.go:210 +0x1c5
net/http.serverHandler.ServeHTTP({0x4?}, {0x1972090?, 0xc04d878680?}, 0xc01c4c4d00?)
        /opt/hostedtoolcache/go/1.21.4/x64/src/net/http/server.go:2938 +0x8e
net/http.initALPNRequest.ServeHTTP({{0x1976660?, 0xc01c6b5e00?}, 0xc03a115500?, {0xc000378780?}}, {0x1972090, 0xc04d878680}, 0xc014504d00)
        /opt/hostedtoolcache/go/1.21.4/x64/src/net/http/server.go:3546 +0x231
net/http.(*http2serverConn).runHandler(0x24b86c0?, 0x0?, 0x0?, 0x0?)
        /opt/hostedtoolcache/go/1.21.4/x64/src/net/http/h2_bundle.go:6193 +0xbb
created by net/http.(*http2serverConn).scheduleHandler in goroutine 7119
        /opt/hostedtoolcache/go/1.21.4/x64/src/net/http/h2_bundle.go:6128 +0x21d

Also, possibly unrelated: when the otf process restarts, a new agent seems to be created, and the old ones appear to stay there broken.

[screenshot]

Another issue: when I try to create an agent pool, the page just hangs. It looks like it creates the pool, but when I try to navigate to the config page it just hangs. I don't get any console errors either.

@leg100
Owner

leg100 commented Dec 6, 2023

Are you doing something funky with organization tokens?

And it looks like you're provisioning variable sets using terraform. Please post the config you're using.

As for "old agents" showing up as errored, this is intentional. Your otfd is crashing and taking the internal agent with it before it is able to exit cleanly. The agent is at some point deemed "unknown" before being placed into the "errored" state, and then finally removed altogether.

Not sure about the agent pool creation hanging. I'll see if I can reproduce it.

@adamlc
Author

adamlc commented Dec 7, 2023

Thanks for the update. I didn't want to paste the config here, as it's literally hundreds of resources. If it looks like it's related to variable sets, I can try to create an isolated example.

The infra we have is a kind of layered approach: some layers create variable sets for terraform in lower layers, etc., if that makes sense. This has all worked up until now. I know there have been changes to tokens and related code recently in OTF; it's been a few months since I touched this, so it may have been another release.

leg100 added a commit that referenced this issue Dec 7, 2023
Organization tokens have been broken since they were refactored in
104013d. The JWT contains the _organization token_ _ID_ but the
middleware was checking the _organization_ _name_.

Might fix #658.
@leg100
Owner

leg100 commented Dec 7, 2023

@adamlc As you can see, org tokens were broken. Try out the new release and if it still doesn't work please re-open.

@adamlc
Author

adamlc commented Dec 7, 2023

@leg100 thank you so much for the quick turnaround, I can confirm that fix worked perfectly!

Would you like me to open a separate issue for the agent pools page hanging?

@leg100
Owner

leg100 commented Dec 7, 2023

Would you like me to open a separate issue for the agent pools page hanging?

Yes, please do. I'm concerned there is a database deadlock or something of that nature.
