POST requests to /apis/kubeappsapis.core.packages.v1alpha1.PackagesService/CreateInstalledPackage and /apis/kubeappsapis.core.packages.v1alpha1.PackagesService/DeleteInstalledPackage fail with a 502 #5671

Closed
RGPosadas opened this issue Nov 18, 2022 · 8 comments
Labels: component/pinniped-proxy (Issue related to kubeapps integration with pinniped-proxy), kind/bug (An issue that reports a defect in an existing feature)

Comments


RGPosadas commented Nov 18, 2022

1. Describe the bug

POST requests to create/delete a helm release sometimes fail with a 502 due to a timeout being reached (30s). This timeout is always reached when deploying our large charts (~30+ k8s resources). Our small charts (~3 resources) do not encounter this timeout. However, the resources eventually get created/deleted, and the release is seen on/removed from the Applications tab after a browser refresh. Tested on both Chrome and Firefox.

This issue only appeared when transitioning from kubeapps helm chart version 10.2.2 to 12.1.0.

2. To Reproduce

2.a Helm Chart Setup

  • kubeapps helm chart version 12.1.0, with dependencies:
    • common chart version 2.1.2
    • postgresql chart version 12.1.0
    • redis is fully disabled as we do not use flux packages
  • Since we run on GKE and have GCP Artifact Registry as a package repo, we have the flag frontend.proxypassAccessTokenAsBearer: true enabled
  • kubeappsapis.pluginConfig.core.packages.v1alpha1.timeoutSeconds: 300 (the default, left untouched)
  • dex as our OIDC provider. Relevant config:
  authProxy:
    clientID: "kubeapps"
    extraFlags:
      - --skip-provider-button
      - --oidc-issuer-url=https://<dex-domain>
      - --provider-ca-file=/certs/ca.crt
      - --cookie-secure=true
      - --cookie-expire=24h
      - --scope=openid profile email groups
      - --proxy-prefix=/oauth2
      - --redirect-url=/oauth2/callback
  • We offer kubeapps to internal customers who do not have direct access to our k8s cluster, therefore we have pinnipedProxy.enabled: true. Relevant config:
    clusters:
    - name: "default"
      isKubeappsCluster: true
      apiServiceURL: "<provided by pinniped-concierge>"
      certificateAuthorityData: "<provided by pinniped-concierge>"
      pinnipedConfig:
        enabled: true
  • pinniped-concierge is running on version 0.20.0
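
For reference, the dotted settings quoted in the list above would nest roughly as follows in a values.yaml for the kubeapps chart. This is only a sketch assembled from the values mentioned in this report; it is not the reporter's full configuration, and the authProxy and clusters blocks shown above are left out.

```yaml
# Sketch only: keys and values are taken from the bullet list above.
frontend:
  # Pass the access token as the bearer token (needed here for GCP Artifact Registry)
  proxypassAccessTokenAsBearer: true
kubeappsapis:
  pluginConfig:
    core:
      packages:
        v1alpha1:
          # Chart default, left untouched
          timeoutSeconds: 300
pinnipedProxy:
  enabled: true
```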

2.b Kubeapps Dashboard - Actions Taken

  1. Login successfully
  2. Deploy a chart that will create 30+ k8s resources

2.c Relevant Logs
Logs below are for a POST /CreateInstalledPackage

$ k logs -l app.kubernetes.io/component=frontend -c auth-proxy  -f
[2022/11/18 19:57:53] [error_page.go:93] Error proxying to upstream server: net/http: timeout awaiting response headers
172.20.0.13:48546 - ... - ritchelle.posadas@xxx.com [2022/11/18 19:57:23] apps.cxpt.only.sap POST / "/apis/kubeappsapis.core.packages.v1alpha1.PackagesService/CreateInstalledPackage" HTTP/1.1 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:107.0) Gecko/20100101 Firefox/107.0" 502 2317 30.003
$ k logs -l app.kubernetes.io/component=frontend -c nginx -f
127.0.0.1 - - [18/Nov/2022:19:57:53 +0000] "POST /apis/kubeappsapis.core.packages.v1alpha1.PackagesService/CreateInstalledPackage HTTP/1.1" 499  0 "https://apps.cxpt.only.sap/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:107.0) Gecko/20100101 Firefox/107.0" "172.19.0.9, 172.20.0.13"
$ k logs -l app.kubernetes.io/component=frontend -c pinniped-proxy -f

## Empty, even with RUST_LOG = debug
## Issue has been acknowledged by @absoludity 
$ k logs -l app.kubernetes.io/component=kubeappsapis -f
I1118 19:14:46.302621       1 packages.go:268] "+core CreateInstalledPackage" cluster="default" namespace="myns"
I1118 19:14:46.302767       1 server.go:674] "+helm CreateInstalledPackage" cluster="default" namespace="myns"
I1118 19:14:46.354042       1 server.go:873] "fetching chart with user-agent" chartID="chartmuseum/my-large-chart" userAgentString="kubeapps-apis/plugins/helm.packages/v1alpha1/devel"
I1118 19:14:46.357930       1 server.go:886] "using chart tarball" url="http://apps-chartmuseum.apps.svc.cluster.local:8080/charts/my-large-chart-1.1.1.tgz"
...
...
## When timeout is reached
## Each resource has an equivalent ERROR message
E1118 20:12:35.825730       1 server.go:384] Event received: {Event:{Type:ERROR Object:&Status{ListMeta:ListMeta{SelfLink:,ResourceVersion:,Continue:,RemainingItemCount:nil,},Status:Failure,Message:an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding,Reason:InternalError,Details:&StatusDetails{Name:,Group:,Kind:,Causes:[]StatusCause{StatusCause{Type:UnexpectedServerResponse,Message:unable to decode an event from the watch stream: context canceled,Field:,},StatusCause{Type:ClientWatchDecoding,Message:unable to decode an event from the watch stream: context canceled,Field:,},},RetryAfterSeconds:0,UID:,},Code:500,}} ResourceRef:api_version:"apps/v1" kind:"Deployment" name:"my-large-release-gcsproxy" namespace:"myns"}
$ k logs -lapp=pinniped-concierge -f -n pinniped-concierge
## When timeout is reached
{"level":"error","timestamp":"2022-11-18T20:12:35.826467Z","caller":"k8s.io/apiserver@v0.25.2/pkg/server/filters/wrap.go:53$filters.WithPanicRecovery.func1","message":"timeout or abort while handling: method=GET URI=\"/apis/apps/v1/namespaces/myns/deployments?fieldSelector=metadata.name%3Dmy-large-release-gcsproxy&watch=true\" audit-ID=\"...\"\n"}

3. Expected behavior
Kubeapps works as expected and does not encounter any timeouts, as per our previous working version (helm chart version 10.2.2).

4. Screenshots
N/A

5. Desktop (please complete the following information):

  • Chrome Version 107.0.5304.110 (Official Build) (arm64)
  • Firefox 107.0
  • GKE v1.23.8-gke.1900

RGPosadas added the kind/bug label Nov 18, 2022
kubeapps-bot moved this to 🗂 Backlog in Kubeapps Nov 18, 2022

@absoludity
Contributor

Thanks again @RGPosadas for the detailed report.

I'd not realised when we chatted on slack that this was specific to a large chart. Interesting. I've actually just landed a few PRs that aim to improve the performance of pinniped-proxy. I'll test locally with a large chart and see if I can reproduce, then verify if the new code fixes the issue.

A few notes:

> We offer kubeapps to internal customers who do not have direct access to our k8s cluster, therefore we have pinnipedProxy.enabled: true.

Just making sure: I assume you're using pinniped because you can't configure the cluster directly with your OIDC setup, because if you can, you don't need pinniped (unless I'm missing something).

> ## Empty, even with RUST_LOG = debug
> ## Issue has been acknowledged by @absoludity

Yep, fixed in main, but not yet released.

> This issue only appeared when transitioning from kubeapps helm chart version 10.2.2 to 12.1.0.

Right, quite a jump (8 chart versions) - not yet sure what the change may have been. I'll let you know if I reproduce it.

Thanks!

@ppbaena
Collaborator

ppbaena commented Nov 21, 2022

Hi @RGPosadas, this is Pepe Baena, engineering manager at Kubeapps. As a Kubeapps adopter, I’d like to feature your organization in the ADOPTERS.md file. Would you like to be included? We only need a PR with your logo and 1-2 sentences describing how your organization uses Kubeapps (we can also help you with this PR).

I'd really appreciate it, as it helps promote Kubeapps use cases like yours throughout our amazing open-source community.
Thank you so much for your contribution.

ppbaena added the component/pinniped-proxy label Nov 21, 2022
ppbaena moved this from 🗂 Backlog to 🗒 Todo in Kubeapps Nov 21, 2022
ppbaena added this to the Technical debt milestone Nov 21, 2022

@RGPosadas
Author

> I'd not realised when we chatted on slack that this was specific to a large chart. Interesting.

@absoludity Yes, I realized this on my second retry of the new chart version, after posting the Slack message. Hopefully that helps narrow things down!

> I've actually just landed a few PRs that aim to improve the performance of pinniped-proxy. I'll test locally with a large chart and see if I can reproduce, then verify if the new code fixes the issue.

Awesome, looking forward to this 😄

> Just making sure: I assume you're using pinniped because you can't configure the cluster directly with your OIDC setup, because if you can, you don't need pinniped (unless I'm missing something).

Pinniped isn't meant for us, but rather our customers. They should not have direct access to the k8s cluster but should still be able to CRUD releases, which is why we need pinniped.

> As a Kubeapps adopter, I’d like to feature your organization in the ADOPTERS.md file.

@ppbaena We already are! I am part of SAP's Team Teapots 😄

@ppbaena
Collaborator

ppbaena commented Nov 21, 2022

Thanks for the heads up @RGPosadas!

@absoludity
Contributor

> Pinniped isn't meant for us, but rather our customers. They should not have direct access to the k8s cluster but should still be able to CRUD releases, which is why we need pinniped.

OK, this is the bit I'm confused about. Unless I'm missing something, you don't need pinniped to ensure customers can use Kubeapps without having direct access to the k8s cluster - iff you control your cluster yourself (as opposed to a cluster managed by some other service). You could (if you wanted to) instead configure your cluster with the OIDC params so that your cluster trusts your dex IdP. Your users log in as normal (assuming they can access both dex and the kubeapps frontend), and kubeapps is able to pass the user's id_token through with any requests to the cluster's API, which the cluster trusts because of your OIDC config. Pinniped support was added because we couldn't do this on clusters where people are unable to configure the cluster's own OIDC options.

If you need an example, this is the config we use in our local dev environment to configure those oidc options.
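
As a rough illustration of what such cluster-side OIDC options can look like, here is a sketch written as kube-apiserver extra args in a kind cluster config. Using kind is an assumption about the tooling, and this is not necessarily the dev config referred to above. The issuer URL and client ID reuse the placeholders from the report; the claim names and the CA path are illustrative assumptions.

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
      extraArgs:
        # Must match the issuer that signs the id_token (dex in this thread)
        oidc-issuer-url: https://<dex-domain>
        # Must match the client ID used by the auth proxy
        oidc-client-id: kubeapps
        # Assumed claim names; adjust to what the IdP actually issues
        oidc-username-claim: email
        oidc-groups-claim: groups
        # Hypothetical path; the CA file has to be available to the API server
        oidc-ca-file: /path/to/dex-ca.crt
```

With options like these in place, RBAC bindings can refer to the user and group names carried in the token, so the id_token passed through by Kubeapps is accepted without pinniped in between.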

Hope that makes sense.

@RGPosadas
Author

@absoludity Ahh gotcha. Will take a look!

@absoludity
Contributor

Great. The Kubeapps docs should present both options, with the straight OIDC (without Pinniped) as the first. But reading that page, it does make it sound like you should be using Pinniped either way (which is not true). I'll get that updated.

absoludity added a commit that referenced this issue Nov 30, 2022
Signed-off-by: Michael Nelson <minelson@vmware.com>

### Description of the change

While chatting about [OIDC configuration](#5671 (comment)) with @RGPosadas I was looking through the docs and found that it reads currently as if you should use Pinniped either way.

This change just clears that up a bit, I hope, so that people use the cluster's own OIDC support when they can, and pinniped when they can't.

### Applicable issues

- ref #5671

ppbaena moved this from 🗒 Todo to 🔎 In Review in Kubeapps Dec 19, 2022

@absoludity
Contributor

Hi there @RGPosadas. Wondering if you had a chance to verify, as above, whether you really need to be using Pinniped, given that you control your cluster yourself? If not, the latest release has the timing improvements for our pinniped-proxy service (but still better not to use it if you don't need to - as it's extra complexity to work around the situation where people don't control their cluster themselves).

I'll close this for now, but feel free to re-open or comment with any more info - I haven't yet tested with a chart with 30+ resources.

Thanks!

Repository owner moved this from 🔎 In Review to ✅ Done in Kubeapps Dec 19, 2022