
[tiller-proxy] Tiller-proxy pod gets killed because memory usage #1052

Closed
andresmgot opened this issue Jun 7, 2019 · 17 comments · Fixed by #1058
Labels
kind/bug An issue that reports a defect in an existing feature

Comments

@andresmgot
Contributor

andresmgot commented Jun 7, 2019

Tiller-proxy reaches its memory limit when installing some apps (for example stable/linkerd). The pod gets killed:

  containerStatuses:
  - containerID: docker://abca9629ca890ec386bf613577bf3d5b88aa42a60d148666e5567f1a1cc790f4
    image: kubeapps/tiller-proxy:latest
    imageID: docker-pullable://kubeapps/tiller-proxy@sha256:0d2e45b0426ac7ba061c8ea6130a0087047479c4c13a078d0315df87c45bd025
    lastState:
      terminated:
        containerID: docker://a7cc49e584468031f68e55c2ad8757d9149872dedeb4ef3649cc780cadc9570c
        exitCode: 137
        finishedAt: "2019-06-07T18:40:03Z"
        reason: OOMKilled
        startedAt: "2019-06-07T18:38:08Z"

We should evaluate whether to increase the maximum memory the pod can use or whether we are misusing memory in the service.

@andresmgot andresmgot added the kind/bug An issue that reports a defect in an existing feature label Jun 7, 2019
@prydonius
Contributor

Do you have the logs of the pod? I'm guessing we're running out of memory as tiller-proxy tries to load the tarball for linkerd in memory. Do the chart-repo syncs have any resource limits, since they also load tarballs in memory?

@andresmgot
Contributor Author

The logs look something like:

time="2019-06-07T21:31:51Z" level=info msg="Creating Helm Release"
2019/06/07 21:31:51 Downloading repo https://kubernetes-charts.storage.googleapis.com/index.yaml index...

And then the pod gets killed, so I guess yes, reading the tarball may cause this. I haven't seen this issue in the chart-repo though. Where does the chart-repo read the tarball?

@prydonius
Contributor

And then the pod gets killed so I guess yes, reading the tarball may cause this.

Hmm, looking through the code, that doesn't quite match up. We seem to download the repo index https://github.com/kubeapps/kubeapps/blob/cf8c93f77324674369a039fe44e31f9cc3e55f4a/pkg/chart/chart.go#L287, but never get around to downloading the chart https://github.com/kubeapps/kubeapps/blob/cf8c93f77324674369a039fe44e31f9cc3e55f4a/pkg/chart/chart.go#L298. The index is definitely not that large, but maybe there is a memory leak somewhere that increases tiller-proxy's memory use over time?

I haven't seen this issue in the chart-repo though. Where does the chart-repo reads the tarball?

The reason this hasn't surfaced in chart-repo is that we are not setting any resource limits: https://github.com/kubeapps/kubeapps/blob/master/cmd/apprepository-controller/controller.go#L425.
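For context, a minimal sketch of what attaching resource limits to the sync Job's container could look like with the Kubernetes API; the helper name and the request/limit values below are illustrative placeholders, not what the controller currently does:

package controller

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// syncContainerWithLimits is a hypothetical helper: it builds a sync-job
// container spec with explicit requests/limits so an OOM in the sync pod
// would surface the same way it does for tiller-proxy. The 128Mi/256Mi
// values are placeholders.
func syncContainerWithLimits(image string, args []string) corev1.Container {
	return corev1.Container{
		Name:  "sync",
		Image: image,
		Args:  args,
		Resources: corev1.ResourceRequirements{
			Requests: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse("25m"),
				corev1.ResourceMemory: resource.MustParse("128Mi"),
			},
			Limits: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse("250m"),
				corev1.ResourceMemory: resource.MustParse("256Mi"),
			},
		},
	}
}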

@prydonius
Contributor

The size of the stable/linkerd tarball is 3.7K, so I would be surprised if this is specific to that chart. Were you able to reproduce this multiple times with linkerd?

@andresmgot
Contributor Author

andresmgot commented Jun 17, 2019

Yes, I was able to reproduce it several times with linkerd and openebs. We would need to investigate the exact point at which it fails, but it's reproducible (at least in my Minikube setup).

[Edit] I was able to reproduce it with other, simpler charts, so it seems the problem is in how we read the index.yaml file.

[Edit 2] I can confirm that the issue happens when unmarshaling the index.yaml.

@andresmgot
Contributor Author

So I tracked the issue down to this line:

https://github.com/kubeapps/kubeapps/blob/master/pkg/chart/chart.go#L150

Apparently the ghodss/yaml library is the one using "too much" memory. Since Helm depends on that library, there is not much we can do (the struct tags are annotated only for JSON, which works with ghodss/yaml but not with the upstream go-yaml/yaml). Found: helm/helm#1287

We should increase the memory limit in that case.
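For reference, a minimal sketch of the fetch-and-parse path being described, assuming a simplified indexFile type that stands in for Helm's repo.IndexFile (the names here are illustrative, not the actual Kubeapps functions). The whole index.yaml is read into one byte slice and then handed to ghodss/yaml, which builds a generic YAML object, re-marshals it to JSON and unmarshals that JSON again, so several copies of a multi-megabyte index coexist in memory at once:

package chartutil

import (
	"fmt"
	"io/ioutil"
	"net/http"

	"github.com/ghodss/yaml"
)

// indexFile is a minimal stand-in for Helm's repo.IndexFile; the real
// struct carries far more metadata per chart version.
type indexFile struct {
	APIVersion string                    `json:"apiVersion"`
	Entries    map[string][]chartVersion `json:"entries"`
}

type chartVersion struct {
	Name    string   `json:"name"`
	Version string   `json:"version"`
	URLs    []string `json:"urls"`
}

// fetchIndex mirrors the pattern at issue: the full index.yaml is held in
// memory as raw bytes, then parsed via ghodss/yaml (YAML -> generic object
// -> JSON -> target struct).
func fetchIndex(repoURL string) (*indexFile, error) {
	resp, err := http.Get(repoURL + "/index.yaml")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	body, err := ioutil.ReadAll(resp.Body) // full index held in memory
	if err != nil {
		return nil, err
	}

	var index indexFile
	if err := yaml.Unmarshal(body, &index); err != nil {
		return nil, fmt.Errorf("could not parse index: %v", err)
	}
	return &index, nil
}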

@andresmgot
Contributor Author

andresmgot commented Jun 18, 2019

The issue only appears for me when installing a chart a second time, so I cached the result of parsing the index.yaml of the different chart repositories (see the caching sketch below). This improved the installation time, but yaml.Unmarshal is still used when loading a chart's Chart.yaml file, which pushes the memory usage above 128Mi:

NAME                                                         CPU(cores)   MEMORY(bytes)
kubeapps-internal-tiller-proxy-7bdc97b7db-c7dtr              0m           149Mi

We should not cache the content of every chart (that would be worse from a memory-usage point of view), so we would need to increase the memory limit anyway.

After increasing the limit to 256Mi I am not able to reproduce the issue anymore.
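A minimal sketch of the sort of cache described above, reusing the illustrative indexFile type from the earlier sketch: parsed indexes are kept per repository URL and only re-parsed when the raw index bytes change. This is an assumption about the approach, not the actual Kubeapps implementation:

package chartutil

import (
	"crypto/sha256"
	"sync"
)

// indexCache keeps the parsed index for each repository so a second
// install against the same repo does not pay the yaml.Unmarshal cost again.
// The checksum guards against serving a stale index after the repo updates.
type indexCache struct {
	mu      sync.Mutex
	entries map[string]cachedIndex
}

type cachedIndex struct {
	checksum [32]byte
	index    *indexFile
}

func newIndexCache() *indexCache {
	return &indexCache{entries: map[string]cachedIndex{}}
}

// get returns the cached parse of index.yaml when the raw bytes are
// unchanged; otherwise it parses and stores the new result.
func (c *indexCache) get(repoURL string, raw []byte, parse func([]byte) (*indexFile, error)) (*indexFile, error) {
	sum := sha256.Sum256(raw)

	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[repoURL]; ok && e.checksum == sum {
		return e.index, nil
	}

	idx, err := parse(raw)
	if err != nil {
		return nil, err
	}
	c.entries[repoURL] = cachedIndex{checksum: sum, index: idx}
	return idx, nil
}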

@andresmgot andresmgot changed the title [tiller-proxy] Kubeapps fails to install linkerd [tiller-proxy] Tiller-proxy pod gets killed because memory usage Jun 18, 2019
@lingsamuel

lingsamuel commented Apr 13, 2021

Encountered the same problem here: a 9.7M index alone requires ~300Mi of memory. I am using 1.11.3-scratch-r0. I looked at the master branch and the relevant code does not seem to have changed.

@lingsamuel

The yaml library seems to do some duplicate work: it unmarshals the YAML bytes into an object, marshals that object to JSON, and then unmarshals the JSON into the target object again.

func Unmarshal(y []byte, o interface{}) error {
	vo := reflect.ValueOf(o)
	j, err := yamlToJSON(y, &vo)
	if err != nil {
		return fmt.Errorf("error converting YAML to JSON: %v", err)
	}

	err = json.Unmarshal(j, o)
	if err != nil {
		return fmt.Errorf("error unmarshaling JSON: %v", err)
	}

	return nil
}

func yamlToJSON(y []byte, jsonTarget *reflect.Value) ([]byte, error) {
	// Convert the YAML to an object.
	var yamlObj interface{}
	err := yaml.Unmarshal(y, &yamlObj)
	if err != nil {
		return nil, err
	}

	jsonObj, err := convertToJSONableObject(yamlObj, jsonTarget)
	if err != nil {
		return nil, err
	}

	// Convert this object to JSON and return the data.
	return json.Marshal(jsonObj)
}

@andresmgot
Contributor Author

Hi @lingsamuel, it seems that you are running a quite old version of Kubeapps that is no longer supported (1.11.3-scratch-r0). My recommendation is to upgrade to a newer version to get the latest fixes.

@lingsamuel

Hi @andresmgot, I would say the related logic has not changed. I upgraded Kubeapps to the latest version (2.3.1) but the problem still exists.

@andresmgot
Contributor Author

Hi @lingsamuel, I am not able to reproduce that with the latest version. tiller-proxy was replaced with kubeops and the memory usage doesn't spike that much:

[screenshot: kubeops memory usage]

The repository YAML is read in the apprepo-sync job though. I have also run a test with the Bitnami repository (https://charts.bitnami.com/bitnami) and still see no problem:

[screenshot: memory usage with the Bitnami repository]

@lingsamuel

The Bitnami index is 7.3M, but it only contains 91 entries and 9057 versions.
My Helm index is 9.7M but contains 338 entries and 17742 versions, and some of them have a lot of dependencies; that may be the reason.

It's not a kubeops problem, it's a problem in the github.com/ghodss/yaml library.

Here is a test repo: lingsamuel/helm-index-unmarshal-test; the index is generated with 350 entries and 17500 versions (8.8M).
Clone the repo and run make run; the memory output shows:

[screenshot: memory stats output]

After the unmarshal, the total alloc is 271 and GC ran 6 times. That means the occasional memory peak could be very high.
Inspecting with docker stats, the memory usage can reach ~200M.
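For anyone who wants to reproduce this kind of measurement without the test repo, a minimal sketch using runtime.MemStats around the unmarshal; the file name index.yaml and the reporting format are assumptions, and the repo's make run target may do something different:

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"runtime"

	"github.com/ghodss/yaml"
)

func main() {
	// Read a large generated index.yaml (path is an assumption).
	raw, err := ioutil.ReadFile("index.yaml")
	if err != nil {
		log.Fatal(err)
	}

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	// Unmarshal into a generic map to exercise the YAML->JSON->object path.
	var index map[string]interface{}
	if err := yaml.Unmarshal(raw, &index); err != nil {
		log.Fatal(err)
	}

	runtime.ReadMemStats(&after)
	fmt.Printf("total alloc: %d MiB, GC runs: %d\n",
		(after.TotalAlloc-before.TotalAlloc)/1024/1024,
		after.NumGC-before.NumGC)
}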

@andresmgot
Contributor Author

Thanks for the investigation @lingsamuel, it's indeed useful. Can you verify whether the alternative (gopkg.in/yaml.v2) solves the issue?

@lingsamuel

thanks for the investigation @lingsamuel, it's indeed useful. Can you verify if the alternative (gopkg.in/yaml.v2) solves the issue?

I tried this package. It uses about half the memory in my case. But unfortunately, for reasons I don't understand, the unmarshalled ChartVersion object loses its Metadata field.

@andresmgot
Contributor Author

Can you send a PR with your progress on changing the library? We can assist from there to check whether we can address that issue.

@lingsamuel

go-yaml/yaml#63

With embedded (composite) structs, the original library's behavior differs from the ghodss version. It means it only supports YAML like this:

- metadata: # Note this
    apiVersion: v1
    appVersion: 1.0.0
    description: test
    name: test
    version: 1.0.0
  created: "2021-04-15T14:45:24.707638057+08:00"
  digest: aa
  urls:
    - charts/test-1.0.0.tgz

instead of this:

- apiVersion: v1
  appVersion: 1.0.0
  description: test
  name: test
  version: 1.0.0
  created: "2021-04-15T14:45:24.707638057+08:00"
  digest: aa
  urls:
    - charts/test-1.0.0.tgz

But the "inline" tag mentioned in the issue above doesn't exists in helm lib (by the way, pointer composite struct needs a work around: go-yaml/yaml#356).
