
[tiller-proxy] Tiller-proxy pod gets killed because memory usage #1052

Closed
andresmgot opened this issue Jun 7, 2019 · 17 comments · Fixed by #1058
Labels
kind/bug An issue that reports a defect in an existing feature

Comments

@andresmgot
Contributor

andresmgot commented Jun 7, 2019

Tiller-proxy reaches its memory limit when installing some apps (for example stable/linkerd). The pod gets killed:

  containerStatuses:
  - containerID: docker://abca9629ca890ec386bf613577bf3d5b88aa42a60d148666e5567f1a1cc790f4
    image: kubeapps/tiller-proxy:latest
    imageID: docker-pullable://kubeapps/tiller-proxy@sha256:0d2e45b0426ac7ba061c8ea6130a0087047479c4c13a078d0315df87c45bd025
    lastState:
      terminated:
        containerID: docker://a7cc49e584468031f68e55c2ad8757d9149872dedeb4ef3649cc780cadc9570c
        exitCode: 137
        finishedAt: "2019-06-07T18:40:03Z"
        reason: OOMKilled
        startedAt: "2019-06-07T18:38:08Z"

We should evaluate whether to increase the maximum memory the pod can use or whether we are misusing memory in the service.

@andresmgot andresmgot added the kind/bug An issue that reports a defect in an existing feature label Jun 7, 2019
@prydonius
Contributor

Do you have the logs of the pod? I'm guessing we're running out of memory as tiller-proxy tries to load the tarball for linkerd in memory. Do the chart-repo syncs have any resource limits, since they also load tarballs in memory?

@andresmgot
Contributor Author

The logs look something like:

time="2019-06-07T21:31:51Z" level=info msg="Creating Helm Release"
2019/06/07 21:31:51 Downloading repo https://kubernetes-charts.storage.googleapis.com/index.yaml index...

And then the pod gets killed, so I guess yes, reading the tarball may cause this. I haven't seen this issue in the chart-repo though. Where does the chart-repo read the tarball?

@prydonius
Contributor

And then the pod gets killed so I guess yes, reading the tarball may cause this.

Hmm, looking through the code, that doesn't quite match up. We seem to download the repo index https://github.com/kubeapps/kubeapps/blob/cf8c93f77324674369a039fe44e31f9cc3e55f4a/pkg/chart/chart.go#L287, but never get around to downloading the chart https://github.com/kubeapps/kubeapps/blob/cf8c93f77324674369a039fe44e31f9cc3e55f4a/pkg/chart/chart.go#L298. The index is definitely not that large, but maybe there is a memory leak somewhere that increases tiller-proxy's memory use over time?

I haven't seen this issue in the chart-repo though. Where does the chart-repo reads the tarball?

The reason this hasn't surfaced in chart-repo is that we are not setting any resource limits: https://github.com/kubeapps/kubeapps/blob/master/cmd/apprepository-controller/controller.go#L425.
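For context, a minimal sketch of what attaching resource limits to the sync Job's container could look like with the Kubernetes API; the helper name and the request/limit values below are illustrative placeholders, not what the controller currently does:

package controller

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// syncContainerWithLimits is a hypothetical helper: it builds a sync-job
// container spec with explicit requests/limits so an OOM in the sync pod
// would surface the same way it does for tiller-proxy. The 128Mi/256Mi
// values are placeholders.
func syncContainerWithLimits(image string, args []string) corev1.Container {
	return corev1.Container{
		Name:  "sync",
		Image: image,
		Args:  args,
		Resources: corev1.ResourceRequirements{
			Requests: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse("25m"),
				corev1.ResourceMemory: resource.MustParse("128Mi"),
			},
			Limits: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse("250m"),
				corev1.ResourceMemory: resource.MustParse("256Mi"),
			},
		},
	}
}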

@prydonius
Contributor

The size of the stable/linkerd tarball is 3.7K, so I would be surprised if this is specific to that chart. Were you able to reproduce this multiple times with linkerd?

@andresmgot
Contributor Author

andresmgot commented Jun 17, 2019

Yes, I was able to reproduce it several times with linkerd and openebs. We would need to investigate the exact point at which it fails, but it's reproducible (at least in my Minikube setup).

[Edit] I was able to reproduce it with other, simpler charts, so it seems the problem is in how we read the index.yaml file.

[Edit 2] I can confirm that the issue happens when unmarshaling the index.yaml.

@andresmgot
Contributor Author

So I tracked the issue down to this line:

https://github.com/kubeapps/kubeapps/blob/master/pkg/chart/chart.go#L150

Apparently the ghodss/yaml library is the one using "too much" memory. Since Helm depends on that library, there is not much we can do (the struct tags are annotated only for JSON, which works with ghodss/yaml but not with the upstream go-yaml/yaml). Found: helm/helm#1287

We should increase the memory limit in that case.
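For reference, a minimal sketch of the fetch-and-parse path being described, assuming a simplified indexFile type that stands in for Helm's repo.IndexFile (the names here are illustrative, not the actual Kubeapps functions). The whole index.yaml is read into one byte slice and then handed to ghodss/yaml, which builds a generic YAML object, re-marshals it to JSON and unmarshals that JSON again, so several copies of a multi-megabyte index coexist in memory at once:

package chartutil

import (
	"fmt"
	"io/ioutil"
	"net/http"

	"github.com/ghodss/yaml"
)

// indexFile is a minimal stand-in for Helm's repo.IndexFile; the real
// struct carries far more metadata per chart version.
type indexFile struct {
	APIVersion string                    `json:"apiVersion"`
	Entries    map[string][]chartVersion `json:"entries"`
}

type chartVersion struct {
	Name    string   `json:"name"`
	Version string   `json:"version"`
	URLs    []string `json:"urls"`
}

// fetchIndex mirrors the pattern at issue: the full index.yaml is held in
// memory as raw bytes, then parsed via ghodss/yaml (YAML -> generic object
// -> JSON -> target struct).
func fetchIndex(repoURL string) (*indexFile, error) {
	resp, err := http.Get(repoURL + "/index.yaml")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	body, err := ioutil.ReadAll(resp.Body) // full index held in memory
	if err != nil {
		return nil, err
	}

	var index indexFile
	if err := yaml.Unmarshal(body, &index); err != nil {
		return nil, fmt.Errorf("could not parse index: %v", err)
	}
	return &index, nil
}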

@andresmgot
Contributor Author

andresmgot commented Jun 18, 2019

The issue only appears for me when installing a chart a second time, so I cached the result of parsing the index.yaml of the different chart repositories (see the caching sketch below). This improved the installation time, but yaml.Unmarshal is still used when loading a chart's Chart.yaml file, which pushes the memory usage above 128Mi:

NAME                                                         CPU(cores)   MEMORY(bytes)
kubeapps-internal-tiller-proxy-7bdc97b7db-c7dtr              0m           149Mi

We should not cache the content of every chart (that would be worse from a memory-usage point of view), so we would need to increase the memory limit anyway.

After increasing the limit to 256Mi I am not able to reproduce the issue anymore.
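A minimal sketch of the sort of cache described above, reusing the illustrative indexFile type from the earlier sketch: parsed indexes are kept per repository URL and only re-parsed when the raw index bytes change. This is an assumption about the approach, not the actual Kubeapps implementation:

package chartutil

import (
	"crypto/sha256"
	"sync"
)

// indexCache keeps the parsed index for each repository so a second
// install against the same repo does not pay the yaml.Unmarshal cost again.
// The checksum guards against serving a stale index after the repo updates.
type indexCache struct {
	mu      sync.Mutex
	entries map[string]cachedIndex
}

type cachedIndex struct {
	checksum [32]byte
	index    *indexFile
}

func newIndexCache() *indexCache {
	return &indexCache{entries: map[string]cachedIndex{}}
}

// get returns the cached parse of index.yaml when the raw bytes are
// unchanged; otherwise it parses and stores the new result.
func (c *indexCache) get(repoURL string, raw []byte, parse func([]byte) (*indexFile, error)) (*indexFile, error) {
	sum := sha256.Sum256(raw)

	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[repoURL]; ok && e.checksum == sum {
		return e.index, nil
	}

	idx, err := parse(raw)
	if err != nil {
		return nil, err
	}
	c.entries[repoURL] = cachedIndex{checksum: sum, index: idx}
	return idx, nil
}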

@andresmgot andresmgot changed the title [tiller-proxy] Kubeapps fails to install linkerd [tiller-proxy] Tiller-proxy pod gets killed because memory usage Jun 18, 2019
@lingsamuel

lingsamuel commented Apr 13, 2021

Encountered the same problem here: a 9.7M index alone requires ~300Mi of memory. I am using 1.11.3-scratch-r0. I looked at the master branch and the relevant code does not seem to have changed.

@lingsamuel

The yaml library seems to do some duplicate work: it unmarshals the YAML bytes into an object, marshals that object to JSON, and then unmarshals the JSON into the target object again.

func Unmarshal(y []byte, o interface{}) error {
	vo := reflect.ValueOf(o)
	j, err := yamlToJSON(y, &vo)
	if err != nil {
		return fmt.Errorf("error converting YAML to JSON: %v", err)
	}

	err = json.Unmarshal(j, o)
	if err != nil {
		return fmt.Errorf("error unmarshaling JSON: %v", err)
	}

	return nil
}

func yamlToJSON(y []byte, jsonTarget *reflect.Value) ([]byte, error) {
	// Convert the YAML to an object.
	var yamlObj interface{}
	err := yaml.Unmarshal(y, &yamlObj)
	if err != nil {
		return nil, err
	}

	jsonObj, err := convertToJSONableObject(yamlObj, jsonTarget)
	if err != nil {
		return nil, err
	}

	// Convert this object to JSON and return the data.
	return json.Marshal(jsonObj)
}

@andresmgot
Contributor Author

Hi @lingsamuel, it seems that you are running a quite old version of Kubeapps that is no longer supported (1.11.3-scratch-r0). My recommendation is to upgrade to a newer version to get the latest fixes.

@lingsamuel

Hi @andresmgot, I would say the related logic has not changed. I upgraded Kubeapps to the latest version (2.3.1) but the problem still exists.

@andresmgot
Contributor Author

Hi @lingsamuel, I am not able to reproduce that with the latest version. tiller-proxy was replaced with kubeops and the memory usage doesn't spike that much:

[screenshot: kubeops memory usage]

The repository YAML is read in the apprepo-sync job though. I have also run a test with the Bitnami repository (https://charts.bitnami.com/bitnami) and still see no problem:

[screenshot: memory usage with the Bitnami repository]

@lingsamuel

The Bitnami index is 7.3M, but it only contains 91 entries and 9057 versions.
My Helm index is 9.7M but contains 338 entries and 17742 versions, and some of them have a lot of dependencies; that may be the reason.

It's not a kubeops problem, it's a problem in the github.com/ghodss/yaml library.

Here is a test repo: lingsamuel/helm-index-unmarshal-test; the index is generated with 350 entries and 17500 versions (8.8M).
Clone the repo and run make run; the memory output shows:

[screenshot: memory stats output]

After the unmarshal, the total alloc is 271 and GC ran 6 times. That means the occasional memory peak could be very high.
Inspecting with docker stats, the memory usage can reach ~200M.
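For anyone who wants to reproduce this kind of measurement without the test repo, a minimal sketch using runtime.MemStats around the unmarshal; the file name index.yaml and the reporting format are assumptions, and the repo's make run target may do something different:

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"runtime"

	"github.com/ghodss/yaml"
)

func main() {
	// Read a large generated index.yaml (path is an assumption).
	raw, err := ioutil.ReadFile("index.yaml")
	if err != nil {
		log.Fatal(err)
	}

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	// Unmarshal into a generic map to exercise the YAML->JSON->object path.
	var index map[string]interface{}
	if err := yaml.Unmarshal(raw, &index); err != nil {
		log.Fatal(err)
	}

	runtime.ReadMemStats(&after)
	fmt.Printf("total alloc: %d MiB, GC runs: %d\n",
		(after.TotalAlloc-before.TotalAlloc)/1024/1024,
		after.NumGC-before.NumGC)
}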

@andresmgot
Contributor Author

Thanks for the investigation @lingsamuel, it's indeed useful. Can you verify whether the alternative (gopkg.in/yaml.v2) solves the issue?

@lingsamuel

thanks for the investigation @lingsamuel, it's indeed useful. Can you verify if the alternative (gopkg.in/yaml.v2) solves the issue?

I tried this package. It uses about half the memory in my case. But unfortunately, for reasons I don't understand, the unmarshalled ChartVersion object loses its Metadata field.

@andresmgot
Contributor Author

Can you send a PR with your progress on changing the library? We can assist from there to check whether we can address that issue.

@lingsamuel

go-yaml/yaml#63

With embedded (composite) structs, the original library's behavior differs from the ghodss version. It means it only supports YAML like this:

- metadata: # Note this
    apiVersion: v1
    appVersion: 1.0.0
    description: test
    name: test
    version: 1.0.0
  created: "2021-04-15T14:45:24.707638057+08:00"
  digest: aa
  urls:
    - charts/test-1.0.0.tgz

instead of this:

- apiVersion: v1
  appVersion: 1.0.0
  description: test
  name: test
  version: 1.0.0
  created: "2021-04-15T14:45:24.707638057+08:00"
  digest: aa
  urls:
    - charts/test-1.0.0.tgz

But the "inline" tag mentioned in the issue above doesn't exists in helm lib (by the way, pointer composite struct needs a work around: go-yaml/yaml#356).
