Support large files #57

maxbrunet · 2021-07-06T23:23:34Z

Right now, files are limited to 4MB:

kubeconform/pkg/resource/stream.go

Line 51 in c4b044f

const maxResourceSize = 4 * 1024 * 1024 // 4MB ought to be enough for everybody

kubeconform/pkg/resource/files.go

Line 130 in c4b044f

maxResourceSize := 4 * 1024 * 1024 // 4MB ought to be enough for everybody

For example, I have a 17MB output for our Grafana deployment, mainly dozens of dashboards in ConfigMaps: 😛

Summary: 0 resource found parsing stdin - Valid: 0, Invalid: 0, Errors: 0, Skipped: 0

kubeval had the same issue and solved it with a default buffer (which expands dynamically) and io.Copy(), as in instrumenta/kubeval#220. I am not sure how it affects the overall performance.
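For readers unfamiliar with that pattern, here is a minimal sketch of reading a whole stream into a growable buffer with io.Copy. It is not the code from the kubeval PR; `readAll` and `repeatReader` are hypothetical names, and `repeatReader` only stands in for a large stdin so the example runs without input.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// repeatReader is a hypothetical stand-in for a large stdin:
// it yields `remaining` bytes of 'x' and then io.EOF.
type repeatReader struct{ remaining int }

func (r *repeatReader) Read(p []byte) (int, error) {
	if r.remaining == 0 {
		return 0, io.EOF
	}
	n := len(p)
	if n > r.remaining {
		n = r.remaining
	}
	for i := 0; i < n; i++ {
		p[i] = 'x'
	}
	r.remaining -= n
	return n, nil
}

// readAll copies everything from r into a bytes.Buffer. The buffer grows as
// needed, so the only size limit is available memory. Nothing can be
// processed until the whole input has been read.
func readAll(r io.Reader) ([]byte, error) {
	var buf bytes.Buffer
	if _, err := io.Copy(&buf, r); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// 17 MiB, roughly the size of the Grafana output mentioned above.
	data, err := readAll(&repeatReader{remaining: 17 << 20})
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	fmt.Printf("read %d bytes\n", len(data))
}
```

The trade-off is memory: the full input is held at once, whatever its size.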

If you want to reproduce, here's a couple quick and dirty scripts to generate large files:

json.go

package main

import "fmt"

const (
	N = 20000
)

func main() {
	fmt.Println(
		`{
    "apiVersion": "v1",
    "kind": "List",
    "items": [`,
	)

	for i := 0; i < N; i++ {
		fmt.Printf(
			`        {
            "apiVersion": "v1",
            "kind": "ConfigMap",
            "metadata": {
                "name": "yet-another-confimap-%d"
            },
            "data": {
                "key": "value"
            }
        }`,
			i,
		)
		if i == N-1 {
			fmt.Println()
		} else {
			fmt.Println(",")
		}
	}

	fmt.Println(`    ]
}
`)
}

yaml.go

package main

import "fmt"

func main() {
	fmt.Println(
		`apiVersion: v1
kind: List
items:`,
	)

	for i := 0; i < 50000; i++ {
		fmt.Printf(
			`- apiVersion: v1
  kind: ConfigMap
  metadata:
    name: yet-another-confimap-%d
  data:
    key: value
`,
			i,
		)
	}
}


yannh · 2021-07-11T12:11:54Z

I have a patch for reading from files that uses the file size as an indicator to grow the buffer. I am still wondering what the best approach might be when reading from a stream like stdin 🤔

Note: I believe the io.Copy approach will read all of stdin into memory before starting to process anything. Kubeconform takes the approach of reading and processing immediately as data comes in. This is a bit faster and uses a lot less memory when reading a lot of data from stdin.
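To illustrate the read-and-process-as-you-go idea, here is a rough sketch using bufio.Scanner with a custom split function that yields one YAML document at a time. This is an illustration only, not kubeconform's actual implementation; `splitYAMLDocs`, `processStream` and `countDocs` are hypothetical names.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// splitYAMLDocs is a bufio.SplitFunc that returns one token per YAML
// document, treating a line containing only "---" as the separator.
func splitYAMLDocs(data []byte, atEOF bool) (advance int, token []byte, err error) {
	sep := []byte("\n---\n")
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.Index(data, sep); i >= 0 {
		// A full document is available: hand it over and skip the separator.
		return i + len(sep), data[:i], nil
	}
	if atEOF {
		// Last document, with no trailing separator.
		return len(data), data, nil
	}
	// No separator yet: ask the Scanner to read more data.
	return 0, nil, nil
}

// processStream calls handle for each document as soon as it is complete,
// so only the document currently being scanned has to fit in the buffer.
// Note that the Scanner's default 64 KiB token limit still applies here.
func processStream(r io.Reader, handle func(doc []byte)) error {
	s := bufio.NewScanner(r)
	s.Split(splitYAMLDocs)
	for s.Scan() {
		handle(s.Bytes())
	}
	return s.Err()
}

// countDocs is a small helper for the demo: it counts documents in a string.
func countDocs(input string) (int, error) {
	n := 0
	err := processStream(strings.NewReader(input), func(doc []byte) { n++ })
	return n, err
}

func main() {
	n, err := countDocs("kind: ConfigMap\n---\nkind: Secret\n")
	fmt.Println(n, err)
}
```

With this shape, memory use depends on the largest single document rather than on the total size of the stream.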

yannh · 2021-07-11T20:42:35Z

@maxbrunet let me know what you think of my branch :) for stdin however... apart from trying to load all of stdin in memory before doing anything, which I'm not so keen on, I'm not sure what the best solution might be..

maxbrunet · 2021-07-13T00:49:13Z

> Kubeconform takes the approach of reading and processing immediately as data comes in.

I do not see how you would start unmarshalling a partial JSON document, or what value there is in unmarshalling partial YAML (nothing tells whether it will be valid in the end). IMHO it has to be loaded entirely into memory, and if the file or stream is large, it is the responsibility of the user to have enough memory available. What is the approach of other tools like kubectl?
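As a side note on the JSON case, Go's encoding/json does offer a token-level Decoder that can decode the elements of an array one at a time. The sketch below walks a List and handles each item as it completes. It is only an illustration under stated assumptions: `item`, `streamItems` and `itemNames` are hypothetical names, and the concern above still holds, since a syntax error late in the stream is only discovered after earlier items have been handled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// item holds the few fields this sketch cares about.
type item struct {
	Kind     string `json:"kind"`
	Metadata struct {
		Name string `json:"name"`
	} `json:"metadata"`
}

// streamItems reads the top-level object token by token and decodes the
// elements of "items" one at a time, calling handle as each one completes.
func streamItems(r io.Reader, handle func(item)) error {
	dec := json.NewDecoder(r)
	if _, err := dec.Token(); err != nil { // opening '{'
		return err
	}
	for dec.More() {
		tok, err := dec.Token() // next object key
		if err != nil {
			return err
		}
		if key, _ := tok.(string); key != "items" {
			// Not the list we want: consume and discard the value.
			var skip json.RawMessage
			if err := dec.Decode(&skip); err != nil {
				return err
			}
			continue
		}
		if _, err := dec.Token(); err != nil { // opening '['
			return err
		}
		for dec.More() {
			var it item
			if err := dec.Decode(&it); err != nil {
				return err
			}
			handle(it)
		}
		if _, err := dec.Token(); err != nil { // closing ']'
			return err
		}
	}
	_, err := dec.Token() // closing '}'
	return err
}

// itemNames is a small helper for the demo: it collects the item names.
func itemNames(input string) ([]string, error) {
	var names []string
	err := streamItems(strings.NewReader(input), func(it item) {
		names = append(names, it.Metadata.Name)
	})
	return names, err
}

func main() {
	names, err := itemNames(`{"apiVersion":"v1","kind":"List","items":[
		{"kind":"ConfigMap","metadata":{"name":"a"}},
		{"kind":"ConfigMap","metadata":{"name":"b"}}]}`)
	fmt.Println(names, err)
}
```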

yannh · 2021-08-29T12:21:02Z

@maxbrunet do you want an (undocumented?) environment variable to override the default value while we figure this out, if this is blocking you? :)

maxbrunet · 2021-08-29T21:04:05Z

No hurry, I'm doing fine for now with kubeval, your schema repository, and the openapi2jsonschema.py script. Thank you @yannh

yannh · 2021-09-26T15:48:39Z

Done, I guess I just needed to read the docs more carefully. bufio.Scanner will resize the buffer on its own when needed. I also added a test for it.
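For reference, a minimal sketch of that behaviour: Scanner.Buffer takes an initial buffer and a maximum size, and the Scanner grows the buffer between the two as tokens require. The sizes and the `longestToken` helper are illustrative assumptions, not the values used in the merged change.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// longestToken scans input line by line, starting with a buffer of
// initialSize bytes that the Scanner may grow up to maxSize bytes.
// It returns the length of the longest line seen.
func longestToken(input string, initialSize, maxSize int) (int, error) {
	s := bufio.NewScanner(strings.NewReader(input))
	s.Buffer(make([]byte, initialSize), maxSize)
	longest := 0
	for s.Scan() {
		if n := len(s.Bytes()); n > longest {
			longest = n
		}
	}
	return longest, s.Err()
}

func main() {
	line := strings.Repeat("x", 1<<20) + "\n" // a single 1 MiB line

	// Starts at 4 KiB and grows on demand, well below the 8 MiB cap.
	n, err := longestToken(line, 4096, 8<<20)
	fmt.Println(n, err)

	// With a 64 KiB cap the same line is rejected (bufio.ErrTooLong).
	_, err = longestToken(line, 4096, 64<<10)
	fmt.Println(err)
}
```

So a large maximum does not cost memory up front; the buffer only grows when a token actually needs it.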

maxbrunet · 2021-12-18T02:55:27Z

I have finally taken the time to test this, and I have been able to validate all my manifests, thank you! Unfortunately, I do not observe any significant performance difference with kubeval, probably because my CI pipeline runs on a single core.

yannh · 2021-12-18T21:14:23Z

Glad this works, thanks for posting an update! Indeed, performance usually improves with the number of cores, even though there could be a tiny startup win since we also download the schemas in parallel. That might not make such a massive difference in most cases though.

yannh mentioned this issue Jul 11, 2021

Increase buffer size when needed as guessed from file size #61

Closed

yannh mentioned this issue Sep 26, 2021

Support for larger files #76

Merged

yannh closed this as completed Sep 26, 2021
