Support large files #57
Comments
I have a patch for reading from files that uses the file size as an indicator to grow the buffer. I am still wondering what might be the best approach when reading from a stream like stdin 🤔 Note: I believe the io.Copy approach will read all of stdin into memory before starting to process anything. Kubeconform takes the approach of reading and processing immediately as data comes in. This is a bit faster and uses a lot less memory when reading a lot of data from stdin.
@maxbrunet let me know what you think of my branch :) For stdin, however, apart from trying to load all of stdin into memory before doing anything, which I'm not so keen on, I'm not sure what the best solution might be.
I do not see how you would start unmarshalling a partial JSON, or even the value of unmarshalling a partial YAML (nothing tells you whether it will be valid in the end). IMHO it has to be loaded entirely into memory, and if the file or stream is large, it is the responsibility of the user to have enough memory available. What is the approach of other tools like
@maxbrunet do you want an (undocumented?) environment variable to override the default value while we figure this out, if this is blocking you? :)
No hurry, I'm doing fine for now with
Done, I guess I just needed to read the doc more carefully. bufio.Scanner will resize the buffer on its own when needed. I also added a test for it.
I have finally taken the time to test this, and I have been able to validate all my manifests, thank you! Unfortunately, I do not observe any significant performance difference with
Glad this works, thanks for posting an update! Indeed, performance usually improves with the number of cores, even though there could be a tiny startup win since we also download the schemas in parallel. That might not make a massive difference in most cases though.
Right now files are limited to 4MB
kubeconform/pkg/resource/stream.go
Line 51 in c4b044f
kubeconform/pkg/resource/files.go
Line 130 in c4b044f
For example, I have a 17MB output for our Grafana deployment, mainly dozens of dashboards in ConfigMaps: 😛
kubeval had the same issue and solved it with a default buffer (which expands itself dynamically) and io.Copy(), like in instrumenta/kubeval#220. Not sure how it affects the overall performance.
If you want to reproduce, here are a couple of quick and dirty scripts to generate large files:
json.go
yaml.go