-
-
Notifications
You must be signed in to change notification settings - Fork 602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Converting YAML to JSON is super slow #422
Comments
+1 |
in my experience - it slowed down a lot some time since early January
|
same for
I am imagining an O(N^2) algorithm somewhere inappropriate |
after downgrade to yq 2.4.1 --> zippy fast again (ie: 30s to parse yaml file in 3.x became sub second response time with 2.4.1) this helped to downgrade -> https://stackoverflow.com/questions/53837861/brew-list-and-install-specific-versions-of-formula |
Ah yeah, this will be when I moved to 3.0. As part of handling yaml anchors on conversion to JSON I made it iterate through the yaml structure. Let me play around to see if I can do that better...or even skip it |
Ok what I can do is make it not handle merge anchors by default when converting to JSON - however if there are merge anchors in the yaml file, then you will need to pass in |
Hey Mike -- will the "--explodeAnchors' still be slow? if so, does not get my vote. I have written plenty of one-pass parsers, and what it has boiled down to in the past is using some memory to cache the anchors (macros) as they are found, then doing an efficient look-up (hashed by name) at the point of reference. |
or, if by |
Yep will still be slow, that sounds pretty neat, would def take a PR for it :) |
@mikefarah I am loving to take a look and do a PR -- before beginning, is there some architectural change reason that 2.4.x works quickly and 3.x is slow? |
Yeah so 2.4.x didn't handle anchors, it was because the underlying yaml parser did not expose them. In 3.x I updated the underlying parser which let me access the merge anchor information. Now when outputting to json - I needed to process the anchors into a regular object to I can encode to JSON correctly, this wasn't done in 2.4.x. In 3.x, it is done here: https://github.com/mikefarah/yq/blob/master/cmd/utils.go#L184 before being passed to the encoder (https://github.com/mikefarah/yq/blob/master/pkg/yqlib/encoder.go#L61) |
Further context, the way explodeAnchors works, (it's quite naive :)) is to deeply recurse all matching nodes and merge all the leaf nodes into a new empty data structure |
+1: A yaml input file of mine is 827 lines long and makes heavy, multi-level use of anchors and aliases; exploding it into a 10294-line output file takes 10-12 minutes. In some sitations, I can speed up development by exploding it into a json file and loading that using I do understand that anchor / alias support was implemented in a naive way in order to support them at all - and @mikefarah: I do sincerely appreciate the great work 👍 - anchors and aliases enable me to do fantastic stuff with yaml - but whatever makes this faster would bring a significant benefit to my work. BTW: I know nothing about Go, but from I've heard, it's really good at parallelizing tasks ("co-routines" and what not ?!?); looking at @PaulCharlton: Is there anything I can do to support you with the PR you mention ? I'm able and more than happy to test dev versions and report benchmarks - somewhat self-serving, would get me a chance to get GOing (jeez, my puns have seen better days...) |
Update: My yaml input file keeps growing (e.g. 958 lines, including comments) and making smarter re-use of multi-level sections used at multiple locations (e.g. 63972 lines in the json output file). However, the time yq requires to explode the yaml file and/or convert it to json is getting prohibitive; from my log output: In order to keep making progress in my project, I will have to stop using yq for the time being and work around this issue by converting the yaml input file to a json output file up front; takes <1s with a fairly simple Python script. |
Alright I think I have a good way forward with this - working on it, should be available within a few days if I get enough time. Does anyone have a sample large doc to test it with? I've made my own but not sure how representative it is... |
I re-ran the file I mentioned earlier, takes less than a second. That's one heck of a speed-up! |
Fantastic, glad to hear that :) Thanks for the feedback |
Describe the bug
When converting a 5000 line YAML to JSON using
yq
:It takes 2 minutes:
Similarly, a 500 line file takes around 5s, which is in itself super slow.
Input Yaml
5000 lines of YAML, not pasting that here.
Command
The command you ran:
Actual behavior
Super slow
Expected behavior
It should take under a second. That's what a simple Python program does it in.
Additional context
Add any other context about the problem here.
The text was updated successfully, but these errors were encountered: