Slow parsing performance #205
Honestly I've never really stress tested Zod with payloads that large. I suspect there is a lot of low-hanging fruit that would vastly improve performance. I'm a little confused, though. Zod is taking 500+ms to parse each object? How big is each object? Is it possible you're hitting memory limits of the Node.js process? Without a sample data set it's hard for me to duplicate this. Are you able to share?
Thanks for taking the time to think about this. I was also very confused by the slowness I was seeing, and first managed to narrow it down. I usually work with Go and was prepared for performance to be lower, but as I don't have experience of what is considered "normal" here, I decided to ask. I'm going to try to prepare a dataset and extract a code example that reproduces the problem.
Yeah, all of Zod's complex logic is in
@mgilbir Here is a GitHub repo that tracks benchmarks for different validation libraries, in case performance is an issue: https://github.com/moltar/typescript-runtime-type-benchmarks
@davidmdm Thanks for pointing to those benchmarks! I took a look with the profiler and found that makeError was allocating and doing a lot of work up front on every call to the parse function, even when no error occurred. I'm new to TypeScript, so I'm not sure this is the best way to solve it, but it may point you towards some low-hanging fruit. Code is available in PR #218.
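The idea behind that fix can be sketched in plain TypeScript. Everything here (`parseNumber`, `LazyParseError`) is illustrative, not Zod's actual internals: the point is simply that the success path allocates nothing, and error objects are only constructed once a failure is certain.

```typescript
// Hypothetical sketch of the lazy-error idea; not Zod's real code.
class LazyParseError extends Error {
  constructor(public readonly issues: string[]) {
    super(issues.join("; "));
    this.name = "LazyParseError";
  }
}

function parseNumber(input: unknown): number {
  // Hot path: no error machinery is allocated when the input is valid.
  if (typeof input === "number" && !Number.isNaN(input)) {
    return input;
  }
  // Cold path: build the error object only after validation has failed.
  throw new LazyParseError([`expected number, received ${typeof input}`]);
}
```

Deferring all error-related allocation to the failure branch is a common pattern for hot validation loops, since the happy path typically dominates.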
My team was using zod to validate JSON messages coming from IoT devices via a queue. We received hundreds of messages per minute (so not exactly high volume). We liked the idea of zod, but we have to move away from it until we can get decent performance. :-(
I saw these test results, which actually came after I had already looked into an issue with slow code where I discovered that zod was taking 600 ms to parse 1000 JSON elements: moltar/typescript-runtime-type-benchmarks. Zod is one of the slowest in the overview. Normally speed is not a problem, since I use zod on user input, but if you run a busy API endpoint this could be important.
@dgg I maintain myzod, which is basically a lightweight reimplementation of zod. It doesn't have a big community like zod does, but it is much, much faster and has an extremely similar API.
I am here because I was actually hit by the slow performance of zod. I use zod to validate HTTP API responses. It takes seconds to parse 2000+ rows of data, sad :( Interestingly, there are many zod-like libraries around with tens of stars each, purely because of performance. If zod could improve its performance, the world would be cleaner, since zod has by far the most features and support.
This is forcing me to abandon zod.
I have parsed 10k objects based on the schema provided by @mgilbir with versions 3.0.0-alpha.29 and 3.1.0, and the output is greatly improved by the team! @colinhacks the result is:

the issue has been fixed!
@zhenhao18 What sort of results were you seeing *before* 3.0.0-alpha.29 that we can use as a comparison? Because I'm still seeing *very* slow parsing performance in *all* versions, including the latest (3.1.0). Unfortunately I can't really share my dataset and schemas at the moment because I'm using some real code and data to benchmark (if I have time, I'll try to generate some mock data and schemas to share), but these are my latest internal results:

| library | parse time (ms) |
| --- | --- |
| zod 3.1.0 | 2200.53 |
| zod 3.0.0 | 2102.35 |
| zod 1.11.17 | 656.82 |
| myzod 1.7.4 | 19.19 |

We've switched to myzod for now, which is why it is included here. And as you can see it's *significantly* faster (and the schemas used in each case are virtually identical - nothing has been slimmed down or removed). But myzod is not without issues and concerns in itself. Also interesting to note how much performance has *decreased* in zod v3 from v1.

Edit: I wanted to add a note that these results are from parsing in NodeJS, not in a browser. So they really can only be used as a relative measure against each other and would probably vary significantly in other environments. For example, a subset of this data in a browser was taking upwards of 10 seconds in some cases for us.
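For anyone wanting to produce comparable numbers on their own data, a minimal timing harness might look like the sketch below. The `bench` helper and the dummy validator are assumptions for illustration, not the benchmark code actually used in this thread; in practice you would swap the dummy callback for `schema.parse(sample)` from whichever library you are measuring.

```typescript
// Minimal timing harness (illustrative; not the moltar benchmark suite).
function bench(name: string, fn: () => void, iterations = 10_000): number {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) fn();
  const elapsed = Date.now() - start;
  console.log(`${name}: ${elapsed}ms for ${iterations} iterations`);
  return elapsed;
}

// Dummy stand-in for a real schema parse, so the harness runs standalone.
const sample: { name: unknown } = { name: "test" };
const elapsed = bench("dummy validate", () => {
  if (typeof sample.name !== "string") throw new Error("invalid");
});
```

Wall-clock timing like this is coarse; for investigating *why* something is slow, a CPU profile is more informative than totals.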
I still haven't done a test on the versions before 3.0.0-alpha.29; I will do that today or tomorrow and let you know the result.
myzod looks great!
…On Fri, 11 Jun 2021 at 06:35, Joe Krill ***@***.***> wrote:

> @zhenhao18 What sort of results were you seeing *before* 3.0.0-alpha.29 that we can use as a comparison? Because I'm still seeing *very* slow parsing performance in *all* versions, including the latest (3.1.0).
>
> Unfortunately I can't really share my dataset and schemas at the moment because I'm using some real code and data to benchmark (if I have time, I'll try to generate some mock data and schemas to share), but these are my latest internal results:
>
> | library | parse time (ms) |
> | --- | --- |
> | zod 3.1.0 | 2200.53 |
> | zod 3.0.0 | 2102.35 |
> | zod 1.11.17 | 656.82 |
> | myzod 1.7.4 | 19.19 |
>
> We've switched to myzod <https://github.com/davidmdm/myzod> for now, which is why it is included here. And as you can see it's *significantly* faster (and the schemas used in each case are virtually identical - nothing has been slimmed down or removed). But myzod is not without issues and concerns in itself. Also interesting to note how much performance has *decreased* in zod v3 from v1.
This morning I decided to check in on zod and see if there have been any changes (it's been a few weeks since I last checked), and WOW! HUGE improvements since v3.1! Here are the results of my latest benchmark on our internal dataset, adding Zod 3.5.1:
So just a massive improvement. This is so awesome. Thank you so much to @milankinen, @colinhacks, and everyone who has been contributing - I really appreciate you all. This is really amazing and definitely makes zod a usable option for me again - I'll definitely be looking into switching back. Side note: I did have to make some minor tweaks to my schema where I was using
I really love this lib and anything TypeScript, but it's taking 3x to 4x as long to parse things compared to myzod. Performance should be one of the highest priorities as the ecosystem starts to build around zod.
@mmahalwy Do you have an example of something that has gone up in time? I think that would help with debugging, and maybe a benchmark suite could be added.
I recently migrated from ajv to zod, then to myzod, so there is no baseline for me to compare between zod versions. I only compared zod and myzod: with zod, my task overall took around 50% to 80% more time to complete. As the task itself has lots of mini validations, individually they don't look like much, but they add up by the time the task completes. I also suspect there could be some GC pressure too.
I've implemented #1023, #1021, and #1022 to explore different vectors for making zod faster. I think updating the
Incredible work and thanks so much for digging in! I'll spend some time tomorrow and this weekend trying to think through any implications that need to be addressed. The point about switching our target is an important one: we have mostly treated compatibility as a best-effort thing, but this would be a bit of a line in the sand. I am not involved in enough open source projects of this size to have a good feel for the general approach the community usually takes here, but I feel like that might warrant a major bump?
@tmcw I’m not sure if I understand the current caching correctly. Are you implying that the cache is permanent? This would explain why parsing a 150 MiB JSON with a complex schema uses over 4 GiB of memory (not sure how much over, just that it’s less than 8, as I had to manually bump Node’s memory limit) on one of my side projects. Do you know if the current cache gets freed over time? Also, do you know how I could try out this change to see what the performance difference is with my real-world data? The schema is insanely complex, as the data structure it has to parse is ancient and grew organically. There’s lots of

Also, thanks a lot for starting to work on zod performance. I’m really looking forward to this, as it currently takes >4 GiB of memory and around 2 minutes to parse the data. So, let me know if there’s a way to try this out. I originally commented this on the PR, but I think let’s keep the PR’s comments for the code itself.

**Update 1:** I managed to test out the es2018 version (#1022), and this change alone reduces my real-world 150 MiB complex JSON schema parsing time from 100s to 67s. A nice 33% reduction, just from adjusting a single line.

**Update 2:** Trying out the cache-related changes from #1023, I only see an improvement from 100s to 96s. Not bad, but also not life-changing haha.

**Update 3:** Both PRs combined via cherry-picking locally reduce the total time to 64s. 36% better performance is a very, very nice start. I hope there are still more aspects that can be optimized, but I’d gladly take even a 0.5% improvement at this point 😄

**Update 4:** With both PRs combined, the minimum memory required to succeed is between 3.75 and 3.875 GiB, down from the (now tested) 4.4-4.5 GiB, which is a nice 15% reduction.
I've focused mostly on the primitive types because they've been pretty straightforward to tweak for some little wins. I should probably take a break to work on my product 😆 but in the meantime, some thoughts:
Fwiw, if folks want to benchmark their things, what I've been relying on is the
And then dumping the results into speedscope. I ran the moltar benchmarks on the latest zod release and it's 3.6x faster than the previously benchmarked release (3.11.x). I've skimmed a few competing modules and I think the main remaining difference is in Zod's approach to objects and arrays: zod truly parses rather than validates, and it creates copies of input data structures. If that approach changed - or we optionally allowed a non-copying implementation - we'd probably be able to close more of the gap.
@tmcw that is extremely cool!
I gotta say @tmcw, your work is seriously appreciated. I migrated away from Zod because its real-world performance was too slow, but this focus on performance makes it viable again!
Now that I've looked at the codebase again, it seems that the path is actually needed in the refinement context, and that might be the reason why I left the implementation as it is... 😬 The
@milankinen isn't path the only reasonable way to programmatically use zod errors? It'd be a pretty big deal if they weren't machine-readable anymore.
Path is definitely documented in multiple places, including the error handling guide.
Yes, path is indeed an important property when processing the received parsing/validation issues, and the issue contract should not be changed. 👍 But I'm not talking about issue paths. I'm talking about the refinement context.

If you run the snippet, you can see that the

My question is: how important is read-only access to the
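To make the question concrete, here is a plain-TypeScript mock of the refinement-context shape under discussion. The `RefinementCtx`, `Issue`, and `runRefinement` names and types are simplified assumptions for illustration, not zod's real implementation; `ctx.path` is the read-only access being asked about.

```typescript
// Simplified mock of a refinement context; zod's real types differ.
type PathSegment = string | number;

interface Issue {
  message: string;
  path: PathSegment[];
}

interface RefinementCtx {
  readonly path: PathSegment[]; // the read-only access in question
  addIssue(issue: Issue): void;
}

// Runs a user refinement against a value and collects reported issues.
function runRefinement<T>(
  value: T,
  refine: (val: T, ctx: RefinementCtx) => void,
  path: PathSegment[] = []
): Issue[] {
  const issues: Issue[] = [];
  refine(value, { path, addIssue: (i) => issues.push(i) });
  return issues;
}
```

A refinement that only calls `ctx.addIssue` never needs to *read* `ctx.path`, which is exactly the distinction that decides whether the path must be materialized eagerly on every parse.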
I've implemented the suggested tweak to the default object parsing - not collecting extraKeys - in #1393. It's a very modest performance bump, seemingly around 5% in the real-world test case. Unfortunately I think that's what we're looking at with the cloning strategy: there's maybe another 10-20% optimization left, but systematically zod will be orders of magnitude slower than non-cloning modules.
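The cloning-versus-validating distinction that keeps coming up can be sketched like this. This is illustrative plain TypeScript, not zod's actual parser; the helper names are made up for the example.

```typescript
// Illustrative contrast between a cloning parse and a non-cloning check.
type Obj = { [key: string]: unknown };

// A cloning parse allocates a fresh object for every input it accepts.
function parseClone(input: Obj, keys: string[]): Obj {
  const out: Obj = {};
  for (const k of keys) out[k] = input[k];
  return out;
}

// A non-cloning check only reads the input: no allocation on the hot path.
function checkNoClone(input: Obj, keys: string[]): boolean {
  return keys.every((k) => k in input);
}
```

The per-object allocation in `parseClone` (plus the resulting GC work) is what dominates when processing thousands of rows, which is why a no-clone mode can be so much faster even when the per-field checks are identical.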
Hello! We at Twilio are considering using your library to validate messages sent over WebSocket. The API is probably the most fantastic thing I've seen across any of the validators/parsers I've checked. The only question that remains is performance. Our WebSocket protocol needs to be as optimized as possible, and Zod hasn't really been great in that regard. I've read this discussion about adding an option that disables cloning. Do you know whether that's feasible in the near future, and whether it would have a significant impact on performance?
@InfiniteRain Would you be willing to contribute to the performance improvement effort? :)
@kibertoad If adding a no-clone option is something that the maintainers would want, and if it brings significant performance improvements, then I would love to contribute.
@colinhacks Would you accept a PR for a no-clone option?
Oh wow, a toggle for no-clone mode would be incredible!
I'm definitely open to this! Slightly concerned about bundle-size bloat, and I'm not sure how things like transforms would work (I suppose they would just throw). But ultimately it's super worth it if perf is a blocker on adoption. @InfiniteRain don't hesitate to get in touch about any implementation questions - Twitter is probably best 👍👍

My first instinct is that this should be a separate method, probably:

```ts
const mySchema = z.object({ name: z.string() });
const data = { ... };

const isValid = mySchema.validate(data);
if (isValid) {
  data; // { name: string }
} else {
  data; // unknown
}
```

The downside is that a "safe" version of this method isn't really possible, because type predicates can't be nested inside objects.
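The type-predicate limitation can be seen in a standalone sketch (the `isPerson` guard below is hypothetical, not part of zod): a predicate can narrow the *whole* value in one go, but TypeScript offers no way to express a predicate that narrows individual nested fields independently, which is what a "safe" variant returning per-field results would need.

```typescript
// Hypothetical standalone type guard; it narrows the entire input or nothing.
function isPerson(data: unknown): data is { name: string } {
  return (
    typeof data === "object" &&
    data !== null &&
    typeof (data as { name?: unknown }).name === "string"
  );
}

const data: unknown = JSON.parse('{"name":"Ada"}');
if (isPerson(data)) {
  // Inside this branch, TypeScript knows data is { name: string }.
  console.log(data.name);
}
```

This all-or-nothing narrowing is exactly why a no-clone `validate` would likely return a single boolean predicate rather than a `safeParse`-style result object.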
@colinhacks I'm very interested in working on this! I'm on vacation right now, but once I'm back I'd love to get in contact with you to discuss the details further.
@colinhacks if
Just to add an additional data point: since Slonik adopted Zod for all query validation, our CPU usage has doubled. Avoiding cloning would dramatically improve things.
Typeform.com is also considering using
I did a very fast experimental refactoring last weekend based on the descriptions in my previous comments (diff here, very WIPWIP) and got approximately a 50% performance improvement in the "real world" and "object" suites and 100% in the primitive suites. It'd be nice to compare this with the no-clone approach. 🙂
We are currently parsing API responses with Immutable.js Record and looking to move to zod. Does anyone know how the performance compares?
@colinhacks Is this something that you might still consider for v4?
For those following this thread, https://github.com/duplojs/duplojs-zod-accelerator has been a good solution for me. I'm noticing improvements of ~50x in parsing speed (similar to what is shown in their README benchmark). I can't infer types from the accelerated schemas, so I've been writing a lot of code like this, but it has been worth it so far for the performance gains on my largest schemas.
I've just started using this library to check that parsed JSON objects conform to the types I define. I'm really enjoying it, but I'm afraid I've hit a performance snag.

Everything works fine with small tests, but when I ran it on a slightly bigger input file (~10MB) I noticed that it was really slow.

I've tried the latest version and the beta, and `check` vs `parse`, with similar results. The beta is faster, but I don't see a big difference between `check` and `parse`. After profiling the `check` version in the beta, I'm seeing calls to `.check` taking in the 500ms-1100ms range per object. Are those numbers typical, or am I doing something wrong with the schema definitions?
My schema definitions look like:
I'm really hoping that there is a way to speed it up, as this is too expensive for the use case where I'll have to process files with 100k+ objects.
Thanks!