Snapshot discovery and reading takes quadratic time #539
It definitely shouldn't be quadratic. I'll take a look this week. I suspect there are a few easy optimizations we can do. We also have a benchmarking setup in the repo.
This was a really good catch. As I mentioned above, there are a number of optimizations that can be made. I'll leave this GitHub issue open as I merge in performance improvements over the next couple of weeks. The first improvement is a change to only discover snapshots once per test file rather than once per assertion. This should have a significant impact on your use case, especially if using parameterization. See https://github.com/tophat/syrupy/releases/tag/v1.4.2 I'll have to think about some options around hoisting the read/write snapshot logic out of the assertions and into the session. Perhaps we batch the writes and memoize the reads? There might be a memory concern for sufficiently large snapshots; however, I think we can optimize for performance here.
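The "memoize the reads" idea might look something like the following sketch. The function name and the `.ambr` parsing here are simplified stand-ins for illustration, not syrupy's actual reader or API:

```python
from functools import lru_cache

# Sketch: parse each snapshot file at most once per session. Repeated
# assertions against the same file hit the cache instead of re-reading it.
@lru_cache(maxsize=None)
def read_snapshot_file(path):
    snapshots = {}
    current = None
    with open(path) as f:
        for line in f:
            # Lines like "# name: test_a" start a new snapshot entry;
            # everything until the next such line is that snapshot's body.
            if line.startswith("# name:"):
                current = line.split(":", 1)[1].strip()
                snapshots[current] = []
            elif current is not None:
                snapshots[current].append(line)
    return {name: "".join(body) for name, body in snapshots.items()}
```

The memory concern mentioned above would be bounded by the total size of the snapshot files kept in the cache, since each file is stored at most once.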
Thanks for the quick improvement! It leads to a 33% improvement in the test, and a 25% improvement in our real test suite (68s -> 51s), definitely a useful start. However, I'm not entirely sure it's working as intended: I installed the new version:
I reran the test in the issue, and the time, and the number of calls to
Assuming the extra calls in 1 can be eliminated (leaving 1 or 2
That seems reasonable to me. I wonder if throwing

In terms of memory concerns, I think the memory impact should be reasonably controlled, as the cached memory would generally be proportional to the file size, which would generally be relatively small. For instance, in our real-world test suite, we have 7.5MB total

Also, my feeling is that read performance is a far more important focus than write performance for now, as I'd expect read-only test runs to happen far more often than write ones (e.g. CI is read-only, and most tests even in a
I didn't notice any impact from caching the read calls in the ambr serializer. If anything, performance was slightly worse from the additional cache overhead. I ended up repeating the logic around caching "discovery" across "report" and "session". When I tried to memoize the discover_snapshot method, I didn't notice any improvements, which leads me to believe the time is actually spent in the looping + dict merging. With my latest change, cProfile reports 2 discover_snapshot calls rather than 1001. https://github.com/tophat/syrupy/releases/tag/v1.4.3 It should at least be linear now.
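Call counts like the "2 discover_snapshot calls rather than 1001" quoted above can be extracted from a profile programmatically with the standard-library `pstats` module. The `discover_snapshots` function below is a hypothetical stand-in, not syrupy's code:

```python
import cProfile
import pstats

# Hypothetical stand-in for the discovery step being profiled.
def discover_snapshots():
    return {"test_a": "value"}

def run_assertions(n):
    for _ in range(n):
        discover_snapshots()

profiler = cProfile.Profile()
profiler.enable()
run_assertions(5)
profiler.disable()

# pstats.Stats.stats maps (file, line, func name) to a tuple whose second
# element is the total call count for that function.
stats = pstats.Stats(profiler)
calls = {
    func_name: ncalls
    for (_, _, func_name), (_, ncalls, _, _, _) in stats.stats.items()
}
print(calls["discover_snapshots"])  # prints 5
```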
That's awesome, thank you! 👏 1.4.3 takes our test suite to 35s (-49% overall), way better! Here's the updated table; it's much faster, but still quadratic:
I experimented a little bit and found that putting into the
If we're saying the read call is the issue, then there's a linear relationship between the number of reads and the number of assertions. In fact, there's one read per assertion. I'm not sure where the quadratic relationship is coming from. Have you tried with substantially more test cases? With and without syrupy? Also feel free to join our discord: https://discord.gg/kZYy8agD
Yeah, unfortunately each file is generally O(number of assertions) in size, because it stores info for each assertion, so whenever the whole file is read again that's O(number of assertions) work (since it has to at least touch every byte in the file to find the names). That is, O(number of assertions) assertions each do O(number of assertions) parsing work, leading to O(number of assertions**2) quadratic behaviour for each file. The simple test in the issue is an extreme example, with up to 2000 assertions in a single file, but it has an impact even for our real-world ~450 snapshots spread across 25 files.
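The cost argument above can be sketched with a toy model (purely illustrative arithmetic, not measurements from syrupy):

```python
# Toy cost model: with n assertions and a snapshot file holding n entries,
# re-parsing the file on every assertion touches O(n**2) bytes in total,
# while a single cached read touches only O(n).
def bytes_touched(n_assertions, cached):
    file_size = n_assertions  # the file stores one entry per assertion
    reads = 1 if cached else n_assertions
    return reads * file_size

assert bytes_touched(1000, cached=False) == 1_000_000
assert bytes_touched(1000, cached=True) == 1_000
# Doubling n quadruples the uncached work but only doubles the cached work:
assert bytes_touched(2000, cached=False) == 4 * bytes_touched(1000, cached=False)
assert bytes_touched(2000, cached=True) == 2 * bytes_touched(1000, cached=True)
```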
Here's the test with
(syrupy 1.4.3 seems to be approximately 52 × the time for
Sorry, I'd prefer not to do so for now, but thanks for the invitation! 😄
Based on your metrics, it seems performance is under control now, or at a minimum it's no longer quadratic, so I'll close this issue. If you have other ideas/requests, we're always open to contributors.
Sounds good. I think writing snapshots is still quadratic (in the number of written snapshots), but that's not something we're struggling with, compared to the reading performance, which we've resolved here 🎉 Thanks for being so responsive!
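For reference, the "batch the writes" idea discussed earlier in the thread could be sketched like this; the class name and file format are illustrative assumptions, not syrupy's implementation:

```python
from collections import defaultdict

# Sketch: instead of rewriting a whole snapshot file after every recorded
# assertion (O(n) work done n times, hence O(n**2)), queue updates per file
# and flush each file once at session end.
class BatchedSnapshotWriter:
    def __init__(self):
        self._pending = defaultdict(dict)  # path -> {snapshot name: value}

    def record(self, path, name, value):
        # O(1) per assertion; nothing touches the filesystem yet.
        self._pending[path][name] = value

    def flush(self):
        # One linear write per file, regardless of how many assertions ran.
        for path, entries in self._pending.items():
            with open(path, "w") as f:
                for name in sorted(entries):
                    f.write(f"# name: {name}\n{entries[name]}\n")
        self._pending.clear()
```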
Describe the bug
We're using syrupy, and it works well. Thank you!
Unfortunately we have nearly 500 snapshots, and our test runs are starting to get quite slow. It seems that syrupy makes test runs take quadratic time with respect to the number of snapshots.
To reproduce
Create this file
Run, for instance:
The times reported by pytest scale quadratically (O(size**2)) with the number of tests/snapshots. I think this is because the number of `read_file`/`_read_snapshot_fossil` calls and `discover_snapshots` calls, as reported by `python -m cProfile -m py.test test_performance.py`, scales linearly (O(size)) with the number of tests/snapshots, and the work required for each call also scales linearly (because the files contain O(size) data for the snapshots).

The times and numbers of `discover_snapshots` calls and `read_file` calls (this is just for the invocation that's only checking snapshots) are something like:

Things of note:
Expected behavior
The test runs should be linear in the number of tests/snapshots. For instance, if the `assert x == x` line is used (and the `snapshot` fixture removed) instead of `assert x == snapshot`, the test run is linear: even `SIZE=10000` finishes in < 4s on my machine.

It seems like this could be handled by discovering the snapshots once (or once per file) and reading each snapshot file once too.
Screenshots
Environment (please complete the following information):
Additional context