hash_info: don't use dataclass #56
How much of a performance improvement do you see with this?
Codecov Report
@@            Coverage Diff             @@
##             main      #56      +/-   ##
==========================================
+ Coverage   34.24%   34.28%   +0.03%
==========================================
  Files          33       33
  Lines        1673     1677       +4
  Branches      261      262       +1
==========================================
+ Hits          573      575       +2
- Misses       1089     1090       +1
- Partials       11       12       +1
We are creating lots of these instances and dataclass is significantly slower.
@skshetry Great question! Added to the PR description.
Overall we are not heavy users of dataclasses in this application (and a few more places like
Same as in iterative#56
We are creating lots of these, so this will help save memory and improve performance. Similar to iterative/dvc-data#56
I just wish we had not done this kind of change without looking into other alternatives (eg:
We are creating lots of these instances and dataclass is significantly slower.
For example, running:
before this PR takes ~6.1sec and after ~4.9sec.
Most of the improvement actually comes from using `__slots__`, which dataclasses only started supporting in Python 3.10.

Another (temporary) downside to using dataclass is that it is kinda buggy with other tools. E.g. `mypyc` fails to compile it, throwing an error like mypyc/mypyc#921 for `HashInfo.name`.
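
To illustrate the trade-off described above, here is a minimal sketch comparing a regular dataclass with a hand-written class that uses `__slots__`. The class names, the two fields, and the `time_creation` helper are made up for illustration and are not the actual `HashInfo` definition from this PR. Timings depend on the machine and Python version, so none are shown.

```python
import timeit
from dataclasses import dataclass
from typing import Optional


@dataclass
class HashInfoDataclass:
    # Hypothetical fields, chosen only for illustration.
    name: Optional[str] = None
    value: Optional[str] = None


class HashInfoSlots:
    # __slots__ removes the per-instance __dict__, so each instance
    # uses less memory and is cheaper to create.
    __slots__ = ("name", "value")

    def __init__(self, name: Optional[str] = None, value: Optional[str] = None):
        self.name = name
        self.value = value

    # A dataclass generates __eq__ for us; here we write it by hand.
    def __eq__(self, other):
        if not isinstance(other, HashInfoSlots):
            return NotImplemented
        return (self.name, self.value) == (other.name, other.value)

    def __hash__(self):
        return hash((self.name, self.value))


def time_creation(cls, number: int = 100_000) -> float:
    """Return the seconds taken to create `number` instances of `cls`."""
    return timeit.timeit(lambda: cls("md5", "abc123"), number=number)


if __name__ == "__main__":
    print("dataclass:", time_creation(HashInfoDataclass))
    print("slots:    ", time_creation(HashInfoSlots))
```

On Python 3.10+ a similar memory layout can be requested with `@dataclass(slots=True)`, but that option is not available on older interpreters.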