More efficient resolving #16
You can also move |
Also, it may be worth optionally cythonizing jsonpointer for a greater speedup.
This also means that isinstance(..., Sequence) doesn't get called twice for non-list sequences (once in walk, and once in get_part).
The latest commit gives me the following numbers: 1.25914096832 … but adds a bit more duplication. It also makes |
Ah, the coverage decrease will be because the tests no longer cover |
@alexsdutton my idea was just to replace this:
with:
Inlining function calls may indeed change the timings because stack frames aren't cheap, but there should be a balance between performance optimizations and code quality. Again, Cythonization will give you much better numbers without the need to play with micro-optimizations.
@kxepal that gives:
I can play with Cython to see how that fares, but presumably that would lead to two codebases (i.e. you'd still need the pure Python implementation for non-CPython environments)? |
@alexsdutton no, it doesn't lead to two codebases. You can make the Cython module optional and fall back to the pure Python one where it is hard to build (hello, Windows) or doesn't make sense to build (hello, PyPy).
About the numbers: yes, there is not much gain, but this has to be done if you're going to optimize the code for performance (:
The reason I'm trying to push it as far as it can go is that I've got code that's doing 5.8m re codebases, you'd have the Cython implementation, and the Python one, which is two. Have I missed something? (I presume the usual approach is to do a "try: import _jsonpointer" to get the Cython implementation, and if that fails, use the pure Python implementation) |
@alexsdutton sure, things can be done iteratively. Your improvements are significant enough to boost the performance. As for Cython, take a look at https://github.com/KeepSafe/aiohttp - it has a multidict module implemented in both Python and Cython at the same time.
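For readers following the thread, the optional-accelerator pattern discussed here (try the compiled module, fall back to pure Python) can be sketched roughly as below. This is an illustration only, not code from the pull request: the module name `_jsonpointer` is taken from the comment above and is hypothetical, and the fallback `resolve_pointer` is a simplified stand-in rather than the library's implementation.

```python
# Optional-accelerator pattern: one public name, two interchangeable backends.
try:
    # Hypothetical compiled extension (e.g. built with Cython); name assumed.
    from _jsonpointer import resolve_pointer
    ACCELERATED = True
except ImportError:
    ACCELERATED = False

    def resolve_pointer(doc, pointer):
        """Simplified pure-Python fallback: walk doc along a JSON pointer."""
        if pointer == "":
            return doc
        for part in pointer.split("/")[1:]:
            # RFC 6901 unescaping: "~1" -> "/" first, then "~0" -> "~".
            part = part.replace("~1", "/").replace("~0", "~")
            if type(doc) is list:
                doc = doc[int(part)]
            else:
                doc = doc[part]
        return doc


# Callers use the same name regardless of which backend was imported.
value = resolve_pointer({"a": [10, {"b/c": 20}]}, "/a/1/b~1c")
```

With this layout there is still a second implementation to maintain, but only one public API, and environments that cannot build the extension keep working.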
Well that was easy. Pulled out just the
Though I have broken the tests. (Yes, all a bit hacky at this hour) |
@alexsdutton w00t! looks promising, doesn't it? (;
Worth trying to get the attention of @stefankoegl at this point? |
(and yes, certainly promising 😁) |
ptype = type(doc)
if ptype == dict:
    pass
elif ptype == list or isinstance(doc, Sequence):
Doesn't ptype == list imply isinstance(doc, Sequence)? It should then be possible to reduce this to
elif isinstance(doc, Sequence):
Oh, I missed this accidentally too. Normally the is operator should be used here, since types are effectively singletons and there is no need to invoke the regular comparison routines for them.
The idea behind checking for list before calling isinstance is to optimize the general case, where doc is mostly a dict or a list. Cheaper operations handle the common cases, with a fallback to the wider but costlier type check: isinstance(foo, Sequence) is 100 times slower than a direct type comparison because of the function call and the walk through all the registered types up to the abstract Sequence.
OK, so in that case the is operator should be used, and it should have some comment explaining that this is a performance optimization.
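The outcome of this review thread (identity checks on the exact type first, the abstract check as a fallback, and a comment saying why) might look roughly like the sketch below. It is an illustration under assumptions: `classify` is a hypothetical helper, not a function from jsonpointer, and the import uses `collections.abc` as in Python 3, whereas the Python 2.7 code under review would import `Sequence` from `collections`.

```python
from collections.abc import Sequence


def classify(doc):
    """Hypothetical helper: report how a document node would be indexed."""
    ptype = type(doc)
    # Performance optimization: most nodes are plain dicts or lists, so test
    # the exact type with `is` before paying for the ABC-based isinstance().
    # Do not simplify this to a single isinstance() call.
    if ptype is dict:
        return "mapping"
    if ptype is list:
        return "sequence"
    # Slow path: tuples, strings and other Sequence implementations.
    if isinstance(doc, Sequence):
        return "sequence"
    return "other"
```

Note that the fast path is purely an ordering of checks: removing the `ptype is list` branch would not change the result, only the cost for the common case.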
Sorry for the late reply! I was away from computers for a few weeks. Yes, this certainly looks promising! I've added one inline comment. It might be just an optimization, but if so, it should be commented (otherwise it might not be noticed, and be removed in the future). |
I was profiling some of my code which used jsonpointer, and I noticed that resolving pointers could be made faster. My profiling script is at https://gist.github.com/alexsdutton/bb95e47a381d0e250fd3/988feaa0e8968893287df01c87eb233c9b86a010. Running on af04ec0, we get the following times on my laptop (MacBook Air, Fedora, Python 2.7.8):
After each commit we get, respectively:
So, about a 2x speed-up on the original code.
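A measurement of this kind can be approximated with the standard timeit module. The sketch below is not the script from the gist: `resolve`, `best_time`, the sample document and the iteration counts are all assumptions made for illustration, and it is written for Python 3 rather than the Python 2.7.8 used above.

```python
import timeit


def resolve(doc, parts):
    """Stand-in resolver (hypothetical): walk doc along pre-split parts."""
    for part in parts:
        doc = doc[int(part)] if type(doc) is list else doc[part]
    return doc


def best_time(func, number=20000, repeat=3):
    """Return the fastest total time, in seconds, for `number` calls."""
    return min(timeit.repeat(func, number=number, repeat=repeat))


# Sample document and pointer parts, chosen only for the demonstration.
doc = {"foo": {"bar": [{"baz": n} for n in range(10)]}}
parts = ["foo", "bar", "7", "baz"]

elapsed = best_time(lambda: resolve(doc, parts))
print("best of 3 runs: %.4f s for 20000 resolutions" % elapsed)
```

Taking the minimum over several repeats reduces noise from other processes, which matters when comparing commits whose timings differ by small factors.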