Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nicer method call syntax using keyword arguments #31

Merged
merged 13 commits into from
Apr 18, 2017

Conversation

dgelessus
Copy link
Collaborator

Allows calling methods using keyword arguments, like string.getCharacters(buffer, range=NSRange(0, 5)). This implements most of #26.

As a side effect, ObjCClass now gets the entire array of method pointers on creation, instead of looking them up on demand. (However, ObjCMethod objects are not created for every method - doing so causes lots of "unknown restype" errors for methods that aren't used.) This appears to have no noticeable effect on performance (the unit tests take roughly 200ms for me before and after the change).

Other changes that were needed to implement this:

  • The custom c_void_p subclasses (SEL, objc_id, Class, etc.) are now exported in rubicon.objc's __init__.py.
  • ObjCClasses now have a superclass attribute.
  • ObjCClass.__setattr__ no longer tries to look for an Objective-C setter if the attribute exists in __dict__. This allows setting attributes on an ObjCClass without manually accessing the __dict__.
  • Better handling of null selectors, and simple unit tests for SEL. (This is not strictly necessary, but it helped while tracking down a segfault.)
  • Minor simplification and clarification of how register works when creating new Objective-C subclasses. (This was also a result of me tracking down the segfault.)

rest = frozenset()
else:
args = [first_arg]
# Add "" to rest to indicate that the method takes arguments
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see below it's being decoded for selectors.

Is this convention more general?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean, I don't see any decoding here... If you're talking about the "add empty string to rest" comment, that is specific to how I implemented the "partial method" dict, and not a general convention.

@@ -3,6 +3,7 @@
from .objc import (
objc, send_message, send_super,
get_selector,
SEL, objc_id, Class, IMP, Method, Ivar, objc_property_t,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMP unused?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, IMP is not used much. It's mainly used in some of the function declarations (such as class_addMethod or method_getImplementation) as a placeholder type for any function pointer. Because of this, IMP needs to be publicly available, to allow others to call the functions that have an IMP in their signature.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I guess I misunderstood... It was confusing enough that I looked elsewhere in the changeset and somehow decided that # Selectors end in a colon is where same convention was ultimately used.

To be honest I'm still not sure whether I got it right or wrong.

Copy link
Member

@freakboy3742 freakboy3742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As always, this looks like a really solid implementation - great work. My only hesitation on merging this stems from the general approach to handling argument ordering, and the impact this will have on performance.

The current approach to method invocation is entirely on-demand - the first time you call selector X, it does a lookup to see if the method exists; if it does, the ObjCMethod wrapper is constructed. This means it doesn't matter if you have a deep or heavily overloaded method list - you only pay a processing cost for the methods you actually use.

If I'm reading this right, the approach you've taken here is to pre-cache the entire method list for any class that is loaded. For simple classes like the test example (where every method is being exercised anyway), that doesn't really matter; but in practice, some of the deeper classes in the Cocoa hierarchy are going to have very deep method structures, and lots of methods you never invoke in practice - but the act of just registering one will attract a CPU cost (and presumably a memory cost as well, for caching all the methods and all the variants on the object).

I understand that something needs to be done to work around the problem of not being able to rely on kwargs order (at least until we can rely on 3.6 as our minimum version, which is some years away).

Would it be possible to take the middle ground here? That is -rather than pre-emptively caching the entire class, just cache the methods the are actually invoked?

@dgelessus
Copy link
Collaborator Author

Right, performance is definitely an issue when many classes are loaded. As a "stress test" I tried loading every class that starts with NS (that is, everything from Foundation and AppKit):

import ctypes.util
import timeit

from rubicon import objc

c = ctypes.CDLL(ctypes.util.find_library("c"))
Foundation = ctypes.CDLL(ctypes.util.find_library("Foundation"))
AppKit = ctypes.CDLL(ctypes.util.find_library("AppKit"))

c.free.restype = None
c.free.argtypes = [ctypes.c_void_p]

objc.objc.objc_copyClassList.restype = ctypes.POINTER(objc.Class)
objc.objc.objc_copyClassList.argtypes = [ctypes.POINTER(ctypes.c_uint)]

out_count = ctypes.c_uint(0)
classes_ptr = objc.objc.objc_copyClassList(ctypes.byref(out_count))

names = []
try:
    for i in range(out_count.value):
        name = objc.objc.class_getName(classes_ptr[i])
        if name.startswith(b"NS"):
            names.append(name.decode("utf-8"))
finally:
    c.free(classes_ptr)

def our_main():
    for name in names:
        print("Loading", name)
        objc.ObjCClass(name)

if __name__ == "__main__":
    time = timeit.timeit("__main__.our_main()", setup="import __main__", number=1)
    print(f"{time:.3f} sec")

Before the changes, this ran in about 90 ms, now it runs in about 900 ms. That means it's now ten times slower, which is not great.

The original reason why I cached the method list on class creation was not related to the argument order. The issue I ran into was that ObjCInstance.__getattr__ needs to know when an attribute does or doesn't exist. With normal method calls this is easy - the attribute name is the full method name, so you can look up the method with the Objective-C runtime and see if it exists.

With "partial methods" this is not possible, because at the time of __getattr__ you only know the start of the method name. There's no way to ask Objective-C "are there any methods starting with this name", so you have to assume that there is one, and return a partial method for every attribute that you can't resolve otherwise. The issue with that is that you then don't have AttributeErrors anymore. If you write url.scheem instead of url.scheme by accident, you get a partial method, because at that point it's not known if there are any methods that start with scheem. This breaks a test, doesn't play well with duck typing (try-except AttributeError or hasattr), and is not very intuitive IMHO. That's why I went with the upfront caching approach.

As for the performance, I already tried to do as little work as possible when caching the method names. The method list is only cached once when the class is first loaded, and not when it's loaded from the object cache. The methods are only stored as C Method pointers, not as full ObjCMethod objects, and the ObjCPartialMethod is only created once the method is actually called. If memory is an issue, I could also remove the method pointers from the cache. Then it would only contain the method names, which can then be used to look up the desired method when it's called.

@freakboy3742
Copy link
Member

Thanks for the explanation. I figured the limitation would be the absence of a "get all methods starting with ..." method - I've had similar limitations bridging into Java.

And yes - while a 10x slowdown isn't ideal, it's also a lot better than I would have expected. Out of interest - how many classes are we talking about? If we're talking about all of Foundation and AppKit, 'm guessing it's quite a few....

I'm wondering if we could have the best of both worlds: Would it be possible to fully instantiate the method list, but only when the user attempts to access a method? i.e., defer populating the method list until you actually use it. That way, your "load everything" test should continue to run in close to 90ms; which means there won't be as much startup lag; you only pay the loading overhead for the classes you actually use, so there won't be a major performance penalty for just importing a class that isn't actually used; and any lag that does exist will be distributed over a longer period of time, as you only pay the startup cost when you start using a feature (for example, NSURL wouldn't be instantiated at startup, but when you first hit a JSON API).

@dgelessus
Copy link
Collaborator Author

It should be possible to postpone getting the method list until a method is called. I'm not sure how much of a performance difference that would make though. Because any non-Python attribute could be a method, the method list would be loaded as soon as any non-Python attribute is accessed, even if turns out that the attribute is not a method. I'll give it a try though and see how it works out.

@dgelessus
Copy link
Collaborator Author

This took a bit longer than expected, sorry about that. The change itself was quick to do, but I had some uncommitted work in a different branch that I needed to finish up first.

I've modified my test script a little, it now also includes the implementation-private NS classes that have underscores at the start of the name. On my system (OS X 10.11.6), there are 2579 NS classes in Foundation and AppKit, with 59558 methods in total. Loading all of them with the old version (method caching in ObjCClass.__new__) takes about 1.5 seconds.

With the new version (method caching in ObjCInstance.__getattr__), the test script only takes about 0.15 seconds. (Of course this isn't a fair comparison, because it only loads the classes and never uses any methods or properties.) The test suite runs in basically the same amount of time as before. I can't think of any case where the new version would be slower than the old one, and it definitely helps with classes that are loaded but not really used. I'd say that counts as a good idea, thanks for the suggestion. :)

Copy link
Member

@freakboy3742 freakboy3742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not much to nitpick here - this is solid work. :shipit:

@freakboy3742 freakboy3742 merged commit d4a6697 into beeware:master Apr 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants