-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lazy-create an empty annotations dict in all unannotated user classes and modules #88067
Comments
The behavior of the __annotations__ member is wildly different between the three objects (function, class, module) that support it. Many of these differences are minor, but one in particular has been an awful wart for years: classes can inherit __annotations__ from their base classes. This is incontestably a misfire of the original design--it is never useful, it is only confusing, and libraries who examine annotations have had to work around it for years. I started a thread in January 2021 in which I brought up this and other inconsistencies with respect to __annotations_:
The reaction was positive: yes, that's a genuine problem, and there's an easy, highly compatible fix that everybody liked: classes should always have an __annotations__ dict set. So that the behavior is consistent, modules should always have one set too. This won't be a large change. However, there are a lot of changes flying around with respect to __annotations__ right now. My plan is to get the work ready, then when the dust is mostly-settled I'll get it updated, reviewed, and merged. Definitely before Python 3.10b1. Long term, I'd like to get the other changes to annotations I proposed in that thread:
|
Huh. The sample code in my thread got goofed up somewhere along the way--maybe it's my fault, maybe it was done to me by some automated process. Anyway, the example demonstrating classes inheriting annotations was meant to be formatted like this: class A:
ax:int=3
class B(A):
pass
print(getattr(B, '__annotations__', {})) Maybe some over-smart doodad decided that no paragraph would ever start with spaces, so they helpfully removed them? Hmm. |
Preliminary patch is 17 lines. Ah well. I must be terrible at this! |
Sounds like a plan, I agree that the original design misfired here (we were probably worried about million-line code bases with tens of thousands of classes having to pay the price of tens of thousands of empty dicts, but I now think that was an unnecessary worry -- that should be a few megabytes on a very much larger total). But given that you're not done yet:
|
Maybe for modules, definitely not for classes. The problem is that best practice for classes is to look in the class dict for __annotations__. That sidesteps the inheritance problem. Functions already lazily create a dict __annotations__ if not set. And it's not stored in the function dict. So nobody ever looks in the function dict for __annotations__. That all works fine. Would you prefer that *just classes* lazily create __annotations__? It's not hard code to write, but I was being cautious. "always set __annotations__ to an empty dict" is the easiest code to write and get correct. And since modules are not all that numerous even in the largest projects, it seemed like this un-optimized approach would have a minimal contribution to the heat death of the universe. It's also remotely possible that someone out there does look in the module dict for __annotations__, though I admit I've never seen it. It seems like it'd be tempting to write your code that way though: if isinstance(o, (type, types.ModuleType)):
ann = o.__dict__.get("__annotations__", None)
elif callable(o):
ann = o.__annotations__
else:
raise ValueError(f"{o!r} doesn't support annotations")
Simply because it's the cheapest sensible way to indicate "this object has no annotations". It would be nice if the world accepted __annotations__ being None to mean "no annotations". But I don't think we're there. I note that the function object has a special setter for __annotations__ (func_set_annotations()), since at least Python 3.1, which explicitly only allows setting __annotations__ to either a dict or None. (I didn't have 3.0 handy.)
Me? I've never contributed code to Python that restricts the permissible types of values one is allowed to set on __annotations__. Function objects are opinionated as mentioned (either dict or None), classes and modules have no opinion whatsoever, and allow you to set __annotations__ to any value. That was all true long before I started working on annotations.
AFAIK I've never seen anyone set __annotations__ to None.
Not done yet. I only started it last night, and there were a lot of test failures--it turns out, a *lot* of regression tests do a sanity check that __annotations__ isn't set on classes (and maybe modules too). For example, I was surprised that test_opcodes has two such test failures. |
By the way, here's a tidbit I never got around to posting in c.l.p-d. I noted in the conversation in January that attrs is an outlier here: it *doesn't* look in the class dict for __annotations__. Instead, it has some complicated code where it asks the class for its annotations, then iterates over the __mro__ and asks every base class for *its* annotations. If the class and a base class have the same class dict (using the "is" operator iirc) then attrs says "oh, that class doesn't have its own annotations, it's inheriting them" and reacts appropriately. I emailed Hynek to ask him why he did it that way. It turns out, he did that because at the time he didn't know you could peek in the class dict for __annotations__. He literally had a todo item saying "change to looking in the class dict". |
It occurs to me that part of this work should also be a new "best practices for __annotations__" entry in the Python docs. Best practices for working with annotations, for code that requires a minimum Python version of 3.10+: Best practice is to call either inspect.get_annotations() or typing.get_type_hints() to access annotations. But if you want to access '__annotations__' on an object directly:
|
Let me revise that to:
|
I’d say that best practices for 3.9+ are more useful. There seem to be several contradictions in your remarks, but we’ll get to those in the review. |
My point in writing this up was that the best practices change as of 3.10. So, I could add a section to the Python 3.10 documentation giving best practices for 3.10+ and 3.9-. But 3.9 and 3.10 have different best practices. |
And that’s a problem. There needs to be a recommendation on what to do for code that spans 3.9 and 3.10. What should users do otherwise? Drop 3.9 as soon as they introduce 3.10 support? Withhold 3.10 support until 3.9 reaches EOL? IOW you can’t just break backward compatibility. (That’s why PEP-563 was rolled back in the first place.) |
I'm not breaking backwards compatibility--that's the point of all this. But I'm improving the experience. And if you don't care about 3.9 and before, you can stick to the new improved experience. Looking in the class dict for annotations is terrible, but that's best practice for 3.9 and earlier because of the flawed design of annotations. For 3.10+, best practice to look at annotations is inspect.get_annotations() for all objects. But this isn't best practice for 3.9, because the function didn't exist in 3.9. Also, the paragraph might go on to say, if you have a good reason why you can't call inspect.get_annotations(), best practice no longer requires you to look in the class dict, because of this very issue/PR that is behavior we are changing for 3.10. So, best practices changed in 3.10. And I plan to add a section to the docs somewhere that says "Here are best practices for accessing annotations in 3.10+". As I said, if you think the Python 3.10 docs should have a section about "here are best practices in 3.9", I can add that too, but it'll be a different paragraph. |
It's find to have advice of "do X in 3.9", and "do Y in 3.10+". I think the issue is: if you have code that needs to run in 3.9 and 3.10+, what should you do? There needs to be some advice for that case. |
"It's fine to have advice ..." |
I'm please you folks are as supportive as you are of what I'm doing here, given that you seem a little unsure of the details. I concede that there's a lot going on and it can be hard to keep it all in your head. The point of this issue / PR is to improve the best practices for 3.10 without breaking the best practices for 3.9 and before. Best practice in 3.10 is to use inspect.get_annotations(), and failing that, three-argument getattr(), for all annotated objects. This issue / PR permits that to happen. Best practices for 3.9 differed between different objects. For classes, you had no choice but to look in the class dict because of the inheritance problem. (Either that, or, the elaborate scheme 'attrs' used.) That code will emphatically _still work_ in 3.10. We are not breaking backwards compatibility. So, the best practices in 3.9 and before will still work in 3.10. But the best practices for 3.10+ are improved because of the work we're doing here and in other places. My intent was to document the best practices for 3.10+. If folks think it would be helpful, this same section in the 3.10 docs can also describe best practices for 3.9 and before. |
Aaaaand I just had a realization. Lazy creation of an empty annotations dict, for both classes and modules, will work fine. As stated in my previous comment in this issue, my goal here is to improve best practices in 3.10+, while preserving the unfortunate best practices of 3.9 and before. I don't know why I couldn't see it before, but: adding a getset to modules and user classes, and lazily creating the empty annotations dict if not set, ought to work fine. The getset code will have to store the annotations dict in the class / module dict but that's not a big deal. If you're in 3.10+, and your code looks in the class dict for annotations, and we lazy-create it with a getset and store it in the class dict, then you still see what you expected to see.
In any scenario the old best practices code works fine. If you're in 3.10+, and your code uses cls.__annotations__ or getattr() to get the class dict, and we lazy-create it with a getset and store it in the class dict, then you also get what you expected. If the class or module has annotations, you get them; if it doesn't, it lazy creates an empty dict and stores and returns that. Your code works fine too. Sorry I've been slow on figuring this out--but at least I got there in the end, before beta. And it's good news! |
Hmm. Sorry for the stream-of-consciousness thought process here, but this approach adds wrinkles too. Function objects from the very beginning have lazy-created their annotations dict if it's not set. Which means this works fine: while True:
del fn.__annotations__
print(fn.__annotations__) You can do that all day. The function object will *always* create a new annotations object if it doesn't have one. del fn.__annotations__ always works, and accessing fn.__annotations__ always succeeds. It doesn't retain any state of "I already lazily created one, I shouldn't create another". If I add getsets to classes and modules so they lazy-create annotations on demand if they otherwise aren't set, then either a) they need an extra bit set somewhere that says "I lazily created an empty annotations dict once, I shouldn't do it again", or b) they will duplicate this behavior that functions display. a) would better emulate existing semantics; b) would definitely be a user-visible breaking change. There's actually a test in the Lib/test/test_opcodes (iirc) that explicitly tests deleting __annotations__, then checks that modifying the annotations dict throws an exception. I haven't done any investigating to see if this check was the result of a regression, and there was a bpo issue, and there was a valid use case, etc etc etc. My instinct is that deleting o.__annotations__ is not an important or widely-used use case. In fact I plan to recommend against it in my "best practices" documentation. So either approach should be acceptable. Which means I should go with the simpler one, the one that will duplicate the function object's always-recreate-annotations-dicts behavior. |
Functions don't store __annotations__ in their __dict__, it is a separate slot named func_annotations (see funcobject.c). I guess that's because the __dict__ is purely for user-defined function attributes. But perhaps for classes the C equivalent of this pseudo-code will work? @property
def __annotations__(self):
if "__annotations__" not in self.__dict__:
self.__dict__["__annotations__"] = {}
return self.__dict__["__annotations__"] The whole thing is protected by the GIL of course, so there's no race condition between the check and the assignment. So if you look in __dict__ it will be like it's still Python 3.9, but if you're using the attribute (the recommended approach for code that only cares about 3.10) it'll be as if it always existed. Sounds pretty compatible to me. |
So, honestly I don't understand what your concern with the lazy approach is. Was your design based on having a bit in the class/module object (outside its __dict__) saying "I already lazily created one"? Or am I missing something? Also, I'll stop going on about "best practice". |
I brought up functions because I'm now proposing to make classes and modules behave more like functions in this one way, that they too will now lazy-create an empty annotations dict every time. But yes, function objects don't store __annotations__ in their __dict__, and best practices for 3.9 and before was to use fn.__annotations__ or getattr() when getting the annotations from a function object, and 3.10 will not change that best practice. I don't know specifically why the implementor made that design choice, but I might have done that too. Most function objects don't have a __dict__ so this is almost always cheaper overall. And recreating the dict seems harmless. (Just to round the bases on this topic: I don't think you should be permitted to delete __annotations__, like most other metadata items on functions / classes / modules. But I don't propose to change that for 3.10.)
Yes, exactly. That was the thing I finally figured out this afternoon. Sorry for being a slow learner. Again, this approach will change the semantics around deleting annotations on class and module objects. Deleting them won't be permanent--if you delete one, then ask for it, a fresh one will be created. But that seems harmless.
My concern is that always lazy-creating on demand will change user-visible behavior. Consider this code: class C:
a:int=3
In 3.9, that throws an AttributeError, because C no longer has an '__annotations__' attribute. If I change Python 3.10 so that classes and modules *always* lazy-create __annotations__ if they don't have them, then this code will succeed and print an empty dict. That's a user-visible change, and I was hoping to avoid those entirely. Is it a breaking change? I doubt it. Is it an important change? It doesn't seem like it. I bring it up just in the interests of considering every angle. But I don't think this is important at all, and I think always lazy-creating annotations dicts on classes and modules is the right approach. |
Thanks for your feedback and reviews, everybody! Python just got an eensy teensy tiny bit better. |
Unfortunately commit 2f2b698 has broken the buildbots: ====================================================================== Traceback (most recent call last):
File "D:\buildarea\3.x.ware-win81-release.refleak\build\lib\test\test_module.py", line 338, in test_annotations_are_created_correctly
self.assertTrue("__annotations__" in ann_module4.__dict__)
AssertionError: False is not true Example failure: https://buildbot.python.org/all/#/builders/511/builds/12/steps/4/logs/stdio |
I think that test_annotations_are_created_correctly should use the import_helper.import_fresh_module helper |
Gee whiz! Sorry about that. I swear, it works fine on my machine. I'll incorporate import_helper.import_fresh_module helper into the test as you suggest, and once I get it to work I'll send you a PR. I don't know how to test fixing this failure, though, because again it already works fine on my machine. Why would it fail on Windows? Are there dramatic differences under the covers with respect to the import machinery and caching and such between POSIX and Windows? |
Oh, actually is workse because commenting out the test shows refleaks: 𓋹 ./python.exe -m test test_module -R : == Tests result: FAILURE == 1 test failed: Total duration: 2.0 sec |
It fails on all refleak buildbots. To reproduce: % ./configure --with-pydebug -C && make -j -s |
Eek! I swear I did a refleak check and got a clean bill of health. Again, sorry about this! |
No problem at all! I'm sure we can fix this on time :) Opened #25754 for the refleaks |
You want a separate PR for the refleak fix, or should I roll them both up into one? |
I made a fix for everything in #25754. Could you review it? |
Obviously the original assertion failure ("AssertionError: False is not true") wasn't caused by the refleaks. So I'm still puzzled about why that test worked on POSIX and failed on Windows. I admit I was pulling some wacky import shenanigans in the test, and the import_fresh_module change is an improvement. But still... what was going on?! |
I was able to reproduce it in my MacOS machine (maybe also it reproduced on Linux). The problem is that when you run with -R, the test runs several times and the import statement is a noop the second time because is cached. We needed to modify the test to import the module always |
Ah, I see. So it wasn't a Windows thing, it was a "we run the test multiple times and that particular test assumed it was only run once" thing. And reflink testing is guaranteed to run every test multiple times. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: