Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy-create an empty annotations dict in all unannotated user classes and modules #88067

Closed
larryhastings opened this issue Apr 21, 2021 · 39 comments
Assignees
Labels
3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) release-blocker type-feature A feature request or enhancement

Comments

@larryhastings
Copy link
Contributor

BPO 43901
Nosy @gvanrossum, @larryhastings, @ericvsmith, @gvanrossum, @JelleZijlstra, @pablogsal, @erlend-aasland
PRs
  • bpo-43901: Lazy-create an empty annotations dict in all unannotated user classes and modules #25623
  • bpo-43901: Upgrade test to use cool module reloading tech. #25752
  • bpo-43901: Fix refleaks in test_module #25754
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/larryhastings'
    closed_at = <Date 2021-04-30.16:27:36.976>
    created_at = <Date 2021-04-21.06:44:01.810>
    labels = ['interpreter-core', 'type-feature', '3.10', 'release-blocker']
    title = 'Lazy-create an empty annotations dict in all unannotated user classes and modules'
    updated_at = <Date 2021-04-30.23:27:35.179>
    user = 'https://github.com/larryhastings'

    bugs.python.org fields:

    activity = <Date 2021-04-30.23:27:35.179>
    actor = 'erlendaasland'
    assignee = 'larry'
    closed = True
    closed_date = <Date 2021-04-30.16:27:36.976>
    closer = 'pablogsal'
    components = ['Interpreter Core']
    creation = <Date 2021-04-21.06:44:01.810>
    creator = 'larry'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 43901
    keywords = ['patch']
    message_count = 39.0
    messages = ['391490', '391491', '391495', '391542', '391555', '391556', '391558', '391562', '391572', '391762', '391781', '391802', '391809', '391810', '391812', '391815', '391817', '391819', '391820', '391824', '391825', '392157', '392162', '392373', '392374', '392449', '392450', '392452', '392453', '392454', '392455', '392457', '392458', '392461', '392469', '392471', '392476', '392516', '392542']
    nosy_count = 8.0
    nosy_names = ['gvanrossum', 'larry', 'eric.smith', 'Guido.van.Rossum', 'JelleZijlstra', 'pablogsal', 'erlendaasland', 'kokxxxxik']
    pr_nums = ['25623', '25752', '25754']
    priority = 'release blocker'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue43901'
    versions = ['Python 3.10']

    @larryhastings
    Copy link
    Contributor Author

    The behavior of the __annotations__ member is wildly different between the three objects (function, class, module) that support it. Many of these differences are minor, but one in particular has been an awful wart for years: classes can inherit __annotations__ from their base classes. This is incontestably a misfire of the original design--it is never useful, it is only confusing, and libraries who examine annotations have had to work around it for years.

    I started a thread in January 2021 in which I brought up this and other inconsistencies with respect to __annotations_:

    https://mail.python.org/archives/list/python-dev@python.org/thread/AWKVI3NRCHKPIDPCJYGVLW4HBYTEOQYL/
    

    The reaction was positive: yes, that's a genuine problem, and there's an easy, highly compatible fix that everybody liked: classes should always have an __annotations__ dict set. So that the behavior is consistent, modules should always have one set too.

    This won't be a large change. However, there are a lot of changes flying around with respect to __annotations__ right now. My plan is to get the work ready, then when the dust is mostly-settled I'll get it updated, reviewed, and merged. Definitely before Python 3.10b1.

    Long term, I'd like to get the other changes to annotations I proposed in that thread:

    • del o.__annotations__ always raises an exception.
    • o.__annotations__ is permitted to be None.
    • You can only set o.__annotations__ to either a dict or None.
      But it's probably best to not change everything all at once.

    @larryhastings larryhastings added interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.10 only security fixes type-feature A feature request or enhancement labels Apr 21, 2021
    @larryhastings larryhastings changed the title Add an empty annotations dict to all classes and modules Add an empty annotations dict to all unannotated classes and modules Apr 21, 2021
    @larryhastings larryhastings changed the title Add an empty annotations dict to all classes and modules Add an empty annotations dict to all unannotated classes and modules Apr 21, 2021
    @larryhastings
    Copy link
    Contributor Author

    Huh. The sample code in my thread got goofed up somewhere along the way--maybe it's my fault, maybe it was done to me by some automated process. Anyway, the example demonstrating classes inheriting annotations was meant to be formatted like this:

        class A:
            ax:int=3
        class B(A):
            pass
        print(getattr(B, '__annotations__', {}))

    Maybe some over-smart doodad decided that no paragraph would ever start with spaces, so they helpfully removed them? Hmm.

    @larryhastings
    Copy link
    Contributor Author

    Preliminary patch is 17 lines. Ah well. I must be terrible at this!

    @gvanrossum
    Copy link
    Member

    Sounds like a plan, I agree that the original design misfired here (we were probably worried about million-line code bases with tens of thousands of classes having to pay the price of tens of thousands of empty dicts, but I now think that was an unnecessary worry -- that should be a few megabytes on a very much larger total).

    But given that you're not done yet:

    • Is it possible to create __annotations__ lazily? (IIRC in January we came to a conclusion about this, something like yes for modules but for classes, or the other way around?)

    • Why would __annotations__ ever be None? Why do you allow setting it to None? Do we know of people ever write '__annotations__ = None' in their class or write 'cls.__annotations__ = None'?

    • And where's your PR?

    @larryhastings
    Copy link
    Contributor Author

    • Is it possible to create __annotations__ lazily? (IIRC in January we came to a conclusion about this, something like yes for modules but for classes, or the other way around?)

    Maybe for modules, definitely not for classes. The problem is that best practice for classes is to look in the class dict for __annotations__. That sidesteps the inheritance problem.

    Functions already lazily create a dict __annotations__ if not set. And it's not stored in the function dict. So nobody ever looks in the function dict for __annotations__. That all works fine.

    Would you prefer that *just classes* lazily create __annotations__? It's not hard code to write, but I was being cautious. "always set __annotations__ to an empty dict" is the easiest code to write and get correct. And since modules are not all that numerous even in the largest projects, it seemed like this un-optimized approach would have a minimal contribution to the heat death of the universe.

    It's also remotely possible that someone out there does look in the module dict for __annotations__, though I admit I've never seen it. It seems like it'd be tempting to write your code that way though:

        if isinstance(o, (type, types.ModuleType)):
            ann = o.__dict__.get("__annotations__", None)
        elif callable(o):
            ann = o.__annotations__
        else:
            raise ValueError(f"{o!r} doesn't support annotations")
    • Why would __annotations__ ever be None?

    Simply because it's the cheapest sensible way to indicate "this object has no annotations". It would be nice if the world accepted __annotations__ being None to mean "no annotations". But I don't think we're there.

    I note that the function object has a special setter for __annotations__ (func_set_annotations()), since at least Python 3.1, which explicitly only allows setting __annotations__ to either a dict or None. (I didn't have 3.0 handy.)

    Why do you allow setting it to None?

    Me? I've never contributed code to Python that restricts the permissible types of values one is allowed to set on __annotations__.

    Function objects are opinionated as mentioned (either dict or None), classes and modules have no opinion whatsoever, and allow you to set __annotations__ to any value. That was all true long before I started working on annotations.

    Do we know of people ever write '__annotations__ = None' in their class or write 'cls.__annotations__ = None'?

    AFAIK I've never seen anyone set __annotations__ to None.

    • And where's your PR?

    Not done yet. I only started it last night, and there were a lot of test failures--it turns out, a *lot* of regression tests do a sanity check that __annotations__ isn't set on classes (and maybe modules too). For example, I was surprised that test_opcodes has two such test failures.

    @larryhastings
    Copy link
    Contributor Author

    By the way, here's a tidbit I never got around to posting in c.l.p-d.

    I noted in the conversation in January that attrs is an outlier here: it *doesn't* look in the class dict for __annotations__. Instead, it has some complicated code where it asks the class for its annotations, then iterates over the __mro__ and asks every base class for *its* annotations. If the class and a base class have the same class dict (using the "is" operator iirc) then attrs says "oh, that class doesn't have its own annotations, it's inheriting them" and reacts appropriately.

    I emailed Hynek to ask him why he did it that way. It turns out, he did that because at the time he didn't know you could peek in the class dict for __annotations__. He literally had a todo item saying "change to looking in the class dict".

    @larryhastings
    Copy link
    Contributor Author

    It occurs to me that part of this work should also be a new "best practices for __annotations__" entry in the Python docs.

    Best practices for working with annotations, for code that requires a minimum Python version of 3.10+:

    Best practice is to call either inspect.get_annotations() or typing.get_type_hints() to access annotations. But if you want to access '__annotations__' on an object directly:

    • Always access the annotations on an object using the attribute interface, either o.__annotations__ or getattr(o, '__annotations__'). Accessing '__annotations__' through other mechanisms (e.g. looking in the class dict) is unsupported.
    • Assume that o.__annotations__ is always either a dict or None.
    • It's best to not assign to o.__annotations__. But if you must, always set it to either a dict or None.
    • Never delete o.__annotations__.
    • Never modify o.__annotations__.

    @larryhastings
    Copy link
    Contributor Author

    • Never modify o.__annotations__.

    Let me revise that to:

    • If o.__annotations__ is a dict, never modify the contents of that dict.

    @gvanrossum
    Copy link
    Member

    I’d say that best practices for 3.9+ are more useful.

    There seem to be several contradictions in your remarks, but we’ll get to those in the review.

    @larryhastings
    Copy link
    Contributor Author

    I’d say that best practices for 3.9+ are more useful.

    My point in writing this up was that the best practices change as of 3.10. So, I could add a section to the Python 3.10 documentation giving best practices for 3.10+ and 3.9-. But 3.9 and 3.10 have different best practices.

    @gvanrossum
    Copy link
    Member

    And that’s a problem. There needs to be a recommendation on what to do for code that spans 3.9 and 3.10. What should users do otherwise? Drop 3.9 as soon as they introduce 3.10 support? Withhold 3.10 support until 3.9 reaches EOL?

    IOW you can’t just break backward compatibility. (That’s why PEP-563 was rolled back in the first place.)

    @larryhastings
    Copy link
    Contributor Author

    I'm not breaking backwards compatibility--that's the point of all this. But I'm improving the experience. And if you don't care about 3.9 and before, you can stick to the new improved experience.

    Looking in the class dict for annotations is terrible, but that's best practice for 3.9 and earlier because of the flawed design of annotations.

    For 3.10+, best practice to look at annotations is inspect.get_annotations() for all objects. But this isn't best practice for 3.9, because the function didn't exist in 3.9. Also, the paragraph might go on to say, if you have a good reason why you can't call inspect.get_annotations(), best practice no longer requires you to look in the class dict, because of this very issue/PR that is behavior we are changing for 3.10.

    So, best practices changed in 3.10. And I plan to add a section to the docs somewhere that says "Here are best practices for accessing annotations in 3.10+". As I said, if you think the Python 3.10 docs should have a section about "here are best practices in 3.9", I can add that too, but it'll be a different paragraph.

    @ericvsmith
    Copy link
    Member

    It's find to have advice of "do X in 3.9", and "do Y in 3.10+". I think the issue is: if you have code that needs to run in 3.9 and 3.10+, what should you do? There needs to be some advice for that case.

    @ericvsmith
    Copy link
    Member

    "It's fine to have advice ..."

    @larryhastings
    Copy link
    Contributor Author

    I'm please you folks are as supportive as you are of what I'm doing here, given that you seem a little unsure of the details. I concede that there's a lot going on and it can be hard to keep it all in your head.

    The point of this issue / PR is to improve the best practices for 3.10 without breaking the best practices for 3.9 and before.

    Best practice in 3.10 is to use inspect.get_annotations(), and failing that, three-argument getattr(), for all annotated objects. This issue / PR permits that to happen.

    Best practices for 3.9 differed between different objects. For classes, you had no choice but to look in the class dict because of the inheritance problem. (Either that, or, the elaborate scheme 'attrs' used.) That code will emphatically _still work_ in 3.10. We are not breaking backwards compatibility.

    So, the best practices in 3.9 and before will still work in 3.10. But the best practices for 3.10+ are improved because of the work we're doing here and in other places.

    My intent was to document the best practices for 3.10+. If folks think it would be helpful, this same section in the 3.10 docs can also describe best practices for 3.9 and before.

    @larryhastings
    Copy link
    Contributor Author

    Aaaaand I just had a realization. Lazy creation of an empty annotations dict, for both classes and modules, will work fine.

    As stated in my previous comment in this issue, my goal here is to improve best practices in 3.10+, while preserving the unfortunate best practices of 3.9 and before. I don't know why I couldn't see it before, but: adding a getset to modules and user classes, and lazily creating the empty annotations dict if not set, ought to work fine. The getset code will have to store the annotations dict in the class / module dict but that's not a big deal.

    If you're in 3.10+, and your code looks in the class dict for annotations, and we lazy-create it with a getset and store it in the class dict, then you still see what you expected to see.

    • If the class has annotations, they'll be in the class dict.
    • If the class has no annotations, but someone accessed
      cls.__annotations__, the empty dict will be lazily created
      and you'll see it.
    • If the class doesn't have annotations, then the class dict
      won't have an '__annotations__' key, which is what you were expecting.

    In any scenario the old best practices code works fine.

    If you're in 3.10+, and your code uses cls.__annotations__ or getattr() to get the class dict, and we lazy-create it with a getset and store it in the class dict, then you also get what you expected. If the class or module has annotations, you get them; if it doesn't, it lazy creates an empty dict and stores and returns that. Your code works fine too.

    Sorry I've been slow on figuring this out--but at least I got there in the end, before beta. And it's good news!

    @larryhastings larryhastings changed the title Add an empty annotations dict to all unannotated classes and modules Lazy-create an empty annotations dict in all unannotated user classes and modules Apr 24, 2021
    @larryhastings larryhastings changed the title Add an empty annotations dict to all unannotated classes and modules Lazy-create an empty annotations dict in all unannotated user classes and modules Apr 24, 2021
    @larryhastings
    Copy link
    Contributor Author

    Hmm. Sorry for the stream-of-consciousness thought process here, but this approach adds wrinkles too.

    Function objects from the very beginning have lazy-created their annotations dict if it's not set. Which means this works fine:

        while True:
            del fn.__annotations__
            print(fn.__annotations__)

    You can do that all day. The function object will *always* create a new annotations object if it doesn't have one. del fn.__annotations__ always works, and accessing fn.__annotations__ always succeeds. It doesn't retain any state of "I already lazily created one, I shouldn't create another".

    If I add getsets to classes and modules so they lazy-create annotations on demand if they otherwise aren't set, then either a) they need an extra bit set somewhere that says "I lazily created an empty annotations dict once, I shouldn't do it again", or b) they will duplicate this behavior that functions display.

    a) would better emulate existing semantics; b) would definitely be a user-visible breaking change. There's actually a test in the Lib/test/test_opcodes (iirc) that explicitly tests deleting __annotations__, then checks that modifying the annotations dict throws an exception. I haven't done any investigating to see if this check was the result of a regression, and there was a bpo issue, and there was a valid use case, etc etc etc.

    My instinct is that deleting o.__annotations__ is not an important or widely-used use case. In fact I plan to recommend against it in my "best practices" documentation. So either approach should be acceptable. Which means I should go with the simpler one, the one that will duplicate the function object's always-recreate-annotations-dicts behavior.

    @gvanrossum
    Copy link
    Member

    Functions don't store __annotations__ in their __dict__, it is a separate slot named func_annotations (see funcobject.c). I guess that's because the __dict__ is purely for user-defined function attributes.

    But perhaps for classes the C equivalent of this pseudo-code will work?

    @property
    def __annotations__(self):
        if "__annotations__" not in self.__dict__:
            self.__dict__["__annotations__"] = {}
        return self.__dict__["__annotations__"]

    The whole thing is protected by the GIL of course, so there's no race condition between the check and the assignment.

    So if you look in __dict__ it will be like it's still Python 3.9, but if you're using the attribute (the recommended approach for code that only cares about 3.10) it'll be as if it always existed. Sounds pretty compatible to me.

    @gvanrossum
    Copy link
    Member

    So, honestly I don't understand what your concern with the lazy approach is. Was your design based on having a bit in the class/module object (outside its __dict__) saying "I already lazily created one"? Or am I missing something?

    Also, I'll stop going on about "best practice".

    @larryhastings
    Copy link
    Contributor Author

    Functions don't store __annotations__ in their __dict__, it is a
    separate slot named func_annotations (see funcobject.c). I guess
    that's because the __dict__ is purely for user-defined function
    attributes.

    I brought up functions because I'm now proposing to make classes and modules behave more like functions in this one way, that they too will now lazy-create an empty annotations dict every time. But yes, function objects don't store __annotations__ in their __dict__, and best practices for 3.9 and before was to use fn.__annotations__ or getattr() when getting the annotations from a function object, and 3.10 will not change that best practice.

    I don't know specifically why the implementor made that design choice, but I might have done that too. Most function objects don't have a __dict__ so this is almost always cheaper overall. And recreating the dict seems harmless.

    (Just to round the bases on this topic: I don't think you should be permitted to delete __annotations__, like most other metadata items on functions / classes / modules. But I don't propose to change that for 3.10.)

    So if you look in __dict__ it will be like it's still Python 3.9,
    but if you're using the attribute (the recommended approach for code
    that only cares about 3.10) it'll be as if it always existed. Sounds
    pretty compatible to me.

    Yes, exactly. That was the thing I finally figured out this afternoon. Sorry for being a slow learner.

    Again, this approach will change the semantics around deleting annotations on class and module objects. Deleting them won't be permanent--if you delete one, then ask for it, a fresh one will be created. But that seems harmless.

    So, honestly I don't understand what your concern with the lazy
    approach is. Was your design based on having a bit in the
    class/module object (outside its __dict__) saying "I already
    lazily created one"? Or am I missing something?

    My concern is that always lazy-creating on demand will change user-visible behavior. Consider this code:

        class C:
            a:int=3
    del C.__annotations__
    print(C.__annotations__)
    

    In 3.9, that throws an AttributeError, because C no longer has an '__annotations__' attribute. If I change Python 3.10 so that classes and modules *always* lazy-create __annotations__ if they don't have them, then this code will succeed and print an empty dict.

    That's a user-visible change, and I was hoping to avoid those entirely. Is it a breaking change? I doubt it. Is it an important change? It doesn't seem like it. I bring it up just in the interests of considering every angle. But I don't think this is important at all, and I think always lazy-creating annotations dicts on classes and modules is the right approach.

    @larryhastings
    Copy link
    Contributor Author

    New changeset 2f2b698 by larryhastings in branch 'master':
    bpo-43901: Lazy-create an empty annotations dict in all unannotated user classes and modules (bpo-25623)
    2f2b698

    @larryhastings
    Copy link
    Contributor Author

    Thanks for your feedback and reviews, everybody! Python just got an eensy teensy tiny bit better.

    @larryhastings larryhastings self-assigned this Apr 30, 2021
    @larryhastings larryhastings self-assigned this Apr 30, 2021
    @pablogsal
    Copy link
    Member

    Unfortunately commit 2f2b698 has broken the buildbots:

    ======================================================================
    FAIL: test_annotations_are_created_correctly (test.test_module.ModuleTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "D:\buildarea\3.x.ware-win81-release.refleak\build\lib\test\test_module.py", line 338, in test_annotations_are_created_correctly
        self.assertTrue("__annotations__" in ann_module4.__dict__)
    AssertionError: False is not true

    Example failure:

    https://buildbot.python.org/all/#/builders/511/builds/12/steps/4/logs/stdio

    @pablogsal
    Copy link
    Member

    I think that test_annotations_are_created_correctly should use the import_helper.import_fresh_module helper

    @larryhastings
    Copy link
    Contributor Author

    Gee whiz! Sorry about that. I swear, it works fine on my machine.

    I'll incorporate import_helper.import_fresh_module helper into the test as you suggest, and once I get it to work I'll send you a PR. I don't know how to test fixing this failure, though, because again it already works fine on my machine.

    Why would it fail on Windows? Are there dramatic differences under the covers with respect to the import machinery and caching and such between POSIX and Windows?

    @pablogsal
    Copy link
    Member

    Oh, actually is workse because commenting out the test shows refleaks:

    𓋹 ./python.exe -m test test_module -R :
    0:00:00 load avg: 5.03 Run tests sequentially
    0:00:00 load avg: 5.03 [1/1] test_module
    beginning 9 repetitions
    123456789
    .........
    test_module leaked [47, 47, 47, 47] references, sum=188
    test_module leaked [6, 6, 6, 6] memory blocks, sum=24
    test_module failed

    == Tests result: FAILURE ==

    1 test failed:
    test_module

    Total duration: 2.0 sec

    @pablogsal
    Copy link
    Member

    > Why would it fail on Windows?

    It fails on all refleak buildbots. To reproduce:

    % ./configure --with-pydebug -C && make -j -s
    % ./python.exe -m test test_module -R :

    @larryhastings
    Copy link
    Contributor Author

    Eek! I swear I did a refleak check and got a clean bill of health. Again, sorry about this!

    @pablogsal
    Copy link
    Member

    Eek! I swear I did a refleak check and got a clean bill of health. Again, sorry about this!

    No problem at all! I'm sure we can fix this on time :)

    Opened #25754 for the refleaks

    @larryhastings
    Copy link
    Contributor Author

    You want a separate PR for the refleak fix, or should I roll them both up into one?

    @pablogsal
    Copy link
    Member

    I made a fix for everything in #25754. Could you review it?

    @pablogsal
    Copy link
    Member

    New changeset e374a40 by Pablo Galindo in branch 'master':
    bpo-43901: Fix refleaks in test_module (GH-25754)
    e374a40

    @larryhastings
    Copy link
    Contributor Author

    Obviously the original assertion failure ("AssertionError: False is not true") wasn't caused by the refleaks. So I'm still puzzled about why that test worked on POSIX and failed on Windows. I admit I was pulling some wacky import shenanigans in the test, and the import_fresh_module change is an improvement. But still... what was going on?!

    @pablogsal
    Copy link
    Member

    So I'm still puzzled about why that test worked on POSIX and failed on Windows

    I was able to reproduce it in my MacOS machine (maybe also it reproduced on Linux). The problem is that when you run with -R, the test runs several times and the import statement is a noop the second time because is cached. We needed to modify the test to import the module always

    @larryhastings
    Copy link
    Contributor Author

    Ah, I see. So it wasn't a Windows thing, it was a "we run the test multiple times and that particular test assumed it was only run once" thing. And reflink testing is guaranteed to run every test multiple times.

    @erlend-aasland
    Copy link
    Contributor

    test_grammar also needed a fix. It has been updated to use import_helper.import_fresh_module in bpo-43995 (GH-25764).

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) release-blocker type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants