Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-123339: fix out-of-bounds values in inspect.findsource #123347

Closed
wants to merge 3 commits into from

Conversation

picnixz
Copy link
Contributor

@picnixz picnixz commented Aug 26, 2024

@picnixz
Copy link
Contributor Author

picnixz commented Aug 26, 2024

I don't really know how I can test this. I'll need to think about tomorrow.

@picnixz
Copy link
Contributor Author

picnixz commented Aug 27, 2024

I'm requesting a review from @gaogaotiantian since you were the author of #117025.

@gaogaotiantian
Copy link
Member

Hmm, why does it give a line number that's out of range? Is there something we should fix deeper?

@picnixz
Copy link
Contributor Author

picnixz commented Aug 27, 2024

This happens on some classes in collections.abc, e.g.

import collections.abc
import inspect

inspect.getsource(collections.abc.AsyncGenerator)   # raises

I believe that this might be because of _collections_abc being the module that actually defines it. In the function, you do:

file = inspect.getsourcefile(collections.abc.AsyncGenerator)

but inspect.getsourcefile(collections.abc.AsyncGenerator) gives back collections/abc.py which only contains

from _collections_abc import *
from _collections_abc import __all__  # noqa: F401
from _collections_abc import _CallableGenericAlias  # noqa: F401

And in _collections_abc, actually, we have

# This module has been renamed from collections.abc to _collections_abc to
# speed up interpreter startup. Some of the types such as MutableMapping are
# required early but collections module imports a lot of other modules.
# See issue #19218
__name__ = "collections.abc"

I think the issue is just that getsourcefile is recovering the wrong file because of:

    if isclass(object):
        if hasattr(object, '__module__'):
            module = sys.modules.get(object.__module__)

in inspect.getfile (collections.abc.AsyncGenerator.__module__ == 'collections.abc' and not _collections_abc, so you get the incorrect source file).

@gaogaotiantian
Copy link
Member

Okay I guess that's the deeper issue I mentioned earlier. I'm not saying we should not protect the out of index issue in inspect.findsource, but getting the wrong source file is a problem and it should be fixed. Otherwise you can imagine that we are getting a line number that's wrong, but within the range, and we will give a random line from a wrong file - that's also pretty bad.

@picnixz
Copy link
Contributor Author

picnixz commented Aug 27, 2024

Yes, that's something I also wondered but since there's a similar check for code objects (which could be changed!), then I thought it was (for now) easier to protect it. I don't think this check can do worse than what we currently do but we definitely need to address the issue of getting the right file...

@gaogaotiantian
Copy link
Member

Okay I took a look at the check for code object and realized that was done by me :). I thought about that and the check is inherited from some very very old code (20 years old I believe), where a regex was used to check the definition and it's possible that we missed something.

However, with the current implementation, I don't believe that check was needed. It's also not exactly trivial to remove that because - well removing “useless” code could cause issues. And that's why I'm a bit hesitated to add the check - it might not be as easy to remove it.

I'm not entirely convinced that adding this "sanity check" is just innocent, because it will introduce two different behaviors for the same root issue - like I mentioned above. OSError for now meant "we can't find the source", not "we found a source that seems to be wrong".

For a relatively lower level library, I was hoping that we can provide a behavior that as consistent as possible (others might disagree with me). Now for this function, we should return the sources that we can retrieve and the line number - it's impossible for us to know if they match.

I think we should remove the check for code object too to achieve a consistent behavior. However, I might be wrong. For this specific issue though, I believe locating the correct source file is the way to go, this PR is not a fix.

@serhiy-storchaka has his experience with inspect module so maybe we can hear from him?

@picnixz
Copy link
Contributor Author

picnixz commented Aug 28, 2024

For this specific issue though, I believe locating the correct source file is the way to go, this PR is not a fix.

I'll see what I can do. But, if people set __firstlineno__ on their custom class, then we'll also retrieve an incorrect number... Should we prevent this to happen? On the other hand, people might modify code objects. We can do it publicly since we allow them to replace values using __code__.replace(...), so I wouldn't be surprised that they could also change the co_firstlineno value, in which case the lines and the line number could be inconsistent. So the check you had is (I think) still relevant.

@picnixz
Copy link
Contributor Author

picnixz commented Sep 3, 2024

Super-seeded by #123613

@picnixz picnixz closed this Sep 3, 2024
@picnixz picnixz deleted the fix-findsource-oob-123339 branch September 3, 2024 08:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants