Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better support for type checking #19

Open
encukou opened this issue Oct 12, 2021 · 22 comments
Open

Better support for type checking #19

encukou opened this issue Oct 12, 2021 · 22 comments

Comments

@encukou
Copy link
Owner

encukou commented Oct 12, 2021

As PEP 630 notes, there's no good equivalent for Py*_Check functions (like for PyUnicode_Check for str a static type).

These checks are used to ensure instances have compatible memory layout (and thus, whether it's safe to cast from PyObject to a specific type).
We can assume that types made from the same spec share the memory layout, so, I could imagine API for checking if any superclass was defined using a given PyType_Spec.

This means the API wouldn't be usable without access to that original spec (in particular, it wouldn't be usable with classes created completely "on-demand").

The spec would also need to be stored on the class for this comparison. Since specs for "on-demand" classes can be deallocated right after the class is created, this would become a dangling pointer.
By using the new typechecking API, the user has a pointer to the original spec. Sadly, AFAIK, comparing a live pointer to a dangling pointer is implementation-defined, so we can't use it.
So, there probably needs to be a new flag for saying that the spec will outlive the type (e.g. by being static), which would be required for the type check.

@seberg
Copy link

seberg commented Oct 12, 2021

I suppose this is mainly about fast subclass checking. Currently some types do not have a fast path:

#define PyComplex_Check(op) PyObject_TypeCheck(op, &PyComplex_Type)

while many of the built-ins have the luxury of using the TP-flags for super-fast subclass checking?

Personally, I am still confused about the idea because it seems to effectively introduce the spec as a "virtual" superclass. Normally, I have PyObject_TypeCheck(op, MyType) (which is equivalent to isinstance if I know that MyType does not have __issubclass__). A spec has no Python or other "concept" behind it, so it feels confusing to use it for logic. Couldn't you use a superclass flag to indicate that the superclass should be preferential for a "fast-check" slot?
All of it feels a bit tricky though, since inheritance could mean that we have two Specs/SuperTypes to fast-check for? And that would only work if we store both or reject it (but rejecting it could be problematic from an API evolution standpoint: A baseclass could break a subclass by starting to use the flag).

In any case, this is probably all more brainstorming and a side issue, so I don't want to waste much time discussing it. Thanks for working on improving the C-API, it is much appreciated!

@encukou
Copy link
Owner Author

encukou commented Oct 12, 2021

Oh, important context: this is mainly for cases where it's not easy to get the superclass object.
Per PEP 630 that pointer should be part of module state (and there may be more module obects, each exposing a distinct class object, which however all have the same memory layout). And despite PEP 573 the module state is not always easy to get (e.g. in slot methods like tp_add, where you frequently want to typecheck).

@erlend-aasland
Copy link

erlend-aasland commented Oct 12, 2021

If I understand you correctly, you're suggesting something like this:

  • store PyType_Spec * in struct _typeobject (new member)
  • add something a la PyModule_GetDef for types: PyType_GetSpec
  • introduce a non-inheritable Py_TPFLAGS_HAVE_STATIC_SPEC to make it possible to mark types that have a static spec
  • add a PyType_IsSpec API

@encukou
Copy link
Owner Author

encukou commented Oct 12, 2021

Well, I wasn't planning to add PyType_GetSpec and Py_TPFLAGS_HAVE_STATIC_SPEC, if possible :)

I'd like to do some more research into comparing a live pointer to a dangling pointer. If this would be possible without an additional type flag (just by relying on the fact that the user has a valid pointer to the original spec), PyType_GetSpec would be dangerous (so I'd leave it out) but the rest of the change would be much more elegant.

@erlend-aasland
Copy link

So where are you suggesting to store the spec pointer, then? :) As a slot?

@seberg
Copy link

seberg commented Oct 12, 2021

comparing a live pointer to a dangling pointer

But you would still need to know that the same pointer could not point to a different spec later (by coincidence)? And by that time, I am not sure it matters anymore whether or not you require it to be a life pointer?

But, at least I understand the point now: We need something that "bypasses" the issue of global state for sub-interpreters, which indeed is a worthy objective. But, I still feel you may be overcomplicating things. Why not ask the user to provide a/the unique symbol instead and document it as: If you define your spec static, just use the spec here!

@erlend-aasland
Copy link

erlend-aasland commented Oct 12, 2021

But you would still need to know that the same pointer could not point to a different spec later (by coincidence)?

I recon setting the spec pointer would be done implicitly in PyType_FromModuleAndSpec (or a similar place), and it is impossible to change the spec of a type, once created, so I think one can safely assume that the spec pointer will not change as long as the type object is alive.

However, we need to know if we can "trust" the pointer; that is, does it point to a heap allocated spec (can be a dangling pointer), or a static spec (can never be a dangling pointer). One solution to that is to use a non-inheritable type flag.

@encukou
Copy link
Owner Author

encukou commented Oct 13, 2021

But, at least I understand the point now: We need something that "bypasses" the issue of global state for sub-interpreters, which indeed is a worthy objective. But, I still feel you may be overcomplicating things. Why not ask the user to provide a/the unique symbol instead and document it as: If you define your spec static, just use the spec here!

Aha! Right, that would work nicely. (Having to put a pointer to the spec in the spec won't be very ergonomic, but we can live with that).
Thanks, your perspective helps :)

@neonene
Copy link

neonene commented Apr 22, 2024

Having to put a pointer to the spec in the spec

Like the following?

PyType_Spec foo_spec = {
    ...
    .static_spec = &foo_spec  // ask a user to ensure it's a static spec
};

If so, Py_TPFLAGS_HAVE_STATIC_SPEC also needs to be introduced to invalidate the unused (uninitialized) member in a non-static local spec? The tp_flags appears to have no vacancy, though.

If you mean a type slot, I have tried introducing a slot ID: python/cpython#118139. This way enforces a forward declaration of the spec.

@encukou
Copy link
Owner Author

encukou commented Apr 22, 2024

Yes, a slot like that. But I think it needs a PEP-like (though maybe much smaller) proposal, and approval from the C-API WG. Would you be willing to write up something like that?

As said above, this doesn't necessarily need to be a pointer to the spec itself -- we only need a pointer that will

  • outlive the class, so it's not reused for something else while the class exists, and
  • “belongs” to the extension, so it doesn't clash with other extensions.

For example, an extensions that automatically wraps C++ classes could put the typeid here.
(Of course you then can't access the PyType_Spec members; the only operation allowed on the pointer would be comparing it to another such “token”.)

@neonene
Copy link

neonene commented Apr 22, 2024

  • outlive the class, so it's not reused for something else while the class exists
  • “belongs” to the extension, so it doesn't clash with other extensions.

Does it also mean that a user-defined unique number in the module will also work fine, if the type has the PyModuleDef pointer. Then, PyType_GetBaseByModuleDefAndID() ?

@encukou
Copy link
Owner Author

encukou commented Apr 23, 2024

Hmm. Would that work for the StgInfo case, where the module might be being deallocated?

Something like this was requested by the JPype maintainer; that discussion reminds me that things might be easier in modules that use a metaclass. I commented on the StgInfo issue.

@neonene
Copy link

neonene commented Apr 23, 2024

Hmm. Would that work for the StgInfo case, where the module might be being deallocated?

PyModuleDef * should not be a dangling pointer unless the DLL is reloaded unloaded. And all interpreters shares the same adress. I'm not aware of the OSes that reload the DLL everytime in CPython.

>_testembed_d test_repeated_init_exec "import ctypes"
--- Loop #1 ---
exec: module: 0000000001EC6AB0, _ctypesmodule(def): 000007FEEC389000
free: module: 0000000001EC6AB0, _ctypesmodule(def): 000007FEEC389000
--- Loop #2 ---
exec: module: 0000000002005A30, _ctypesmodule(def): 000007FEEC389000
free: module: 0000000002005A30, _ctypesmodule(def): 000007FEEC389000
--- Loop #3 ---
exec: module: 00000000020059D0, _ctypesmodule(def): 000007FEEC389000
free: module: 00000000020059D0, _ctypesmodule(def): 000007FEEC389000
--- Loop #4 ---
exec: module: 00000000020059D0, _ctypesmodule(def): 000007FEEC389000
free: module: 00000000020059D0, _ctypesmodule(def): 000007FEEC389000

EDIT: Draft version of _ctypes is used.

@neonene
Copy link

neonene commented Apr 23, 2024

In my understanding, the pointer to the statically allocated spec should also become invalid after the DLL unload.

@neonene
Copy link

neonene commented Apr 23, 2024

Would that work for the StgInfo case, where the module might be being deallocated?

Ah, I agree that the existing PyModule_GetDef(ht_module) approach cannot be used. I mean adding a new ht_module_def member to the type, and possibly the classes created on-demand have a chance to use a non-pointer ID.

@encukou
Copy link
Owner Author

encukou commented Apr 25, 2024

We don't unload DLLs. There are no plans to do that; even if there were, we definitely wouldn't want to unload them if their classes are still alive, or if code that references their statics could still run.

@mathstuf
Copy link

There are no plans to do that; even if there were

…it would be highly platform-specific. musl's dlclose is a no-op and macOS doesn't allow unloading any library that has touched thread-local storage.

@neonene

This comment was marked as outdated.

@encukou
Copy link
Owner Author

encukou commented Apr 26, 2024

Thanks.
I don't think there's enough time to get it into 3.13; the feature freeze is in a week and half, and in that time I (and most other core devs) will focus on features that should make it in. (If you have any, please ping me on PRs I should look at first -- there's a lot to juggle.)

I should get to the draft in around 2 weeks.

@neonene

This comment was marked as resolved.

@neonene
Copy link

neonene commented Jun 12, 2024

Leaving two approaches we have come up with:

PyType_Slot foo_slots[] = {
    {Py_tp_token, &pointee_in_the_module},  // spec, typeid, ...
    ...
};
PyType_Spec foo_spec = { ..., .slots = foo_slots};
PyType_Spec bar_spec = { ..., .slots = foo_slots};

(a)

  • {Py_tp_token, NULL}:
    Token will be the pointer to the assosiated type spec rather than NULL. Py_USE_SPEC identifier may be defined for the NULL.

(b)

  • {Py_tp_token, NULL}:
    Equivalent to the absence of the slot.

  • {Py_tp_token, foo_slots}:
    Token will be the pointer to the assosiated type spec rather than foo_slots.

I would prefer (b) if possible when creating a spec dynamically (not just copy & paste).

@neonene
Copy link

neonene commented Jun 12, 2024

Thank you for reviewing my draft. I'm trying Discourse.

https://discuss.python.org/t/allowing-heaptypes-to-have-a-token-for-superclass-identification/55598

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants