Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define guideline for parameter types: PyObject vs specific type #29

Open
vstinner opened this issue Oct 13, 2023 · 10 comments
Open

Define guideline for parameter types: PyObject vs specific type #29

vstinner opened this issue Oct 13, 2023 · 10 comments
Labels
guideline To be included in guidelines PEP

Comments

@vstinner
Copy link
Contributor

See python/devguide#1127 issue.

See also capi-workgroup/problems#31 issue.

@encukou
Copy link
Contributor

encukou commented Oct 17, 2023

IMO, we should continue to use PyObject* to keep the current API consistent.

XXX: To me, PyTypeObject is the type that needs the most casting between PyObject* and the concrete type. Should we allow PyTypeObject*?

In a future API layer I'd love to switch to concrete types, but that's for the revolution repo: capi-workgroup/api-revolution#3

@encukou encukou added the guideline To be included in guidelines PEP label Oct 17, 2023
@zooba
Copy link

zooba commented Oct 17, 2023

Should we allow PyTypeObject*?

I think this is only because of static type definitions? If we'd had heap types from the start, we'd likely have made them PyObject* and made the caller responsible for handling them. (I'm personally quite a fan of "you instantiate a type with PyObject_Call".)

@vstinner
Copy link
Contributor Author

vstinner commented Oct 17, 2023

While I know that it's not shared opinion, but in the long term, I would prefer to only use one type for all object arguments: PyObject*, and make this type opaque. No more PyDictObject, PyTypeObject, or anything else. Just PyObject.

Issues with using other types was discussed in PR python/cpython#106005.

@gvanrossum
Copy link
Contributor

Since we're supposed to decide this in this tracker, let's use this issue to debate it instead of linking to other issues (we should considered the "problems" repo closed at this point -- we will soon be able to read all about it in PEP 733).

I agree that in the evolutionary development of the C API we should stick to PyObject * as much as possible. I also agree that possibly PyTypeObject * may have to be a historical exception.

In the radical department we could possibly do this differently, but I personally would prefer not to -- however, I'll debate that when we get to it in capi-workgroup/api-revolution#3.

@encukou
Copy link
Contributor

encukou commented Oct 18, 2023

I think this is only because of static type definitions?

No, it's also lots of functions like PyType_HasFeature, PyType_IsSubtype, PyType_GetName, PyDescr_NewMethod, even the new PyType_GetDict.

I'm fine sticking to PyObject * in new evolutionary API, just to make things consistent.

What does this mean for type checking? Should API assert that the correct type was passed in? Or should it raise a runtime TypeError? (That would propably mean performance-sensitive code will grow internal API that skips the check -- that is what capi-workgroup/api-revolution#3 aims to avoid.)

@vstinner
Copy link
Contributor Author

Here I'm talking about the hypothetical case of "opaque types" just to explain my point, but I'm not really arguing to move towards that or not here. It's just to consider all cases.

I agree that in the evolutionary development of the C API we should stick to PyObject * as much as possible. I also agree that possibly PyTypeObject * may have to be a historical exception.

On the API side, we can keep PyTypeObject* type even if everything become opaque outside Python. We can keep all types, maybe just make all aliases to a single unique opaque PyObject type. Like typedef PyObject PyTypeObject;. This is not exactly an alias but a different type, no? I'm not sure.

In C, it's perfectly fine to define a type with an undefined structure: typedef struct PyTypeObject PyTypeObject; without having to declare struct PyTypeObject (members).

My argument here is more the desire for the long term to remove members from structures, but keep type names. Example: using PyTypeObject is fine, but in the long term, type->tp_name should become a compiler error.

Well, making PyTypeObject opaque will be a long journey, I started to take notes: C API: Investigate how the PyTypeObject members can be removed from the public C API. The good part is that PyTypeObject members are already excluded from the limited C API!

If we change PyAPI_FUNC(PyObject *) PyType_GetQualName(PyTypeObject *); declaration with PyAPI_FUNC(PyObject *) PyType_GetQualName(PyObject *);: change parameter type to PyObject*, it will introduce compiler warnings. I'm not sure of the value of such change. Maybe tomorrow, PyTypeObject will just be an alias to an opaque PyObject type.


So I think that we should consider multiple things:

  • Leave existing API unchanged to not introduce new compiler warnings
  • Consider the case of maybe making types (such as PyObject and PyTypeObject) opaque in the future
  • When adding new API, try to keep a kind of consistency with existing APIs to not surprise too much users

The consistency concern was discussed when adding a new PyDict_GetItemRef() API. Mark Shannon asked to use PyDictObject* type, whereas others preferred to keep PyObject*. At the end, the consistency was chosen: consistency with other PyDict functions which take PyObject* and not PyDictObject*. Example: PyAPI_FUNC(PyObject *) PyDict_GetItem(PyObject *mp, PyObject *key);.

@vstinner
Copy link
Contributor Author

Another data point: PEP 670 – Convert macros to functions in the Python C API was approved with Avoid the cast in the limited C API version 3.11:

The casts will be excluded from the limited C API version 3.11 and newer. When an API user opts into the new limited API, they must pass the expected type or perform the cast.

It means that APIs which take PyObject* such as Py_INCREF() no longer accept types other than PyObject* and the caller is responsible to do the cast.

As @encukou wrote, I think that in practice, PyTypeObject* is the most common type different than PyObject* which is widely used in C extensions. As @zooba noted, there are already inconsistencies when using PyTypeObject*: PyObject_Call() can be used to create a type, but it returns PyObject*. Then PyType functions expect a PyTypeObject* parameter. But many Python C API expect a PyObject again, such as Py_INCREF().

In the non-limited API, there are multiple APIs which accept "any type" (such as PyTypeObject*) and then transform the result to a PyObject*. Example: Py_NewRef().

Example from Python code base:

obj->lru_list_elem_type = (PyTypeObject*)Py_NewRef(state->lru_list_elem_type);

Input: PyTypeObject*. Py_NewRef() output: PyObject* which requires a cast to PyTypeObject*.


Between you and me, I would prefer to use PyObject* everywhere. But I'm not sure if it makes sense, nor how many compiler warnings we would have to fight to migrate to that :-(

@encukou
Copy link
Contributor

encukou commented Oct 18, 2023

I opened #27 for opaque structs (and also the structs that aren't opaque).

Let's keep this issue about PyObject* vs. concrete types?

And yes, guidelines should be about new API. You're right that we can't really change existing API.

@zooba
Copy link

zooba commented Oct 18, 2023

I think this is only because of static type definitions?

No, it's also lots of functions like PyType_HasFeature, PyType_IsSubtype, PyType_GetName, PyDescr_NewMethod, even the new PyType_GetDict.

I meant that those functions used PyTypeObject* because it was more convenient with already having defined types as a static struct PyTypeObject.1 People didn't have to write (PyObject*)my_type as often. But we "know" that they're the same shape underneath2 so the cast isn't really changing anything.

If we'd used the time machine on this one, they probably would've been PyObject* from the start, and we'd do the cast everywhere. Unless Guido says that it was a deliberate design choice to rely on C's famously rich and reliable type system 🙃 But it doesn't seem like that:

I agree that in the evolutionary development of the C API we should stick to PyObject * as much as possible. I also agree that possibly PyTypeObject * may have to be a historical exception.

I also agree. All our deliberately public APIs should be PyObject * to minimise abstraction leakage, and do their type checks (raising TypeErrors, not assertions). Fast APIs that skip checks should be the special case, not the default.

Footnotes

  1. Or were made to be consistent with existing API that was designed pre-heap types.

  2. Within the definition of how a PyObject is shaped.

@encukou
Copy link
Contributor

encukou commented Oct 18, 2023

Sounds good. New API should use PyObject* and do run-time type checking.
We have #4 for the necessary exceptions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
guideline To be included in guidelines PEP
Projects
None yet
Development

No branches or pull requests

4 participants