The C API is weakly typed #31

Open
markshannon opened this issue May 17, 2023 · 44 comments
Labels
evolution-proposed · theme: the C language (issues related to the way we use the C language)

Comments

@markshannon

markshannon commented May 17, 2023

The C type system is weak, compared to Java or Rust, but we should use it where possible.

The function PyTuple_Size() should be infallible; all tuples have a size.
Yet, it can fail if passed a non-tuple, which can only happen because it is weakly typed.
Like PyTuple_Size(PyObject *), many API functions take PyObject * when they should take more specific types.

Here are a few examples:
PyTuple_Size()
PyList_Append()
PyDict_GetItem()

The error handling of these functions is awful. PyTuple_Size() and PyList_Append() raise a SystemError, not a TypeError if passed the wrong type. PyDict_GetItem() just acts as if the item is not in the container, not raising at all.
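
To make the distinction concrete, here is a minimal, self-contained sketch. It does not use the real CPython headers: `Obj`, `TupleObj`, `FloatObj`, `tuple_size_weak` and `tuple_size_strong` are hypothetical names invented for illustration, and the `-1` return merely stands in for raising an exception.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature object model; not CPython's real layout. */
typedef enum { KIND_TUPLE, KIND_FLOAT } Kind;

typedef struct { Kind kind; } Obj;                  /* generic header */
typedef struct { Obj base; size_t size; } TupleObj;
typedef struct { Obj base; double value; } FloatObj;

/* Weakly typed: accepts any Obj *, so it needs a runtime check and a
 * failure value (-1) even though every tuple has a size. */
static long tuple_size_weak(Obj *o)
{
    if (o->kind != KIND_TUPLE) {
        return -1;  /* stands in for "raise SystemError" */
    }
    return (long)((TupleObj *)o)->size;
}

/* Strongly typed: the parameter type already says "tuple", so the
 * function cannot fail and needs no error return. */
static size_t tuple_size_strong(const TupleObj *t)
{
    return t->size;
}
```

In this sketch, passing a `FloatObj *` to `tuple_size_strong()` is an incompatible-pointer-type violation that compilers diagnose, whereas `tuple_size_weak()` accepts `&f.base` silently and can only report the mistake at run time.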

@encukou
Contributor

encukou commented May 17, 2023

Note that C type safety does nothing for interop with other languages, so if we push for this we should also invest in that area -- e.g. publish enough parseable info to make safe wrappers easy to generate (#7).

@markshannon
Author

Violating type rules is unsafe. Wrappers don't make things safer, just more convenient.
E.g.
Suppose we have PyTuple_Length(PyTupleObject *). It might crash if you pass a PyFloatObject * (or more likely produce a meaningless result).
If we add a "safe" wrapper PyTuple_Size(PyObject *), it still crashes if you call PyTuple_Size(7).

@encukou
Contributor

encukou commented May 17, 2023

Yup, C wrappers are useless. Let me try to rephrase.

Languages without a C compiler only see exported DLL symbols (and whatever they manage to parse from C headers).
If we ditch runtime checks, we should make it as easy as possible to generate whatever boilerplate they need to ensure type safety on their end.

@vstinner
Contributor

I would prefer to do the opposite: move to HPy-like opaque handles to fix issue #22.

@steve-s
Contributor

steve-s commented May 18, 2023

One way of looking at this is that we still do not want to expose the memory layout, so we keep opaque pointers, but we can still use C types to provide a bit better user experience and some level of type checking. It would not be a real type system that would allow us to, for example, be absolutely sure that only handles pointing to tuples ever make it to PyTuple_Size.

We did consider this in HPy and I think it would be useful, but we need to iron out the API and do more research into use-cases. For example, this would also be useful for user defined types since those are also opaque. When porting NumPy we noticed that by replacing the types like DType* with generic HPy we lose valuable semantic info and also the little bit of type checking that C gives us. We might want to introduce it eventually.

Example:

typedef struct _HPy_s { intptr_t _i; } HPy;
typedef struct _HPyType_s { HPy h; } HPyType;

HPyType (*ctx_Type)(HPyContext *ctx, HPy obj);
int (*ctx_TypeCheck)(HPyContext *ctx, HPy obj, HPyType type);

// Usage:
HPyType t = HPy_Type(ctx, some_handle);

// Upcasting (specific handle to generic handle), always safe
HPy generic_handle = t.h;

// Downcasting (generic handle to specific handle), could use some helper functions/macros
// It can be safe or unsafe. Anyone can do unsafe cast using just (void*)
// as intermediate step (putting aside the C standard...), so IMO there's
// no point in not providing helpers for that and trying to "hide" it from users
HPyType type = HPyCast(ctx, HPyType, generic_handle);
if (HPy_IsNull(type)) { ...it was not a type... }

// Still passing in ctx, so that maybe in debug mode we can do some checks(?)
HPyType i_know_what_i_am_doing = HPyUnsafeCast(ctx, HPyType, generic_handle);
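
The struct-wrapping idea can be tried in isolation. The following is only a sketch modelled on the snippet above, not the real HPy API: `Handle`, `TypeHandle` and the helper functions are hypothetical, and the rule that odd handle values denote type objects is an assumption made purely so the example can run.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical handle types modelled on the snippet above; this is not
 * the real HPy API. */
typedef struct { intptr_t _i; } Handle;       /* generic handle */
typedef struct { Handle h; } TypeHandle;      /* handle known to be a type */

/* Toy "runtime": odd handle values denote type objects (an assumption
 * made only so the example can run). */
static int handle_is_type(Handle h) { return (h._i & 1) != 0; }

static int type_handle_is_null(TypeHandle t) { return t.h._i == 0; }

/* Checked conversion from generic to specific: returns a null handle
 * when the runtime check fails. */
static TypeHandle handle_as_type(Handle h)
{
    TypeHandle result = { { 0 } };
    if (handle_is_type(h)) {
        result.h = h;
    }
    return result;
}

/* Conversion from specific to generic needs no check at all. */
static Handle type_as_handle(TypeHandle t) { return t.h; }
```

Because `Handle` and `TypeHandle` are distinct struct types, passing one where the other is expected fails to compile, which is the "little bit of type checking" a bare `intptr_t` or a single generic handle type cannot give.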

@vstinner
Contributor

vstinner commented Jun 8, 2023

See also issue #37.

@vstinner
Contributor

vstinner commented Jun 8, 2023

For me, the question here is also how much the Python release build should help developers when they misuse the C API. Provide a cute, well-formatted exception? Kill the process with SIGABRT on an assertion error? Silently ignore the error and attempt to provide a behavior or result which prevents a crash?

A practical problem is that a debug build of Python is not widely available: see issue #36. I'm in favor of removing assertions in a release build and requiring developers to use a debug build, but there are practical issues with that.

@vstinner
Contributor

vstinner commented Jun 8, 2023

I would prefer to move the API towards type agnostic functions like PyObject_GetItem() instead of PyDict_GetItem(). PyObject functions have a good API: they return a strong reference and raise the expected exception in case of error.

So far, I'm not convinced that outside CPython it's really worth it to use specialized functions like PyTuple_GetItem() or PyTuple_GET_ITEM(), instead of generic PySequence_GetItem(). Did anyone run a (micro or macro) benchmark to see the benefits of specialized functions? Is it really a bottleneck in terms of performance?

The other problem is that the C API has no clear tradeoff between performance and stability. "It depends" on the function, on the type, on the context. Some types have specialized functions. Some types have macros. Some others don't. How can users discover what's the best practice?

@gvanrossum

If "nogil" is accepted, at least internally we may have PyDict_FetchItem which is like PyDict_GetItem but returns a strong reference. In that case we might as well adopt that API and gently start deprecating PyDict_GetItem. For example.

@markshannon
Author

markshannon commented Jun 22, 2023

It is worth noting that debug builds should still do the type checks, so we expect any type errors that slip past the compiler to be caught in testing.

uintptr_t PyTuple_GetSize(PyTupleObject *t)
{
    assert(PyTuple_Check(t));
    return Py_SIZE(t);
}

@markshannon
Author

I would prefer to move the API towards type agnostic functions like PyObject_GetItem() instead of PyDict_GetItem().

It makes sense to have both, and to use the appropriate one.

PyObject *obj, PyDictObject *dict;
PyObject_GetItem(obj);   // 👍 
PyDict_GetItem(dict);   // 👍 
PyObject_GetItem((PyObject *)dict); // Pointlessly inefficient
PyDict_GetItem((PyDictObject *)obj); // Unsafe

@erlend-aasland

erlend-aasland commented Jun 26, 2023

IMO, we should try to come up with a guideline regarding this problem before adding new APIs. I don't have a strong preference for strongly or weakly typed APIs; there are pros and cons for each variant as I see it [1]. I think @markshannon's proposal in #31 (comment) of having both weakly typed and strongly typed APIs makes sense, though.

Footnotes

  1. I find Steve's comment interesting

@erlend-aasland

I created python/devguide#1127 to discuss a solution (that is: guidelines for new APIs)

@markshannon
Author

The PyObject interfaces should be dynamically typed, not weakly typed. Weak typing is always bad.

PyObject_GetItem(PyObject *op, PyObject *index) is strongly, dynamically typed. It can handle tuples, lists, dicts, etc.

PyTuple_GetItem(PyTupleObject *op, intptr_t index) is strongly typed and efficient (it needs to do bounds checks, but not type checks).
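
The contrast between "strongly, dynamically typed" and "strongly, statically typed" can be sketched outside CPython. Everything below is a hypothetical model (`Obj`, `TupleObj`, `IntObj`, `tuple_get_item`, `object_get_item`), and a `NULL` return stands in for raising an exception.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object model for illustration only. */
typedef enum { K_TUPLE, K_INT } Kind;
typedef struct { Kind kind; } Obj;
typedef struct { Obj base; long value; } IntObj;
typedef struct { Obj base; size_t size; Obj **items; } TupleObj;

/* Strongly, statically typed: only a bounds check is needed. */
static Obj *tuple_get_item(const TupleObj *t, size_t index)
{
    return index < t->size ? t->items[index] : NULL;
}

/* Strongly, dynamically typed: any object is a valid argument; the
 * function inspects the runtime kind and reports unsupported ones. */
static Obj *object_get_item(Obj *container, size_t index)
{
    switch (container->kind) {
    case K_TUPLE:
        return tuple_get_item((const TupleObj *)container, index);
    default:
        return NULL;  /* stands in for "raise TypeError" */
    }
}
```

Neither function here is weakly typed: each signature accurately describes the set of arguments the function handles correctly.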

@erlend-aasland

Yeah, that's a better term; thanks.

@vstinner
Contributor

I would prefer to do the opposite: move to HPy-like opaque handles to fix issue #22.

Let me elaborate: I would prefer that the members of PyObject, PyDictObject and PyTupleObject not be part of the public C API: enforce the usage of getter and setter functions. If we move towards "opaque handles", technically, PyObject* or PyDictObject* is basically the same thing for me: an opaque pointer which cannot be dereferenced.

The problem is that a big part of the existing C API currently uses PyObject* for function parameters and function results. For example, PyDict_New() returns PyObject*. If we consider migrating the existing C API towards specific types (like PyDictObject*), we should also see how to handle functions returning "naked" PyObject* pointer like Py_NewRef(). The PyObject_New() macro handles this issue by casting the result to the requested type:

#define PyObject_New(type, typeobj) ((type *)_PyObject_New(typeobj))

If possible I would prefer to move away from macros (see PEP 670) :-(

@vstinner
Contributor

Example of worst case: "downgrade" the type to be able to call a generic Py_NewRef() function which expects PyObject* and "upgrade" the result type since Py_NewRef() returns a PyObject* type:

PyTypeObject *base = type->tp_base;
// !!! 2 casts are required in a single line of code :-( !!!
type->tp_base = (PyTypeObject*)Py_NewRef((PyObject*)base);

Maybe casting Py_NewRef() argument to PyObject* can be hidden with a macro, to have a more convenient API.

Maybe we could have a macro calling Py_NewRef() and casting the result to an expected type. Would it be better than doing the cast explicitly? I'm not sure.
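
As a rough idea of what such a macro could look like, here is a self-contained sketch. It is not CPython code: `Obj`, `TypeObj`, `obj_new_ref` and `OBJ_NEW_REF_AS` are hypothetical names, with `obj_new_ref` only imitating the shape of Py_NewRef().

```c
#include <assert.h>

/* Hypothetical reference-counted objects; not CPython's real structs. */
typedef struct { long refcnt; } Obj;
typedef struct { Obj base; const char *name; } TypeObj;

/* Generic API in the style of Py_NewRef(): takes and returns Obj *. */
static Obj *obj_new_ref(Obj *o)
{
    o->refcnt++;
    return o;
}

/* Hypothetical convenience macro: hides both casts at the call site.
 * It is exactly as unchecked as writing the casts by hand. */
#define OBJ_NEW_REF_AS(type, op) ((type *)obj_new_ref((Obj *)(op)))
```

A call such as `TypeObj *b = OBJ_NEW_REF_AS(TypeObj, base);` reads more cleanly than two explicit casts, but the macro still accepts any pointer, so it trades visibility of the casts for convenience rather than adding safety.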

@vstinner
Contributor

If we consider migrating the existing C API towards specific types (like PyDictObject*), (...)

In PR #106005, @markshannon asked me to use a PyDictObject* instead of PyObject* for the PyDict_GetItemRef() function that I propose to add:

PyAPI_FUNC(int) PyDict_GetItemRef(PyObject *mp, PyObject *key, PyObject **result);

My concern is that it would make the API inconsistent: all PyDict functions create and expect PyObject*. Examples:

PyAPI_FUNC(PyObject *) PyDict_New(void);

PyAPI_FUNC(PyObject *) PyDict_GetItem(PyObject *mp, PyObject *key);
PyAPI_FUNC(void) PyDict_Clear(PyObject *mp);
PyAPI_FUNC(PyObject *) PyDict_Keys(PyObject *mp);
PyAPI_FUNC(Py_ssize_t) PyDict_Size(PyObject *mp);
PyAPI_FUNC(PyObject *) PyDict_Copy(PyObject *mp);
PyAPI_FUNC(int) PyDict_Contains(PyObject *mp, PyObject *key);

If a single function expects PyDictObject*, callers would now have to cast the argument just to call PyDict_GetItemRef().

Would it be possible to have a "switch" to opt-in for the PyDictObject* API but all at once? Not just for PyDict_GetItemRef(), but for all PyDict APIs?

I suppose that such "switch" could be two things:

I'm also open to considering an incremental approach: use specific types, but only for newly added functions.

@vstinner
Contributor

@steve-s:

When porting NumPy we noticed that by replacing the types like DType* with generic HPy we lose valuable semantic info and also the little bit of type checking that C gives us. We might want to introduce it eventually.

Can HPy provide a type which can be used to define types which are aliases to HPy? The problem is the addition of the pointer: typedef HPy DType; doesn't work, since DType* is used.

@vstinner
Contributor

@gvanrossum:

If "nogil" is accepted, at least internally we may have PyDict_FetchItem which is like PyDict_GetItem but returns a strong reference. In that case we might as well adopt that API and gently start deprecating PyDict_GetItem. For example.

I'm proposing to add PyDict_GetItemRef() which is basically the same (just the name is different): PR #106005.

@markshannon
Author

Why is PyDictObject * any more or less opaque than PyHandle or whatever you want to call it?

If we move towards "opaque handles", technically, PyObject* or PyDictObject* is basically the same thing for me: an opaque pointer which cannot be dereferenced.

Except that PyDictObject * is more strongly typed, resulting in fewer errors and less overhead in defensive checks.

You claim this would make the API inconsistent because PyDict_New returns a PyObject *, and that calling a function that expects a dictionary needs a cast.
Why? A function that returns a dict should have a return type of PyDictObject *. If you have a PyDictObject * and need a PyObject * that is a safe upcast, no problem.
If you have a PyObject * and need a PyDictObject * then you must check the type before the cast, or it is unsafe.

Having some functions produce and take a PyObject * when they could be more strongly typed just encourages sloppy casting and unsafe code.
Having all functions produce and take a PyObject * when they could be more strongly typed means that all functions need defensive type checks, which introduces new failure modes and is slow.
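
The asymmetry between the two directions can be shown with a small model. The types and helpers below (`Obj`, `DictObj`, `ListObj`, `dict_as_object`, `object_as_dict`, `dict_size`) are hypothetical and only assume the common "base struct as first member" layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object model for illustration only. */
typedef enum { K_DICT, K_LIST } Kind;
typedef struct { Kind kind; } Obj;
typedef struct { Obj base; size_t used; } DictObj;
typedef struct { Obj base; size_t length; } ListObj;

/* Upcast: always valid, and needs no cast syntax at all. */
static Obj *dict_as_object(DictObj *d) { return &d->base; }

/* Downcast: only valid after a runtime check; NULL signals a mismatch. */
static DictObj *object_as_dict(Obj *o)
{
    return o->kind == K_DICT ? (DictObj *)o : NULL;
}

/* With a typed parameter, no defensive check is needed here. */
static size_t dict_size(const DictObj *d) { return d->used; }
```

In this model the single type check lives in `object_as_dict()`, at the point where the generic pointer is narrowed, rather than being repeated inside every function that operates on dicts.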

@gvanrossum

I suspect one barrier against returning PyDictObject* is that for operations that don't care whether it's a dict, you'd end up having to cast to PyObject*, which unfortunately also casts nonsense without even a warning. If we had a proper inheritance system in C (like in C++) we could get such type safety at a lower cost (except the cost of switching to C++ would be momentous for other reasons of course).

@vstinner
Contributor

I wrote private macros to cast a pointer to PyObject*, PyTupleObject*, etc.

#define _PyObject_CAST(op) _Py_CAST(PyObject*, (op))
#define _PyTuple_CAST(op) \
    (assert(PyTuple_Check(op)), _Py_CAST(PyTupleObject*, (op)))

When a generic PyObject* pointer is cast to a specialized type, the Py<type>_Check() function is checked with an assertion.

These macros might implement more advanced checks tomorrow if needed: I recommend using them :-)
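
The comma-operator pattern used by these macros can be tried without the CPython headers. This sketch only imitates their shape: `Obj`, `TupleObj`, `TUPLE_CHECK`, `TUPLE_CAST` and `tuple_size` are hypothetical names, not real API.

```c
#include <assert.h>

/* Hypothetical object model; the macro shape follows the private
 * CPython macros quoted above, but none of these names are real API. */
typedef enum { K_TUPLE, K_OTHER } Kind;
typedef struct { Kind kind; } Obj;
typedef struct { Obj base; long size; } TupleObj;

#define TUPLE_CHECK(op) (((Obj *)(op))->kind == K_TUPLE)

/* Comma operator: evaluate the assertion, then yield the cast pointer.
 * With NDEBUG defined, assert() expands to a no-op and only the cast
 * remains. */
#define TUPLE_CAST(op) (assert(TUPLE_CHECK(op)), (TupleObj *)(op))

static long tuple_size(Obj *o)
{
    return TUPLE_CAST(o)->size;
}
```

One caveat of this sketch is that `TUPLE_CAST` evaluates its argument twice when assertions are enabled, so arguments with side effects should be avoided.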

By the way, I tried but failed to fix _Py_CAST() for C++: avoiding "old-style C cast" in C++ without emitting new compiler warnings is really complicated :-(

@vstinner
Contributor

Why is PyDictObject * any more or less opaque than PyHandle or whatever you want to call it?

I corrected myself: using PyDictObject* instead of PyObject* is compatible with my plan of "opaque handles". So I'm fine with it.

Why? A function that returns a dict should have a return type of PyDictObject *.

I mean that the current PyDict_New() function returns PyObject*.

@serhiy-storchaka

PyDict_GetItemRef((PyDictObject*)foo, bar, &baz) is not more safe, but less safe than PyDict_GetItemRef(foo, bar, &baz), because it prevents the compiler from rejecting the code where foo is int or const char*. It will provoke more errors as well as make the code uglier.

Even in programming languages like C++ and Java there is a problem of non-homogeneous containers (generics and templates solve it only partially). For example, suppose you have a list of dicts. PyList_GetItem() returns PyObject*, which you should cast to PyDictObject* before using any of the concrete dict C API. If you add a new dict to the list, and PyDict_New() returns PyDictObject*, you need to cast it to PyObject* for PyList_Append(). A lot of unneeded unsafe C casts.

@gpshead

gpshead commented Jul 12, 2023

It is worth noting that debug builds should still do the type checks, so we expect any type errors that slip past the compiler to be caught in testing.

Most Python developers and Python C API extension maintainers never have a debug build of CPython. So those asserts do not help them. They're always developing against release builds of CPython.

I know this because our default testing builds at Google are compiled with -UNDEBUG -Og so that assertions are enabled. This has caught numerous bugs in third_party (read: PyPI or github) packages as well as bugs with incorrect assertions within CPython. We appear to be unique in always building our interpreter and building it with assertions enabled. These code bugs would not exist in everyone's software if the world at large actually used builds of CPython with assertions enabled during development. They don't.

(what we find and fix tends to get pushed upstream, in a randomly delayed fashion as we're usually not on the latest versions of things, but let's not assume we can rely on the goodwill of one unusual large user eventually cleaning up everyone else's undetected already-shipped messes for this)

@gpshead

gpshead commented Jul 12, 2023

IMNSHO for the entire history of CPython our C API has used the opaque PyObject * on all of our C APIs so that users never need to typecast all over their code. We should not change this practice in our existing public C API now.

I understand the theoretical desires here, but the type of an object coming out of a Python C API is generally opaque and we never guarantee that there even is a C type corresponding to a given Python type. If and when true, that is supposed to be a hidden implementation detail. ie: Start returning specific C types from our constructors and everyone will then need to pull their hair out adding typecasts everywhere as they pass that into other API calls to make use of it.

There is no reason to encourage the use of specific C pointer types. The only thing that'll lead to is users needing to litter their code with typecasts that they rarely need today because we're a dynamic language. Every typecast is an opportunity to be wrong in a manner that reads to glazed over eyes as if it were correct to future maintainers of the code.

Put another way: What specific C API use bugs will exploding our C API to litter people's code with PyObject *, PyDictObject *, PyListObject *, PyTupleObject *, PySetObject *, PyFrozenSetObject *, PyBytesObject *, PyUnicodeObject *, PyLongObject *, PyFloatObject *, PyComplexObject *, etc. type declarations and typecasts like this actually prevent?

Weigh that against what long term bugs it'll cause anytime a typecast is wrong.

I wouldn't mind if we had strongly typed C APIs for use by the CPython internals for our own purposes where we control everything - but I think it'd be a mistake to clutter our public C API with these.

@gpshead

gpshead commented Jul 12, 2023

fwiw the opening statement of this issue conflates two problems: C types for Python internal things, and existing C APIs that have awful behavior. those are unrelated concepts. we should seek to provide non-awful behavior for our C APIs that have such behaviors. But that has nothing to do with weak C types.

@encukou
Contributor

encukou commented Jul 13, 2023

Long-term:
Sounds like we should decouple the API (for humans) from the ABI (for speed), as Mark has been saying all along. (edit: this kind of split is also called plumbing/porcelain (in git), or salt/sugar, which are probably better terms)
If we generate the API (#7), we could theoretically have any flavour -- a PyObject * C API with assertions, a Py*Object * C API where you need casting, or even a C++ API with overloading that uses both at the same time (edit: or a C11 API that does this with _Generic).
If we want to allow easily generating API for JS/Wasm or Rust or whatever, we'll need to do this anyway.
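
As an illustration of the C11 `_Generic` flavour mentioned above, here is a minimal sketch in which one macro accepts both pointer types and picks a checked or unchecked implementation from the argument's static type. All names (`Obj`, `TupleObj`, `tuple_size_checked`, `tuple_size_typed`, `TUPLE_SIZE`) are hypothetical, and `-1` stands in for raising an exception.

```c
#include <assert.h>

/* Hypothetical object model for illustration only. */
typedef enum { K_TUPLE, K_OTHER } Kind;
typedef struct { Kind kind; } Obj;
typedef struct { Obj base; long size; } TupleObj;

/* Checked flavour: takes a generic pointer, -1 on type mismatch. */
static long tuple_size_checked(Obj *o)
{
    return o->kind == K_TUPLE ? ((TupleObj *)o)->size : -1;
}

/* Unchecked flavour: the static type already guarantees a tuple. */
static long tuple_size_typed(TupleObj *t) { return t->size; }

/* C11 _Generic picks the flavour from the argument's static type. */
#define TUPLE_SIZE(x) _Generic((x),            \
        TupleObj *: tuple_size_typed,          \
        Obj *: tuple_size_checked)(x)
```

With this sketch, `TUPLE_SIZE(&t)` resolves to the unchecked function and `TUPLE_SIZE(&t.base)` to the checked one; an argument of any other type has no matching association and is rejected at compile time.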

@markshannon
Author

fwiw the opening statement of this issue conflates two problems: C types for Python internal things, and existing C APIs that have awful behavior. those are unrelated concepts. we should seek to provide non-awful behavior for our C APIs that have such behaviors. But that has nothing to do with weak C types.

TBH, this comment made me quite angry.
Please don't misrepresent what I said to suit your argument.

I am not conflating internal things with the C API. The title of this issue is "The C API is weakly typed". The project is "CAPI workgroup". This is about the C API.

@markshannon
Author

IMNSHO for the entire history of CPython our C API has used the opaque PyObject * on all of our C APIs so that users never need to typecast all over their code. We should not change this practice in our existing public C API now.

While it is true that almost all C API functions accept PyObject *, many important functions only accept pointers to one type of Python object. This makes them weakly typed.

Functions that are strongly typed like PyObject_GetItem() are fine. The PyObject * type accurately reflects the values that the function accepts.

It is functions like PyDict_GetItem() that are a problem.
These functions only behave correctly when passed an argument of the correct (Python) class, but have a signature that implies that they accept any PyObject *.

Let me give an example (skipping error handling and refcounts for brevity):

PyObject *t = PyTuple_Pack(...);
PyObject *i = PyLong_FromLong(...);
PyObject *v = PyObject_GetItem(t, i);

which is fine, it all works as expected.

Whereas

PyObject *t = PyTuple_Pack(...);
PyObject *i = PyLong_FromLong(...);
PyObject *v = PyDict_GetItem(t, i);

is obviously broken, but is just fine as far as the C compiler is concerned. The two examples type check the same.

There is no reason to encourage the use of specific C pointer types. The only thing that'll lead to is users needing to litter their code with typecasts that they rarely need today because we're a dynamic language.
Put another way: What specific C API use bugs will exploding our C API to litter people's code with PyObject *, PyDictObject *, PyListObject *, PyTupleObject *, PySetObject *, PyFrozenSetObject *, PyBytesObject *, PyUnicodeObject *, PyLongObject *, PyFloatObject *, PyComplexObject *, etc. type declarations and typecasts like this actually prevent?

This feels like a strawman argument. You are proposing a bad solution to a problem (littering the code with typecasts) and using that to claim that there is no problem.

Every typecast is an opportunity to be wrong in a manner that reads to glazed over eyes as if it were correct to future maintainers of the code.

While downcasts are an opportunity to be wrong, upcasts are always safe. The distinction is important.
Casts are explicit, but implicit assumptions about the type of an object are invisible, and thus much harder to check.

I don't want to propose a solution here, but here are some ideas on how to handle casts. An alternative to casts would be to make all API functions dynamically typed, removing PyDict_..., PyList..., etc. But this would be bad for performance and probably quite unpopular.

Whatever the solution, weak typing in the C API is a problem.

@encukou
Contributor

encukou commented Jul 13, 2023

(edit: I was wrong!) The two examples are the same, you probably didn't mean that.

@markshannon
Author

They are different, PyObject_GetItem vs PyDict_GetItem.
But thanks for pointing out how easy it is to misread code using the current API 🙂

@steve-s
Contributor

steve-s commented Jul 13, 2023

IMNSHO for the entire history of CPython our C API has used the opaque PyObject * on all of our C APIs so that users never need to typecast all over their code. We should not change this practice in our existing public C API now.

I would argue that this is not true with custom types. Take a look at the signature of this function from NumPy:

PyArray_GetCastSafety(
        PyArray_Descr *from, PyArray_Descr *to, PyArray_DTypeMeta *to_dtype)
{
    // ...

With universal ABI and opaque PyObject*, this would have to be

PyArray_GetCastSafety(
        PyObject *from, PyObject *to, PyObject *to_dtype)
{
    // ...

and I would argue that this is a bit less readable and can lead to confusion about whether, e.g., from is a dtype or array or whatever.

We actually have experience with this when porting NumPy to HPy, where you get the same problem, and we ran into a few bugs due to this "type" confusion.

This GitHub issue is about builtin types, but I wanted to point this out, because if this is to be solved for custom types, then the solution for builtin types (if any) should be consistent or ideally the same.

@vstinner
Contributor

I feel that people are talking past each other.

@markshannon clearly wants to use types other than PyObject* on the whole API and considers writing a brand new API: issue capi-workgroup/api-revolution#9. He wants best performance by relying on C types: use assertions rather than runtime checks.

@gpshead is talking about the current C API. He explains that it's rare that people have access to a Python built with assertions, so we must keep the runtime checks. He explained that PyObject* is the norm in Python C API. @serhiy-storchaka explained why using PyObject* everywhere is also convenient for different reasons.


For now, I would like to ask: for API additions to the current existing C API, should we continue the trend of using PyObject* with runtime checks, or should we switch to types like PyDictObject* and use assertions to check types?

As I wrote previously, if we switch to specific types like PyDictObject*, I would prefer to switch all at once: either in a new API (issue capi-workgroup/api-revolution#9) or using an opt-in macro (issue #54). But for now, in the "default current existing" C API, I would prefer to keep the current API consistent: reuse what's being used around, so basically PyObject* everywhere.

@markshannon
Author

Please stop putting words in my mouth.

All this issue says is that the current C API is weakly typed, and that is a problem, both in terms of performance and ease of use.

What I, or anyone else, wants to do about it is out of scope for this repo.

@markshannon
Author

Also, please stop with the straw man arguments about casts.

Casts undermine the type safety that C provides, we all know that.
But claims that one form of API needs more, or worse, casts than another without proper evidence do not help.

iritkatriel added the "theme: the C language" label (issues related to the way we use the C language) on Jul 13, 2023
@gpshead

gpshead commented Jul 13, 2023

An example of why I bring up casts: suppose PyTuple_Pack were declared in C to return a PyTupleObject* t (it is not in today's C API) and PyObject_GetAttrString returns a PyObject* obj (true in today's C API), and I then want to pass both of them into a PyDict_SetItem(obj, t) call in our C API. If we actually used concrete types, with that function requiring a PyDictObject and a PyObject, code would be forced to do something to trigger typecasts equivalent to (PyDictObject *)obj and (PyObject *)t, if the C API were pedantic about using distinct pointer types in situations where the API is intended to know without a doubt that the argument is what the API name says it is.

This is a practical example of existing C API code people routinely write today. Code goes both directions from knowing a specific type of something to the generic object and from a generic object to claiming it is a specific type in existing C API use all the time.

Maybe those don't have to be written as direct C and C++ cast syntax, they could be hidden behind APIs such as static inlines or macros that do the type laundering job (@vstinner's example) including possible assertions at a level that end users compiling their own code, regardless of CPython's own build mode, might see. I'll call this a "type transition API" to avoid the word cast there (even though it probably does one internally - the point is that the API user isn't writing a cast).

From a C API evolution standpoint, we could declare user code doing casting to be a bad idea with a long term goal of requiring the use of specific APIs for all object<->known_type transitions. Those would replace any C or C++ casts that exist today or would need to exist if we changed the pointer types.

Channelling @encukou, we could change this today on an opt-in basis: generate flavors of our C API headers, or use preprocessor defines to select if specific-pointer-types are desired as inputs and outputs from type specific APIs. (at an ABI level nothing changes, binary pointers are generic w/o a type).
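
One way to picture the preprocessor opt-in is the following sketch. It is not a proposal for actual CPython headers: `DEMO_STRICT_TYPES`, `DictArg`, `DICT_FROM_ARG`, `Obj`, `DictObj` and `dict_size` are all hypothetical names chosen for illustration.

```c
#include <assert.h>

/* Hypothetical object model and macro name, for illustration only. */
typedef enum { K_DICT, K_OTHER } Kind;
typedef struct { Kind kind; } Obj;
typedef struct { Obj base; long used; } DictObj;

/* Callers opt in by defining DEMO_STRICT_TYPES before this "header".
 * Either way the function receives the same pointer value, so the
 * binary interface is unchanged. */
#ifdef DEMO_STRICT_TYPES
typedef DictObj *DictArg;
#define DICT_FROM_ARG(d) (d)
#else
typedef Obj *DictArg;
#define DICT_FROM_ARG(d) ((DictObj *)(d))
#endif

static long dict_size(DictArg d)
{
    return DICT_FROM_ARG(d)->used;
}
```

Without the define, existing callers keep passing a generic pointer as before. With it, the same call site must supply a `DictObj *`, so code that opts in gets compiler diagnostics while code that does not is left untouched.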

But we shouldn't let the possibility of such an overall change hold us back on adding new needed C APIs today using our existing generic PyObject practice, or encourage us to start using specific types on new APIs instead of generic objects. Adding specific types on new APIs before we've decided on and provided type transition APIs is harmful. If we do manage to provide such type transition APIs before a newly added C API has shipped, we can go ahead and just upgrade it to require the specific type from the start and point users of it to the similarly new type transition API.

The transition period for people's C code if doing this for our existing APIs will be long. Most PyPI extension authors need to support compilation against the oldest widely used version of CPython (assume 3.8 today). So they'd be unlikely to opt-in to the more-C-typed API themselves until the version we ship it in has become "oldest". Otherwise their code would be full of ifdefs or C/C++ casts in order to span compilation across all versions.

Changing types on parameters in public .h files tends to be quite painful for existing code. A recent practical example of a mistake we made in doing this: 3.7 shipped a change to add const to a bunch of char * pointers on Unicode/Bytes/String APIs. While logically accurate, it made the transition for everyone's code to support compilation against <3.7 and >=3.7 headers quite painful.

@gvanrossum

To a large extent the C API emulates pure Python APIs which are dynamically typed. So the C API is also dynamically typed. In many cases this is good for discoverability. We should think hard whether we want that for a new C API.

@markshannon
Author

To summarize this issue so far.

  • Mark:
    Weak typing is bad.
  • Greg, Erlend, Guido
    Dynamic typing is good.

Those two viewpoints, that weak typing is bad, and that dynamic typing is good, are not in contradiction.
So, please, no more strawman arguments about dynamic typing.

Much of the API that appears more statically typed is weakly typed.
E.g. PyTuple_Size(), PyList_Append(), etc.
Whether those parts of the API should be removed, or should be strengthened, or somehow made more dynamic, is obviously a matter for debate.
But this issue, or repo, is not the place for that debate.

@gvanrossum

Amen.

@vstinner
Contributor

As I wrote, IMO two groups are talking past each other because they are talking about two different things.

I created issue #61: No clear separation between "fast API" (unsafe) and "safe API".


Mark: Weak typing is bad.
Greg, Erlend, Guido: Dynamic typing is good.

I don't think that "good vs bad" is a good summary. Each solution has advantages and disadvantages. Depending on the use cases, some advantages become disadvantages and the opposite. Debating is part of the process to list use cases, advantages and disadvantages.

But this issue, or repo, is not the place for that debate.

I would prefer to say that this GitHub project is not a place to take decisions, but to collect use cases, issues, and maybe solutions. It's hard to not mention solutions to list their advantages and disadvantages.

The issue title is "The C API is weakly typed". For me, it's more a solution ("the C API should use dynamic typing") than a problem.

Apparently, the root problem is: "The error handling of these functions is awful." Sadly, I don't think that "awful" is helpful here. But the problem is elaborated later:

  • Some C functions raise SystemError, whereas a similar Python function would raise TypeError in the same case
  • These should be strongly typed functions which have no runtime checks, for best performance; they can use assertions in debug mode

From what I understood, the advantages of stronger typing (specific C types) are:

  • Better performance: avoid runtime type check (maybe use an assertion)
  • The developer and the compiler know the type (ex: PyDictObject*): first parameter type should be a specific type like PyDictObject* (not a generic PyObject*)

And disadvantages:

  • Compared to the current "PyObject* everywhere API", more casts are needed -- some people see that as a worse option
  • No runtime type check: developers have to get access to a Python built with assertions (whereas currently, Python release builds have such type checks and raise an exception)

IMO the underlying root question is: should the C API be only consumed by 3rd party C extensions that want the best performance? Should it be only used by CPython itself? Or should the C API be designed for any C extensions, and so remain (as currently) very nice with developers who don't read the documentation?

Some people would prefer to advertise a safe HPy API for everyone, and have a faster low-level API without error checking. For me, it means that a single API cannot fit all use cases.

Currently, the problem is made of multiple sub-problems:

  • CPython consumes its own C API. For example, many internals were exposed as public functions by mistake.
  • Until recently (5 years ago?), there was no technical separation between the "internal C API" and the "public C API"
  • Usually, the internal API is designed for best performance, where checks are only implemented with assertions
  • The public C API has many runtime checks. Because of that, it's uncommon that developers need a debug build of Python, and the majority of C extension developers just use a release build of Python.
  • In the wild, there is no real "debug mode" being used to develop C extensions.

It's not really possible to choose between a "universal API" working on most Python versions and implementations versus an "unstable API" which has the best performance and is likely to break between Python versions and implementations.

@gpshead

gpshead commented Jul 17, 2023

@markshannon your use of the phrases "please stop", "please no more", "strawman arguments", "this repo is not the place for", "out of scope for this repo" and claims that I misrepresented you... are all indications to me that you don't want me here and don't appear interested in listening to what I had to say.

I did not find that respectful. I'm done with this github repo/project. You've driven me away (intended or not).

See y'all over on discuss.python.org. /unsub

@encukou
Contributor

encukou commented Oct 23, 2023

Issue for proposed guidelines: capi-workgroup/api-evolution#29

IMO, we can do better on a lower layer, see capi-workgroup/api-revolution#1 (comment)


9 participants