Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor terminology around reference ownership #11

Open
markshannon opened this issue May 10, 2023 · 30 comments
Open

Poor terminology around reference ownership #11

markshannon opened this issue May 10, 2023 · 30 comments

Comments

@markshannon
Copy link

Currently we use the terms "borrowed" and "stolen" to describe references, both passed those passed as arguments, and those returned.

Returning a "borrowed" reference is fundamentally different to passing a "borrowed" reference as an argument. Using the same term is confusing.

The term "stolen" is needlessly pejorative, which distracts from the semantics.

@encukou
Copy link
Contributor

encukou commented May 10, 2023

Returning a "borrowed" reference is fundamentally different to passing a "borrowed" reference as an argument. Using the same term is confusing.

Could you elaborate?
IMO, the key is to know what you're borrowing from. Borrowing from the caller (for the duration of a C function). But so is borrowing from an immutable object.

@markshannon
Copy link
Author

The caller always outlives the callee, so it can lend a reference to the callee.

However, the callee cannot lend anything to the caller as it cannot own anything once it no longer exists.
So the only way a borrowed return value makes sense is if the reference is borrowed from another object passed as an argument.

@encukou
Copy link
Contributor

encukou commented May 11, 2023

Yup. For example, if you call PyTuple_GetItem or PyArg_ParseTuple with O, the reference is borrowed from a tuple rather than the callee.

@gvanrossum
Copy link

IIUC "borrowed" is only used for return values, e.g. "PyTuple_GetItem returns a borrowed reference to the item" whereas "stolen" is only used for arguments, e.g. "PyList_Append steals a reference to the item passed in".

These concepts, by whatever name, are occasionally useful, when an API is intended to transfer ownership (stealing) or not (borrowing). If we picked a neutral term that applies to both cases, we'd end up stating that most calls that return an object return a new reference, which is then owned by the caller, whereas most calls that take an object as argument don't change the ownership from the caller's POV.

Given the intentional asymmetry in the defaults here I'm not sure that picking a neutral term is actually better. And given that we probably should strive to make all public APIs use the default behavior (i.e., neither borrowing nor stealing) I'm not sure that we need to find replacement terminology.

For internal APIs (e.g. _PyList_AppendTakeRef or _PyLong_GetZero) the concepts will remain useful, but I don't think it's worth trying to change (or even standardize) the terminology used.

@encukou
Copy link
Contributor

encukou commented May 17, 2023

And given that we probably should strive to make all public APIs use the default behavior (i.e., neither borrowing nor stealing) I'm not sure that we need to find replacement terminology.

That's a good rule of thumb, but IMO there are many legitimate exceptions.
IMO, we should make sure any non-default behaviour is very visible in call sites, so reviewers see it. That is, we should indicate it in the function's name. And for that, we need good terms.

@markshannon
Copy link
Author

We still need a name for the default behavior. "default" doesn't convey any information to the reader.

We also need to decide whether to describe the behavior from the point of view of the caller or the callee.
For API functions it makes sense to use the caller's POV. For callbacks, the callee's POV probably makes more sense.

@gvanrossum
Copy link

We will probably be able to keep the term "borrowing", since that's now widely used in other languages (esp. Rust, but I predict other languages will also be talking about "borrow checking"). Agreed that "stealing" is pejorative; "takes a reference" might work?

@hauntsaninja
Copy link

"transfers ownership" could be a good way to describe the semantics

@gvanrossum
Copy link

So the default behavior would be

  • For arguments, ownership is not transferred (caller retains existing ownership)
  • For return values, ownership is transferred (caller obtains new owhership)

@encukou
Copy link
Contributor

encukou commented May 18, 2023

“Transfers ownership” is good in general!

I'd like to keep using “borrow”, but if you use it you should always explicitly say what is being borrowed from. So

  • for the default argument passing behaviour (caller retains existing ownership), the callee borrows from the caller.
  • With the current PyArg_ParseTuple(args, "O", &obj), obj is borrowed from the args tuple.
  • A hypothetical internal _PyLong_BorrowZero() returns a reference borrowed from the interpreter state.

For transferring ownership of an argument, perhaps give? It's not perfect, but short: _PyList_Append_Give, or even PyTuple_Give(tuple, n, obj). (Though I still kinda like “steal” for its connotations of something nefarious going on.)

@davidhewitt
Copy link

Rust uses the term "move" to describe transfer of ownership. This might also be familiar to C++ readers.

@gvanrossum
Copy link

Honestly I've always wondered what C++ "move" semantics are for. So I think it's not a great term. It's mostly convenient there because it has the same number of letters as "copy" and contrasts nicely -- buy we don't talk about copying references either in CPython.

@vstinner
Copy link
Contributor

I was always very confused about such code pattern:

Py_INCREF(startkey);
int cmp = PyObject_RichCompareBool(startkey, key, Py_EQ);
Py_DECREF(startkey);

Who owns startkey strong reference? Is it PyObject_RichCompareBool()? Is it the callee?

Why is startkey treated differently that key which is also a PyObject*?

I proposed adding a Py_HOLD_REF() macro to hold a strong reference to a Python object while executing code in python/cpython#99481 But the implementation of such macro was non-trivial (and "ugly"), so I gave up on that idea.


If I understood correctly the code, the problem is that startkey is a borrowed reference, whereas key should be a strong reference. Is key really a strong reference? Well, I didn't copy/paste the full code to show how much confusing it is if you cannot check the "context" of the code: where references come from, how object reference count is handled, etc.

@gvanrossum
Copy link

What is your point about not copy/pasting the full code? The answer to your question "who owns startkey" in dictobject.c is literally on the line before (where it is taken out of the container data structure). In setobject.c it is a bit farther up, because there are a bunch of guards that check whether startkey is usable.

@vstinner
Copy link
Contributor

What is your point about not copy/pasting the full code?

My point is that when I just look at the int cmp = PyObject_RichCompareBool(startkey, key, Py_EQ); line, it's unclear to me if passing startkey and key is sane. We don't have "strong reference" and "borrowed reference" attributes on C variables, and variable names don't give hints.

My concern is that to me, it's unclear if PyObject_RichCompareBool() must be called with strong references or if passing borrowed references is safe. Well, apparently it's unsafe. Is there a guideline saying when it's safe or not? Is it on a case by case basis? Can a case change if a function implement changes (if it didn't call arbitrary Python code, and then it does)?

I mean that I don't understand what "borrowed reference" means when passing a Python object to a function. In general, it looks unsafe. But sometimes, if you are lucky, it's safe :-)

@gvanrossum
Copy link

(I suspect I'm falling into a trap you've carefully designed to show some logical fallacy. So be it.)

I think the only safe guideline is that the caller should always have a strong reference to each object passed as an argument into an API function. In most cases this isn't a problem -- I don't think this is something specific to PyObject_RichCompareBool(). Usually the caller already has a strong reference. For example, assuming this invariant holds recursively (even in use code), passing an argument you received should already be safe -- since your caller has a strong reference, a strong reference exists, and so it's safe to pass it into an API (without extra incref/decref calls). Or, if you received the argument from another API that returns a strong reference (and most do, PyList_GetItem notwithstanding), you own a strong reference and you're good.

The problem occurs when you aren't sure you have a strong reference -- e.g. if you received an object from PyList_GetItem, you can't be sure that you have a strong reference, because it might be removed from that list as soon as you relinquish control to another API (you don't have to pass that object -- just calling any API can do it). Because lists are mutable, owning a strong reference to the list is insufficient. (OTOH, if you got it from PyTuple_GetItem, and you have a reference strong reference to the tuple, you can presumably rely on the tuple keeping the item alive.)

In the examples you quote, startkey was pulled directly from the dictionary or set entry, and since dicts and sets are mutable, this is a situation where you don't have a strong reference to it. The key, however, was received from the caller (e.g. of unicodekeys_lookup_generic()), so you can assume you have a strong reference there.

Note that I subtly distinguish between owning and having a strong reference. If you own it, you must eventually give up ownership (decref) or transfer it (e.g. by returning it, or inserting it into a mutable data structure that "borrows" the reference, like PyList_SetItem). The condition I describe merely requires that you have a strong reference. Owning one is sufficient but not necessary.

@encukou
Copy link
Contributor

encukou commented Oct 23, 2023

Proposed guideline issue: capi-workgroup/api-evolution#25

tldr of the current proposal, as it applies to the discussion here:

  • transfer ownership is the general term, the shorter steal, consume, decref etc. are appropriate in their respective contexts
  • “Steal” is used when ownership of a function argument is transferred. (The connotation of something shady is going on is, in my opinion, correct.)
  • “Borrow” is used for return values. The opposite, and the default behaviour, is again transferring ownership. That's a mouthful, but describing the default hopefully isn't needed that often.
  • In the default calling convention, the caller must guarantee that all arguments stay valid for the duration of the call. Whether that's using “strong” or “borrowed” references, doesn't matter -- we don't even define those terms.

@vstinner
Copy link
Contributor

“Steal” is used when ownership of a function argument is transferred. (The connotation of something shady is going on is, in my opinion, correct.)

That's a misleading term. IMO it's closer to C++ std::move. The ownership is moved, not "stolen".

@erlend-aasland
Copy link

I don't think we should use terms with negative connotations for legitimate operations. IMO, the wording should reflect what's being done (move/transfer).

@gvanrossum
Copy link

I don't think we should use terms with negative connotations for legitimate operations. IMO, the wording should reflect what's being done (move/transfer).

So function that "steal" arguments we can name Py_FooBarMove() (in contrast with plain Py_FooBar()), and in explanations we can use "transfer of ownership".

For functions that return a "borrowed" argument I'm less sure. I don't think of "borrow" as a pejorative word. The concepts do not feel symmetric (the caller does not "steal" a reference -- it's the callee that does something unusual).

@vstinner
Copy link
Contributor

Concrete example: I propose to make _PyTuple_FromArraySteal() and _PyList_FromArraySteal() functions public in issue #111489. It would be nice to give them a better name than "Steal".

Do PyTuple_FromArrayMove() and PyList_FromArrayMove() name make sense to you?

@gvanrossum
Copy link

It does to me, but let’s leave this up to the whole WG, if it gets established.

@encukou
Copy link
Contributor

encukou commented Oct 30, 2023

I don't think we should use terms with negative connotations for legitimate operations. IMO, the wording should reflect what's being done (move/transfer).

It's a legitimate operation, but it does go against the usual convention. Frankly, I don't see how Steal distracts from the semantics -- it's quite the opposite. If we want the API to be regular and predictable, I'd prefer strong hints in the names of legitimate exceptions.

Steal is clearer, and already established. I count 28 occurences in the docs, all meaning “transfer ownerhip of a PyObject* argument”.

@vstinner
Copy link
Contributor

Example of documentation currently using "steal" term:

For me, it's even more surprising when the negative form is used. Example:

@erlend-aasland
Copy link

Sorry, I thought this issue was about the terminology we use [in the docs], not about C API naming.

@gvanrossum
Copy link

Oh dear. My miscommunication-spider-sense didn't catch that! :-) Yeah, let's use the less loaded terms in definitions and discussions, but I'm fine with using the (traditional) "steal" in API names.

Still not sure what to use for API names that return a borrowed reference; I don't believe we have that word in any API name currently.

@encukou
Copy link
Contributor

encukou commented Oct 31, 2023

Oh dear. My miscommunication-spider-sense didn't catch that! :-) Yeah, let's use the less loaded terms in definitions and discussions, but I'm fine with using the (traditional) "steal" in API names.

But, IMO, This function “steals” the *obj* argument is much easier to follow than something like Ownership of the *obj* argument is transferred.
I'd be very happy to always put “steal” in quotes and make it link to a definition. But it's a good term for the concept.

Still not sure what to use for API names that return a borrowed reference; I don't believe we have that word in any API name currently.

True, but we practically don't have Steal in API names either -- it's just one private function.

In capi-workgroup/api-evolution#25, I propose to use Borrow itself.
The docs can use “borrow” too, e.g. “The return value is borrowed from the p argument.”, with borrow a link to a definition/explanation.

@erlend-aasland
Copy link

erlend-aasland commented Oct 31, 2023

The evolution names are discussed over at capi-workgroup/api-evolution#23 and capi-workgroup/api-evolution#21. Perhaps the C API naming should continue over there? The docs terminology does not have a "solution" issue yet, so I guess it's ok to continue discussing that issue here Found it in Petr's post: capi-workgroup/api-evolution#25.

The revolution naming does not have an issue yet; perhaps the closest would be capi-workgroup/api-revolution#9?

@encukou
Copy link
Contributor

encukou commented Oct 31, 2023

IMO, the we're deciding on the desired “target” ; the terms should be the same regardless of whether we approach it gradually or do grand rename (modulo some API-wide prefix for the latter case).

@encukou
Copy link
Contributor

encukou commented Nov 3, 2023

But, IMO, This function “steals” the *obj* argument is much easier to follow than something like Ownership of the *obj* argument is transferred.

Going through this again, I see that it's possible clear wording with something like Ownership of *obj* is transferred to the *list*, with transferred linking to the glossary. With Steal nearby, in the API name, it should be enough.

But, I would still like to use “steal” in docs of existing API that doesn't have it in the name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants