-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve and maintain Doc/data/refcounts.dat
#127443
Comments
Doc/data/refcounts.dat
Doc/data/refcounts.dat
I think it's worth trying to maintain this for all of the public C API, not just the stable ABI. If the goal is to design a static analyzer, then it would be ideal to have everything. |
Yes, but I'd like to start with the stable ABI for now. It was quite simple to extract the diff but in order to extract all the exported functions and make the diff, I think my one-liner will be a bit too long... |
Related to python#127443 Add a script to check and update `refcounts.dat` and integrate it into CI workflows. * **Add `Doc/tools/check_refcounts.py`**: - Create a script to check if `refcounts.dat` is up-to-date and sorted. - Implement logic to compare `refcounts.dat` with the stable ABI. - Implement logic to check if `refcounts.dat` is alphabetically sorted. - Handle cases where input files do not exist or cannot be read. - Allow passing file paths as command-line arguments. - Print total counts of entries and mismatches for better insights. - Optionally auto-sort and save the `refcounts.dat` file. * **Add `Doc/tools/Makefile`**: - Add a target to run the `check_refcounts.py` script. * **Update `.github/workflows/build.yml`**: - Add a CI job to run the `check_refcounts.py` script. - Ensure the job runs when a new function is added to the Stable ABI. * **Update `.github/workflows/lint.yml`**: - Add a CI linter job to check if `refcounts.dat` is correctly formatted. --- For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/python/cpython/issues/127443?shareId=XXXX-XXXX-XXXX-XXXX).
Ok, so having an autoformatter may be a bit tricky. I really don't want to create a C parser just to parse the type declaration. I think I'll just manually change them with some powerful regexes and that's it. But for now, I'll just wait for the other PRs to land before changing them. (Anyway, at some point I'll need to sort the file alphabetically, and the diff won't really matter :D) |
I've never used it so I don't know what the current status is, but there is a C parser under |
Yes, but I felt it would have been an overkill to use that one actually. I just wanted a simplest parser for declarations without the rest. |
@ZeroIntensity For the TOML conversion, we have some limitations. We need to quote '$' and 'null' if they appear and we need to use either inline tables or inline arrays to reduce the size of the file IMO. Instead, I suggest to always create a top-level section per function: [$funcname$] We can add "deprecated" and "comment" to that top-level section, helping the newcomers. Followed by the the return section: [$funcname$.return]
ctype = "PyObject *"
refdelta = 1
# or for functions that always return NULL
[$funcname$.return]
ctype = "PyObject *"
idempotent = true And finally by the section for the parameters, here's an example: [$funcname$.params.a]
ctype = "PyObject *"
steals = true
[$funcname$.params.b]
ctype = "PyObject *"
refdelta = 1
[$funcname$.params.c]
ctype = "PyObject *"
refdelta = -1
[$funcname$.params.d]
ctype = "PyObject *"
refdelta = 0
[$funcname$.params.e]
ctype = "const char *" Each section will always have an optional "comment" key. For instance:
becomes [PyBytes_AS_STRING]
[PyBytes_AS_STRING.return]
ctype = "char *"
[PyBytes_AS_STRING.params.string]
ctype = "PyObject *"
refdelta = 0 We can also compress it into: [PyBytes_AS_STRING]
return.ctype = "char *"
params.string.ctype = "PyObject *"
params.string.refdelta = 0 wDYT? |
I'm not all that worried about compression--it's going to be a big file any way we do it. I think we should use an array of tables for the parameters instead of trying to create new keys for each parameter (it's significantly easier to parse it back into Python that way). For reference, the TOML I generated ended up looking like this: [PyArg_Parse]
name = "PyArg_Parse"
return_type = "int"
[[PyArg_Parse.args]]
name = "args"
type = "PyObject *"
effect = 0
[[PyArg_Parse.args]]
name = "format"
type = "const char *"
[[PyArg_Parse.args]]
name = "..."
type = "ellipsis" I'm not sure how we should denote some weird things like "always NULL" though. Special boolean flags like |
I denote the always NULL by saying that the function is idempotent. It's not entirely accurate. Or we can have By the way, it's probably better to have an array rather than a mapping in order to handle the variadic argument. |
An array rather than a mapping for what? |
For the list of parameters. My first suggestion was using a mapping where the key is the parameter name: [PyBytes_AS_STRING.params.string]
ctype = "PyObject *"
refdelta = 0 but for you, it would be: [[PyArg_Parse.params]]
name = "string"
ctype = "PyObject *"
refdelta = 0 |
Fix incorrect entries in `Doc/data/refcounts.dat`
…GH-127451) Fix incorrect entries in `Doc/data/refcounts.dat` (cherry picked from commit 1f8267b) Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
…GH-127451) Fix incorrect entries in `Doc/data/refcounts.dat` (cherry picked from commit 1f8267b) Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
…#127451) Fix incorrect entries in `Doc/data/refcounts.dat`
I suggest putting the info in C headers rather than a new format -- unless you can generate the headers (or something else that's helpful) from the new format. That said, I've tried it and found some dead ends. I still think it'd be worthwhile, though: and external file is bound to become unmaintained. |
Actually What we can do however, is generate the (unfilled) entries from the headers directly and the dev should only fill out the reference count effect. |
Do you want to treat it as the first step of a long project? It would be a good first step :) |
Honestly, yes. Because knowing exactly what a function does on reference counts is always hard when you don't want to read the entire code... or open the docs (which sometimes do not tell you more =/) |
I commented with parts elsewhere but I'll consolidate here: Would it make sense to prefix existing comments with #, and leave field 5 to annotations like stolen? IMO, if we're filling this info in, it could make sense to look at each function and add more annotations to it than just
|
The
Doc/data/refcounts.dat
file is sometimes updated and sometimes not. I suggest we try to make it up-to-date as much as possible, at least by adding the functions part of the stable ABI.In addition, the entries are not sorted alphabetically. While they are semantically sorted, it does not help when adding new entries (where should we put it?). So I suggest to reformat the file once I'm done with adding the new entries.
Finally, I suggest adding a small script that checks whether the file is up-to-date, at least by checking whether the stable ABI is a subset of that file and whether the refcounts.dat is sorted or not (again it will be part of a CI workflow).
The main motivation behind this is to help newcomers in understanding whether a reference is borrowed or not (sometimes the comment doesn't say anything). It could also help in managing a future linter. Note that we have no syntax to indicate whether a reference is stolen or not so we should also improve that part.
cc @vstinner @encukou @Eclips4 @ZeroIntensity
Linked PRs
Doc/data/refcounts.dat
#127451Doc/data/refcounts.dat
(GH-127451) #127496Doc/data/refcounts.dat
(GH-127451) #127497Doc/data/refcounts.dat
#127468Doc/data/refcounts.dat
#127476Tasks
The text was updated successfully, but these errors were encountered: