Description
I have been working on auto-generating a version of the NumPy API based on its usage by downstream libraries. I am far enough along to present some end-to-end results, but I still need to run it on more examples for the results to be truly meaningful.
Here is the generated numpy module, based on running the skimage, xarray, and sklearn test suites.
Next steps
I would appreciate any feedback on the end result or the process. My next step is to look for more codebases to run and analyze. If you want to take it for a spin, please feel free to clone the repo, run it on your own codebase, and upload the results as well. I will work on adding more instructions, but the Makefile should get you started.
It would also be nice to match it against the documentation data or other, more curated resources. We could also experiment with writing the list of included functions and classes by hand and letting this tool generate the signatures for us.
Broadly speaking, this can help us get a sense of what current API usage looks like for different array libraries, and so it could form the basis of a proposed API spec. The JSON format is a bit verbose, but it does work for describing the different forms of the APIs.
Any other ideas on where to take this would be appreciated. Or better yet, download the data and tools yourself and see whether they are useful.
How?
That prettier form is generated from a structured JSON file, which in turn is generated from the various traces of running the different test suites.
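As an illustration of that last step, here is a minimal sketch that turns a structured JSON description into stub-like Python text. The JSON schema used here (a `functions` map from each name to its parameters and the type names observed for each) is a hypothetical stand-in for illustration, not the repo's actual format, and `render_stub` is likewise a hypothetical helper.

```python
import json

# Hypothetical schema, for illustration only:
# {"functions": {"<name>": {"params": {"<param>": ["<type name>", ...]}}}}
EXAMPLE = """
{"functions": {
  "mean": {"params": {"a": ["ndarray", "list"], "axis": ["int", "NoneType"]}},
  "zeros": {"params": {"shape": ["tuple", "int"]}}
}}
"""


def render_stub(api: dict) -> str:
    """Turn the per-function JSON description into Python stub text."""
    lines = []
    for name, spec in sorted(api["functions"].items()):
        # Each parameter becomes "param: T1 | T2", with type names sorted.
        params = ", ".join(
            f"{param}: {' | '.join(sorted(types))}"
            for param, types in spec["params"].items()
        )
        lines.append(f"def {name}({params}): ...")
    return "\n".join(lines)


print(render_stub(json.loads(EXAMPLE)))
```

Keeping the JSON as the source of truth and rendering stubs from it means the same data can also be diffed or compared against curated documentation without re-running any test suites.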
It works by using the setprofile hook to intercept every bytecode execution and peek at the stack to see whether it is a function call and, if so, which function is being called and with which arguments. It then records only the calls made from particular modules (xarray and skimage in this case) to a particular module (numpy), ignoring the rest.
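A minimal sketch of the "record calls from module A into module B" filter, assuming only the standard library. Note that `sys.setprofile` reports function-level events (call, return, c_call, and so on) rather than individual bytecodes, so this sketch relies on its `'call'` event: the callee is `frame` and the caller is `frame.f_back`. `CallRecorder` is a hypothetical helper, not the tool's actual code, and the stdlib `json` package stands in for numpy so the example runs without extra dependencies. It only sees Python-level functions, not calls into C builtins.

```python
import inspect
import json
import sys
from collections import defaultdict
from types import FrameType
from typing import Any


class CallRecorder:
    """Hypothetical helper: record calls made from `sources` into `target`."""

    def __init__(self, sources: set[str], target: str) -> None:
        self.sources = sources
        self.target = target
        # (callee module, function name) -> one {param: type name} dict per call
        self.calls: dict[tuple[str, str], list[dict[str, str]]] = defaultdict(list)

    def _in_target(self, module: str) -> bool:
        return module == self.target or module.startswith(self.target + ".")

    def _profile(self, frame: FrameType, event: str, arg: Any) -> None:
        # On a 'call' event, `frame` is the callee and `frame.f_back` the caller.
        if event != "call" or frame.f_back is None:
            return
        callee_mod = frame.f_globals.get("__name__", "")
        caller_mod = frame.f_back.f_globals.get("__name__", "")
        if caller_mod not in self.sources or not self._in_target(callee_mod):
            return  # ignore calls inside the target or from unrelated modules
        code = frame.f_code
        # Parameters come first in co_varnames, then *args and **kwargs.
        n = code.co_argcount + code.co_kwonlyargcount
        n += bool(code.co_flags & inspect.CO_VARARGS)
        n += bool(code.co_flags & inspect.CO_VARKEYWORDS)
        bound = frame.f_locals  # at call time, only the parameters are bound
        self.calls[(callee_mod, code.co_name)].append(
            {name: type(bound[name]).__name__ for name in code.co_varnames[:n]}
        )

    def __enter__(self) -> "CallRecorder":
        sys.setprofile(self._profile)
        return self

    def __exit__(self, *exc: object) -> None:
        sys.setprofile(None)


# A stand-in "downstream" module whose function calls into json.
downstream: dict[str, Any] = {"__name__": "downstream", "json": json}
exec("def serialize(x):\n    return json.dumps(x, indent=2)", downstream)

with CallRecorder(sources={"downstream"}, target="json") as rec:
    downstream["serialize"]([1, 2])

for (module, func), observations in rec.calls.items():
    print(module, func, observations[0]["obj"], observations[0]["indent"])
```

Only the top-level `json.dumps` call is recorded; the calls that `json` makes internally (for example into `json.encoder`) are skipped because their caller is not one of the source modules.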
For the API generation, it tries to take the union of the various types and call signatures to come up with a single signature for each function.
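A minimal sketch of the union step, assuming each recorded call is a mapping from parameter name to the type name that was observed. `merge_signature` is a hypothetical helper, and the observations below are made-up inputs for illustration, not results from any test suite.

```python
from collections.abc import Iterable


def merge_signature(observations: Iterable[dict[str, str]]) -> dict[str, set[str]]:
    """Union the observed type names per parameter across many calls."""
    merged: dict[str, set[str]] = {}
    for call in observations:
        for param, type_name in call.items():
            merged.setdefault(param, set()).add(type_name)
    return merged


# Made-up observations of three calls to the same function.
observed = [
    {"a": "ndarray", "axis": "NoneType"},
    {"a": "list", "axis": "int"},
    {"a": "ndarray", "axis": "int"},
]
merged = merge_signature(observed)
print({param: sorted(types) for param, types in merged.items()})
```

A plain union is simple but lossy: it forgets which type combinations actually occurred together (here, whether `a: list` was ever paired with `axis: NoneType`), which is one reason a single merged signature can be broader than real usage.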
There are lots of limitations here, but it is a start. Again, any feedback would be much appreciated.