Add documentation and annotations for all user facing python classes and functions #749

Closed
timsaucer opened this issue Jul 9, 2024 · 0 comments · Fixed by #750
Labels
enhancement New feature or request

Comments

@timsaucer
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Per discussion in the DataFusion Python Discord channel, some users feel that the datafusion-python project is not "pythonic". Some, myself included, have found it necessary to dig through the Rust documentation to discover how to use the features that are currently exposed. Some classes and functions have documentation available, but most do not. For example, see the API page for functions. Here is one randomly selected entry from that page:

datafusion.functions.functions.approx_distinct(*args, distinct=False)

As a user, the only way right now to understand which args can or must be passed, or what this function is for, is to dig into the Rust code, either in this repo or in the datafusion repo.

Additionally, a Python user who wants to look at the list of generated functions has no easy way to do so from the repository itself. The online documentation linked above is one option, but many users like to clone the repo and read through the code themselves. How we generate and expose functions and classes can be obscure to Python users who are unfamiliar with Rust procedural macros. For these users, looking into the python/datafusion directory within this repo is not helpful.

Describe the solution you'd like

Similar to the approach used by the polars project, it would be nice to have Python wrappers for the functions and classes that our end users interact with. I have identified two downsides to doing this: it adds a step for the developer who exposes a new function, and it adds an extra function call each time a wrapped function is invoked. The benefit is that the repository will be much more user friendly to Python developers.
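A minimal sketch of what such a wrapper could look like, assuming a thin Python `Expr` class that holds the Rust-backed object. Everything here is hypothetical and for illustration only: `_internal_approx_distinct` is a stand-in for the generated binding (so the example runs without the compiled extension), and the `Expr.raw` attribute and the wrapper's single-argument signature are assumptions, not the project's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical stand-in for the Rust-generated binding. Like the entry quoted
# above, it exposes only ``*args`` and gives the user no documentation.
def _internal_approx_distinct(*args, distinct=False):
    return ("approx_distinct", args, distinct)


@dataclass(frozen=True)
class Expr:
    """Hypothetical thin Python wrapper around a Rust-backed expression."""

    raw: object


def approx_distinct(expression: Expr) -> Expr:
    """Return the approximate number of distinct values in ``expression``.

    Args:
        expression: The expression whose distinct values are counted.

    Returns:
        A new expression representing the approximate distinct count.
    """
    # The wrapper costs one extra Python call, but it gives users a typed
    # signature and a docstring they can read directly in the repository.
    return Expr(_internal_approx_distinct(expression.raw))
```

With a wrapper like this, `help(approx_distinct)` and IDE tooltips show the parameter name, its type, and a description, instead of a bare `*args`.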

Describe alternatives you've considered

An alternative approach is to use .pyi files inside the python/datafusion directory, as started in this repo. These .pyi files serve a similar purpose to what I have described above, and they have the advantage of avoiding the additional function call that a wrapper introduces. The downside of the .pyi approach is that there is no guarantee that the files are kept up to date with the underlying code. Function parameters may change as the code evolves, and if nobody updates the .pyi files, we will have documentation that is out of sync with the underlying code. With wrappers, such parameter changes will ideally be caught by the unit tests.

Additional context
