Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Per discussion in the datafusion python Discord channel, some users feel that the datafusion-python project is not "pythonic". Some, such as myself, have found it necessary to dig through the Rust documentation to discover how to use the features that are currently exposed. Some classes and functions have documentation available, but most do not. For example, see the API page for functions. Here is one randomly selected entry from that page:
As a user, the only way right now to understand which args can and must be passed, or what this function is for, is to dig into the Rust code, either in this repo or the datafusion repo.
Additionally, a Python user who wants to look at the list of generated functions has no easy way to do so from the repository itself. One can look at the online documentation as linked above. However, many users like to clone the repo and look through the code themselves. For Python users who are unfamiliar with Rust procedural macros, it can be obscure how we generate and expose functions and classes. For these users, looking into the python/datafusion directory within this repo is not helpful.
Describe the solution you'd like
Similar to the approach used by the polars project, it would be nice to have Python wrappers for the functions and classes that our end users interact with. I have identified two downsides to doing this. It adds a step for the developer who exposes a new function, and it adds an extra function call each time a wrapped function is invoked. The benefit is that the repository will be much more user friendly to Python developers.
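To make the idea concrete, here is a minimal sketch of what such a wrapper could look like. Every name in it (`_RawExpr`, `_raw_abs`, `Expr`, `col`, `abs_value`) is a hypothetical stand-in written for illustration, not the actual datafusion-python API. `_raw_abs` plays the role of a macro-generated binding whose signature is opaque from the Python side.

```python
from __future__ import annotations


class _RawExpr:
    """Hypothetical stand-in for an expression object from the compiled extension."""

    def __init__(self, text: str) -> None:
        self.text = text


def _raw_abs(*args):
    """Hypothetical stand-in for a generated binding: opaque signature, no docs."""
    (arg,) = args  # exactly one argument is expected, but nothing tells the user so
    return _RawExpr(f"abs({arg.text})")


class Expr:
    """Python-side wrapper that holds the raw expression."""

    def __init__(self, raw: _RawExpr) -> None:
        self.raw = raw

    def __repr__(self) -> str:
        return f"Expr({self.raw.text})"


def col(name: str) -> Expr:
    """Create an expression that refers to the column ``name``."""
    return Expr(_RawExpr(name))


def abs_value(arg: Expr) -> Expr:
    """Return the absolute value of a numeric expression.

    Args:
        arg: The expression to apply the function to.
    """
    # The one extra Python-level call mentioned above happens here.
    return Expr(_raw_abs(arg.raw))


result = abs_value(col("a"))
```

The signature, type hints, and docstring now live in a plain `.py` file that a user can read after cloning the repo, at the cost of the extra call inside `abs_value`.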
Describe alternatives you've considered
An alternative approach is to use .pyi files inside the python/datafusion directory as started in this repo. These .pyi files serve a similar purpose to what I have described above. They have the advantage of removing the additional function call that a wrapper introduces. The downside of the .pyi approach is that there are no guarantees that the .pyi files are kept up to date with the underlying code. Function parameters may change as the code evolves, and if the developer does not update these .pyi files we will have documentation that is out of sync with the underlying code. With wrappers, a change in these parameters will ideally be caught by the unit tests.
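A small sketch of why a wrapper surfaces this kind of drift while a stub file does not. The functions `_raw_round_v1` and `_raw_round_v2` are hypothetical stand-ins for the same binding before and after a parameter change, and `make_wrapper` exists only so both versions can be shown side by side.

```python
def _raw_round_v1(value):
    """Hypothetical binding, original signature: one argument."""
    return round(value)


def _raw_round_v2(value, decimals):
    """Hypothetical binding after a change: a second argument is now required."""
    return round(value, decimals)


def make_wrapper(raw):
    """Build a typed, documented wrapper around the given raw binding."""

    def round_value(value: float) -> float:
        """Round ``value`` to the nearest integer."""
        return raw(value)

    return round_value


# Wrapper and binding agree: the call succeeds.
current = make_wrapper(_raw_round_v1)
ok_result = current(2.6)

# The binding changed but the wrapper was not updated: calling it fails,
# so any unit test that exercises the wrapper fails too.
stale = make_wrapper(_raw_round_v2)
try:
    stale(2.6)
except TypeError:
    drift_detected = True
else:
    drift_detected = False
```

A stale `.pyi` entry for the same function would keep advertising the old signature without any test failing, because stub files are never executed.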
Additional context