Some import/module adjustments #992
Conversation
I am bouldering in the woods, so I will take a close look after this weekend. Some initial thoughts: all DataFrame constructors except the init are created from the `from_` functions, and I plan to make them all work in eager and lazy alike. So maybe we should create a top-level `functions` module and add the hybrid functions there.
Enjoy your trip! I do agree that the distinction can be confusing. Maybe park this MR for now and take a moment next week to just think through what the ideal module setup would look like?
I think we can still merge this work in if we keep the functions available at the top level. Great work, I do think it's an improvement. :)
All right, I will shuffle things around and make that happen. Probably tonight or tomorrow.
All right, updated the MR with the points we discussed. I still have all the file IO functions in a separate `io.py` module.
```diff
@@ -700,12 +700,14 @@ def test_to_json(df):

 def test_from_rows():
-    df = pl.from_rows([[1, 2, "foo"], [2, 3, "bar"]], column_name_mapping={1: "foo"})
+    df = pl.DataFrame.from_rows(
```
This will also work as `pl.from_rows`, right?
Yes it will! I figured the `test_from_rows` function here should test the actual `from_rows` function that's attached to the DataFrame, not the `functions` version (since it's `test_frame.py`). The other `from_rows` function is tested earlier in the file, in `test_concat`.
Great! Then I'll merge it in. Thanks again. :)
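For illustration, here is a minimal pure-Python stand-in for the `from_rows` signature exercised in the diff above. The real polars constructor returns a DataFrame backed by Rust; this sketch only shows what `column_name_mapping` does, and the default `column_N` names are an assumption, not confirmed polars behavior.

```python
# Hypothetical stand-in for DataFrame.from_rows: build named columns
# from row-oriented data. The default names ("column_0", ...) and the
# column_name_mapping override are assumptions for illustration only.
def from_rows(rows, column_name_mapping=None):
    width = len(rows[0])
    names = [f"column_{i}" for i in range(width)]
    for idx, name in (column_name_mapping or {}).items():
        names[idx] = name  # rename the column at this position
    # transpose the rows into columns keyed by name
    return {names[i]: [row[i] for row in rows] for i in range(width)}

df = from_rows([[1, 2, "foo"], [2, 3, "bar"]], column_name_mapping={1: "foo"})
print(df)  # {'column_0': [1, 2], 'foo': [2, 3], 'column_2': ['foo', 'bar']}
```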
Ok, I promise this is my last unsolicited refactor 😬
I noticed an issue in the current code base with the star imports in the `__init__.py` files. They would not only bring up the objects (like `DataFrame`), but also the modules (like `frame`). This would cause a clash, because now `polars.frame` could point to either the eager or lazy version of the module (it would point to the lazy version, but no idea why specifically). This is not transparent. So I fixed it.

I also moved some other stuff around, possibly breaking backwards compatibility. Please read carefully. Maybe I need to put some stuff back / deprecate it instead. Let me know.
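The fix hinges on how `__all__` interacts with star imports: `from .frame import DataFrame` also binds the submodule `frame` as an attribute of the package, so without `__all__` a star import leaks it. A runnable sketch with a throwaway package (the package name `toypolars` and its contents are made up, not the actual polars layout):

```python
import os
import sys
import tempfile

# Build a tiny throwaway package on disk to show that __all__ keeps
# submodule names out of the star-import namespace.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "toypolars")
os.makedirs(pkg)
with open(os.path.join(pkg, "frame.py"), "w") as f:
    f.write("class DataFrame:\n    pass\n")
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    # Importing from .frame also binds the submodule "frame" on the
    # package; __all__ stops it from leaking through star imports.
    f.write('from .frame import DataFrame\n__all__ = ["DataFrame"]\n')

sys.path.insert(0, root)
ns = {}
exec("from toypolars import *", ns)
print("DataFrame" in ns)  # True: listed in __all__
print("frame" in ns)      # False: the submodule stays hidden
```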
Changes:

- Added `__all__` dunders to the `__init__.py` files to solve the problem mentioned above. This caused some issues with mypy, so I had to explicitly import `Series`, `DataFrame`, etc. in the top `__init__.py` (I filed a mypy issue for this... I believe there's a bug).
- Removed the `dtype_to_int` function in the `datatypes` module. It was unused internally, and I see no use for this function.
- ~~Removed the `from_rows` function.~~ Decided this is too soon/rigorous. Let's improve the DataFrame constructor first. I did adjust the tests to use the 'source' function (`DataFrame.from_rows`) instead.
- I tried hiding the wrapping functions (`wrap_s`, `wrap_df`, etc.) from the main scope, as I believe users should never have to explicitly use these. But the Rust backend expects those functions to be there. I left them for now; something for the future maybe (possibly just rename them with a leading underscore or something).
- Renamed `lazy/expr_functions` to `lazy/functions` (now possible thanks to the fix to the first mentioned issue). This conforms to the syntax people know from `pyspark`: use `from polars.lazy import functions as F` and then use `F.col`, `F.sum`, etc.
- Split up `functions.py` into `io.py` (for all the read functions like `read_csv`, etc.) and `eager/functions.py` (for `concat`, `get_dummies`, etc.).
- Moved `StringCache` and `toggle_string_cache` to their own file.
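The `lazy/functions` rename enables the pyspark-style namespace import. With polars installed the pattern would be `from polars.lazy import functions as F`; the snippet below fakes that namespace with a stand-in object so it runs without polars (the string "expressions" returned by `col` and `sum` are fake, only the `F.col` / `F.sum` calling convention mirrors the PR):

```python
from types import SimpleNamespace

# Stand-in for polars.lazy.functions: a module-like namespace holding
# expression constructors. Real polars expressions are Rust-backed;
# these lambdas just echo the call structure as strings.
F = SimpleNamespace(
    col=lambda name: f"col({name})",
    sum=lambda expr: f"sum({expr})",
)

# Pyspark-style usage: build an expression through the F namespace.
expr = F.sum(F.col("a"))
print(expr)  # sum(col(a))
```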