
Some import/module adjustments #992

Merged: 16 commits, Jul 19, 2021
Conversation

@stinodego (Member) commented on Jul 15, 2021

Ok, I promise this is my last unsolicited refactor 😬

I noticed an issue in the current code base with the star imports in the `__init__.py` files. They would bring up not only the objects (like `DataFrame`) but also the modules (like `frame`). This caused a clash: `polars.frame` could point to either the eager or the lazy version of the module (in practice it pointed to the lazy version, but which one wins is not obvious). This is not transparent, so I fixed it.

I also moved some other stuff around; possibly breaking backwards compatibility. Please read carefully. Maybe I need to put some stuff back / deprecate it instead. Let me know.

Changes:

  • Added `__all__` dunders to the `__init__.py` files to solve the problem mentioned above.
  • This caused some issues with mypy, so I had to explicitly import `Series`, `DataFrame`, etc. in the top `__init__.py` (I filed a mypy issue for this... I believe there's a bug).
  • Removed the `dtype_to_int` function in the `datatypes` module. It was unused internally, and I see no use for this function.
  • Removed the `from_rows` function. Decided this is too soon/rigorous. Let's improve the `DataFrame` constructor first. I did adjust the tests to use the source function (`DataFrame.from_rows`) instead.
  • I tried hiding the wrapping functions (`wrap_s`, `wrap_df`, etc.) from the main scope, as I believe users should never have to use these explicitly. But the Rust backend expects those functions to be there, so I left them for now; something for the future maybe (possibly just rename them with a leading underscore).
  • Renamed `lazy/expr_functions` to `lazy/functions` (now possible thanks to the fix for the first issue mentioned above). This conforms to the syntax people know from pyspark: use `from polars.lazy import functions as F` and then `F.col`, `F.sum`, etc.
  • Split up `functions.py` into `io.py` (for all the read functions like `read_csv`, etc.) and `eager/functions.py` (for `concat`, `get_dummies`, etc.).
  • Moved `StringCache` and `toggle_string_cache` to their own file.
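The star-import leak described above can be reproduced outside polars. The sketch below (package names `leaky_pkg` and `clean_pkg` are invented for the demo, not real polars modules) builds two tiny packages on disk: one whose inner `__init__.py` lacks `__all__`, and one that restricts it. The first leaks its `frame` submodule into the top-level namespace, exactly the clash the PR fixes.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
sys.path.insert(0, str(root))

def make_pkg(name: str, sub_init: str) -> None:
    """Write a tiny package: <name>/sub/frame.py defines DataFrame."""
    sub = root / name / "sub"
    sub.mkdir(parents=True)
    (root / name / "__init__.py").write_text(f"from {name}.sub import *\n")
    (sub / "__init__.py").write_text(sub_init.format(name=name))
    (sub / "frame.py").write_text("class DataFrame:\n    pass\n")

# Without __all__, `from <name>.sub import *` copies every public name in
# sub's namespace -- including the submodule object `frame` itself, which
# the import machinery bound to sub as a side effect.
make_pkg("leaky_pkg", "from {name}.sub.frame import *\n")
# With __all__, only the listed names are copied by the star import.
make_pkg("clean_pkg", "from {name}.sub.frame import *\n__all__ = ['DataFrame']\n")

importlib.invalidate_caches()
import leaky_pkg
import clean_pkg

print(hasattr(leaky_pkg, "frame"))      # the submodule leaked to top level
print(hasattr(clean_pkg, "frame"))      # __all__ keeps the submodule out
print(hasattr(clean_pkg, "DataFrame"))  # the class still comes through
```

In polars' case, two packages (eager and lazy) each leaked a `frame` submodule into the same top-level namespace, so whichever star import ran last silently won.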

@ritchie46 (Member):

I am bouldering in the woods, so I will take a close look after this weekend. Some initial thoughts:

All DataFrame constructors except `__init__` are invoked from the `pl` namespace. Think `pl.read_csv`, `pl.scan_csv`, `pl.from_arrow`, etc. In that light, `pl.from_rows` is preferred over creation from `DataFrame`.

For functions, I plan to make them all work in eager and lazy alike. So maybe we should create a top-level `functions` module and add the hybrid functions there.

@stinodego (Member, Author):

> I am bouldering in the woods, so I will take a close look after this weekend. Some initial thoughts:
>
> All DataFrame constructors except `__init__` are invoked from the `pl` namespace. Think `pl.read_csv`, `pl.scan_csv`, `pl.from_arrow`, etc. In that light, `pl.from_rows` is preferred over creation from `DataFrame`.
>
> For functions, I plan to make them all work in eager and lazy alike. So maybe we should create a top-level `functions` module and add the hybrid functions there.

Enjoy your trip!

Regarding `from_rows`: I do see a distinction between `read_csv` (filesystem I/O) and `from_rows` (takes a Python object). `from_arrow` would actually be in the same category as `from_rows`.

I do agree that the distinction between the lazy and the eager module feels quite arbitrary.

Maybe park this MR for now and take a moment next week to just think through what the ideal module setup would look like?

@ritchie46 (Member):

> Maybe park this MR for now and take a moment next week to just think through what the ideal module setup would look like?

I think we can still merge this work in if we keep a top-level `functions` module. I also think the `string_cache` module should be top level, as it does not make sense to speak of it in an eager or lazy manner.

Great work, I do think it's an improvement. :)

@stinodego (Member, Author):

> > Maybe park this MR for now and take a moment next week to just think through what the ideal module setup would look like?
>
> I think we can still merge this work in if we keep a top-level `functions` module. I also think the `string_cache` module should be top level, as it does not make sense to speak of it in an eager or lazy manner.
>
> Great work, I do think it's an improvement. :)

All right, I will shuffle things around and make that happen. Probably tonight or tomorrow.

@stinodego (Member, Author):

All right, updated the MR with the points we discussed. I still have all the file I/O functions in a separate `io.py`; I wasn't sure if you wanted those back in `functions.py`. In any case, they're available from the top level as `pl.read_csv` etc.
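Keeping those entry points stable after moving code relies on explicit re-exports in the package `__init__`. A toy sketch of the pattern (the `mini_polars` modules and the `read_csv` body are stand-ins for illustration, not actual polars code):

```python
import types

# Stand-in "io" submodule: functions live here after the split.
io_mod = types.ModuleType("mini_polars.io")

def read_csv(path):
    # Placeholder body; the real reader would parse a CSV file.
    return f"<frame from {path}>"

io_mod.read_csv = read_csv

# Stand-in top-level package: re-export the function explicitly so it
# stays reachable as pl.read_csv, and pin __all__ to the public API so
# star imports don't drag the io submodule along.
pkg = types.ModuleType("mini_polars")
pkg.read_csv = io_mod.read_csv
pkg.__all__ = ["read_csv"]

print(pkg.read_csv("data.csv"))  # <frame from data.csv>
```

The explicit assignment (rather than a star import) is what keeps mypy happy and the top-level namespace predictable.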

```diff
@@ -700,12 +700,14 @@ def test_to_json(df):


 def test_from_rows():
-    df = pl.from_rows([[1, 2, "foo"], [2, 3, "bar"]], column_name_mapping={1: "foo"})
+    df = pl.DataFrame.from_rows(
```
Member:

This will also work as pl.from_rows right?

Member (Author):

Yes it will! I figured the `test_from_rows` function here should test the actual `from_rows` method attached to the `DataFrame`, not the functions version (since it's `test_frame.py`). The other `from_rows` function is tested earlier in the file, in `test_concat`.

Member:

Great! Then I merge it in. Thanks again. :)

@ritchie46 ritchie46 merged commit 6f80cf8 into pola-rs:master Jul 19, 2021
ritchie46 pushed a commit that referenced this pull request on Jul 20, 2021

ritchie46 pushed a commit that referenced this pull request on Jul 20, 2021