[ENH] to_orc #43860

ghost · 2021-10-03T14:24:53Z

Add pandas.io.orc.to_orc method definition

pandas.io.orc.to_orc method definition

pep8speaks · 2021-10-03T14:24:55Z

Hello @NickFillot! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file pandas/core/frame.py:

Line 2737:1: W293 blank line contains whitespace
Line 2784:89: E501 line too long (90 > 88 characters)
Line 2811:1: W293 blank line contains whitespace

In the file pandas/io/orc.py:

Line 102:1: W293 blank line contains whitespace
Line 111:1: W293 blank line contains whitespace

Comment last updated at 2021-10-03 14:47:17 UTC

debnathshoham · 2021-10-03T17:35:21Z

Thanks for the PR @NickFillot!
Is there an existing pandas issue number you're trying to address?
Do you mind creating one, if there isn't one?

ghost · 2021-10-03T18:05:09Z

> Thanks for the PR @NickFillot! Is there an existing pandas issue number you're trying to address? Do you mind creating one, if there isn't one?

I just created one, the to_orc issue. I didn't see an existing issue related to it.

Thank you

jreback · 2021-10-03T18:49:07Z

tests pls

Follow the existing way we test to_parquet, for example with the fixtures that skip based on the version.

jreback

see comments

alimcmaster1 · 2021-10-12T23:01:34Z

pandas/io/orc.py

+        try:
+            assert engine.__name__ == 'pyarrow', "engine must be 'pyarrow' module"
+            assert hasattr(engine, 'orc'), "'pyarrow' module must have orc module"
+        except Exception as e:


Can be more specific about the exception type
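One way to act on this comment is to drop the `assert` statements and the broad `except Exception` in favour of explicit checks that raise specific exception types. The sketch below is only an illustration: the helper name `validate_orc_engine` and the choice of `TypeError`, `ValueError`, and `ImportError` are assumptions, not what the PR ended up using.

```python
from types import ModuleType


def validate_orc_engine(engine: ModuleType) -> ModuleType:
    """Return `engine` if it looks like pyarrow with ORC support.

    Hypothetical helper: mirrors the two checks in the diff above, but
    raises a specific exception for each failure instead of asserting.
    """
    if not isinstance(engine, ModuleType):
        raise TypeError("engine must be a module object")
    if engine.__name__ != "pyarrow":
        raise ValueError("engine must be the 'pyarrow' module")
    if not hasattr(engine, "orc"):
        raise ImportError("'pyarrow' module must have an 'orc' submodule")
    return engine
```

Unlike `assert`, these checks still run when Python is started with `-O`, and callers can catch each failure mode separately.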

alimcmaster1 · 2021-10-12T23:04:20Z

pandas/io/orc.py

+
+    if path is None:
+        # to bytes: tmp path, pyarrow auto closes buffers
+        with tm.ensure_clean(os.path.join(gettempdir(), os.urandom(12).hex())) as path:


Why is this getting written to a file? Thought path = None will just return byte string?

Yes I do close the file-like object from my side by default in Arrow. It does seem to be different from the behavior of the Parquet writer in Arrow. If this is indeed an issue I can discuss with the Arrow community whether we should change it.

Right now I use PyArrow buffer and avoid creating a temp file.
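For readers following the thread, here is a minimal, library-agnostic sketch of the "no temp file" approach when the writer closes the sink it is given. It does not use the PyArrow buffer mentioned above. `write` stands in for whatever function actually serialises the table, and `_KeepOnClose` and `write_or_return_bytes` are hypothetical names introduced only for this illustration.

```python
import io
from typing import BinaryIO, Callable, Optional


class _KeepOnClose(io.BytesIO):
    """In-memory sink that keeps a copy of its contents when closed."""

    def __init__(self) -> None:
        super().__init__()
        self.saved: Optional[bytes] = None

    def close(self) -> None:
        # Snapshot the data once, before the buffer becomes unreadable.
        if not self.closed:
            self.saved = self.getvalue()
        super().close()


def write_or_return_bytes(
    write: Callable[[BinaryIO], None],
    path: Optional[str] = None,
) -> Optional[bytes]:
    """Write to `path` if given, otherwise return the serialised bytes."""
    if path is not None:
        with open(path, "wb") as sink:
            write(sink)
        return None
    buffer = _KeepOnClose()
    write(buffer)   # the writer may or may not close the sink
    buffer.close()  # safe either way; closing twice is a no-op
    return buffer.saved
```

The point of the subclass is that the result survives regardless of whether the writer closes the file-like object, which is the behavioural difference discussed in this thread.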

alimcmaster1 · 2021-10-12T23:05:22Z

pandas/core/frame.py

+        Write a DataFrame to the orc/arrow format.
+        Parameters
+        ----------
+        df : DataFrame


Can we reuse the docstring as opposed to copy/pasting it?

Hmm there isn't an orc/arrow format. Maybe it should be "Write a DataFrame to the ORC format using PyArrow"?

alimcmaster1 · 2021-10-12T23:06:03Z

Thanks for the PR @NickFillot comments above!

ghost · 2021-10-13T19:32:28Z

Working on tests; I'm trying to understand how pandas testing works.

iajoiner · 2021-10-17T23:48:17Z

@NickFillot Thanks for working on this! Note that your argument ordering doesn't work for write_table in pyarrow 4.0.0, so please either use the (path, table) ordering to accommodate that version or set the minimum version of pyarrow to 4.0.1.
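If supporting 4.0.0 were preferred over raising the minimum version, the first option could look roughly like the sketch below. It assumes, per the comment above, that only pyarrow 4.0.0 expects the (path, table) order; `parse_version` and `orc_write_args` are hypothetical helpers, and the version string would come from `pyarrow.__version__` in real code.

```python
from typing import Any, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse the leading 'major.minor.patch' numbers of a version string."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:  # pad '4.0' to (4, 0, 0)
        parts.append(0)
    return tuple(parts)


def orc_write_args(pyarrow_version: str, table: Any, path: Any) -> Tuple[Any, Any]:
    """Return the positional arguments for write_table in the right order."""
    if parse_version(pyarrow_version) == (4, 0, 0):
        return (path, table)  # ordering described for 4.0.0 in this thread
    return (table, path)
```

Setting the minimum version to 4.0.1 avoids this branch entirely, which is the simpler of the two options.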

jreback

pls add tests.

jreback · 2021-10-18T23:22:19Z

pandas/core/frame.py

+            a bytes object is returned.
+        engine : {{'pyarrow'}}, default 'pyarrow'
+            Parquet library to use, or library it self, checked with 'pyarrow' name
+            and version > 4.0.0


Is it > 4.0.0, meaning >= 5.0? Stating the minimum version that way would be more informative.

iajoiner · 2021-11-14T18:06:20Z

@NickFillot Do you mind me reopening it?

iajoiner · 2021-11-21T09:57:33Z

This PR has been reopened as #44554

[ENH] to_orc

fe84920

pandas.io.orc.to_orc method definition

NickFillot added 2 commits October 3, 2021 16:34

pandas.DataFrame.to_orc

6cc7030

set to_orc to pandas.DataFrame

Cleaning

2d1515e

jreback requested changes Oct 3, 2021

jreback added the IO Parquet (parquet, feather) and Enhancement labels Oct 4, 2021

alimcmaster1 reviewed Oct 12, 2021

jreback requested changes Oct 18, 2021

ghost closed this Nov 12, 2021

ghost deleted the patch-2 branch November 12, 2021 10:44

iajoiner mentioned this pull request Nov 21, 2021

[EHN] pandas.DataFrame.to_orc #44554

Merged

4 tasks

This pull request was closed.
