[WIP] Excel table output #24899

tdamsma · 2019-01-24T09:45:33Z

closes Output excel table objects with to_xlsx() #24862
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

This PR should allow writing excel table objects. Proof of concept implemented for XlsxWriter and OpenPyXL. Exactly which options of the to_excel api should be supported is up for discussion, see #24862.

import pandas as pd
data = {'B': dict(col1=1), 'A': dict(col1=2), 'C': dict(col1=3, col2=4.1)}
df = pd.DataFrame.from_dict(data, orient='index')

# `as_table` is the keyword proposed in this PR (later renamed to `table`); released pandas has no such option.
df.to_excel('test1.xlsx', engine='xlsxwriter', as_table=True, index=True)
df.to_excel('test2.xlsx', engine='openpyxl', as_table=True, index=False)

WillAyd

Thanks for the PR! Gave it a very brief review but the critical thing missing here is tests. Ideally should have those first and foremost - can you add some for the expected output here?

pandas/io/excel.py

tdamsma · 2019-01-24T10:28:45Z

@WillAyd, can you offer some guidance on how to test this functionality? I presume there is no suitable way to involve MS Excel in the process, so what should be tested? That the function runs without errors? That the data can be loaded from the Excel file? As far as I know there is no simple way in Python to load an Excel table by, e.g., its table name.

WillAyd · 2019-01-24T10:35:27Z

Sure - check pandas/tests/io/data; you will see Excel files there that explicitly show the intended layout. The corresponding writing test will compare a generated file against what was expected.


codecov · 2019-01-27T16:42:00Z

Codecov Report

Merging #24899 into master will decrease coverage by 49.49%.
The diff coverage is 12.5%.

@@            Coverage Diff             @@
##           master   #24899      +/-   ##
==========================================
- Coverage   92.38%   42.88%   -49.5%     
==========================================
  Files         166      166              
  Lines       52398    52418      +20     
==========================================
- Hits        48406    22480   -25926     
- Misses       3992    29938   +25946

Flag	Coverage Δ
#multiple	`?`
#single	`42.88% <12.5%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/generic.py	`39.86% <ø> (-56.77%)`	⬇️
pandas/io/formats/excel.py	`14.69% <12.5%> (-82.71%)`	⬇️
pandas/io/formats/latex.py	`0% <0%> (-100%)`	⬇️
pandas/core/categorical.py	`0% <0%> (-100%)`	⬇️
pandas/io/sas/sas_constants.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/plotting.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/converter.py	`0% <0%> (-100%)`	⬇️
pandas/io/formats/html.py	`0% <0%> (-99.35%)`	⬇️
pandas/core/groupby/categorical.py	`0% <0%> (-95.46%)`	⬇️
pandas/io/sas/sas7bdat.py	`0% <0%> (-91.17%)`	⬇️
... and 125 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 2b16e2e...ca9c36e.

codecov · 2019-01-27T16:42:00Z

Codecov Report

Merging #24899 into master will decrease coverage by 0.44%.
The diff coverage is 87.5%.

@@            Coverage Diff             @@
##           master   #24899      +/-   ##
==========================================
- Coverage   92.83%   92.38%   -0.45%     
==========================================
  Files         182      166      -16     
  Lines       50430    52403    +1973     
==========================================
+ Hits        46815    48411    +1596     
- Misses       3615     3992     +377

Flag	Coverage Δ
#multiple	`90.8% <87.5%> (-0.7%)`	⬇️
#single	`42.88% <12.5%> (+0.43%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/generic.py	`96.63% <ø> (+2.81%)`	⬆️
pandas/io/formats/excel.py	`97.12% <87.5%> (-0.34%)`	⬇️
pandas/io/gbq.py	`25% <0%> (-75%)`	⬇️
pandas/compat/__init__.py	`57.91% <0%> (-34.09%)`	⬇️
pandas/plotting/_misc.py	`38.68% <0%> (-26.18%)`	⬇️
pandas/io/common.py	`72.86% <0%> (-21.19%)`	⬇️
pandas/io/gcs.py	`80% <0%> (-20%)`	⬇️
pandas/io/s3.py	`86.36% <0%> (-13.64%)`	⬇️
pandas/core/computation/expr.py	`88.68% <0%> (-9.12%)`	⬇️
pandas/core/groupby/base.py	`91.83% <0%> (-8.17%)`	⬇️
... and 180 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update c64c9cb...d0a6391.

pep8speaks · 2019-01-27T16:42:02Z

Hello @tdamsma! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-07-15 09:50:42 UTC

tdamsma · 2019-01-27T19:46:45Z

@WillAyd, I added a simple round-trip test. I looked for an example of how to check formatting, but the only formatting test is completely commented out (test_to_excel_header_styling_xls). As far as I can tell, the example sheets are only used in read tests, not for comparison in write tests. I addressed your other comments; can you give some feedback on whether the general approach fits with pandas? I know there are still many tests to add, but I would like to know if I am on the right track for getting this merged eventually. Thanks!

WillAyd · 2019-01-29T17:42:45Z

@tdamsma yeah, this might be tricky; I'm not sure of the best approach off the top of my head, but are there any testing facilities you know of in, say, openpyxl that would allow us to make stronger assertions about what is being written out? As it is now, there's just no way for the tests to ensure this is actually working.

tdamsma · 2019-01-30T07:44:51Z

@WillAyd, I wasn't aware of any options, but I just had a better look at openpyxl and this is indeed possible. In essence we need a function that can read an excel table into a dataframe, which would also be quite useful for pandas in general.
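A minimal sketch of such a reader, assuming openpyxl 3.x, where `Worksheet.tables` maps table names to `Table` objects whose `ref` is a range string such as `"A1:C4"`. `read_excel_table` is a hypothetical helper (neither pandas nor openpyxl provides it), and it assumes the table has a single header row.

```python
import pandas as pd
from openpyxl import load_workbook


def read_excel_table(path, table_name):
    """Return the contents of the named Excel table as a DataFrame.

    Hypothetical helper: assumes the first row of the table range is its header.
    """
    # Default (non read-only) mode, so worksheet tables are loaded.
    wb = load_workbook(path, data_only=True)
    for ws in wb.worksheets:
        if table_name in ws.tables:
            ref = ws.tables[table_name].ref  # e.g. "A1:C4"
            # ws[ref] yields one tuple of cells per row of the range.
            rows = [[cell.value for cell in row] for row in ws[ref]]
            header, data = rows[0], rows[1:]
            return pd.DataFrame(data, columns=header)
    raise KeyError("no table named {!r} in {}".format(table_name, path))
```

Because the lookup goes through the table name rather than a sheet and cell range, the result stays correct if the table is moved or resized in Excel.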

WillAyd · 2019-02-05T04:59:33Z

@tdamsma given the conversation in the associated issue and some of the other precursors, I'm going to close this one for now to keep our queue minimal. If you plan on addressing this soon let me know; otherwise we can reopen as we get back around to this.


WillAyd · 2019-07-10T14:27:46Z

pandas/core/generic.py

@@ -2182,8 +2182,8 @@ def _repr_data_resource_(self):
    freeze_panes : tuple of int (length 2), optional
        Specifies the one-based bottommost row and rightmost column that
        is to be frozen.
-
-        .. versionadded:: 0.20.0.


We want to keep this

WillAyd · 2019-07-10T14:29:50Z

pandas/io/excel/_xlsxwriter.py

@@ -235,3 +235,51 @@ def write_cells(
                )
            else:
                wks.write(startrow + cell.row, startcol + cell.col, val, style)
+
+    def write_table(


I don't think this needs to be a separate method at all. Can't you just use the normal write method and apply the table after cells are written out?

I think that can be done, but there are several differences from the normal write function, so I don't think the result will be that nice. Key differences:

merged cells are not allowed in tables

the header needs to be handled differently

individual cell styling makes little sense for tables (I don't think it should be supported, or it should at least be discouraged)

Disregard the above; I think I can make it work without too big adjustments.
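The "write the cells first, apply the table afterwards" approach can be sketched with public APIs only, assuming the openpyxl engine: pandas writes the cells through `ExcelWriter`, then an openpyxl `Table` is attached to the resulting worksheet. `to_excel_table` is a hypothetical helper, not a pandas function, and it writes without the index so that no merged or index cells end up inside the table range.

```python
import pandas as pd
from openpyxl.utils import get_column_letter
from openpyxl.worksheet.table import Table, TableStyleInfo


def to_excel_table(df, path, table_name, sheet_name="Sheet1"):
    """Write ``df`` (without its index) and turn the written range into an Excel table.

    Hypothetical helper: pandas has no built-in option for this.
    """
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        # Step 1: let the regular writer lay out the header and data cells.
        df.to_excel(writer, sheet_name=sheet_name, index=False)
        ws = writer.sheets[sheet_name]
        # Step 2: the written block is one header row plus len(df) data rows (1-based).
        last_col = get_column_letter(df.shape[1])
        ref = "A1:{}{}".format(last_col, df.shape[0] + 1)
        table = Table(displayName=table_name, ref=ref)
        table.tableStyleInfo = TableStyleInfo(name="TableStyleMedium9", showRowStripes=True)
        # Step 3: apply the table on top of the already-written cells;
        # it is saved together with the workbook when the writer closes.
        ws.add_table(table)
```

Keeping `index=False` sidesteps the merged-cell and header issues listed above; supporting an index or MultiIndex would require computing the range from the formatter's output instead.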

WillAyd · 2019-07-10T14:30:57Z

pandas/tests/io/excel/test_writers.py

+        with ensure_clean(ext) as pth:
+            df.to_excel(pth, header=True, table="Table1")
+
+            xf = ExcelFile(pth)


Should use this as a context manager

Are you sure? This style is consistent with all the other tests
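For comparison, the context-manager form of that test might look like the sketch below. It assumes the openpyxl engine and uses `tempfile` in place of pandas' internal `ensure_clean` helper; `roundtrip_with_context_manager` is a hypothetical name. `pd.ExcelFile` supports the `with` statement, which closes the underlying file once the block exits.

```python
import os
import tempfile

import pandas as pd


def roundtrip_with_context_manager(df):
    """Write ``df`` to a temporary .xlsx file and read it back via ExcelFile."""
    with tempfile.TemporaryDirectory() as tmp:
        pth = os.path.join(tmp, "roundtrip.xlsx")
        df.to_excel(pth, header=True, engine="openpyxl")
        # The inner context manager closes the file handle on exit,
        # before the temporary directory is removed.
        with pd.ExcelFile(pth, engine="openpyxl") as xf:
            return pd.read_excel(xf, xf.sheet_names[0], index_col=0)
```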

WillAyd · 2019-07-10T14:31:33Z

pandas/tests/io/excel/test_writers.py

+            result = pd.read_excel(xf, xf.sheet_names[0], index_col=0)
+
+            tm.assert_frame_equal(result, df)
+            assert result.index.name == "foo"


Is there a better assertion to be had to make sure a table actually appears in the Excel file? read_excel isn't going to care about Table styles, so I think we need to approach this another way.

The tests definitely need some work

OK, I implemented a proof of concept. To be honest I don't really like it: for openpyxl the default formatting needs to be disabled, which was not really supported, and for xlsxwriter the column names need to be passed to the table explicitly, otherwise the previously written header cells are overwritten with generated default names.

Unfortunately need some ugly hacks
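One way to make a stronger assertion than a `read_excel` round trip is to open the written file with openpyxl and check that the table object itself is present, with the expected range. A sketch, assuming openpyxl 3.x (`Worksheet.tables`); `assert_has_excel_table` is a hypothetical test helper:

```python
from openpyxl import load_workbook


def assert_has_excel_table(path, sheet_name, table_name, expected_ref):
    """Fail unless ``sheet_name`` in ``path`` holds ``table_name`` spanning ``expected_ref``."""
    ws = load_workbook(path)[sheet_name]
    # Checks the table definition itself, which read_excel ignores entirely.
    assert table_name in ws.tables, "no table {!r} on sheet {!r}".format(table_name, sheet_name)
    ref = ws.tables[table_name].ref
    assert ref == expected_ref, "table {!r} spans {}, expected {}".format(table_name, ref, expected_ref)
```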

WillAyd · 2019-07-11T20:13:11Z

pandas/core/generic.py

@@ -2185,6 +2185,9 @@ def _repr_data_resource_(self):

        .. versionadded:: 0.20.0.

+    table : string, default None


Would accepting a TableStyle be an option instead (speaking only in openpyxl terms, not sure if xlsxwriter offers that)? I feel like that could do the same thing but also give users more power over output formatting

Both accept a style, but of course the syntax is a bit different: openpyxl uses "TableStyleMedium9" as default, and xlsxwriter uses "Table Style Medium 9".

Hmm, I wasn't thinking about that so much as having the user pass in an actual object.

Do you mean something like this?

style = TableStyleInfo(name="TableStyleMedium9", showFirstColumn=False, showLastColumn=False, showRowStripes=True, showColumnStripes=True)

Right that's what I had in mind (not tied to it just asking)
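For reference, this is roughly how such a user-supplied style object is attached to a table in openpyxl; a hypothetical `to_excel` keyword would only need to pass it through. The sheet contents and table name are made up for illustration.

```python
from openpyxl import Workbook
from openpyxl.worksheet.table import Table, TableStyleInfo

# The style object from the comment above, built by the user.
style = TableStyleInfo(
    name="TableStyleMedium9",
    showFirstColumn=False,
    showLastColumn=False,
    showRowStripes=True,
    showColumnStripes=True,
)

wb = Workbook()
ws = wb.active
for row in [["name", "x"], ["a", 1], ["b", 2]]:
    ws.append(row)

table = Table(displayName="Table1", ref="A1:B3")
table.tableStyleInfo = style  # the style applies to the whole table range
ws.add_table(table)
```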

WillAyd · 2019-07-11T20:15:21Z

pandas/io/excel/_openpyxl.py

@@ -389,6 +389,18 @@ def _convert_to_protection(cls, protection_dict):

        return Protection(**protection_dict)

+    @classmethod
+    def _to_excel_range(cls, startrow, startcol, endrow, endcol):


Rather than using this can you just pass a CellRange into openpyxl?

Yes, but doesn't the CellRange constructor only accept a string of the form A1:B2?

I think you can use min_col, min_row, max_col, max_row from the constructor

https://openpyxl.readthedocs.io/en/stable/api/openpyxl.worksheet.cell_range.html#openpyxl.worksheet.cell_range.CellRange

Indeed, the cell range can be constructed from those arguments; however, the Table constructor itself only accepts a string. Internally the string representation is used as well, so I don't see a way around this.

I think you can just cast it to a str then:

>>> from openpyxl.worksheet.cell_range import CellRange
>>> str(CellRange(min_col=1, min_row=1, max_col=3, max_row=3))
'A1:C3'

That's what we are ultimately after here, right? (save maybe an off-by-one)

That was what I expected, but I was on openpyxl 2.4.8, and this is apparently a 2.5.0 feature. Assuming that raising the minimum supported version is not an issue, I will fix this.
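The off-by-one mentioned above comes from pandas counting `startrow`/`startcol` from 0 while Excel ranges are 1-based. A sketch of the conversion with `CellRange` (openpyxl 2.5 or later, per the comment above); `table_ref` is a hypothetical helper and `n_rows` is assumed to include the header row:

```python
from openpyxl.worksheet.cell_range import CellRange


def table_ref(startrow, startcol, n_rows, n_cols):
    """Excel range string for a block written at 0-based (startrow, startcol).

    n_rows includes the header row; Excel coordinates are 1-based, hence the +1.
    """
    rng = CellRange(
        min_row=startrow + 1,
        min_col=startcol + 1,
        max_row=startrow + n_rows,
        max_col=startcol + n_cols,
    )
    return str(rng)  # str() gives the plain "A1:C3" form that Table(ref=...) expects
```

For example, `table_ref(0, 0, 3, 3)` gives `'A1:C3'`, matching the REPL output above.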

WillAyd · 2019-07-11T20:17:41Z

pandas/io/excel/_openpyxl.py

        for cell in cells:
+            n_cols = max(n_cols, cell.col)


Can these be inferred outside of the loop from the dimensions of the frame?

There is quite a lot of code translating a frame to Excel cells, dealing with MultiIndexes etc., so this is not too straightforward. That code is also intertwined with the formatting code, so I considered the following options:

get the size from the frame, dealing with the edge cases for MultiIndex, index True/False, header True/False etc.

get the result from the get_formatted_cells iterator and run through it a second time

bypass the normal writer function altogether and use a separate, dedicated write_table function

extract the size from the writer function

I picked the last option

Hmm OK. At least outside the loop though wouldn't the nrows be len(cells) and the ncols just be the length of any item within cells?

Unfortunately the cells are a 1D iterator of items (each with a row and col property), not a list of rows.
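Since the cells arrive as a one-shot iterator, the extent has to be tracked during the single writing pass, which is what "extract the size from the writer function" amounts to. A sketch, where `Cell` is a stand-in namedtuple modelling only the 0-based `row`/`col` attributes (as used in `wks.write(startrow + cell.row, startcol + cell.col, val, style)` above), and `write_cells_and_measure` is hypothetical:

```python
from collections import namedtuple

# Stand-in for the cell objects yielded by pandas' Excel formatter;
# only the 0-based row/col positions and the value are modelled.
Cell = namedtuple("Cell", ["row", "col", "val"])


def write_cells_and_measure(cells, write):
    """Call ``write(cell)`` for each cell and return the (n_rows, n_cols) extent.

    ``cells`` may be a one-shot iterator, so the extent is tracked in the same
    pass instead of iterating a second time.
    """
    n_rows = n_cols = 0
    for cell in cells:
        write(cell)
        n_rows = max(n_rows, cell.row + 1)
        n_cols = max(n_cols, cell.col + 1)
    return n_rows, n_cols
```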

WillAyd · 2019-07-11T20:18:23Z

pandas/io/formats/excel.py

@@ -408,19 +418,7 @@ def __init__(
        self.header = header
        self.merge_cells = merge_cells
        self.inf_rep = inf_rep
-
-    @property


Why did you change this?

To provide an interface to override (i.e. disable) the cell styling


WillAyd · 2019-08-24T08:01:56Z

I think this has gone a little stale. Our PR queue has grown quite large so closing for now to keep things clean but ping if you'd like to pick back up!

ENH: implement excel table (pandas-dev#24862)

32f10e5

WillAyd requested changes Jan 24, 2019

pandas/io/excel.py Outdated

pandas/io/excel.py Outdated

pandas/io/excel.py Outdated

WillAyd added the IO Excel read_excel, to_excel label Jan 24, 2019

tdamsma added 4 commits January 27, 2019 17:22

Merge branch 'master' into excel-tables-pandas-dev#24862

5f9d664

revert unintended search and replace

4b857d4

use loops instead of nested comprehension; don't use f-strings

8ad332f

add brackets

ca9c36e

tdamsma added 4 commits January 27, 2019 17:44

pep8

5032133

remove another f-string

36301fb

add basic roundtrip test; don't skip writing falsey column names

f0e9583

formatting fixes

c12a9f6

tdamsma mentioned this pull request Jan 30, 2019

Output excel table objects with to_xlsx() #24862

Closed

WillAyd closed this Feb 5, 2019

tdamsma added 2 commits July 9, 2019 16:10

Merge branch 'master' into excel-tables-pandas-dev#24862

9636999

# Conflicts: # pandas/core/generic.py # pandas/io/excel.py # pandas/io/formats/excel.py # pandas/tests/io/test_excel.py

rename keyword 'as_table' to 'table', and change type from boolean to string

d0a6391

WillAyd reopened this Jul 9, 2019

tdamsma force-pushed the excel-tables-#24862 branch from 1aab7ca to cbc5d01 July 10, 2019 08:59

don't use f-string, fix formatting issues

86d5499

tdamsma force-pushed the excel-tables-#24862 branch from cbc5d01 to 86d5499 July 10, 2019 09:44

WillAyd requested changes Jul 10, 2019

undo deletion of versionadded comment

be7033b

tdamsma added 3 commits July 11, 2019 22:05

improve test, few fixes

0b72d6e

Improve test

4302231

Use writer to write cells, and apply table formatting afterwards.

91deb15

Unfortunately need some ugly hacks

WillAyd reviewed Jul 11, 2019

replace _to_excel_range with CellRange, upgrade openpyxl min ver to 2.5.0

22739b4

WillAyd closed this Aug 24, 2019

0ignacia0 mentioned this pull request Dec 3, 2019

use named table in Excel spreadsheet, if it exists BCDA-APS/apstools#122

Closed
