[WIP] Excel table output #24899

Closed
wants to merge 17 commits into from
2 changes: 1 addition & 1 deletion ci/deps/azure-35-compat.yaml
@@ -8,7 +8,7 @@ dependencies:
- jinja2=2.8
- numexpr=2.6.2
- numpy=1.13.3
- openpyxl=2.4.8
- openpyxl=2.5.0
- pytables=3.4.2
- python-dateutil=2.6.1
- python=3.5.3
2 changes: 1 addition & 1 deletion ci/deps/azure-36-locale.yaml
@@ -9,7 +9,7 @@ dependencies:
- lxml
- matplotlib=2.2.2
- numpy=1.14.*
- openpyxl=2.4.8
- openpyxl=2.5.0
- python-dateutil
- python-blosc
- python=3.6.*
2 changes: 1 addition & 1 deletion doc/source/install.rst
@@ -280,7 +280,7 @@ gcsfs 0.2.2 Google Cloud Storage access
html5lib HTML parser for read_html (see :ref:`note <optional_html>`)
lxml 3.8.0 HTML parser for read_html (see :ref:`note <optional_html>`)
matplotlib 2.2.2 Visualization
openpyxl 2.4.8 Reading / writing for xlsx files
openpyxl 2.5.0 Reading / writing for xlsx files
pandas-gbq 0.8.0 Google Big Query access
psycopg2 PostgreSQL engine for sqlalchemy
pyarrow 0.9.0 Parquet and feather reading / writing
2 changes: 1 addition & 1 deletion pandas/compat/_optional.py
@@ -14,7 +14,7 @@
"matplotlib": "2.2.2",
"numexpr": "2.6.2",
"odfpy": "1.3.0",
"openpyxl": "2.4.8",
"openpyxl": "2.5.0",
"pandas_gbq": "0.8.0",
"pyarrow": "0.9.0",
"pytables": "3.4.2",
5 changes: 5 additions & 0 deletions pandas/core/generic.py
@@ -2185,6 +2185,9 @@ def _repr_data_resource_(self):

.. versionadded:: 0.20.0.

table : string, default None
Member

Would accepting a TableStyle be an option instead (speaking only in openpyxl terms; I'm not sure whether xlsxwriter offers that)? I feel like that could do the same thing but also give users more power over output formatting.

Contributor Author

Both accept a style, but of course the syntax is a bit different: openpyxl uses "TableStyleMedium9" as default, and xlsxwriter uses "Table Style Medium 9".
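
The naming difference described above can be bridged mechanically. The following is a minimal sketch, assuming every built-in style name follows the pattern shown in the comment (capitalized words followed by a number). The helper `to_xlsxwriter_style` is hypothetical and is not part of pandas, openpyxl, or xlsxwriter.

```python
import re


def to_xlsxwriter_style(openpyxl_name: str) -> str:
    """Turn an openpyxl-style name into the spaced form xlsxwriter uses.

    Hypothetical helper for illustration only.
    """
    # Split the name into capitalized words and digit runs,
    # e.g. "TableStyleMedium9" -> ["Table", "Style", "Medium", "9"].
    parts = re.findall(r"[A-Z][a-z]+|\d+", openpyxl_name)
    return " ".join(parts)


print(to_xlsxwriter_style("TableStyleMedium9"))  # Table Style Medium 9
```

A single user-facing style string could then be passed through such a helper for one engine and used unchanged for the other.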

Member

Hmm, I wasn't thinking about that so much as having the user pass in an actual object itself.

Contributor Author

Do you mean something like this?

style = TableStyleInfo(name="TableStyleMedium9", showFirstColumn=False,
                       showLastColumn=False, showRowStripes=True, showColumnStripes=True)

Member

Right, that's what I had in mind (not tied to it, just asking).

Write the dataframe to a named and formatted excel table object

See Also
--------
to_csv : Write DataFrame to a comma-separated values (csv) file.
@@ -2249,6 +2252,7 @@ def to_excel(
inf_rep="inf",
verbose=True,
freeze_panes=None,
table=None,
):
df = self if isinstance(self, ABCDataFrame) else self.to_frame()

@@ -2272,6 +2276,7 @@ def to_excel(
startcol=startcol,
freeze_panes=freeze_panes,
engine=engine,
table=table,
)

def to_json(
36 changes: 36 additions & 0 deletions pandas/io/excel/_openpyxl.py
@@ -409,7 +409,11 @@ def write_cells(
row=freeze_panes[0] + 1, column=freeze_panes[1] + 1
)

n_cols = 0
n_rows = 0
for cell in cells:
n_cols = max(n_cols, cell.col)
Member

Can these be inferred outside of the loop from the dimensions of the frame?

Contributor Author

There is quite a lot of code translating a frame to Excel cells, dealing with MultiIndexes etc., so this is not too straightforward. That code is also intertwined with the formatting code, so I considered the following options:

  • get size from frame, try to deal with edge cases for multiindex, index True/False, header True/False etc
  • get result from the get_formatted_cells iterator, run through it a second time
  • bypass the normal writer function altogether and use a separate, dedicated write_table function
  • extract the size from the writer function

I picked the last option.

Member

Hmm, OK. At least outside the loop, though, wouldn't nrows be len(cells) and ncols just be the length of any item within cells?

Contributor Author

Unfortunately, the cells are a 1D iterator of items (each with a row and col property), not a list of rows.
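
To illustrate the point about the 1D iterator: a generator has no `len()`, so the extents have to be collected while the cells are consumed. The sketch below assumes a simplified `Cell` tuple and a generator `iter_cells`, both hypothetical stand-ins for the formatter output. It mirrors the `max(...)` bookkeeping in the diff rather than reproducing the actual writer.

```python
from collections import namedtuple
from typing import Iterable, Iterator, Tuple

# Hypothetical stand-in for the cell objects yielded by the formatter.
Cell = namedtuple("Cell", ["row", "col", "val"])


def iter_cells(n_rows: int, n_cols: int) -> Iterator[Cell]:
    """Yield cells one by one, as a flat stream rather than a list of rows."""
    for r in range(n_rows):
        for c in range(n_cols):
            yield Cell(r, c, f"r{r}c{c}")


def write_and_measure(cells: Iterable[Cell]) -> Tuple[int, int]:
    """Consume the stream once and return the largest row and column index."""
    max_row = 0
    max_col = 0
    for cell in cells:
        # A real writer would write the cell to the worksheet here.
        max_row = max(max_row, cell.row)
        max_col = max(max_col, cell.col)
    return max_row, max_col


print(write_and_measure(iter_cells(3, 5)))  # (2, 4)
```

Because the stream is exhausted after one pass, measuring inside the write loop avoids materializing the cells or iterating a second time.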

n_rows = max(n_rows, cell.row)
xcell = wks.cell(
row=startrow + cell.row + 1, column=startcol + cell.col + 1
)
@@ -456,6 +460,38 @@ def write_cells(
for k, v in style_kwargs.items():
setattr(xcell, k, v)

return wks, n_rows, n_cols, False

    def format_table(
        self, wks, table_name, table_range, header=True, index=True, first_row=None
    ):
        # Format the written cells as table

        from openpyxl.worksheet.table import Table, TableStyleInfo
        from openpyxl.worksheet.cell_range import CellRange

        ref = str(
            CellRange(
                min_row=table_range[0] + 1,
                min_col=table_range[1] + 1,
                max_row=table_range[2] + 1,
                max_col=table_range[3] + 1,
            )
        )

        tab = Table(displayName=table_name, ref=ref, headerRowCount=1 if header else 0)

        # Add a default style with striped rows
        style = TableStyleInfo(
            name="TableStyleMedium9",
            showFirstColumn=index,
            showLastColumn=False,
            showRowStripes=True,
            showColumnStripes=False,
        )
        tab.tableStyleInfo = style
        wks.add_table(tab)
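
For readers unfamiliar with the `ref` argument: the method above shifts the zero-based `table_range` tuple by one and turns it into an A1-style range string. The following is a dependency-free sketch of that conversion. The helpers `col_letter` and `table_ref` are hypothetical and only approximate what openpyxl's `CellRange` produces for a range without a sheet title.

```python
def col_letter(col: int) -> str:
    """Convert a 1-based column number to letters (1 -> 'A', 27 -> 'AA')."""
    letters = ""
    while col > 0:
        # Excel columns are a bijective base-26 system, hence the "- 1".
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def table_ref(table_range) -> str:
    """Build an A1-style range from zero-based (row, col, row, col) bounds."""
    first_row, first_col, last_row, last_col = table_range
    start = f"{col_letter(first_col + 1)}{first_row + 1}"
    end = f"{col_letter(last_col + 1)}{last_row + 1}"
    return f"{start}:{end}"


print(table_ref((0, 0, 2, 4)))  # A1:E3
```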


class _OpenpyxlReader(_BaseExcelReader):
def __init__(self, filepath_or_buffer: FilePathOrBuffer) -> None:
31 changes: 31 additions & 0 deletions pandas/io/excel/_xlsxwriter.py
@@ -211,8 +211,15 @@ def write_cells(
if _validate_freeze_panes(freeze_panes):
wks.freeze_panes(*(freeze_panes))

n_cols = 0
n_rows = 0
first_row = {}
for cell in cells:
n_cols = max(n_cols, cell.col)
n_rows = max(n_rows, cell.row)
val, fmt = self._value_with_fmt(cell.val)
if cell.row == 0:
first_row[cell.col] = val

stylekey = json.dumps(cell.style)
if fmt:
@@ -235,3 +242,27 @@ def write_cells(
)
else:
wks.write(startrow + cell.row, startcol + cell.col, val, style)

return wks, n_rows, n_cols, first_row

    def format_table(
        self, wks, table_name, table_range, first_row={}, header=True, index=True
    ):
        # Format the written cells as table
        options = dict(
            autofilter=True,
            header_row=header,
            banded_columns=False,
            banded_rows=True,
            first_column=index,
            last_column=False,
            style="Table Style Medium 9",
            total_row=False,
            name=table_name,
        )
        if header:
            options["columns"] = [
                {"header": first_row[i]} for i in range(len(first_row))
            ]

        wks.add_table(*table_range, options=options)
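
One detail worth noting in the method above is the mutable default `first_row={}`. A possible variant, sketched here as a standalone function, uses `None` as the default and builds the same options dictionary. The function name `build_table_options` is hypothetical; the option keys are the ones used in the diff. Unlike the diff, it iterates over the sorted keys instead of assuming they run contiguously from 0.

```python
from typing import Any, Dict, Optional


def build_table_options(
    table_name: str,
    first_row: Optional[Dict[int, Any]] = None,
    header: bool = True,
    index: bool = True,
) -> Dict[str, Any]:
    """Assemble the options dict that would be handed to worksheet.add_table."""
    # Use None as the default so no dict is shared between calls.
    first_row = {} if first_row is None else first_row
    options: Dict[str, Any] = {
        "autofilter": True,
        "header_row": header,
        "banded_columns": False,
        "banded_rows": True,
        "first_column": index,
        "last_column": False,
        "style": "Table Style Medium 9",
        "total_row": False,
        "name": table_name,
    }
    if header:
        # Column headers in worksheet order, keyed by column index.
        options["columns"] = [{"header": first_row[c]} for c in sorted(first_row)]
    return options


opts = build_table_options("TestTable1", {0: "foo", 1: "a", 2: "b"})
print(opts["columns"])
```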
44 changes: 30 additions & 14 deletions pandas/io/formats/excel.py
@@ -373,6 +373,16 @@ def __init__(
merge_cells=False,
inf_rep="inf",
style_converter=None,
header_style={
"font": {"bold": True},
"borders": {
"top": "thin",
"right": "thin",
"bottom": "thin",
"left": "thin",
},
"alignment": {"horizontal": "center", "vertical": "top"},
},
):
self.rowcounter = 0
self.na_rep = na_rep
@@ -408,19 +418,7 @@ def __init__(
self.header = header
self.merge_cells = merge_cells
self.inf_rep = inf_rep

@property
Member

Why did you change this?

Contributor Author

To provide an interface to override (i.e. disable) the cell styling.
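
The override described here can be sketched with a simplified class. `FormatterSketch` is a hypothetical stand-in for the formatter in `pandas/io/formats/excel.py`, reduced to the header style handling. It also uses a `None` default with a copy, which is an assumption of this sketch and not what the diff does.

```python
import copy
from typing import Any, Dict, Optional

DEFAULT_HEADER_STYLE: Dict[str, Any] = {
    "font": {"bold": True},
    "borders": {"top": "thin", "right": "thin", "bottom": "thin", "left": "thin"},
    "alignment": {"horizontal": "center", "vertical": "top"},
}


class FormatterSketch:
    """Hypothetical, stripped-down formatter that only tracks header_style."""

    def __init__(self, header_style: Optional[Dict[str, Any]] = None) -> None:
        # Copy the default so instances never share one mutable dict.
        if header_style is None:
            header_style = copy.deepcopy(DEFAULT_HEADER_STYLE)
        self.header_style = header_style

    def write(self, table: Optional[str] = None) -> Dict[str, Any]:
        # When writing a table, drop the cell-level header styling so the
        # table style is not overridden by per-cell formats.
        if table is not None:
            self.header_style = {}
        return self.header_style


print(FormatterSketch().write(table="TestTable1"))  # {}
```

Turning the read-only property into a plain attribute is what makes the assignment in `write` possible.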

def header_style(self):
return {
"font": {"bold": True},
"borders": {
"top": "thin",
"right": "thin",
"bottom": "thin",
"left": "thin",
},
"alignment": {"horizontal": "center", "vertical": "top"},
}
self.header_style = header_style

def _format_value(self, val):
if is_scalar(val) and missing.isna(val):
@@ -695,6 +693,7 @@ def write(
startcol=0,
freeze_panes=None,
engine=None,
table=None,
):
"""
writer : string or ExcelWriter object
@@ -712,6 +711,9 @@ def write(
write engine to use if writer is a path - you can also set this
via the options ``io.excel.xlsx.writer``, ``io.excel.xls.writer``,
and ``io.excel.xlsm.writer``.
table : string, default None
Write the dataframe to a named and formatted excel table object

"""
from pandas.io.excel import ExcelWriter
from pandas.io.common import _stringify_path
@@ -730,13 +732,27 @@ def write(
writer = ExcelWriter(_stringify_path(writer), engine=engine)
need_save = True

if table is not None:
self.header_style = {}
formatted_cells = self.get_formatted_cells()
writer.write_cells(

worksheet, n_rows, n_cols, first_row = writer.write_cells(
formatted_cells,
sheet_name,
startrow=startrow,
startcol=startcol,
freeze_panes=freeze_panes,
)

if table is not None:
table_range = (startrow, startcol, startrow + n_rows, startcol + n_cols)
writer.format_table(
worksheet,
table_name=table,
table_range=table_range,
first_row=first_row,
header=self.header,
index=self.index,
)
if need_save:
writer.save()
61 changes: 61 additions & 0 deletions pandas/tests/io/excel/test_writers.py
@@ -1212,6 +1212,67 @@ def test_raise_when_saving_timezones(self, engine, ext, dtype, tz_aware_fixture)
df.to_excel(self.path)


@td.skip_if_no("xlrd")
@td.skip_if_no("openpyxl")
@pytest.mark.parametrize(
"engine,ext",
[
pytest.param("openpyxl", ".xlsx"),
pytest.param("openpyxl", ".xlsm"),
pytest.param("xlsxwriter", ".xlsx", marks=td.skip_if_no("xlsxwriter")),
],
)
class TestTable(_WriterBase):
def read_table(self, tablename):
from openpyxl import load_workbook

wbk = load_workbook(self.path, data_only=True, read_only=False)

# first discover all tables in workbook
tables = {}
for wks in wbk:
for table in wks._tables:
tables[table.name] = (table, wks)

# then retrieve the desired one
table, wks = tables[tablename]

columns = [col.name for col in table.tableColumns]
data_rows = wks[table.ref][
(table.headerRowCount or 0) : -table.totalsRowCount
if table.totalsRowCount is not None
else None
]

data = [[cell.value for cell in row] for row in data_rows]
frame = DataFrame(data, columns=columns, index=None)

if table.tableStyleInfo.showFirstColumn:
frame = frame.set_index(columns[0])

return frame

@pytest.mark.parametrize("header", (True, False))
@pytest.mark.parametrize("index", (True, False))
def test_excel_table_options(self, header, index):
df = DataFrame(np.random.randn(2, 4))

df.columns = ["1", "2", "a", "b"]
df.index.name = "foo"

df.to_excel(self.path, header=header, index=index, table="TestTable1")
result = self.read_table("TestTable1")
if not header:
result.columns = df.columns
if index:
result.index.name = df.index.name

if not index:
result.index = df.index

tm.assert_frame_equal(df, result)
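
A note on the row slicing in `read_table` above: the upper bound `-table.totalsRowCount` is guarded only against `None`, so a count of `0` would give a stop index of `-0`, which Python treats as `0` and which yields an empty slice. The following sketch shows an alternative that computes the stop index explicitly. The helper `data_rows` is hypothetical and operates on a plain list rather than a worksheet range.

```python
from typing import List, Optional, Sequence


def data_rows(
    rows: Sequence, header_row_count: Optional[int], totals_row_count: Optional[int]
) -> List:
    """Return the rows between the header rows and the totals rows."""
    start = header_row_count or 0
    # An explicit stop index handles both None and 0 totals rows.
    stop = len(rows) - (totals_row_count or 0)
    return list(rows[start:stop])


rows = ["header", "a", "b"]
print(rows[1:-0])               # [] because -0 == 0
print(data_rows(rows, 1, 0))    # ['a', 'b']
print(data_rows(rows, 1, None)) # ['a', 'b']
```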


class TestExcelWriterEngineTests:
@pytest.mark.parametrize(
"klass,ext",