Writers discarding rightmost data columns of tables with merged header cells #2633

wfdd · 2016-01-08T01:27:22Z

With pandoc 1.16, imported docx tables laid out like this (header1 is a merged cell spanning the first two data columns)

| header1            | header2 | header3 |
| data1.1 | data 1.2 | data2   | data3   |

appear to be interpreted as

| header1 | header2 | header3 |       |
| data1   | data2   | data3   | data4 |

which some writers, like rst, print as above, whereas others, notably plain
and markdown, print as below:

| header1 | header2 | header3 | |
| data1   | data2   | data3   | |

pandoc --from=docx --to=markdown test.docx produces:

  --------- --------- --------- -------
  Header1   Header2   Header3
  Data1.1   Data1.2   Data2
  --------- --------- --------- -------

pandoc --from=docx --to=markdown-simple_tables test.docx produces:

|         |         |         |
|---------|---------|---------|
| Header1 | Header2 | Header3 |
| Data1.1 | Data1.2 | Data2   |

pandoc --from=docx --to=rst test.docx produces:

+-----------+-----------+-----------+---------+
| Header1   | Header2   | Header3   |
+-----------+-----------+-----------+---------+
| Data1.1   | Data1.2   | Data2     | Data3   |
+-----------+-----------+-----------+---------+

pandoc --from=docx --to=native test.docx produces:

[Table [] [AlignDefault,AlignDefault,AlignDefault] [0.0,0.0,0.0]
 []
 [[[Plain [Str "Header1"]]
  ,[Plain [Str "Header2"]]
  ,[Plain [Str "Header3"]]]
 ,[[Plain [Str "Data1.1"]]
  ,[Plain [Str "Data1.2"]]
  ,[Plain [Str "Data2"]]
  ,[Plain [Str "Data3"]]]]]
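The native output shows the mismatch directly. The alignment list, the width list and the header row each have three entries, but the body row has four. The Python sketch below models one plausible way a writer could lose `Data3`. It assumes a writer that sizes the table from the alignment list; that is an assumption for illustration, not a description of pandoc's actual writer code. Cells are reduced to plain strings. `ragged_rows` flags the inconsistent row, and `render_truncating` shows the silent drop. Both are hypothetical helpers.

```python
# Hypothetical model of the table above, with each cell reduced to a string.
# This is NOT pandoc's writer code; it only illustrates the failure mode.
aligns = ["AlignDefault", "AlignDefault", "AlignDefault"]
header = ["Header1", "Header2", "Header3"]
rows = [["Data1.1", "Data1.2", "Data2", "Data3"]]


def ragged_rows(ncols, header, rows):
    """Return (index, length) for each row whose cell count differs from ncols.

    The header row is reported with index -1.
    """
    problems = []
    if len(header) != ncols:
        problems.append((-1, len(header)))
    for i, row in enumerate(rows):
        if len(row) != ncols:
            problems.append((i, len(row)))
    return problems


def render_truncating(ncols, header, rows):
    """Render pipe-table lines using only the first ncols cells of each row.

    Any cell past ncols is silently dropped, like Data3 in the report.
    """
    return ["| " + " | ".join(row[:ncols]) + " |" for row in [header] + rows]


ncols = len(aligns)
print(ragged_rows(ncols, header, rows))  # [(0, 4)]
for line in render_truncating(ncols, header, rows):
    print(line)
```

The second printed line is `| Data1.1 | Data1.2 | Data2 |`, which matches the truncation seen in the plain and markdown output.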


jgm · 2016-01-08T05:17:23Z

The pandoc document model doesn't yet support table cells
that span multiple columns. So I suppose the docx reader
is doing as well as it can with your table.

Pandoc expects table rows to have the same number of cells.
(Ideally this would be enforced at the type level, but
currently it is not.) When this assumption fails,
unexpected things may happen, but I'm not sure what
behavior you would consider better.
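One candidate "better behavior" is to pad every row, header included, with empty cells up to the width of the widest row. Writers that trust the header width would then no longer lose data. This is offered only as a hypothetical normalization; pandoc is not known to do it. Here is a minimal Python sketch that again models cells as strings. `pad_table` and `Row` are illustrative names.

```python
from typing import List, Tuple

Row = List[str]


def pad_table(header: Row, rows: List[Row], fill: str = "") -> Tuple[Row, List[Row]]:
    """Pad the header and every body row with `fill` up to the widest row.

    Hypothetical normalization: nothing is dropped, but the padding always
    goes on the right, because this model has no record of which column
    a merged cell originally spanned.
    """
    width = max([len(header)] + [len(r) for r in rows])

    def pad(row: Row) -> Row:
        return row + [fill] * (width - len(row))

    return pad(header), [pad(r) for r in rows]


header, rows = pad_table(
    ["Header1", "Header2", "Header3"],
    [["Data1.1", "Data1.2", "Data2", "Data3"]],
)
print(header)  # ['Header1', 'Header2', 'Header3', '']
print(rows)    # [['Data1.1', 'Data1.2', 'Data2', 'Data3']]
```

Padding keeps `Data3`, but `Header2` and `Header3` still sit over the wrong columns. Placing the empty cell next to `Header1` would require span information, and as noted above, the document model does not yet carry it.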

wfdd · 2016-01-08T11:08:24Z

If the layout can't be replicated and the data's gone as well, I don't think that's gonna be a win for anybody. I do prefer the rst writer's behaviour, but it might not be worth toiling over; personally, I've taken to parsing the JSON output, which is what I should've been doing from the start.
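The JSON workaround can be sketched in Python. The sketch assumes the JSON layout of the pandoc 1.16 era. In that layout, the document is a two-element array `[meta, blocks]`, and every element is an object with a constructor tag `"t"` and contents `"c"`. A `Table`'s contents follow the same order as the native output above: caption, alignments, widths, header, rows. Later pandoc releases changed both the top-level document shape and the `Table` constructor, so the indexing would need adjusting for them. Only `Str` and `Space` inlines are flattened, which is a deliberate simplification. `plain_cell` is a hypothetical helper for building test input.

```python
import json
from typing import Any, List


def inline_text(inline: Any) -> str:
    """Flatten one inline element; only Str and Space are handled."""
    if inline.get("t") == "Str":
        return inline["c"]
    if inline.get("t") == "Space":
        return " "
    return ""  # other inlines (Emph, Link, ...) are ignored in this sketch


def cell_text(cell: List[Any]) -> str:
    """A cell is a list of blocks; join the text of its Plain/Para blocks."""
    parts = [
        "".join(inline_text(i) for i in block["c"])
        for block in cell
        if block.get("t") in ("Plain", "Para")
    ]
    return " ".join(parts)


def tables(doc_json: str) -> List[List[List[str]]]:
    """Return each top-level Table as a list of rows, header row first.

    Assumes the pre-1.18 layout [meta, blocks] with Table contents
    [caption, alignments, widths, header, rows]. Nested tables are skipped.
    """
    _meta, blocks = json.loads(doc_json)
    result = []
    for block in blocks:
        if block.get("t") == "Table":
            _caption, _aligns, _widths, header, rows = block["c"]
            all_rows = ([header] if header else []) + rows
            result.append([[cell_text(c) for c in row] for row in all_rows])
    return result


def plain_cell(text: str) -> List[Any]:
    """Hypothetical helper: a cell holding one Plain block with one Str."""
    return [{"t": "Plain", "c": [{"t": "Str", "c": text}]}]


# A hand-built document mirroring the native output in the report.
sample = json.dumps([{"unMeta": {}}, [{"t": "Table", "c": [
    [],
    [{"t": "AlignDefault", "c": []}] * 3,
    [0.0, 0.0, 0.0],
    [plain_cell("Header1"), plain_cell("Header2"), plain_cell("Header3")],
    [[plain_cell(t) for t in ("Data1.1", "Data1.2", "Data2", "Data3")]],
]}]])
print(tables(sample))
```

For a real file, `doc_json` would be the standard output of `pandoc --from=docx --to=json test.docx`. The reader keeps all four body cells in the document, so the extracted rows include `Data3` even though some writers drop it.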


jgm · 2017-03-05T16:07:59Z

See #2783

jgm closed this as completed Mar 6, 2017
