Writers discarding rightmost data columns of tables with merged header cells #2633

Closed
wfdd opened this issue Jan 8, 2016 · 3 comments

wfdd commented Jan 8, 2016

With pandoc version 1.16, imported docx tables in the format below, where header1 is a merged cell spanning the first two columns,

| header1            | header2 | header3 |
| data1.1 | data1.2  | data2   | data3   |

appear to be interpreted as

| header1 | header2 | header3 |       |
| data1   | data2   | data3   | data4 |

which some writers, like rst, print as above, whereas others, notably plain
and markdown, discard the contents of the rightmost column:

| header1 | header2 | header3 | |
| data1   | data2   | data3   | |
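One way to keep every column is to expand each merged cell into the cell itself plus empty filler cells for the extra columns it covers, so the header row ends up with as many cells as the data rows. Below is a minimal Python sketch, assuming the reader already knows each cell's horizontal span (in a .docx file a horizontal merge is recorded with the `w:gridSpan` cell property). `expand_spans` and the `(text, span)` pairs are hypothetical illustrations, not part of pandoc:

```python
def expand_spans(row):
    """Flatten a row of (text, span) pairs into a rectangular list of cells.

    A cell spanning n columns becomes its text followed by n - 1 empty
    filler cells, so every grid column gets exactly one entry.
    """
    cells = []
    for text, span in row:
        if span < 1:
            raise ValueError(f"span must be >= 1, got {span}")
        cells.append(text)
        cells.extend([""] * (span - 1))
    return cells


# The table from this issue: header1 covers the first two data columns.
header = [("header1", 2), ("header2", 1), ("header3", 1)]
data = [("data1.1", 1), ("data1.2", 1), ("data2", 1), ("data3", 1)]

print(expand_spans(header))  # ['header1', '', 'header2', 'header3']
print(expand_spans(data))    # ['data1.1', 'data1.2', 'data2', 'data3']
```

This gives up the visual merge, but header2 and header3 stay above data2 and data3 and no cell is dropped.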

pandoc --from=docx --to=markdown test.docx produces:

  --------- --------- --------- -------
  Header1   Header2   Header3
  Data1.1   Data1.2   Data2
  --------- --------- --------- -------

pandoc --from=docx --to=markdown-simple_tables test.docx produces:

|         |         |         |
|---------|---------|---------|
| Header1 | Header2 | Header3 |
| Data1.1 | Data1.2 | Data2   |

pandoc --from=docx --to=rst test.docx produces:

+-----------+-----------+-----------+---------+
| Header1   | Header2   | Header3   |
+-----------+-----------+-----------+---------+
| Data1.1   | Data1.2   | Data2     | Data3   |
+-----------+-----------+-----------+---------+

pandoc --from=docx --to=native test.docx produces:

[Table [] [AlignDefault,AlignDefault,AlignDefault] [0.0,0.0,0.0]
 []
 [[[Plain [Str "Header1"]]
  ,[Plain [Str "Header2"]]
  ,[Plain [Str "Header3"]]]
 ,[[Plain [Str "Data1.1"]]
  ,[Plain [Str "Data1.2"]]
  ,[Plain [Str "Data2"]]
  ,[Plain [Str "Data3"]]]]]

jgm (Owner) commented Jan 8, 2016

The pandoc document model doesn't yet support table cells
that span multiple columns. So I suppose the docx reader
is doing as well as it can with your table.

Pandoc expects all rows of a table to have the same number of cells.
(Ideally this would be enforced at the type level, but
currently it is not.) When this assumption fails,
unexpected things may happen, but I'm not sure what
you would consider better behavior.
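The same-number-of-cells assumption can be checked, and a ragged table padded, before it reaches a writer. This is a sketch in Python, assuming the table has already been extracted as plain nested lists of cell text that mirror the native output in the report (header row first). `normalize_rows` is a hypothetical helper, not something pandoc provides. Padding short rows at the end keeps every cell, but it cannot restore where a merged header originally sat. In the real AST the alignment and width lists (three entries each in the native output) would also need a fourth entry:

```python
def normalize_rows(header, rows):
    """Pad the header and every body row with empty cells to a common width.

    Padding to the widest row satisfies the "all rows have the same
    number of cells" assumption without discarding any cell.
    Returns (header, rows, width) as new lists; the inputs are not modified.
    """
    width = max([len(header)] + [len(r) for r in rows])

    def pad(row):
        return list(row) + [""] * (width - len(row))

    return pad(header), [pad(r) for r in rows], width


# Cell text from the native output in the report: 3 header cells, 4 body cells.
header = ["Header1", "Header2", "Header3"]
rows = [["Data1.1", "Data1.2", "Data2", "Data3"]]

new_header, new_rows, width = normalize_rows(header, rows)
print(width)       # 4
print(new_header)  # ['Header1', 'Header2', 'Header3', '']
```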

wfdd (Author) commented Jan 8, 2016

If the layout can't be replicated and the data is lost as well, that's not a win for anybody. I do prefer the rst writer's behaviour, since it at least keeps the data, but it might not be worth toiling over; personally, I've taken to parsing the JSON output, which is what I should've been doing from the start.
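For the JSON route, here is a sketch that walks `pandoc --from=docx --to=json` output and reports the cell count of each row in every table whose rows differ in length. It assumes the older table encoding matching the native output in the report, where a `Table` node's `c` field is `[caption, alignments, widths, header, rows]`. Newer pandoc releases encode `Table` differently, so the hypothetical `row_widths` helper would need adjusting for them. The generic tree walk does not depend on the top-level layout of the JSON:

```python
import json
import subprocess
import sys


def iter_tables(node):
    """Yield every Table node found anywhere in a pandoc JSON tree."""
    if isinstance(node, dict):
        if node.get("t") == "Table":
            yield node
        for value in node.values():
            yield from iter_tables(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_tables(item)


def row_widths(table):
    """Return the cell count of the header row followed by each body row.

    Assumption: the older encoding, where table["c"] is
    [caption, alignments, widths, header, rows].
    """
    _caption, _aligns, _widths, header, rows = table["c"]
    return [len(header)] + [len(row) for row in rows]


def ragged_tables(doc):
    """Return the row widths of every table whose rows differ in length."""
    widths = (row_widths(t) for t in iter_tables(doc))
    return [w for w in widths if len(set(w)) > 1]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_tables.py test.docx (needs pandoc on PATH).
    out = subprocess.run(
        ["pandoc", "--from=docx", "--to=json", sys.argv[1]],
        check=True, capture_output=True, text=True,
    ).stdout
    for w in ragged_tables(json.loads(out)):
        print("ragged table, cells per row:", w)
```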

jgm (Owner) commented Mar 5, 2017

See #2783

jgm closed this as completed Mar 6, 2017