HTML reader - support new table features #6312

jgm · 2020-04-24T01:23:09Z

Add support for new table features introduced in
jgm/pandoc-types#66
including table attributes (including identifier), rowspan, colspan, table head and foot, multiple header lines, row headers, captions that allow block-level content and include an optional short caption.

danlobo02 · 2020-08-24T15:59:58Z

subscribing

jgm · 2020-08-24T16:36:34Z

@danlobo02 - note that you can subscribe to issues using the Subscribe button on the right; you don't need to make a comment.

* HTML writer: add support for row headers, colspans, rowspans * Add planet table tests See #6312

tarleb · 2020-11-23T17:32:24Z

I'd be happy to get some opinions on a problem I'm facing: Cell alignment cannot (or should not) be specified for all cells in a column, as the align attribute has been deprecated for <col> elements. So each cell must specify its own alignment. However, other formats don't support per-cell alignments, and expect column-wise alignment info. Should we try to detect column alignments in the HTML reader?

Put differently, given this table

<table>
  <tr><td style="text-align: left">1</td><td style="text-align: right">2</td></tr>
  <tr><td style="text-align: left">3</td><td style="text-align: right">4</td></tr>
</table>

What should the Markdown output be?

  --- ---
  1     2
  3     4
  --- ---

or

  --- ---
  1   2
  3   4
  --- ---

I lean towards the first version, as that is also what pandoc produces currently. But I believe that, in the presence of colspans and rowspans, the first version will require table normalization similar to what we do in the HTML writer, which might impact performance.

Also, where should this computation happen? In the writer, so filters have access to the original information? Or in the reader, because then we can provide more guarantees for the well-formedness of tables?

jgm · 2020-11-23T17:47:23Z

Yes, I agree, the first version.
By normalization, do you mean moving from a representation where the alignments are on each cell to one where the cell alignments are default and an alignment is specified on the column?
It seems to me that it would make most sense to do this in the reader, though I'm open to other ideas.

jgm · 2020-11-23T17:47:35Z

@despresc may also have feedback on this.

ickc · 2020-11-23T22:56:29Z

If I understand this question correctly, I think the reader should try its best to represent the input in the pandoc AST, and in this case means it is representing the align per cell, since the AST is capable to do that.

I don’t know much about the normalization already performed in pandoc, in this particular case, is it just checking if the whole column is sharing the same align and if it did then “normalize” it such that there’s no cell alignment but one single column alignment for that column? (And repeat for each column.) I think this kind of normalization probably shouldn’t hurt the performance a lot.

By the way I’m working on a library and pandoc filters that essentially can read and write all pandoc table AST feature so my biased preference would be an AST that has as much information from the reader as possible.

despresc · 2020-11-24T02:03:59Z

Setting the column alignment like that seems fine to me. If that is done here, it should probably be after the table builder has done its work. Otherwise it could be done in the table builder itself, at the cost of bumping the pandoc-types version and introducing a little unnecessary work in the readers of other formats.

Since AlignDefault is supposed to mean "no alignment preference", I think, should the rule be relaxed to include cases where a column has a mix of AlignDefault cells and cells of some common explicit alignment?

I assume that cells that span multiple columns will not influence this process at all, since their alignment tends to be unrelated to the alignment of the columns they span.

It will be slightly easier to keep the alignments on cells in the affected columns than to remove them, since the entire column needs to be laid out before we can decide whether or not its cells share the same alignment. The writers for formats that have both column and cell alignments already have to deal with superfluous explicit cell alignments, so keeping the alignments shouldn't introduce problems that aren't already present.

tarleb · 2020-11-24T13:28:15Z

Thanks for the feedback! I preserved the old behavior for now (i.e., take column-wide alignments from cells in first row), but plan to improve that later.

tarleb · 2020-11-26T07:48:11Z

Implemented:

Multi-row heads
Multi-body tables
Proper table footers
Attributes on thead, tbody, tfoot, and table cells

Missing:

support for body header rows
support for row header cells
attributes on tr elements

tarleb · 2020-11-27T09:40:35Z

Done. One final question: if there is no <thead> element, then the reader used the first body row as header iff it contained only <th> cells. I've kept this behavior for backwards compatibility, but maybe it would better to make a clean slate and do away with special cases?

jgm · 2020-11-27T17:21:48Z

I think it's good to keep backwards compatibility here -- at least until we have a specific reason to do otherwise.

jgm · 2020-11-27T17:24:32Z

Do we have good test coverage for all these features?

tarleb · 2020-11-27T17:31:59Z

Most features are tested with what's already there in other tests, mostly in test/html-reader.{native,html}. The only exceptions are colspans/rowspans, which I wanted to add to these files but forgot. I will do so later.

jgm added format:HTML reader labels Apr 24, 2020

jgm mentioned this issue Apr 24, 2020

List of projects #5581

Closed

9 tasks

reagle mentioned this issue May 15, 2020

Support for table column spans, table attributes in AST #1024

Closed

mb21 added the good first issue label Jun 10, 2020

tarleb mentioned this issue Aug 24, 2020

HTML writer - support new table features #6314

Closed

jgm pushed a commit that referenced this issue Sep 10, 2020

Support colspans and rowspans in HTML tables (#6644)

9423b4b

* HTML writer: add support for row headers, colspans, rowspans * Add planet table tests See #6312

jgm mentioned this issue Sep 10, 2020

Support colspans and rowspans in HTML tables #6644

Merged

jgm mentioned this issue Sep 27, 2020

Document which table features are supported in which formats #6701

Open

tarleb self-assigned this Nov 23, 2020

tarleb removed the good first issue label Nov 23, 2020

tarleb closed this as completed in a9c7662 Nov 27, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HTML reader - support new table features #6312

HTML reader - support new table features #6312

jgm commented Apr 24, 2020

danlobo02 commented Aug 24, 2020

jgm commented Aug 24, 2020

tarleb commented Nov 23, 2020 •

edited

Loading

jgm commented Nov 23, 2020

jgm commented Nov 23, 2020

ickc commented Nov 23, 2020

despresc commented Nov 24, 2020 •

edited

Loading

tarleb commented Nov 24, 2020

tarleb commented Nov 26, 2020 •

edited

Loading

tarleb commented Nov 27, 2020

jgm commented Nov 27, 2020

jgm commented Nov 27, 2020

tarleb commented Nov 27, 2020 •

edited

Loading

HTML reader - support new table features #6312

HTML reader - support new table features #6312

Comments

jgm commented Apr 24, 2020

danlobo02 commented Aug 24, 2020

jgm commented Aug 24, 2020

tarleb commented Nov 23, 2020 • edited Loading

jgm commented Nov 23, 2020

jgm commented Nov 23, 2020

ickc commented Nov 23, 2020

despresc commented Nov 24, 2020 • edited Loading

tarleb commented Nov 24, 2020

tarleb commented Nov 26, 2020 • edited Loading

tarleb commented Nov 27, 2020

jgm commented Nov 27, 2020

jgm commented Nov 27, 2020

tarleb commented Nov 27, 2020 • edited Loading

tarleb commented Nov 23, 2020 •

edited

Loading

despresc commented Nov 24, 2020 •

edited

Loading

tarleb commented Nov 26, 2020 •

edited

Loading

tarleb commented Nov 27, 2020 •

edited

Loading