Markdown reader - support new table features #6317
For captions and table attributes, inspired by a comment in #3177, we could use the syntax of a native div wrapping nothing but a table:
This would be mostly backward-compatible with pandoc-crossref, I think? @lierdakil? Placement of the short caption is trickier, though...
I would have to modify pandoc-crossref to work with the new AST anyway, so I might as well adapt to the new syntax, whatever it ends up being. That said, I'm not exactly a fan of overloading the native div syntax: it can lead to some surprising behaviour and will likely break some workflows. Perhaps we could use something like this instead?

```
: {#tableId}
+---------------+---------------+
| Fruit         | Price         |
+===============+===============+
| Bananas       | $1.34         |
|               |               |
+---------------+---------------+
: long caption is backward-compatible
:
: but now, just like with blockquotes, it can contain blocks.
and it can wrap lazily
```

The lack of empty line between
but users wouldn't have to change their Markdown files? Or am I mistaken, or is it a rare case anyway?
Internally, pandoc-crossref represents a table-with-attributes as a table-in-a-div, and that works on the syntax level, too. However, I believe most users use the shortcut syntax of adding
As for table-in-a-div, it's debatable whether to keep it or not, but I'll probably keep it as a variant syntax for the foreseeable future, because backward compatibility is a thing I think about sometimes.
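The `{#tableId}` block proposed here uses pandoc's general attribute syntax (`#identifier`, `.class`, `key=value`). As a rough illustration of how little machinery such a block needs, here is a minimal Python sketch. `parse_attributes` is a hypothetical helper, not part of pandoc or pandoc-crossref, and its grammar is a simplification (no nested braces, only double-quoted values).

```python
import re

# Simplified attribute grammar (an assumption, not pandoc's exact rules):
# "#identifier", ".class", or key=value where value is bare or double-quoted.
_TOKEN = re.compile(
    r'#(?P<ident>[\w:.-]+)'
    r'|\.(?P<cls>[\w-]+)'
    r'|(?P<key>[\w-]+)=(?:"(?P<qval>(?:[^"\\]|\\.)*)"|(?P<bval>\S+))'
)


def parse_attributes(text):
    """Parse '{#id .class key=val}' into (identifier, classes, key/value pairs)."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("attribute block must be wrapped in braces")
    body = text[1:-1]
    identifier, classes, pairs = "", [], []
    pos = 0
    while pos < len(body):
        if body[pos].isspace():
            pos += 1
            continue
        m = _TOKEN.match(body, pos)
        if m is None:
            raise ValueError(f"unexpected input: {body[pos:]!r}")
        if m.group("ident"):
            identifier = m.group("ident")
        elif m.group("cls"):
            classes.append(m.group("cls"))
        elif m.group("qval") is not None:
            # Quoted value: drop the backslash from escaped characters.
            pairs.append((m.group("key"), re.sub(r"\\(.)", r"\1", m.group("qval"))))
        else:
            pairs.append((m.group("key"), m.group("bval")))
        pos = m.end()
    return identifier, classes, pairs
```

For example, `parse_attributes('{#tbl:fruit .wide width="50%" align=center}')` yields the identifier `tbl:fruit`, the class `wide`, and the pairs `width`/`50%` and `align`/`center`.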
There are also the simple table and multiline table syntaxes, which are independent of the syntax for the overall table attributes and caption. I posted this in my pull request before, but something like this:
which should be parsed like an existing simple table, except that multiple header lines are allowed, and the alignments of columns are determined by the last header line. The parser would have to go back and fill in the cell dimensions after header parsing, but if the existing rule that cells cannot cross column boundaries were kept for the other header lines, then this would be easier. That would mean this table:
might have a second header row with two cells. This (and the multiline table version) would allow for multiple table head lines and row spans in the table head, in addition to whatever table caption or attribute syntax is allowed.
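The alignment part of this proposal can reuse the rule pandoc already documents for simple tables: the position of the header text relative to the dashed line decides alignment (flush right means right-aligned, flush left means left-aligned, overhanging on both sides means centered, flush on both sides means default). A minimal Python sketch, assuming that under the proposal only the last header line is consulted; `column_alignments` is a hypothetical helper, not pandoc's parser, and treating a column with no header text as "default" is also an assumption.

```python
import re


def column_alignments(header_lines, dashes_line):
    """Return (start, end, alignment) for each column of a simple table.

    Columns are the runs of '-' in the separator line. Following the proposal,
    only the last header line decides alignment, using the flush-left /
    flush-right rules from pandoc's manual for simple tables.
    """
    last = header_lines[-1]

    def is_blank_at(i):
        # Positions past the end of the header line count as blank.
        return i >= len(last) or last[i].isspace()

    columns = []
    for run in re.finditer(r"-+", dashes_line):
        start, end = run.start(), run.end()
        if not last[start:end].strip():
            alignment = "default"  # assumption: no header text above this column
        else:
            flush_left = not is_blank_at(start)
            flush_right = not is_blank_at(end - 1)
            if flush_left and flush_right:
                alignment = "default"
            elif flush_right:
                alignment = "right"
            elif flush_left:
                alignment = "left"
            else:
                alignment = "center"
        columns.append((start, end, alignment))
    return columns
```

Earlier header lines would still need their cells mapped onto these column spans, which is the "go back and fill in the cell dimensions" step described above.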
There are some suggestions for extensions to pipe table syntax in the CommonMark forum: see especially
Extending grid table syntax as suggested above makes sense. For the caption, I think we'd want a syntax that can allow arbitrary block-level content. Making it like definition list definitions might make sense (with the 4-space indent).
But I am also somewhat tempted by the "overloading fenced div" approach, which gives us a uniform way to add table attributes and also degrades nicely. (Everything after the table itself could be considered the caption.) If there's going to be a special way to add attributes to the table, why not just
on a line by itself right before the table? (NB: in my commonmark-hs I've implemented an extension allowing attributes to be placed on any block-level element this way.) We need a solution for short captions. A simple approach would be to take the first sentence of the caption, but that's probably not robust enough.
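To see why taking the first sentence is fragile, here is a minimal Python sketch of that naive rule. The regex and the name `naive_short_caption` are illustrative assumptions: the sketch cuts at the first `.`, `!` or `?` followed by whitespace, and nothing more.

```python
import re


def naive_short_caption(caption):
    """Return the caption up to and including its first sentence-ending mark.

    A '.', '!' or '?' counts as the end of a sentence only when followed by
    whitespace or the end of the text; abbreviations are not recognized.
    """
    match = re.search(r"[.!?](?=\s|$)", caption)
    return caption[: match.end()] if match else caption
```

`naive_short_caption("Fruit prices. Collected in 2020.")` gives `"Fruit prices."`, but `naive_short_caption("Prices in the U.S. market, e.g. bananas.")` gives `"Prices in the U.S."`, which is the kind of abbreviation problem that makes this heuristic unreliable.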
Works for me, if it works. I was just being wary of potential ambiguities, but now that I think about it, those are probably not an issue.
It's not a great solution, because then there's no concise way to have a table in a div, which might be used for styling purposes or for marking parts for filters. Most notably, this breaks syntactic backward compatibility. Granted, probably only for a minority of edge cases, but I would argue it's a bad idea overall to tack unintuitive contextual semantics onto an existing syntax that has (in theory) a very specific meaning. From my experience, it will just lead to surprises down the line, and not the good kind.
This would be especially painful in some cases. FWIW, I do this for code blocks in pandoc-crossref (with some limitations), but that's because it's one of the few bad options I have, not because it's a good idea.
One way to reduce this impact would be to require the table divs to be marked up somehow, e.g. with class
That's something we're generally trying to avoid due to i18n concerns, IIRC, so it would be at best a stopgap.
Yeah, or like blockquotes, but with the
Ah yes, if that's a general principle that works, that's great as well. About overloading the div syntax: I guess the final decision should be made as part of the figure syntax (#3177)? As far as I'm concerned, we could also go ahead and implement the grid table syntax I posted in the original post of this issue, and worry about attributes and long captions later. Or should we do this directly in commonmark-hs? I'm not up to date on the state of progress there.
Yes, if someone wants to work on allowing col/rowspans in grid table syntax, that's fine, and it can be done without deciding about captions and identifiers. The syntax you propose looks okay to me. I agree that the issues about captions and identifiers should be thought about in connection with figures. commonmark-hs currently has pipe tables, but I haven't tried to implement grid tables there. It would be good to do this, though!
Just keep in mind that grid tables are really bad for multi-line cells. Pipe tables (à la AsciiDoc) are probably a better approach.
See above for a link to some suggestions for pipe tables, which pandoc supports too. There's no reason we couldn't find a way to do col/rowspans in both kinds of tables.
Just for the record, the correct term for "row header" is stub.
Are there any plans to support the new table features in the Markdown writer?
Yes, of course we'll need to support whatever formats we decide on in the writer too. I opened a new issue for that.
To be honest, tables are some of the most annoying issues in Markdown, in particular when the table gets complex.
I think there are conflicting requirements:
I therefore propose supporting at least one table format that does not require the table to appear tabular in the source text, using a more appropriate table format such as:
I tend to agree. While the original impetus of Markdown might have been to have a format that is simple enough to publish as-is, Pandoc Markdown is also meant to capture sufficient complexity to be the authoring format for conversion into multiple formats. That said, pipe_tables (unlike grid_tables and simple_tables) allows for "compressed" or "non-aligned" tables, so it is easy enough to write, as it doesn't require a "tabular"-looking table. And unlike a format like CSV, which is also easy to write, pipe_tables has the potential to allow for cell-level alignment, multiple header rows, colspans/rowspans, captions, and multi-line cells (to support unnumbered and numbered lists). In particular, I like this proposal for a sufficiently complex pipe_tables format, and think discussion around it would be beneficial: https://talk.commonmark.org/t/tables-in-pure-markdown/81/145 I also wouldn't be opposed to Pandoc Markdown natively supporting HTML5 table syntax, since it too is simple to write and most end tags aren't required: https://talk.commonmark.org/t/tables-in-pure-markdown/81/124 I think it is also noteworthy that column spans and row spans are normally discouraged if your document is to be rendered accessibly by screen readers. So complex tables should generally be avoided whenever accessibility is a concern (as it usually should be).
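In pipe tables, column alignment is carried by colons in the separator row rather than by the visual layout, which is part of why the compressed style works. A minimal Python sketch of reading alignments from that row, following the convention pandoc documents for pipe_tables (leading colon for left, trailing colon for right, both for center, plain dashes for default); `pipe_alignments` is a hypothetical helper, not pandoc's reader.

```python
def pipe_alignments(separator_row):
    """Map a pipe-table separator row such as '|:---|---:|' to alignments.

    A leading colon means left, a trailing colon right, both mean center,
    and plain dashes mean the default alignment.
    """
    row = separator_row.strip()
    # Leading and trailing pipes are optional in pipe tables.
    if row.startswith("|"):
        row = row[1:]
    if row.endswith("|"):
        row = row[:-1]
    alignments = []
    for cell in row.split("|"):
        cell = cell.strip()
        if "-" not in cell or set(cell) - set(":-"):
            raise ValueError(f"not a separator cell: {cell!r}")
        left, right = cell.startswith(":"), cell.endswith(":")
        if left and right:
            alignments.append("center")
        elif left:
            alignments.append("left")
        elif right:
            alignments.append("right")
        else:
            alignments.append("default")
    return alignments
```

An aligned separator such as `|:---|---:|:---:|---|` and a compressed one such as `:---|---:` are read the same way, cell by cell.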
Here are some other thoughts of mine on pipe table extensions: https://talk.commonmark.org/t/tables-in-pure-markdown/81/134
@jgm, thanks for the pointer to some other thoughts. I see that tables have been the subject of a long discussion, but I do not see any practical progress in this respect. How bad ... So I really wish pandoc would support native HTML5 tables with Markdown as table cell content. Then we would have a solution to the issue until the discussion converges.
As far as I can see, the main feature-level differences (i.e., non-syntactical differences) between @jgm's proposal and aoudad's proposal are that aoudad's proposal provides for:
Features that neither proposal has:
The various features they both have in common are:
It seems to me that syntactically they are mostly similar, with a couple of differences: multi-line cells (
I hope I didn't miss any important differences. Do folks think it is worth having per-cell alignment and row headers? At any rate, would it make sense to have a feature-rich non-graphical table syntax (such as HTML5's, which seems to be both easy to type, since it can do away with most end tags, and has all the required features) be readily understood by Pandoc, such that it is convertible into multiple formats without needing a separate filter?
As for a more powerful grid/pipe table syntax, to me it is important that there be an easy way to mark a column as a stub column (often erroneously called a "row header"), or more generally to mark a cell as what in HTML terms is a TH element. I'm thinking of perhaps replacing the pipe(s) to the left (or to the right in an RTL document) with (a) bang(s):

```
|        | Head 1 | Head 2 | Head 3
|--------|--------|--------|--------
! Stub 1 |        |        |
! Stub 2 |        |        |
! Stub 3 |        |        |
```

Ideally, the broken bar character ¦ (U+00A6) could be used, or even the double vertical line ‖ (U+2016) to the right. Personally, I see no problem with using non-ASCII (at least Latin-1) punctuation for syntax, but I can understand that there might be disagreement; I have all Latin-1 punctuation characters available on my Swedish Linux keyboard, but not everyone may be so lucky. Whichever characters are used for syntax, it is important that they can be backslash-escaped inside cell content.
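A minimal Python sketch of how a row could be split under this proposal, assuming every cell is introduced by its delimiter (`|` for a data cell, `!` for a stub/TH cell) and that `\|` and `\!` are the escapes. `parse_row` is hypothetical and ignores the alternative `¦`/`‖` characters. It also shows why escaping matters: an unescaped `!` in cell text would start a new header cell.

```python
def parse_row(line):
    """Split a table row into (is_header, text) cells.

    Hypothetical syntax: each cell is introduced by a delimiter, '|' for a
    data cell or '!' for a header (stub) cell. A backslash before '|' or '!'
    makes that character literal; other backslashes are kept as-is.
    """
    line = line.strip()
    if not line or line[0] not in "|!":
        raise ValueError("row must start with '|' or '!'")
    cells = []
    is_header, buf = None, []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] in "|!":
            buf.append(line[i + 1])  # escaped delimiter: literal text
            i += 2
            continue
        if ch in "|!":
            if is_header is not None:
                cells.append((is_header, "".join(buf).strip()))
            is_header, buf = ch == "!", []
        else:
            buf.append(ch)
        i += 1
    cells.append((is_header, "".join(buf).strip()))
    return cells
```

With this reading, `! Stub 1 |        |        |` yields one header cell followed by three empty data cells, matching the four columns of the header row.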
As for more powerful syntaxes that clash with the "tables should look like tables" principle, the most common requirement is probably the ability to write a table as a list of lists. I have written a filter that converts lists of lists into tables. Note that it currently only works with pandoc < 2.10 (if anybody understands the pandoc 2.10 table model, a pull request is most welcome! :-), but it shows that the filter approach to this works well.
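The row-extraction core of such a filter is small. A minimal Python sketch working on pandoc's JSON AST, where a `BulletList` is `{"t": "BulletList", "c": [item, ...]}` and each item is a list of blocks: every outer item must hold exactly one inner bullet list, whose items become that row's cells. `list_of_lists_to_rows` is a hypothetical helper; building the actual pandoc 2.10+ `Table` node (head, bodies, foot) from these rows is deliberately left out.

```python
def list_of_lists_to_rows(block):
    """Turn a pandoc-JSON BulletList of BulletLists into rows of cells.

    Each outer item must consist of exactly one inner BulletList; each inner
    item (a list of blocks) becomes one cell. Short rows are padded with
    empty cells so all rows have the same width.
    """
    if block.get("t") != "BulletList":
        raise ValueError("expected a BulletList")
    rows = []
    for item in block["c"]:
        if len(item) != 1 or item[0].get("t") != "BulletList":
            raise ValueError("each outer item must be a single BulletList")
        rows.append(list(item[0]["c"]))
    width = max((len(row) for row in rows), default=0)
    # Use a fresh empty list per padding cell to avoid shared references.
    return [row + [[] for _ in range(width - len(row))] for row in rows]
```

Padding ragged rows is one design choice; a stricter filter could instead reject lists whose rows differ in length.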
Any news here?
I think that a more powerful grid table format would be a good first step. Something like
I agree with using the HTML format; then we can use tui.editor (https://github.com/nhn/tui.editor) to render it.
Hi @jgm. Since the pipe_tables format extension isn't settled yet, and the grid_tables format needs extending too, how about the HTML5 suggestion?
HTML5 tables: it's an interesting idea, but one must think about how this would interact with the way raw HTML currently works in pandoc's Markdown. The current expectation is that raw HTML will be passed through verbatim to HTML (and other formats that accept HTML, like Markdown and EPUB), and that it will be ignored by other formats. Parsing HTML tables as native Table elements would violate that expectation and could lead to problems (e.g. for people who include both an HTML and a LaTeX version of a table to cover both formats). There's also the issue of how it would interact with
Just to throw out an idea that would avoid these issues, one could introduce an explicit fencing syntax that means: parse the following chunk of HTML (or whatever other format) using the appropriate pandoc reader, and include the result in the AST. This would differ from our current "raw attribute" syntax, which always creates a RawBlock. Example:
Of course, this would not degrade well in implementations that didn't support the special syntax. A sneakier approach would be to use HTML comments or processing instructions:
The "read" instruction would tell pandoc to take the following raw block (which could be raw LaTeX, raw HTML, or raw anything using a fence and a raw attribute) and try to parse it from its native format. The advantage of this is that the instruction would just be ignored by implementations that don't support this feature (e.g. on GitHub), so you could at least get the HTML table out in HTML output, while with pandoc you'd have the increased power of being able to convert it to any format.
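The "read" instruction idea can be sketched as a pass over a document's blocks. A minimal Python sketch, assuming blocks are modelled as `(kind, format, text)` tuples and that the instruction is spelled `<?pandoc read?>` (the exact spelling is not settled in this thread). `apply_read_instructions` is hypothetical; in a real implementation the "parse" entries would be handed to the matching pandoc reader.

```python
import re

# Assumed spelling of the instruction; the thread does not settle on one.
READ_PI = re.compile(r"\s*<\?pandoc\s+read\s*\?>\s*")


def apply_read_instructions(blocks):
    """Turn '<?pandoc read?>' + raw block pairs into requests to parse.

    Blocks are (kind, format, text) tuples, e.g. ("raw", "html", "<table>...").
    A raw HTML block holding only the instruction is dropped, and the raw
    block right after it becomes ("parse", format, text). All other blocks
    are passed through unchanged.
    """
    out = []
    pending = False
    for kind, fmt, text in blocks:
        if kind == "raw" and fmt == "html" and READ_PI.fullmatch(text):
            pending = True
            continue
        if pending and kind == "raw":
            out.append(("parse", fmt, text))
        else:
            out.append((kind, fmt, text))
        pending = False
    return out
```

Because the marked block can be in any raw format, the same instruction covers an HTML table and a LaTeX `tabular` alike.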
Alternatively, we could have a special attribute in the HTML, e.g.
For me, it would be important that the table cells could contain Markdown (with lists and multiple paragraphs, even images); nested tables are IMHO less important. I like the approach with the processing instruction:

```html
<?pandoc table="parse-markdown"?>
<table style="width:100%">
<tr>
<th>Firstname</th>
<th>Lastname</th>
<th>Age</th>
<th>Bio</th>
</tr>
<tr>
<td>Jill</td>
<td>Smith</td>
<td>50</td>
<td>
Jill was born and had a good childhood. Then she
* went to school
* went to university
* got familiar with Pandoc
now she is a happy user of [pandoc](www.pandoc.org)
</td>
</tr>
<tr>
<td>Eve</td>
<td>Jackson</td>
<td>94</td>
<td>
Eve was born and had a good childhood. Then she
* went to school
* went to university
* got familiar with Pandoc
now she is a happy user of [pandoc](www.pandoc.org)
</td>
</tr>
</table>
```
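For a sense of what the first step of such a feature involves, here is a minimal Python sketch using the standard library's `html.parser` to collect each cell's tag, attributes (such as `colspan`) and raw text, tolerating the omitted `</td>`/`</tr>` end tags that HTML5 allows. The class `TableCells` is hypothetical: it keeps only text (nested markup is dropped), ignores nested tables, and does not attempt to parse the cell text as Markdown, which is the genuinely hard part of the request.

```python
from html.parser import HTMLParser


class TableCells(HTMLParser):
    """Collect rows of (tag, attrs, text) cells from a single HTML table.

    Omitted </td>, </th> and </tr> end tags are handled by closing the open
    cell when the next cell or row starts. Only text is kept; markup nested
    inside a cell is dropped, and nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # [tag, attrs, text parts] while inside a cell

    def _close_cell(self):
        if self._cell is not None:
            tag, attrs, parts = self._cell
            self.rows[-1].append((tag, attrs, "".join(parts).strip()))
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_cell()
            self.rows.append([])
        elif tag in ("td", "th"):
            self._close_cell()
            if not self.rows:  # a cell before any <tr>
                self.rows.append([])
            self._cell = [tag, dict(attrs), []]

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr", "table"):
            self._close_cell()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[2].append(data)
```

Feeding it the table above would leave the Bio cells as plain strings such as `"Jill was born ... Then she\n* went to school\n..."`, which some separate step would still have to parse as Markdown.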
I don't think it would be easy to support Markdown inside the table cells if we did this.
I understand. Nevertheless, my main issue is supporting tables with complex content in their cells. Therefore I proposed to indicate such a table by
So without Markdown support within table cells, I need to write plain HTML, which would anyway at least be a solution for creating complex tables in a Markdown document.
Just my personal usage: I would want to use grid table syntax for smallish tables, and a separate CSV file for bigger tables (via a pandoc filter, or ideally built in; see #553).
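For the CSV route, a minimal Python sketch that renders CSV text as a non-aligned pipe table (first row as header) using only the standard `csv` module. `csv_to_pipe_table` is hypothetical; it escapes `|` inside cells and flattens embedded newlines, because pandoc's pipe-table cells cannot span multiple lines.

```python
import csv
import io


def csv_to_pipe_table(csv_text):
    """Render CSV text as a pipe table whose first row is the header.

    Pipes inside cells are backslash-escaped and newlines become spaces,
    since pipe-table cells must fit on one line. Short rows are padded.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row):
        cells = [c.replace("|", r"\|").replace("\n", " ") for c in row]
        cells += [""] * (width - len(cells))
        return "| " + " | ".join(cells) + " |"

    lines = [fmt(rows[0]), "|" + "|".join(["---"] * width) + "|"]
    lines += [fmt(row) for row in rows[1:]]
    return "\n".join(lines)
```

Since the output is not column-aligned, it stays readable only for small tables; for the bigger ones, the CSV file itself remains the thing to edit.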
In my playing around with tables, I found that pipe tables and CSV were roughly equivalent, with the
BTW, this discussion on a
Indeed, this is more complicated than I'd foolishly anticipated. But it seems as though it may prove to be a more easily solvable issue than settling on new extensions to the grid/pipe table formats! :-)
@the-solipsist I keep bigger tables in external files and edit them with spreadsheet software; that's why they need to be CSV. Smaller tables I keep in the Markdown file and edit manually or with vim. For that case, either pipe tables or grid tables would work for me, but as jgm mentioned, it seems easier to add the new table features to grid tables.
@Mg21 If you edit the table in, e.g., Excel and have rich text with multiple paragraphs in a cell in combination with column/row spans ... this is the use case where we struggle in Markdown.
Guys, I highly appreciate all your work here. I've looked through all the relevant issues and ended up here, and it seems this is the only issue left for getting colspan and rowspan tables when converting from MD to HTML. Is that assumption correct? If so, what is missing to get it integrated into pandoc? Thanks a lot :)
That's correct. What's missing is
If your question was meant as an offer to participate, you could add support for reST-style grid tables, as the Markdown and reStructuredText parsers share this code. See function
Help is welcome.
Thank you very much for your confirmation. While I'd love to contribute, I'm afraid that Julia and Matlab skills don't help much here, and there's no time to dig into Haskell while writing a thesis, I'm sorry :( While there seem to be some good ideas in the thread you mentioned, there's no more discussion needed for grid tables, i.e. we "just" need to find somebody who is capable of writing the grid table parser, right? I'm going to ask my colleagues, but I don't have high hopes of finding a Haskell guy :(
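Yes, that's right. BTW, if you just need some way to add tables to your Markdown, then you could write the table as HTML (or LaTeX, if you prefer) and use a Lua filter to turn it into a full table:

```lua
-- Replace raw HTML blocks that contain a <table> with the native blocks
-- produced by pandoc's HTML reader (LaTeX tables would need a similar check).
function RawBlock(raw)
  if raw.format:match 'html' and raw.text:match '%<table' then
    return pandoc.read(raw.text, raw.format).blocks
  end
  -- Returning nil leaves every other raw block unchanged.
end
```

The table would be embedded like this: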
I wasn't aware of the
Regarding LaTeX: right now, I'm including an (SVG) table which is actually created by LaTeX :D Unfortunately, I'm using KaTeX, which does not support
Thanks!
Regarding the row header / stub feature (btw, what's the rationale for the "stub" name, @bpj?), instead of the syntax proposed in the opening comment by @mb21:
...I find the syntax proposed here to be more readable, intuitive, and consistent with grid tables' column header syntax:
I also think this addresses a point raised by @the-solipsist above, about support for multiple column/row headers within the same table. It seems to me that marking row headers at the division rather than at the start of the line would help with that:
Note how this is also consistent with how multiple column headers are implemented in grid tables today.
@waldyrious It is the proper term for it in typography. If you're wondering how the term arose, I don't know, but I see no reason to invent half-baked new terms when one has existed for at least two centuries.
Add support for (at least some of) the new table features introduced in pandoc-types/pull/66.
It would be good if at least one of pandoc Markdown's table syntaxes supported them: grid tables seem like the obvious candidate. Something like:
This would roughly cover the following new table features:
It does have the disadvantage that if the last rows look like header rows, they are simply treated as the table foot.
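One way the column-span part of such a grid table could be detected, as a simplified Python sketch: the top border's `+` characters fix the column boundaries, and wherever a row has no `|` at a boundary, the cell is read as spanning into the next column. This assumes single-line rows and ignores row spans, alignment markers and the table foot; `column_boundaries` and `row_cells` are hypothetical helpers, not pandoc functions.

```python
def column_boundaries(border):
    """Character positions of '+' in a grid-table border such as '+---+---+'."""
    return [i for i, ch in enumerate(border) if ch == "+"]


def row_cells(line, boundaries):
    """Split a single-line grid-table row into (text, colspan) cells.

    A '|' at a column boundary closes the current cell. If a boundary has no
    '|' (the text runs across it), the cell continues into the next column,
    so its colspan grows by one.
    """
    cells = []
    start_col = 0
    for col in range(1, len(boundaries)):
        pos = boundaries[col]
        if pos < len(line) and line[pos] == "|":
            text = line[boundaries[start_col] + 1 : pos].strip()
            cells.append((text, col - start_col))
            start_col = col
    return cells
```

For a three-column border, a row like `| Fruit         | Price |` yields `Fruit` spanning two columns and `Price` spanning one. A `|` that happens to fall inside cell text at a boundary position would be misread, so a real parser would need escaping or extra validation.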