Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add syntax for tables #11

Merged
merged 22 commits into from
Feb 27, 2023
Merged

Add syntax for tables #11

merged 22 commits into from
Feb 27, 2023

Conversation

gpetiot
Copy link

@gpetiot gpetiot commented Sep 9, 2022

This is a first draft, I'm asking for feedback before going further. @jonludlam @Julow

I chose this syntax:

{t
  {tr
     {td ...}
     ...
     {td ...}
  }
  ...
  {tr
     {td ...}
     ...
     {td ...}
  }
}

The markup is ovsiously inspired from the <tr></tr><td></td> HTML tags.
The generated output is straightforward in HTML and Latex, at first I wanted to add header cells (<th></th>) but I'm not aware of anything similar in Latex but we could apply a different style.

Some improvements:

  • add the ability to say that a cell will span over N>1 columns. In HTML it would be <td colspan="N">...</td>, in Latex it would be \multicolumn{N}{c}{...}
  • check that each row has the same number of columns? (I don't remember if that will cause issues for Latex, but HTML should be more lenient).

What do you think?

@jonludlam
Copy link
Collaborator

Thanks @gpetiot !

This is definitely an approach that could work very well. However, I think we ought to consider the possibility of having more visual way to layout tables, perhaps something similar to how rst or orgmode does it - as an example both of the following are valid rst tables:

+------------+------------+-----------+
| Header 1   | Header 2   | Header 3  |
+============+============+===========+
| body row 1 | column 2   | column 3  |
+------------+------------+-----------+
| body row 2 | Cells may span columns.|
+------------+------------+-----------+
| body row 3 | Cells may  | - Cells   |
+------------+ span rows. | - contain |
| body row 4 |            | - blocks. |
+------------+------------+-----------+

and

=====  =====  ======
   Inputs     Output
------------  ------
  A      B    A or B
=====  =====  ======
False  False  False
True   False  True
False  True   True
True   True   True
=====  =====  ======

and here's an org-mode table:

| N | N^2 | N^3 | N^4 | sqrt(n) | sqrt[4](N) |
|---+-----+-----+-----+---------+------------|
| / |  <  |     |  >  |       < |          > |
| 1 |  1  |  1  |  1  |       1 |          1 |
| 2 |  4  |  8  | 16  |  1.4142 |     1.1892 |
| 3 |  9  | 27  | 81  |  1.7321 |     1.3161 |
|---+-----+-----+-----+---------+------------|

The reason being that more often than not right now I'm still looking at docs in my editor buffer, or on github, as opposed to nicely rendered on a web page. We'd also get to use pre-existing table-editing modes in various editors.

@Julow
Copy link
Contributor

Julow commented Sep 9, 2022

Features that other languages have: text alignment of the column, header rows.

The short syntax from RST is interesting: https://docutils.sourceforge.io/docs/ref/rst/directives.html#list-table if we drop the structured directive and attributes and make it less ambiguous, we get a syntax easy to type that is similar to lists and might be enough in most cases.

I suggest this syntax, which works the same way as lists:

++ - header 1
   - header 2
** - cell 1
   - cell 2

The long syntax that you suggest is needed, like we have for lists, to support nesting of bigger elements.

To leave room for the colspan feature and text alignment, we need some kind of attributes: {td[colspan=2,align=center] ...} (similar to CSS selectors).

Header row similar to HTML:

{table
 {tr
  {th header 1}
  {th header 2}}
 {tr
  {td cell 1}
  {td cell 2}}}

The main element can be {table ...}, it's more readable and similar to HTML.

@dbuenzli
Copy link
Contributor

dbuenzli commented Sep 9, 2022

Most lightweight markup languages add two things, headers and specification of column alignement (see for example the github flavoured markdown table definition). I think it would be nice to have these. I wouldn't bother too much with multi-columns.

I have mixed feelings about what @jonludlam suggests. Of course these tables look nicer but the given examples are also not very content heavy.

Once you start adding links, references and ocamldoc markup in the cell contents I have the impression that the regular, well-delimited, syntax of ocamldoc scales better.

@Julow
Copy link
Contributor

Julow commented Sep 9, 2022

The reason being that more often than not right now I'm still looking at docs in my editor buffer, or on github, as opposed to nicely rendered on a web page.

I've always hated both reading and writing tables drawn and aligned like that. The editor plugins to draw them are not worth learning for the few tables I ever needed to make so I end up not aligning them, they become write-only. Also, multi line cells are a pain.

@jonludlam
Copy link
Collaborator

@dbuenzli agreed that larger-scale tables won't look as nice as those examples! Nothing stopping us from doing both though - we already have precedent in lists, where we can write both

(** Here is a {b list}
- item 1
- item 2
- item 3

The list is ended by the blank line.*)

and

(** Here is a {b list}
{ul {- item 1}
{- item 2}
{- item 3}}
The list is ended by the blank line.*)

we could simply choose based on the content of the {table ...} element - e.g.

{table
 {tr
  {th header 1}
  {th header 2}}
 {tr
  {td cell 1}
  {td cell 2}}}

vs

{table
+==========+==========+
| header 1 | header 2 |
+==========+==========+
| cell 1   | cell 2   |
+----------+----------+
}

@jonludlam
Copy link
Collaborator

I've always hated both reading and writing tables drawn and aligned like that. The editor plugins to draw them are not worth learning for the few tables I ever needed to make so I end up not aligning them, they become write-only. Also, multi line cells are a pain.

I've never hated reading them, but writing is a bit of a pain without editor support :-)

@jonludlam
Copy link
Collaborator

Actually, choosing based on contents is not a good idea. We could use a different tag though - {t vs {table for example.

@dbuenzli
Copy link
Contributor

dbuenzli commented Sep 9, 2022

To motivate with concrete examples here's a sample of tables that I currently write as code and that would benefit from the proposal:

  1. https://erratique.ch/software/b0/doc/B00_ocaml/Mod/Name/index.html#mangled
  2. https://erratique.ch/software/brr/doc/ffi_cookbook.html#names
  3. https://erratique.ch/software/brr/doc/ffi_manual.html#equality
  4. https://erratique.ch/software/gg/doc/Gg/Float/index.html#floatrecall
  5. https://erratique.ch/software/uutf/doc/Uutf/index.html#val-decoder
  6. https://erratique.ch/repos/inane/tree/src/b36uid.mli#n12

Scroll a bit if you don't see the table, which reminds me: it would be nice to be able to reference them via a label like we do for sections, i.e. {table:label (and why not cells and/or rows, but let's start simple).

@dbuenzli
Copy link
Contributor

dbuenzli commented Sep 9, 2022

Regarding a possible lightweight notation. I think it's good at least have a look at what the experts do to when they can start designing from scratch.

@gpetiot
Copy link
Author

gpetiot commented Sep 9, 2022

For the light syntax, what do you think about something like:

{t
+-----------------------------
|{l}  xx |{c}  yy  |{r}   zz
+---------------------------
|   xx   |{r}   yy  |{l}    zz
+-----------------------------
}
  • No need to finish the line (just +- at the beginning of the line to indicate a new row)
  • {l}/{r}/{c} could indicate the alignment (optional), but should be stuck to the | to be a special token

@dbuenzli
Copy link
Contributor

dbuenzli commented Sep 9, 2022

For the light syntax, what do you think about something like:

If you really want to add this then I strongly suggest not to invent anything new here. Have a look at pandoc's markdown table extensions for a few alternatives.

Personally I think you should just go with pipe_tables which is popular among a few extended markdown dialects including github's and gitlab's ones.

@gpetiot
Copy link
Author

gpetiot commented Sep 19, 2022

I implemented the light and heavy syntaxes, this PR is ready to be reviewed.

Copy link
Contributor

@Julow Julow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parses as two columns, a and b |c| d, which is not intuitive:

{t
 | a | b |c| d |
 |---|--:|:--|:-:|
}

@jonludlam
Copy link
Collaborator

I assume that, since pipetables are pretty standard, there are a set of nasty test cases somewhere that we could use?

@dbuenzli
Copy link
Contributor

Would it please be possible to update the first comment with the implemented syntaxes. For example how do I specify alignement in the "heavy" syntax ?

@gpetiot
Copy link
Author

gpetiot commented Sep 20, 2022

Would it please be possible to update the first comment with the implemented syntaxes. For example how do I specify alignement in the "heavy" syntax ?

Nothing decided yet, I'm torn between doing something similar to the light syntax:

{table
   {tr {th ...} [th ...}}
   {align left right}
   ...
}

or something that would be easier to read to my opinion but that would make both implementations diverge:

{table
  {tr {th{left} ...} {th{right} ...}} or {tr {th:left ...} {th:right ...}}
  ...
}

@dbuenzli
Copy link
Contributor

Note that even though IIRC odoc dropped support for it. There was a syntax in ocamldoc for text alignement. Might be worth resurecting it for th. Something like: a bare th is left-aligned (or whathever the default is of the light syntax) and otherwise alignement is specified by "direct" nesting:

{tr
{th {L Left aligned}} {th {R Right aligned}} {th {C Center aligned}} {th default align}}

@gpetiot
Copy link
Author

gpetiot commented Sep 21, 2022

Note that even though IIRC odoc dropped support for it. There was a syntax in ocamldoc for text alignement. Might be worth resurecting it for th. Something like: a bare th is left-aligned (or whathever the default is of the light syntax) and otherwise alignement is specified by "direct" nesting:

{tr
{th {L Left aligned}} {th {R Right aligned}} {th {C Center aligned}} {th default align}}

Sounds good to me, @jonludlam what do you think?

@jonludlam
Copy link
Collaborator

Reusing the syntax we previously had makes sense to me.

@Julow
Copy link
Contributor

Julow commented Sep 21, 2022

On the other hand, an attribute-like syntax would allow colspan and keep room for new things later.

Copy link
Contributor

@Julow Julow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of the pipe syntax at all but using the most common syntax can't hurt Odoc.

@gpetiot
Copy link
Author

gpetiot commented Dec 12, 2022

I fixed the parsing issues with the alignment tags for the heavy syntax. Let me know what you think @Julow @jonludlam

@gpetiot gpetiot requested a review from Julow December 12, 2022 20:45
Copy link
Contributor

@Julow Julow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Adding a block element into a light table makes quite a mess of warnings and continue the parsing in an inconsistent way:

# p {|
  {t
   | {[ a ]} | b |
   |---|---|
  }
  |} ;;
File "", line 2, characters 5-12:
'{[...]}' (code block) is not allowed in '{t ...}' (table).
File "", line 2, characters 13-14:
'|' is not allowed in top-level text.
Suggestion: move '|' into '{t ...}' (table).
File "", line 2, characters 15-16:
Paragraph should begin on its own line.
File "", line 2, characters 17-18:
'|' is not allowed in top-level text.
Suggestion: move '|' into '{t ...}' (table).
File "", line 3, characters 3-4:
'|' is not allowed in top-level text.
Suggestion: move '|' into '{t ...}' (table).
File "", line 3, characters 7-8:
'|' is not allowed in top-level text.
Suggestion: move '|' into '{t ...}' (table).
File "", line 3, characters 8-11:
Paragraph should begin on its own line.
File "", line 3, characters 11-12:
'|' is not allowed in top-level text.
Suggestion: move '|' into '{t ...}' (table).
File "", line 4, characters 2-3:
Unpaired '}' (end of markup).
Suggestion: try '\}'.
- : Ast.t =
[`Table (Light {header = []; data = []; align = []});
 `Paragraph [`Word "b"; `Space " "]; `Paragraph [`Word "---"];
 `Paragraph [`Word "---"]; `Paragraph [`Word "}"]]

The parser has the goal of not loosing text in case of bad syntax so the problematic block element shouldn't be removed. It would be nice to handle the conversion from table to paragraphs without generating too much warnings.

@gpetiot gpetiot requested a review from Julow January 17, 2023 15:50
Copy link
Contributor

@panglesd panglesd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice !

I do have change requested for some things:

  • I think that the heavy syntax should allow us to define any cell to be a header cell (just like in html). This allows for several use cases, with two common one being: "vertical" tables (tables where the header is a column) and tables with two "axis" (both a header line and a header column).
    Fortunately, I think this will actually simplify the code a lot.
  • I believe the lexer should do more work in the case of the light table. Currently, as mentioned by @Julow, opening a forbidden block inside a light table will generate a lot of warnings (where one would have sufficed) and the next closing will be wrongly paired. That's a major issue!
    Using a specific lexer for the inside of light tables (which is what is done for verbatim for example) also removes the constraint added by this PR to escape | characters outside of tables.
  • I don't think the AST type should make such a clear distinction between light and heavy syntax. Similarly to `` List, a variant on the side could distinguish between them, the information is almost always irrelevant for consumers of the parser, it has to be easy to ignore.

Another thing: I am not a huge fan of the alignment syntax for heavy tables.
To me, it seems that {th {R text}} will only have an effect on the cell, by the nesting. (The fact that it has an effect on the whole columns can only be understood with the light syntax in mind).
Moreover, it prevents possible future uses of {L } to specify the alignment on single cell, especially on header cell (if we want that, not sure...).
Instead, we could take inspiration from latex's table (or the col html element), where the fact that it acts on columns is evident:

{table {col L C R}
  {tr
    {th left-aligned} {th centered} {th right-aligned}}
  }
}

This leaves room for deciding the line borders in odoc (but maybe that's not something we want):

{table {col | L C | R}
  {hline}
  {tr
    {th left-aligned} {th centered} {th right-aligned}}
  }
  {hline}
}

gpetiot and others added 7 commits February 10, 2023 16:08
New lines inside markup (such as `{e ...}`) were not detected, although it
breaks the light table format.

Add a warning for such cases.

Signed-off-by: Paul-Elliot <peada@free.fr>
Signed-off-by: Paul-Elliot <peada@free.fr>
Signed-off-by: Paul-Elliot <peada@free.fr>
Signed-off-by: Paul-Elliot <peada@free.fr>
contribution to the table PR
@jonludlam
Copy link
Collaborator

I had a quick look to see whether there were any differences running over the set of libraries in odoc's driver with the ODOC_BENCHMARK var set to true. I found a two differences - here:

In both instances they are actually code snippets without being enclosed in any sort of verbatim/code tags, but I was slightly surprised nevertheless. It's not a show stopper but I'd be quite interested to know why the differences are there.

panglesd and others added 4 commits February 14, 2023 11:33
Signed-off-by: Paul-Elliot <peada@free.fr>
Fix new paragraph starting on `|`
This commit fixes several alignment issues.

1. No alignment in a cell was encoded as an alignment to `Center.

"Centering" was certainly a typo, but there is also the issue that "no specified
alignment" should really be different than "specified to ..."

For instance (assuming `{R ...}` exists):
```
{R
  {t
    |aaa|bbb|
    |---|---|
    |xxx|yyy|}}
```
would not be right-aligned inside the table, with the previous behaviour.

This commit allows for "unspecified alignment" using an option.

2. No alignment in a table was encoded as an empty list of alignment.

This was wrong, since at some point we might want to raise warnings in case the
number of alignment and the number of columns differ.

Signed-off-by: Paul-Elliot <peada@free.fr>
@panglesd
Copy link
Contributor

Thanks @jonludlam for spotting a bug! It has been fixed in 97fd2a3.

I've updated ocaml/odoc#893 to be in sync with this PR, in case you want to review them as a whole.

src/syntax.ml Outdated
@@ -209,9 +324,11 @@ and delimited_inline_element_list :
parent_markup:[< Token.t ] ->
parent_markup_location:Loc.span ->
requires_leading_whitespace:bool ->
context:inline_context ->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit of threaded state is annoying. It's also very similar to the requires_leading_whitespace state - I wonder if the two could be combined into a single Context.t or similar? I suppose the new bit is relevant to both delimited_inline_element_list and inline_element so maybe not worth it? It's also possibly worth considering getting rid of the context and just checking in the light_table code after we've called this function that the start and end lines are the same?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a quick go: jonludlam@5c49f98 - I think it's a bit less invasive and keeps the check within the light_table_row function. What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also my opinion that having the check in light_table_row makes the code clearer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a small remark, the location warning in the current PR and jonludlam@5c49f98 will be different:

  • In the current PR, it is only the newline. There is one warning per newline.
  • In jonludlam@5c49f98 there is only one warning, with location being the whole inline item.

I think both solutions have different advantages/disadvantages (one is clearer, the other more general), and I don't have a preference. So let's go with @jonludlam's idea!

@jonludlam
Copy link
Collaborator

Looking good now, thanks @gpetiot!

@jonludlam jonludlam merged commit 370f1cf into ocaml-doc:main Feb 27, 2023
@gpetiot gpetiot deleted the tables branch February 27, 2023 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants