remark-grid-table: not breakable spaces break the parsing #369

artragis · 2020-01-08T12:16:43Z

given

+-----------+--------------------------------------------+
| a         | b                                          |
+-----------+--------------------------------------------+
|       c   |  ‌                              e           |
+-----------+--------------------------------------------+

I expect

root[1] (1:1-5:59, 0-295)
└─ element[2] (1:1-5:59, 0-295) [tagName="WrapperBlock"]
   ├─ gridTable[1] [data={"hName":"table"}]
   │  └─ tableHeader[1] [data={"hName":"tbody"}]
   │     └─ tableRow[2] [data={"hName":"tr"}]
   │        ├─ tableCell[1] [data={"hName":"td","hProperties":{"colspan":1,"rowspan":1}}]
   │        │  └─ paragraph[1] (1:1-1:2, 0-1)
   │        │     └─ text: "a" (1:1-1:2, 0-1)
   │        └─ tableCell[1] [data={"hName":"td","hProperties":{"colspan":1,"rowspan":1}}]
   │           └─ paragraph[1] (1:1-1:2, 0-1)
   │              └─ text: "b" (1:1-1:2, 0-1)
   └─ paragraph[1] (1:1-2:59, 0-117)
      └─ text: "|       c   |  ‌                              e           |\n+-----------+--------------------------------------------+" (1:1-2:59, 0-117)

But I get

root[1] (1:1-5:59, 0-294)
└─ element[1] (1:1-5:59, 0-294) [tagName="WrapperBlock"]
   └─ gridTable[1] [data={"hName":"table"}]
      └─ tableHeader[2] [data={"hName":"tbody"}]
         ├─ tableRow[2] [data={"hName":"tr"}]
         │  ├─ tableCell[1] [data={"hName":"td","hProperties":{"colspan":1,"rowspan":1}}]
         │  │  └─ paragraph[1] (1:1-1:2, 0-1)
         │  │     └─ text: "a" (1:1-1:2, 0-1)
         │  └─ tableCell[1] [data={"hName":"td","hProperties":{"colspan":1,"rowspan":1}}]
         │     └─ paragraph[1] (1:1-1:2, 0-1)
         │        └─ text: "b" (1:1-1:2, 0-1)
         └─ tableRow[2] [data={"hName":"tr"}]
            ├─ tableCell[1] [data={"hName":"td","hProperties":{"colspan":1,"rowspan":1}}]
            │  └─ paragraph[1] (1:1-1:2, 0-1)
            │     └─ text: "c" (1:1-1:2, 0-1)
            └─ tableCell[1] [data={"hName":"td","hProperties":{"colspan":1,"rowspan":1}}]
               └─ paragraph[1] (1:1-1:2, 0-1)
                  └─ text: "e" (1:1-1:2, 0-1)

extra

there are some special chars on the second row :

>>> a="""|       c   |  ‌                              e           |"""
>>> a
'|       c   |  \xe2\x80\x8c                              e           |'

The text was updated successfully, but these errors were encountered:

artragis · 2020-01-08T12:44:39Z

It appears this char is zero-width non joiner.

On zmarkdown server 6.1 it generates a "Incorrectly eaten value: please report this warning on http://git.io/vg5Ft" error.

StaloneLab · 2020-01-08T18:24:48Z

Will be working on it next week if no one has solved it by then.

StaloneLab · 2020-01-17T11:17:46Z

Okay, I took a look this morning, and the problem is complicated in fact... Here's what happens:

for a table to be correct, the grid-table tokenizer expect all it's lines to be visually equal in width;
the first line is 58 visual width, but for some unknown reasons, the fourth line is considered 59 because the zero width character is counted has being 1 in width by the string-width package;
therefore, the table is considered to end at third line: it is tokenized, and the remaining of the string is tokenized by the remark's paragraph tokenizer.

The be convinced of this fact, notice how the following works while not having visually equal lines (I removed a space from the fourth line:

+-----------+--------------------------------------------+
| a         | b                                          |
+-----------+--------------------------------------------+
|       c   |  ‌                              e          |
+-----------+--------------------------------------------+

To me, the issue is upfront on string-width wrongly detecting line width; someone already opened an issue there, but it seems the maintainer doesn't want to solve it because the length depends on the context and he's lacking test cases.

Maybe we could update the issue mentionned above to see if the maintainer of the package could do something about it; or we will have to write our own package...

vhf · 2020-01-18T16:58:28Z

@TitiAlone what about fixing it on our side by counting how many of these (except the only joiner one) there is in each cell (or line, I didn't check how grid-table is tokenized) and subtracting this number from the character count?

StaloneLab · 2020-01-18T17:26:42Z

We could, I thought about it, but found it very hacky; IMO it is a bad solution because if string-width corrects the bug one day, we will have the opposite problem (counting less characters than there really are).

vhf · 2020-01-18T18:12:00Z

Makes sense, and I think there will never be a definitive fix to this issue.

The 👨‍👨‍👧‍👦 emoji is 7 unicode characters. With the current version of remark-grid-table, we consider it to have width 2. This is arbitrary, it could also be seen as having width 1, 4, 7 or 11. It depends on the editor (or browser) in which you are writing your markdown grid-table, the font you are using, and other factors potentially.

Same goes for ZWNJ, it could either be 0 or 1 width (some editors display "weird" unicode characters as symbols (here it could be a tiny u+200c.))

IMO the priority should be fixing the crash, the string containing the ZWNJ should be eaten properly.

StaloneLab · 2020-01-18T19:14:47Z

Do we really need to check for lines width, by the way? We could simply disable the check and then have no issues anymore with the length of unicode characters, relying only on the count of |.

vhf · 2020-01-19T13:05:59Z

I'm not an expert on grid tables (I always avoided using them because it's too much work for me to write) but I'm a bit worried about cells spanning multiple columns.

Examples (paste it here: https://zestedesavoir.github.io/zmarkdown/ ):

+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
|   | a     |
+---+---+---+
|     b |   |
+-------+---+


+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
|   | a     |
+---+-------+
| b |       |
+---+-------+

What you are suggesting is worth trying IMO (if you have time), all I'm saying is that I cannot tell off the top of my head if it could still handle the above examples.

StaloneLab · 2020-01-19T13:58:58Z

I think you're right, we can't distinguish between the two examples without visual alignment... I'll try disabling the global check to see what it gives, and fall back to compensating the offset if it doesn't work.
I also noticed Pandoc was supporting grid tables, so I might have a look at how they're handling this case.

StaloneLab · 2020-01-20T10:09:11Z

I've got no failing tests when removing global length test, and it resolves the bug; question is about whether we want to allow cases like this one:

+---+---+---+
|      1 |       2 |         3 |
+---+---+---+

It renders ok, but we might consider it a bad practice. What do you think @artragis ?

vhf · 2020-01-20T22:48:59Z

I've got no failing tests when removing global length test, and it resolves the bug

Great! Could you please add a test for this case?

+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
|   | a     |
+---+---+---+
|     b |   |
+-------+---+


+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
|   | a     |
+---+-------+
| b |       |
+---+-------+

Because of the current cell-width constraints there was no reason to check this but we should make sure it works properly with your fix. (Many ways to test it, not sure what's the best one, checking that they don't return the same HTML might be enough.)

question is about whether we want to allow cases like this one:
+---+---+---+
|      1 |       2 |         3 |
+---+---+---+
It renders ok, but we might consider it a bad practice. What do you think @artragis ?

My vote goes to allowing it but I don't feel strongly about this. For some reason I doubt people would go out of their way to write something this ugly. 🙃

artragis · 2020-01-21T12:02:57Z

I think no crash is the only thing that zds needs. This width varying chars are a pain to manage so if you have something stable and predictible I take.
As @vhf is zmd maintainer let's go ahead as he told.

Drulac · 2020-07-01T08:56:10Z

Maybe we could introduce an other char like # (or another) to make a difference between these cases :

+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
|   | a     |
+---+---+---+
|     b |   |
+-------+---+


+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
|   | a     |
+---+-------+
| b |       |
+---+-------+

the new char could tell wich case take more than one column space :

+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
|   | a #   |
+---+---+---+
|   # b |   |
+-------+---+


+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
|   | a #   |
+---+-------+
| b |   #   |
+---+-------+

the user could put it everywhere in the cell (start, in part of the content, end of cell) ?

+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
|   |# a    |
+---+---+---+
| b # b |   |
+---+-------+
| c |      #|
+---+-------+

If we want a 3 columns large cell, juste add an #

+---+---+---+---+
| 1 | 2 | 3 | 4 |
+---+---+---+---+
|   | a #   #   |
+---+---+---+---+
| b # b |   |   |
+---+-------+---+
| c |   #   |   |
+---+-------+---+

# is maybe not the best char for this job 🤷‍♀️

What do you think of this idea ?

StaloneLab · 2020-07-01T09:06:21Z

@Drulac : the problem you're refering to is not clear to me; the two test cases you submitted do render differently by now, so I don't know why we would bother adding an extra character for something that already works (you can test it on the live demo).

Drulac · 2020-07-01T09:24:40Z

for example in this case the render is not what I've expected :S

+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| **gh**  | **a**     |
+---+-------+
| b |       |
+---+-------+

of course if I format the table "strictly" it will render as expected :

+---------+---+---+
| 1       | 2 | 3 |
+---------+---+---+
| **gh**  | **a** |
+---------+---+---+
| b       |       |
+---------+---+---+

but it's so longer to write and align perfectly at each modification of the cells

The use of an other char could really improve the speed of writing complex tables

It also could allow to put inline math formules inside complex tables ?

StaloneLab · 2020-07-01T09:32:32Z

To me, your first usage is incorrect; we simply allowed it because it was easier for Unicode manipulation of varying-width characters; this is not a bug, and could hardly argue this is a great feature request...

Right now, I'm against having this kind of character that would complexify the parsing (already way too complex). Tables are expected to be written correctly, and your first example is behaviour undefined, maybe one day it'll render the other way around, no one knows 'cause no one is expected to write this kind of stuff.

StaloneLab mentioned this issue Jan 21, 2020

Allow varying width table #370

Merged

vhf closed this as completed Jan 21, 2020

vhf added this to the zmarkdown@7.0.0 milestone Jan 21, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

remark-grid-table: not breakable spaces break the parsing #369

remark-grid-table: not breakable spaces break the parsing #369

artragis commented Jan 8, 2020 •

edited

Loading

artragis commented Jan 8, 2020 •

edited

Loading

StaloneLab commented Jan 8, 2020

StaloneLab commented Jan 17, 2020

vhf commented Jan 18, 2020

StaloneLab commented Jan 18, 2020 •

edited

Loading

vhf commented Jan 18, 2020 •

edited

Loading

StaloneLab commented Jan 18, 2020

vhf commented Jan 19, 2020

StaloneLab commented Jan 19, 2020

StaloneLab commented Jan 20, 2020

vhf commented Jan 20, 2020 •

edited

Loading

artragis commented Jan 21, 2020

Drulac commented Jul 1, 2020

StaloneLab commented Jul 1, 2020

Drulac commented Jul 1, 2020 •

edited

Loading

StaloneLab commented Jul 1, 2020

remark-grid-table: not breakable spaces break the parsing #369

remark-grid-table: not breakable spaces break the parsing #369

Comments

artragis commented Jan 8, 2020 • edited Loading

given

I expect

But I get

extra

artragis commented Jan 8, 2020 • edited Loading

StaloneLab commented Jan 8, 2020

StaloneLab commented Jan 17, 2020

vhf commented Jan 18, 2020

StaloneLab commented Jan 18, 2020 • edited Loading

vhf commented Jan 18, 2020 • edited Loading

StaloneLab commented Jan 18, 2020

vhf commented Jan 19, 2020

StaloneLab commented Jan 19, 2020

StaloneLab commented Jan 20, 2020

vhf commented Jan 20, 2020 • edited Loading

artragis commented Jan 21, 2020

Drulac commented Jul 1, 2020

StaloneLab commented Jul 1, 2020

Drulac commented Jul 1, 2020 • edited Loading

StaloneLab commented Jul 1, 2020

artragis commented Jan 8, 2020 •

edited

Loading

artragis commented Jan 8, 2020 •

edited

Loading

StaloneLab commented Jan 18, 2020 •

edited

Loading

vhf commented Jan 18, 2020 •

edited

Loading

vhf commented Jan 20, 2020 •

edited

Loading

Drulac commented Jul 1, 2020 •

edited

Loading