Enable correct whitespace handling in write_html() #547

gmischler · 2022-09-18T10:23:51Z

Technically, this is also a bug report...

Problem

It is one of the characteristics of the HTML format that you can add all kinds of whitespace (spaces, tabs, newlines, etc.) in any amount pretty much anywhere. When the data is rendered by a conforming web browser, all that whitespace gets collapsed into a single space character between the actually printing data to the left and right of it.

The fpdf2 HTML parser currently completely ignores this rather important rule.
Consequently, HTML documents rendered with fpdf2 tend to look very ugly.

Sub-Problem

The render_toc() method even makes deliberate use of this un-feature, and uses sequences of space characters to seperate the labels and associated page numbers from each other. Of course, this only "kind of" works with a monospaced font, and ends up looking horribly out of whack with any normal font.

Solution

Within a single data sequence, a regex might be able to do the collapsing.
Beween seperate data sequences (ie. next to any HTML tag), it is probably necessary to maintain a flag, which stores whether the previous data sequence ended with a space. If so, then any space at the beginning of the next data sequence should be skipped.
Note that eg. <pre>...</pre> and <code>...</code> sequences are exceptions that need to be passed through unchanged.

Sub-Solution

The render_toc() method needs to be changed, so that it renders the text and page number of each entry as a seperate [multi_]cell() in its own well-defined horizontal position.

Bonus points for

enabling multi-column TOC layout
an option to draw a dotted line between labels and page numbers in the TOC
Correct handling of non-breaking space characters and other special cases

Anyone up for a cleanup task like this?
PS.: The TOC could be fixed idependently of the general whitespace issue, but if so, then that must happen first.

The text was updated successfully, but these errors were encountered:

ehildebrandtrojo · 2022-10-01T06:27:38Z

Hey! I'm down to give it a shot; can you assign me the task.

FYI I'm new to contributing to open source, so don't be surprised if I end up asking some basic questions along the way.

Lucas-C · 2022-10-03T08:04:04Z

Hi and welcome @ehildebrandtrojo !
Task assigned 😊
Open source newcomers are welcome here, feel free to asky any question!

Lucas-C · 2022-10-17T06:58:09Z

Hi @ehildebrandtrojo!
Have you started codding on this?
If you face any difficulty, feel free to report them and ak for help!

Those pages are good starting points if you are new to open source:

ehildebrandtrojo · 2022-10-17T14:44:55Z

Thank you for the resources Lucas!
Unfortunately I've been really busy with clases and I still haven't had the chance to start.
In the next few days I should be able to get started though. If I can't, I'll let you know, and you can let someone else take the ticket.
Sorry for any inconveniences caused.

Lucas-C · 2022-10-17T16:24:53Z

Unfortunately I've been really busy with clases and I still haven't had the chance to start.
In the next few days I should be able to get started though. If I can't, I'll let you know, and you can let someone else take the ticket.
Sorry for any inconveniences caused.

Don't worry, that's perfectly fine 😊
This is volunteer work, no pressure!

ehildebrandtrojo · 2022-10-24T02:49:42Z

Hey @Lucas-C!
Unfortunately, I don't think I have the time to work on this issue.
I'm graduating this semester and so, classes, along with job search, are taking up a lot of my time.
Still, I wanted to thank you for creating such a welcoming environment for first time contributors.
I hope that once I'm done with the semester, I can start contributing to more open source projects like this one!

Lucas-C · 2022-10-24T09:06:21Z

Unfortunately, I don't think I have the time to work on this issue.

Ok, no worries!

I'm graduating this semester and so, classes, along with job search, are taking up a lot of my time.

Good look with all of that 😊

I hope that once I'm done with the semester, I can start contributing to more open source projects like this one!

🤞

seanpmulholland · 2022-11-11T19:14:08Z

Hi all, working on this now. Relatively new to OS contributions so let me know if anything looks incorrect!

I have the whitespace issue mostly sorted. Had a question on tests though - would it be preferred to replace the test.pdf files with updated outcomes or to force the code in the tests to display the old extra whitespace?
example:

<h1>H1 {long_title*2}</h1> <- test case with 3 spaces

old html_headings_line_height test

updated html_headings_line_height test <- trimmed to 1 space

Sub-Problem

Just getting familiar with the code so I might be mistaken but it looks like the ToC rendering happens outside of html processing and writes directly to the pdf. So this change does not break ToC rendering, except for the indentation in the test case 'Table of content' header, since that is now trimmed html whitespace.

before:

after:

Let me know how to best handle the affected tests and I'll sort that out and open a pull req!

gmischler · 2022-11-12T12:08:32Z

would it be preferred to replace the test.pdf files with updated outcomes or to force the code in the tests to display the old extra whitespace?

The current tests just verify the existing behaviour, which at the time was considered "good enough" but is technically incorrect.
Once the code is fixed to produce the correct result, the tests should of course also reflect that and verify the corrected output.
There is no point in "veryfying" the previously incorrect output even after it has been corrected. So yes, you'll have to adapt all the affected tests.

seanpmulholland · 2022-11-12T15:39:52Z

Got it, thanks! PR opened

Fixes #547

gmischler added the enhancement label Sep 18, 2022

Lucas-C added bug html good first issue up-for-grabs hacktoberfest labels Sep 18, 2022

Lucas-C assigned ehildebrandtrojo Oct 3, 2022

Lucas-C unassigned ehildebrandtrojo Oct 25, 2022

seanpmulholland mentioned this issue Nov 12, 2022

Enable correct whitespace handling in write_html() #602

Merged

5 tasks

Lucas-C closed this as completed in #602 Nov 18, 2022

Lucas-C pushed a commit that referenced this issue Nov 18, 2022

Enable correct whitespace handling in write_html() (#602)

3a8d446

Fixes #547

dmail00 mentioned this issue Dec 13, 2023

Leading spaces on new lines ignored in <pre><code> tags #1063

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable correct whitespace handling in write_html() #547

Enable correct whitespace handling in write_html() #547

gmischler commented Sep 18, 2022 •

edited

Loading

ehildebrandtrojo commented Oct 1, 2022

Lucas-C commented Oct 3, 2022

Lucas-C commented Oct 17, 2022

ehildebrandtrojo commented Oct 17, 2022

Lucas-C commented Oct 17, 2022

ehildebrandtrojo commented Oct 24, 2022

Lucas-C commented Oct 24, 2022

seanpmulholland commented Nov 11, 2022 •

edited

Loading

gmischler commented Nov 12, 2022

seanpmulholland commented Nov 12, 2022

Enable correct whitespace handling in write_html() #547

Enable correct whitespace handling in write_html() #547

Comments

gmischler commented Sep 18, 2022 • edited Loading

Problem

Sub-Problem

Solution

Sub-Solution

Bonus points for

ehildebrandtrojo commented Oct 1, 2022

Lucas-C commented Oct 3, 2022

Lucas-C commented Oct 17, 2022

ehildebrandtrojo commented Oct 17, 2022

Lucas-C commented Oct 17, 2022

ehildebrandtrojo commented Oct 24, 2022

Lucas-C commented Oct 24, 2022

seanpmulholland commented Nov 11, 2022 • edited Loading

Sub-Problem

gmischler commented Nov 12, 2022

seanpmulholland commented Nov 12, 2022

gmischler commented Sep 18, 2022 •

edited

Loading

seanpmulholland commented Nov 11, 2022 •

edited

Loading