Enable correct whitespace handling in write_html() #547
Comments
Hey! I'm down to give it a shot; can you assign me the task? FYI, I'm new to contributing to open source, so don't be surprised if I end up asking some basic questions along the way.
Hi and welcome @ehildebrandtrojo!
Hi @ehildebrandtrojo! Those pages are good starting points if you are new to open source:
Thank you for the resources, Lucas!
Don't worry, that's perfectly fine 😊
Hey @Lucas-C!
Ok, no worries!
Good luck with all of that 😊
🤞
The current tests just verify the existing behaviour, which at the time was considered "good enough" but is technically incorrect.
Got it, thanks! PR opened
Technically, this is also a bug report...
Problem
It is one of the characteristics of the HTML format that you can add all kinds of whitespace (spaces, tabs, newlines, etc.) in any amount pretty much anywhere. When a conforming web browser renders the document, each run of that whitespace collapses into a single space character between the printable content to its left and right.
The fpdf2 HTML parser currently ignores this rather important rule entirely.
Consequently, HTML documents rendered with fpdf2 tend to look very ugly.
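To see where the problem comes from: Python's standard html.parser.HTMLParser, which fpdf2's HTML support builds on as far as I can tell, hands each data sequence to handle_data() with its whitespace untouched. The small collector below (DataCollector is a hypothetical name used only for this illustration) shows the raw chunks a renderer receives, while a browser would display the same markup as "Hello, big wide world".

```python
from html.parser import HTMLParser


class DataCollector(HTMLParser):
    """Hypothetical helper: record every data sequence exactly as delivered."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


collector = DataCollector()
collector.feed("<p>Hello,\n      <b>big</b>\t\twide   world</p>")
collector.close()
# The newline, the indentation and the tabs all reach the renderer unchanged.
print(collector.chunks)
```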
Sub-Problem
The render_toc() method even makes deliberate use of this un-feature, using sequences of space characters to separate each label from its page number. Of course, this only "kind of" works with a monospaced font, and ends up looking horribly out of whack with any proportional font.
Solution
Within a single data sequence, a regular expression can handle the collapsing.
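A minimal sketch of the per-sequence step, assuming Python's re module and a helper named collapse_whitespace() that is purely illustrative. The character class is limited to what HTML calls ASCII whitespace (space, tab, LF, FF, CR), because Python's \s would also collapse the non-breaking space that &nbsp; produces.

```python
import re

# HTML "ASCII whitespace": space, tab, LF, FF and CR.
# Deliberately narrower than Python's \s, which would also match U+00A0
# (the character &nbsp; decodes to), and that one must survive.
_HTML_WHITESPACE_RUN = re.compile(r"[ \t\n\f\r]+")


def collapse_whitespace(text: str) -> str:
    """Collapse each run of HTML whitespace in one data sequence to one space."""
    return _HTML_WHITESPACE_RUN.sub(" ", text)


print(repr(collapse_whitespace("\t\twide   world\n")))  # ' wide world '
print(repr(collapse_whitespace("10\u00a0\u00a0km")))  # non-breaking spaces kept
```

Leading and trailing spaces are deliberately kept at this stage; deciding whether they survive is the job of the flag between data sequences.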
Between separate data sequences (i.e. next to any HTML tag), it is probably necessary to maintain a flag that stores whether the previous data sequence ended with a space. If so, any space at the beginning of the next data sequence should be skipped.
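The flag approach could be sketched on top of html.parser.HTMLParser roughly like this. WhitespaceCollapsingParser and its attributes are hypothetical names, the text is only collected into a list instead of being rendered, and content inside <pre> and <code> is passed through unchanged, per the exception noted in this issue.

```python
import re
from html.parser import HTMLParser

_HTML_WHITESPACE_RUN = re.compile(r"[ \t\n\f\r]+")

# Tags whose content this issue asks to pass through unchanged.
_VERBATIM_TAGS = {"pre", "code"}


class WhitespaceCollapsingParser(HTMLParser):
    """Hypothetical sketch: collapse whitespace across data sequences.

    Text is only collected in self.output; a real renderer would emit it.
    """

    def __init__(self):
        super().__init__()
        self.output = []
        # Start as if a space had just been written, so whitespace at the
        # very beginning of the input is dropped.
        self._last_ended_with_space = True
        self._verbatim_depth = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag names.
        if tag in _VERBATIM_TAGS:
            self._verbatim_depth += 1

    def handle_endtag(self, tag):
        if tag in _VERBATIM_TAGS and self._verbatim_depth > 0:
            self._verbatim_depth -= 1

    def handle_data(self, data):
        if self._verbatim_depth:
            # Inside <pre>/<code>: keep the text exactly as written.
            text = data
        else:
            text = _HTML_WHITESPACE_RUN.sub(" ", data)
            # The flag: drop a leading space if the previous sequence ended with one.
            if self._last_ended_with_space and text.startswith(" "):
                text = text[1:]
        if text:
            self.output.append(text)
            self._last_ended_with_space = text[-1] in " \t\n\f\r"


parser = WhitespaceCollapsingParser()
parser.feed("<p>  Hello,\n   <b> big </b>  world</p>")
parser.close()
print("".join(parser.output))  # Hello, big world
```

Starting with the flag set drops whitespace at the very beginning of the input. Block-level tags such as <p> or <br> would presumably also need to reset the flag in a real implementation; that part is left out of this sketch.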
Note that e.g. <pre>...</pre> and <code>...</code> sequences are exceptions that need to be passed through unchanged.
Sub-Solution
The render_toc() method needs to be changed so that it renders the text and the page number of each entry as separate [multi_]cell() calls, each in its own well-defined horizontal position.
Bonus points for
Anyone up for a cleanup task like this?
PS: The TOC could be fixed independently of the general whitespace issue, but if so, the TOC fix must happen first, since collapsing whitespace would break its current space-padded layout.
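For the render_toc() change, a font-agnostic way to position each entry is to measure the page number, place it flush against the right margin, and give the label the remaining width. The sketch below only computes the cell positions: TocEntry and layout_toc_line() are hypothetical names, and string_width stands in for a measuring function such as fpdf2's FPDF.get_string_width(). The results could then be fed to set_x() and cell()/multi_cell() calls.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class TocEntry:
    """Hypothetical TOC entry: a label, its page number and a nesting level."""

    label: str
    page: int
    level: int = 0


def layout_toc_line(
    entry: TocEntry,
    string_width: Callable[[str], float],
    left_margin: float,
    content_width: float,
    indent_per_level: float = 5.0,
) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Return ((label_x, label_w), (page_x, page_w)) for one TOC line.

    The page number gets a cell exactly as wide as its text, ending at the
    right margin; the label gets the remaining width after its indent.
    No space characters are used for positioning.
    """
    page_text = str(entry.page)
    page_w = string_width(page_text)
    page_x = left_margin + content_width - page_w
    label_x = left_margin + entry.level * indent_per_level
    # Keep a one-space gap so a long label never touches the page number.
    gap = string_width(" ")
    label_w = max(0.0, page_x - gap - label_x)
    return (label_x, label_w), (page_x, page_w)


# Toy measuring function for illustration only; with fpdf2 one would pass
# the document's get_string_width method instead.
def toy_width(s: str) -> float:
    return 2.0 * len(s)


for e in [TocEntry("Introduction", 3), TocEntry("Details", 12, level=1)]:
    print(e.label, layout_toc_line(e, toy_width, 10.0, 190.0))
```

Because nothing is padded with spaces, every page number ends exactly at the right margin whatever the font, and the labels keep their own column.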