[#60] Fix html anchors #61

gitsandi · 2023-01-02T18:48:54Z

For the very first rst file toc entry, Sphinx doesn't add an anchor, but instead uses file's title.
Because of that, some links in Table of contents of PDF document are not correct, resulting in wrong page numbers.
Fix replaces all such anchors with the correct ones, after the creation of HTML document.

For the very first rst file toc entry, Sphinx doesn't add an anchor, but instead uses file's title. Because of that, some links in Table of contents of PDF document are not correct, resulting in wrong page numbers. Fix replaces all such anchors with the correct ones, after the creation of HTML document.

danwos · 2023-01-02T19:20:17Z

Thanks for the PR. 👍
Looks really clean and compact :)

Can you mention the fix in docs/changelog.rst?
Feel also free to add yourself to the AUTHORS file.

kreuzberger · 2023-07-27T11:27:48Z

Currently i ran into problems with this fix due to Chapters/Sections containing ReST substitutions.
Eg.

===========================
|doc_type_name| |project|
============================

Will lead to ids "doc-type-name-project", but the "parsing" of the index html leads to "substituted" text. E.g. link to "Help TrafficLight" if doc_type_name is substitued by "Help" etc.

I think the linking is no problem, the problem is that the id generated and refered is the id of a span element and the heading is on a new page due to the page-break-before css property in sphinx-simplepdf main.css

So using substitutions in headers should be usable cause they are a sphinx feature and the toctree fix should not be applied, maybe the css should be improved.

kreuzberger · 2023-07-27T12:41:14Z

The toctree fix also breaks documents / ReST files with "sections" not made in pure ReST, e.g if the section comes from "custom directives".

E.g. in a test procedure document, i add sections via my own directives.

The document itself has no title / section in ReST. So the first title tocfix does generate a an invalid link

kreuzberger · 2023-07-28T05:55:04Z

During investigation of this issue i found it somewhere difficult to handle properly the page-breaks for h1/h2 for all purposes (cover yes/no, sidebar yes/no) for the first occuring headings of this levels in the body.

Therefore i would suggest to enumerate them to handle the page-breaks properly in the css.

diff --git a/sphinx_simplepdf/builders/simplepdf.py b/sphinx_simplepdf/builders/simplepdf.py
index 7ac4427..d4502d5 100644
--- a/sphinx_simplepdf/builders/simplepdf.py
+++ b/sphinx_simplepdf/builders/simplepdf.py
@@ -7,8 +7,6 @@ import weasyprint
 import sass
 
 from bs4 import BeautifulSoup
-from docutils.nodes import make_id
-
 
 from sphinx import __version__
 from sphinx.application import Sphinx
@@ -170,8 +168,16 @@ class SimplePdfBuilder(SingleFileHTMLBuilder):
             links = sidebar.find_all("a", class_="reference internal")
             for link in links:
                 link["href"] = link["href"].replace(f"{self.app.config.root_doc}.html", "")
-                if link["href"].startswith("#document-"):
-                    link["href"] = "#" + make_id(link.text)
+
+        for heading_tag in ['h1', 'h2']:
+            logger.debug(f"search heading {heading_tag}")
+            headings = soup.find_all(heading_tag,  class_="")
+            number = 0
+            for heading in headings:
+                logger.debug(f"found heading {heading.attrs}")
+                if not heading.has_attr("id"):
+                    heading.attrs["id"]=f"{heading_tag}-{number}"
+                    number = number + 1
 
         return soup.prettify(formatter="html")

This would ensure to properly identify the headings in css and handle the prage-breaks properly

As an example my custom css for page page handling

/*break before body after toc to ensure toc page fix */
div.body {
    break-before: always;
}

/* do not repeat title in body, already in cover */
div.body h1{
  display: none;
}

/*no additional page breaks for first h1 in body */
#h1-0 {
    page-break-before: avoid;
    break-before:avoid;
}

/*no additional page breaks for first h2 in body */
#h2-0 {
    page-break-before: avoid;
    break-before:avoid;
}

css selectors like h2:first-of-type and others (nth-child) didnt work for me.

Maybe the handling could implemented in an other ways (class instead of id, stop enumeration after first element etc).

danwos · 2023-07-28T06:07:11Z

Can you create a new issue for this?
It's hard to follow a discussion in an already merged PR :)

I'm not sure if I like the idea of enumerating the headings, as it is not a fixed, reproducible ID.
If a new heading gets added, all the following numbers will change and the CSS is no longer valid.

But I don't have a good idea for a different solution.
Maybe a hash value could be calculated based on the heading text and then set as ID.
But this is also not really a fixed ID.

[useblocks#60] Update changelog.rst

16f1b7b

danwos merged commit 5cefc5e into useblocks:main Jan 2, 2023

gitsandi deleted the fix-html-anchors branch January 2, 2023 22:48

gitsandi restored the fix-html-anchors branch January 2, 2023 22:49

gitsandi deleted the fix-html-anchors branch January 2, 2023 22:49

kreuzberger mentioned this pull request Jul 28, 2023

TOC Tree Fix and Page Break Handling for document handling #82

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[#60] Fix html anchors #61

[#60] Fix html anchors #61

gitsandi commented Jan 2, 2023

danwos commented Jan 2, 2023

kreuzberger commented Jul 27, 2023 •

edited

Loading

kreuzberger commented Jul 27, 2023

kreuzberger commented Jul 28, 2023

danwos commented Jul 28, 2023 •

edited

Loading

[#60] Fix html anchors #61

[#60] Fix html anchors #61

Conversation

gitsandi commented Jan 2, 2023

danwos commented Jan 2, 2023

kreuzberger commented Jul 27, 2023 • edited Loading

kreuzberger commented Jul 27, 2023

kreuzberger commented Jul 28, 2023

danwos commented Jul 28, 2023 • edited Loading

kreuzberger commented Jul 27, 2023 •

edited

Loading

danwos commented Jul 28, 2023 •

edited

Loading