Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#60] Fix html anchors #61

Merged
merged 2 commits into from
Jan 2, 2023
Merged

[#60] Fix html anchors #61

merged 2 commits into from
Jan 2, 2023

Conversation

gitsandi
Copy link
Contributor

@gitsandi gitsandi commented Jan 2, 2023

For the very first rst file toc entry, Sphinx doesn't add an anchor, but instead uses file's title.
Because of that, some links in Table of contents of PDF document are not correct, resulting in wrong page numbers.
Fix replaces all such anchors with the correct ones, after the creation of HTML document.

For the very first rst file toc entry, Sphinx doesn't add an anchor,
but instead uses file's title.
Because of that, some links in Table of contents of PDF document
are not correct, resulting in wrong page numbers.
Fix replaces all such anchors with the correct ones, after the
creation of HTML document.
@danwos
Copy link
Member

danwos commented Jan 2, 2023

Thanks for the PR. 👍
Looks really clean and compact :)

Can you mention the fix in docs/changelog.rst?
Feel also free to add yourself to the AUTHORS file.

@danwos danwos merged commit 5cefc5e into useblocks:main Jan 2, 2023
@gitsandi gitsandi deleted the fix-html-anchors branch January 2, 2023 22:48
@gitsandi gitsandi restored the fix-html-anchors branch January 2, 2023 22:49
@gitsandi gitsandi deleted the fix-html-anchors branch January 2, 2023 22:49
@kreuzberger
Copy link
Contributor

kreuzberger commented Jul 27, 2023

Currently i ran into problems with this fix due to Chapters/Sections containing ReST substitutions.
Eg.

===========================
|doc_type_name| |project|
============================

Will lead to ids "doc-type-name-project", but the "parsing" of the index html leads to "substituted" text. E.g. link to "Help TrafficLight" if doc_type_name is substitued by "Help" etc.

I think the linking is no problem, the problem is that the id generated and refered is the id of a span element and the heading is on a new page due to the page-break-before css property in sphinx-simplepdf main.css

So using substitutions in headers should be usable cause they are a sphinx feature and the toctree fix should not be applied, maybe the css should be improved.

@kreuzberger
Copy link
Contributor

The toctree fix also breaks documents / ReST files with "sections" not made in pure ReST, e.g if the section comes from "custom directives".

E.g. in a test procedure document, i add sections via my own directives.

The document itself has no title / section in ReST. So the first title tocfix does generate a an invalid link

@kreuzberger
Copy link
Contributor

During investigation of this issue i found it somewhere difficult to handle properly the page-breaks for h1/h2 for all purposes (cover yes/no, sidebar yes/no) for the first occuring headings of this levels in the body.

Therefore i would suggest to enumerate them to handle the page-breaks properly in the css.

diff --git a/sphinx_simplepdf/builders/simplepdf.py b/sphinx_simplepdf/builders/simplepdf.py
index 7ac4427..d4502d5 100644
--- a/sphinx_simplepdf/builders/simplepdf.py
+++ b/sphinx_simplepdf/builders/simplepdf.py
@@ -7,8 +7,6 @@ import weasyprint
 import sass
 
 from bs4 import BeautifulSoup
-from docutils.nodes import make_id
-
 
 from sphinx import __version__
 from sphinx.application import Sphinx
@@ -170,8 +168,16 @@ class SimplePdfBuilder(SingleFileHTMLBuilder):
             links = sidebar.find_all("a", class_="reference internal")
             for link in links:
                 link["href"] = link["href"].replace(f"{self.app.config.root_doc}.html", "")
-                if link["href"].startswith("#document-"):
-                    link["href"] = "#" + make_id(link.text)
+
+        for heading_tag in ['h1', 'h2']:
+            logger.debug(f"search heading {heading_tag}")
+            headings = soup.find_all(heading_tag,  class_="")
+            number = 0
+            for heading in headings:
+                logger.debug(f"found heading {heading.attrs}")
+                if not heading.has_attr("id"):
+                    heading.attrs["id"]=f"{heading_tag}-{number}"
+                    number = number + 1
 
         return soup.prettify(formatter="html")

This would ensure to properly identify the headings in css and handle the prage-breaks properly

As an example my custom css for page page handling

/*break before body after toc to ensure toc page fix */
div.body {
    break-before: always;
}

/* do not repeat title in body, already in cover */
div.body h1{
  display: none;
}

/*no additional page breaks for first h1 in body */
#h1-0 {
    page-break-before: avoid;
    break-before:avoid;
}

/*no additional page breaks for first h2 in body */
#h2-0 {
    page-break-before: avoid;
    break-before:avoid;
}

css selectors like h2:first-of-type and others (nth-child) didnt work for me.

Maybe the handling could implemented in an other ways (class instead of id, stop enumeration after first element etc).

@danwos
Copy link
Member

danwos commented Jul 28, 2023

Can you create a new issue for this?
It's hard to follow a discussion in an already merged PR :)

I'm not sure if I like the idea of enumerating the headings, as it is not a fixed, reproducible ID.
If a new heading gets added, all the following numbers will change and the CSS is no longer valid.

But I don't have a good idea for a different solution.
Maybe a hash value could be calculated based on the heading text and then set as ID.
But this is also not really a fixed ID.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants