TOC Tree Fix and Page Break Handling for document handling #82

Closed
kreuzberger opened this issue Jul 28, 2023 · 7 comments

Comments

@kreuzberger
Contributor

The fix for issue #61 breaks documents under either of the following conditions:

  • The heading in the rst file contains substitutions.
  • The heading in the rst file comes from a directive.

In short, the fix breaks every document where anchor and heading text do not match.

The intention of the fix was to solve the problem that the first toctree entry points to the file itself, not to the first section in the file.
From Sphinx's point of view this is fine, because a file is not required to have a heading but should still appear in the TOC.

In my opinion the fix could also be done by breaking the document after the body element or at least at the toctree-wrapper div.

However, under some circumstances (cover page yes/no, sidebar yes/no) this could lead to additional blank pages due to the h1/h2 page breaks (from simplepdf's main.css).

Therefore the following solution works for me:

  1. Break the document BEFORE the first heading, e.g. in <div class="body">.
  2. Disable page breaks on the first occurrence of h1/h2 in the document body.

For step 2 I was not able to identify the first h1/h2 properly, e.g. with h2:first-of-type or other selectors.
Therefore I suggest adding a unique id to the first h1/h2 elements in the body.

This only handles the document page breaks after the TOC/cover.

Further h1/h2 elements are not, and should not be, covered. Page breaking for those should be done generally (as in main.css) or individually by adding page breaks in the rst document itself.

@kreuzberger
Contributor Author

kreuzberger commented Jul 28, 2023

The following diff shows the enumeration of the headings and their use in the CSS.
It is not necessary to enumerate ALL headings; in my opinion the first one should be sufficient.
Alternatively, maybe somebody has a working CSS solution with selectors that properly select the first element in the body with weasyprint.

diff --git a/sphinx_simplepdf/builders/simplepdf.py b/sphinx_simplepdf/builders/simplepdf.py
index 7ac4427..2ff5c8f 100644
--- a/sphinx_simplepdf/builders/simplepdf.py
+++ b/sphinx_simplepdf/builders/simplepdf.py
@@ -7,8 +7,6 @@ import weasyprint
 import sass
 
 from bs4 import BeautifulSoup
-from docutils.nodes import make_id
-
 
 from sphinx import __version__
 from sphinx.application import Sphinx
@@ -170,8 +168,13 @@ class SimplePdfBuilder(SingleFileHTMLBuilder):
             links = sidebar.find_all("a", class_="reference internal")
             for link in links:
                 link["href"] = link["href"].replace(f"{self.app.config.root_doc}.html", "")
-                if link["href"].startswith("#document-"):
-                    link["href"] = "#" + make_id(link.text)
+
+        for heading_tag in ['h1', 'h2']:
+            logger.debug(f"search heading {heading_tag}")
+            heading = soup.find(heading_tag,  class_="")
+            logger.debug(f"found heading {heading.attrs}")
+            if not heading.has_attr("id"):
+                heading.attrs["id"]=f"{heading_tag}-0"
 
         return soup.prettify(formatter="html")

This would make it possible to identify the headings in CSS and handle the page breaks properly.
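
One caveat with the loop in the diff: soup.find returns None when a document contains no matching h1 or h2, and the following heading.attrs access would then raise. A minimal, None-safe sketch of the same idea is shown here. The function name tag_first_headings and the restriction of the search to div.body are assumptions made for illustration; this is not the code that was merged.

```python
from bs4 import BeautifulSoup


def tag_first_headings(html: str, heading_tags=("h1", "h2")) -> str:
    """Give the first h1/h2 a generic id such as "h1-0" (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")

    # Assumption: the content lives in <div class="body">, as in the CSS example.
    # Fall back to the whole document when that container is missing.
    body = soup.find("div", class_="body")
    if body is None:
        body = soup

    for heading_tag in heading_tags:
        heading = body.find(heading_tag)
        if heading is None:
            # The document has no heading of this level; nothing to tag.
            continue
        if not heading.has_attr("id"):
            heading["id"] = f"{heading_tag}-0"

    return str(soup)
```

Only the first heading of each level receives an id, so rules such as #h1-0 and #h2-0 would keep working without depending on the heading text.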

As an example, my custom CSS for page-break handling:

/*break before body after toc to ensure toc page fix */
div.body {
    break-before: always;
}

/* do not repeat title in body, already in cover */
div.body h1{
  display: none;
}

/*no additional page breaks for first h1 in body */
#h1-0 {
    page-break-before: avoid;
    break-before:avoid;
}

/*no additional page breaks for first h2 in body */
#h2-0 {
    page-break-before: avoid;
    break-before:avoid;
}

kreuzberger added a commit to procitec/sphinx-simplepdf that referenced this issue Jul 28, 2023
@kreuzberger
Contributor Author

kreuzberger commented Jul 28, 2023

Reply to @danwos's comment in #61 (comment), to have everything in this new issue.

The main issue for me is to have identical layouts in CSS files of the same type, i.e. for all of my "Online" help files (help.css) or my specification documents (specification.css). I would not like to rely on the content, and therefore a generic id for the first heading in the body would help me to address this correctly.

If I required individual, content-based identifiers, I would use CSS selectors like section.ids > h2 or something like that. The sections have content-dependent ids.

As I stated above, this could maybe also be done via other methods, but all of my attempts with the selectors
first-of-type, nth, nth-child on h2, or nested with the divs and sections, were unsuccessful.

What matters in any case is to revert #61, because it breaks the TOCs and causes weasyprint warnings due to unresolved anchors. But reverting without any other changes would bring back the original problem.
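
To see which TOC links are affected, the unresolved anchors can be listed from the generated single-file HTML before weasyprint runs. The following is a rough sketch using only the standard library. The names AnchorCollector and unresolved_anchors are made up for this example, and it only considers id attributes and href values starting with "#".

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect all element ids and all in-document link targets."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.fragments = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id"):
            self.ids.add(attributes["id"])
        href = attributes.get("href") or ""
        # Only links of the form "#something" point into the same document.
        if tag == "a" and href.startswith("#") and len(href) > 1:
            self.fragments.append(href[1:])


def unresolved_anchors(html: str) -> list[str]:
    """Return link fragments that match no id in the document."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return [f for f in collector.fragments if f not in collector.ids]
```

A link rewritten from the visible heading text shows up in the result as soon as the section id was generated from something else, which is the situation described for substitutions and directives.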

@danwos
Member

danwos commented Jul 28, 2023

Good point.
Maybe instead of using the id, we could set classes for:

  • first
  • last
  • even/odd
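
A possible way to derive such marker classes for the n-th heading of a level is sketched here. The helper name heading_marker_classes is hypothetical, and counting from 1 (so that the first heading is "odd", as with CSS :nth-child) is an assumption, not necessarily what the builder ended up doing.

```python
def heading_marker_classes(index: int, total: int) -> list[str]:
    """Return marker classes for the heading at 0-based position `index`.

    `total` is the number of headings of the same level in the document.
    """
    classes = []
    if index == 0:
        classes.append("first")
    # Assumption: count from 1, so the first heading is "odd".
    classes.append("odd" if (index + 1) % 2 else "even")
    if index == total - 1:
        classes.append("last")
    return classes


# Example: three h2 headings in a document.
for position in range(3):
    print(position, heading_marker_classes(position, 3))
```

In the builder, the returned list could be appended to each heading's existing class list, and the stylesheet could then target something like h2.first instead of a content-dependent id.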

@kreuzberger
Contributor Author

Id or classes, I don't mind; I am no HTML expert who could start a discussion about which is better or semantically correct.

Headings with even/odd may help for two-page layouts. So OK, I will implement it as classes.

kreuzberger added a commit to procitec/sphinx-simplepdf that referenced this issue Jul 28, 2023
kreuzberger added a commit to procitec/sphinx-simplepdf that referenced this issue Jul 28, 2023
kreuzberger added a commit to procitec/sphinx-simplepdf that referenced this issue Jul 28, 2023
kreuzberger added a commit to procitec/sphinx-simplepdf that referenced this issue Jul 28, 2023
kreuzberger added a commit to procitec/sphinx-simplepdf that referenced this issue Jul 28, 2023
danwos pushed a commit that referenced this issue Aug 14, 2023
* [#82] add identifier for first h1/h2 headings

* [#82] use class attribute instead of id, at class for first, even, odd, last

* [#82] change page break behaviour in css for fix of #60
@kreuzberger
Contributor Author

While merging those several PRs something went wrong in conflict resolution. Two lines from the original fix are missing. I will open a new PR to include them again.

@kreuzberger
Contributor Author

The related last commit is 5991ed4.

The issue could now be closed. Possible improvements/wishes: extend this to more than the h1 and h2 levels. Maybe this should not be hardcoded, but configurable or just applied to ALL headings. Waiting for a feature request 😄

@danwos
Member

danwos commented Aug 24, 2023

OK, I'll close this issue. Feel free to reopen if feature requests show up :)
And thanks for the implementation.

@danwos danwos closed this as completed Aug 24, 2023