ebook fix: remove stray markup from the text #172

Merged
merged 1 commit into from
Aug 27, 2024

Conversation

jeremyschlatter

Hello! Thanks for putting so much work into formatting HPMOR nicely. I'm re-reading it in ebook format in Apple Books right now and really enjoying it.

I noticed some stray markup in chapter 23:

[Image: Screenshot 2024-08-26 at 7 53 19 PM]

"plus .5minus 1" clearly doesn't belong there. I looked into it and you can see the source of this markup in the original latex:

And Harry brought out the original parchment with the hypotheses, and began scribbling.
\vskip 1\baselineskip plus .5\textheight minus 1\baselineskip
\savetrivseps
\setlength{\topsep}{0pt}
\setlength{\partopsep}{0pt}
\begin{centering}
\begin{samepage}
\scshape Observation:
\itshape Wizardry isn’t as powerful now as it was when Hogwarts was founded.
\end{samepage}
\vskip 1\baselineskip plus .5\textheight minus 1\baselineskip

The problem happens in step_3.py, which tries to remove this markup completely:

# \vskip 1\baselineskip plus .5\textheight minus 1\baselineskip
cont = re.sub(r"\\vskip .*?\\baselineskip", "", cont)

But because the regex uses a minimal quantifier instead of a greedy one, it only matches "\vskip 1\baselineskip", stopping at the first "\baselineskip" instead of the one at the end of the line. This leaves the errant bit of markup in the text:

plus .5\textheight minus 1\baselineskip

Because this leftover does not start with a backslash, it ends up inserted into the content.

The fix is simple: remove the '?', turning the ".*" into a greedy match instead of a minimal match. This matches to the end of the last "\baselineskip", completely removing this bit of markup from the text as intended.
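
To make the difference concrete, here is a minimal sketch that runs both patterns against the line from chapter 23 quoted above. The two patterns and the input line come from this PR; the variable names are only for illustration and are not taken from step_3.py.

```python
import re

# Line copied from the LaTeX source quoted above.
line = r"\vskip 1\baselineskip plus .5\textheight minus 1\baselineskip"

# Pattern before the fix: lazy `.*?` stops at the first \baselineskip.
LAZY = r"\\vskip .*?\\baselineskip"
# Pattern after the fix: greedy `.*` runs to the last \baselineskip on the line.
GREEDY = r"\\vskip .*\\baselineskip"

lazy_result = re.sub(LAZY, "", line)
greedy_result = re.sub(GREEDY, "", line)

print(repr(lazy_result))    # ' plus .5\\textheight minus 1\\baselineskip'
print(repr(greedy_result))  # ''
```

Without the `re.DOTALL` flag, `.` does not match a newline, so the greedy version still stops at the end of the line it started on.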


Note: I only made this change because of the issue in chapter 23, but the same pattern also comes up in a few other places. Here's the full diff of tmp/hpmor-epub-3-flatten-mod.tex with and without the change (lines marked `<` are the new output, lines marked `>` are the old output). I think these are all good changes, but I'm not sure: I'm not that familiar with the markup, and I haven't figured out how to fully render the ebook yet to check how it looks:

# diff new/hpmor-epub-3-flatten-mod.tex old/hpmor-epub-3-flatten-mod.tex
164c164
< \newenvironment{writtenNote}{\fontspec[ExternalLocation]{Graphe_Alpha_alt.ttf}\scriptsize \renewcommand{\emph}{\uline}}\itshape }
---
> \newenvironment{writtenNote}{\fontspec[ExternalLocation]{Graphe_Alpha_alt.ttf}\scriptsize \renewcommand{\emph}{\uline} plus .1\baselineskip minus .1\baselineskip \begin{adjustwidth}{\parindent}{\parindent}\par\setlength{\parindent}{0pt}\setlength{\parskip}{\baselineskip}\itshape }
166c166
< \end{adjustwidth} }
---
> \end{adjustwidth} plus 1\baselineskip minus 1\baselineskip }
9255c9255
<
---
>  plus .5\textheight minus 1\baselineskip
9268c9268
<
---
>  plus .5\textheight minus 1\baselineskip
9287c9287
<
---
>  plus .5\textheight minus 1\baselineskip
9306c9306
<
---
>  plus .5\textheight minus 1\baselineskip
23525c23525
< \newcommand{\OmakeIVspecialsection}[2][1.5]{\vspace*{2\baselineskip plus 1\baselineskip minus 1\baselineskip}\noindent\hfill\scalebox{#1}{#2}\hfill\mbox{} \@afterindentfalse\@afterheading
---
> \newcommand{\OmakeIVspecialsection}[2][1.5]{\vspace*{2\baselineskip plus 1\baselineskip minus 1\baselineskip}\noindent\hfill\scalebox{#1}{#2}\hfill\mbox{} plus 1\baselineskip \@afterindentfalse\@afterheading

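For reviewing hunks like the ones in the diff above, a small helper could list every line where the lazy and greedy patterns give different results. This is only a sketch: `changed_lines` and the sample text are hypothetical and not part of the repository, and it assumes the substitution is applied without `re.DOTALL`, as in the `re.sub` call quoted earlier.

```python
import re

LAZY = re.compile(r"\\vskip .*?\\baselineskip")   # pattern before the fix
GREEDY = re.compile(r"\\vskip .*\\baselineskip")  # pattern after the fix


def changed_lines(text):
    """Return (line_number, lazy_result, greedy_result) for each line
    where the two patterns produce different output."""
    changes = []
    for number, line in enumerate(text.splitlines(), start=1):
        lazy = LAZY.sub("", line)
        greedy = GREEDY.sub("", line)
        if lazy != greedy:
            changes.append((number, lazy, greedy))
    return changes


# Hypothetical sample input; only the second line is taken from the PR text.
sample = "\n".join([
    r"and began scribbling.",
    r"\vskip 1\baselineskip plus .5\textheight minus 1\baselineskip",
    r"\vskip 1\baselineskip",
])

for number, lazy, greedy in changed_lines(sample):
    print(number, repr(lazy), "->", repr(greedy))
```

Because the greedy pattern removes everything up to the last `\baselineskip` on a line, it is worth checking on longer lines what sits between the `\vskip` and that final `\baselineskip`.
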
Before this commit, chapter 23 of the ebook had some spurious markup
that found its way into the readable text:

    And Harry brought out the original parchment with the hypotheses,
    and began scribbling.

    plus .5minus 1

    Observation:

    Wizardry isn’t as powerful now as it was when Hogwarts was founded.

    plus .5minus 1

You can see the source of this markup in the comment immediately above
the line changed in this commit:

    # \vskip 1\baselineskip plus .5\textheight minus 1\baselineskip

The problem is this regex:

    "\\vskip .*?\\baselineskip"

In the example above, it matches `"\vskip 1\baselineskip"`, stopping at
the first `"\baselineskip"` instead of the one at the end of the line.
This leaves the errant bit of markup in the text:

    plus .5\textheight minus 1\baselineskip

Because this leftover does not start with a backslash, it ends up
inserted into the content.

The fix is simple: remove the `'?'`, turning the `".*"` into a greedy
match instead of a minimal match. This matches to the end of the last
`"\baselineskip"`, completely removing this bit of markup from the text
as intended.

entorb (Collaborator) commented Aug 27, 2024

Dear @jeremyschlatter, thank you very much for the fix of yet another Ubuntu 24.04 update leftover! Looks great, will merge now.

PS: The easiest way to create the eBook is via Docker, see comments at the end of https://github.com/rrthomas/hpmor/blob/main/Dockerfile .

entorb merged commit 9c0b22c into rrthomas:main Aug 27, 2024
jeremyschlatter deleted the fix-vskip branch August 27, 2024 20:26