DocBook reader misses some character entities like &dollar;
#7938
Comments
I think |
Right, |
@tarleb wrote:
It is actually defined, in file: https://docbook.org/xml/4.2/ent/iso-num.ent Is the DTD given in the xml file ignored by pandoc? |
Oh, I missed that. Yes, I think we currently ignore the DTD. |
Ok, thanks @tarleb, for the investigation. Then I suppose the short-term solution is to work around this by search-and-replace... |
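A search-and-replace workaround along these lines could be sketched as below. This is only an illustration in Python, not anything pandoc ships: the function name `replace_named_entities` and the small `ISO_NUM_ENTITIES` table are hypothetical, and the table covers just a handful of the iso-num entities. It rewrites known named entities to numeric character references, which an XML parser understands without reading the DTD, and leaves the five predefined XML entities and any unknown names alone.

```python
import re

# Hypothetical, deliberately incomplete subset of the ISO "num" entity set.
ISO_NUM_ENTITIES = {
    "dollar": "$",
    "percnt": "%",
    "commat": "@",
    "lsqb": "[",
    "rsqb": "]",
    "half": "\u00bd",
}

# The predefined XML entities are already understood by every XML parser,
# so they are never rewritten.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named entity references such as &dollar; (not numeric ones like &#36;).
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_named_entities(xml_text, table=ISO_NUM_ENTITIES):
    """Rewrite known named entities as hexadecimal character references."""

    def substitute(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in table:
            # Leave predefined and unknown entities exactly as written.
            return match.group(0)
        return "&#x{:X};".format(ord(table[name]))

    return ENTITY_RE.sub(substitute, xml_text)
```

The rewritten text can then be passed to pandoc as usual. Note that this plain regex pass does not distinguish CDATA sections or comments, where an entity-like string is literal text, so it is a stopgap rather than a proper fix.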
We can hard-code support for these entities for DocBook:
<!-- iso-num.ent (initially distributed with DocBook XML DTD V4.1.1beta1) -->
<!-- Derived from the corresponding ISO 8879 standard entity set
and the Unicode character mappings provided by Sebastian Rahtz -->
<!ENTITY half "½"> <!-- VULGAR FRACTION ONE HALF -->
<!ENTITY frac12 "½"> <!-- VULGAR FRACTION ONE HALF -->
<!ENTITY frac14 "¼"> <!-- VULGAR FRACTION ONE QUARTER -->
<!ENTITY frac34 "¾"> <!-- VULGAR FRACTION THREE QUARTERS -->
<!ENTITY frac18 "⅛"> <!-- -->
<!ENTITY frac38 "⅜"> <!-- -->
<!ENTITY frac58 "⅝"> <!-- -->
<!ENTITY frac78 "⅞"> <!-- -->
<!ENTITY sup1 "¹"> <!-- SUPERSCRIPT ONE -->
<!ENTITY sup2 "²"> <!-- SUPERSCRIPT TWO -->
<!ENTITY sup3 "³"> <!-- SUPERSCRIPT THREE -->
<!ENTITY plus "+"> <!-- PLUS SIGN -->
<!ENTITY plusmn "±"> <!-- PLUS-MINUS SIGN -->
<!ENTITY lt "&#60;"> <!-- LESS-THAN SIGN -->
<!ENTITY equals "="> <!-- EQUALS SIGN -->
<!ENTITY gt ">"> <!-- GREATER-THAN SIGN -->
<!ENTITY divide "÷"> <!-- DIVISION SIGN -->
<!ENTITY times "×"> <!-- MULTIPLICATION SIGN -->
<!ENTITY curren "¤"> <!-- CURRENCY SIGN -->
<!ENTITY pound "£"> <!-- POUND SIGN -->
<!ENTITY dollar "$"> <!-- DOLLAR SIGN -->
<!ENTITY cent "¢"> <!-- CENT SIGN -->
<!ENTITY yen "¥"> <!-- YEN SIGN -->
<!ENTITY num "#"> <!-- NUMBER SIGN -->
<!ENTITY percnt "%"> <!-- PERCENT SIGN -->
<!ENTITY amp "&#38;"> <!-- AMPERSAND -->
<!ENTITY ast "*"> <!-- ASTERISK OPERATOR -->
<!ENTITY commat "@"> <!-- COMMERCIAL AT -->
<!ENTITY lsqb "["> <!-- LEFT SQUARE BRACKET -->
<!ENTITY bsol "\"> <!-- REVERSE SOLIDUS -->
<!ENTITY rsqb "]"> <!-- RIGHT SQUARE BRACKET -->
<!ENTITY lcub "{"> <!-- LEFT CURLY BRACKET -->
<!ENTITY horbar "―"> <!-- HORIZONTAL BAR -->
<!ENTITY verbar "|"> <!-- VERTICAL LINE -->
<!ENTITY rcub "}"> <!-- RIGHT CURLY BRACKET -->
<!ENTITY micro "µ"> <!-- MICRO SIGN -->
<!ENTITY ohm "Ω"> <!-- OHM SIGN -->
<!ENTITY deg "°"> <!-- DEGREE SIGN -->
<!ENTITY ordm "º"> <!-- MASCULINE ORDINAL INDICATOR -->
<!ENTITY ordf "ª"> <!-- FEMININE ORDINAL INDICATOR -->
<!ENTITY sect "§"> <!-- SECTION SIGN -->
<!ENTITY para "¶"> <!-- PILCROW SIGN -->
<!ENTITY middot "·"> <!-- MIDDLE DOT -->
<!ENTITY larr "←"> <!-- LEFTWARDS DOUBLE ARROW -->
<!ENTITY rarr "→"> <!-- RIGHTWARDS DOUBLE ARROW -->
<!ENTITY uarr "↑"> <!-- UPWARDS ARROW -->
<!ENTITY darr "↓"> <!-- DOWNWARDS ARROW -->
<!ENTITY copy "©"> <!-- COPYRIGHT SIGN -->
<!ENTITY reg "®"> <!-- REG TRADE MARK SIGN -->
<!ENTITY trade "™"> <!-- TRADE MARK SIGN -->
<!ENTITY brvbar "¦"> <!-- BROKEN BAR -->
<!ENTITY not "¬"> <!-- NOT SIGN -->
<!ENTITY sung "♩"> <!-- -->
<!ENTITY excl "!"> <!-- EXCLAMATION MARK -->
<!ENTITY iexcl "¡"> <!-- INVERTED EXCLAMATION MARK -->
<!ENTITY quot """> <!-- QUOTATION MARK -->
<!ENTITY apos "'"> <!-- APOSTROPHE -->
<!ENTITY lpar "("> <!-- LEFT PARENTHESIS -->
<!ENTITY rpar ")"> <!-- RIGHT PARENTHESIS -->
<!ENTITY comma ","> <!-- COMMA -->
<!ENTITY lowbar "_"> <!-- LOW LINE -->
<!ENTITY hyphen "-"> <!-- HYPHEN-MINUS -->
<!ENTITY period "."> <!-- FULL STOP -->
<!ENTITY sol "/"> <!-- SOLIDUS -->
<!ENTITY colon ":"> <!-- COLON -->
<!ENTITY semi ";"> <!-- SEMICOLON -->
<!ENTITY quest "?"> <!-- QUESTION MARK -->
<!ENTITY iquest "¿"> <!-- INVERTED QUESTION MARK -->
<!ENTITY laquo "«"> <!-- LEFT-POINTING DOUBLE ANGLE QUOTATION MARK -->
<!ENTITY raquo "»"> <!-- RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK -->
<!ENTITY lsquo "‘"> <!-- -->
<!ENTITY rsquo "’"> <!-- RIGHT SINGLE QUOTATION MARK -->
<!ENTITY ldquo "“"> <!-- -->
<!ENTITY rdquo "”"> <!-- RIGHT DOUBLE QUOTATION MARK -->
<!ENTITY nbsp " "> <!-- NO-BREAK SPACE -->
<!ENTITY shy "­"> <!-- SOFT HYPHEN --> |
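A hard-coded table like the one listed above would not have to be typed by hand; it could be generated from the entity declarations. The following is a rough sketch in Python (not pandoc's Haskell), assuming each declaration has the simple form `<!ENTITY name "value">` and that values are either literal characters or numeric character references. The names `parse_entity_declarations` and `decode_char_refs` are made up for this example.

```python
import re

# Matches simple declarations of the form <!ENTITY name "value">.
ENTITY_DECL_RE = re.compile(r'<!ENTITY\s+(\w+)\s+"([^"]*)"\s*>')

# Matches decimal (&#60;) and hexadecimal (&#x3C;) character references.
CHAR_REF_RE = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def decode_char_refs(value):
    """Replace numeric character references in value with the characters they name."""

    def substitute(match):
        ref = match.group(1)
        code = int(ref[1:], 16) if ref.startswith("x") else int(ref)
        return chr(code)

    return CHAR_REF_RE.sub(substitute, value)


def parse_entity_declarations(ent_text):
    """Build a {name: replacement text} table from entity declarations."""
    table = {}
    for name, value in ENTITY_DECL_RE.findall(ent_text):
        table[name] = decode_char_refs(value)
    return table
```

Calling `parse_entity_declarations` on the text of an `.ent` file returns a dictionary keyed by entity name. Declarations whose value itself contains a double quote do not match this simple pattern and are skipped, so those entries would still need manual handling.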
Note to self: we need to use the conduit-xml option psDecodeEntities -- possibly adding a new function to T.P.XML.Light. |
From what I understand after reading a bit more on the topic: DocBook 5 only defines the standard XML entities (hence my original statement); DocBook 4 however contains all of these: https://www.w3.org/2003/entities/2007doc/byalpha.html |
Thank you mucho mucho! I confirm the problem to be fixed for |
The DocBook reader does not seem to understand some ampersand codes, like &dollar;.
Input file is: https://github.com/haskell/happy/blob/934763408f8df29180c63d7a2c69be0b84030967/doc/happy.xml
Adding option -s does not help.
MWE is: