html to fb2 omits <h2> elements #8123
Comments
@astanin - can you see what is happening here?
@jgm I'm not using this feature anymore but I suppose that the problem is how Wikipedia HTML is parsed. FB2 does not have an equivalent of the
FB2 Writer apparently assumes that a section is represented by a From what I can see from the
So it appears that
As a workaround, I would suggest clicking the page's Edit link in Wikipedia, copying the MediaWiki markup to a file, and converting that file instead of the rendered HTML. As a more permanent solution, the assumption about how a section is represented may have to be revised.
@astanin I don't think that's the heart of it. The FB2 writer has never expected the AST to be structured into sections. It starts by applying a function
It seems that this happens whenever the content is wrapped in a div and doesn't start with a header. Example:
::: wrapper
hello
# MISSING
section one
:::
Output:
<?xml version="1.0" encoding="utf-8"?>
<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0"
xmlns:l="http://www.w3.org/1999/xlink">
<description>
<title-info>
<genre>unrecognised</genre>
</title-info>
<document-info>
<program-used>pandoc</program-used>
</document-info>
</description>
<body>
<title>
<p />
</title>
<section>
<p>hello</p>
<p>section one</p>
</section>
</body>
</FictionBook>
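To confirm the symptom without reading the XML by eye, one can list the section titles in the generated file. The following is only a sketch: `section_titles` is a hypothetical helper, not part of pandoc, and `OUTPUT` is a shortened copy of the output above.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the FictionBook root element shown above.
FB2_NS = "http://www.gribuser.ru/xml/fictionbook/2.0"

# Shortened copy of the output above (the <description> block is omitted).
OUTPUT = """<?xml version="1.0" encoding="utf-8"?>
<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0"
xmlns:l="http://www.w3.org/1999/xlink">
<body>
<title>
<p />
</title>
<section>
<p>hello</p>
<p>section one</p>
</section>
</body>
</FictionBook>
"""


def section_titles(fb2_text):
    """Hypothetical helper: text of each <title> that is a direct child of a <section>."""
    root = ET.fromstring(fb2_text.encode("utf-8"))
    titles = []
    for section in root.iter(f"{{{FB2_NS}}}section"):
        title = section.find(f"{{{FB2_NS}}}title")
        if title is not None:
            titles.append("".join(title.itertext()).strip())
    return titles


titles = section_titles(OUTPUT)
print(titles)
print("MISSING" in titles)
```

The only `<title>` in this output belongs to `<body>`, so no section title is found and the "MISSING" header is indeed absent.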
I think what's going on is that
Here's the result of putting a trace on the block structure produced by makeSections in the FB2 writer:
[ Div
( "" , [ "section" ] , [] )
[ Header 1 ( "" , [] , [] ) []
, Div
( "" , [ "wrapper" ] , [] )
[ Para [ Str "hello" ]
, Div
( "missing" , [ "section" ] , [] )
[ Header 1 ( "" , [] , [] ) [ Str "MISSING" ]
, Para [ Str "section" , Space , Str "one" ]
]
]
]
]
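A toy model may help show how a structure like this can lose the inner header. The sketch below is not the FB2 writer's code: the tuples are a simplified stand-in for the pandoc AST, and `render_top_level_only` is a hypothetical renderer that assumes only the outermost section Div is treated as a section while everything below it is flattened to paragraphs.

```python
# Simplified stand-in for the pandoc AST (plain tuples, not pandoc's types):
#   ("Div", ident, classes, children), ("Header", level, text), ("Para", text)
TRACE = (
    "Div", "", ["section"], [
        ("Header", 1, ""),
        ("Div", "", ["wrapper"], [
            ("Para", "hello"),
            ("Div", "missing", ["section"], [
                ("Header", 1, "MISSING"),
                ("Para", "section one"),
            ]),
        ]),
    ],
)


def flatten_paragraphs(block):
    """Assumed body renderer: keeps paragraph text, never looks for sections."""
    kind = block[0]
    if kind == "Para":
        return [block[1]]
    if kind == "Div":
        out = []
        for child in block[3]:
            out.extend(flatten_paragraphs(child))
        return out
    return []  # a Header reached here contributes nothing


def render_top_level_only(block):
    """Treat the outermost section Div as the only section (assumption)."""
    header, *body = block[3]
    paragraphs = []
    for child in body:
        paragraphs.extend(flatten_paragraphs(child))
    return {"title": header[2], "paragraphs": paragraphs}


result = render_top_level_only(TRACE)
print(result)  # {'title': '', 'paragraphs': ['hello', 'section one']}
```

Under these assumptions the result has an empty title and the two paragraphs, mirroring the reported output: the section nested inside the wrapper Div is never visited as a section, so its header disappears.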
This allows the writer to recurse into those Divs and find new sections inside them. See #8123.
Pushed a potential fix, but I don't know enough about FB2 to know if this is right.
<?xml version="1.0" encoding="UTF-8"?>
<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0" xmlns:l="http://www.w3.org/1999/xlink"><description><title-info><genre>unrecognised</genre></title-info><document-info><program-used>pandoc</program-used></document-info></description><body><title><p /></title><section><p>hello</p><section id="missing"><title><p>MISSING</p></title><p>section one</p></section></section></body></FictionBook>
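The idea of recursing into Divs to find sections inside them can be sketched on a toy model. This is an illustration only, not the code that was pushed: the tuples are a simplified stand-in for the pandoc AST, and `render` is a hypothetical function that assumes a section Div starts with its Header.

```python
from xml.sax.saxutils import escape

# Simplified stand-in for the pandoc AST (plain tuples, not pandoc's types):
#   ("Div", ident, classes, children), ("Header", level, text), ("Para", text)
TRACE = (
    "Div", "", ["section"], [
        ("Header", 1, ""),
        ("Div", "", ["wrapper"], [
            ("Para", "hello"),
            ("Div", "missing", ["section"], [
                ("Header", 1, "MISSING"),
                ("Para", "section one"),
            ]),
        ]),
    ],
)


def render(block):
    """Hypothetical renderer that recurses into every Div it meets."""
    kind = block[0]
    if kind == "Para":
        return "<p>" + escape(block[1]) + "</p>"
    if kind == "Div":
        _, ident, classes, children = block
        starts_with_header = bool(children) and children[0][0] == "Header"
        if "section" in classes and starts_with_header:
            header, *body = children
            # Omit the title when the header is empty, as in the output above.
            title = f"<title><p>{escape(header[2])}</p></title>" if header[2] else ""
            id_attr = f' id="{escape(ident)}"' if ident else ""
            inner = "".join(render(child) for child in body)
            return f"<section{id_attr}>{title}{inner}</section>"
        # Any other Div: keep descending so nested sections are still found.
        return "".join(render(child) for child in children)
    return ""  # stray Headers are not handled in this sketch


fixed = render(TRACE)
print(fixed)
```

On the traced structure this yields a title-less outer section containing "hello" and a nested `<section id="missing">` with its title, which has the same shape as the `<body>` content in the output above.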
Is FB2 okay with a section element without a title element?
The FB2 schema that I found says
Closing this, then.
Explain the problem.
Try some Wikipedia article:
and view the fb2 file:
The subheaders (e.g. "Rock-hewn castles") are missing.
Pandoc version?
2.17.1.1 on Manjaro (Arch) Linux
Side notes:
Another problem: image captions are missing, but since they don't follow semantic standards and are plain <div> text elements, Wikipedia is to blame.
Finally, it would be nice if <img> width and height attributes were respected, or maybe via CSS somehow?
Locally, I solved 1. and 2. via regex hacking and 3. with ImageMagick command-line tools before converting.