openxml raw inline/block without parsing? #6933
Comments
The problem is that the writer is building up a data structure that represents an XML document tree. We can't just insert a string into this -- only a node of an XML document -- an element, or some plain text.
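The constraint can be illustrated outside pandoc. The following is a minimal Python sketch, not the writer's actual Haskell code: `body` and `p` are hypothetical stand-ins for the document tree, and Python's `xml.etree.ElementTree` plays the role of the XML library. A string put into the tree becomes plain text and is escaped on output, and an unbalanced fragment cannot be turned into a node at all. Python's parser is strict and rejects such a fragment outright, whereas the behaviour reported in this issue is that tags get closed.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the writer's document tree (not pandoc's types).
body = ET.Element("body")
para = ET.SubElement(body, "p")

# A string can only enter the tree as plain text, so its markup is escaped.
para.text = "<tbl>"
serialized = ET.tostring(body, encoding="unicode")
print(serialized)  # <body><p>&lt;tbl&gt;</p></body>

# An unbalanced fragment is not a well-formed node, so it cannot be parsed.
try:
    ET.fromstring("<tbl><tr>")
    parsed_ok = True
except ET.ParseError:
    parsed_ok = False
print(parsed_ok)  # False
```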
To the contrary, the docx writer does take into account width information on table cells, emitting a
Okay, I thought it might be something like that so no real way around that constraint. I'll do some additional experimentation with the table columns. One issue may be that while the columns are sized correctly if
Just recalled that another issue I was trying to solve for in emitting raw openxml was a table which has distinct column sizing per-row (for more sophisticated figure layouts). With the new table AST this will be possible but until that's supported in the docx writer the best solution might be to use native pandoc column sizing and create a separate table per row of figures. Again, will report back after further experimentation.
It turns out that using a combination of column widths and underlying figure widths you can indeed do arbitrary horizontal layout w/ in-cell alignment. The key is to distribute the column widths evenly so that they add up to 1.0 (so that the full horizontal width of the page is occupied) and then to explicitly size the figures using physical units (inches). Word ends up resizing the columns, but it does so with respect to the content widths, so it comes out the way you want. Here's an example of what a figure panel composed this way looks like rendered in docx:
A couple of related requests (LMK if I should open separate issues for these):
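As a rough illustration of the sizing rule described above, here is a small Python sketch for the simplest case of one row of equally sized figures. The function name `panel_layout`, the 6.5 inch text width, and the 0.1 inch gutter are all assumptions made for the example; none of them come from pandoc or from the thread.

```python
def panel_layout(n_figures, page_width_in=6.5, gutter_in=0.1):
    """Return (column_widths, figure_width_in) for one row of figures.

    column_widths are relative widths that sum to 1.0, so the table spans
    the full text width. figure_width_in is an explicit physical width in
    inches for each figure. page_width_in and gutter_in are assumed values.
    """
    if n_figures < 1:
        raise ValueError("need at least one figure")
    column_widths = [1.0 / n_figures] * n_figures
    figure_width_in = page_width_in / n_figures - gutter_in
    return column_widths, figure_width_in


widths, fig_width = panel_layout(4)
print(widths)  # [0.25, 0.25, 0.25, 0.25]
```

Figures of unequal width would need per-figure sizes instead of a single value; the even column split stays the same.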
Unfortunately the actual caption text had to be inserted using It's not a huge problem to lose the markdown in the caption as it's probably somewhat rare (although for some disciplines it seems like math would be a frequent requirement).
For (1) above, inserting an empty "openxml" RawBlock between the tables seems to be enough to prevent the automatic insertion of an empty paragraph.
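The workaround can be sketched as a transformation on pandoc's JSON AST. This is a minimal Python sketch, assuming the usual JSON encoding of blocks (`{"t": ..., "c": ...}`); the helper name `separate_adjacent_tables` is hypothetical, the table contents are stubbed out, and only a flat list of blocks is handled (tables nested inside other blocks are not visited).

```python
def separate_adjacent_tables(blocks):
    """Insert an empty "openxml" RawBlock between consecutive Table blocks."""
    result = []
    for block in blocks:
        previous_is_table = bool(result) and result[-1].get("t") == "Table"
        if previous_is_table and block.get("t") == "Table":
            # Empty raw block: emits nothing itself, but separates the tables.
            result.append({"t": "RawBlock", "c": ["openxml", ""]})
        result.append(block)
    return result


# Stubbed blocks for illustration; real Table blocks carry much more content.
blocks = [{"t": "Table", "c": []}, {"t": "Table", "c": []}, {"t": "HorizontalRule"}]
separated = separate_adjacent_tables(blocks)
print([b["t"] for b in separated])
# ['Table', 'RawBlock', 'Table', 'HorizontalRule']
```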
See #4315 and commit 93e3d46 for the motivation for this. Your workaround seems good. I don't currently see a good way around the other issue. In principle we could add Lua support for rendering -- I think there may be an issue for this already -- but we don't even expose an openxml writer currently.
My initial workaround ended up with the combined tables noted in #4315. Here is where I landed (a zero-height text frame):
<w:p>
  <w:pPr>
    <w:framePr w:w="0" w:h="0" w:vAnchor="margin" w:hAnchor="margin" w:xAlign="right" w:yAlign="top"/>
  </w:pPr>
</w:p>
I had a look at the Docx writer and believe that the initial request, i.e., adding raw openxml blocks without parsing, would be feasible with moderate effort. The writer currently uses lists of Elements as building blocks. I believe that it wouldn't be too hard to generalize this and use Content instead. This would then allow packing the raw blocks into raw CData elements. Is this worthwhile, and should the issue be reopened?
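To make the idea concrete, here is a minimal Python sketch of a content type that admits raw strings next to elements and escaped text. It is an illustration only, not the Docx writer's Haskell code: the names `Element`, `Text`, `Raw`, `Content`, and `render` are chosen for this example. The point is that a `Raw` node is written verbatim, so unbalanced opening and closing fragments can wrap nodes that are generated normally.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Text:
    value: str  # escaped when rendered


@dataclass
class Raw:
    value: str  # written verbatim, never parsed or balanced


@dataclass
class Element:
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List["Content"] = field(default_factory=list)


Content = Union[Element, Text, Raw]


def render(node: Content) -> str:
    """Serialize one content node to a string."""
    if isinstance(node, Raw):
        return node.value
    if isinstance(node, Text):
        return escape(node.value)
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in node.attrs.items())
    inner = "".join(render(child) for child in node.children)
    return f"<{node.name}{attrs}>{inner}</{node.name}>"


# Raw fragments open and close a table cell around a normally built paragraph.
run = Element("w:r", children=[Element("w:t", children=[Text("A & B")])])
body = [
    Raw("<w:tbl><w:tr><w:tc>"),
    Element("w:p", children=[run]),
    Raw("</w:tc></w:tr></w:tbl>"),
]
output = "".join(render(node) for node in body)
print(output)
```

With a list of elements only, the two raw fragments in `body` would each have to be a complete node; allowing a raw variant in the list removes that requirement, at the cost that well-formedness of the final document is no longer guaranteed by the tree itself.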
In my view it's extremely powerful to be able to intermix raw markup with pandoc tokens. This is used to great effect in pandoc-crossref figure layout, where arbitrary raw tex composes a structure (a subfigure grid) but then allows Pandoc to render the actual figures. The alternative if this weren't possible would be to emit the figures using additional raw markup, but then they are essentially lost from the AST for downstream processing by other filters (and we lose whatever other desirable native behaviors pandoc has). This also becomes relevant for captions, as you really want to allow markup in captions (again, emitting the caption entirely using raw markup requires In LaTeX or HTML it's straightforward enough to emit raw markup for figures, so if you are willing to accept the tradeoffs of erasing the figure from the AST and not supporting markup in the caption you can at least do it. For docx though, emitting figures is more complex (they need to be properly embedded in the zip file) so we really need pandoc to do this processing from the AST. You can imagine other scenarios where emitting raw openxml would be desirable: for example, in a PowerPoint writer you might want to emit multiple "frames" of content on a slide. If we could emit partial XML structures then this would be possible for a filter (it could emit the begin and end frame xml literally, and let pandoc fill in the middle with standard markup processing).
Reopened. I may be able to give it a try later this week.
Lua filters have access to pandoc's "mediabag", so I believe it might be (well, become) possible to handle that in a filter.
If you do give it a try then LMK and I'll test immediately with our use case.
That's true, but the Pandoc code required to emit docx images is from what I can see quite a bit more involved than for LaTeX or HTML: pandoc/src/Text/Pandoc/Writers/Docx.hs Line 1374 in 5bbd5a9
So it's a huge bonus to have pandoc write the image directly from an
Right, I had misunderstood what you meant. I wanted to get a first draft done while this was still fresh in my mind, so here we go: #6941. I may rewrite some details and didn't add tests yet, but it should work as desired.
Thanks again, this is a really terrific advance for creating sophisticated docx/pptx output!
Most welcome. This doesn't work with pptx output yet, but a similar change should be possible to enable it. I'm a bit short on time next week, but I could take another look later this month.
Okay, LMK if you do take a run at pptx and I will put it through its paces.
I've noticed that when creating a RawInline or RawBlock of type "openxml" that the XML is actually parsed into a valid XML fragment (closing tags as necessary). Looks like that is happening here: pandoc/src/Text/Pandoc/Writers/Docx.hs, lines 974 to 978 in 5bbd5a9
I'm wondering if this is a hard requirement or if an option to avoid this could be provided? The use case is constructing more elaborate structures (e.g. tables) where we embed standard markdown tokens inside raw structures. This is for example done in the pandoc-crossref filter to provide LaTeX figure/subfigure layout (where the raw tokens provide the figure/subfigure LaTeX structure and then the markdown tokens are used to actually render the figures).
I was hoping to create a similar feature that enabled grid based figure layouts for docx, but am stuck on this constraint as e.g this token:
Ends up in the document like this:
The issue w/ just using a pandoc.Table is that it doesn't appear as if you can get precise percentage-based layout for markdown tables emitted to docx, nor can you (currently) set cell based alignment. So I don't think you can do something like this: http://lierdakil.github.io/pandoc-crossref/#subfigure-grid
I may be missing something here but just wanted to record the constraints I'm seeing and hoping there is a way around them either w/ an existing behavior or perhaps a new one.