Support to disable East Asian font hints in docx output #9910
Comments
Do you mean disable them globally or in a fine-grained way (e.g. don't put a font hint inside this specially marked span) ? |
Disabling East Asian font hints globally would be fine, as in previous versions (e.g. Pandoc 3.2). |
I'm confused, because you requested this feature in the first place, but when I implemented it you immediately asked for a way to disable it. Is it actually a useful feature? |
I understand your confusion. This is indeed an annoying case, especially for non-CJK users.
Specifying the language manually would be feasible, but it is hard to do so for bibliographies. |
Would it make sense to add the font hints only when the specified language (e.g., metadata |
Sorry, I don't think this is a good idea as So, I think the current implementation of adding East Asian font hints is good and does not need to change. Perhaps I could write a Lua filter to remove them when writing English articles, if necessary. |
I tried to write a Lua filter as follows:

function traverse(elem)
  if elem.t == "RawBlock" or elem.t == "RawInline" then
    if elem.format == "openxml" then
      elem.text = elem.text:gsub('<w:rFonts w:hint="eastAsia" />', '')
    end
  end
  return elem
end
return {
  { RawBlock = traverse },
  { RawInline = traverse }
}

But it didn't work. Could you please help to diagnose it or give some guidance? |
A Lua filter can't remove these because they are added in the writer. Lua filters only affect the AST (which is the input to the writer). |
Thanks for your guidance. Are there any alternative ways? |
Nothing will work but postprocessing the docx. (It wouldn't be that hard to find and remove the offending elements from the context in the container.) Again, I'm open to providing this flexibility in pandoc, but I need to figure out what the best way to do it would be. |
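The post-processing step mentioned above could look roughly like the following Python sketch. It treats the docx as a zip container and deletes the hint from `word/document.xml`. It assumes the hint is serialized exactly as `<w:rPr><w:rFonts w:hint="eastAsia" /></w:rPr>`, the form quoted elsewhere in this thread, so other pandoc versions may need a different pattern. The function name `strip_east_asian_hints` is made up for illustration and is not part of pandoc.

```python
import io
import zipfile

# Assumption: the hint is serialized exactly like this (form quoted in this thread).
# Run properties that contain anything besides the hint are left untouched.
EA_HINT_RUN_PROPS = b'<w:rPr><w:rFonts w:hint="eastAsia" /></w:rPr>'


def strip_east_asian_hints(docx_bytes: bytes) -> bytes:
    """Return a copy of the docx with hint-only run properties removed."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            # Only the main document part is rewritten; all other
            # entries (styles, media, ...) are copied unchanged.
            if info.filename == "word/document.xml":
                data = data.replace(EA_HINT_RUN_PROPS, b"")
            dst.writestr(info, data)
    return out.getvalue()
```

To use it, read the docx produced by pandoc as bytes, pass it through `strip_east_asian_hints`, and write the result to a new file. Because this is a plain byte replacement rather than XML parsing, it is worth checking the result in Word.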
You needn't set the document-wide lang. We could have the feature be sensitive to a lang on a div, for example. So you could put Chinese content inside
and the Word writer could be trained to add the font hints inside that context (unless overridden by an interior span or div with lang=en). |
Thanks. I believe the step for post-processing the docx is feasible. Regarding the language attribute, I think there is no need to change the current implementation as the East Asian Languages should always be enclosed with |
Quick suggestion for post-processing: Using a binary custom Lua writer, i.e., a custom writer that defines a |
The difficulty is determining whether quotation marks surrounding a Chinese phrase should themselves be considered East Asian or not. As you've noted, that depends on the context. Hence my suggestion to make this sensitive to language tagging. |
Sounds like a promising method, but I don't fully understand it. Could you please provide more details? For example, I'd like to remove |
All of these issues stem from the fact that Simplified Chinese and English use the same quotation marks (Traditional Chinese does not). I think Pandoc doesn't need to try to handle this tricky issue any further. A Japanese designer has submitted a proposal to add standardized variation sequences for four quotation marks. I hope it can be adopted as soon as possible:
|
Sure, here we go:

--- file: docx-no-eahints.lua
-- Copyright: © 2024 Albert Krewinkel
-- License: MIT
local mediabag = require 'pandoc.mediabag'
local path = require 'pandoc.path'
local zip = require 'pandoc.zip'
function ByteStringWriter(doc, opts)
  local docx = pandoc.write(mediabag.fill(doc), 'docx', opts)
  local archive = zip.Archive(docx)
  for i, entry in ipairs(archive.entries) do
    if path.filename(entry.path) == 'document.xml' then
      local pattern = '<w:rPr><w:rFonts w:hint="eastAsia" /></w:rPr>'
      local newcontent = entry:contents():gsub(pattern, '')
      archive.entries[i] = zip.Entry(entry.path, newcontent)
    end
  end
  return archive:bytestring()
end

Use with

pandoc --to=docx-no-eahints.lua -o my-outfile.docx …

It's not really well-tested, but should work. Or, at the very least, should give a better idea of what I meant, and how this could work. |
Thanks @tarleb, it works. But I encountered an issue: the page size changed from A4 to US Letter after applying the Lua filter. The original XML tags in

<w:sectPr w:rsidR="00D3414C">
  <w:pgSz w:h="16840" w:w="11900" />
  <w:pgMar w:bottom="1440" w:footer="720" w:gutter="0" w:header="720" w:left="1440" w:right="1440" w:top="1440" />
  <w:cols w:space="720" />
  <w:docGrid w:linePitch="360" />
</w:sectPr>

BTW, is it possible to use this Lua filter with Quarto? |
Using
I don't know, sorry. |
I've uploaded a folder with files for testing: lua-custom-writer-test.zip

With the same source input file

pandoc test.md -o test.docx --reference-doc custom.docx

generated

pandoc test.md -o test.docx --reference-doc custom.docx -t docx-no-eahints.lua

would generate

<w:sectPr w:rsidR="005F2E0E" w:rsidSect="002F2276">
  <w:pgSz w:h="16840" w:w="11900" />
  <w:pgMar w:bottom="1440" w:footer="720" w:gutter="0" w:header="720" w:left="1440" w:right="1440" w:top="1440" />
  <w:cols w:space="720" />
  <w:docGrid w:linePitch="326" />
</w:sectPr>

This behavior seems weird and I have no idea what the problem is. Could you please help to diagnose the issue, @tarleb? |
Weird. I currently don't have time to debug this, but it would be nice to get to the bottom of this. Does the reference doc get applied at all? |
Never mind, it's not urgent. The reference doc was applied in both conversions. You can see them in the folder above. |
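For comparing the section properties of the two outputs without unzipping them by hand, a small Python helper along these lines may help. It is only a sketch: the name `section_properties` is made up for illustration, and it assumes the `<w:sectPr>` elements in `word/document.xml` are neither nested nor self-closing.

```python
import io
import re
import zipfile


def section_properties(docx_bytes: bytes) -> list[str]:
    """Return every <w:sectPr> element found in word/document.xml, as text."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    # Non-greedy match up to the first closing tag; assumes sectPr
    # elements are not nested and not self-closing.
    return re.findall(r"<w:sectPr\b.*?</w:sectPr>", xml, flags=re.DOTALL)
```

Running it on the bytes of both docx files and comparing the returned strings shows which attributes differ between the two conversions.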
@tarleb Can you kindly help to debug the page size issue above? |
New issue from #9817.
In my field, we tend to cite Chinese sources in articles, but they make up a relatively small part of the entire document. English journals therefore expect the typesetting to follow English rather than Chinese conventions, particularly for quotation marks. In this context, could Pandoc provide an option to disable East Asian font hints?