PDF language #1165

merenyia · 2023-10-10T11:42:33Z


Hi,
I would like to propose expanding the metadata options with a '--language' parameter to set the document's default language.
This addition would be very useful for specifying the reading language of the resulting document, and it aligns with the guidelines at https://www.w3.org/WAI/WCAG22/Techniques/pdf/PDF16.
Thank you for considering this suggestion.

jbarlow83 · 2023-10-13T07:41:22Z

Maintainer

I will implement this shortly. It will set the default language to the language used for OCR, unless the field was already set to something else. If someone needs to distinguish the OCR language from the document language, I imagine they need code to manage other things too.

Or is there a common use case for making this distinction?
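
For readers following along, the rule described above (default to the OCR language, but never overwrite an existing value) can be sketched roughly as follows. This is not OCRmyPDF's actual implementation: the function name `default_document_language`, the plain `dict` standing in for the PDF document catalog, and the small lookup table from Tesseract codes to language tags are all assumptions made for illustration. The only detail taken from the linked WCAG technique is that the default language lives in the catalog's `/Lang` entry.

```python
# Sketch only: a plain dict stands in for the PDF document catalog.
# The table is a small illustrative subset, not a complete mapping.
TESSERACT_TO_LANG_TAG = {
    "eng": "en",
    "deu": "de",
    "fra": "fr",
    "hun": "hu",
}


def default_document_language(catalog: dict, ocr_languages: list) -> dict:
    """Return a copy of `catalog` with /Lang defaulted from the first OCR language.

    An existing, non-empty /Lang entry is left untouched.
    """
    updated = dict(catalog)
    if updated.get("/Lang"):
        return updated  # already set to something else: keep it
    if not ocr_languages:
        return updated  # nothing to derive a default from
    tag = TESSERACT_TO_LANG_TAG.get(ocr_languages[0])
    if tag is not None:
        updated["/Lang"] = tag  # unknown codes leave /Lang unset
    return updated
```

With these assumptions, `default_document_language({}, ["deu", "eng"])` yields a catalog whose `/Lang` is `"de"`, while a catalog that already carries `/Lang` comes back unchanged. Leaving `/Lang` unset for unknown codes is a design choice of this sketch, on the view that no tag is safer than a wrong one.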


merenyia · 2023-10-13T14:28:09Z

Author

Hi James,
That's great news.

Regarding the suggested solution, I think it's better not to conflate the OCR language with the document language. I propose that we make it clear that the OCR language parameter (-l) defines the character set(s) used for recognition, while the document language serves accessibility purposes.

Also, I can imagine languages that share an identical character set but are pronounced differently. In that case I could use the same OCR language but would need different document languages.
Thanks again.
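
To make the proposed separation concrete, here is a minimal standalone sketch using `argparse`. It is not OCRmyPDF's command line: the program name `ocr-sketch`, the helper `resolve_languages`, and the option name `--doc-lang` are hypothetical, with the last chosen only to avoid clashing with any existing option. The point is simply that the OCR language and the document language are parsed as two independent values.

```python
import argparse
from typing import List, Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ocr-sketch")
    # Languages handed to the OCR engine; may be given more than once.
    parser.add_argument("-l", dest="ocr_languages", action="append", default=None)
    # Hypothetical option: language tag recorded in the PDF for accessibility.
    parser.add_argument("--doc-lang", dest="doc_language", default=None)
    return parser


def resolve_languages(argv: List[str]) -> Tuple[List[str], Optional[str]]:
    """Parse argv and return (OCR languages, document language or None)."""
    args = build_parser().parse_args(argv)
    return args.ocr_languages or [], args.doc_language
```

Under these assumptions, `resolve_languages(["-l", "eng", "--doc-lang", "en-GB"])` and `resolve_languages(["-l", "eng", "--doc-lang", "en-US"])` share the same OCR language but carry different document languages, which is the case described above.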

