Spec violation TrueType without Encoding entry #71

Open
sftse opened this issue Oct 16, 2023 · 0 comments
sftse commented Oct 16, 2023

This code path in PdfSimpleFont::new() is not standard-compliant:

            None => {
                if let Some(type1_encoding) = type1_encoding {
                    let mut table = Vec::from(PDFDocEncoding);
                    dlog!("type1encoding");
                    for (code, name) in type1_encoding {
                        let unicode = glyphnames::name_to_unicode(&pdf_to_utf8(&name));
                        if let Some(unicode) = unicode {
                            table[code as usize] = unicode;
                        } else {
                            dlog!("unknown character {}", pdf_to_utf8(&name));
                        }
                    }
                    encoding_table = Some(table)
                } else if subtype == "TrueType" {
                    encoding_table = Some(encodings::WIN_ANSI_ENCODING.iter()
                        .map(|x| if let &Some(x) = x { glyphnames::name_to_unicode(x).unwrap() } else { 0 })
                        .collect());
                }
            }

From p. 267 of the PDF standard:
"When the font has no Encoding entry, or the font descriptor’s Symbolic flag is set (in which case the Encoding
entry is ignored), this shall occur:
• If the font contains a (3, 0) subtable, the range of character codes shall be one of these: 0x0000 - 0x00FF,
0xF000 - 0xF0FF, 0xF100 - 0xF1FF, or 0xF200 - 0xF2FF. Depending on the range of codes, each byte
from the string shall be prepended with the high byte of the range, to form a two-byte character, which shall
be used to select the associated glyph description from the subtable.
• Otherwise, if the font contains a (1, 0) subtable, single bytes from the string shall be used to look up the
associated glyph descriptions from the subtable.
If a character cannot be mapped in any of the ways described previously, a conforming reader may supply a
mapping of its choosing."

In every document I've tested, the encoding_table is never used when the font is TrueType without an Encoding entry, because the unicode_map is present, so supplying WIN_ANSI_ENCODING as a fallback makes no difference.
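
For reference, the lookup order quoted from the standard could be sketched roughly as follows. This is only an illustration, not code from this crate: CmapSubtable, TrueTypeCmaps and glyph_for_byte are hypothetical names, and the subtables are modelled as plain hash maps from character code to glyph index. As a simplification, the sketch tries each of the four permitted high bytes in turn instead of first detecting which range the (3, 0) subtable actually uses.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed TrueType cmap subtable:
/// maps a character code to a glyph index.
type CmapSubtable = HashMap<u16, u16>;

/// Hypothetical container for the two subtables the standard mentions.
struct TrueTypeCmaps {
    /// (platform 3, encoding 0) subtable, if the font has one.
    symbol_3_0: Option<CmapSubtable>,
    /// (platform 1, encoding 0) subtable, if the font has one.
    mac_1_0: Option<CmapSubtable>,
}

/// High bytes of the four code ranges permitted for a (3, 0) subtable:
/// 0x0000-0x00FF, 0xF000-0xF0FF, 0xF100-0xF1FF, 0xF200-0xF2FF.
const SYMBOL_HIGH_BYTES: [u16; 4] = [0x00, 0xF0, 0xF1, 0xF2];

/// Resolve one byte of a PDF string to a glyph index for a TrueType font
/// that has no Encoding entry (or whose Symbolic flag is set).
/// Returns None when no mapping exists; the standard then allows the
/// reader to supply a mapping of its choosing.
fn glyph_for_byte(cmaps: &TrueTypeCmaps, byte: u8) -> Option<u16> {
    if let Some(subtable) = &cmaps.symbol_3_0 {
        // Prepend the range's high byte to form a two-byte code.
        // Assumption: probing all four ranges is an acceptable shortcut.
        SYMBOL_HIGH_BYTES
            .iter()
            .find_map(|&hi| subtable.get(&((hi << 8) | u16::from(byte))).copied())
    } else if let Some(subtable) = &cmaps.mac_1_0 {
        // The (1, 0) subtable is indexed by the single byte directly.
        subtable.get(&u16::from(byte)).copied()
    } else {
        None
    }
}
```

With a lookup like this, a WinAnsi-based table would only come into play as the reader-chosen mapping, after both subtable lookups have failed.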
