luaotfload.aux.slot_of_name not working with Harfbuzz #268

Open
niruvt opened this issue Nov 2, 2023 · 7 comments

Comments

@niruvt

niruvt commented Nov 2, 2023

I am using this code to print a non-standard glyph from the Shobhika font. It is composed of three Unicode characters, ("092D), ("094D) and ("0930), which together produce one single shape. The code used and the result obtained are as follows:

\documentclass[border=1cm]{standalone}
\usepackage{fontspec}
\setmainfont{Shobhika}

\begin{document}
\directlua{%
  tex.sprint(
    "\\char"..
    luaotfload.aux.slot_of_name(
      font.current(),[[BhaRa.dv]]
    )
  )%
}
\end{document}

image

This doesn't work if I add the Renderer=Harfbuzz option to the \setmainfont command. Is this a HarfBuzz problem or a luaotfload one?

@zauguin
Member

zauguin commented Nov 2, 2023

slot_of_name is working, but its result can't be used with \char in HarfBuzz mode, since in that mode it produces slot numbers which are outside of \char's range. We could consider adding a helper to simplify the "insert a font glyph directly by name" case, but why do you even need that here? Why aren't you just inputting the glyph directly as a combination of its components?

\documentclass[border=1cm]{standalone}
\usepackage{fontspec}
\setmainfont[Renderer=HarfBuzz, Script=Devanagari]{Shobhika}

\begin{document}
भ्र
\end{document}
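
The range problem can be checked directly. The following is a minimal sketch, not something luaotfload ships: it performs the same lookup as the original example and writes to the log whether the slot still fits below 0x10FFFF, the largest value \char accepts. The log messages are made up for illustration. TeX comments (%) are used inside \directlua because the whole chunk reaches Lua as one line, so a Lua -- comment would swallow the rest of the code.

```latex
\documentclass[border=1cm]{standalone}
\usepackage{fontspec}
\setmainfont[Renderer=HarfBuzz, Script=Devanagari]{Shobhika}

\begin{document}
भ्र
\directlua{
  % Same lookup as in the original example; an unknown name yields no slot.
  local slot = luaotfload.aux.slot_of_name(font.current(), [[BhaRa.dv]])
  if not slot then
    texio.write_nl("BhaRa.dv: no such glyph name in the current font")
  elseif slot > 0x10FFFF then
    % Values above 0x10FFFF are rejected by the char primitive.
    texio.write_nl("BhaRa.dv: slot " .. slot .. " is too large for the char primitive")
  else
    texio.write_nl("BhaRa.dv: slot " .. slot .. " fits the char primitive")
  end
}
\end{document}
```

Compiling once with and once without Renderer=HarfBuzz and comparing the two log lines shows which mode produces a slot that \char can take.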

@niruvt
Author

niruvt commented Nov 3, 2023

> but why do you even need that here?

The glyph I chose here was picked at random, and yes, I know it can be typed directly as input, but some shapes cannot be obtained from Unicode input alone. E.g.:

\documentclass[border=1cm]{standalone}
\usepackage{fontspec}
\setmainfont{Shobhika}

\begin{document}
\directlua{%
  tex.sprint(
    "\\char"..
    luaotfload.aux.slot_of_name(
      font.current(),[[ardhaKaTa.dv]]
    )
  )%
}
\end{document}

image

The reason I actually need this is that I am currently testing a WIP font in which I need to typeset every possible glyph, and many of them cannot be printed by Unicode input alone.

> We could consider adding a helper to simplify the "insert a font glyph directly by name"

If that could be done, I would really appreciate it.

While we are on the subject, I would like to mention a further issue that I faced with the XeLaTeX counterpart of this. The same thing works for me with luaotfload so far, but I still want to mention it: if you develop something for inserting glyph shapes directly, it would be great if it did not show this XeLaTeX issue. I don't know what is happening internally, so I will try to explain in layperson's language. Kindly bear with me.

When I insert a glyph directly from the font with XeLaTeX, it is not understood as a normal input string. Devanagari, being a complex script, places some vowel diacritics before the consonants and some after them. Let's look at the following example to see what I mean:

\documentclass[border=1cm,varwidth]{standalone}
\usepackage{fontspec}
\setmainfont[Script=Devanagari]{Shobhika}
\NewDocumentCommand{ \printwithxe }{ m }{%
  \XeTeXglyph\XeTeXglyphindex"#1"%
}

\begin{document}
भ\quad
\printwithxe{Bha.dv}% Works

भिं\quad
\printwithxe{Bha.dv}िं % Doesn't work
\end{document}

image

I have not been able to make this work in any way. I tried adding a space after \XeTeXglyph\XeTeXglyphindex"Bha.dv" and failed; I tried adding \relax and that didn't work either. I don't know what I am missing.

This doesn't happen with luaotfload:

\documentclass[border=1cm,varwidth]{standalone}
\usepackage{fontspec}
\setmainfont[Script=Devanagari,Renderer=Harfbuzz]{Shobhika}
\NewDocumentCommand{ \printwithlua }{ m }{%
  \directlua{%
    tex.sprint(
      "\\char"..
      luaotfload.aux.slot_of_name(
        font.current(),[[#1]]
      )
    )%
  }%
}

\begin{document}
भ
\printwithlua{Bha.dv}% Works

भिं
\printwithlua{Bha.dv}िं% Works
\end{document}

image

Bha.dv corresponds to one single Unicode character, i.e., 092D, but I hope that, once HarfBuzz support is added, the requested mechanism also works for complex glyph shapes like, say, NgaKaSsaYa.dv.

Thanks for the prompt response.

@zauguin
Member

zauguin commented Nov 3, 2023

> Since we are on it, I would like to mention a further issue that I faced with the XeLaTeX-parallel of this. I have tried the same with luaotfload and as of now I am successful with it, but I still want to mention it and request you that if for inserting glyph-shapes directly you are developing something, then it would be great if this XeLaTeX-issue is not seen in it.

That's a direct consequence of HarfBuzz' interface and therefore also happens with luaotfload's harf mode. Basically the input we are providing to HarfBuzz is always a Unicode string and as far as I am aware it can't include direct glyphs. So what happens if you provide a glyph directly is that we are basically inserting it as a foreign object which appears in the middle of the string which does not get passed to the shaper. In some cases we could circumvent that by detecting glyphs which can be input as Unicode sequences, but that would be potentially slow and would only work for characters which already can be input directly.
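
For a shape that does correspond to a Unicode sequence, entering the sequence keeps everything inside the string that is passed to the shaper. A minimal sketch, assuming the sequence 092D 093F 0902 for भिं from the earlier example; \BhiM is a hypothetical macro name, and ^^^^ is LuaTeX's notation for a code point written as four lowercase hex digits, which keeps the source ASCII-only:

```latex
\documentclass[border=1cm,varwidth]{standalone}
\usepackage{fontspec}
\setmainfont[Renderer=HarfBuzz, Script=Devanagari]{Shobhika}

% \BhiM is a hypothetical helper name, not part of any package.
% It expands to ordinary character tokens: bha (092D), vowel sign i (093F),
% anusvara (0902), exactly as if they had been typed.
\newcommand*\BhiM{^^^^092d^^^^093f^^^^0902}

\begin{document}
भिं\quad
\BhiM
\end{document}
```

Because the macro expands to plain characters, both forms reach HarfBuzz as the same input. This only helps for shapes that have such a sequence, which is the limitation noted above.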

@niruvt
Author

niruvt commented Nov 3, 2023

> That's a direct consequence of HarfBuzz' interface and therefore also happens with luaotfload's harf mode. Basically the input we are providing to HarfBuzz is always a Unicode string and as far as I am aware it can't include direct glyphs. So what happens if you provide a glyph directly is that we are basically inserting it as a foreign object which appears in the middle of the string which does not get passed to the shaper. In some cases we could circumvent that by detecting glyphs which can be input as Unicode sequences, but that would be potentially slow and would only work for characters which already can be input directly.

Oh! Okay. So if I understand correctly, glyphs which don't have Unicode numbers cannot be printed when HarfBuzz is active, and that's a design problem of HarfBuzz, unrelated to luaotfload, right? Do you think reporting this to HarfBuzz is a good idea?

> In some cases we could circumvent that by detecting glyphs which can be input as Unicode sequences.

If the reader can read the script, this would be unnecessary, but yes, it could potentially benefit font developers who aren't able to read the script they are working on.

@khaledhosny
Contributor

> Basically the input we are providing to HarfBuzz is always a Unicode string and as far as I am aware it can't include direct glyphs.

You can fake this by inserting a code point that is outside of valid Unicode range (that is already what we feed LuaTeX with harf mode), and implement HarfBuzz’s get_nominal_glyph font callback and use it to map these invalid code points to glyph indices. HarfBuzz would then proceed as normal.

This, however, still wouldn’t behave the same as Unicode input since any shaping behavior that depends on character properties will not work, unless you would also provide Unicode functions callbacks (I haven’t tried that one).

@zauguin
Member

zauguin commented Nov 4, 2023

> > Basically the input we are providing to HarfBuzz is always a Unicode string and as far as I am aware it can't include direct glyphs.

> You can fake this by inserting a code point that is outside of valid Unicode range (that is already what we feed LuaTeX with harf mode), and implement HarfBuzz’s get_nominal_glyph font callback and use it to map these invalid code points to glyph indices. HarfBuzz would then proceed as normal.

> This, however, still wouldn’t behave the same as Unicode input since any shaping behavior that depends on character properties will not work, unless you would also provide Unicode functions callbacks (I haven’t tried that one).

I experimented with that a bit earlier, but as far as I could tell, at least for the Indic shaper it doesn't seem to help much, because I couldn't find a way to set the Indic properties through the Unicode callbacks (it might be that I missed something though). For other scripts we could do that, but it seems hard to get reasonable character data for isolated glyphs, especially since the documentation indicates that the Unicode data usually should be consistent across buffers, so it might not be feasible to set that in a font-specific way. At least for Latin scripts it seems to work though, since the properties don't matter much anyway.

@khaledhosny The HarfBuzz documentation mostly claims that the values passed in input buffers should be Unicode codepoints, but my experiments also suggested that other values seem to work. Do you know if we should expect any issues when passing arbitrary numbers there (assuming that corresponding data is provided)?

@khaledhosny
Contributor

khaledhosny commented Nov 4, 2023

> > > Basically the input we are providing to HarfBuzz is always a Unicode string and as far as I am aware it can't include direct glyphs.

> > You can fake this by inserting a code point that is outside of valid Unicode range (that is already what we feed LuaTeX with harf mode), and implement HarfBuzz’s get_nominal_glyph font callback and use it to map these invalid code points to glyph indices. HarfBuzz would then proceed as normal.
> > This, however, still wouldn’t behave the same as Unicode input since any shaping behavior that depends on character properties will not work, unless you would also provide Unicode functions callbacks (I haven’t tried that one).

> I experimented with that a bit earlier, but as far as I could tell at least for the indic shaper it doesn't seem to help much because I couldn't find a way to set the indic properties through the Unicode callbacks (it might be that I missed something though). For other scripts we could do that but it seems hard to get reasonable character data for isolated glyphs, especially since the documentation indicates that the Unicode data usually should be consistent across buffers, so it might not be feasible to set that in a font specific way. At least for Latin scripts it seems to work though since the properties don't matter much anyway.

Yes, of course. Many Unicode properties are hard-coded in HarfBuzz and can’t be provided by the Unicode callbacks, so this will work only for the most simple cases.

> @khaledhosny The HarfBuzz documentation mostly claims that the values passed in input buffers should be Unicode codepoints but my experiments also suggested that other values seem to work. Do you know if we should expect any issues when passing arbitrary numbers there (assuming that corresponding data is provided)?

HarfBuzz accepts the full range of 32 bit integers for input, and that font functions callback hack is already in use by some applications/libraries, so I don’t see HarfBuzz intentionally breaking it.

As long as you eventually return a valid glyph id, it should be fine.
