Modern Composition #426

Open
wismill opened this issue Jan 4, 2024 · 10 comments
Labels
compose Indicates a need for improvements or additions to Compose handling discussion: backward compatibility enhancement Indicates new feature requests

Comments

@wismill
Member

wismill commented Jan 4, 2024

Modern Composition

NOTE: This document is a draft.

Introduction

Compose sequences are already powerful, but they look limited compared to what macOS offers.

macOS uses a state machine, which is quite powerful. In fact, the current implementation of Compose in xkbcommon also uses a state machine internally, but we do not use its full power.

I propose we change that and create a new format in order to:

  • Avoid repeating sequences that share the same prefix. This may also speed up parsing.
  • Simplify the sequences: implicit keysyms.
  • Allow setting a custom “feedback” string while composing.
  • Allow recursive sequences.
  • Allow outputting and then continuing to compose. This feature could be interpreted as a custom locked layer.
  • Allow defining a “terminator” to control the behavior when there is no matching sequence:
    • Control whether to output something or not.
    • Control whether to continue composing or not.
    • Do the previous depending on predicates on the input (see “filter”).
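
To illustrate the first goal, here is a minimal Python sketch of how legacy sequences that share a prefix collapse into a single tree. The function name `build_trie` and the data layout are assumptions made for this example only; they are not the xkbcommon implementation.

```python
from typing import Dict, List, Tuple, Union

# A node maps a keysym name to either a result string or a child node.
Trie = Dict[str, Union[str, "Trie"]]


def build_trie(sequences: List[Tuple[List[str], str]]) -> Trie:
    """Merge (keysyms, result) pairs so that shared prefixes are stored once."""
    root: Trie = {}
    for keysyms, result in sequences:
        node = root
        for keysym in keysyms[:-1]:
            child = node.setdefault(keysym, {})
            if not isinstance(child, dict):
                # A shorter sequence already ends here (overlap).
                raise ValueError(f"{keysym!r} already ends a shorter sequence")
            node = child
        if keysyms[-1] in node:
            # Duplicate sequence, or prefix of a longer one (overlap).
            raise ValueError(f"conflict on {keysyms!r}")
        node[keysyms[-1]] = result
    return root


legacy = [
    (["dead_acute", "a"], "á"),
    (["dead_acute", "e"], "é"),
    (["dead_acute", "dead_acute", "o"], "ő"),
]
trie = build_trie(legacy)
```

In this sketch `dead_acute` appears once as a key instead of once per sequence, which is the repetition the new format aims to avoid. Overlapping sequences are rejected here; how they should really be handled is an open question in this draft.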

Proposed changes

  • Start using versions for Compose file format. The current (legacy) format has implicit version 1.
  • For the newer format, an explicit version number is required.
  • Use an additional new environment variable XKB_COMPOSE_FILE to detect what Compose file to load. This way we can keep compatibility with X11 and its XCOMPOSEFILE variable. XKB_COMPOSE_FILE has precedence over XCOMPOSEFILE.
  • Some features of the new format may not be supported by apps using xkbcommon. Thus we should guard them with flags.
  • Refactor the Compose table to handle the new format.
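
The precedence rule between the two environment variables could be sketched as follows. This is a hedged Python illustration only: `pick_compose_file` is a hypothetical helper, and it deliberately ignores any other fallback locations a real lookup may consult.

```python
import os
from typing import Mapping, Optional


def pick_compose_file(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the Compose file named by the environment, if any.

    XKB_COMPOSE_FILE (proposed) is checked before XCOMPOSEFILE (X11).
    Empty values are treated as unset, which is an assumption of this sketch.
    """
    for name in ("XKB_COMPOSE_FILE", "XCOMPOSEFILE"):
        path = env.get(name)
        if path:
            return path
    return None
```

Passing the environment as a mapping keeps the helper easy to test, for example `pick_compose_file({"XCOMPOSEFILE": "/tmp/legacy"})`.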

New Compose file format

The new Compose file format is based on a restricted set of features of YAML 1.2.

Documented example:

# First document is reserved for configuration
compose version: 2  # mandatory format version. Legacy files have implicit version: 1.
--- # Start a new YAML document
# States are identified by a name. TODO: recommendations for standard dead keys
acute:
  # Optional corresponding keysyms. If none: custom state
  keysym: dead_acute
  # If set, the following string is displayed while composing
  feedback: "´"
  # State transitions
  transitions:
    # Implicit entry of one character.
    # Equivalent to legacy: <dead_acute> <a>: "á" aacute
    # Equivalent to new: {char: á, keysym: aacute, next: __none__}
    a: "á"
    # Implicit entry of multiple characters.
    # Equivalent to legacy: <dead_acute> <q>: "q́"
    # Equivalent to new: {string: "q́", keysym: __none__, next: __none__}
    q: "q́"
    # Explicit entry of one character without keysym.
    # Equivalent to legacy: <dead_acute> <e>: "é" eacute
    # Equivalent to new: {char: é, keysym: eacute, next: __none__}
    e: {char: "é"}
    # Explicit entry of one character with keysym.
    # Equivalent to legacy: <dead_acute> <i>: "í" iacute
    # Equivalent to new: {char: í, keysym: iacute, next: __none__}
    i: {char: "í", keysym: iacute}
    # Explicit entry of multiple characters.
    # Equivalent to legacy: <dead_acute> <x>: "x́"
    # Equivalent to new: {string: "x́", keysym: __none__, next: __none__}
    x: {string: "x́"}
    # Chained dead key
    # Equivalent to legacy:
    #   <dead_acute> <dead_macron> <e>: "ḗ" U1E17
    #   <dead_acute> <dead_macron> <o>: "ṓ" U1E53
    # Equivalent to new: {char: __none__, next: macron_and_acute}
    dead_macron: {next: macron_and_acute}
    # Sequences (avoid creating explicit intermediate states, e.g. “double_acute”)
    # Equivalent to legacy: <dead_acute> <dead_acute> <o>: "ő" odoubleacute
    dead_acute o: "ő" # U+0151 LATIN SMALL LETTER O WITH DOUBLE ACUTE
    # Equivalent to legacy: <dead_acute> <dead_acute> <u>: "ű" udoubleacute
    dead_acute u: "ű" # U+0171 LATIN SMALL LETTER U WITH DOUBLE ACUTE
    # Loop. Equivalent to: {next: acute}
    # No legacy equivalent
    dead_acute dead_acute: {next: __loop__}
    # TODO: how to handle overlaps?
    dead_acute dead_acute o: 🦧
    # Wildcard (aka “terminator”): match any input.
    # Here we match any input, then discard it and stop.
    # This is the default behaviour (no need to set it) and
    # corresponds to the legacy behaviour.
    _: {next: __none__}
macron_and_acute:
  # NOTE: custom state (no associated keysym)
  feedback: "\u02DD" # U+02DD DOUBLE ACUTE ACCENT
  transitions:
    e: "ḗ" # U+1E17 LATIN SMALL LETTER E WITH MACRON AND ACUTE
    o: "ṓ" # U+1E53 LATIN SMALL LETTER O WITH MACRON AND ACUTE
    # Wildcard: match any input, discard it, output "\u02DD" and stop
    _: {char: "\u02DD"}
compose:
  keysym: Multi_key
  transitions:
    # Some classical XCompose sequences
    period period: "…"
    period minus: "·"
    period equal: "•"
    f o r a l l: "∀" # U+2200 FOR ALL
    # Chained dead key (level 1)
    m: {next: math}
math:
  keysym: 0x11000000 # custom keysym
  transitions:
    # Chained dead keys (level 2)
    i: {next: math-italic}
    b: {next: math-bold}
    s: {next: math-double-struck}
    # Wildcard: match any input, output it unchanged, then stop
    _: {keysym: __input__}
math-italic:
  transitions:
    a: {char: "𝑎", next: __loop__}
    i: {char: "𝑖", next: __loop__}
    # Wildcard: match any input, output it unchanged, then loop
    _: {keysym: __input__, next: __loop__}
math-bold:
  transitions:
    a: "𝐚"
    i: "𝐢"
    # Wildcard with built-in filters
    _:
      # Discard but keep looping
      - {filter: __letter__, next: __loop__}
      # Output unchanged and loop
      - {filter: __number__, keysym: __input__, next: __loop__}
      - {filter: __punctuation__, keysym: __input__, next: __loop__}
      # Output unchanged and stop
      - {keysym: __input__}
math-double-struck:
  feedback: 𝔸
  transitions:
    e: {char: 𝕖}
    E: {char: 𝔼, keysym: U1D53C}
--- # Start a new YAML document
# Include locale Compose
!include "%L"
--- # Start a new YAML document
# Include custom Compose file
!include "%H/path/to/other-compose-file"
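
To make the semantics of `next`, `__loop__`, `__input__` and the wildcard more concrete, here is a minimal Python sketch of how an engine might walk such states. Everything in it is an assumption for illustration: the names `step`, `feed` and `FILTERS`, the already-expanded dictionary form of the entries, the use of a one-character key name as its own output text, and the meaning of the built-in filters, which this draft does not define yet.

```python
import string
from typing import Dict, List, Optional, Tuple

NONE, LOOP, INPUT = "__none__", "__loop__", "__input__"

# Assumed filter semantics; the draft does not specify them.
FILTERS = {
    "__letter__": str.isalpha,
    "__number__": str.isdigit,
    "__punctuation__": lambda key: key in string.punctuation,
}

# Subset of the documented example, with shorthand entries expanded.
STATES: Dict[str, dict] = {
    "compose": {"transitions": {"m": {"next": "math"}}},
    "math": {"transitions": {
        "i": {"next": "math-italic"},
        "b": {"next": "math-bold"},
        "_": {"keysym": INPUT},
    }},
    "math-italic": {"transitions": {
        "a": {"char": "𝑎", "next": LOOP},
        "i": {"char": "𝑖", "next": LOOP},
        "_": {"keysym": INPUT, "next": LOOP},
    }},
    "math-bold": {"transitions": {
        "a": {"char": "𝐚"},
        "_": [
            {"filter": "__letter__", "next": LOOP},
            {"filter": "__number__", "keysym": INPUT, "next": LOOP},
            {"filter": "__punctuation__", "keysym": INPUT, "next": LOOP},
            {"keysym": INPUT},
        ],
    }},
}


def resolve_wildcard(rule, key: str) -> dict:
    """Pick the first wildcard entry whose filter accepts the key."""
    candidates = rule if isinstance(rule, list) else [rule]
    for entry in candidates:
        name = entry.get("filter")
        if name is None or FILTERS[name](key):
            return entry
    return {}


def step(states: dict, current: str, key: str) -> Tuple[str, Optional[str]]:
    """Return (output, next_state); next_state is None when composing stops."""
    transitions = states[current]["transitions"]
    entry = transitions.get(key)
    if entry is None:
        # No explicit match: fall back to the wildcard, or discard and stop.
        entry = resolve_wildcard(transitions.get("_", {}), key)
    if "char" in entry:
        output = entry["char"]
    elif entry.get("keysym") == INPUT:
        output = key
    else:
        output = ""
    target = entry.get("next", NONE)
    if target == LOOP:
        return output, current
    if target == NONE:
        return output, None
    return output, target


def feed(states: dict, start: str, keys: List[str]) -> str:
    """Run keys through the machine; keys typed after it stops pass through."""
    out, state = [], start
    for key in keys:
        if state is None:
            out.append(key)
            continue
        output, state = step(states, state, key)
        out.append(output)
    return "".join(out)
```

For instance, `feed(STATES, "compose", list("miaix"))` enters `math-italic` and keeps looping, while in `math-bold` a space falls through to the last wildcard entry, is output unchanged and ends composing.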

Partially converted en_US.UTF-8/Compose:

compose version: 2
---
acute:
  keysym: dead_acute
  feedback: "´"
  transitions:
    space: "'"
    dead_acute: "´"
    A: Á
    E: É
    I: Í
    J: "J́" # LATIN CAPITAL LETTER J plus COMBINING ACUTE
    O: Ó
    #
    dead_diaeresis: {next: diaeresis_and_acute}
    Multi_key quotedbl: {next: diaeresis_and_acute}
    Udiaeresis: Ǘ # LATIN CAPITAL LETTER U WITH DIAERESIS AND ACUTE
    #
    dead_abovering: {next: abovering_and_acute}
    Multi_key o: {next: abovering_and_acute}
    Aring: "Ǻ" # LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE
    #
diaeresis_and_acute:
  transitions:
    space: "΅" # GREEK DIALYTIKA TONOS
    U: Ǘ # LATIN CAPITAL LETTER U WITH DIAERESIS AND ACUTE
    #
abovering_and_acute:
  transitions:
    A: "Ǻ" # LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE
    #
cedilla:
  keysym: dead_cedilla
  transitions:
    space: "¸"
    c: ç
    C: Ç
    #
compose:
  keysym: Multi_key
  transitions:
    apostrophe: {next: acute}
    comma: {next: cedilla}
    # TODO: Check how to handle the following (overlapping with previous, because unrelated)
    #       Maybe use: `comma: {filter: __letter__, next: cedilla}` ?
    comma apostrophe: "‚" # SINGLE LOW-9 QUOTATION MARK
    comma quotedbl: "„" # DOUBLE LOW-9 QUOTATION MARK
    comma minus: "¬" # NOT SIGN
    #
#

X11 data

We could reuse the new format for compose files templates in the libX11 repository:

  • We convert all the legacy files into the new format.
  • We write a script to translate the new format to the legacy format.
  • We could add additional fields for development; they are then filtered to output the new format.
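
The translation script mentioned above might start from something like the following Python sketch. It is an assumption-laden illustration, not a finished tool: `legacy_lines` is a hypothetical name, the states are assumed to be already loaded into dictionaries, and entries with no legacy equivalent (wildcards, loops, output-then-continue) are simply skipped.

```python
from typing import FrozenSet, Iterator, List


def legacy_lines(states: dict, name: str, prefix: List[str],
                 visiting: FrozenSet[str] = frozenset()) -> Iterator[str]:
    """Yield legacy Compose lines for the sequences reachable from a state."""
    visiting = visiting | {name}
    for keys, entry in states[name]["transitions"].items():
        if keys == "_":
            continue  # wildcards cannot be expressed in the legacy format
        if isinstance(entry, str):
            entry = {"string": entry}  # expand the implicit shorthand
        sequence = prefix + keys.split()
        text = entry.get("char") or entry.get("string")
        target = entry.get("next", "__none__")
        if target == "__none__":
            if text is not None:
                lhs = " ".join(f"<{k}>" for k in sequence)
                line = f'{lhs} : "{text}"'
                if "keysym" in entry:
                    line += f' {entry["keysym"]}'
                yield line
        elif target in states and text is None and target not in visiting:
            # Chained state: keep extending the same key sequence.
            yield from legacy_lines(states, target, sequence, visiting)
        # Anything else (loops, output-then-continue) is dropped.


STATES = {
    "acute": {"keysym": "dead_acute", "transitions": {
        "a": {"char": "á", "keysym": "aacute"},
        "dead_macron": {"next": "macron_and_acute"},
        "dead_acute o": "ő",
        "dead_acute dead_acute": {"next": "__loop__"},
        "_": {"next": "__none__"},
    }},
    "macron_and_acute": {"transitions": {"e": "ḗ"}},
}
lines = list(legacy_lines(STATES, "acute", ["dead_acute"]))
```

Each state with a `keysym` would be used as a root, with that keysym as the initial prefix. A real script would also need to report which entries were dropped so that the loss of features is visible.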
@whot
Contributor

whot commented Jan 9, 2024

A fairly high-level question: at what point does compose become an input method? And would a more complex compose be better left to an input method implementation (e.g. ibus)?

AFAICT XKB compose sequences pretty much pre-date input methods because back in the early 1990s there was little consideration of CJK languages (and others that need IM). But modern desktops enable IM by default, e.g. I always get annoyed when I have a new GNOME session and my shortcuts produce emojis instead.

So, without going into technical details I would probably argue that putting compose into IM implementations might be a more scalable approach?

@wismill
Member Author

wismill commented Jan 9, 2024

@whot

A fairly high-level question: at what point does compose become an input method?

It is indeed an input method (see definition on Wikipedia). Maybe the oldest one?

And would a more complex compose be better left to an input method implementation (e.g. ibus)?

In fact it is already implemented as an input method in Gtk and Qt. But while Qt uses xkbcommon implementation underneath, Gtk decided to go its own way in ibus. I am not sure why they took this decision and if this is a definitive one. Could it be that compose support in ibus predates compose support in xkbcommon?

e.g. ibus

I really dislike ibus. I use the Plasma desktop and it does not integrate well. It is really Gnome-focused. Not mentioning that it is not efficient (cpu, memory). But I admit it does a few things better: support for overlapping Compose sequences (see #398 to implement this in xkbcommon and in Qt) and support of Ctrl+U for Unicode code points input.

So, without going into technical details I would probably argue that putting compose into IM implementations might be a more scalable approach?

I wish we could have a reference implementation of the Compose machinery for all input method frameworks. I think my proposal is not disruptive (apart from the new text format): Compose is in essence a state machine, and I would like to lift some of its current limitations.

I see the following next steps:

  • Clarify the differences of the implementation of Compose in ibus and xkbcommon.
  • Ask if ibus devs would be interested in a shared implementation with the current features.
  • Ask if ibus devs would be interested in a shared implementation with the new features, maybe independent of the current respective implementations.
  • Is https://github.com/xkbcommon a good place to develop this hypothetical new library?

@whot
Contributor

whot commented Jan 15, 2024

Gtk decided to go its own way in ibus [...] Could it be that compose support in ibus predates compose support in xkbcommon?

yep, GTK compose handling pre-dates libxkbcommon by... quite a number of years :)

Is https://github.com/xkbcommon a good place to develop this hypothetical new library?

AIUI libxkbcommon is on github because at the time it was the only git forge (freedesktop was still on bugzilla + ssh-git). xkbcommon is also severely lacking developer time, so it may be better hosting this "closer" to the users to take advantage of the user set (or even freedesktop gitlab). But otherwise I don't see a reason why not to host this in this namespace.

@wismill
Member Author

wismill commented Jan 15, 2024

@whot Has anyone proposed a unified implementation (parsing, state handling)? I could not find any evidence of one after a quick check. Any advice on how to start the discussion with Gtk devs?

@whot
Contributor

whot commented Jan 15, 2024

Has anyone proposed a unified implementation (parsing, state handling)?

Not that I know of, but let's see if @ebassi is listening (and can answer the second question) :)

@ebassi

ebassi commented Jan 15, 2024

Input methods are an area of computing and UX where people have Strong Opinions™, especially when it comes to workflow issues; for instance, you'll often hear something to the effect of "I really dislike [project]", for one reason or another.

[Ibus] is really Gnome-focused.

It's actually the other way around: about 12 years ago, GNOME picked a single input method framework for a variety of reasons, and then designed the whole thing around it, instead of just letting people choose and avoiding to commit to a specific UX. Of course, it's not without strife: ibus has its own shortcomings, mainly at the intersection of deeply entrenched workflows (see above, re: xkcd) and UI design.

Has anyone proposed a unified implementation (parsing, state handling)?

Not that I know of.

GTK has its own XCompose file parser, because it can handle compose sequences internally and people wanted to keep their custom files from 30 years ago working even without ibus; internally it's implemented as part of the "simple" input method object, which is used to handle things like Unicode and dead keys when ibus is not available, or on non-Unix platforms, like Windows and macOS.

@ebassi

ebassi commented Jan 15, 2024

As a side note: GTK isn't going to drop parsing the existing XCompose files, but we're not going to add a new compose file format, especially one using YAML. If a new format is defined, support for it will have to be implemented inside ibus or inside a separate, out of tree input method module for GTK.

@wismill
Member Author

wismill commented Jan 15, 2024

@ebassi thank you, this is insightful!

you'll often hear something to the effect of "I really dislike [project]", for one reason or another.

Sorry about that. It was not constructive.

GTK has its own XCompose file parser, because it can handle compose sequences internally and people wanted to keep their custom files from 30 years ago working even without ibus; internally it's implemented as part of the "simple" input method object, which is used to handle things like Unicode and dead keys when ibus is not available, or on non-Unix platforms, like Windows and macOS.

So the conclusion is probably: “if it is not broken, do not fix it”. Fair enough, let’s keep multiple implementations.

but we're not going to add a new compose file format, especially one using YAML.

What is the issue with YAML in this case? I would not mind to use another format. Have you another format in mind?

If a new format is defined, support for it will have to be implemented inside ibus or inside a separate, out of tree input method module for GTK.

So the only way to enhance Compose seems to develop a dedicated library for the new format, then build an engine for both Qt IM and iBus?

@ebassi

ebassi commented Jan 15, 2024

What is the issue with YAML in this case? I would not mind to use another format. Have you another format in mind?

The issue with YAML is that we don't have a parser for it, and adding libyaml as a dependency to GTK is not going to happen.

For out of tree input methods we don't have the same restrictions, of course.

So the only way to enhance Compose seems to develop a dedicated library for the new format, then build an engine for both Qt IM and iBus?

That would be my recommendation.

@wismill
Member Author

wismill commented Jan 20, 2024
