Stephen Brennan edited this page Apr 13, 2024 · 7 revisions

Performance

JVM startup

Every invocation of signal-cli ... starts a Java Virtual Machine instance. This can take anywhere from a second to a long time, depending on the system.

To avoid it, signal-cli can be started as a daemon, with subsequent commands sent to it through DBus. See using dbus-send.

There is also an experimental native library build using GraalVM that avoids JVM startup. See the GraalVM section in the README.
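To judge whether startup time matters on a given system, it can help to measure it. The following is a minimal Python sketch, not part of signal-cli. The helper name time_command is hypothetical, and it assumes that signal-cli --version works on your installation and does little besides starting up. A trivial Python process is timed first as a non-JVM baseline.

```python
import shutil
import subprocess
import sys
import time
from typing import Sequence


def time_command(argv: Sequence[str]) -> float:
    """Run a command once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(
        list(argv),
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # only the duration matters here, not the exit code
    )
    return time.perf_counter() - start


# Baseline: a process that starts and exits immediately, without a JVM.
baseline = time_command([sys.executable, "-c", "pass"])
print(f"python baseline: {baseline:.2f} s")

# Assumption: `signal-cli --version` is available and is mostly startup cost.
if shutil.which("signal-cli"):
    elapsed = time_command(["signal-cli", "--version"])
    print(f"signal-cli --version: {elapsed:.2f} s")
else:
    print("signal-cli not found on PATH")
```

If the second number is large compared to the baseline, running signal-cli as a daemon is likely to help.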

System entropy

Some signal-cli operations require that enough system entropy (randomness) is available.

Available entropy can be monitored with

watch cat /proc/sys/kernel/random/entropy_avail

If it reaches zero while signal-cli ... is running, signal-cli will block until enough entropy is available.

System entropy is more likely to get depleted on headless servers. Some programs can increase the amount of available entropy, e.g. haveged.
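The same value can be sampled from a script, for example to log the lowest reading while signal-cli runs in another terminal. This is a minimal sketch, assuming a Linux system that exposes the file used by the watch command above. The function names are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional

# Linux-specific: the same file the `watch` command above reads.
ENTROPY_PATH = Path("/proc/sys/kernel/random/entropy_avail")


def parse_entropy(text: str) -> int:
    """Parse the file content, a single integer followed by a newline."""
    return int(text.strip())


def read_entropy(path: Path = ENTROPY_PATH) -> Optional[int]:
    """Return the current reading, or None if it cannot be read or parsed."""
    try:
        return parse_entropy(path.read_text())
    except (OSError, ValueError):
        return None


def lowest_entropy(
    samples: int = 10, interval: float = 0.5, path: Path = ENTROPY_PATH
) -> Optional[int]:
    """Poll the file `samples` times and return the smallest reading seen."""
    readings = []
    for _ in range(samples):
        value = read_entropy(path)
        if value is not None:
            readings.append(value)
        time.sleep(interval)
    return min(readings) if readings else None


# Prints an integer on Linux, or None where the file does not exist.
print(read_entropy())
```

Calling lowest_entropy() while a slow signal-cli command is running shows whether the value drops to zero during that command.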

See also

#351 with links to past discussions.

Error parsing arguments

The signal-cli program has a set of options, and each subcommand may have its own options. The correct order is:

signal-cli [SIGNAL-CLI_OPTIONS] [SUBCOMMAND] [SUBCOMMAND_OPTIONS]

So, for example:

$ signal-cli receive --output=json
signal-cli: error: unrecognized arguments: '--output=json'

The --output=... option belongs to the main program, and needs to be in front of the receive subcommand:

$ signal-cli --output=json receive 

This is Argparse4j's behavior.

https://github.com/AsamK/signal-cli/issues/1504
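When signal-cli is launched from a script, keeping the main program options and the subcommand options in separate lists makes the order hard to get wrong. A minimal Python sketch follows. The helper name build_signal_cli_argv is hypothetical and not part of signal-cli.

```python
from typing import List, Sequence


def build_signal_cli_argv(
    global_options: Sequence[str],
    subcommand: str,
    subcommand_options: Sequence[str] = (),
) -> List[str]:
    """Assemble arguments in the order signal-cli expects:
    signal-cli [SIGNAL-CLI_OPTIONS] [SUBCOMMAND] [SUBCOMMAND_OPTIONS]
    """
    return ["signal-cli", *global_options, subcommand, *subcommand_options]


# --output=json belongs to the main program, so it goes before `receive`.
argv = build_signal_cli_argv(["--output=json"], "receive")
print(" ".join(argv))  # signal-cli --output=json receive
```

The resulting list can be passed directly to subprocess.run.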

DBus errors when starting daemon

See Troubleshooting DBus.

String Indexing Units

String indexing is required to properly interpret text formatting as well as mentions. Both involve specifying a substring using start and length. The units for string indexing are UTF-16 code units, not Unicode code points! This comes from the Signal protocol and the behavior of Android/Java.

Each Unicode character whose code point is within the Basic Multilingual Plane (that is, whose code point is less than 0x10000) is represented by one UTF-16 code unit. Characters with a code point greater than or equal to 0x10000 (such as most emoji) are represented by two UTF-16 code units, known as a surrogate pair. To illustrate, consider the string 0💩1💩2💩3. This string consists of 7 Unicode code points, but 10 UTF-16 code units (because the emoji U+1F4A9 is beyond the BMP, so each instance counts for two code units). So the following are substrings:

  • start: 0, length: 3 - 0💩
  • start: 1, length: 3 - 💩1
  • start: 2, length: 3 - invalid (starts in the middle of a surrogate pair)
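The substrings above can be reproduced in Python by encoding the string as UTF-16, where every code unit is exactly two bytes. This is a minimal sketch. The function names are hypothetical, and the emoji is written as the escape \U0001F4A9 so the example does not depend on the file encoding.

```python
def utf16_length(string: str) -> int:
    """Number of UTF-16 code units in the string."""
    return len(string.encode("utf-16-le")) // 2


def utf16_slice(string: str, start: int, length: int) -> str:
    """Return the substring given by start and length in UTF-16 code units.

    Raises IndexError if the range is outside the string, and
    UnicodeDecodeError if the range cuts a surrogate pair in half.
    """
    encoded = string.encode("utf-16-le")  # two bytes per code unit, no BOM
    if start < 0 or length < 0 or 2 * (start + length) > len(encoded):
        raise IndexError("UTF-16 range outside of string")
    return encoded[2 * start : 2 * (start + length)].decode("utf-16-le")


text = "0\U0001F4A91\U0001F4A92\U0001F4A93"
print(len(text), utf16_length(text))  # 7 10
print(utf16_slice(text, 0, 3) == "0\U0001F4A9")  # True
print(utf16_slice(text, 1, 3) == "\U0001F4A91")  # True
try:
    utf16_slice(text, 2, 3)
except UnicodeDecodeError:
    print("invalid: range splits a surrogate pair")
```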

Users of programming languages that index strings by Unicode code points (e.g. Python) need to convert indices carefully. For example, this Python function converts a UTF-16 string index to a Unicode code point index:

>>> def utf16_to_unicode(string: str, utf16_index: int) -> int:
...     # Count consumed code points so an index at the very end of the string works too.
...     unicode_index = 0
...     for c in string:
...         if utf16_index <= 0:
...             break
...         # Characters beyond the BMP occupy two UTF-16 code units.
...         utf16_index -= 2 if ord(c) >= 0x10000 else 1
...         unicode_index += 1
...     if utf16_index < 0:
...         raise IndexError("UTF-16 index breaks surrogate pair")
...     elif utf16_index > 0:
...         raise IndexError("UTF-16 index past end of string")
...     else:
...         return unicode_index
>>> utf16_to_unicode("0💩1💩2💩3", 0)
0
>>> utf16_to_unicode("0💩1💩2💩3", 9)
6
>>> utf16_to_unicode("0💩1💩2💩3", 1)
1
>>> utf16_to_unicode("0💩1💩2💩3", 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 11, in utf16_to_unicode
IndexError: UTF-16 index breaks surrogate pair

>>> utf16_to_unicode("0💩1💩2💩3", 3)
2
>>> utf16_to_unicode("0💩1💩2💩3", 10)
7
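The opposite direction is needed when a start and length are computed in Python and must then be expressed in UTF-16 code units. This is a minimal sketch. The function name unicode_to_utf16 is hypothetical, and the emoji is written as the escape \U0001F4A9.

```python
def unicode_to_utf16(string: str, unicode_index: int) -> int:
    """Convert a Unicode code point index into a UTF-16 code unit index."""
    if not 0 <= unicode_index <= len(string):
        raise IndexError("Unicode index out of range")
    # Each code point before the index contributes one or two code units.
    return sum(2 if ord(c) >= 0x10000 else 1 for c in string[:unicode_index])


text = "0\U0001F4A91\U0001F4A92\U0001F4A93"
# The Python slice text[2:4] is the digit 1 followed by the emoji.
start = unicode_to_utf16(text, 2)
end = unicode_to_utf16(text, 4)
print(start, end - start)  # 3 3
```

Here start and end - start are the start and length values in UTF-16 code units for the Python slice text[2:4].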