
Bug: Variable escaping corrupts four-byte unicode characters, e.g. emojis. 🐓 #171

@Taitava


Discussed in #170 (reply in thread)

I happened to find something interesting in the shell command preview:

[Screenshot: shell command preview]

The shell command does not do anything useful, it just echoes the file path plus the emoji. But the preview text is interesting: the emoji that I manually typed at the end of the command is displayed correctly in the gray preview text. In contrast, the emoji that comes from the {{file_path:relative}} variable is corrupted. I suspect the problem is somewhere in the variable parsing logic: something there does not support this kind of special character.

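After inspecting further, I found that this regex is not so kind to unicode characters that are encoded with four bytes, i.e. characters outside the Basic Multilingual Plane, which JavaScript strings store as a pair of two UTF-16 surrogates: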

return this.raw_value.replace(/[^\w\d]/g, (special_character: string) => { // /g means to replace all occurrences instead of just the first one.

The regex splits e.g. 🐓 into two separate characters (the two halves of its surrogate pair) and escapes each half on its own, so the result contains two backquotes ` (PowerShell) or two backslashes \ (Bash/Dash/Zsh).
So, 🐓 becomes: `�`� (PowerShell) or \�\� (Bash/Dash/Zsh).
The correct result would be: `🐓 (PowerShell) or \🐓 (Bash/Dash/Zsh).
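The splitting described above can be reproduced in isolation. This is only a minimal sketch: `escapeWithoutUnicodeFlag` and its `escapeCharacter` parameter are hypothetical names made up for illustration, not the plugin's actual code. Only the regex pattern is taken from the line quoted above.

```typescript
// Hypothetical stand-in for the escaping logic: prefix every character
// that is not a word character or digit with the given escape character.
function escapeWithoutUnicodeFlag(rawValue: string, escapeCharacter: string): string {
    // No /u flag: the pattern is matched against UTF-16 code units,
    // so each half of a surrogate pair is matched (and escaped) on its own.
    return rawValue.replace(/[^\w\d]/g, (specialCharacter: string) => {
        return escapeCharacter + specialCharacter;
    });
}

const rooster = "🐓"; // U+1F413, stored as the surrogate pair \uD83D \uDC13

console.log(rooster.length); // 2 (two UTF-16 code units)

const corrupted = escapeWithoutUnicodeFlag(rooster, "\\");

// Four code units: backslash, high surrogate, backslash, low surrogate.
console.log(corrupted.length); // 4
console.log(corrupted === "\\\uD83D\\\uDC13"); // true
```

The two surrogates are no longer adjacent in the output, so they cannot be rendered as one emoji anymore, which is why the preview shows replacement characters.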

The problem can be fixed by adding a unicode flag to the regex pattern:

- return this.raw_value.replace(/[^\w\d]/g, (special_character: string) => {  // /g means to replace all occurrences instead of just the first one.
+ return this.raw_value.replace(/[^\w\d]/gu, (special_character: string) => {  // /g means to replace all occurrences instead of just the first one. /u means to handle four-byte unicode characters correctly as one character, not as two separate characters.
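To illustrate the effect of the /u flag, here is a small sketch of the same replacement with the flag added. As before, `escapeWithUnicodeFlag` and `escapeCharacter` are hypothetical names used only for this example, and the file path is made up. It assumes a compile target of ES2015 or later, because the /u flag is not available before that.

```typescript
// Hypothetical stand-in for the fixed escaping logic.
function escapeWithUnicodeFlag(rawValue: string, escapeCharacter: string): string {
    // With /u the pattern is matched against whole code points,
    // so a surrogate pair is passed to the callback as one character.
    return rawValue.replace(/[^\w\d]/gu, (specialCharacter: string) => {
        return escapeCharacter + specialCharacter;
    });
}

// Bash/Dash/Zsh style: one backslash before the whole emoji.
const bashEscaped = escapeWithUnicodeFlag("🐓", "\\");
console.log(bashEscaped === "\\🐓"); // true

// PowerShell style: one backquote before the whole emoji.
const powerShellEscaped = escapeWithUnicodeFlag("🐓", "`");
console.log(powerShellEscaped === "`🐓"); // true

// A made-up relative path: ASCII special characters are still escaped as before.
const pathEscaped = escapeWithUnicodeFlag("notes/🐓.md", "\\");
console.log(pathEscaped === "notes\\/\\🐓\\.md"); // true
```

Word characters and digits are left alone in both versions; the flag only changes how characters outside the Basic Multilingual Plane are matched.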

This bug was introduced in version 0.7.0 when implementing #11. Because the bug is in the escaping step, unescaped variable values (the {{! exclamation mark variable }} syntax) are not affected by it.


I'll add the unicode flag to all regex patterns in the whole plugin. I'll compile a list of all the changed regex patterns here.

Commit ffcedc0 fixes the original bug.

Commit b496091 adds the /u modifier to the following other regex patterns:
