feat: normalize emoji characters for consistency #37

christianbundy · 2019-06-13T16:53:21Z

this means that both :coffee: and ☕ will render the same way! Previously :coffee: was an image and ☕ wasn't, which caused some funky bugs.

cinnamon-bun · 2019-06-13T17:59:33Z

Summary - replace literal emojis with :emojicodes: when converting markdown to HTML, and then the codes all get rendered as images?

Pro:

Will render the same on all platforms, probably solves some Linux emoji font problems
All emojis look the same now
Matches the emojis in our emoji autocompletion widget

Possible Cons:

Copying text now gets you :codes: instead of literal emojis. Not a dealbreaker.
Possible impact on screen readers? I tested OSX's text to speech on Patchwork and it works on literal and image emojis.

Big Worry:

node-emoji doesn't support skintone or gender modifiers (#57) and it fails to detect emojis with modifiers (#59). It might erase any skintone-modified emojis

We don't have skintone modifiers in our emojicode completion widget so those can only be entered by people typing them as literal emojis. I usually type emojis as literals on OSX (with ctrl-cmd-space) and people on phones probably do also.

I'm not set up to build and test this code right now - can someone see what happens with compound emojis? Here's one to test with which is a sequence of [man, skintone-modifier, microphone]: Dark Skinned Male Singer 👨🏾‍🎤

On recent platforms it renders as a single character with rockstar hair. It's ok enough if it renders as 2 emojis [dark-skinned-man, microphone].

christianbundy · 2019-06-13T18:13:35Z

Summary - replace literal emojis with :emojicodes: when converting markdown to HTML, and then the codes all get rendered as images?

Sorry, I could've been more specific in my description. This replaces literal emojis with :emojicodes: on input, that way they get handled the same way that emojicodes do. This module has hooks so that the client can decide what to do with these emoji codes -- currently Patchwork and Patchbay are converting to images, but I'm working on an improvement to switch them to use Unicode characters (yay, accessibility!).

It might erase any skintone-modified emojis

This is a huge concern. thanks for pointing this out! I was able to test and I can confirm that it does not erase compound emoji, it just leaves them alone as-is.

Before

Independent emoji are rendered by the system font
Compound emoji are rendered by the system font
Emojicodes are rendered by the client

After

Compound emoji are rendered by the system font
Independent emoji and emojicodes are rendered by the client

christianbundy · 2019-06-13T21:12:19Z

Just pushed a commit that now supports compound emoji!

Input

Dark Skinned Male Singer 👨🏾‍🎤

ssbMd.block(input, { emoji: e => `<span class="Emoji">${e}</span>` })

Output

<p>Dark Skinned Male Singer <span class="Emoji">👨🏾‍🎤</span></p>

Dark Skinned Male Singer 👨🏾‍🎤

🎉 🎉 🎉

cinnamon-bun · 2019-06-14T02:03:03Z

🎉 Awesome, yay! Which OS did you test on? I'm worried that this might break on one of the OSs for some reason but I don't understand the layers very well.

Besides that, 👍

Powersource · 2019-06-14T10:09:38Z

I'm confused by this. I'm admittedly pretty tired today. Is "This replaces literal emojis with :emojicodes: on input, that way they get handled the same way that emojicodes do." anything more than an implementation detail? And I don't like that they're all converted to images (instead of emoji+noto) but that's what the patchwork PR is "fixing", if I understand correctly?

christianbundy · 2019-06-14T14:40:45Z

Which OS did you test on?

I tested on Linux, but I since the tests are happening on string literals I don't think those should change between operating systems. It's possible that there's some operating system that will generate a funky byte sequence that we don't read as emoji, but worse case scenario we should fall back to the system emoji font.

[...] anything more than an implementation detail?

I think that sentence had info about both. Doing a transform on input is an implementation detail, but unicode characters being handled the same way as shortcodes is a behavior change.

And I don't like that they're all converted to images (instead of emoji+noto) but that's what the patchwork PR is "fixing", if I understand correctly?

Yep! This PR gives the client the option to handle unicode characters as emoji, which they could use to reference an image (boo) or they could just make sure the emoji font is available for those characters (woo).

cinnamon-bun · 2019-06-14T18:29:43Z

I won't block this change ❤️ but I'm realizing don't understand it yet.

Is this is the data flow for post content? Where is this new conversion-to-shortcodes happening?

It seems safer to me to normalize only during display and not during input (e.g. on edges e and d, not a b or c).

If we're changing the input (affecting the data that gets into the SSB message) it seems safer to use unicode chars as the canonical format because

The shortcode libraries generally lag behind the unicode emoji standard
The unicode emoji standard is more well-defined than shortcode names, which might not match across different clients

But I just showed up here with no context and you've been thinking about this for a while! Don't let me stop you 🌱

digraph G {
  rankdir=LR;
  Emoji_widget -> Input_box_memory [label=a]
  User_typing -> Input_box_memory [label=b]
  Input_box_memory -> SSB_message [label=c]
  SSB_message -> Post_display [label=d]
  Input_box_memory -> Input_box_display [label=e]
}

christianbundy · 2019-06-14T19:25:10Z

Thanks for being so thoughtful about this change!

This library only renders Markdown posts as HTML, it doesn't change anything on text input.When I mentioned "on input" above I only meant "on input into this library", although I totally understand how that sounds like "on input into an SSB composer". Using your graph, I think this library only handles c and d, where it just displays Markdown as HTML in the client.

I agree that it would be best to normalize as actual emoji characters in the composer before publishing, but I don't think there's a way to do that in this module.

cinnamon-bun · 2019-06-14T21:00:22Z

Aha, I was confused! Sorry for derailing. I've read the code now.

So it's about to render Markdown to HTML. First it converts unicode emoji to shortcodes...

const normalized = emoji.unemojify('' + text || '')

Renders to HTML...

let result = md.render(normalized)

Then it looks for emojis using emojiRegex, and runs the custom emoji handler function (from opts) on them. But I think emojiRegex only looks for unicode emoji chars, and we converted all of those to shortcodes already, so does this part ever run?

if (config.emoji) {
    const regex = emojiRegex()
    result = result.replace(regex, x => config.emoji(emoji.which(x) || x))
}

It would be nice to document in README what kind of input the emoji handler function is taking, unicode or shortcodes, and to explain that if you don't provide an emoji handler then the output will have everything converted to shortcodes. (Is that a change to previous behavior when there's no emoji handler?)

I'm afraid I'm being a drag on the process here, please ignore me if you want ❤️

@cinnamon-bun

This wasn't actually being used like I thought it was, huge thanks to @cinnamon-bun for taking the time to point this out! Yay, fewer moving parts. Warning: super similar code in lib/block and lib/inline that should probably be refactored in the future. Not similar enough that it's easy to refactor right now. (Read: energy dwindling at the expense of best practices.)

christianbundy · 2019-06-14T23:35:41Z

You're right! That part wasn't being used at all, I've just removed it and pushed a commit with comments.

There was previously an emoji handler that worked the same way, but it would be really nice to have more of that info in the readme. Previously config.emoji() took a shortcode name like "coffee" as input, but for compound emoji without shortcode names this just passes the emoji itself. Maybe this should be pushed as a major version just in case?

cinnamon-bun · 2019-06-15T01:08:24Z

Major version bump sounds like a good idea. There are a lot of things depending on this package

christianbundy · 2019-06-15T02:27:44Z

Cool! I'll merge and publish as a major version. Thanks for all of your feedback on this PR! I know it takes loads of time to discuss code and I really appreciate how much time you've spent thinking about this. ❤️

@cinnamon-bun

Previously when you selected a suggestion it would add the shortcode to the composer, which was *fine* but means that we all have to agree on shortcodes. This changes it so that the emoji suggestions put the actual emoji into the composer. Note that this doesn't stop someone from manually typing in `:ghost:` or something, but maybe in the future we could replace those shortcodes with actual emoji so that we're never publishing shortcodes to our feeds. Originally brought up by @cinnamon-bun here: > If we're changing the input (affecting the data that gets into the SSB > message) it seems safer to use unicode chars as the canonical format because > > 1. The shortcode libraries generally lag behind the unicode emoji standard > 2. The unicode emoji standard is more well-defined than shortcode names, > which might not match across different clients > > -- ssbc/ssb-markdown#37 (comment)

feat: normalize emoji characters for consistency

3bb5811

christianbundy requested review from Powersource and cinnamon-bun June 13, 2019 16:53

christianbundy self-assigned this Jun 13, 2019

christianbundy added 2 commits June 13, 2019 10:53

fix: package-lock.json wasn't updating

b8992a1

refactor: use nodeEmoji.unemojify()

dd57f55

christianbundy mentioned this pull request Jun 13, 2019

feat: improve emoji support ssbc/patchwork#1094

Merged

feat: support compound emoji and fix tests

6d68c1c

cinnamon-bun approved these changes Jun 14, 2019

View reviewed changes

christianbundy merged commit 56b2bb6 into master Jun 15, 2019

christianbundy mentioned this pull request Jun 16, 2019

refactor: encode emoji suggestion as character, not shortcode ssbc/patchwork#1105

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: normalize emoji characters for consistency #37

feat: normalize emoji characters for consistency #37

christianbundy commented Jun 13, 2019 •

edited

Loading

cinnamon-bun commented Jun 13, 2019 •

edited

Loading

christianbundy commented Jun 13, 2019

christianbundy commented Jun 13, 2019 •

edited

Loading

cinnamon-bun commented Jun 14, 2019

Powersource commented Jun 14, 2019

christianbundy commented Jun 14, 2019

cinnamon-bun commented Jun 14, 2019

christianbundy commented Jun 14, 2019

cinnamon-bun commented Jun 14, 2019

christianbundy commented Jun 14, 2019

cinnamon-bun commented Jun 15, 2019

christianbundy commented Jun 15, 2019

feat: normalize emoji characters for consistency #37

feat: normalize emoji characters for consistency #37

Conversation

christianbundy commented Jun 13, 2019 • edited Loading

cinnamon-bun commented Jun 13, 2019 • edited Loading

christianbundy commented Jun 13, 2019

Before

After

christianbundy commented Jun 13, 2019 • edited Loading

Input

Output

cinnamon-bun commented Jun 14, 2019

Powersource commented Jun 14, 2019

christianbundy commented Jun 14, 2019

cinnamon-bun commented Jun 14, 2019

christianbundy commented Jun 14, 2019

cinnamon-bun commented Jun 14, 2019

christianbundy commented Jun 14, 2019

cinnamon-bun commented Jun 15, 2019

christianbundy commented Jun 15, 2019

christianbundy commented Jun 13, 2019 •

edited

Loading

cinnamon-bun commented Jun 13, 2019 •

edited

Loading

christianbundy commented Jun 13, 2019 •

edited

Loading