Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: normalize emoji characters for consistency #37

Merged
merged 5 commits into from
Jun 15, 2019

Conversation

christianbundy
Copy link
Contributor

@christianbundy christianbundy commented Jun 13, 2019

this means that both :coffee: and will render the same way! Previously :coffee: was an image and wasn't, which caused some funky bugs.

@cinnamon-bun
Copy link

cinnamon-bun commented Jun 13, 2019

Summary - replace literal emojis with :emojicodes: when converting markdown to HTML, and then the codes all get rendered as images?

Pro:

  • Will render the same on all platforms, probably solves some Linux emoji font problems
  • All emojis look the same now
  • Matches the emojis in our emoji autocompletion widget

Possible Cons:

  • Copying text now gets you :codes: instead of literal emojis. Not a dealbreaker.
  • Possible impact on screen readers? I tested OSX's text to speech on Patchwork and it works on literal and image emojis.

Big Worry:

  • node-emoji doesn't support skintone or gender modifiers (#57) and it fails to detect emojis with modifiers (#59). It might erase any skintone-modified emojis

We don't have skintone modifiers in our emojicode completion widget so those can only be entered by people typing them as literal emojis. I usually type emojis as literals on OSX (with ctrl-cmd-space) and people on phones probably do also.

I'm not set up to build and test this code right now - can someone see what happens with compound emojis? Here's one to test with which is a sequence of [man, skintone-modifier, microphone]: Dark Skinned Male Singer 👨🏾‍🎤

On recent platforms it renders as a single character with rockstar hair. It's ok enough if it renders as 2 emojis [dark-skinned-man, microphone].

@christianbundy
Copy link
Contributor Author

Summary - replace literal emojis with :emojicodes: when converting markdown to HTML, and then the codes all get rendered as images?

Sorry, I could've been more specific in my description. This replaces literal emojis with :emojicodes: on input, that way they get handled the same way that emojicodes do. This module has hooks so that the client can decide what to do with these emoji codes -- currently Patchwork and Patchbay are converting to images, but I'm working on an improvement to switch them to use Unicode characters (yay, accessibility!).

It might erase any skintone-modified emojis

This is a huge concern. thanks for pointing this out! I was able to test and I can confirm that it does not erase compound emoji, it just leaves them alone as-is.

Before

  • Independent emoji are rendered by the system font
  • Compound emoji are rendered by the system font
  • Emojicodes are rendered by the client

After

  • Compound emoji are rendered by the system font
  • Independent emoji and emojicodes are rendered by the client

@christianbundy
Copy link
Contributor Author

christianbundy commented Jun 13, 2019

Just pushed a commit that now supports compound emoji!

Input

Dark Skinned Male Singer 👨🏾‍🎤
ssbMd.block(input, { emoji: e => `<span class="Emoji">${e}</span>` })

Output

<p>Dark Skinned Male Singer <span class="Emoji">👨🏾‍🎤</span></p>

Dark Skinned Male Singer 👨🏾‍🎤

🎉 🎉 🎉

@cinnamon-bun
Copy link

🎉 Awesome, yay! Which OS did you test on? I'm worried that this might break on one of the OSs for some reason but I don't understand the layers very well.

Besides that, 👍

@Powersource
Copy link
Contributor

I'm confused by this. I'm admittedly pretty tired today. Is "This replaces literal emojis with :emojicodes: on input, that way they get handled the same way that emojicodes do." anything more than an implementation detail? And I don't like that they're all converted to images (instead of emoji+noto) but that's what the patchwork PR is "fixing", if I understand correctly?

@christianbundy
Copy link
Contributor Author

Which OS did you test on?

I tested on Linux, but I since the tests are happening on string literals I don't think those should change between operating systems. It's possible that there's some operating system that will generate a funky byte sequence that we don't read as emoji, but worse case scenario we should fall back to the system emoji font.

[...] anything more than an implementation detail?

I think that sentence had info about both. Doing a transform on input is an implementation detail, but unicode characters being handled the same way as shortcodes is a behavior change.

And I don't like that they're all converted to images (instead of emoji+noto) but that's what the patchwork PR is "fixing", if I understand correctly?

Yep! This PR gives the client the option to handle unicode characters as emoji, which they could use to reference an image (boo) or they could just make sure the emoji font is available for those characters (woo).

@cinnamon-bun
Copy link

I won't block this change ❤️ but I'm realizing don't understand it yet.

Is this is the data flow for post content? Where is this new conversion-to-shortcodes happening?
Screen Shot 2019-06-14 at 11 16 21 AM

It seems safer to me to normalize only during display and not during input (e.g. on edges e and d, not a b or c).

If we're changing the input (affecting the data that gets into the SSB message) it seems safer to use unicode chars as the canonical format because

  1. The shortcode libraries generally lag behind the unicode emoji standard
  2. The unicode emoji standard is more well-defined than shortcode names, which might not match across different clients

But I just showed up here with no context and you've been thinking about this for a while! Don't let me stop you 🌱

digraph G {
  rankdir=LR;
  Emoji_widget -> Input_box_memory [label=a]
  User_typing -> Input_box_memory [label=b]
  Input_box_memory -> SSB_message [label=c]
  SSB_message -> Post_display [label=d]
  Input_box_memory -> Input_box_display [label=e]
}

@christianbundy
Copy link
Contributor Author

Thanks for being so thoughtful about this change!

This library only renders Markdown posts as HTML, it doesn't change anything on text input.When I mentioned "on input" above I only meant "on input into this library", although I totally understand how that sounds like "on input into an SSB composer". Using your graph, I think this library only handles c and d, where it just displays Markdown as HTML in the client.

I agree that it would be best to normalize as actual emoji characters in the composer before publishing, but I don't think there's a way to do that in this module.

@cinnamon-bun
Copy link

Aha, I was confused! Sorry for derailing. I've read the code now.

So it's about to render Markdown to HTML. First it converts unicode emoji to shortcodes...

const normalized = emoji.unemojify('' + text || '')

Renders to HTML...

let result = md.render(normalized)

Then it looks for emojis using emojiRegex, and runs the custom emoji handler function (from opts) on them. But I think emojiRegex only looks for unicode emoji chars, and we converted all of those to shortcodes already, so does this part ever run?

if (config.emoji) {
    const regex = emojiRegex()
    result = result.replace(regex, x => config.emoji(emoji.which(x) || x))
}

It would be nice to document in README what kind of input the emoji handler function is taking, unicode or shortcodes, and to explain that if you don't provide an emoji handler then the output will have everything converted to shortcodes. (Is that a change to previous behavior when there's no emoji handler?)

I'm afraid I'm being a drag on the process here, please ignore me if you want ❤️

This wasn't actually being used like I thought it was, huge thanks to
@cinnamon-bun for taking the time to point this out! Yay, fewer moving
parts.

Warning: super similar code in lib/block and lib/inline that should
probably be refactored in the future. Not similar enough that it's easy
to refactor right now. (Read: energy dwindling at the expense of best
practices.)
@christianbundy
Copy link
Contributor Author

You're right! That part wasn't being used at all, I've just removed it and pushed a commit with comments.

There was previously an emoji handler that worked the same way, but it would be really nice to have more of that info in the readme. Previously config.emoji() took a shortcode name like "coffee" as input, but for compound emoji without shortcode names this just passes the emoji itself. Maybe this should be pushed as a major version just in case?

@cinnamon-bun
Copy link

Major version bump sounds like a good idea. There are a lot of things depending on this package

@christianbundy
Copy link
Contributor Author

Cool! I'll merge and publish as a major version. Thanks for all of your feedback on this PR! I know it takes loads of time to discuss code and I really appreciate how much time you've spent thinking about this. ❤️

@christianbundy christianbundy merged commit 56b2bb6 into master Jun 15, 2019
christianbundy added a commit to ssbc/patchwork that referenced this pull request Jun 16, 2019
Previously when you selected a suggestion it would add the shortcode to
the composer, which was *fine* but means that we all have to agree on
shortcodes. This changes it so that the emoji suggestions put the actual
emoji into the composer.

Note that this doesn't stop someone from manually typing in `:ghost:` or
something, but maybe in the future we could replace those shortcodes
with actual emoji so that we're never publishing shortcodes to our
feeds.

Originally brought up by @cinnamon-bun here:

> If we're changing the input (affecting the data that gets into the SSB
> message) it seems safer to use unicode chars as the canonical format because
>
> 1. The shortcode libraries generally lag behind the unicode emoji standard
> 2. The unicode emoji standard is more well-defined than shortcode names,
>    which might not match across different clients
>
> -- ssbc/ssb-markdown#37 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants