How to deal with emoji like 👨🏽‍👩🏽‍👧🏽 #256

lsmith77 · 2023-05-11T07:17:17Z

In principle, Python's Unicode support in str means that any multibyte character is a single character. But an emoji like this one appears to break that rule, which complicates handling it.

In my case, I want to detect such an emoji and propose alternatives with other gender/skin tone compositions.

# Iterating over a str yields one code point at a time
for k, i in enumerate("👨🏽‍👩🏽‍👧🏽"):
    print(f"{k}: {i}")

results in

0: 👨
1: 🏽
2: ‍
3: 👩
4: 🏽
5: ‍
6: 👧
7: 🏽

cvzi · 2023-05-11T07:33:21Z

These kinds of emoji are displayed as a single glyph because of \u200d in Unicode, the zero-width joiner (ZWJ). At positions 2 and 5 there is an invisible \u200d.

If you want to find emoji in a string as a single element, the third-party regex library might be helpful with its \X ("grapheme") pattern:
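To make the invisible joiner visible, the sequence can be listed code point by code point. The following is a minimal sketch using only the standard library's unicodedata module; the emoji is written with escape sequences so the joiners are explicit, and the variable names are illustrative rather than part of any library.

```python
import unicodedata

# The family emoji from the question, spelled out as code points.
family = "\U0001F468\U0001F3FD\u200D\U0001F469\U0001F3FD\u200D\U0001F467\U0001F3FD"

# Pair each code point with its official Unicode name.
names = [unicodedata.name(ch) for ch in family]

for index, (ch, name) in enumerate(zip(family, names)):
    print(f"{index}: U+{ord(ch):04X} {name}")

# Indices of the zero-width joiners that glue the sequence together.
zwj_positions = [i for i, ch in enumerate(family) if ch == "\u200d"]
print(zwj_positions)
```

The joiners show up at the same indices where the question's output printed what looked like empty lines.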

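import regex  # third-party package: pip install regex

# \X matches one extended grapheme cluster; the raw string avoids an invalid-escape warning
for k, i in enumerate(regex.findall(r"\X", "test👨🏽‍👩🏽‍👧🏽string")):
    print(f"{k}: {i}")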

0: t
1: e
2: s
3: t
4: 👨🏽‍👩🏽‍👧🏽
5: s
6: t
7: r
8: i
9: n
10: g

Finding them with this library is limited: the default skin color works, but other skin tones are not supported yet; see #204.
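For the original goal of proposing alternative skin tones, one possible approach is to swap the modifier code points inside an already-isolated cluster. This is only a sketch under stated assumptions: the helper names `strip_skin_tones` and `skin_tone_variants` are hypothetical and not part of this library, it uses only the standard library, it handles skin tone but not gender, and whether each variant renders as a single glyph depends on the platform's fonts.

```python
# Fitzpatrick skin tone modifiers, U+1F3FB through U+1F3FF.
SKIN_TONES = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]


def strip_skin_tones(cluster):
    """Return the cluster with all skin tone modifiers removed (hypothetical helper)."""
    return "".join(ch for ch in cluster if ch not in SKIN_TONES)


def skin_tone_variants(cluster):
    """Return one variant per tone, replacing every existing modifier (hypothetical helper)."""
    return [
        "".join(tone if ch in SKIN_TONES else ch for ch in cluster)
        for tone in SKIN_TONES
    ]


# The family emoji from the question, written as explicit code points.
family = "\U0001F468\U0001F3FD\u200D\U0001F469\U0001F3FD\u200D\U0001F467\U0001F3FD"

variants = skin_tone_variants(family)
for variant in variants:
    print(variant)

# The version without modifiers uses the default skin color.
print(strip_skin_tones(family))
```

Replacing modifiers in place keeps the joiners and base characters untouched, so each variant has the same length as the input. Mixed-tone combinations would need one tone per person rather than a single tone for the whole cluster.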

cvzi mentioned this issue Jun 6, 2023

Add support for Multi-person skintones #259

Merged

TahirJalilov closed this as completed in #259 Jun 8, 2023
