Fix/make-emojis-larger-in-other-message-contexts #15611

1 change: 1 addition & 0 deletions src/CONST.js
@@ -806,6 +806,7 @@ const CONST = {

// eslint-disable-next-line max-len, no-misleading-character-class
EMOJIS: /[\p{Extended_Pictographic}\u200d\u{1f1e6}-\u{1f1ff}\u{1f3fb}-\u{1f3ff}\u{e0020}-\u{e007f}\u20E3\uFE0F]|[#*0-9]\uFE0F?\u20E3/gu,
EMOJI_SURROGATE: /(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])/,
Contributor

Can you explain what this regex represents? I was not able to figure it out.

Author

The algorithm has to capture the emojis in a text, and in doing so we need to be careful with emojis that are made of surrogate pairs.
In UTF-16, a surrogate pair is two code units that together encode a single character above U+FFFF, which is how most emojis are stored. On top of that, some emojis are several emojis joined by a zero-width joiner (U+200D) and overlaid on each other to produce the final displayed glyph.
An example is the cloud face emoji, which is made up of two different emojis (a face emoji and a cloud emoji) combined together.
If we split the cloud face emoji we end up with the two separate emojis, so the algorithm needs to identify these pairs and joined sequences and combine them back together to preserve the original emoji.
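
A minimal sketch of the two effects described above, runnable in Node. The helper names `codeUnits` and `codePoints` are illustrative and not part of this PR, and the sketch assumes the cloud face emoji is the sequence U+1F636, U+200D, U+1F32B, U+FE0F.

```javascript
// Illustrative helpers (hypothetical names, not from the PR).
// codeUnits: one hex entry per UTF-16 code unit, as produced by split('').
const codeUnits = (s) => s.split('').map((unit) => unit.charCodeAt(0).toString(16));
// codePoints: one hex entry per Unicode code point (Array.from iterates by code point).
const codePoints = (s) => Array.from(s, (char) => char.codePointAt(0).toString(16));

// A single emoji above U+FFFF is stored as a high + low surrogate pair.
const grinningFace = '\u{1F600}';
console.log(codeUnits(grinningFace)); // two code units
console.log(codePoints(grinningFace)); // one code point

// A joined sequence: face + ZERO WIDTH JOINER + fog + variation selector.
const cloudFace = '\u{1F636}\u200D\u{1F32B}\uFE0F';
console.log(codePoints(cloudFace));

// Splitting on the joiner yields the separate emojis mentioned in the comment.
const separated = cloudFace.split('\u200D');
console.log(separated.length);
```

Splitting with `split('')` therefore breaks both the pairs and the joined sequence, which is why the pieces have to be glued back together afterwards.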

Screenshot 2023-03-03 at 6 58 45 PM

Read more on surrogate pair characters

Contributor

@roryabraham Mar 3, 2023

Awesome, thanks for that explanation.

Maybe what we should do then is:

  1. Update the emoji regex to capture chunks of 1 or more emojis
  2. For each chunk containing emojis, use the emoji surrogate regex to capture and "squash" surrogate pairs
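
One way the two steps above could look, as a hedged sketch rather than the PR's implementation. `EMOJI_CHUNK` and `splitIntoChunks` are hypothetical names, and the pattern is a simplified stand-in for `CONST.REGEX.EMOJIS` with a `+` quantifier added. Because the `u` flag makes the regex match whole code points, surrogate pairs inside a chunk stay together without a separate squash step. It was only reasoned about for Node, not for Hermes.

```javascript
// Assumed pattern (not CONST.REGEX.EMOJIS): a run of one or more emoji-related code points,
// including regional indicators, skin-tone modifiers, the zero-width joiner and U+FE0F.
const EMOJI_CHUNK = /(?:\p{Extended_Pictographic}|[\u{1F1E6}-\u{1F1FF}\u{1F3FB}-\u{1F3FF}\u200D\uFE0F])+/gu;

/**
 * Split text into alternating plain-text and emoji chunks.
 * @param {String} text
 * @returns {Array<{text: String, isEmoji: Boolean}>}
 */
function splitIntoChunks(text) {
    const chunks = [];
    if (!text) {
        return chunks;
    }

    // The regex is global, so reset its cursor before each scan.
    EMOJI_CHUNK.lastIndex = 0;
    let lastEnd = 0;
    let match = EMOJI_CHUNK.exec(text);
    while (match !== null) {
        // Anything between the previous match and this one is plain text.
        if (match.index > lastEnd) {
            chunks.push({text: text.slice(lastEnd, match.index), isEmoji: false});
        }
        chunks.push({text: match[0], isEmoji: true});
        lastEnd = match.index + match[0].length;
        match = EMOJI_CHUNK.exec(text);
    }

    // Trailing plain text after the last emoji chunk.
    if (lastEnd < text.length) {
        chunks.push({text: text.slice(lastEnd), isEmoji: false});
    }
    return chunks;
}

console.log(splitIntoChunks('hi \u{1F600}\u{1F600} there'));
```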

Contributor

Also, we should include some comments explaining what this regex does.

Author

@roryabraham

  1. Update the emoji regex to capture chunks of 1 or more emojis
    • How do I do this from your implementation?

Author

@josemak25 Mar 6, 2023

the d flag works for the web but not on android and ios platforms as they both crash

Have logs for this crash? Could this be because the d flag is not implemented in the Hermes JS engine, or something else?

Edit: Yep, it seems that the hasIndices flag is not implemented in Hermes: https://github.com/facebook/hermes/blob/417190297242f14fed540378390d28926c2a7f7e/include/hermes/Regex/RegexTypes.h#L213-L343. Shoot ... did not see that coming.

Edit 2: Created facebook/hermes#932, but we need to decide if we should HOLD this PR on that or not
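
For context, a hedged sketch of how the same information can be obtained without the `d` flag. The commented-out implementation in this PR reads only `reResult.indices[0]`, and for the whole match that pair equals `[match.index, match.index + match[0].length]`. `supportsHasIndices` and `execWithRange` are hypothetical helper names; the feature check relies on the `RegExp` constructor throwing a `SyntaxError` for an unknown flag.

```javascript
/**
 * Detect support for the `d` (hasIndices) flag without using a regex literal,
 * since a literal with an unsupported flag would fail at parse time.
 * @returns {Boolean}
 */
function supportsHasIndices() {
    try {
        // eslint-disable-next-line no-new
        new RegExp('', 'd');
        return true;
    } catch (error) {
        return false;
    }
}

/**
 * Run a regex and return the matched text with its start and end offsets,
 * computed from `index` and the match length instead of `indices`.
 * @param {RegExp} regex
 * @param {String} text
 * @returns {Object|null}
 */
function execWithRange(regex, text) {
    const match = regex.exec(text);
    if (match === null) {
        return null;
    }
    return {text: match[0], start: match.index, end: match.index + match[0].length};
}

console.log(supportsHasIndices());
console.log(execWithRange(/b+/, 'aabbbcc'));
```

This only covers the whole-match range; the offsets of individual capture groups are not recoverable this way.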

@roryabraham @parasharrajat trying to fix this regex issue would prolong the task by a few extra days.
I suggest we go with the former implementation, which worked without needing the regex d flag. If it helps, I will add comments explaining every step of the algorithm and remove all the other functions that aren't needed.

/**
 * Get all the emojis in the message
 * @param {String} text
 * @returns {Array}
 */
function getAllEmojiFromText(text) {
    // return an empty array when no text is passed
    if (!text) {
        return [];
    }

    // ZERO WIDTH JOINER (U+200D) joins several emojis into one displayed emoji, so it is kept inside the emoji chunk
    const zeroWidthJoiner = '\u200D'; // https://codepoints.net/U+200D?lang=en
    const splittedMessage = text.split(''); // one entry per UTF-16 code unit
    const result = [];

    let wordHolder = ''; // accumulates the current run of plain text
    let emojiHolder = ''; // accumulates the current run of emoji code units

    const setResult = (word, isEmoji = false) => {
        // Skip a non-emoji chunk that is exactly one UTF-16 code unit long.
        // Next to an emoji this is presumably an invisible character (for example the variation selector U+FE0F),
        // which renders like an empty string but still has `length === 1`.
        // NOTE: a genuine one-character word such as `i` or `J` also has length 1 and is skipped by this check.
        if (!isEmoji && word.length === 1) {
            return;
        }

        result.push({text: word, isEmoji});
    };

    _.forEach(splittedMessage, (word, index) => {
        if (CONST.REGEX.EMOJI_SURROGATE.test(word) || word === zeroWidthJoiner) {
            setResult(wordHolder);
            wordHolder = '';
            emojiHolder += word;
        } else {
            setResult(emojiHolder, true);
            emojiHolder = '';
            wordHolder += word;
        }

        if (index === splittedMessage.length - 1) {
            setResult(emojiHolder, true);
            setResult(wordHolder);
        }
    });

    // Drop empty chunks; keep only chunks that contain text, including whitespace
    return _.filter(result, res => res.text);
}


/**
 * Checks whether this message contains any emojis
 *
 * @param {String} message
 * @returns {Boolean}
 */
function hasEmojis(message) {
    const splitText = getAllEmojiFromText(message);
    return _.find(splitText, chunk => chunk.isEmoji) !== undefined;
}

/**
 * Validates that this message contains only emojis
 *
 * @param {String} message
 * @returns {Boolean}
 */
function containsOnlyEmojis(message) {
    const splitText = getAllEmojiFromText(message);
    return _.every(splitText, chunk => chunk.isEmoji);
}
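
For comparison, a hedged sketch of the same chunking idea done per code point instead of per UTF-16 code unit. This is not the implementation proposed above: `chunkByCodePoint` is a hypothetical name, and like the surrogate test it treats every code point above U+FFFF as an emoji, which also catches non-emoji characters in that range and misses emojis below it.

```javascript
const ZERO_WIDTH_JOINER = '\u200D';
const VARIATION_SELECTOR_16 = '\uFE0F';

/**
 * Split text into plain-text and emoji chunks, iterating by code point.
 * @param {String} text
 * @returns {Array<{text: String, isEmoji: Boolean}>}
 */
function chunkByCodePoint(text) {
    const result = [];
    if (!text) {
        return result;
    }

    let current = '';
    let currentIsEmoji = false;

    // Push the chunk built so far, if any, and start a new one.
    const flush = () => {
        if (current) {
            result.push({text: current, isEmoji: currentIsEmoji});
        }
        current = '';
    };

    // for...of yields whole code points, so surrogate pairs are never split.
    for (const char of text) {
        const isAstral = char.codePointAt(0) > 0xffff;

        // Joiners and variation selectors only count as emoji when they continue an emoji chunk.
        const isGlue = currentIsEmoji && current !== '' && (char === ZERO_WIDTH_JOINER || char === VARIATION_SELECTOR_16);
        const isEmoji = isAstral || isGlue;

        if (isEmoji !== currentIsEmoji) {
            flush();
            currentIsEmoji = isEmoji;
        }
        current += char;
    }
    flush();
    return result;
}

console.log(chunkByCodePoint('hi \u{1F636}\u200D\u{1F32B}\uFE0F!'));
console.log(chunkByCodePoint('i'));
```

Because chunks are only pushed when non-empty, this variant needs no length-based filter, so a one-character message is kept.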

TAX_ID: /^\d{9}$/,
NON_NUMERIC: /\D/g,
EMOJI_NAME: /:[\w+-]+:/g,
3 changes: 2 additions & 1 deletion src/components/MenuItem.js
@@ -17,6 +17,7 @@ import SelectCircle from './SelectCircle';
import colors from '../styles/colors';
import variables from '../styles/variables';
import MultipleAvatars from './MultipleAvatars';
import TextEmoji from './TextEmoji';

const propTypes = {
...menuItemPropTypes,
@@ -134,7 +135,7 @@ const MenuItem = (props) => {
style={titleTextStyle}
numberOfLines={1}
>
{props.title}
<TextEmoji style={[styles.emojiMessageText, styles.profileEmojiText]}>{props.title}</TextEmoji>
</Text>
)}
{Boolean(props.description) && !props.shouldShowDescriptionOnTop && (
52 changes: 52 additions & 0 deletions src/components/TextEmoji.js
@@ -0,0 +1,52 @@
import React from 'react';
import PropTypes from 'prop-types';
import {View} from 'react-native';
import _ from 'underscore';
import Text from './Text';
import * as EmojiUtils from '../libs/EmojiUtils';
import * as StyleUtils from '../styles/StyleUtils';
import stylePropTypes from '../styles/stylePropTypes';

const propTypes = {
/** The message text to render */
children: PropTypes.string.isRequired,

/** The message text additional style */
style: stylePropTypes,

/** The emoji text additional style */
emojiContainerStyle: stylePropTypes,

/** The plain text additional style */
plainTextContainerStyle: stylePropTypes,
};

const defaultProps = {
style: [],
};

const TextEmoji = (props) => {
const words = EmojiUtils.getAllEmojiFromText(props.children);
const propsStyle = StyleUtils.parseStyleAsArray(props.style);

return _.map(words, ({text, isEmoji}, index) => (isEmoji
? (
<View key={`${text}_${index}`} style={props.emojiContainerStyle}>
<Text style={propsStyle}>
{text}
</Text>
</View>
) : (
<View key={`${text}_${index}`} style={props.plainTextContainerStyle}>
<Text>
{text}
</Text>
</View>
)));
};

TextEmoji.displayName = 'TextEmoji';
TextEmoji.defaultProps = defaultProps;
TextEmoji.propTypes = propTypes;

export default TextEmoji;
244 changes: 190 additions & 54 deletions src/libs/EmojiUtils.js
@@ -1,48 +1,47 @@
import _ from 'underscore';
import lodashOrderBy from 'lodash/orderBy';
import moment from 'moment';
import Str from 'expensify-common/lib/str';
import CONST from '../CONST';
import * as User from './actions/User';
import emojisTrie from './EmojiTrie';

/**
* Get the unicode code of an emoji in base 16.
* @param {String} input
* @returns {String}
*/
const getEmojiUnicode = _.memoize((input) => {
if (input.length === 0) {
return '';
}
// /**
// * Get the unicode code of an emoji in base 16.
// * @param {String} input
// * @returns {String}
// */
// const getEmojiUnicode = _.memoize((input) => {
// if (input.length === 0) {
// return '';
// }

if (input.length === 1) {
return _.map(input.charCodeAt(0).toString().split(' '), val => parseInt(val, 10).toString(16)).join(' ');
}
// if (input.length === 1) {
// return _.map(input.charCodeAt(0).toString().split(' '), val => parseInt(val, 10).toString(16)).join(' ');
// }

const pairs = [];

// Some Emojis in UTF-16 are stored as pair of 2 Unicode characters (eg Flags)
// The first char is generally between the range U+D800 to U+DBFF called High surrogate
// & the second char between the range U+DC00 to U+DFFF called low surrogate
// More info in the following links:
// 1. https://docs.microsoft.com/en-us/windows/win32/intl/surrogates-and-supplementary-characters
// 2. https://thekevinscott.com/emojis-in-javascript/
for (let i = 0; i < input.length; i++) {
if (input.charCodeAt(i) >= 0xd800 && input.charCodeAt(i) <= 0xdbff) { // high surrogate
if (input.charCodeAt(i + 1) >= 0xdc00 && input.charCodeAt(i + 1) <= 0xdfff) { // low surrogate
pairs.push(
((input.charCodeAt(i) - 0xd800) * 0x400)
+ (input.charCodeAt(i + 1) - 0xdc00) + 0x10000,
);
}
} else if (input.charCodeAt(i) < 0xd800 || input.charCodeAt(i) > 0xdfff) {
// modifiers and joiners
pairs.push(input.charCodeAt(i));
}
}
return _.map(pairs, val => parseInt(val, 10).toString(16)).join(' ');
});
// const pairs = [];

// // Some Emojis in UTF-16 are stored as pair of 2 Unicode characters (eg Flags)
// // The first char is generally between the range U+D800 to U+DBFF called High surrogate
// // & the second char between the range U+DC00 to U+DFFF called low surrogate
// // More info in the following links:
// // 1. https://docs.microsoft.com/en-us/windows/win32/intl/surrogates-and-supplementary-characters
// // 2. https://thekevinscott.com/emojis-in-javascript/
// for (let i = 0; i < input.length; i++) {
// if (input.charCodeAt(i) >= 0xd800 && input.charCodeAt(i) <= 0xdbff) { // high surrogate
// if (input.charCodeAt(i + 1) >= 0xdc00 && input.charCodeAt(i + 1) <= 0xdfff) { // low surrogate
// pairs.push(
// ((input.charCodeAt(i) - 0xd800) * 0x400)
// + (input.charCodeAt(i + 1) - 0xdc00) + 0x10000,
// );
// }
// } else if (input.charCodeAt(i) < 0xd800 || input.charCodeAt(i) > 0xdfff) {
// // modifiers and joiners
// pairs.push(input.charCodeAt(i));
// }
// }
// return _.map(pairs, val => parseInt(val, 10).toString(16)).join(' ');
// });
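
The surrogate arithmetic in the function above can be checked in isolation. A small sketch, with `decodeSurrogatePair` as a hypothetical helper name, comparing the manual formula against the built-in `codePointAt`:

```javascript
/**
 * Combine a high and a low surrogate code unit into one code point,
 * using the same formula as getEmojiUnicode.
 * @param {Number} high code unit in 0xD800..0xDBFF
 * @param {Number} low code unit in 0xDC00..0xDFFF
 * @returns {Number}
 */
function decodeSurrogatePair(high, low) {
    return ((high - 0xd800) * 0x400) + (low - 0xdc00) + 0x10000;
}

const sampleEmoji = '\u{1F600}';
const manual = decodeSurrogatePair(sampleEmoji.charCodeAt(0), sampleEmoji.charCodeAt(1));

// Both lines print the same hex value.
console.log(manual.toString(16));
console.log(sampleEmoji.codePointAt(0).toString(16));
```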

/**
* Function to remove Skin Tone and utf16 surrogates from Emoji
@@ -59,27 +58,27 @@ function trimEmojiUnicode(emojiCode) {
* @param {String} message
* @returns {Boolean}
*/
function containsOnlyEmojis(message) {
const trimmedMessage = Str.replaceAll(message.replace(/ /g, ''), '\n', '');
const match = trimmedMessage.match(CONST.REGEX.EMOJIS);
// function containsOnlyEmojis(message) {
// const trimmedMessage = Str.replaceAll(message.replace(/ /g, ''), '\n', '');
// const match = trimmedMessage.match(CONST.REGEX.EMOJIS);

if (!match) {
return false;
}
// if (!match) {
// return false;
// }

const codes = [];
_.map(match, emoji => _.map(getEmojiUnicode(emoji).split(' '), (code) => {
if (!CONST.INVISIBLE_CODEPOINTS.includes(code)) {
codes.push(code);
}
return code;
}));
// const codes = [];
// _.map(match, emoji => _.map(getEmojiUnicode(emoji).split(' '), (code) => {
// if (!CONST.INVISIBLE_CODEPOINTS.includes(code)) {
// codes.push(code);
// }
// return code;
// }));

// Emojis are stored as multiple characters, so we're using spread operator
// to iterate over the actual emojis, not just characters that compose them
const messageCodes = _.filter(_.map([...trimmedMessage], char => getEmojiUnicode(char)), string => string.length > 0 && !CONST.INVISIBLE_CODEPOINTS.includes(string));
return codes.length === messageCodes.length;
}
// // Emojis are stored as multiple characters, so we're using spread operator
// // to iterate over the actual emojis, not just characters that compose them
// const messageCodes = _.filter(_.map([...trimmedMessage], char => getEmojiUnicode(char)), string => string.length > 0 && !CONST.INVISIBLE_CODEPOINTS.includes(string));
// return codes.length === messageCodes.length;
// }

/**
* Get the header indices based on the max emojis per row
@@ -245,6 +244,141 @@ function suggestEmojis(text, limit = 5) {
return [];
}

// /**
// * Validates that this message contains emojis
// *
// * @param {String} message
// * @returns {Boolean}
// */
// function hasEmojis(message) {
// if (!message) {
// return false;
// }
// const trimmedMessage = Str.replaceAll(message.replace(/ /g, ''), '\n', '');

// // return CONST.REGEX.EMOJIS.test(trimmedMessage);
// return Boolean(trimmedMessage.match(CONST.REGEX.EMOJIS));
// }

/**
* Get all the emojis in the message
* @param {String} text
* @returns {Array}
*/
function getAllEmojiFromText(text) {
// return an empty array when no text is passed
if (!text) {
return [];
}

// ZERO WIDTH JOINER (U+200D) joins several emojis into one displayed emoji, so it is kept inside the emoji chunk
const zeroWidthJoiner = '\u200D'; // https://codepoints.net/U+200D?lang=en
const splittedMessage = text.split(''); // one entry per UTF-16 code unit
const result = [];

let wordHolder = ''; // accumulates the current run of plain text
let emojiHolder = ''; // accumulates the current run of emoji code units

const setResult = (word, isEmoji = false) => {
// Skip a non-emoji chunk that is exactly one UTF-16 code unit long.
// Next to an emoji this is presumably an invisible character (for example the variation selector U+FE0F),
// which renders like an empty string but still has `length === 1`.
// NOTE: a genuine one-character word such as `i` or `J` also has length 1 and is skipped by this check.
if (!isEmoji && word.length === 1) {
return;
}

result.push({text: word, isEmoji});
};

_.forEach(splittedMessage, (word, index) => {
if (CONST.REGEX.EMOJI_SURROGATE.test(word) || word === zeroWidthJoiner) {
setResult(wordHolder);
wordHolder = '';
emojiHolder += word;
} else {
setResult(emojiHolder, true);
emojiHolder = '';
wordHolder += word;
}

if (index === splittedMessage.length - 1) {
setResult(emojiHolder, true);
setResult(wordHolder);
}
});

// Drop empty chunks; keep only chunks that contain text, including whitespace
return _.filter(result, res => res.text);
}

// function getAllEmojiFromText(text) {
// if (!text) {
// return [];
// }

// const splitText = [];
// let reResult;
// let lastMatchIndexEnd = 0;
// do {
// // Look for an emoji chunk in the string
// reResult = CONST.REGEX.EMOJIS.exec(text);

// // If we reached the end of the string and it wasn't included in a previous match
// // the chunk between the end of the last match and the end of the string is plain text
// if (reResult === null && lastMatchIndexEnd !== text.length - 1) {
// splitText.push({
// text: text.slice(lastMatchIndexEnd, text.length),
// isEmoji: false,
// });
// // eslint-disable-next-line no-continue
// continue;
// }

// const matchIndexStart = reResult.indices[0][0];
// const matchIndexEnd = reResult.indices[0][1];

// // The chunk between the end of the last match and the start of the new one is plain-text
// splitText.push({
// text: text.slice(lastMatchIndexEnd, matchIndexStart),
// isEmoji: false,
// });

// // Everything captured by the regex itself is emoji + whitespace
// splitText.push({
// text: text.slice(matchIndexStart, matchIndexEnd),
// isEmoji: true,
// });

// lastMatchIndexEnd = matchIndexEnd;
// } while (reResult !== null);

// return _.filter(splitText, res => res.text);
// }

/**
* Checks whether this message contains any emojis
*
* @param {String} message
* @returns {Boolean}
*/
function hasEmojis(message) {
const splitText = getAllEmojiFromText(message);
return _.find(splitText, chunk => chunk.isEmoji) !== undefined;
}

/**
* Validates that this message contains only emojis
*
* @param {String} message
* @returns {Boolean}
*/
function containsOnlyEmojis(message) {
const splitText = getAllEmojiFromText(message);
return _.every(splitText, chunk => chunk.isEmoji);
}
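
One edge case worth noting: `getAllEmojiFromText` returns an empty array for empty text, and `_.every` (like `Array.prototype.every`) returns `true` for an empty list, so `containsOnlyEmojis('')` evaluates to `true`. A hedged sketch of a guard, using plain arrays and the hypothetical name `containsOnlyEmojiChunks`:

```javascript
/**
 * Return true only when there is at least one chunk and every chunk is an emoji.
 * @param {Array<{text: String, isEmoji: Boolean}>} chunks
 * @returns {Boolean}
 */
function containsOnlyEmojiChunks(chunks) {
    return chunks.length > 0 && chunks.every((chunk) => chunk.isEmoji);
}

// Without the length check, an empty list passes vacuously.
console.log([].every((chunk) => chunk.isEmoji));
console.log(containsOnlyEmojiChunks([]));
console.log(containsOnlyEmojiChunks([{text: '\u{1F600}', isEmoji: true}]));
```

Whether an empty message should count as emoji-only is a product decision; the guard simply makes that choice explicit.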

export {
getHeaderIndices,
mergeEmojisWithFrequentlyUsedEmojis,
@@ -253,4 +387,6 @@ export {
replaceEmojis,
suggestEmojis,
trimEmojiUnicode,
getAllEmojiFromText,
hasEmojis,
};
12 changes: 11 additions & 1 deletion src/pages/DetailsPage.js
@@ -25,6 +25,7 @@ import PressableWithoutFocus from '../components/PressableWithoutFocus';
import * as Report from '../libs/actions/Report';
import OfflineWithFeedback from '../components/OfflineWithFeedback';
import AutoUpdateTime from '../components/AutoUpdateTime';
import TextEmoji from '../components/TextEmoji';

const matchType = PropTypes.shape({
params: PropTypes.shape({
@@ -148,7 +149,16 @@ class DetailsPage extends React.PureComponent {
</AttachmentModal>
{details.displayName && (
<Text style={[styles.textHeadline, styles.mb6]} numberOfLines={1}>
{isSMSLogin ? this.props.toLocalPhone(details.displayName) : details.displayName}
{isSMSLogin
? this.props.toLocalPhone(details.displayName)
: (
<TextEmoji
style={styles.emojiMessageText}
plainTextContainerStyle={styles.messageTextWithoutEmoji}
>
{details.displayName}
</TextEmoji>
)}
</Text>
)}
{details.login ? (