Add documentation and more specs about FQE vs MQE vs UQE
janlelis committed Oct 24, 2024
1 parent f1bd37c commit 8e3819c
Showing 2 changed files with 69 additions and 32 deletions.
67 changes: 35 additions & 32 deletions README.md
@@ -1,10 +1,10 @@
# Unicode::Emoji [![[version]](https://badge.fury.io/rb/unicode-emoji.svg)](https://badge.fury.io/rb/unicode-emoji) [![[ci]](https://github.com/janlelis/unicode-emoji/workflows/Test/badge.svg)](https://github.com/janlelis/unicode-emoji/actions?query=workflow%3ATest)

Provides regular expressions to find Emoji in strings, incorporating the latest Unicode and Emoji standards.
Provides regular expressions to find Emoji in strings, incorporating the latest Unicode / Emoji standards.

Additional features:

- A categorized list of recommended Emoji
- A categorized list of Emoji (RGI: Recommended for General Interchange)
- Retrieve Emoji property data for specific codepoints (Emoji_Modifier, Emoji_Presentation, etc.)

Emoji version: **16.0** (September 2024)
@@ -24,7 +24,7 @@ The gem includes multiple Emoji regexes, which are compiled out of various Emoji
```ruby
require "unicode/emoji"

string = "String which contains all kinds of emoji:
string = "String which contains all types of Emoji sequences:
- Singleton Emoji: 😴
- Textual singleton Emoji with Emoji variation: ▢️
@@ -33,7 +33,6 @@ string = "String which contains all kinds of emoji:
- Sub-Region flag: 🏴󠁧󠁒󠁳󠁣󠁴󠁿
- Keycap sequence: 2️⃣
- Sequence using ZWJ (zero width joiner): πŸ€ΎπŸ½β€β™€οΈ
"

string.scan(Unicode::Emoji::REGEX) # => ["😴", "▢️", "πŸ›ŒπŸ½", "πŸ‡΅πŸ‡Ή", "🏴󠁧󠁒󠁳󠁣󠁴󠁿", "2️⃣", "πŸ€ΎπŸ½β€β™€οΈ"]
@@ -45,29 +44,37 @@ Depending on your exact usecase, you can choose between multiple levels of Emoji

Regex | Description | Example Matches | Example Non-Matches
------------------------------|-------------|-----------------|--------------------
`Unicode::Emoji::REGEX` | **Use this one if unsure!** Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *recommended* Emoji sequences | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `πŸ€ΎπŸ½β€β™€οΈ` | `😴︎`, `β–Ά`, `🏻`, `πŸ‡΅πŸ‡΅`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ β€πŸ€’`, `1`, `1⃣`
`Unicode::Emoji::REGEX_VALID` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *valid* Emoji sequences | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ β€πŸ€’` | `😴︎`, `β–Ά`, `🏻`, `πŸ‡΅πŸ‡΅`, `1`, `1⃣`
`Unicode::Emoji::REGEX_WELL_FORMED` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *well-formed* Emoji sequences | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ β€πŸ€’`, `πŸ‡΅πŸ‡΅` | `😴︎`, `β–Ά`, `🏻`, `1`, `1⃣`
`Unicode::Emoji::REGEX_POSSIBLE` | Matches all singleton Emoji, singleton components, all kinds of Emoji sequences, and even single digits (except for: unqualified keycap sequences) | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ β€πŸ€’`, `πŸ‡΅πŸ‡΅`, `😴︎`, `β–Ά`, `🏻`, `1` | `1⃣`
`Unicode::Emoji::REGEX` | **Use this one if unsure!** Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *recommended* Emoji sequences (RGI/FQE) | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `πŸ€ΎπŸ½β€β™€οΈ` | `πŸ€ΎπŸ½β€β™€`, `πŸŒβ€β™‚οΈ`, `😴︎`, `β–Ά`, `🏻`, `πŸ‡΅πŸ‡΅`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ β€πŸ€’`, `1`, `1⃣`
`Unicode::Emoji::REGEX_VALID` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *valid* Emoji sequences | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ΎπŸ½β€β™€` ,`πŸŒβ€β™‚οΈ`, `πŸ€ β€πŸ€’` | `😴︎`, `β–Ά`, `🏻`, `πŸ‡΅πŸ‡΅`, `1`, `1⃣`
`Unicode::Emoji::REGEX_WELL_FORMED` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *well-formed* Emoji sequences | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ΎπŸ½β€β™€`,`πŸŒβ€β™‚οΈ` , `πŸ€ β€πŸ€’`, `πŸ‡΅πŸ‡΅` | `😴︎`, `β–Ά`, `🏻`, `1`, `1⃣`
`Unicode::Emoji::REGEX_POSSIBLE` | Matches all singleton Emoji, singleton components, all kinds of Emoji sequences, and even single digits (except for: unqualified keycap sequences) | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ΎπŸ½β€β™€`, `πŸŒβ€β™‚οΈ`, `πŸ€ β€πŸ€’`, `πŸ‡΅πŸ‡΅`, `😴︎`, `β–Ά`, `🏻`, `1` | `1⃣`

#### Include Text Emoji

By default, textual Emoji (Emoji characters with a text variation selector or those that have a default text presentation) are not matched by the default regexes (except for `REGEX_POSSIBLE`). To match them as well, append the `_INCLUDE_TEXT` suffix to the regex name:

Regex | Description | Example Matches | Example Non-Matches
------------------------------|-------------|-----------------|--------------------
`Unicode::Emoji::REGEX_INCLUDE_TEXT` | `REGEX` + `REGEX_TEXT` | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `😴︎`, `β–Ά`, `1⃣` | `🏻`, `πŸ‡΅πŸ‡΅`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ β€πŸ€’`, `1`
`Unicode::Emoji::REGEX_VALID_INCLUDE_TEXT` | `REGEX_VALID` + `REGEX_TEXT` | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ β€πŸ€’`, `😴︎`, `β–Ά`, `1⃣` | `🏻`, `πŸ‡΅πŸ‡΅`, `1`
`Unicode::Emoji::REGEX_WELL_FORMED_INCLUDE_TEXT` | `REGEX_WELL_FORMED` + `REGEX_TEXT` | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ β€πŸ€’`, `πŸ‡΅πŸ‡΅`, `😴︎`, `β–Ά`, `1⃣` | `🏻`, `1`
`Unicode::Emoji::REGEX_INCLUDE_TEXT` | `REGEX` + `REGEX_TEXT` | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `😴︎`, `β–Ά`, `1⃣` | `πŸ€ΎπŸ½β€β™€`, `πŸŒβ€β™‚οΈ`, `🏻`, `πŸ‡΅πŸ‡΅`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ β€πŸ€’`, `1`
`Unicode::Emoji::REGEX_VALID_INCLUDE_TEXT` | `REGEX_VALID` + `REGEX_TEXT` | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ΎπŸ½β€β™€`, `πŸŒβ€β™‚οΈ`, `πŸ€ β€πŸ€’`, `😴︎`, `β–Ά`, `1⃣` | `🏻`, `πŸ‡΅πŸ‡΅`, `1`
`Unicode::Emoji::REGEX_WELL_FORMED_INCLUDE_TEXT` | `REGEX_WELL_FORMED` + `REGEX_TEXT` | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ΎπŸ½β€β™€`, `πŸŒβ€β™‚οΈ`, `πŸ€ β€πŸ€’`, `πŸ‡΅πŸ‡΅`, `😴︎`, `β–Ά`, `1⃣` | `🏻`, `1`
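
The distinction between textual and non-textual Emoji can be illustrated without the gem. The following is a minimal sketch that relies only on Ruby's built-in `\p{Emoji_Presentation}` property; the `presentation` helper and the constant names are hypothetical, and the helper assumes its input is a single Emoji character with an optional variation selector:

```ruby
# Variation selectors that force a presentation style.
TEXT_SELECTOR  = "\u{FE0E}" # VS15, text presentation
EMOJI_SELECTOR = "\u{FE0F}" # VS16, emoji presentation

# Hypothetical helper (not part of the gem): classifies a single Emoji
# character, optionally followed by a variation selector.
def presentation(char)
  return :emoji if char.end_with?(EMOJI_SELECTOR)
  return :text  if char.end_with?(TEXT_SELECTOR)

  # No selector given: fall back to the character's default presentation.
  char.match?(/\A\p{Emoji_Presentation}\z/) ? :emoji : :text
end

presentation("\u{1F634}")         # sleeping face, emoji by default
presentation("\u{1F634}\u{FE0E}") # sleeping face forced to text
presentation("\u{25B6}")          # play button, text by default
presentation("\u{25B6}\u{FE0F}")  # play button forced to emoji
```

Characters classified as `:text` here correspond to what the `_INCLUDE_TEXT` variants additionally match.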

#### Minimally-qualified and Unqualified Sequences

Regex | Description | Example Matches | Example Non-Matches
------------------------------|-------------|-----------------|--------------------
`Unicode::Emoji::REGEX_INCLUDE_MQE` | Like `REGEX`, but additionally includes Emoji with missing Emoji Presentation Variation Selectors, where the first partial Emoji has all required Variation Selectors | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ΎπŸ½β€β™€` | `πŸŒβ€β™‚οΈ`, `😴︎`, `β–Ά`, `🏻`, `πŸ‡΅πŸ‡΅`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ β€πŸ€’`, `1`, `1⃣`
`Unicode::Emoji::REGEX_INCLUDE_MQE_UQE` | Like `REGEX`, but additionally includes Emoji with missing Emoji Presentation Variation Selectors | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ΎπŸ½β€β™€`, `πŸŒβ€β™‚οΈ` | `😴︎`, `β–Ά`, `🏻`, `πŸ‡΅πŸ‡΅`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ β€πŸ€’`, `1`, `1⃣`
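
To make the difference between fully-qualified, minimally-qualified, and unqualified sequences concrete, here is a rough sketch that compares a candidate against the known fully-qualified form of the same Emoji. The `qualification` helper is hypothetical and simplified: it only looks at whether the first character carries its variation selector, and it does not validate candidates that contain extra selectors.

```ruby
EMOJI_VS = 0xFE0F # U+FE0F, emoji presentation selector (VS16)

# Hypothetical helper (not part of the gem): compares a candidate with
# the fully-qualified form of the same Emoji sequence.
def qualification(candidate, fully_qualified)
  return :fully_qualified if candidate == fully_qualified

  cand = candidate.codepoints
  full = fully_qualified.codepoints
  # Not the same sequence at all once selectors are ignored.
  return :different if cand - [EMOJI_VS] != full - [EMOJI_VS]

  # If the first character needs a selector and the candidate lacks it,
  # the sequence is unqualified; otherwise it is minimally-qualified.
  first_needs_vs = full[1] == EMOJI_VS
  first_has_vs   = cand[1] == EMOJI_VS
  first_needs_vs && !first_has_vs ? :unqualified : :minimally_qualified
end

# Woman playing handball, medium skin tone: trailing VS16 missing
qualification("\u{1F93E 1F3FD 200D 2640}", "\u{1F93E 1F3FD 200D 2640 FE0F}")

# Man golfing: VS16 missing after the first character
qualification("\u{1F3CC 200D 2642 FE0F}", "\u{1F3CC FE0F 200D 2642 FE0F}")
```

Under these assumptions, the first call corresponds to what `REGEX_INCLUDE_MQE` adds, and the second to what only `REGEX_INCLUDE_MQE_UQE` adds.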


#### Singleton Regexes

Matches only simple one-codepoint (+ optional variation selector) Emoji:

Regex | Description | Example Matches | Example Non-Matches
------------------------------|-------------|-----------------|--------------------
`Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences at all | `😴`, `▢️` | `😴︎`, `β–Ά`, `🏻`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `πŸ‡΅πŸ‡΅`,`2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ β€πŸ€’`, `1`
`Unicode::Emoji::REGEX_TEXT` | Matches only textual singleton Emoji (except for singleton components, like digits) | `😴︎`, `β–Ά` | `😴`, `▢️`, `🏻`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `πŸ‡΅πŸ‡΅`,`2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ β€πŸ€’`, `1`
`Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences at all | `😴`, `▢️` | `😴︎`, `β–Ά`, `🏻`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `πŸ‡΅πŸ‡΅`,`2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ΎπŸ½β€β™€`, `πŸŒβ€β™‚οΈ`, `πŸ€ β€πŸ€’`, `1`
`Unicode::Emoji::REGEX_TEXT` | Matches only textual singleton Emoji (except for singleton components, like digits) | `😴︎`, `β–Ά` | `😴`, `▢️`, `🏻`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `πŸ‡΅πŸ‡΅`,`2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ΎπŸ½β€β™€`, `πŸŒβ€β™‚οΈ`, `πŸ€ β€πŸ€’`, `1`

Here is a list of all Emoji that can be matched using the two regexes: [character.construction/emoji-vs-text](https://character.construction/emoji-vs-text)

@@ -79,11 +86,11 @@ While `REGEX_BASIC` is part of the above regexes, `REGEX_TEXT` is only included
2) Minimally-qualified RGI Emoji ZWJ sequence (lacks Emoji Presentation Selectors, but not in the first Emoji character)
3) Unqualified RGI Emoji ZWJ sequence (lacks Emoji Presentation Selector, including in the first Emoji character). Unqualified Emoji include all basic Emoji in Text Presentation (see column 11/12).
4) Non-RGI Emoji ZWJ sequence
5) Valid Region made from pair of Regional Indicators
6) Any Region made from pair of Regional Indicators
5) Valid Region made from a pair of Regional Indicators
6) Any Region made from a pair of Regional Indicators
7) RGI Flag Emoji Tag Sequences (England, Scotland, Wales)
8) Valid Flag Emoji Tag Sequences (any known sub-division)
9) Any Flag Emoji Tag Sequences (any tag sequence)
8) Valid Flag Emoji Tag Sequences (any known subdivision)
9) Any Emoji Tag Sequences (any tag sequence with any base)
10) Basic Default Emoji Presentation Characters or Text characters with Emoji Presentation Selector
11) Basic Default Text Presentation Characters or Basic Emoji with Text Presentation Selector
12) Non-Emoji (unqualified) keycap
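
As an illustration of how region and tag sequences are built at the codepoint level, the following sketch constructs them from plain ASCII codes. Both helpers are hypothetical and not part of the gem; they perform no check on whether the resulting region or subdivision is known, so the output may be merely well-formed rather than valid or RGI.

```ruby
# Hypothetical helper: maps two ASCII letters to a pair of
# Regional Indicator symbols (U+1F1E6 is Regional Indicator A).
def region_flag(code)
  code.upcase.each_char.map { |c| 0x1F1E6 + (c.ord - "A".ord) }.pack("U*")
end

# Hypothetical helper: builds a flag tag sequence from a subdivision code.
# Black flag (U+1F3F4) + one tag character per ASCII character
# (U+E0000 + ASCII value) + cancel tag (U+E007F).
def subdivision_flag(code)
  tags = code.downcase.each_char.map { |c| 0xE0000 + c.ord }
  [0x1F3F4, *tags, 0xE007F].pack("U*")
end

region_flag("PT")         # known region (Portugal)
region_flag("PP")         # well-formed, but not a known region
subdivision_flag("gbsct") # RGI tag sequence (Scotland)
```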
@@ -108,23 +115,25 @@ See [spec files](/spec) for detailed examples about which regex matches which ki

### Picking the Right Emoji Regex

- Usually you just want `REGEX` (RGI set)
- Usually you just want `REGEX` (recommended Emoji set, RGI)
- Use `REGEX_INCLUDE_MQE` or `REGEX_INCLUDE_MQE_UQE` if you want to catch Emoji sequences with missing Variation Selectors.
- If you want broader matching (any ZWJ sequences, more sub-region flags), choose `REGEX_VALID`
- If you need to match any region flag and any tag sequence, choose `REGEX_WELL_FORMED`
- Use the `_INCLUDE_TEXT` suffix with any of the above, if you want to also match basic textual Emoji
- Use the `_INCLUDE_TEXT` suffix with any of the above base regexes, if you want to also match basic textual Emoji
- And finally, there is also the option to use `REGEX_POSSIBLE`, which is a simplified test for possible Emoji, comparable to `REGEX_WELL_FORMED*`. It might produce false positives; however, the regex is less complex and is [suggested in the Unicode standard itself](https://www.unicode.org/reports/tr51/#EBNF_and_Regex) as a first check.
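
The decision list above could be captured in a small lookup. This is only a sketch: the `pick_regex_name` helper and its `level:` symbols are hypothetical labels, and it only applies the `_INCLUDE_TEXT` suffix to the three combinations that this README documents.

```ruby
# Hypothetical helper (not part of the gem): returns the name of the
# regex constant that fits the requested matching level.
def pick_regex_name(level: :rgi, include_text: false)
  base = {
    rgi: "REGEX",
    rgi_mqe: "REGEX_INCLUDE_MQE",
    rgi_mqe_uqe: "REGEX_INCLUDE_MQE_UQE",
    valid: "REGEX_VALID",
    well_formed: "REGEX_WELL_FORMED",
    possible: "REGEX_POSSIBLE",
  }.fetch(level)
  return base unless include_text

  # Only these combinations appear in the tables above.
  unless %i[rgi valid well_formed].include?(level)
    raise ArgumentError, "no documented _INCLUDE_TEXT variant for #{base}"
  end
  "#{base}_INCLUDE_TEXT"
end

pick_regex_name                                    # default choice
pick_regex_name(level: :valid, include_text: true) # broader, with text Emoji

# With the gem loaded, the constant itself could then be looked up via:
#   Unicode::Emoji.const_get(pick_regex_name(level: :valid))
```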

### Examples

Desc | Emoji | Escaped | `REGEX` (RGI) | `REGEX_VALID` (Valid) | `REGEX_WELL_FORMED` (Well-formed) | `REGEX_POSSIBLE`
Desc | Emoji | Escaped | `REGEX` (RGI/FQE) | `REGEX_INCLUDE_MQE` (RGI/MQE) | `REGEX_VALID` | `REGEX_WELL_FORMED` / `REGEX_POSSIBLE`
-----|-------|---------|---------------|-----------------------|-----------------------------------|-----------------
RGI ZWJ Sequence | πŸ€ΎπŸ½β€β™€οΈ | `\u{1F93E 1F3FD 200D 2640 FE0F}` | βœ… | βœ… | βœ… | βœ…
Valid ZWJ Sequence | πŸ€ β€πŸ€’ | `\u{1F920 200D 1F922}` | ❌ | βœ… | βœ… | βœ…
RGI ZWJ Sequence MQE | πŸ€ΎπŸ½β€β™€ | `\u{1F93E 1F3FD 200D 2640}` | ❌ | βœ… | βœ… | βœ…
Valid ZWJ Sequence, Non-RGI | πŸ€ β€πŸ€’ | `\u{1F920 200D 1F922}` | ❌ | ❌ | βœ… | βœ…
Known Region | πŸ‡΅πŸ‡Ή | `\u{1F1F5 1F1F9}` | βœ… | βœ… | βœ… | βœ…
Unknown Region | πŸ‡΅πŸ‡΅ | `\u{1F1F5 1F1F5}` | ❌ | ❌ | βœ… | βœ…
Unknown Region | πŸ‡΅πŸ‡΅ | `\u{1F1F5 1F1F5}` | ❌ | ❌ | ❌ | βœ…
RGI Tag Sequence | 🏴󠁧󠁒󠁳󠁣󠁴󠁿 | `\u{1F3F4 E0067 E0062 E0073 E0063 E0074 E007F}` | βœ… | βœ… | βœ… | βœ…
Valid Tag Sequence | 🏴󠁧󠁒󠁑󠁧󠁒󠁿 | `\u{1F3F4 E0067 E0062 E0061 E0067 E0062 E007F}` | ❌ | βœ… | βœ… | βœ…
Well-formed Tag Sequence | 😴󠁧󠁒󠁑󠁑󠁑󠁿 | `\u{1F634 E0067 E0062 E0061 E0061 E0061 E007F}` | ❌ | ❌ | βœ… | βœ…
Valid Tag Sequence | 🏴󠁧󠁒󠁑󠁧󠁒󠁿 | `\u{1F3F4 E0067 E0062 E0061 E0067 E0062 E007F}` | ❌ | ❌ | βœ… | βœ…
Well-formed Tag Sequence | 😴󠁧󠁒󠁑󠁑󠁑󠁿 | `\u{1F634 E0067 E0062 E0061 E0061 E0061 E007F}` | ❌ | ❌ | ❌ | βœ…

Please see [the standard](https://www.unicode.org/reports/tr51/#Emoji_Sets) for more details, examples, and explanations.
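
Because variation selectors, zero-width joiners, and tag characters are invisible, it can help to inspect a string in the escaped notation used in the table above. A minimal sketch, using a hypothetical `escape_codepoints` helper that is not part of the gem:

```ruby
# Hypothetical helper: renders a string in the \u{...} notation
# used in the "Escaped" column.
def escape_codepoints(string)
  hex = string.codepoints.map { |cp| cp.to_s(16).upcase }
  "\\u{#{hex.join(' ')}}"
end

puts escape_codepoints("\u{1F93E 1F3FD 200D 2640 FE0F}") # fully-qualified
puts escape_codepoints("\u{1F93E 1F3FD 200D 2640}")      # minimally-qualified
```

Comparing the two printed lines makes the missing trailing `FE0F` of the minimally-qualified form visible.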

@@ -140,13 +149,7 @@ See [character.construction/picto](https://character.construction/picto) for a l

### Partial Regexes

**Please note:** Might get removed or renamed in the future. This is the same as `\p{Emoji}`.

Matches potential Emoji parts (often, this is not what you want):

Regex | Description | Example Matches | Example Non-Matches
------------------------------|-------------|-----------------|--------------------
`Unicode::Emoji::REGEX_ANY` | Matches any Emoji-related codepoint (but no variation selectors, tags, or zero-width joiners). Please note that this will match Emoji parts rather than complete Emoji, for example, single digits! | `😴`, `▶`, `🏻`, `🛌`, `🏽`, `🇵`, `🇹`, `2`, `🏴`, `🤾`, `♀`, `🤠`, `🤢` | -
`Unicode::Emoji::REGEX_ANY`, same as `\p{Emoji}`. Deprecated: Will be removed or renamed in the future.
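
Since this is equivalent to Ruby's built-in `\p{Emoji}` property, the behavior can be tried without the gem. A short sketch showing why it is rarely what you want (it matches parts, including plain digits, and skips the joiner):

```ruby
# A digit, a space, and a ZWJ sequence made of two Emoji
text = "2 \u{1F920}\u{200D}\u{1F922}"

# Each match is a single codepoint; the space and the
# zero-width joiner (U+200D) are not matched.
parts = text.scan(/\p{Emoji}/)
parts.each { |part| puts format("U+%04X", part.ord) }
```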

## Usage – List

@@ -169,7 +172,7 @@ A list of all Emoji (generated from this gem) can be found at [character.constru

## Usage – Properties Data

Allows you to access the codepoint data from Unicode's [emoji-data.txt](https://www.unicode.org/Public/16.0.0/ucd/emoji/emoji-data.txt) file:
Allows you to access the codepoint data for a single character from Unicode's [emoji-data.txt](https://www.unicode.org/Public/16.0.0/ucd/emoji/emoji-data.txt) file:

```ruby
require "unicode/emoji"
34 changes: 34 additions & 0 deletions spec/unicode_emoji_spec.rb
@@ -92,6 +92,16 @@
  assert_equal "\u{1F93E 1F3FD 200D 2640 FE0F}", $&
end

it "does not match MQE zwj sequences" do
  # MQE: U+2640 lacks its trailing VS16 (U+FE0F)
  "\u{1F93E 1F3FD 200D 2640} woman playing handball: medium skin tone, missing VS16" =~ Unicode::Emoji::REGEX
  refute_equal "\u{1F93E 1F3FD 200D 2640}", $&
end

it "does not match UQE emoji" do
  # UQE: the first character U+1F3CC lacks its VS16 (U+FE0F)
  "\u{1F3CC 200D 2642 FE0F} man golfing, missing VS16" =~ Unicode::Emoji::REGEX
  refute_equal "\u{1F3CC 200D 2642 FE0F}", $&
end

it "does not match valid zwj sequences that are not recommended" do
  "\u{1F920 200D 1F922} vomiting cowboy" =~ Unicode::Emoji::REGEX
  assert_equal "\u{1F920}", $&
@@ -140,6 +150,30 @@
end
end

describe "REGEX_INCLUDE_MQE" do
  it "matches MQE emoji" do
    # MQE: U+2640 lacks its trailing VS16 (U+FE0F)
    "\u{1F93E 1F3FD 200D 2640} woman playing handball: medium skin tone, missing VS16" =~ Unicode::Emoji::REGEX_INCLUDE_MQE
    assert_equal "\u{1F93E 1F3FD 200D 2640}", $&
  end

  it "does not match UQE emoji" do
    # UQE: the first character U+1F3CC lacks its VS16 (U+FE0F)
    "\u{1F3CC 200D 2642 FE0F} man golfing, missing VS16" =~ Unicode::Emoji::REGEX_INCLUDE_MQE
    refute_equal "\u{1F3CC 200D 2642 FE0F}", $&
  end
end

describe "REGEX_INCLUDE_MQE_UQE" do
  it "matches MQE emoji" do
    "\u{1F93E 1F3FD 200D 2640} woman playing handball: medium skin tone, missing VS16" =~ Unicode::Emoji::REGEX_INCLUDE_MQE_UQE
    assert_equal "\u{1F93E 1F3FD 200D 2640}", $&
  end

  it "matches UQE emoji" do
    "\u{1F3CC 200D 2642 FE0F} man golfing, missing VS16" =~ Unicode::Emoji::REGEX_INCLUDE_MQE_UQE
    assert_equal "\u{1F3CC 200D 2642 FE0F}", $&
  end
end

describe "REGEX_VALID" do
  it "matches most singleton emoji codepoints" do
    "😴 sleeping face" =~ Unicode::Emoji::REGEX_VALID