Merge pull request #23 from raphlinus/master
New cursor-based implementation of grapheme clusters
Manishearth authored Mar 16, 2017
2 parents e86a69b + deebd8a commit 6fc6815
Showing 5 changed files with 608 additions and 381 deletions.
12 changes: 2 additions & 10 deletions scripts/unicode.py
@@ -330,21 +330,13 @@ def emit_break_module(f, break_table, break_cats, name):
 grapheme_cats = load_properties("auxiliary/GraphemeBreakProperty.txt", [])

 # Control
-# Note 1:
+# Note:
 # This category also includes Cs (surrogate codepoints), but Rust's `char`s are
 # Unicode Scalar Values only, and surrogates are thus invalid `char`s.
 # Thus, we have to remove Cs from the Control category
-# Note 2:
-# 0x0a and 0x0d (CR and LF) are not in the Control category for Graphemes.
-# However, the Graphemes iterator treats these as a special case, so they
-# should be included in grapheme_cats["Control"] for our implementation.
 grapheme_cats["Control"] = group_cat(list(
-(set(ungroup_cat(grapheme_cats["Control"]))
-| set(ungroup_cat(grapheme_cats["CR"]))
-| set(ungroup_cat(grapheme_cats["LF"])))
+set(ungroup_cat(grapheme_cats["Control"]))
 - set(ungroup_cat([surrogate_codepoints]))))
-del(grapheme_cats["CR"])
-del(grapheme_cats["LF"])

 grapheme_table = []
 for cat in grapheme_cats:
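
After this change, the script no longer folds CR and LF into Control; it only subtracts the surrogate range, so CR and LF stay as their own categories. A minimal sketch of that set arithmetic follows, in Python to match scripts/unicode.py. The bodies of `group_cat` and `ungroup_cat`, the value of `surrogate_codepoints`, and the toy category data are all assumptions made for illustration, since the diff does not show them.

```python
# Hypothetical stand-ins for the script's helpers. The diff does not show
# their bodies, so the behavior below is an assumption.

def ungroup_cat(ranges):
    """Expand inclusive (lo, hi) ranges into a flat list of code points."""
    return [cp for lo, hi in ranges for cp in range(lo, hi + 1)]


def group_cat(codepoints):
    """Collapse code points into sorted, inclusive (lo, hi) ranges."""
    ranges = []
    for cp in sorted(set(codepoints)):
        if ranges and cp == ranges[-1][1] + 1:
            # Extend the current run by one code point.
            ranges[-1] = (ranges[-1][0], cp)
        else:
            # Start a new run.
            ranges.append((cp, cp))
    return ranges


# Assumed value: the surrogate block as a single inclusive range.
surrogate_codepoints = (0xD800, 0xDFFF)

# Toy category data, not the real GraphemeBreakProperty.txt contents.
grapheme_cats = {
    "CR": [(0x0D, 0x0D)],
    "LF": [(0x0A, 0x0A)],
    "Control": [(0x00, 0x09), (0x0B, 0x0C), (0x0E, 0x1F), (0xD7FF, 0xE000)],
}

# Same shape as the new code in the hunk: subtract surrogates, nothing else.
grapheme_cats["Control"] = group_cat(list(
    set(ungroup_cat(grapheme_cats["Control"]))
    - set(ungroup_cat([surrogate_codepoints]))))
```

In this toy data the range `(0xD7FF, 0xE000)` straddles the surrogate block, so the subtraction leaves only its two endpoints in Control, while the `CR` and `LF` entries are left untouched.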