Skip to content

Latest commit

 

History

History
44 lines (39 loc) · 2.1 KB

TODO.md

File metadata and controls

44 lines (39 loc) · 2.1 KB

TODO

  • rewrite test functions from bool foo(x) ... to bool is_foo(x) ...
  • all about emoji flag sequences
  • bool is_mirrorred(char32_t) noexcept (such as parenthesis, curly braces, brackets, ...)
    • also ability to get the mirrorring codepoint
  • map codepoint to block (enum) - see Blocks.txt
  • map coepoint to plane (enum)
  • map block to codepoint range
  • map plane to codepoint range
  • provide C API binding for basic functionality
  • script_segmenter: add support for commonPreferredScript tracking wrt brackets () [] {}.
  • script_segmenter: test "foo(λ);" -> {Latin, Greek, Latin}
  • orientation_segmenter (and integrate it into run_segmenter as well as its tests)
  • mktables: fmtlib integration into ucd_fmt.h (without actually depending on fmtlib itself)
  • mktables: to_string builder
  • mktables: to_type builder
  • mktables: pylint into CI
  • clang-tidy into CI
  • META: cmake install target (header files and .a file, executable)
  • META: pkg-config file
  • word segmentation (UTS algorithm)
  • generic text segmentation (top level segmentation API suitable for text shaping implementations)
  • CLI tool: unicode-inspect for inspecting input files by code point, grapheme cluster, word, script, ...
  • unit tests for most parts (wcwidth / segmentation)
  • README: list all TRs that are being implemented
  • API for accessing UCD properties
  • UTF8 <-> UTF32 conversion
  • grapheme segmentation (UTS algorithm)
  • symbol/emoji segmentation (UTS algorithm)
  • wcwidth equivalent (unicode::width(char32_t))
  • script segmentation
  • out<T> helper to force explicit ref(val) for more readability.
  • operator<<(ostream&, T) for all UCD properties - in its own header file (ucd_ostream.h)
  • emoji_segmenter: test "x 😀 y" -> {Text, Emoji, Text}
  • make run_segmenter more templated / customizable
  • mktables: enum class builder

Integration TODO

  • integrate into contour
  • see if this makes sense: make use of this library in klex lexical scanner, to allow unicode input