-
Notifications
You must be signed in to change notification settings - Fork 1
/
TODO-Indic
70 lines (55 loc) · 3.3 KB
/
TODO-Indic
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
* Add new positional category for U+103C MYANMAR CONSONANT SIGN MEDIAL RA.
* Separate Vowel_Dependent into Vowel_Dependent and Vowel_Modifier.
Candidates for Vowel_Modifier include vowel length marks, U+AA29, U+10A0D,
U+111CB, U+111CC, etc. Also figure out if U+17CB is better classified as
Vowel_Modifier. Comment from Behdad on U+AA29:
Even though U+AA29 CHAM VOWEL SIGN AA is called a vowel sign, it, in
fact, is a vowel lengthener and can occur after a consonant (lengthening
the inherent A vowel), or any other vowel. As such, it should be
recategorized as a vowel modifier, NOT a vowel (which it currently is)
for Universal Shaping Engine purposes. While we can carry this as an
override in USE, it makes more sense to us that Unicode data files
reflect this distinction. I don't know if a new category is needed.
Andrew Glass suggested the same resolution in private communication with Ben Mitchell.
Ref. https://github.com/harfbuzz/harfbuzz/issues/376
* Go over http://unicode.org/pipermail/unicode/2014-May/000562.html and see
if any categories should be changed based on it.
* Finish going over Andrew Glass's still-unclear InSC overrides.
* Check coverage of Indic properties by writing a tool checking coverage
* Make sure all information in HarfBuzz is reflected properly in the data
files.
* Split the Consonant_Placeholders to three classes: first would be
symbol-like things that replace a whole syllable, and cannot take any
vowels, but can take tone marks such as vedic marks; second would be
consonant-cluster like things that can take matras, but will not
participate with other consonants in forming clusters (similar to how
numbers behave); third would be things that act visually just like
consonants but not real consonants.
* Figure out the status of Jihvamuliya and Upadhmaniya characters in
Vedic. Do they need viramas to form conjuncts or not? In other
words, are they Consonant or Consonant_With_Stacker?
* The Lepcha U+1C29 is listed as Top_And_Left in Indic Matra Category, but
is listed together with Left matras in Table 4-4 of the Core
Specification. This may actually be a left matra, similar to the
Devanagari vowel I that extends to the top of the consonant. There are
other similar characters potentially misclassified, which can be found
below:
U+0B57: Top_And_Right, should be Right (Oriya AU Length Mark)
U+1C29: Top_And_Left, should be Left (Lepcha OO)
U+A9C0: Bottom_And_Right, should be Right (Javanese killer)
U+111BF: Top_And_Right, should be Right (Sharada AU)
* At least seven characters are encoded in Unicode with left and right
pieces separately encoded but with no canonical decompositions to the
pieces. We should check that there is text in the Core Specification and
the NamesList that mention the preferred encoding for each of the cases,
as they are ambiguous. We also need to make sure they are added to the
list of confusables.
0AC9 => 0AC5 0ABE(Gujarati Candra O) [Confirmed fixed by Eric Muller]
0F77 => 0FB2 0F81(Tibetan Vocalic RR)
0F79 => 0FB3 0F81(Tibetan Vocalic LL)
17BE => 17C1 17B8(Khmer OE)
17C4 => 17C1 17B6(Khmer OO)
1925 => 1920 1923(Limbu OO)
1926 => 1920 1924(Limbu AU)
* See if we need a special syllabic category for Myanmar Asat, as it forms
kinzis.