bugfix: char class casefold for certain chars
When a character's code point fits in a single byte (at most 0xff),
yet it takes more than 1 byte in the current encoding, the
case folding code incorrectly put it in the bitset instead of a code
range. As a result, with UTF-8 encoding, case folding worked incorrectly
in character classes for characters in the range \u0080 to \u00ff
(Latin-1 Supplement).
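
The placement rule described above can be sketched in plain Java. This is only an illustration under stated assumptions, not joni's code: `SINGLE_BYTE_SIZE` is assumed to be 0x100, `utf8Length` is a hypothetical stand-in for `enc.codeToMbcLength` with a UTF-8 encoding, and the two `usesCodeRange...` helpers are invented names that mirror the old and new conditions from the diff.

```java
import java.nio.charset.StandardCharsets;

public class CaseFoldPlacementSketch {
    // Assumption: mirrors BitSet.SINGLE_BYTE_SIZE (one past 0xff); the value is not shown in this commit.
    static final int SINGLE_BYTE_SIZE = 0x100;

    // Hypothetical stand-in for enc.codeToMbcLength(code) when the encoding is UTF-8.
    static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8).length;
    }

    // Condition before the fix: only the minimum length and the code point value are checked.
    static boolean usesCodeRangeBeforeFix(int minLength, int code) {
        return minLength > 1 || code >= SINGLE_BYTE_SIZE;
    }

    // Condition after the fix: also send multi-byte characters below 0x100 to the code range.
    static boolean usesCodeRangeAfterFix(int minLength, int code) {
        return minLength > 1 || code >= SINGLE_BYTE_SIZE || utf8Length(code) > 1;
    }

    public static void main(String[] args) {
        int utf8MinLength = 1;   // UTF-8 encodes ASCII in one byte
        int folded = 0x00e2;     // fold target of \u00c2

        System.out.println(utf8Length(folded));                            // 2
        System.out.println(usesCodeRangeBeforeFix(utf8MinLength, folded)); // false
        System.out.println(usesCodeRangeAfterFix(utf8MinLength, folded));  // true
    }
}
```

ASCII characters such as `a` (0x61) are unaffected: both conditions are false for them, so they still go to the bitset.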

Before fix (matching with case folding enabled):

* `"\u00c2"` `[\u00e0-\u00e5]` returns false
* `"\u00c2"` `[\u00e2]` returns false
* `"\u00c2"` `\u00e2` returns true
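
For comparison, the three cases above can be run against `java.util.regex`, which is a different engine from joni and is used here only as a reference for the expected results. The class and helper names are hypothetical. With `CASE_INSENSITIVE | UNICODE_CASE`, all three patterns are expected to match.

```java
import java.util.regex.Pattern;

public class ExpectedCaseFoldResults {
    // Full-string, Unicode-aware case-insensitive match using the JDK regex engine (not joni).
    static boolean matchesIgnoringCase(String pattern, String input) {
        int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        return Pattern.compile(pattern, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "\u00c2"; // LATIN CAPITAL LETTER A WITH CIRCUMFLEX

        System.out.println(matchesIgnoringCase("[\u00e0-\u00e5]", input)); // range in a character class
        System.out.println(matchesIgnoringCase("[\u00e2]", input));        // single-character class
        System.out.println(matchesIgnoringCase("\u00e2", input));          // plain literal
    }
}
```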
haozhun committed Mar 20, 2015
1 parent 005b574 commit 13fe106
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/org/joni/ApplyCaseFold.java
@@ -41,7 +41,7 @@ public void apply(int from, int[]to, int length, Object o) {

 if (Config.CASE_FOLD_IS_APPLIED_INSIDE_NEGATIVE_CCLASS) {
     if ((inCC && !cc.isNot()) || (!inCC && cc.isNot())) {
-        if (enc.minLength() > 1 || to[0] >= BitSet.SINGLE_BYTE_SIZE) {
+        if (enc.minLength() > 1 || to[0] >= BitSet.SINGLE_BYTE_SIZE || enc.codeToMbcLength(to[0]) > 1) {
             cc.addCodeRange(env, to[0], to[0]);
         } else {
             /* /(?i:[^A-C])/.match("a") ==> fail. */