Even in 8-bit mode, perform range computation for char classes if UCP flag is set

When testing another patch, I discovered that #474 caused a small change
in the behavior of character classes when caseless mode and UCP were enabled.

Thank you to Zoltan Herczeg for suggesting a fix.

Closes GH-526.
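
To illustrate the behavior that the new test checks, here is a minimal sketch in C. It is not PCRE2's implementation. It assumes an 8-bit library, so only code points below 256 are considered, and it hard-codes the Latin-1 case pairs that Unicode defines in that range. The names other_case_below_256 and class_start_units are hypothetical. With caseless and UCP both enabled, the class [a\x{c1}] should accept the four code units A, a, \xc1 and \xe1, as in the "Starting code units" line of the test output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE2: return the other case of a code
   point below 256, or the code point itself if it has no other case within
   that range. */
static uint32_t other_case_below_256(uint32_t c, bool ucp)
{
    if (c >= 'a' && c <= 'z') return c - 0x20;
    if (c >= 'A' && c <= 'Z') return c + 0x20;

    /* Without UCP, characters > 127 have only one case in the default locale. */
    if (!ucp) return c;

    /* Latin-1 Supplement pairs: U+00C0..U+00DE <-> U+00E0..U+00FE, excluding
       U+00D7 (multiplication sign) and U+00F7 (division sign). */
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7) return c + 0x20;
    if (c >= 0xE0 && c <= 0xFE && c != 0xF7) return c - 0x20;
    return c;
}

/* Hypothetical helper: mark in set[] every code unit that a character class
   with the given members accepts. */
static void class_start_units(const uint8_t *members, size_t n,
                              bool caseless, bool ucp, bool set[256])
{
    assert(members != NULL && set != NULL);
    memset(set, 0, 256 * sizeof(bool));
    for (size_t i = 0; i < n; i++) {
        set[members[i]] = true;
        if (caseless)
            set[other_case_below_256(members[i], ucp)] = true;
    }
}
```

Calling class_start_units with the members 'a' and 0xC1 and with caseless and ucp both true marks exactly those four entries. With ucp false, \xe1 is not marked.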
alexdowad committed Oct 13, 2024
1 parent c9bf833 commit d50527d
Showing 4 changed files with 13 additions and 2 deletions.
2 changes: 1 addition & 1 deletion src/pcre2_compile.c
@@ -5891,7 +5891,7 @@ for (;; pptr++)
 #if PCRE2_CODE_UNIT_WIDTH == 8
 cranges = NULL;
 
-if (utf)
+if (utf || ucp)
 #endif
 {
 if (lengthptr != NULL)
2 changes: 1 addition & 1 deletion src/pcre2_compile_class.c
@@ -125,7 +125,7 @@ const uint32_t *skip_range = get_nocase_range(c);
 uint32_t skip_start = skip_range[0];
 
 #if PCRE2_CODE_UNIT_WIDTH == 8
-PCRE2_ASSERT(options & PARSE_CLASS_UTF);
+PCRE2_ASSERT(options & PARSE_CLASS_UTF || options & PARSE_CLASS_CASELESS_UTF);
 #endif
 
 #if PCRE2_CODE_UNIT_WIDTH == 32
3 changes: 3 additions & 0 deletions testdata/testinput10
@@ -623,6 +623,9 @@
 /X(\x{e1})Y/i,ucp,replace=>\L$1<,substitute_extended
 X\x{c1}Y
 
+/[a\x{c1}]/iI,ucp
+\x{e1}
+
 # Without UTF or UCP characters > 127 have only one case in the default locale.
 
 /X(\x{e1})Y/replace=>\U$1<,substitute_extended
8 changes: 8 additions & 0 deletions testdata/testoutput10
@@ -1883,6 +1883,14 @@ Subject length lower bound = 1
 X\x{c1}Y
 1: >\xe1<
 
+/[a\x{c1}]/iI,ucp
+Capture group count = 0
+Options: caseless ucp
+Starting code units: A a \xc1 \xe1
+Subject length lower bound = 1
+\x{e1}
+0: \xe1
+
 # Without UTF or UCP characters > 127 have only one case in the default locale.
 
 /X(\x{e1})Y/replace=>\U$1<,substitute_extended
