-
Notifications
You must be signed in to change notification settings - Fork 197
/
NEWS
578 lines (386 loc) · 21.8 KB
/
NEWS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
News about PCRE2 releases
-------------------------
Version 10.45-RC1 27-December-2024
----------------------------------
This is a comparatively large release, incorporating new features, some
bugfixes, and a few changes with slight backwards compatibility implications.
Please see the ChangeLog and Git log for further details.
Only changes to behaviour, changes to the API, and major changes to the pattern
syntax are described here.
This release is the first to be available as a (signed) Git tag, or
alternatively as a (signed) tarball of the Git tag.
This is also the first release to be made by the new maintainers of PCRE2, and
we would like to thank Philip Hazel, creator and maintainer of PCRE and PCRE2.
* (Git change) The sljit project has been split out into a separate Git
repository. Git users must now run `git submodule init; git submodule update`
after a Git checkout.
* (Behaviour change) Update Unicode support to UCD 16.
* (Match behaviour change) Case-insensitive matching of Unicode properties
Ll, Lt, and Lu has been changed to match Perl. Previously, /\p{Ll}/i would
match only lower-case characters (even though case-insensitive matching was
specified). This also affects case-insensitive matching of POSIX classes such
as [:lower:].
* (Minor match behaviour change) Case-insensitive matching of backreferences now
respects the PCRE2_EXTRA_CASELESS_RESTRICT option.
* (Minor pattern syntax change) Parsing of the \x escape is stricter, and is
no longer parsed as an escape for the NUL character if not followed by '{' or
a hexadecimal digit. Use \x00 instead.
* (Major new feature) Add a new feature called scan substring. This is a new
type of assertion which matches the content of a capturing block to a
sub-pattern.
Example: to find a word that contains the rare (in English) sequence of
letters "rh" not at the start:
\b(\w++)(*scan_substring:(1).+rh)
The first group captures a word which is then scanned by the
(*scan_substring:(1) ... ) assertion, which tests whether the pattern ".+rh"
matches the capture group "(1)".
* (Major new feature) Add support for UTS#18 compatible character classes,
using the new option PCRE2_ALT_EXTENDED_CLASS. This adds '[' as a
metacharacter within character classes and the operators '&&', '--' and '~~',
allowing subtractions and intersections of character classes to be easily
expressed.
Example: to match Thai or Greek letters (but not letters or other characters
in those scripts), use [\p{L}&&[\p{Thai}||\p{Greek}]].
* (Major new feature) Add support for Perl-style extended character classes,
using the syntax (?[...]). This also allows expressing subtractions and
intersections of character classes, but using a different syntax to UTS#18.
Example: to match Thai or Greek letters (but not letters or other characters
in those scripts), use (?[\p{L} & (\p{Thai} + \p{Greek})]).
* (Minor feature) Significant improvements to the character class match engine.
Compiled character classes are now more compact, and have faster matching
for large or complex character sets, using binary search through the set.
* JIT compilation now fails with the new error code PCRE2_ERROR_JIT_UNSUPPORTED
for patterns which use features not supported by the JIT compiler.
* (Minor feature) New options PCRE2_EXTRA_NO_BS0 (disallow \0 as an escape for
the NUL character); PCRE2_EXTRA_PYTHON_OCTAL (use Python disambiguation rules
for deciding whether \12 is a backreference or an octal escape);
PCRE2_EXTRA_NEVER_CALLOUT (disable callout syntax entirely);
PCRE2_EXTRA_TURKISH_CASING (use Turkish rules for case-insensitive matching).
* (Minor feature) Add new API function pcre2_set_optimize() for controlling
which optimizations are enabled.
* (Minor new features) A variety of extensions have been made to
pcre2_substitute() and its syntax for replacement strings. These now support:
\123 octal escapes; titlecasing \u\L; \1 backreferences; \g<1> and $<NAME>
backreferences; $& $` $' and $_; new function
pcre2_set_substitute_case_callout() to allow locale-aware case transformation.
Version 10.44 07-June-2024
--------------------------
This is mostly a bug-fix and tidying release. There is one new function, to set
a maximum size for a compiled pattern. The maximum name length for groups is
increased to 128. Some auxiliary files for building under VMS are added.
Version 10.43 16-February-2024
------------------------------
There are quite a lot of changes in this release (see ChangeLog and Git log for
a list). Those that are not bugfixes or code tidies are:
* The JIT code no longer supports ARMv5 architecture.
* A new function pcre2_get_match_data_heapframes_size() for finer heap control.
* New option flags to restrict the interaction between ASCII and non-ASCII
characters for caseless matching and \d and friends. There are also new
pattern constructs to control these flags from within a pattern.
* Upgrade to Unicode 15.0.0.
* Treat a NULL pattern with zero length as an empty string.
* Added support for limited-length variable-length lookbehind assertions, with
a default maximum length of 255 characters (same as Perl) but with a function
to adjust the limit.
* Support for LoongArch in JIT.
* Perl changed the meaning of (for example) {,3} which did not used to be
recognized as a quantifier. Now it means {0,3} and PCRE2 has also changed.
Note that {,} is still not a quantifier.
* Following Perl, allow spaces and tabs after { and before } in all Perl-
compatible items that use braces, and also around commas in quantifiers. The
one exception in PCRE2 is \u{...}, which is from ECMAScript, not Perl, and
PCRE2 follows ECMAScript usage.
* Changed the meaning of \w and its synonyms and derivatives (\b and \B) in UCP
mode to follow Perl. It now matches characters whose general categories are L
or N or whose particular categories are Mn (non-spacing mark) or Pc
(combining punctuation).
* Changed the default meaning of [:xdigit:] in UCP mode to follow Perl. It now
matches the "fullwidth" versions of hex digits. PCRE2_EXTRA_ASCII_DIGIT can
be used to keep it ASCII only.
* Make PCRE2_UCP the default in UTF mode in pcre2grep and add --no-ucp,
--case-restrict and --posix-digit.
* Add --group-separator and --no-group-separator to pcre2grep.
Version 10.42 11-December-2022
------------------------------
This is an unexpectedly early release to fix a problem that was introduced in
10.41. ChangeLog number 19 (GitHub #139) added the default definition of
PCRE2_CALL_CONVENTION to pcre2posix.c instead of pcre2posix.h, which meant that
programs including pcre2posix.h but not pcre2.h couldn't compile. A new test
that checks this case has been added.
A couple of other minor issues are also fixed, and a patch for an intermittent
JIT fault is also included. See ChangeLog and the Git log.
Version 10.41 06-December-2022
------------------------------
This is another mainly bug-fixing and code-tidying release. There is one
significant upgrade to pcre2grep: it now behaves like GNU grep when matching
more than one pattern and a later pattern matches at an earlier point in the
subject when the matched substrings are being identified by colour or by
offsets.
Version 10.40 15-April-2022
---------------------------
This is mostly a bug-fixing and code-tidying release. However, there are some
extensions to Unicode property handling:
* Added support for Bidi_Class and a number of binary Unicode properties,
including Bidi_Control.
* A number of changes to script matching for \p and \P:
(a) Script extensions for a character are now coded as a bitmap instead of
a list of script numbers, which should be faster and does not need a
loop.
(b) Added the syntax \p{script:xxx} and \p{script_extensions:xxx} (synonyms
sc and scx).
(c) Changed \p{scriptname} from being the same as \p{sc:scriptname} to being
the same as \p{scx:scriptname} because this change happened in Perl at
release 5.26.
(d) The standard Unicode 4-letter abbreviations for script names are now
recognized.
(e) In accordance with Unicode and Perl's "loose matching" rules, spaces,
hyphens, and underscores are ignored in property names, which are then
matched independent of case.
As always, see ChangeLog for a list of all changes (also the Git log).
Version 10.39 29-October-2021
-----------------------------
This release is happening soon after 10.38 because the bug fix is important.
1. Fix incorrect detection of alternatives in first character search in JIT.
2. Update to Unicode 14.0.0.
3. Some code cleanups (see ChangeLog).
Version 10.38 01-October-2021
-----------------------------
As well as some bug fixes and tidies (as always, see ChangeLog for details),
the documentation is updated to list the new URLs, following the move of the
source repository to GitHub and the mailing list to Google Groups.
* The CMake build system can now build both static and shared libraries in one
go.
* Following Perl's lead, \K is now locked out in lookaround assertions by
default, but an option is provided to re-enable the previous behaviour.
Version 10.37 26-May-2021
-------------------------
A few more bug fixes and tidies. The only change of real note is the removal of
the actual POSIX names regcomp etc. from the POSIX wrapper library because
these have caused issues for some applications (see 10.33 #2 below).
Version 10.36 04-December-2020
------------------------------
Again, mainly bug fixes and tidies. The only enhancements are the addition of
GNU grep's -m (aka --max-count) option to pcre2grep, and also unifying the
handling of substitution strings for both -O and callouts in pcre2grep, with
the addition of $x{...} and $o{...} to allow for characters whose code points
are greater than 255 in Unicode mode.
NOTE: there is an outstanding issue with JIT support for MacOS on arm64
hardware. For details, please see Bugzilla issue #2618.
Version 10.35 15-April-2020
---------------------------
Bugfixes, tidies, and a few new enhancements.
1. Capturing groups that contain recursive backreferences to themselves are no
longer automatically atomic, because the restriction is no longer necessary
as a result of the 10.30 restructuring.
2. Several new options for pcre2_substitute().
3. When Unicode is supported and PCRE2_UCP is set without PCRE2_UTF, Unicode
character properties are used for upper/lower case computations on characters
whose code points are greater than 127.
4. The character tables (for low-valued characters) can now more easily be
saved and restored in binary.
5. Updated to Unicode 13.0.0.
Version 10.34 21-November-2019
------------------------------
Another release with a few enhancements as well as bugfixes and tidies. The
main new features are:
1. There is now some support for matching in invalid UTF strings.
2. Non-atomic positive lookarounds are implemented in the pcre2_match()
interpreter, but not in JIT.
3. Added two new functions: pcre2_get_match_data_size() and
pcre2_maketables_free().
4. Upgraded to Unicode 12.1.0.
Version 10.33 16-April-2019
---------------------------
Yet more bugfixes, tidies, and a few enhancements, summarized here (see
ChangeLog for the full list):
1. Callouts from pcre2_substitute() are now available.
2. The POSIX functions are now all called pcre2_regcomp() etc., with wrapper
functions that use the standard POSIX names. However, in pcre2posix.h the POSIX
names are defined as macros. This should help avoid linking with the wrong
library in some environments, while still exporting the POSIX names for
pre-existing programs that use them.
3. Some new options:
(a) PCRE2_EXTRA_ESCAPED_CR_IS_LF makes \r behave as \n.
(b) PCRE2_EXTRA_ALT_BSUX enables support for ECMAScript 6's \u{hh...}
construct.
(c) PCRE2_COPY_MATCHED_SUBJECT causes a copy of a matched subject to be
made, instead of just remembering a pointer.
4. Some new Perl features:
(a) Perl 5.28's experimental alphabetic names for atomic groups and
lookaround assertions, for example, (*pla:...) and (*atomic:...).
(b) The new Perl "script run" features (*script_run:...) and
(*atomic_script_run:...) aka (*sr:...) and (*asr:...).
(c) When PCRE2_UTF is set, allow non-ASCII letters and decimal digits in
capture group names.
5. --disable-percent-zt disables the use of %zu and %td in formatting strings
in pcre2test. They were already automatically disabled for VC and older C
compilers.
6. Some changes related to callouts in pcre2grep:
(a) Support for running an external program under VMS has been added, in
addition to Windows and fork() support.
(b) --disable-pcre2grep-callout-fork restricts the callout support in
to the inbuilt echo facility.
Version 10.32 10-September-2018
-------------------------------
This is another mainly bugfix and tidying release with a few minor
enhancements. These are the main ones:
1. pcre2grep now supports the inclusion of binary zeros in patterns that are
read from files via the -f option.
2. ./configure now supports --enable-jit=auto, which automatically enables JIT
if the hardware supports it.
3. In pcre2_dfa_match(), internal recursive calls no longer use the stack for
local workspace and local ovectors. Instead, an initial block of stack is
reserved, but if this is insufficient, heap memory is used. The heap limit
parameter now applies to pcre2_dfa_match().
4. Updated to Unicode version 11.0.0.
5. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported.
6. Added support for \N{U+dddd}, but only in Unicode mode.
7. Added support for (?^) to unset all imnsx options.
Version 10.31 12-February-2018
------------------------------
This is mainly a bugfix and tidying release (see ChangeLog for full details).
However, there are some minor enhancements.
1. New pcre2_config() options: PCRE2_CONFIG_NEVER_BACKSLASH_C and
PCRE2_CONFIG_COMPILED_WIDTHS.
2. New pcre2_pattern_info() option PCRE2_INFO_EXTRAOPTIONS to retrieve the
extra compile time options.
3. There are now public names for all the pcre2_compile() error numbers.
4. Added PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK bits to a new
field callout_flags in callout blocks.
Version 10.30 14-August-2017
----------------------------
The full list of changes that includes bugfixes and tidies is, as always, in
ChangeLog. These are the most important new features:
1. The main interpreter, pcre2_match(), has been refactored into a new version
that does not use recursive function calls (and therefore the system stack) for
remembering backtracking positions. This makes --disable-stack-for-recursion a
NOOP. The new implementation allows backtracking into recursive group calls in
patterns, making it more compatible with Perl, and also fixes some other
previously hard-to-do issues. For patterns that have a lot of backtracking, the
heap is now used, and there is an explicit limit on the amount, settable by
pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). The "recursion limit" is retained,
but is renamed as "depth limit" (though the old names remain for
compatibility).
There is also a change in the way callouts from pcre2_match() are handled. The
offset_vector field in the callout block is no longer a pointer to the
actual ovector that was passed to the matching function in the match data
block. Instead it points to an internal ovector of a size large enough to hold
all possible captured substrings in the pattern.
2. The new option PCRE2_ENDANCHORED insists that a pattern match must end at
the end of the subject.
3. The new option PCRE2_EXTENDED_MORE implements Perl's /xx feature, and
pcre2test is upgraded to support it. Setting within the pattern by (?xx) is
also supported.
4. (?n) can be used to set PCRE2_NO_AUTO_CAPTURE, because Perl now has this.
5. Additional compile options in the compile context are now available, and the
first two are: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES and
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.
6. The newline type PCRE2_NEWLINE_NUL is now available.
7. The match limit value now also applies to pcre2_dfa_match() as there are
patterns that can use up a lot of resources without necessarily recursing very
deeply.
8. The option REG_PEND (a GNU extension) is now available for the POSIX
wrapper. Also there is a new option PCRE2_LITERAL which is used to support
REG_NOSPEC.
9. PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD are implemented for the
benefit of pcre2grep, and pcre2grep's -F, -w, and -x options are re-implemented
using PCRE2_LITERAL, PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This
is tidier and also fixes some bugs.
10. The Unicode tables are upgraded from Unicode 8.0.0 to Unicode 10.0.0.
11. There are some experimental functions for converting foreign patterns
(globs and POSIX patterns) into PCRE2 patterns.
Version 10.23 14-February-2017
------------------------------
1. ChangeLog has the details of a lot of bug fixes and tidies.
2. There has been a major re-factoring of the pcre2_compile.c file. Most syntax
checking is now done in the pre-pass that identifies capturing groups. This has
reduced the amount of duplication and made the code tidier. While doing this,
some minor bugs and Perl incompatibilities were fixed (see ChangeLog for
details.)
3. Back references are now permitted in lookbehind assertions when there are
no duplicated group numbers (that is, (?| has not been used), and, if the
reference is by name, there is only one group of that name. The referenced
group must, of course be of fixed length.
4. \g{+<number>} (e.g. \g{+2} ) is now supported. It is a "forward back
reference" and can be useful in repetitions (compare \g{-<number>} ). Perl does
not recognize this syntax.
5. pcre2grep now automatically expands its buffer up to a maximum set by
--max-buffer-size.
6. The -t option (grand total) has been added to pcre2grep.
7. A new function called pcre2_code_copy_with_tables() exists to copy a
compiled pattern along with a private copy of the character tables that is
uses.
8. A user supplied a number of patches to upgrade pcre2grep under Windows and
tidy the code.
9. Several updates have been made to pcre2test and test scripts (see
ChangeLog).
Version 10.22 29-July-2016
--------------------------
1. ChangeLog has the details of a number of bug fixes.
2. The POSIX wrapper function regcomp() did not used to support back references
and subroutine calls if called with the REG_NOSUB option. It now does.
3. A new function, pcre2_code_copy(), is added, to make a copy of a compiled
pattern.
4. Support for string callouts is added to pcre2grep.
5. Added the PCRE2_NO_JIT option to pcre2_match().
6. The pcre2_get_error_message() function now returns with a negative error
code if the error number it is given is unknown.
7. Several updates have been made to pcre2test and test scripts (see
ChangeLog).
Version 10.21 12-January-2016
-----------------------------
1. Many bugs have been fixed. A large number of them were provoked only by very
strange pattern input, and were discovered by fuzzers. Some others were
discovered by code auditing. See ChangeLog for details.
2. The Unicode tables have been updated to Unicode version 8.0.0.
3. For Perl compatibility in EBCDIC environments, ranges such as a-z in a
class, where both values are literal letters in the same case, omit the
non-letter EBCDIC code points within the range.
4. There have been a number of enhancements to the pcre2_substitute() function,
giving more flexibility to replacement facilities. It is now also possible to
cause the function to return the needed buffer size if the one given is too
small.
5. The PCRE2_ALT_VERBNAMES option causes the "name" parts of special verbs such
as (*THEN:name) to be processed for backslashes and to take note of
PCRE2_EXTENDED.
6. PCRE2_INFO_HASBACKSLASHC makes it possible for a client to find out if a
pattern uses \C, and --never-backslash-C makes it possible to compile a version
PCRE2 in which the use of \C is always forbidden.
7. A limit to the length of pattern that can be handled can now be set by
calling pcre2_set_max_pattern_length().
8. When matching an unanchored pattern, a match can be required to begin within
a given number of code units after the start of the subject by calling
pcre2_set_offset_limit().
9. The pcre2test program has been extended to test new facilities, and it can
now run the tests when LF on its own is not a valid newline sequence.
10. The RunTest script has also been updated to enable more tests to be run.
11. There have been some minor performance enhancements.
Version 10.20 30-June-2015
--------------------------
1. Callouts with string arguments and the pcre2_callout_enumerate() function
have been implemented.
2. The PCRE2_NEVER_BACKSLASH_C option, which locks out the use of \C, is added.
3. The PCRE2_ALT_CIRCUMFLEX option lets ^ match after a newline at the end of a
subject in multiline mode.
4. The way named subpatterns are handled has been refactored. The previous
approach had several bugs.
5. The handling of \c in EBCDIC environments has been changed to conform to the
perlebcdic document. This is an incompatible change.
6. Bugs have been mended, many of them discovered by fuzzers.
Version 10.10 06-March-2015
---------------------------
1. Serialization and de-serialization functions have been added to the API,
making it possible to save and restore sets of compiled patterns, though
restoration must be done in the same environment that was used for compilation.
2. The (*NO_JIT) feature has been added; this makes it possible for a pattern
creator to specify that JIT is not to be used.
3. A number of bugs have been fixed. In particular, bugs that caused building
on Windows using CMake to fail have been mended.
Version 10.00 05-January-2015
-----------------------------
Version 10.00 is the first release of PCRE2, a revised API for the PCRE
library. Changes prior to 10.00 are logged in the ChangeLog file for the old
API, up to item 20 for release 8.36. New programs are recommended to use the
new library. Programs that use the original (PCRE1) API will need changing
before linking with the new library.
****