All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- made the MFA requirement for changes to this gem visible on RubyGems
  - thanks to Geremia Taglialatela
- fixed unnecessary `$LOAD_PATH` searches at load time
  - thanks to Koichi ITO
- all expressions now respond to `#negative?` / `#negated?`
  - previously only sets, props, and posix classes did
- implemented `#negative?` / `#negated?` for more applicable expressions
  - `\B`, `\D`, `\H`, `\S`, `\W`, `(?!...)`, `(?<!...)`
- fixed missing support for grapheme cluster break unicode properties
  - e.g. `/\p{Grapheme_Cluster_Break=Extend}/`
- fixed scanner errors for insignificant leading zeros in numerical group refs
  - e.g. `(a)\k<01>`, `(a)\g<-01>`, `(a)?(?(01)b|c)`
  - thanks to Markus Schirp for the report
- handle a corner case where parsing redundant number escapes raised an error
  - e.g. `parse(/\99/)`, which in Ruby is a valid Regexp that matches `99`
  - thanks to Markus Schirp for the report
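The Ruby behavior this entry relies on can be checked with core `Regexp` alone. This is a sketch, not regexp_parser's API. Because the pattern defines no capture groups, `\99` cannot be a backreference, so Ruby's engine reads each `9` as a literal:

```ruby
# No capture groups exist, so "\99" is not a backreference here;
# Ruby treats it as two literal "9" characters.
re = /\99/

puts re.match?("99") # => true
puts re.match?("9")  # => false, both nines are required
```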
- support for extpict unicode property, added in Ruby 2.6
- support for 10 unicode script/block properties added in Ruby 3.2
- `Regexp::Expression::Shared#ends_at`
  - e.g. `parse(/a +/x)[0].ends_at # => 3`
  - e.g. `parse(/a +/x)[0].ends_at(include_quantifier = false) # => 1`
- `Regexp::Expression::Shared#{capturing?,comment?}`
  - previously only available on capturing and comment groups
- `Regexp::Expression::Shared#{decorative?}`
  - true for decorations: comment groups as well as comments and whitespace in x-mode
- `Regexp::Expression::Shared#parent`
- new format argument `:original` for `Regexp::Expression::Base#to_s`
  - includes decorative elements between node and its quantifier
  - e.g. `parse(/a (?#comment) +/x)[0].to_s(:original) # => "a (?#comment) +"`
  - using it is not needed when calling `Root#to_s` as Root can't be quantified
- support calling `Subexpression#{each_expression,flat_map}` with a one-argument block
  - in this case, only the expressions are passed to the block, no indices
- support calling test methods at Expression class level
  - `capturing?`, `comment?`, `decorative?`, `referential?`, `terminal?`
  - e.g. `Regexp::Expression::CharacterSet.terminal? # => false`
- fixed `Regexp::Expression::Shared#full_length` with whitespace before quantifier
  - e.g. `parse(/a +/x)[0].full_length` used to yield `2`, now it yields `3`
- fixed `Subexpression#to_s` output for children with whitespace before their quantifier
  - e.g. `parse(/a + /x).to_s` used to yield `"a+ "`, now it yields `"a + "`
  - calling `#to_s` on sub-nodes still omits such decorative interludes by default
    - use new `#to_s` format `:original` to include it
    - e.g. `parse(/a + /x)[0].to_s(:original) # => "a +"`
- fixed `Subexpression#te` behaving differently from other expressions
  - only `Subexpression#te` used to include the quantifier
  - now `#te` is the end index without quantifier, as for other expressions
- fixed `NoMethodError` when calling `#starts_at` or `#ts` on empty sequences
  - e.g. `Regexp::Parser.parse(/|/)[0].starts_at`
  - e.g. `Regexp::Parser.parse(/[&&]/)[0][0].starts_at`
- fixed nested comment groups breaking local x-options
  - e.g. in `/(?x:(?#hello)) /`, the x-option wrongly applied to the whitespace
- fixed nested comment groups breaking conditionals
  - e.g. in `/(a)(?(1)b|c(?#hello)d)e/`, the 2nd conditional branch included "e"
- fixed quantifiers after comment groups being mis-assigned to that group
  - e.g. in `/a(?#foo){3}/` (matches 'aaa')
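Here is a quick check of the Ruby semantics using core `Regexp` only. The comment group is ignored, so `{3}` quantifies the preceding `a`. The anchors are added only to make the match exact:

```ruby
# (?#foo) is a comment; the {3} after it applies to "a".
re = /\Aa(?#foo){3}\z/

puts re.match?("aaa") # => true
puts re.match?("a")   # => false
```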
- fixed Scanner accepting two cases of invalid Regexp syntax
  - unmatched closing parentheses (`)`) and k-backrefs with number 0 (`\k<0>`)
  - these are a `SyntaxError` in Ruby, so could only be passed as a String
  - they now raise a `Regexp::Scanner::ScannerError`
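Ruby's own compiler rejects both patterns, which is why they could only reach the scanner as Strings. The sketch below demonstrates this with `Regexp.new` from core Ruby, not the regexp_parser API. The exact error messages may vary between Ruby versions, so they are printed rather than relied upon:

```ruby
# Each source is invalid for Ruby itself, so Regexp.new raises RegexpError.
[")", '\k<0>'].each do |source|
  begin
    Regexp.new(source)
    puts "#{source}: accepted"
  rescue RegexpError => e
    puts "#{source}: rejected (#{e.message})"
  end
end
```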
- fixed some scanner errors not inheriting from `Regexp::Scanner::ScannerError`
- reduced verbosity of inspect / pretty print output
- `Regexp::Lexer.lex` now streams tokens when called with a block
  - it can now take arbitrarily large input, just like `Regexp::Scanner`
  - this also slightly improves `Regexp::Parser.parse` performance
  - note: `Regexp::Parser.parse` still does not and will not support streaming
- improved performance of `Subexpression#each_expression`
- minor improvements to `Regexp::Scanner` performance
- overall improvement of parse performance: about 10% for large Regexps
- parsing of octal escape sequences in sets, e.g. `[\141]`
  - thanks to Randy Stauner for the report
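For reference, `\141` is the octal escape for code point 97, i.e. `a`. Core Ruby matches it the same way inside a set. This minimal check does not involve regexp_parser:

```ruby
# Octal 141 == decimal 97 == "a", both in strings and in a character set.
re = /\A[\141]\z/

puts "\141" == "a"  # => true
puts re.match?("a") # => true
puts re.match?("1") # => false
```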
- fixed `SystemStackError` when cloning recursive subexpression calls
  - e.g. `Regexp::Parser.parse(/a|b\g<0>/).dup`
- fixed scanning of two negative lookbehind edge cases
  - `(?<!x)y>` used to raise a ScannerError
  - `(?<!x>)y` used to be misinterpreted as a named group
  - thanks to Sergio Medina for the report
- fixed `#referenced_expression` for `\g<0>` (was `nil`, is now the `Root` exp)
- fixed `#reference`, `#referenced_expression` for recursion level backrefs
  - e.g. `(a)(b)\k<-1+1>`
  - `#referenced_expression` was `nil`, now it is the correct `Group` exp
- detect and raise for two more syntax errors when parsing String input
  - quantification of option switches (e.g. `(?i)+`)
  - invalid references (e.g. `/\k<1>/`)
  - these are a `SyntaxError` in Ruby, so could only be passed as a String
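Here is a small sketch of both cases using core Ruby. `rejected?` is a hypothetical helper for illustration and is not part of regexp_parser:

```ruby
# Hypothetical helper: true if Ruby's regexp compiler rejects the source.
def rejected?(source)
  Regexp.new(source)
  false
rescue RegexpError
  true
end

puts rejected?('(?i)+')    # => true, the option switch has nothing to quantify
puts rejected?('\k<1>')    # => true, there is no group 1
puts rejected?('(a)\k<1>') # => false, group 1 exists
```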
- `Regexp::Expression::Base#human_name`
  - returns a nice, human-readable description of the expression
- `Regexp::Expression::Base#optional?`
  - returns `true` if the expression is quantified accordingly (e.g. with `*`, `{,n}`)
- added a deprecation warning when calling `#to_re` on set members
- `Regexp::Expression::Base.construct` and `.token_class` methods
  - see the wiki for details
- fixed interpretation of `+` and `?` after interval quantifiers (`{n,n}`)
  - they used to be treated as reluctant or possessive mode indicators
  - however, Ruby does not support these modes for interval quantifiers
  - they are now treated as chained quantifiers instead, as Ruby does it
  - cf. #3
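The chained reading can be observed in core Ruby. A `+` after a fixed interval quantifies the whole interval, as if written `(?:a{2})+`, instead of making it possessive. This is a minimal sketch that makes no regexp_parser calls:

```ruby
# a{2}+ behaves like (?:a{2})+ in Ruby: one or more pairs of "a".
re = /\Aa{2}+\z/

puts re.match?("aaaa") # => true
puts re.match?("aaa")  # => false
```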
- fixed `Expression::Base#nesting_level` for some tree rewrite cases
  - e.g. the alternatives in `/a|[b]/` had an inconsistent nesting_level
- fixed `Scanner` accepting invalid posix classes, e.g. `[[:foo:]]`
  - they raise a `SyntaxError` when used in a Regexp, so could only be passed as String
  - they now raise a `Regexp::Scanner::ValidationError` in the `Scanner`
- added `Expression::Base#==` for (deep) comparison of expressions
- added `Expression::Base#parts`
  - returns the text elements and subexpressions of an expression
  - e.g. `parse(/(a)/)[0].parts # => ["(", #<Literal @text="a"...>, ")"]`
- added `Expression::Base#te` (a.k.a. token end index)
  - `Expression::Subexpression` always had `#te`, only terminal nodes lacked it so far
- made some `Expression::Base` methods available on `Quantifier` instances, too
  - `#type`, `#type?`, `#is?`, `#one_of?`, `#options`, `#terminal?`
  - `#base_length`, `#full_length`, `#starts_at`, `#te`, `#ts`, `#offset`
  - `#conditional_level`, `#level`, `#nesting_level`, `#set_level`
  - this allows a more unified handling with `Expression::Base` instances
- allowed `Quantifier#initialize` to take a token and options Hash like other nodes
- added a deprecation warning for initializing Quantifiers with 4+ arguments:
  Calling `Expression::Base#quantify` or `Quantifier.new` with 4+ arguments is deprecated.
  It will no longer be supported in regexp_parser v3.0.0.
  Please pass a Regexp::Token instead, e.g. replace `token, text, min, max, mode`
  with `::Regexp::Token.new(:quantifier, token, text)`. min, max, and mode will be derived automatically.
  Or do `exp.quantifier = Quantifier.construct(token: token, text: str)`.
  This is consistent with how Expression::Base instances are created.
- removed five nonexistent unicode properties from `Syntax#features`
  - these were never supported by Ruby or the `Regexp::Scanner`
  - thanks to Markus Schirp for the report
- improved parsing performance through `Syntax` refactoring
  - instead of fresh `Syntax` instances, pre-loaded constants are now re-used
  - this approximately doubles the parsing speed for simple regexps
- added methods to `Syntax` classes to show relative feature sets
  - e.g. `Regexp::Syntax::V3_2_0.added_features`
- support for new unicode properties of Ruby 3.2 / Unicode 14.0
- fixed Syntax version of absence groups (`(?~...)`)
  - the lexer accepted them for any Ruby version
  - now they are only recognized for Ruby >= 2.4.1 in which they were introduced
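For context, the absence operator matches any text that does not contain its argument. The usual illustration is a C-style comment. This sketch uses only core Ruby and assumes Ruby 2.4.1 or later:

```ruby
# (?~\*/) matches any run of text that does not contain "*/".
re = %r{/\*(?~\*/)\*/}

p "/* a */ b */"[re] # => "/* a */"
```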
- reduced gem size by excluding specs from package
- removed deprecated `test_files` gemspec setting
- no longer depend on `yaml`/`psych` (except for Ruby <= 2.4)
- no longer depend on `set`
  - `set` was removed from the stdlib and made a standalone gem as of Ruby 3
  - this made it a hidden/undeclared dependency of `regexp_parser`
- added support for 13 new unicode properties introduced in Ruby 3.1.0
- fixed `NameError` when requiring only `'regexp_parser/scanner'` in v2.1.0
  - thanks to Jared White and Sam Ruby for the report
- common ancestor for all scanning/parsing/lexing errors
  - `Regexp::Parser::Error` can now be rescued as a catch-all
  - the following errors (and their many descendants) now inherit from it:
    - `Regexp::Expression::Conditional::TooManyBranches`
    - `Regexp::Parser::ParserError`
    - `Regexp::Scanner::ScannerError`
    - `Regexp::Scanner::ValidationError`
    - `Regexp::Syntax::SyntaxError`
  - it replaces `ArgumentError` in some rare cases (`Regexp::Parser.parse('?')`)
  - thanks to sandstrom for the cue
- fixed scanning of whole-pattern recursion calls `\g<0>` and `\g'0'`
  - a regression in v2.0.1 had caused them to be scanned as literals
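As a reminder of what these calls do, `\g<0>` re-enters the whole pattern. Here is a minimal core Ruby sketch that uses it to match balanced, nested parentheses:

```ruby
# \g<0> recursively calls the entire pattern for each nested "(...)".
re = /\((?:[^()]|\g<0>)*\)/

p "x(a(b)c)y"[re] # => "(a(b)c)"
p "(a(b"[re]      # => nil, no balanced group
```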
- fixed scanning of some backreference and subexpression call edge cases
  - e.g. `\k<+1>`, `\g<x-1>`
- fixed tokenization of some escapes in character sets
  - `.`, `|`, `{`, `}`, `(`, `)`, `^`, `$`, `?`, `+`, `*`
  - all of these correctly emitted `#type` `:literal` and `#token` `:literal` if not escaped
  - if escaped, they emitted e.g. `#type` `:escape` and `#token` `:group_open` for `[\(]`
  - the escaped versions now correctly emit `#type` `:escape` and `#token` `:literal`
- fixed handling of control/metacontrol escapes in character sets
  - e.g. `[\cX]`, `[\M-\C-X]`
  - they were misread as a bunch of individual literals, escapes, and ranges
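For reference, `\cX` denotes the control character 0x18, and core Ruby accepts it inside a set. This is a minimal check that does not use regexp_parser:

```ruby
# \cX is the control character 0x18 (CAN), also valid inside [...].
re = /\A[\cX]\z/

puts re.match?("\x18") # => true
puts re.match?("X")    # => false
```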
- fixed some cases where calling `#dup`/`#clone` on expressions led to shared state
- fixed error when scanning some unlikely and redundant but valid charset patterns
  - e.g. `/[[.a-b.]]/`, `/[[=e=]]/`
- fixed ancestry of some error classes related to syntax version lookup
  - `NotImplementedError`, `InvalidVersionNameError`, `UnknownSyntaxNameError`
  - they now correctly inherit from `Regexp::Syntax::SyntaxError` instead of Ruby's `::SyntaxError`
- fixed `FrozenError` when calling `#to_s` on a frozen `Group::Passive`
  - thanks to Daniel Gollahon
- fixed error when scanning some group names
  - this affected names containing hyphens, digits or multibyte chars, e.g. `/(?<a1>a)/`
  - thanks to Daniel Gollahon for the report
- fixed error when scanning hex escapes with just one hex digit
  - e.g. `/\x0A/` was scanned correctly, but the equivalent `/\xA/` was not
  - thanks to Daniel Gollahon for the report
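Core Ruby confirms that the two escapes denote the same character, 0x0A (a newline):

```ruby
# A single hex digit is enough: \xA and \x0A both mean "\n".
short = /\A\xA\z/
long  = /\A\x0A\z/

puts short.match?("\n") # => true
puts long.match?("\n")  # => true
```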
- some methods that used to return byte-based indices now return char-based indices
  - the returned values have only changed for Regexps that contain multibyte chars
  - this is only a breaking change if you used such methods directly AND relied on them pointing to bytes
  - affected methods:
    - `Regexp::Token` `#length`, `#offset`, `#te`, `#ts`
    - `Regexp::Expression::Base` `#full_length`, `#offset`, `#starts_at`, `#te`, `#ts`
  - thanks to Akinori MUSHA for the report
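To illustrate why only multibyte input is affected, here is a sketch using core `String` methods rather than the regexp_parser API. `ä` is one character but two bytes in UTF-8, so character and byte offsets diverge after it:

```ruby
# Character vs. byte offset of "b" in a pattern source containing "ä".
source = "ä+b"

puts source.index("b")   # => 2, character index
puts source.b.index("b") # => 3, byte index ("ä" takes 2 bytes)
```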
- removed some deprecated methods/signatures
  - these are rarely used and have been showing deprecation warnings for a long time
  - `Regexp::Expression::Subexpression.new` with 3 arguments
  - `Regexp::Expression::Root.new` without a token argument
  - `Regexp::Expression.parsed`
- `Regexp::Expression::Base#base_length`
  - returns the character count of an expression body, ignoring any quantifier
- pragmatic, experimental support for chained quantifiers
  - e.g.: `/^a{10}{4,6}$/` matches exactly 40, 50 or 60 `a`s
  - successive quantifiers used to be silently dropped by the parser
  - they are now wrapped with passive groups as if they were written `(?:a{10}){4,6}`
  - thanks to calfeld for reporting this a while back
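The Ruby semantics that the parser now mirrors can be checked with core `Regexp`. Here `\A`/`\z` replace `^`/`$` so that the whole string must match:

```ruby
# {4,6} applies to the whole a{10}, so only 40, 50 or 60 a's match.
re = /\Aa{10}{4,6}\z/

lengths = (35..65).select { |n| re.match?("a" * n) }
p lengths # => [40, 50, 60]
```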
- incorrect encoding output for non-ascii comments
  - this led to a crash when calling `#to_s` on parse results containing such comments
  - thanks to Michael Glass for the report
- some crashes when scanning contrived patterns such as `'\😋'`
- fix `FrozenError` in `Expression::Base#repetitions` on Ruby 3.0
  - thanks to Thomas Walpole
- removed "unknown future version" warning on Ruby 3.0
- fixed scanning of comment-like text in normal mode
  - this was an old bug, but had become more prevalent in v1.8.0
  - thanks to Tietew for the report
- specified correct minimum Ruby version in gemspec
  - it said 1.9 but really required 2.0 as of v1.8.0
- dropped support for running on Ruby 1.9.x
- regexp flags can now be passed when parsing a `String` as regexp body
  - see the README for details
  - thanks to Owen Stephens
- bare occurrences of `\g` and `\k` are now allowed and scanned as literal escapes
  - matches Onigmo behavior
  - thanks for the report to Marc-André Lafortune
- fixed parsing comments without preceding space or trailing newline in x-mode
  - thanks to Owen Stephens
- Support for literals that include the unescaped delimiters `{`, `}`, and `]`. These delimiters are informally supported by various regexp engines.
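As an illustration, core Ruby accepts these delimiters unescaped in positions where they cannot form an interval quantifier or close a set, and it matches them literally. Ruby may print a warning for the bare `]` in verbose mode. This sketch does not use regexp_parser:

```ruby
# Unescaped "{", "}" and "]" are treated as literal characters here.
brace_open  = Regexp.new("a{")
brace_close = Regexp.new("}")
bracket     = Regexp.new("]")

puts brace_open.match?("a{") # => true
puts brace_close.match?("}") # => true
puts bracket.match?("]")     # => true
```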
- `Expression::Base#each_expression` and `#traverse` can now be called without a block
  - this returns an `Enumerator` and allows chaining, e.g. `each_expression.select`
  - thanks to Masataka Kuwabara
- `MatchLength#each` no longer ignores the given `limit:` when called without a block
- Added support for 16 new unicode properties introduced in Ruby 2.6.2 and 2.6.3
- Fixed `#options` (and thus `#i?`, `#u?` etc.) not being set for some expressions:
  - this affected posix classes as well as alternation, conditional, and intersection branches
  - `#options` was already correct for all child expressions of such branches
  - this only made an operational difference for posix classes as they respect encoding flags
- Fixed `#options` not respecting all negative options in weird cases like '(?u-m-x)'
- Fixed `Group#option_changes` not accounting for indirectly disabled (overridden) encoding flags
- Fixed `Scanner` allowing negative encoding options if there were no positive options, e.g. '(?-u)'
- Fixed `ScannerError` for some valid meta/control sequences such as '\C-\\'
- Fixed `Expression::Base#match` and `#=~` not working with a single argument
- Added `#referenced_expression` for backrefs, subexp calls and conditionals
  - returns the `Group` expression that is being referenced via name or number
- Added `Expression::Base#repetitions`
  - returns a `Range` of allowed repetitions (`1..1` if there is no quantifier)
  - like `#quantity` but with a more uniform interface
- Added `Expression::Base#match_length`
  - allows inspecting and iterating over String lengths matched by the Expression
- Fixed `Expression::Base#clone` "direction"
  - it used to dup ivars onto the callee, leaving only the clone referencing the original objects
  - this will affect you if you call `#eql?`/`#equal?` on expressions or use them as Hash keys
- Fixed `#clone` results for `Sequences`, e.g. alternations and conditionals
  - the inner `#text` was cloned onto the `Sequence` and thus duplicated
  - e.g. `Regexp::Parser.parse(/(a|bc)/).clone.to_s # => (aa|bcbc)`
- Fixed inconsistent `#to_s` output for `Sequences`
  - it used to return only the "specific" text, e.g. "|" for an alternation
  - now it includes nested expressions as it does for all other `Subexpressions`
- Fixed quantification of codepoint lists with more than one entry (`\u{62 63 64}+`)
  - quantifiers apply only to the last entry, so this token is now split up if quantified
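Core Ruby shows the behavior that the parser now reflects. `\u{62 63 64}` is just the literal `bcd`, so a trailing `+` repeats only the `d`:

```ruby
# The + applies to the last code point ("d"), not to the whole list.
re = /\A\u{62 63 64}+\z/

puts re.match?("bcddd")  # => true
puts re.match?("bcdbcd") # => false
```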
- Added support for 19 new unicode properties introduced in Ruby 2.6.0
- `Syntax#features` returns a `Hash` of all types and tokens supported by a given `Syntax`
- Thanks to Akira Matsuda
- eliminated warning "assigned but unused variable - testEof"
- `Subexpression` (branch node) includes `Enumerable`, allowing you to `#select` children etc.
- Fixed missing quantifier in `Conditional::Expression` methods `#to_s`, `#to_re`
- `Conditional::Condition` no longer lives outside the recursive `#expressions` tree
  - it used to be the only expression stored in a custom ivar, complicating traversal
  - its setter and getter (`#condition=`, `#condition`) still work as before
- Added `Quantifier` methods `#greedy?`, `#possessive?`, `#reluctant?`/`#lazy?`
- Added `Group::Options#option_changes`
  - shows the options enabled or disabled by the given options group
  - as with all other expressions, `#options` shows the overall active options
- Added `Conditional#reference` and `Condition#reference`, indicating the determinative group
- Added `Subexpression#dig`, acts like `Array#dig`
- Fixed parsing of quantified conditional expressions (quantifiers were assigned to the wrong expression)
- Fixed scanning and parsing of forward-referring subexpression calls (e.g. `\g<+1>`)
- `Root` and `Sequence` expressions now support the same constructor signature as all other expressions
This release includes several breaking changes, mostly to character sets, `#map` and properties.
- Changed handling of sets (a.k.a. character classes or "bracket expressions")
  - see PR #55 / issue #47 for details
  - sets are now parsed to expression trees like other nestable expressions
  - `#scan` now emits the same tokens as outside sets (no longer `:set, :member`)
  - `CharacterSet#members` has been removed
  - new `Range` and `Intersection` classes represent corresponding syntax features
  - a new `PosixClass` expression class represents e.g. `[[:ascii:]]`
    - `PosixClass` instances behave like `Property` ones, e.g. support `#negative?`
    - `#scan` emits `:(non)posixclass, :<type>` instead of `:set, :char_(non)<type>`
- Changed `Subexpression#map` to act like regular `Enumerable#map`
  - the old behavior is available as `Subexpression#flat_map`
  - e.g. `parse(/[a]/).map(&:to_s) == ["[a]"]`; used to be `["[a]", "a"]`
- Changed expression emissions for some escape sequences
  - `EscapeSequence::Codepoint`, `CodepointList`, `Hex` and `Octal` are now all used
  - they already existed, but were all parsed as `EscapeSequence::Literal`
  - e.g. `\x97` is now `EscapeSequence::Hex` instead of `EscapeSequence::Literal`
- Changed naming of many property tokens (emitted for `\p{...}`)
  - if you work with these tokens, see PR #56 for details
  - e.g. `:punct_dash` is now `:dash_punctuation`
- Changed `(?m)` and the likes to emit as `:options_switch` token (@4ade4d1)
  - allows differentiating from group-local `:options`, e.g. `(?m:.)`
- Changed name of `Backreference::..NestLevel` to `..RecursionLevel` (@4184339)
- Changed `Backreference::Number#number` from `String` to `Integer` (@40a2231)
- Added support for all previously missing properties (about 250)
- Added `Expression::UnicodeProperty#shortcut` (e.g. returns "m" for `\p{mark}`)
- Added `#char(s)` and `#codepoint(s)` methods to all `EscapeSequence` expressions
- Added `#number`/`#name`/`#recursion_level` to all backref/call expressions (@174bf21)
- Added `#number` and `#number_at_level` to capturing group expressions (@40a2231)
- Fixed Ruby version mapping of some properties
- Fixed scanning of some property spellings, e.g. with dashes
- Fixed some incorrect property alias normalizations
- Fixed scanning of codepoint escapes with 6 digits (e.g. `\u{10FFFF}`)
- Fixed scanning of `\R` and `\X` within sets; they act as literals there
- Changed handling of Ruby versions (PR #53)
  - New Ruby versions are now supported by default
  - Some deep-lying APIs have changed, which should not affect most users:
    - `Regexp::Syntax::VERSIONS` is gone
    - Syntax version names have changed from `Regexp::Syntax::Ruby::Vnnn` to `Regexp::Syntax::Vn_n_n`
    - Syntax version classes for Ruby versions without regex feature changes are no longer predefined and are now only created on demand / lazily
    - `Regexp::Syntax::supported?` returns true for any argument >= 1.8.6
- Fixed some use cases of Expression methods #strfregexp and #to_h (@e738107)
- Added full signature support to collection methods of Expressions (@aa7c55a)
- Added ruby version files for 2.2.10 and 2.3.7
- Added ruby version files for 2.4.4 and 2.5.1
- Fixed UnknownSyntaxNameError introduced in v0.4.10 if the gem's parent dir tree included a 'ruby' dir
- Added ruby version file for 2.6.0
- Added support for Emoji properties (available in Ruby since 2.5.0)
- Added support for XPosixPunct and Regional_Indicator properties
- Fixed parsing of Unicode 6.0 and 7.0 script properties
- Fixed parsing of the special Assigned property
- Fixed scanning of InCyrillic_Supplement property
- Added ruby version file for 2.5.0
- Added ruby version files for 2.2.9, 2.3.6, and 2.4.3
- Fixed a thread safety issue (issue #45)
- Some public class methods that were only reliable for internal use are now private instance methods (PR #46)
- Improved the usefulness of Expression::Base#options (issue #43) - #options and derived methods such as #i?, #m? and #x? are now defined for all Expressions that are affected by such flags.
- Fixed scanning of whitespace following (?x) (commit 5c94bd2)
- Fixed a Parser bug where the #number attribute of traditional numerical backreferences was not set correctly (commit 851b620)
- Added Parser support for hex escapes in sets (PR #36)
- Added Parser support for octal escapes (PR #37)
- Added support for cluster types \R and \X (PR #38)
- Added support for more metacontrol notations (PR #39)
- Thanks to [Janosch Müller](https://github.com/janosch-x):
- Support ruby 2.2.7 (PR #42)
- Added ruby version files for 2.2.8, 2.3.5, and 2.4.2
- Thanks to [Janosch Müller](https://github.com/janosch-x):
- Add support for new absence operator (PR #33)
- Thanks to Bartek Bułat:
- Add support for Ruby 2.3.4 version (PR #40)
- Added ruby version file for 2.4.1
- Thanks to [Janosch Müller](https://github.com/janosch-x):
- Support ruby 2.4 (PR #30)
- Improve codepoint handling (PR #27)
- Updated ruby version file for 2.3.3
- Added Syntax.supported? method
- Updated ruby versions for latest releases; 2.1.10, 2.2.6, and 2.3.2
- Thanks to John Backus:
- Remove warnings (PR #26)
- Thanks to John Backus:
- Fix parsing of /\xFF/n (hex:escape) (PR #24)
- Thanks to John Backus:
- Fix warnings (PR #19)
- Thanks to Dana Scheider:
- Correct error in README (PR #20)
- Fixed mistyped \h and \H character types (issue #21)
- Added ancestry syntax files for latest rubies (issue #22)
- Thanks to John Backus:
- Fixed scanning of zero length comments (PR #12)
- Fixed missing escape:codepoint_list syntax token (PR #14)
- Fixed to_s for modified interval quantifiers (PR #17)
- Updated ruby versions for latest releases; 2.1.8, 2.2.4, and 2.3.0
- Fixed class name for UnknownSyntaxNameError exception
- Added UnicodeBlocks support to the parser.
- Added UnicodeBlocks support to the scanner.
- Added expand_members method to CharacterSet, returns traditional or unicode property forms of shorthands (\d, \W, \s, etc.)
- Improved meaning and output of %t and %T in strfregexp.
- Added syntax versions for ruby 2.1.4 and 2.1.5 and updated latest 2.1 version.
- Added to_h methods to Expression, Subexpression, and Quantifier.
- Added traversal methods; traverse, each_expression, and map.
- Added token/type test methods; type?, is?, and one_of?
- Added printing method strfregexp, inspired by strftime.
- Added scanning and parsing of free spacing (x mode) expressions.
- Improved handling of inline options (?mixdau:...)
- Added conditional expressions. Ruby 2.0.
- Added keep (\K) markers. Ruby 2.0.
- Added d, a, and u options. Ruby 2.0.
- Added missing meta sequences to the parser. They were supported by the scanner only.
- Renamed Lexer's method to lex, added an alias to the old name (scan)
- Use #map instead of #each to run the block in Lexer.lex.
- Replaced VERSION.yml file with a constant.
- Update tokens and scanner with new additions in Unicode 7.0.
- Fixed test and gem building rake tasks and extracted the gem specification from the Rakefile into a .gemspec file.
- Added syntax files for missing ruby 2.x versions. These do not add extra syntax support, they just make the gem work with the newer ruby versions.
- Fixed a parser bug where an alternation sequence that contained nested expressions was incorrectly being appended to the parent expression when the nesting was exited. e.g. in /a|(b)c/, c was appended to the root.
- Fixed a bug where character types were not being correctly scanned within character sets. e.g. in [\d], two tokens were scanned; one for the backslash '\' and one for the 'd'
- Added syntax stubs for ruby versions 2.0 and 2.1
- Added clone methods for deep copying expressions.
- Added optional format argument for to_s on expressions to return the text of the expression with (:full, the default) or without (:base) its quantifier.
- Renamed the :beginning_of_line and :end_of_line tokens to :bol and :eol.
- Fixed a bug where alternations with more than two alternatives and one of them ending in a group were being incorrectly nested.
- Improved EOF handling in general and especially from sequences like hex and control escapes.
- Fixed a bug where named groups with an empty name would return a blank token [].
- Fixed a bug where members of a parent set were being added to its last subset.
- Fixed a few mutable string bugs by calling dup on the originals.
- Made ruby 1.8.6 the base for all 1.8 syntax, and the 1.8 name a pointer to the latest (1.8.7 at this time)
- Removed look-behind assertions (positive and negative) from 1.8 syntax
- Added control (\cc and \C-c) and meta (\M-c) escapes to 1.8 syntax
- The default syntax is now the one of the running ruby version in both the lexer and the parser.
- Initial release