An efficient, non-backtracking regular expression implementation in Scala.
It is based on the Pike VM, as described by Russ Cox.
Yes, I am aware that Java has supported regular expressions since version 1.4. And yes, they are integrated in the Scala standard library.
But we ran into stack overflows when using alternations (`x|y`) on big inputs. This is a consequence of how the Java regular expression engine is implemented: it matches by recursive backtracking, so the stack depth can grow with the input. In fact, Java regular expressions are more expressive than we need, as they support back references. We never needed them, and without this feature there are efficient ways to implement regular expressions with no backtracking and no stack use (relying on tail recursion).
This is how this project was born.
Besides, implementing a regular expression engine is a lot of fun, so even if nobody else uses this library, it was really enjoyable to do.
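The approach described above fits in a few dozen lines. The following is a minimal Pike VM in Scala, written for illustration and not taken from gnieh.regex: the instruction set (`Chr`, `AnyChar`, `Split`, `Jmp`, `Save`, `Match`), the names `VmThread`, `PikeVM` and `PikeVMExample`, the full-match semantics and the hand-compiled program are all assumptions. All threads advance together over the input, one character at a time, so nothing is ever backtracked, and the only recursive functions are tail-recursive (`@tailrec`), so the stack depth does not grow with the input.

```scala
import scala.annotation.tailrec

// Hypothetical instruction set (illustrative, not gnieh.regex internals).
sealed trait Inst
final case class Chr(c: Char) extends Inst                   // consume exactly `c`
case object AnyChar extends Inst                             // consume any character (`.`)
final case class Split(first: Int, second: Int) extends Inst // fork; `first` has priority
final case class Jmp(target: Int) extends Inst
final case class Save(slot: Int) extends Inst                // store current position in a capture slot
case object Match extends Inst

final case class VmThread(pc: Int, saved: Vector[Int])

object PikeVM {
  // Adds `start` and every thread reachable from it through Jmp/Split/Save
  // to `acc`, in priority order. An explicit work list replaces recursion.
  // `seen` keeps at most one thread per pc at a given input position.
  private def addThread(prog: Vector[Inst], pos: Int, start: VmThread,
                        acc: Vector[VmThread], seen: Set[Int]): (Vector[VmThread], Set[Int]) = {
    @tailrec
    def loop(todo: List[VmThread], acc: Vector[VmThread], seen: Set[Int]): (Vector[VmThread], Set[Int]) =
      todo match {
        case Nil => (acc, seen)
        case t :: rest if seen(t.pc) => loop(rest, acc, seen)
        case t :: rest =>
          val seen1 = seen + t.pc
          prog(t.pc) match {
            case Jmp(x)      => loop(VmThread(x, t.saved) :: rest, acc, seen1)
            case Split(x, y) => loop(VmThread(x, t.saved) :: VmThread(y, t.saved) :: rest, acc, seen1)
            case Save(n)     => loop(VmThread(t.pc + 1, t.saved.updated(n, pos)) :: rest, acc, seen1)
            case _           => loop(rest, acc :+ t, seen1) // Chr, AnyChar or Match
          }
      }
    loop(List(start), acc, seen)
  }

  // Full-match semantics: returns the capture slots of the highest-priority
  // thread that reaches Match exactly at the end of the input.
  def fullMatch(prog: Vector[Inst], nSlots: Int, input: String): Option[Vector[Int]] = {
    @tailrec
    def step(pos: Int, clist: Vector[VmThread]): Option[Vector[Int]] =
      if (pos == input.length) clist.collectFirst { case t if prog(t.pc) == Match => t.saved }
      else if (clist.isEmpty) None
      else {
        val c = input.charAt(pos)
        // Every live thread tries the same character: each character is read once.
        val (nlist, _) = clist.foldLeft((Vector.empty[VmThread], Set.empty[Int])) {
          case ((acc, seen), t) =>
            prog(t.pc) match {
              case Chr(`c`) | AnyChar => addThread(prog, pos + 1, VmThread(t.pc + 1, t.saved), acc, seen)
              case _                  => (acc, seen)
            }
        }
        step(pos + 1, nlist)
      }
    val (init, _) = addThread(prog, 0, VmThread(0, Vector.fill(nSlots)(-1)), Vector.empty, Set.empty)
    step(0, init)
  }
}

object PikeVMExample {
  // Hand-compiled program for (a*)(a*); slots 0-1 delimit group 1, slots 2-3 group 2.
  val greedy: Vector[Inst] = Vector(
    Save(0), Split(2, 4), Chr('a'), Jmp(1), Save(1),
    Save(2), Split(7, 9), Chr('a'), Jmp(6), Save(3), Match)
  // (a*?)(a*): identical, except that the first Split prefers leaving the loop.
  val lazyFirst: Vector[Inst] = greedy.updated(1, Split(4, 2))

  def main(args: Array[String]): Unit = {
    println(PikeVM.fullMatch(greedy, 4, "aa"))    // Some(Vector(0, 2, 2, 2))
    println(PikeVM.fullMatch(lazyFirst, 4, "aa")) // Some(Vector(0, 0, 0, 2))
    println(PikeVM.fullMatch(greedy, 4, "ab"))    // None
  }
}
```

Swapping the operands of the first `Split` is all it takes to turn `a*` into `a*?`: the order of the threads, not backtracking, decides which captures win. Deduplicating threads by program counter at each position bounds the number of live threads by the program size, so for a fixed pattern the work grows linearly with the input.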
The following regular expression constructs are supported:
| Syntax | Meaning |
|--------|---------|
| `.` | any character |
| `[xyz]` | character class |
| `[^xyz]` | negated character class |
| `\d` | a digit character (equivalent to `[0-9]`) |
| `\D` | a non-digit character (equivalent to `[^0-9]`) |
| `\w` | a word character, i.e. a letter, digit or underscore (equivalent to `[A-Za-z0-9_]`) |
| `\W` | a non-word character (equivalent to `[^A-Za-z0-9_]`) |
| `\s` | a whitespace character (equivalent to `[ \t\r\n\f]`) |
| `\S` | a non-whitespace character (equivalent to `[^ \t\r\n\f]`) |
| `xy` | `x` followed by `y` |
| `x\|y` | `x` or `y` (prefer `x`) |
| `x*` | zero or more `x` (prefer more) |
| `x+` | one or more `x` (prefer more) |
| `x?` | zero or one `x` (prefer one) |
| `x*?` | zero or more `x` (prefer zero) |
| `x+?` | one or more `x` (prefer one) |
| `x??` | zero or one `x` (prefer zero) |
| `(re)` | numbered capturing group (starting at 1) |
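To make "prefer more" and "prefer zero" concrete, the sketch below checks a few full matches and their capturing groups. It deliberately uses the Scala standard library's `scala.util.matching.Regex` (backed by `java.util.regex`) rather than gnieh.regex, so that it runs on its own; Java applies the same preferences for these constructs, and the assumption is that gnieh.regex follows them as the table states. `PreferenceDemo` and `groups` are hypothetical names.

```scala
// Uses the standard library's Regex (java.util.regex underneath), not gnieh.regex.
object PreferenceDemo {
  // Matches `pattern` against the whole input and returns its capturing
  // groups, numbered from 1, or None when the input does not match.
  def groups(pattern: String, input: String): Option[List[String]] =
    pattern.r.unapplySeq(input)

  def main(args: Array[String]): Unit = {
    assert(groups("(a*)(a*)", "aa") == Some(List("aa", "")))    // x*  prefers more
    assert(groups("(a*?)(a*)", "aa") == Some(List("", "aa")))   // x*? prefers zero
    assert(groups("(a+?)(a*)", "aaa") == Some(List("a", "aa"))) // x+? prefers one
    assert(groups("(a??)(a?)", "a") == Some(List("", "a")))     // x?? prefers zero
    assert(groups("(a|ab)(b?)", "ab") == Some(List("a", "b")))  // x|y prefers x
    assert(groups("(ab|a)(b?)", "ab") == Some(List("ab", "")))  // so operand order matters
    println("all preference checks passed")
  }
}
```

In every case the input matches; only the way it is split among the groups changes with the preference.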
There is also a DSL inspired by Re, in package `gnieh.regex.dsl`.
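To build a regular expression from a string and extract its capturing groups, you just need to write this:

```scala
import gnieh.regex._

// `re` builds a regular expression from the string
val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".re
// the regular expression is an extractor binding the numbered capturing groups
val date(year, month, day) = "2014-04-23"
```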
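A `val` pattern like `val date(year, month, day) = ...` throws a `scala.MatchError` when the input does not match, as any refutable pattern in a `val` definition does. For comparison, the sketch below performs the same extraction with the standard library's `scala.util.matching.Regex` (`.r` instead of `.re`) inside a `match` with a fallback case. Since `date` is used as an extractor in the example above, the same `match` form should also work with gnieh.regex, but that is an assumption not exercised here. `DateDemo` and `parse` are hypothetical names.

```scala
// Standard-library counterpart of the date example: `.r` instead of `.re`.
object DateDemo {
  private val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r

  // Pattern matching on a Regex requires the whole input to match.
  // The fallback case returns None instead of throwing scala.MatchError.
  def parse(s: String): Option[(Int, Int, Int)] = s match {
    case date(y, m, d) => Some((y.toInt, m.toInt, d.toInt))
    case _             => None
  }

  def main(args: Array[String]): Unit = {
    println(parse("2014-04-23")) // Some((2014,4,23))
    println(parse("23/04/2014")) // None
  }
}
```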
See the documentation of this library for more details.