About the Regular Expression Library
Grok relies heavily on regular expressions in its pattern definitions. Go's built-in regexp package implements Google's RE2 syntax, which is a stripped-down regular expression language.
RE2 provides performance guarantees, such as a single scan over the input and O(n) execution time with respect to the length of the input, but it only supports features that can be modelled as finite state machines (FSM).
In particular, RE2 does not support constructs that require backtracking, such as lookahead assertions and backreferences, because these cannot be implemented within RE2's performance guarantees.
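The limitation is easy to observe with the standard library alone. The following is a minimal sketch, not part of Grok itself: the helper name `compiles` and the sample patterns are made up for illustration, and the only library call used is `regexp.Compile` from Go's built-in regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's built-in regexp package accepts the pattern.
// (Hypothetical helper, used only for this illustration.)
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// Illustrative patterns, not taken from any Grok pattern file.
	patterns := []string{
		`\d+`,         // plain RE2 syntax
		`foo(?=bar)`,  // lookahead assertion
		`(?<!foo)bar`, // lookbehind assertion
		`(\w+) \1`,    // backreference
	}
	for _, p := range patterns {
		fmt.Printf("%-14s compiles: %v\n", p, compiles(p))
	}
}
```

Only the first pattern should compile. The other three are rejected at compile time, so a Grok pattern that uses such constructs cannot even be loaded with the default package.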
Grok patterns use these features extensively, so implementing Grok on top of Go's default regexp package is not possible. However, there are a few third-party regular expression libraries for Go that do not have these limitations:
- regexp2 is a port of .NET's regular expression engine. It is written in pure Go.
- pcre is a Go wrapper around the Perl Compatible Regular Expression (PCRE) library libpcre (needs `brew install pcre` or `sudo apt-get install libpcre++-dev`).
- rubex is a Go wrapper around the Oniguruma regular expression library (needs `brew install oniguruma` or `sudo apt-get install libonig-dev`).
As Grok was originally written in Ruby, and Ruby uses Oniguruma as its regular expression library, we decided to use rubex for best compatibility.