Perl Incompatible Regular Expressions library
yandex/pire
This is PIRE, the Perl Incompatible Regular Expressions library.

This library is aimed at checking a huge amount of text against relatively many regular expressions. Roughly speaking, it can only check whether a given text matches a certain regexp, but it can do so really fast (more than 400 MB/s on our hardware is common). Moreover, multiple regexps can be combined together, making it possible to check the text against approximately 10 regexps in a single pass while maintaining the same speed.

Since Pire examines each character only once, without any lookaheads or rollbacks, spending about five machine instructions per character, it can be used even in realtime tasks.

On the other hand, Pire has very limited functionality compared to other regexp libraries. Pire does not have any Perlish conditional regexps, lookaheads, backtracking, or greedy/nongreedy matches; neither does it have any capturing facilities.

Pire was developed at Yandex (http://company.yandex.ru/) as a part of its web crawler.

More information can be found in README.ru (in Russian), which is yet to be translated.

Please report bugs to dprokoptsev@yandex-team.ru or davenger@yandex-team.ru.
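To make the "each character only once" property concrete, here is a minimal sketch of a table-driven deterministic automaton. It is not Pire's code or API: `ToyDfa`, `Matches` and `MakeAbStarC` are hypothetical names invented for this illustration, and the automaton is built by hand for the anchored pattern `ab*c` rather than compiled from a regexp. The point is only that matching costs one table lookup per input byte, with no lookahead and no rollback.

```cpp
#include <array>
#include <cstdint>
#include <string>
#include <vector>

// A toy table-driven DFA. Illustration only; this is NOT Pire's data structure.
struct ToyDfa {
    // transitions[state][byte] gives the next state.
    std::vector<std::array<std::uint16_t, 256>> transitions;
    std::vector<bool> accepting;
    std::uint16_t start = 0;

    // Walks the text once, left to right: one table lookup per byte.
    bool Matches(const std::string& text) const {
        std::uint16_t state = start;
        for (unsigned char ch : text)
            state = transitions[state][ch];
        return accepting[state];
    }
};

// Hand-built automaton for the anchored pattern ab*c.
// States: 0 = start, 1 = seen 'a' followed by zero or more 'b',
//         2 = accepted, 3 = dead (no match possible any more).
ToyDfa MakeAbStarC() {
    ToyDfa dfa;
    dfa.transitions.resize(4);
    for (auto& row : dfa.transitions)
        row.fill(3);                      // by default every byte leads to the dead state
    dfa.transitions[0]['a'] = 1;
    dfa.transitions[1]['b'] = 1;
    dfa.transitions[1]['c'] = 2;
    dfa.accepting = {false, false, true, false};
    return dfa;
}
```

With this sketch, `MakeAbStarC().Matches("abbbc")` is true and `MakeAbStarC().Matches("abca")` is false. The same structure also suggests why features such as capturing or backtracking do not fit this model: the only state carried between characters is a single state number.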
Quick Start
=============

    #include <stdio.h>
    #include <string.h>   // strlen
    #include <iterator>   // std::back_inserter
    #include <vector>
    #include <pire/pire.h>

    Pire::NonrelocScanner CompileRegexp(const char* pattern)
    {
        // Transform the pattern from UTF-8 into UCS4
        std::vector<Pire::wchar32> ucs4;
        Pire::Encodings::Utf8().FromLocal(pattern, pattern + strlen(pattern), std::back_inserter(ucs4));

        return Pire::Lexer(ucs4.begin(), ucs4.end())
            .AddFeature(Pire::Features::CaseInsensitive())  // enable case insensitivity
            .SetEncoding(Pire::Encodings::Utf8())           // set input text encoding
            .Parse()                                        // create an FSM
            .Surround()                                     // PCRE_ANCHORED behavior
            .Compile<Pire::NonrelocScanner>();              // compile the FSM
    }

    bool Matches(const Pire::NonrelocScanner& scanner, const char* ptr, size_t len)
    {
        return Pire::Runner(scanner)
            .Begin()        // '^'
            .Run(ptr, len)  // the text
            .End();         // '$'
            // implicitly cast to bool
    }

    int main()
    {
        char re[] = "hello\\s+w.+d$";
        char str[] = "Hello world";

        Pire::NonrelocScanner sc = CompileRegexp(re);
        bool res = Matches(sc, str, strlen(str));

        printf("String \"%s\" %s \"%s\"\n", str, (res ? "matches" : "doesn't match"), re);

        return 0;
    }