Add experimental support for re2 #2138

Merged
merged 13 commits into from
Mar 28, 2023

@blotus (Member) commented Mar 23, 2023

This PR adds support, behind feature flags, for using re2 instead of Go's standard regexp package for GROK patterns and the RegexpInFile helper.

The feature flags are:

  • re2_grok_support
  • re2_regexp_in_file_support
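For the second flag, a `RegexpInFile`-style helper can be pictured as: load one pattern per line from a data file, compile each once, and report whether a value matches any of them. This is a hypothetical, standalone sketch. The `Matcher` interface, the `compileFunc` hook and the `loadPatterns`/`regexpInFile` names are illustrative assumptions, not crowdsec's actual API, and the data file is replaced by an in-memory string.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// Matcher is the minimal behaviour this sketch needs from a regexp engine.
// *regexp.Regexp satisfies it; another engine with the same method could too.
type Matcher interface {
	MatchString(s string) bool
}

// compileFunc is a hypothetical hook for choosing the engine.
// In this sketch it always uses the standard library.
var compileFunc = func(expr string) (Matcher, error) {
	re, err := regexp.Compile(expr)
	if err != nil {
		return nil, err // avoid returning a non-nil interface holding a nil pointer
	}
	return re, nil
}

// loadPatterns compiles one pattern per non-empty line of data.
func loadPatterns(data string) ([]Matcher, error) {
	var out []Matcher
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		m, err := compileFunc(line)
		if err != nil {
			return nil, fmt.Errorf("compiling %q: %w", line, err)
		}
		out = append(out, m)
	}
	return out, sc.Err()
}

// regexpInFile reports whether value matches any of the loaded patterns.
func regexpInFile(value string, patterns []Matcher) bool {
	for _, p := range patterns {
		if p.MatchString(value) {
			return true
		}
	}
	return false
}

func main() {
	patterns, err := loadPatterns("(?i)sqlmap\n^curl/\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(regexpInFile("sqlmap/1.7", patterns))
	fmt.Println(regexpInFile("Mozilla/5.0", patterns))
}
```

Compiling the patterns once at load time matters here, since the comments below report that compiling patterns dominates re2's startup cost.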

Early benchmarks show:

  • A ~100% performance improvement when parsing an nginx log file with 200k lines in replay mode, without the bad-user-agent scenario (default config, 1 parser routine / 1 bucket routine).
  • With the bad UA scenario, and with re2 support for expr helpers, a ~40% improvement (default config, 1 parser routine / 1 bucket routine).
  • On a 16-core machine with 8 parser routines and 8 bucket routines, without the bad UA scenario, parsing the file takes the same time, but CPU usage is halved with re2.

@github-actions

@blotus: There is no 'kind' label on this PR. You need a 'kind' label to generate the release automatically.

  • /kind feature
  • /kind enhancement
  • /kind fix
  • /kind chore
  • /kind dependencies

I am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository.

@github-actions

@blotus: There are no area labels on this PR. You can add as many areas as you see fit.

  • /area agent
  • /area local-api
  • /area cscli
  • /area security
  • /area configuration

@blotus (Member, Author) commented Mar 23, 2023

/kind feature

@blotus marked this pull request as draft on March 23, 2023, 11:07
@codecov-commenter commented Mar 23, 2023

Codecov Report

Merging #2138 (9a324f5) into master (3884c5f) will increase coverage by 2.75%.
The diff coverage is 42.85%.


@@            Coverage Diff             @@
##           master    #2138      +/-   ##
==========================================
+ Coverage   52.90%   55.65%   +2.75%     
==========================================
  Files         177      124      -53     
  Lines       24860    16248    -8612     
==========================================
- Hits        13151     9043    -4108     
+ Misses      10234     6289    -3945     
+ Partials     1475      916     -559     
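The headline coverage figures follow from the Hits/Misses/Partials rows. A small Go check, assuming coverage is hits divided by all tracked lines with partials counted as uncovered (the Lines row equals that sum in both columns, which supports the assumption):

```go
package main

import "fmt"

// coverage returns hits as a percentage of all tracked lines.
// Assumption: partials count as not covered.
func coverage(hits, misses, partials int) float64 {
	return 100 * float64(hits) / float64(hits+misses+partials)
}

func main() {
	master := coverage(13151, 10234, 1475) // 24860 lines
	pr := coverage(9043, 6289, 916)        // 16248 lines
	fmt.Printf("master %.3f%%, #2138 %.3f%%, delta %+.3f\n", master, pr, pr-master)
}
```

This gives about 52.900% and 55.656%. The report shows 55.65% and +2.75%, which suggests it truncates rather than rounds to two decimals.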
Flag             Coverage Δ
func-crowdsec    ?
func-cscli       ?
unit-linux       55.65% <42.85%> (-0.04%) ⬇️
unit-windows     ?

Flags with carried forward coverage won't be shown.

Impacted Files                    Coverage Δ
pkg/fflag/crowdsec.go             0.00% <0.00%> (ø)
pkg/exprhelpers/exprlib.go        69.95% <44.82%> (-2.11%) ⬇️
pkg/acquisition/acquisition.go    78.44% <100.00%> (+0.26%) ⬆️
pkg/parser/node.go                66.73% <100.00%> (ø)
pkg/parser/unix_parser.go         10.00% <100.00%> (+1.01%) ⬆️

... and 54 files with indirect coverage changes


@buixor (Contributor) commented Mar 23, 2023

/area agent

@blotus marked this pull request as ready for review on March 23, 2023, 14:32
@buixor (Contributor) commented Mar 23, 2023

One-shot speed

(Note: all tests were run in one-shot mode, so the timings include agent startup and shutdown.)

  1. MASTER: nginx collection without http-bad-user-agent, 1 parser / 1 bucket routine
  • nginx-10k: 23.98s user 0.57s system 185% cpu 13.233s total
  • nginx-20k: 41.89s user 0.68s system 214% cpu 19.839s total
  • nginx-50k: 101.25s user 1.34s system 236% cpu 43.406s total
  2. RE2-FFLAG: nginx collection without http-bad-user-agent, 1 parser / 1 bucket routine
  • nginx-10k: 20.40s user 0.54s system 155% cpu 13.453s total
  • nginx-20k: 36.04s user 0.77s system 182% cpu 20.133s total
  • nginx-50k: 74.51s user 1.25s system 232% cpu 32.546s total

Conclusion: RE2 is significantly faster, even when accounting for its slower startup: user CPU time is lower in every run, and wall-clock time drops on the largest file (nginx-50k). re2 startup takes >5s, with >4s spent compiling the default grok patterns, while master takes ~1s to start.
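The relative changes can be recomputed from the one-shot timings. A small Go helper (names are illustrative) that prints the percentage change of re2 versus master for user CPU time and wall-clock total:

```go
package main

import "fmt"

// timing is one zsh `time` result, in seconds.
type timing struct {
	user, total float64
}

// change returns the relative change from before to after, in percent.
func change(before, after float64) float64 {
	return 100 * (after - before) / before
}

func main() {
	names := []string{"nginx-10k", "nginx-20k", "nginx-50k"}
	master := []timing{{23.98, 13.233}, {41.89, 19.839}, {101.25, 43.406}}
	re2 := []timing{{20.40, 13.453}, {36.04, 20.133}, {74.51, 32.546}}
	for i, name := range names {
		fmt.Printf("%s: user CPU %+.1f%%, wall-clock %+.1f%%\n",
			name,
			change(master[i].user, re2[i].user),
			change(master[i].total, re2[i].total))
	}
}
```

On these figures, user CPU time falls by roughly 14% to 26%, while wall-clock time is about 1.5% to 1.7% higher on the two smaller files and about 25% lower on nginx-50k, consistent with a fixed startup cost being amortized over longer runs.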

Memory usage at rest

Note: service started with the nginx collection, without http-bad-user-agent.

  1. Master
process_resident_memory_bytes 4.6628864e+07 -> ~47M
process_virtual_memory_bytes 2.308620288e+09
process_virtual_memory_max_bytes 1.8446744073709552e+19
  2. re2-fflag
process_resident_memory_bytes 1.27930368e+08 -> ~128M
process_virtual_memory_bytes 2.525220864e+0
process_virtual_memory_max_bytes 1.8446744073709552e+19

Conclusion: re2 uses more than twice as much memory at rest. The re2 documentation states that compiled expressions are significantly more expensive memory-wise.
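To put the resident-memory gauges in familiar units, a short Go sketch converts the two `process_resident_memory_bytes` values to MiB and MB and computes their ratio (the helper name is illustrative):

```go
package main

import "fmt"

// bytesPerMiB is 2^20 bytes.
const bytesPerMiB = 1 << 20

// toMiB converts a byte count to mebibytes.
func toMiB(b float64) float64 { return b / bytesPerMiB }

func main() {
	// process_resident_memory_bytes gauges quoted above.
	master := 4.6628864e+07
	re2 := 1.27930368e+08
	fmt.Printf("master: %.1f MiB (%.1f MB)\n", toMiB(master), master/1e6)
	fmt.Printf("re2:    %.1f MiB (%.1f MB)\n", toMiB(re2), re2/1e6)
	fmt.Printf("ratio:  %.2fx\n", re2/master)
}
```

The ratio comes out at about 2.74x, with master at roughly 44.5 MiB (46.6 MB) and re2 at roughly 122 MiB (127.9 MB).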

@buixor (Contributor) commented Mar 27, 2023

[image]

@buixor (Contributor) commented Mar 28, 2023

Let's merge with WASM for now, and switch to CGO once we've updated the package build pipelines.

@buixor (Contributor) left a comment

lgtm


3 participants