new(userspace/libsinsp): support regular expression operator in sinsp filters #1904

Merged
merged 4 commits into from
Jul 15, 2024

jasondellaluce
Contributor

What type of PR is this?

/kind feature

Any specific area of the project related to this PR?

/area libsinsp

/area tests

Does this PR require a change in the driver versions?

What this PR does / why we need it:

The community has been requesting this for a long time (~6 years), both publicly and in private channels. The main concern with supporting regular expressions is the potential performance impact on high-throughput event streams such as syscalls. However, I consider the project mature enough, and in need of serving many different use cases, so I'm proposing to support this feature and to leave users the choice of how best to use it.

The contribution implements a new regex operator that can be used with string fields, as in fd.name regex [a-z]*/proc/[0-9]+/cmdline or similar. The supported syntax is that of RE2 (https://github.com/google/re2), on which we base our regex implementation.
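
As a rough illustration of how such a check behaves (not the PR's actual code), the sketch below matches a field value against a pattern. It makes two assumptions: the helper name field_matches_regex is hypothetical, and std::regex stands in for RE2 so that the example builds without extra dependencies. It also assumes the whole field value must match the pattern, which the description above does not state.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns true when the entire field value matches the
// pattern. std::regex is only a stand-in here; the PR itself builds on RE2,
// whose syntax and performance characteristics differ.
bool field_matches_regex(const std::string& field_value, const std::string& pattern) {
    const std::regex re(pattern);
    // regex_match requires the whole string to match (assumed semantics).
    return std::regex_match(field_value, re);
}
```

With the pattern from the description, a value such as /proc/1234/cmdline matches, while /proc/self/cmdline does not because [0-9]+ requires digits.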

This also has the benefit of making performance predictable, as RE2 does not use backtracking and does not support the full regex feature set (from my understanding). On top of that, I added a few heuristic warnings/checks at filter compilation time (with tests) to inform users when regular expression operators are used but not truly needed.
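
To give an idea of what a "regex not truly needed" check could look like, here is a minimal sketch. It is an assumption for illustration only: the function name and the rule (a pattern without any regex metacharacter is a plain literal) are hypothetical and are not taken from the PR's heuristics.

```cpp
#include <cassert>
#include <string>

// Hypothetical heuristic (not the PR's implementation): a pattern that
// contains no regex metacharacters is a plain literal, so a simple equality
// comparison would express the same check.
bool regex_is_plain_literal(const std::string& pattern) {
    static const std::string metachars = R"(\^$.|?*+()[]{})";
    return pattern.find_first_of(metachars) == std::string::npos;
}
```

The rule is deliberately conservative: a pattern containing a dot is not reported as a plain literal, because the dot is a wildcard in a regex.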

It is crucial that this operator is used sparingly and only where strictly needed. In my personal benchmarks, the regex operator is 5x to 10x slower than a simple equality comparison on simple examples. As such, the existing simpler operators are always preferable when they can implement the same check logic.
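
A comparison of this kind can be sketched as a small timing harness. This is not the benchmark used for the numbers above: the names bench_result and compare_equality_and_regex are hypothetical, and std::regex is used as a stand-in for RE2, so the measured ratio will differ from the 5x to 10x figure.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>

// Hit counts and elapsed time for each comparison strategy.
struct bench_result {
    std::size_t equality_hits = 0;
    std::size_t regex_hits = 0;
    std::chrono::nanoseconds equality_time{0};
    std::chrono::nanoseconds regex_time{0};
};

// Hypothetical micro-benchmark: runs the same check `iterations` times with
// operator== and with a precompiled regex, timing each loop separately.
// `literal` is used both as the string to compare and as the regex pattern,
// so it should not contain regex metacharacters.
bench_result compare_equality_and_regex(const std::string& value,
                                        const std::string& literal,
                                        std::size_t iterations) {
    using steady = std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::nanoseconds;

    bench_result out;
    const std::regex re(literal);  // compiled once, outside the timed loops

    auto start = steady::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        if (value == literal) ++out.equality_hits;
    }
    out.equality_time = duration_cast<nanoseconds>(steady::now() - start);

    start = steady::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        if (std::regex_match(value, re)) ++out.regex_hits;
    }
    out.regex_time = duration_cast<nanoseconds>(steady::now() - start);

    return out;
}
```

Both loops must report the same number of hits. Only the elapsed times are expected to differ, and an optimizing compiler may shorten the equality loop considerably, so the timings are indicative at best.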

Which issue(s) this PR fixes:

Fixes:

Special notes for your reviewer:

/milestone 0.18.0

Does this PR introduce a user-facing change?:

new(userspace/libsinsp): support regular expression operator in sinsp filters

@incertum
Contributor

Very nice, on that note: Did we ever compare our simple string comparison approach with re2 simple string comparison (no regex)? Just to check if our simple string comparisons have the best performance since re2 is a pretty strong package. WDYT?

@jasondellaluce
Contributor Author

@incertum from my experiments, our simple comparisons are generally faster by at least 5x. This indicates that the best practice is to use regex comparisons only where a regex is the only way of implementing the check.

@incertum
Contributor

Great, thanks for sharing!

auto evt = generate_getcwd_failed_entry_event();

// legit use case with a string
EXPECT_TRUE(evaluate_filter_str(&m_inspector, "evt.source regex '^[s]{1}ysca[l]{2}$'", evt));
Contributor

@LucaGuerra LucaGuerra Jun 14, 2024

Note that in #1911 I'm refactoring these kinds of tests to look like:

EXPECT_TRUE(eval_filter(evt, "evt.source regex '^[s]{1}ysca[l]{2}$'"));

The change to these will be very quick to do, but the file is going to conflict I think.

Contributor

Can we make the changes since #1911 was merged?
This LGTM then :)

Contributor Author

I apologize for having lost these comments entirely! Let me rebase and push the changes as suggested.

@FedeDP FedeDP changed the title new(userpsac/libsinsp): support regular expression operator in sinsp filters new(userspace/libsinsp): support regular expression operator in sinsp filters Jun 26, 2024

github-actions bot commented Jul 3, 2024

Perf diff from master - unit tests

     9.44%     -3.25%  [.] sinsp::next
     6.34%     -1.70%  [.] sinsp_thread_manager::find_thread
     9.74%     +1.67%  [.] sinsp_parser::reset
     0.93%     +1.13%  [.] sinsp::fetch_next_event
     5.82%     -0.69%  [.] sinsp_evt::get_type
     0.40%     +0.59%  [.] std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Identity, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::find
     0.77%     +0.59%  [.] sinsp_parser::event_cleanup
     3.91%     -0.52%  [.] sinsp_thread_manager::get_thread_ref
     3.70%     +0.48%  [.] sinsp_parser::process_event
     0.83%     -0.48%  [.] std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, libsinsp::state::dynamic_struct::field_info>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, libsinsp::state::dynamic_struct::field_info> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::find

Perf diff from master - scap file

     5.40%     +5.37%  [.] sinsp_evt_formatter::tostring_withformat
    23.16%     -3.38%  [.] sinsp_filter_check::tostring
    11.34%     +3.35%  [.] sinsp_filter_check::extract_nocache
     5.31%     +3.32%  [.] libsinsp::runc::match_container_id
    10.43%     -2.28%  [.] std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>
     5.40%     +2.08%  [.] sinsp_filter_check::get_field_info
     5.36%     +1.62%  [.] formatted_dump
     6.75%     -0.81%  [.] sinsp_evt::get_num_params
     5.39%     -0.38%  [.] sinsp_filter_check_thread::extract_single
     5.40%     -0.29%  [.] sinsp_parser::reset

Heap diff from master - unit tests

total runtime: 0.12s.
calls to allocation functions: -7648 (-62688/s)
temporary memory allocations: -921 (-7549/s)
peak heap memory consumption: -986B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Heap diff from master - scap file

total runtime: -0.01s.
calls to allocation functions: 0 (0/s)
temporary memory allocations: 1 (-166/s)
peak heap memory consumption: -824B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Contributor

@FedeDP FedeDP left a comment

/approve

@poiana
Contributor

poiana commented Jul 3, 2024

LGTM label has been added.

Git tree hash: d9313d6a46ed1d69bafd283800f6aceac33daceb

Signed-off-by: Jason Dellaluce <jasondellaluce@gmail.com>
Signed-off-by: Jason Dellaluce <jasondellaluce@gmail.com>
…ings and euristics

Signed-off-by: Jason Dellaluce <jasondellaluce@gmail.com>
Signed-off-by: Jason Dellaluce <jasondellaluce@gmail.com>
@poiana poiana removed the lgtm label Jul 11, 2024
@poiana poiana requested a review from FedeDP July 11, 2024 20:16

Perf diff from master - unit tests

     5.31%     -1.34%  [.] next
     0.78%     +0.96%  [.] libsinsp::sinsp_suppress::process_event
     2.80%     -0.90%  [.] scap_event_decode_params
     4.31%     -0.88%  [.] sinsp_thread_manager::get_thread_ref
     1.30%     +0.86%  [.] sinsp::fetch_next_event
     4.84%     -0.81%  [.] sinsp_parser::process_event
     0.16%     +0.60%  [.] sinsp_fdtable::find
    10.15%     +0.60%  [.] sinsp_parser::reset
     3.06%     +0.59%  [.] gzfile_read
     5.19%     -0.58%  [.] sinsp_thread_manager::find_thread

Perf diff from master - scap file

     3.09%     +7.60%  [.] sinsp_filter_check::rawval_to_string
    13.06%     -4.16%  [.] sinsp_filter_check_event::extract_single
     6.45%     +4.06%  [.] sinsp_evt_formatter::tostring_withformat
     3.30%     +2.79%  [.] std::_Hashtable<long, std::pair<long const, std::shared_ptr<sinsp_threadinfo> >, std::allocator<std::pair<long const, std::shared_ptr<sinsp_threadinfo> > >, std::__detail::_Select1st, std::equal_to<long>, std::hash<long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_find_before_node
    13.09%     -2.45%  [.] sinsp_parser::reset
     6.47%     -2.20%  [.] gzfile_read
     6.52%     -2.09%  [.] sinsp::next
     3.29%     +1.25%  [.] sinsp_evt::get_thread_info
    16.28%     -0.93%  [.] sinsp_filter_check::extract_nocache
     6.54%     -0.58%  [.] next

Heap diff from master - unit tests

total runtime: 0.07s.
calls to allocation functions: -6441 (-94720/s)
temporary memory allocations: -482 (-7088/s)
peak heap memory consumption: -986B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Heap diff from master - scap file

total runtime: 0.01s.
calls to allocation functions: 1 (200/s)
temporary memory allocations: 0 (0/s)
peak heap memory consumption: -824B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Contributor

@FedeDP FedeDP left a comment

/approve

@poiana poiana added the lgtm label Jul 15, 2024
@poiana
Contributor

poiana commented Jul 15, 2024

LGTM label has been added.

Git tree hash: c007313025128e3c404af20998594ea33da29c69

Contributor

@incertum incertum left a comment

/approve

@poiana
Contributor

poiana commented Jul 15, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: FedeDP, incertum, jasondellaluce

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [FedeDP,incertum,jasondellaluce]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana merged commit cbf9707 into master Jul 15, 2024
42 of 43 checks passed
@poiana poiana deleted the new/filter-regex-operator branch July 15, 2024 19:12