-
Notifications
You must be signed in to change notification settings - Fork 728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multi string search #624
Merged
Merged
Multi string search #624
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Breaking whitespace diffs into standalone commit.
Currently, a filtering expression with "x in (a, b c, ...)" is transformed into "x=a or x=b or x=c or ...". This can be slow for very long sets of a, b, c, especially when x has to be extracted from the event over and over. Replace this with a set membership test using unordered_set for PT_CHARBUF types. In filter.cpp, for CO_IN operators check the type of the operand. If it's PT_CHARBUF, loop over the items in the (a, b, c) set and call add_filter_value() for each. Otherwise, the current behavior is kept. sinsp_filter_check::add_filter_value now saves pointers to the filter values in the unordered_set m_val_storages_members, containing pairs of (pointer, length). Note that these are only pointers--the actual values are still held in m_val_storages. In sinsp_filter_check::flt_compare, for CO_IN operators simply check the unordered_set and if a match is found return true. This used to be dead code given how all CO_INs were replaced with a sequence of x=a or x=b or ..., but is used again. Custom functors g_hash_membuf/g_equal_to_membuf hash and compare the pointer-to-datas. When compiling with gcc, this simply uses gnu's built-in hash function for a pointer and length, which is quite fast. Otherwise, a standalone function is used.
Take advantage of string length when doing set membership tests: - When comparing strings in the hash equality function, only bother doing the buffer comparison when the string lengths match. - It can be inefficient to hash a very long string when all members of the set are short strings. To make this case faster, keep track of the minumum and maximum string length across the set members and only bother doing the set comparison of the lengths overlap. To effectively use this, the length needs to be filled in when sinsp_filter_check::flt_compare() is called, so do a pass over the existing filterchecks and actually return the actual string length when it's easily known. In flt_compare(), if the type is a string and the provided length is 0, do a strlen() to find it. This should be very rare now that the length is properly passed back.
mstemm
added a commit
to falcosecurity/falco
that referenced
this pull request
Jul 11, 2016
Make changes to rules to improve performance and reduce FPs: - Rely on draios/sysdig#610 that allows specifying an open/openat for reading/writing without having to search through all the flags individually. - For a two-item list (open, openat), and thinking ahead to draios/sysdig#624, check the event type individually instead of as a set membership test, which is a bit faster. - Switch to consistently using evt.type instead of syscall.type. - Move positive tests like etc_dir, bin_dir, sensitive_files, proc.sname, etc., which are most likely to not succeed, to the beginning of rules, so they have a greater chance to cause the rest of the rule to be skipped, which saves time. - Using exim as a mail program--exim also can suid to root. - add a new macro for ssl management binaries and allow them to write below /etc and read sensitive files. - add a new macro for dhcp client binaries and allow them to write below /etc. - Add exe (docker-related program) as a program that can set a namespace using setns. - Don't count /dev/tty as an important file under /dev.
mstemm
added a commit
to falcosecurity/falco
that referenced
this pull request
Jul 11, 2016
Make changes to rules to improve performance and reduce FPs: - Rely on draios/sysdig#610 that allows specifying an open/openat for reading/writing without having to search through all the flags individually. - For a two-item list (open, openat), and thinking ahead to draios/sysdig#624, check the event type individually instead of as a set membership test, which is a bit faster. - Switch to consistently using evt.type instead of syscall.type. - Move positive tests like etc_dir, bin_dir, sensitive_files, proc.sname, etc., which are most likely to not succeed, to the beginning of rules, so they have a greater chance to cause the rest of the rule to be skipped, which saves time. - Using exim as a mail program--exim also can suid to root. - add a new macro for ssl management binaries and allow them to write below /etc and read sensitive files. - add a new macro for dhcp client binaries and allow them to write below /etc. - Add exe (docker-related program) as a program that can set a namespace using setns. - Don't count /dev/tty as an important file under /dev.
mstemm
added a commit
to falcosecurity/falco
that referenced
this pull request
Jul 12, 2016
Make changes to rules to improve performance and reduce FPs: - Rely on draios/sysdig#610 that allows specifying an open/openat for reading/writing without having to search through all the flags individually. - For a two-item list (open, openat), and thinking ahead to draios/sysdig#624, check the event type individually instead of as a set membership test, which is a bit faster. - Switch to consistently using evt.type instead of syscall.type. - Move positive tests like etc_dir, bin_dir, sensitive_files, proc.sname, etc., which are most likely to not succeed, to the beginning of rules, so they have a greater chance to cause the rest of the rule to be skipped, which saves time. - Using exim as a mail program--exim also can suid to root. - add a new macro for ssl management binaries and allow them to write below /etc and read sensitive files. - add a new macro for dhcp client binaries and allow them to write below /etc. - Add exe (docker-related program) as a program that can set a namespace using setns. - Don't count /dev/tty as an important file under /dev.
mstemm
added a commit
to falcosecurity/falco
that referenced
this pull request
Jul 12, 2016
Make changes to rules to improve performance and reduce FPs: - Rely on draios/sysdig#610 that allows specifying an open/openat for reading/writing without having to search through all the flags individually. - For a two-item list (open, openat), and thinking ahead to draios/sysdig#624, check the event type individually instead of as a set membership test, which is a bit faster. - Switch to consistently using evt.type instead of syscall.type. - Move positive tests like etc_dir, bin_dir, sensitive_files, proc.sname, etc., which are most likely to not succeed, to the beginning of rules, so they have a greater chance to cause the rest of the rule to be skipped, which saves time. - Using exim as a mail program--exim also can suid to root. - add a new macro for ssl management binaries and allow them to write below /etc and read sensitive files. - add a new macro for dhcp client binaries and allow them to write below /etc. - Add exe (docker-related program) as a program that can set a namespace using setns. - Don't count /dev/tty as an important file under /dev.
gianlucaborello
pushed a commit
that referenced
this pull request
Aug 16, 2016
* Whitespace diffs. Breaking whitespace diffs into standalone commit. * Implement "in (...)" as hash table lookups. Currently, a filtering expression with "x in (a, b c, ...)" is transformed into "x=a or x=b or x=c or ...". This can be slow for very long sets of a, b, c, especially when x has to be extracted from the event over and over. Replace this with a set membership test using unordered_set for PT_CHARBUF types. In filter.cpp, for CO_IN operators check the type of the operand. If it's PT_CHARBUF, loop over the items in the (a, b, c) set and call add_filter_value() for each. Otherwise, the current behavior is kept. sinsp_filter_check::add_filter_value now saves pointers to the filter values in the unordered_set m_val_storages_members, containing pairs of (pointer, length). Note that these are only pointers--the actual values are still held in m_val_storages. In sinsp_filter_check::flt_compare, for CO_IN operators simply check the unordered_set and if a match is found return true. This used to be dead code given how all CO_INs were replaced with a sequence of x=a or x=b or ..., but is used again. Custom functors g_hash_membuf/g_equal_to_membuf hash and compare the pointer-to-datas. When compiling with gcc, this simply uses gnu's built-in hash function for a pointer and length, which is quite fast. Otherwise, a standalone function is used. * Take advantage of string length. Take advantage of string length when doing set membership tests: - When comparing strings in the hash equality function, only bother doing the buffer comparison when the string lengths match. - It can be inefficient to hash a very long string when all members of the set are short strings. To make this case faster, keep track of the minumum and maximum string length across the set members and only bother doing the set comparison of the lengths overlap. To effectively use this, the length needs to be filled in when sinsp_filter_check::flt_compare() is called, so do a pass over the existing filterchecks and actually return the actual string length when it's easily known. In flt_compare(), if the type is a string and the provided length is 0, do a strlen() to find it. This should be very rare now that the length is properly passed back.
dmyerscough
pushed a commit
to dmyerscough/sysdig
that referenced
this pull request
Mar 3, 2017
* Whitespace diffs. Breaking whitespace diffs into standalone commit. * Implement "in (...)" as hash table lookups. Currently, a filtering expression with "x in (a, b c, ...)" is transformed into "x=a or x=b or x=c or ...". This can be slow for very long sets of a, b, c, especially when x has to be extracted from the event over and over. Replace this with a set membership test using unordered_set for PT_CHARBUF types. In filter.cpp, for CO_IN operators check the type of the operand. If it's PT_CHARBUF, loop over the items in the (a, b, c) set and call add_filter_value() for each. Otherwise, the current behavior is kept. sinsp_filter_check::add_filter_value now saves pointers to the filter values in the unordered_set m_val_storages_members, containing pairs of (pointer, length). Note that these are only pointers--the actual values are still held in m_val_storages. In sinsp_filter_check::flt_compare, for CO_IN operators simply check the unordered_set and if a match is found return true. This used to be dead code given how all CO_INs were replaced with a sequence of x=a or x=b or ..., but is used again. Custom functors g_hash_membuf/g_equal_to_membuf hash and compare the pointer-to-datas. When compiling with gcc, this simply uses gnu's built-in hash function for a pointer and length, which is quite fast. Otherwise, a standalone function is used. * Take advantage of string length. Take advantage of string length when doing set membership tests: - When comparing strings in the hash equality function, only bother doing the buffer comparison when the string lengths match. - It can be inefficient to hash a very long string when all members of the set are short strings. To make this case faster, keep track of the minumum and maximum string length across the set members and only bother doing the set comparison of the lengths overlap. To effectively use this, the length needs to be filled in when sinsp_filter_check::flt_compare() is called, so do a pass over the existing filterchecks and actually return the actual string length when it's easily known. In flt_compare(), if the type is a string and the provided length is 0, do a strlen() to find it. This should be very rare now that the length is properly passed back.
dmyerscough
pushed a commit
to dmyerscough/sysdig
that referenced
this pull request
Mar 3, 2017
While merging together draios#624 (multi string search) and draios#625 (don't sanitize strings), the length was only being set when string sanitization was enabled. Additionally, the length was not being initialized to 0, meaning that it was not defined. This led to some FPs for some of the falco tests.
mstemm
added a commit
that referenced
this pull request
Dec 2, 2017
In a prior PR (#624) I did a pass ensuring that the length is properly set for string-based extractions. I want to complete this work by ensuring it works for all fields extracted by a filtercheck. This way you can use the hashing/equality operators in filter_value.h to group collections by filtercheck value.
mstemm
added a commit
that referenced
this pull request
Dec 4, 2017
In a prior PR (#624) I did a pass ensuring that the length is properly set for string-based extractions. I want to complete this work by ensuring it works for all fields extracted by a filtercheck. This way you can use the hashing/equality operators in filter_value.h to group collections by filtercheck value.
leogr
pushed a commit
to falcosecurity/rules
that referenced
this pull request
Dec 21, 2022
Make changes to rules to improve performance and reduce FPs: - Rely on draios/sysdig#610 that allows specifying an open/openat for reading/writing without having to search through all the flags individually. - For a two-item list (open, openat), and thinking ahead to draios/sysdig#624, check the event type individually instead of as a set membership test, which is a bit faster. - Switch to consistently using evt.type instead of syscall.type. - Move positive tests like etc_dir, bin_dir, sensitive_files, proc.sname, etc., which are most likely to not succeed, to the beginning of rules, so they have a greater chance to cause the rest of the rule to be skipped, which saves time. - Using exim as a mail program--exim also can suid to root. - add a new macro for ssl management binaries and allow them to write below /etc and read sensitive files. - add a new macro for dhcp client binaries and allow them to write below /etc. - Add exe (docker-related program) as a program that can set a namespace using setns. - Don't count /dev/tty as an important file under /dev.
leogr
pushed a commit
to falcosecurity/rules
that referenced
this pull request
Dec 21, 2022
Make changes to rules to improve performance and reduce FPs: - Rely on draios/sysdig#610 that allows specifying an open/openat for reading/writing without having to search through all the flags individually. - For a two-item list (open, openat), and thinking ahead to draios/sysdig#624, check the event type individually instead of as a set membership test, which is a bit faster. - Switch to consistently using evt.type instead of syscall.type. - Move positive tests like etc_dir, bin_dir, sensitive_files, proc.sname, etc., which are most likely to not succeed, to the beginning of rules, so they have a greater chance to cause the rest of the rule to be skipped, which saves time. - Using exim as a mail program--exim also can suid to root. - add a new macro for ssl management binaries and allow them to write below /etc and read sensitive files. - add a new macro for dhcp client binaries and allow them to write below /etc. - Add exe (docker-related program) as a program that can set a namespace using setns. - Don't count /dev/tty as an important file under /dev.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Implement x in (a, b, c, ...) using a hash table instead of x=a or x=b or x=c or ..., which significantly speeds up falco when doing tests against large sets of binaries.
See commits for specific changes.
@luca3m @ldegio @gianlucaborello wanna double-check these changes?