Implement rule sets. #1597

wxsBSD · 2021-12-02T02:27:38Z

This adds the ability to use rule sets just like we do with string sets.
Specifically, it enables conditions like the following:

5 of (a, b)
50% of (a*)

where "a" and "b" are existing rules and "a*" is a wildcard rule identifier,
which is expanded exactly as wildcard string identifiers are.

I chose to make OP_OF and OP_OF_PERCENT emit an argument which is the "type" of
set it is expected to match on. The two acceptable values are OF_STRING_SET and
OF_RULE_SET. When OF_STRING_SET is used then the references on the stack are
expected to be strings and OF_RULE_SET will check if the rules on the stack
match.
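As a rough illustration of the set-type argument described above, here is a minimal sketch in C. Everything in it other than the names OF_STRING_SET and OF_RULE_SET is hypothetical (the `sketch_` types and function are stand-ins, not libyara's structures or its opcode handler); it only shows the idea of one "of" routine that interprets its references according to the set type.

```c
#include <stdbool.h>
#include <stddef.h>

/* Set type carried as an argument of the "of" operation (names from the PR). */
typedef enum { OF_STRING_SET, OF_RULE_SET } of_set_type;

/* Hypothetical stand-ins for what a reference points at. */
typedef struct { bool found; } sketch_string;  /* did the string match? */
typedef struct { bool matched; } sketch_rule;  /* was the rule's condition true? */

/* Count how many of the n references match, interpreting each reference
   according to `type`, and compare the count against `required`. */
static bool sketch_of(
    of_set_type type, const void *const *refs, size_t n, size_t required)
{
  size_t hits = 0;

  for (size_t i = 0; i < n; i++)
  {
    if (type == OF_STRING_SET)
      hits += ((const sketch_string *) refs[i])->found ? 1 : 0;
    else
      hits += ((const sketch_rule *) refs[i])->matched ? 1 : 0;
  }

  return hits >= required;
}
```

With two rule references of which one matched, `sketch_of(OF_RULE_SET, refs, 2, 1)` is true and `sketch_of(OF_RULE_SET, refs, 2, 2)` is false.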

I added a yr_parser_emit_pushes_for_rules() which behaves similarly to the
string counterpart. The one big exception is that I can't use the
yr_rules_foreach macro, since it needs the terminating "null" rule, which has
not been added yet. Instead, I iterate the rules manually, using
yr_arena_get_ptr() to walk all the currently compiled rules.
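To make the wildcard expansion concrete, here is a small sketch in C. It assumes the already compiled rule names are available as a plain array of strings, which is a simplification: the PR walks the arena with yr_arena_get_ptr(), and the `sketch_` helpers below are hypothetical, not libyara functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `name` is selected by `pattern`. A trailing '*' turns the
   pattern into a prefix match; otherwise the two names must be equal. */
static bool sketch_identifier_matches(const char *pattern, const char *name)
{
  size_t len = strlen(pattern);

  if (len > 0 && pattern[len - 1] == '*')
    return strncmp(pattern, name, len - 1) == 0;

  return strcmp(pattern, name) == 0;
}

/* Walk the rules compiled so far and count those selected by `pattern`. */
static size_t sketch_count_matching_rules(
    const char *pattern, const char *const *names, size_t n)
{
  size_t count = 0;

  for (size_t i = 0; i < n; i++)
    if (sketch_identifier_matches(pattern, names[i]))
      count++;

  return count;
}
```

A count of zero for a wildcard identifier corresponds to the case where no previously defined rule matches it.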


virusdefender · 2021-12-02T04:02:33Z

A simple way to get this behavior without this PR is to apply math.to_number() to the (private) rule names:

condition:
    // at least two rules
    math.to_number(rule1) + math.to_number(rule2) + math.to_number(rule3) >= 2

condition:
    // add a weight / score to each rule
    math.to_number(rule1) * 20 + math.to_number(rule2) * 30 + math.to_number(rule3) * 30 >= 60

The problem is that all of the rules must be evaluated first; YARA cannot optimize this expression to stop early. For example, in the first condition, if rule1 and rule2 match, it is not necessary to evaluate rule3.

wxsBSD · 2021-12-02T11:49:38Z

You are correct that this would work but there are two reasons this PR is a good idea. First, "2 of (rule1, rule2, rule3)" is a much more concise expression. Second, this supports wildcards so you can say "2 of (foo*)" and never have to update your condition if you add another rule.

Your point about weights is valid, and if you want to weight one rule more than another you need to use your approach.

plusvic · 2021-12-03T10:07:15Z

This is really nice, but I have a concern that I'm not sure how to solve. When you have a condition like 50% of (a*), it takes into account only the rules that start with a and are defined before the rule that contains the condition. Any rule starting with a but defined after that rule won't be taken into account. The same limitation exists today when you use some rule foo as part of the condition of another rule bar (foo must be defined before bar), but in that case the error can be detected at compile time and reported to the user. With the introduction of wildcards, however, subtle errors can go unnoticed.

You are currently handling the case in which a rule with condition 50% of (a*) doesn't find any matching rule name, that's a good start, but still someone can add some other matching rule at the end of the source file and expect that this rule is also taken into account.

I guess the only solution for that is making the order in which rules are defined irrelevant, but that probably means more important changes, and introduces new situations that must be detected, like cyclical dependencies among rules.

I'm not sure how to tackle this.

wxsBSD · 2021-12-03T12:46:00Z

That's a really good point I had not thought of. Let me think about this a bit and maybe I can come up with a solution.

wxsBSD · 2021-12-03T13:49:32Z

You're right that doing anything at runtime is going to open up new situations to deal with. Instead, I think this can be turned into a compiler error. The rough plan I am going to try is:

1. Anytime the compiler is dealing with an identifier with a wildcard it will remember the identifier.
2. Anytime the compiler is compiling a rule it will check if the rule is matched by the expansion of any of the identifiers from step 1.

This would recognize that you're using a rule name that is covered by an identifier with a wildcard AFTER the rule with the wildcarded identifier is defined. I could turn this into an error so that the rules will not compile unless the order is correct.
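The two steps above can be sketched in C as follows. This is only an illustration under stated assumptions: the `sketch_compiler` structure, its fixed-size array, and both function names are hypothetical and say nothing about how the compiler in libyara stores this state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_WILDCARDS 64

/* Hypothetical compiler state: prefixes of the wildcard identifiers seen so
   far, stored without the trailing '*'. */
typedef struct
{
  const char *prefixes[SKETCH_MAX_WILDCARDS];
  size_t count;
} sketch_compiler;

/* Step 1: remember a wildcard identifier's prefix when a condition uses it.
   Returns false only if the fixed-size table is full. */
static bool sketch_remember_wildcard(sketch_compiler *c, const char *prefix)
{
  for (size_t i = 0; i < c->count; i++)
    if (strcmp(c->prefixes[i], prefix) == 0)
      return true; /* already recorded, avoid duplicates */

  if (c->count == SKETCH_MAX_WILDCARDS)
    return false;

  c->prefixes[c->count++] = prefix;
  return true;
}

/* Step 2: when a new rule is declared, reject its name if any remembered
   wildcard identifier would have expanded to it. */
static bool sketch_rule_name_allowed(
    const sketch_compiler *c, const char *rule_name)
{
  for (size_t i = 0; i < c->count; i++)
    if (strncmp(c->prefixes[i], rule_name, strlen(c->prefixes[i])) == 0)
      return false;

  return true;
}
```

In this sketch, once a condition has used (a*), declaring a later rule named a2 is rejected, while a rule named c is still accepted.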

plusvic · 2021-12-03T16:15:13Z

That sounds like a good compromise.

Stricter ordering of rule set in compiler.

Make sure that when we compile a new rule, no existing rule set matches the new rule. Specifically, we are making the following a compiler error:

    rule a1 { condition: true }
    rule b { condition: 1 of (a*) }
    rule a2 { condition: true }

This is a compiler error because "a2" is defined after the usage of (a*) in rule b.

There is still an edge case here. These rules compile:

    rule a { condition: 1 of (a) }
    rule b { condition: 1 of (b*) }

In both cases those rules evaluate to false, which is in line with the behavior of this rule:

    rule c { condition: c }

Given that the behavior around rule sets matches the current behavior I think this is acceptable.

wxsBSD · 2021-12-04T03:07:29Z

I implemented the strict checking in the compiler. I ended up using a hash table so that I can quickly check whether a wildcard identifier has already been processed (and avoid adding it to the list again). I'm not sure this is worth the tradeoff in memory usage during compilation, but I figured the quick lookups were worth it, as I can see rule sets being used fairly often.

plusvic · 2021-12-07T11:23:00Z

What if instead of having both a linked list and a hash table, we implement a mechanism for iterating all the items in a hash table? You are using the linked list for storing all the wildcard identifiers, while the hash table is used only for avoiding duplicate entries in the linked list, but the hash table already contains all you need, it's just that we don't have a way of iterating over all the items in the hash table.

In this case the hash table is just a collection of linked lists of YR_HASH_TABLE_ENTRY structures, which already contain the original keys and values. When you perform a lookup in the hash table you go straight to one of those linked lists, but iterating over all the linked lists in the hash table is straightforward. It's just a matter of adding the appropriate API to hash.h.
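As a sketch of that idea, the C code below iterates over a chained hash table by walking each bucket's linked list in turn. The structures and names are hypothetical: they are not the layout of YARA's hash table or of YR_HASH_TABLE_ENTRY, and this is not the API that was eventually added to hash.h.

```c
#include <stddef.h>

#define SKETCH_BUCKETS 4

/* Hypothetical chained hash table entry holding the original key and value. */
typedef struct sketch_entry
{
  const char *key;
  void *value;
  struct sketch_entry *next;
} sketch_entry;

typedef struct
{
  sketch_entry *buckets[SKETCH_BUCKETS];
} sketch_table;

typedef void (*sketch_visit_fn)(const char *key, void *value, void *user_data);

/* Visit every entry by walking each bucket's linked list in turn.
   Returns the number of entries visited. */
static size_t sketch_table_foreach(
    const sketch_table *t, sketch_visit_fn visit, void *user_data)
{
  size_t visited = 0;

  for (size_t b = 0; b < SKETCH_BUCKETS; b++)
  {
    for (sketch_entry *e = t->buckets[b]; e != NULL; e = e->next)
    {
      visit(e->key, e->value, user_data);
      visited++;
    }
  }

  return visited;
}

/* Example visitor: counts entries through a size_t passed as user_data. */
static void sketch_count_visit(const char *key, void *value, void *user_data)
{
  (void) key;
  (void) value;
  ++*(size_t *) user_data;
}
```

With an iterator like this, the hash table alone can both deduplicate the wildcard identifiers and enumerate them, so the separate linked list is no longer needed.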

wxsBSD · 2021-12-07T11:26:30Z

I thought about doing that but decided not to since I wasn't sure it would be worth it for this one use case. I'll add it to this PR soon (currently working on addressing concerns in my outstanding gyp PR).


Add docs for rule sets.

1ee85fc

wxsBSD added 2 commits December 9, 2021 22:29

Implement hash table iterators.

05d86da

Remove old comment about linked list.

36aa9bd

plusvic reviewed Dec 10, 2021

libyara/hash.c (outdated, resolved)

Move variable declaration.

5bb261a

plusvic reviewed Dec 10, 2021

libyara/parser.c (outdated, resolved)

Iterate over rules better.

ff8fe40

plusvic approved these changes Dec 21, 2021

plusvic merged commit 7fabd95 into VirusTotal:master Dec 21, 2021

wxsBSD deleted the rule_sets branch December 21, 2021 13:29
