Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement rule sets. #1597

Merged
merged 7 commits into from
Dec 21, 2021
Merged

Implement rule sets. #1597

merged 7 commits into from
Dec 21, 2021

Conversation

wxsBSD
Copy link
Collaborator

@wxsBSD wxsBSD commented Dec 2, 2021

This adds the ability to use rule sets just like we do with string sets.
Specifically, it enables conditions like the following:

5 of (a, b)
50% of (a*)

where "a" is an existing rule and "b*" is a wildcard rule identifier, which is
expanded out precisely as wildcard string identifiers are.

I chose to make OP_OF and OP_OF_PERCENT emit an argument which is the "type" of
set it is expected to match on. The two acceptable values are OF_STRING_SET and
OF_RULE_SET. When OF_STRING_SET is used then the references on the stack are
expected to be strings and OF_RULE_SET will check if the rules on the stack
match.

I added a yr_parser_emit_pushes_for_rules() which behaves similarly to the
string counterpart. The one big exception is that I can't use yr_rules_foreach
macro since that needs the terminating "null" rule which is not yet added, so
instead I have to iterate the rules manually using yr_arena_get_ptr() to walk
all the currently compiled rules.

This adds the ability to use rule sets just like we do with string sets.
Specifically, it enables conditions like the following:

5 of (a, b)
50% of (a*)

where "a" is an existing rule and "b*" is a wildcard rule identifier, which is
expanded out precisely as wildcard string identifiers are.

I chose to make OP_OF and OP_OF_PERCENT emit an argument which is the "type" of
set it is expected to match on. The two acceptable values are OF_STRING_SET and
OF_RULE_SET. When OF_STRING_SET is used then the references on the stack are
expected to be strings and OF_RULE_SET will check if the rules on the stack
match.

I added a yr_parser_emit_pushes_for_rules() which behaves similarly to the
string counterpart. The one big exception is that I can't use yr_rules_foreach
macro since that needs the terminating "null" rule which is not yet added, so
instead I have to iterate the rules manually using yr_arena_get_ptr() to walk
all the currently compiled rules.
@virusdefender
Copy link
Contributor

virusdefender commented Dec 2, 2021

a simple way to use this feature without this pr is using math.to_number(private rule name)

condition:
    # at lease two rules
    math.to_number(rule1) + math.to_number(rule2) + math.to_number(rule3) >= 2
condition:
    # add weight / score to rule
    math.to_number(rule1) * 20 + math.to_number(rule2) * 30 + math.to_number(rule3) * 30 >= 60

the problem is all rules must be matched at first, yara can not optimize this expression to stop earlier (for example, the first demo, if rule1 and rule2 can be matched, it's not necessary to run rule3)

@wxsBSD
Copy link
Collaborator Author

wxsBSD commented Dec 2, 2021

You are correct that this would work but there are two reasons this PR is a good idea. First, "2 of (rule1, rule2, rule3)" is a much more concise expression. Second, this supports wildcards so you can say "2 of (foo*)" and never have to update your condition if you add another rule.

Your point about weights is valid, and if you want to weight one rule more than another you need to use your approach.

@plusvic
Copy link
Member

plusvic commented Dec 3, 2021

This is really nice, but I have a concern that I'm not sure how to solve. When you have a condition like 50% of (a*), this will take into account rules starting with a and defined before the rule that contain the condition. Any rule starting with a but defined after that rule won't be taken into account. This same limitation exists today when you want to use some rule foo as part of the condition for another rule bar (foo must be defined before bar) but you can detect errors in compile time and notify the user. However, with the introduction of wildcards subtle errors can go unnoticed.

You are currently handling the case in which a rule with condition 50% of (a*) doesn't find any matching rule name, that's a good start, but still someone can add some other matching rule at the end of the source file and expect that this rule is also taken into account.

I guess the only solution for that is making the order in which rules are defined irrelevant, but that probably means more important changes, and introduces new situations that must be detected, like cyclical dependencies among rules.

I'm not sure how to tackle this.

@wxsBSD
Copy link
Collaborator Author

wxsBSD commented Dec 3, 2021

That's a really good point I had not thought of. Let me think about this a bit and maybe I can come up with a solution.

@wxsBSD
Copy link
Collaborator Author

wxsBSD commented Dec 3, 2021

You're right that doing anything at runtime is going to open up new situations to deal with. Instead, I think this can be turned into a compiler error. The rough plan I am going to try is:

  1. Anytime the compiler is dealing with an identifier with a wildcard it will remember the identifier.
  2. Anytime the compiler is compiling a rule it will check if the rule is matched by the expansion of any of the identifiers from step 1.

This would recognize that you're using a rule name that is covered by an identifier with a wildcard AFTER the rule with the wildcarded identifier is defined. I could turn this into an error so that the rules will not compile unless the order is correct.

@plusvic
Copy link
Member

plusvic commented Dec 3, 2021

That sounds like a good compromise.

Make sure that when we compile a new rule that any existing rule sets do not
match the new rule. Specifically we are making the following a compiler error:

rule a1 { condition: true }
rule b { condition: 1 of (a*) }
rule a2 { condition: true }

This is a compiler error because "a2" is defined after the usage of (a*) in rule
b.

There is still an edge case here. These rules compile:

rule a { condition 1 of (a) }
rule b { condition: 1 of (b*) }

In both cases those rules evaluate to false, which is in line with the behavior
of this rule:

rule c { condition: c }

Given that the behavior around rule sets matches the current behavior I think
this is acceptable.
@wxsBSD
Copy link
Collaborator Author

wxsBSD commented Dec 4, 2021

I implemented the strict checking in the compiler. I ended up using a hash table to allow me to quickly check if a wildcard identifier has been processed already (so I can avoid adding it to the list). I'm not sure this is worth the tradeoff in memory usage during compilation, but I figure it was worth it so I can do quick lookups as I can see rule sets being used fairly often in.

@plusvic
Copy link
Member

plusvic commented Dec 7, 2021

What if instead of having both a linked list and a hash table, we implement a mechanism for iterating all the items in a hash table? You are using the linked list for storing all the wildcard identifiers, while the hash table is used only for avoiding duplicate entries in the linked list, but the hash table already contains all you need, it's just that we don't have a way of iterating over all the items in the hash table.

In this case the hash table is just a collection of linked list of YR_HASH_TABLE_ENTRY structures, which already contain the original keys and values. When you perform a lookup in the hash table you go straight to one of those linked lists, but iterating over all the linked lists in the hash table is straightforward. It's just a matter of adding the appropriate API to hash.h

@wxsBSD
Copy link
Collaborator Author

wxsBSD commented Dec 7, 2021

I thought about doing that but decided not to since I wasn't sure it would be worth it for this one use case. I'll add it to this PR soon (currently working on addressing concerns in my outstanding gyp PR).

@plusvic plusvic merged commit 7fabd95 into VirusTotal:master Dec 21, 2021
@wxsBSD wxsBSD deleted the rule_sets branch December 21, 2021 13:29
tarterp pushed a commit to mandiant/yara that referenced this pull request Mar 31, 2022
This adds the ability to use rule sets just like we do with string sets.
Specifically, it enables conditions like the following:

5 of (a, b)
50% of (a*)

where "a" is an existing rule and "b*" is a wildcard rule identifier, which is
expanded out precisely as wildcard string identifiers are.

I chose to make OP_OF and OP_OF_PERCENT emit an argument which is the "type" of
set it is expected to match on. The two acceptable values are OF_STRING_SET and
OF_RULE_SET. When OF_STRING_SET is used then the references on the stack are
expected to be strings and OF_RULE_SET will check if the rules on the stack
match.

I added a yr_parser_emit_pushes_for_rules() which behaves similarly to the
string counterpart. The one big exception is that I can't use yr_rules_foreach
macro since that needs the terminating "null" rule which is not yet added, so
instead I have to iterate the rules manually using yr_arena_get_ptr() to walk
all the currently compiled rules.

* Stricter ordering of rule set in compiler.

Make sure that when we compile a new rule that any existing rule sets do not
match the new rule. Specifically we are making the following a compiler error:

rule a1 { condition: true }
rule b { condition: 1 of (a*) }
rule a2 { condition: true }

This is a compiler error because "a2" is defined after the usage of (a*) in rule
b.

There is still an edge case here. These rules compile:

rule a { condition 1 of (a) }
rule b { condition: 1 of (b*) }

In both cases those rules evaluate to false, which is in line with the behavior
of this rule:

rule c { condition: c }

Given that the behavior around rule sets matches the current behavior I think
this is acceptable.

* Add docs for rule sets.

* Implement hash table iterators.

* Remove old comment about linked list.

* Move variable declaration.

* Iterate over rules better.
maximelb pushed a commit to refractionPOINT/yara that referenced this pull request Nov 17, 2022
This adds the ability to use rule sets just like we do with string sets.
Specifically, it enables conditions like the following:

5 of (a, b)
50% of (a*)

where "a" is an existing rule and "b*" is a wildcard rule identifier, which is
expanded out precisely as wildcard string identifiers are.

I chose to make OP_OF and OP_OF_PERCENT emit an argument which is the "type" of
set it is expected to match on. The two acceptable values are OF_STRING_SET and
OF_RULE_SET. When OF_STRING_SET is used then the references on the stack are
expected to be strings and OF_RULE_SET will check if the rules on the stack
match.

I added a yr_parser_emit_pushes_for_rules() which behaves similarly to the
string counterpart. The one big exception is that I can't use yr_rules_foreach
macro since that needs the terminating "null" rule which is not yet added, so
instead I have to iterate the rules manually using yr_arena_get_ptr() to walk
all the currently compiled rules.

* Stricter ordering of rule set in compiler.

Make sure that when we compile a new rule that any existing rule sets do not
match the new rule. Specifically we are making the following a compiler error:

rule a1 { condition: true }
rule b { condition: 1 of (a*) }
rule a2 { condition: true }

This is a compiler error because "a2" is defined after the usage of (a*) in rule
b.

There is still an edge case here. These rules compile:

rule a { condition 1 of (a) }
rule b { condition: 1 of (b*) }

In both cases those rules evaluate to false, which is in line with the behavior
of this rule:

rule c { condition: c }

Given that the behavior around rule sets matches the current behavior I think
this is acceptable.

* Add docs for rule sets.

* Implement hash table iterators.

* Remove old comment about linked list.

* Move variable declaration.

* Iterate over rules better.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants