Tree-of-AST is a static-pseudocode analysing approach inspired by
Tree of Thoughts: Deliberate Problem Solving with Large Language Models that can be used in both security-analysis and jobs debugging tracing backs, However, in Tree-of-AST we are creating a Tree-of-AST
embed with both code
generated by AST tools and LMs
's thought. With a basic input point, we can use evaluation on both tree node probability and creating a sub-tree of LM's response
Inspired by Tree of Thoughts: Deliberate Problem Solving with Large Language Models, A solution combining the tree of Aspartate transferase and the tree of Chain-Of-thought. However, we are not using chain-of-thought but creating a Chain-of-attack in favor of construction of ToTs, including heuristically evaluating states;
Construction of ToT can be summarized by four questions mentioned in the paper:
- How to decompose the intermediate process into thought steps;
- How to generate potential thoughts from each state;
- How to heuristically evaluate states;
- What search algorithm to use.
Focused on the job of analyzing via AST, in this case specifically analyzing possible vulnerabilities, we can conclude that:
- Information will given ( Potential Vulnerability: Tools like
bandit
, Tree construction: AST ) - To generate possible
K
, choosing fromSampling
, andProposing
, we will useProposing
sinceProposing
works better when the thought space is more constrained (e.g. each thought is just a word or a line), so proposing different thoughts in the same context avoids duplication. - Choosing from
Vote
andValue
or using together, we choose COMBINING TWO BOTH. Since vulnerability analyzing jobs require both parallel judgment and future possible prediction, the final call will be considered with weight of 8:2 in 10. Depth-first search (DFS)
since we are actually conducting analysis on an existing possible vulnerability.
The whole workflow should be like:
1. Static vulnerability existence detection (Bandit for instance)
2. AST Tree building from and analyzing (Finding xreferences and input)
3. Eval Tree nodes reversely using `Vote` and `Value` of external LMs
* If node score > threshold, mark the current node and restart on previous doubtful node using BHF.
* If no node score > threshold Until the final node is reached (t > T), mark as none.
4. Finally, export the marked branch, in further discussions, we will call it "Tree-of-AST"
For exploitations, we will use another approach based on ToT. To begin with, we will start on from tail-to-Head travaling the Tree-of-AST in order to proceed inverse operation of the Vuln-input. For each node we met in the original Tree-of-AST, we will call it a layer; For each layer we met, we will use LMs ( no-shot or few shot ) to generate a part of exploit, these exploits will be embed in this layer of Tree-of-AST.
Same as ToT goes, we will generate multiple nodes of exploits in a layer, however, we will only evaluate them in vote
, value
with lookahead simulations
in order to avoid issues such as unsure type, after this round of travel. Thus the weight should be 2:8 now
In the exploitation process, we might face the possibility that we can insert arbitrary since in this layer this part have not encountered any limitation, in this case, we will leave this part with <arbitrary>
tag to avoid future asserts since we are traveling from Tail-To-Head.
Additionally, if <arbitrary>
encounted unspecificed assert, such as len(), we can modify it by adding a limitation property <arbitrary limit='length < 7'>
.
Or we can re-travel the tree from head-to-tail, using ToT to reconstruct arbitrary
import os
class UserInput:
def __init__(self,input) -> None:
self.input = input
def __setattr__(self, name, value):
print(f'[!] {name} now is {value}')
super().__setattr__(name, value)
def vuln(input=input('Your address: ')):
# ANALYSIS-TOP: [<arbitrary>, <arbitrary>, <arbitrary>retr0reglove<COMMAND>]
user_input = UserInput(input=input).input
# ANALYSIS: tytytyty<arbitrary>, <arbitrary>, <arbitrary>retr0reglove<COMMAND>
if 'tytytyty' not in user_input[:7]:
exit(0)
else:
user_input = user_input[7:]
# ANALYSIS: <arbitrary>, <arbitrary>, <arbitrary>retr0reglove<COMMAND>
user_input = user_input.split(',')
# ANALYSIS UNSURE TYPE, lookahead simulations
# ANALYSIS: [<arbitrary>, <arbitrary>, <arbitrary>retr0reglove<COMMAND>]
user_input = user_input[3]
# ANALYSIS: <arbitrary>retr0reglove<COMMAND>,
# retr0reg<arbitrary>love<COMMAND>
if not 'retr0reg' in user_input:
exit()
# ANALYSIS-TAIL: <arbitrary>love<COMMAND>
user_input = user_input.split('love')[1]
return os.system(user_input)
vuln()
Think might be easier than doing, but in this case both is hard. To turn our ideal model in to reality acutally took lots of efforts. To begin with, we started on buiding a source-to-sink
alike framework using Python AST with reverse-tracking. However, the AST do not support direct reverse-traveling, thus we will have to assign each node with a fwd
, bck
and id
similar to glibc bidirectional linked list in dynamic memory allocation; while adding parents
and children
to remain the tree property of our AST.
class ToAfy(ast.NodeVisitor):
""" Adding father and children attribution to every node so we can reverse travel the AST """
def __init__(self, target_function=None, target_variable=None):
self.parent = None
self.last_id = 0
self.last_node = None
self.target_function = target_function
self.target_variable = target_variable
self.vulnerable_note = None
def generic_visit(self, node):
node.node_id = self.last_id
self.last_id += 1
# print(f'setting {node.id}')
if self.last_node is not None:
self.last_node.fwd = node
node.bck = self.last_node
else:
node.bck = None
node.fwd = None
self.last_node = node
# if node.bck and self.last_node:
# print(f'{node.id}: last_node\'s fwd {self.last_node.id}, note\'s bck {node.bck.id}')
if not hasattr(node, 'parent'):
node.parent = self.parent
if not hasattr(node, 'children'):
node.children = []
Additionally, we will have to locate the node that's representing our vulnerable code segment. this usually takes other round of positive directional traveling, nevertheless, if we sets the fwd
, bck
, id
while try locating that vulnerability sink, we can manage to use only once of positive directional traveling of the AST.
if isinstance(node, ast.Call):
full_function_name = get_full_function_name(node.func)
if full_function_name == self.target_function:
for arg in node.args:
if isinstance(arg, ast.Name) and arg.id == self.target_variable:
# VULN
self.vulnerable_note = node
Finally wrap up setting parent nodes.
self.parent = node
current_last_node = self.last_node
for child in ast.iter_child_nodes(node):
node.children.append(child)
self.visit(child)
self.last_node = current_last_node
## RESTORING self.last_node