You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Even after encoding the MLIL instruction, we want to assign a more sophisticated data structure to the underlying expression we will be using. Particularly, using an abstract syntax tree (AST) is a good data structure to use when trying to define a formal language. Before we can even begin writing a representation of our AST, we need to define a formal specification for our language. This will be in the form of a context-free grammar which will represent all possible "programs" of boolean statements that can be represented in obfuscated programs. The goal of this component is to design a CFG to represent boolean expressions and write some classes which will represent nodes of the AST which a parser can build. One quick note is that there are many different classes of possible CFGs that can represent this language, however, we will be using a particular type of CFG known as an LL(1) grammar. Using an LL(1) grammar will make the process a lot easier when writing our parser. Remember that boolean expressions are slightly simplified because of our encoder but here are some sample boolean programs.
A | !A
A & (BC | !D)
(ZZ | !(D & !A))
Action Items
Write the following in parser/ast.py.
Design a context-free grammar (CFG) for all possible valid boolean "programs". Make sure to capture all example programs shown above.
Ensure that the designed CFG is a valid LL(1) grammar by drawing a parse table for this. Write this parse table in a new file called parser/README.md using a Markdown table. This will be useful when writing our paper to prove that we have an LL(1) grammar.
Write a few classes which represent each node in abstract syntax tree (AST) for a valid parse of a program defined by the designed CFG. Ensure that each class has a constructor, fields, and a method which overrides the building Python string representation (i.e. __str__) which will be used to print out the encoded form of the boolean expression when calling print on the start symbol.
Resources
UCLA CS 132 Notes. Notes from UCLA's compilers class which contain some information on designing LL(1) grammars and writing parsers. Chapters 2 & 3 are the only relevant ones for this project.
LL(1) Academy: A website used for learning about LL(1) grammars. It contains a tutorial for how to build a parse table and verify that a grammar is LL(1) along with some practice problems. Updated Link:http://ao.bliu.tech/
To be added. An example of writing an LL(1) parser in Python from Benson. Remind me to add this once I get a chance to finish it.
The text was updated successfully, but these errors were encountered:
bliutech
changed the title
Parser: Design a Context Free-Grammar for Boolean Expressions
Parser: Design a Context-Free Grammar for Boolean Expressions
Jul 9, 2024
Even after encoding the MLIL instruction, we want to assign a more sophisticated data structure to the underlying expression we will be using. Particularly, using an abstract syntax tree (AST) is a good data structure to use when trying to define a formal language. Before we can even begin writing a representation of our AST, we need to define a formal specification for our language. This will be in the form of a context-free grammar which will represent all possible "programs" of boolean statements that can be represented in obfuscated programs. The goal of this component is to design a CFG to represent boolean expressions and write some classes which will represent nodes of the AST which a parser can build. One quick note is that there are many different classes of possible CFGs that can represent this language, however, we will be using a particular type of CFG known as an LL(1) grammar. Using an LL(1) grammar will make the process a lot easier when writing our parser. Remember that boolean expressions are slightly simplified because of our encoder but here are some sample boolean programs.
Action Items
Write the following in
parser/ast.py
.parser/README.md
using a Markdown table. This will be useful when writing our paper to prove that we have an LL(1) grammar.__str__
) which will be used to print out the encoded form of the boolean expression when callingprint
on the start symbol.Resources
The text was updated successfully, but these errors were encountered: