Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser: Design a Context-Free Grammar for Boolean Expressions #10

Closed
3 tasks done
bliutech opened this issue Jul 9, 2024 · 0 comments · Fixed by #31
Closed
3 tasks done

Parser: Design a Context-Free Grammar for Boolean Expressions #10

bliutech opened this issue Jul 9, 2024 · 0 comments · Fixed by #31
Assignees

Comments

@bliutech
Copy link
Owner

bliutech commented Jul 9, 2024

Even after encoding the MLIL instruction, we want to assign a more sophisticated data structure to the underlying expression we will be using. Particularly, using an abstract syntax tree (AST) is a good data structure to use when trying to define a formal language. Before we can even begin writing a representation of our AST, we need to define a formal specification for our language. This will be in the form of a context-free grammar which will represent all possible "programs" of boolean statements that can be represented in obfuscated programs. The goal of this component is to design a CFG to represent boolean expressions and write some classes which will represent nodes of the AST which a parser can build. One quick note is that there are many different classes of possible CFGs that can represent this language, however, we will be using a particular type of CFG known as an LL(1) grammar. Using an LL(1) grammar will make the process a lot easier when writing our parser. Remember that boolean expressions are slightly simplified because of our encoder but here are some sample boolean programs.

A | !A
A & (BC | !D)
(ZZ | !(D & !A))

Action Items

Write the following in parser/ast.py.

  • Design a context-free grammar (CFG) for all possible valid boolean "programs". Make sure to capture all example programs shown above.
  • Ensure that the designed CFG is a valid LL(1) grammar by drawing a parse table for this. Write this parse table in a new file called parser/README.md using a Markdown table. This will be useful when writing our paper to prove that we have an LL(1) grammar.
  • Write a few classes which represent each node in abstract syntax tree (AST) for a valid parse of a program defined by the designed CFG. Ensure that each class has a constructor, fields, and a method which overrides the building Python string representation (i.e. __str__) which will be used to print out the encoded form of the boolean expression when calling print on the start symbol.

Resources

  • UCLA CS 132 Notes. Notes from UCLA's compilers class which contain some information on designing LL(1) grammars and writing parsers. Chapters 2 & 3 are the only relevant ones for this project.
  • LL(1) Academy: A website used for learning about LL(1) grammars. It contains a tutorial for how to build a parse table and verify that a grammar is LL(1) along with some practice problems. Updated Link: http://ao.bliu.tech/
  • To be added. An example of writing an LL(1) parser in Python from Benson. Remind me to add this once I get a chance to finish it.
@bliutech bliutech changed the title Parser: Design a Context Free-Grammar for Boolean Expressions Parser: Design a Context-Free Grammar for Boolean Expressions Jul 9, 2024
@timoslater timoslater self-assigned this Jul 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants