Skip to content

Interpreters for three simple programming languages created for self-educational purposes in Haskell

License

Notifications You must be signed in to change notification settings

eugenkiss/loopgotowhile

Repository files navigation

This project contains interpreters for the programming languages Loop, Goto & While that are based on Prof Dr. Heribert Vollmer's lecture notes for his introductory course in theoretical computer science. They are modeled to feel similar to common imperative programming languages but they are delibaretly greatly simplified. Furthermore, when possible, functions to transform source code from one language to another are provided, too.

I created a live web evaluation service where you can write some code and have it either evaluated or transformed. Additionally, to keep yourself entertained, you can find some exercises on that website, too.

Execute runtests.sh to run the tests.

Installation

  1. Install the Haskell platform. This will give you GHC and the cabal-install build tool.

  2. Clone this project and build it with cabal:

     git clone git://github.com/eugenkiss/loopgotowhile.git
     cd loopgotowhile-site
     cabal install
    

Now you can use this project as a library in Haskell source files.

Background

The purpose of these languages in Prof Dr. Heribert Vollmer's introductory course in theoretical computer science was to provide formal semantics for the specification of algorithms. Moreover, the languages were used to show the equivalence of turing complete languages. There was no executable implementation of these languages to my knowledge. Frankly, why should there have been? Their purpose didn't require that. However, I thought it would be an interesting exercise to create exectuable implementations for these languages. Even more so as I always wanted to create an executable implementation of a programming language and it was the perfect opportunity to get to know Haskell better.

Most of all, I wanted to see how elegantly the problem of creating parsers, evaluators and "transformators" for these languages could be expressed in Haskell. And I'm quote pleased with the result although I acknowledge that there is much to improve especially regarding the code duplication for transforming each language to its strict subset.

Design of the Code

In reality there aren't three languages but six; each language is given in a strict and extended version. When being evaluated a language is first transformed into its strict subset - as it is easier to write an evaluator for the strict subset - and only then (indirectly) evaluated.

Only strict Goto and strict While are evaluated directly. Loop is evaluated by first transforming it to While and then evaluating that While program. Although it would suffice to only have one directly evaluated language this is not done due to performance reasons. It is much faster to directly evaluate Goto as opposed to evaluating Goto through While.

Shared functionality is outsourced to the "Shared" namespace in order to reduce code duplication and thus reuse more code.

I know that much can be improved. If you have suggestions then contact me, please.

Note: Some of the "to-strict-transformation"-tests don't use a hand-written strict program to test the transformations against but instead they test against a simplified extended program that is automatically transformed to a strict program. The reason for this decision is that a strict program would simply be too long and too convoluted to comprehend the applied transformations. Nonetheless, these tests are not useless as the preceding, more primitive tests "ensure" that the transformations of the simplified extended version to a strict version will be correct.

Future Improvements

Code/Style Improvements

  • Remove code duplication especially regarding the code duplication for transforming each language to its strict subset. Also, try to reduce code duplication for While & Goto evaluation.

  • Some parts are quite ugly (see TODO's in the source code). Beautify!

  • Customization of Error Messages in Parsec leaves much to be desired.

  • Improve output of pretty printers (e.g. dismiss superfluous parentheses).

  • Unify naming convetions (e.g. if qualifed import AS than GotoAS not Goto)

  • Use more type system constraints

    • e.g. GotoStat = Label GotoStat | .... This definition e.g. allows Label (Label ...), which is actually not allowed in the Goto definition. If I could enforce the type to only allow nesting of 1 Level, then the compiler would have saved me really early from a bug that I accidentelly discovered really late...
    • The same is true for e.g. GotoStat = Seq [GotoStat] | .... It shouldn't be possible to define Seq (Seq ...) as that is the same as Seq ... and some functions rely on the latter "normalized" form. It is error prone to depend on not forgetting to normalize the data. Much better would be an early compiler error message.

Feature Additions

  • Command line tools for each language

    • Options
      • -p, --prettyprint
      • -s, --strict
      • -t [loop,goto,while], --transform [loop,goto,while] (list only those that are possible)
      • -o, --output (default: stdout)
    • Extract common functionality to Shared
    • Create executables in cabal
  • Optimizations

    • Create benchmark suite
    • Maybe improve efficiency of arithmetic functions (e.g. is / reduction to strict code efficient or is it possible to make the algorithm faster)
    • Maybe write evaluator for extended While/Goto so that operations like x0 := x0 + x1 are one instruction and not a loop.
    • Maybe JIT?
    • Maybe use hashtable implementation of Map in order to speed up lookup of variables?
    • Other optimizations? Loop invariants (analysis)?
  • Function Definitions

Tests

Test duplication/redundancy could be greatly reduced if at first code duplication is reduced. This way, tests could be reused on the one hand and on the other hand a lot of "to-strict-transformation" tests for While and Goto could be removed since it would suffice to test Loop's "to-strict-transformations" thouroughly if all languages shared (almost) the same code.

Language Specifications

You can only implement functions whose codomain are the natural numbers and whose domain are the natural numbers to the power of n, where n is the number of arguments. Also, every variable is by default initialized to 0 and arguments, if any, are stored in x1,...,xn. That means, providing the argument list 4,87,3 is the same as prepending x1 := 4; x2 := 87, x3 := 3 to the top of your program. Also, note that each language is case sensitive!

Loop

An exemplary (extended and pointless) Loop program looks as follows:

x4 := abc + y * 4;
IF hg != 7 || hg != 7 THEN 
  LOOP x4 - (x4 + 1) DO
    x0 := 1
  END
ELSE 
  x0 := counter
END

Syntax

The syntactical components are

  • Variables: x0,x1,x2,x3,x4,...
  • Constants: 0,1,2,3,4,...
  • Operators: + and -
  • Separators: ; and :=
  • Keywords: LOOP,DO and END

Let xi and xj be variables and let c be a constant. Then

xi := xj + c

and

xi := xj - c

are Loop programs.

Let P1 and P2 be Loop programs. Then

P1; P2

is a Loop Program.

Let P be a Loop program and let xi be a variable. Then

LOOP xi DO P END

is a Loop Program.

Sematics

Let P be a Loop Program. P computes a function f: ℕ^k → ℕ like so: At the beginning of the computation the arguments n1,...,nk ∈ ℕ are to be found in the variables x1,...,xk. All other variables have the starting value 0. P is executed as follows:

  • By executing the program xi := xj + c xi is assigned the value of xj + c.
  • By executing the program xi := xj - c xi is assigned the value of xj - c if the value is non-negative. Otherwise xi is assigned 0.
  • By executing the program P1; P2 at first P1 is executed and after that P2.
  • The execution of the program LOOP xi DO P' END happens as follows: The Program P' is executed as often as the value of xi has been at the Beginning. That means, assignments to xi inside P' do not influence the number of repetitions.

The result of P's execution is the value of x0 or put in another way f(n1,...,nk) = Value of x0 after execution.

A function f: ℕ^k → ℕ is called Loop-computable if there exists a Loop program that computes f as described above.

Extensions

Variables can be named arbitrarily. The only restrictions are that they begin with a letter and are not a keyword, e.g. counter4 is a valid variable identifier. Apart from that the following Loop programs are all valid and have the respective intuitive semantics:

  • xi := xj (stands for xi := xj + 0)
  • xi := c (stands for xi := xj + c where xj is an unused variable)
  • xi := aexp where aexp is an arbitrary arithmetic expression consisting of variables, constants, (optional) parantheses and operators (+,-,*,/,^ and %) like for example y * (hg + 8 / x8) - 2 % 4.
  • LOOP aexp DO P END where aexp is an arbitrary arithmetic expression.
  • IF xi = 0 THEN P END
  • IF !(xi < xj && !(xk != 3)) THEN P1 ELSE P2 END
  • IF bexp THEN P END where bexp is an arbitrary "boolean" expression consisting of variables, constants, (optional) parantheses, relational operators (<,<=,>,>=,= and !=) and boolean operators (&&,|| and !).
  • IF bexp THEN P1 ELSE P2 END

Also, you are allowed to insert comments in your source code. The syntax is similar to Java's comment syntax, i.e. // introduces a comment whose scope ends at the next line and /* */ can be used for multiline comments.

Goto

An exemplary (extended and pointless) Goto program looks as follows:

M1: x4 := abc + y * 4;
Cool: IF hg != 7 || hg != 7 THEN GOTO M1 ELSE HALT END;
GOTO Cool;
M42: HALT

Syntax

A Goto program is a succession

M1: A1;
M2: A2;
   ⋮
Mk: Ak

where Mi is a so called label and Ai is an instruction. Possible instructions are:

  • Value assignment: xi := xj ± c (xi, xj are variables, c is a constant)
  • Jump: GOTO Mi
  • Conditional Jump: IF xi = c THEN GOTO Mj
  • Stop instruction: HALT

The last instruction is either a stop instruction or a jump.

Sematics

The execution of a Goto program starts with the first instruction. The execution of instructions of each type is as follows:

  • xi := xj ± c: The value of xi becomes xj ± c and the next instruction is executed.
  • GOTO Mi: Proceed with the instruction with label Mi.
  • IF xi = c THEN GOTO Mj: If the value of xi is equal to c proceed with the instruction with label Mj. Otherwise the next instruction is executed.
  • HALT: Stop the execution of the program.

A jump to a label that is not existent in the program is not defined.

Let P be a Goto Program. P computes a function f: ℕ^k → ℕ like so: At the beginning of the computation the arguments n1,...,nk ∈ ℕ are to be found in the variables x1,...,xk. All other variables have the starting value 0.

f(n1,...,nk) is the value of x0 after execution if P terminates. Otherwise f(n1,...,nk) is undefined.

A function f: ℕ^k → ℕ is called Goto-computable if there exists a Goto program that computes f as described above.

Extensions

The program doesn't have to end with a jump or a stop instruction. Labels can be named arbitrarily or can be omitted altogether. Note, tough, that labels must be unique. IF statements must be completed with the lexeme END because they can contain several statements. Furthermore, a HALT statement may appear in the body of an IF.

Apart from that, all extensions from the Loop language - except for the LOOP construct which is not present in the extended Goto language - apply to the extended Goto language.

While

The While language is an extension of the Loop language. An exemplary (extended and pointless) While program looks as follows:

x4 := abc + y * 4;
IF hg != 7 || hg != 7 THEN 
  WHILE !(x4 = x4 + 1) DO
    x0 := 1
  END
ELSE 
  x0 := counter
END

Syntax

Apart from the LOOP construct which is not part of the While language the syntax is the same as that of Loop. Additionally, a new keyword (WHILE) with the accompanying syntactical construct, namely the WHILE loop, is introduced.

Let P be a While program and let xi be a variable. Then

WHILE xi != 0 DO P END

is a While program.

Sematics

The execution of WHILE xi != 0 DO P END happens so, that the program P is executed as long as the value of xi is not equal to 0.

Let P be a While Program. P computes a function f: ℕ^k → ℕ like so: At the beginning of the computation the arguments n1,...,nk ∈ ℕ are to be found in the variables x1,...,xk. All other variables have the starting value 0.

f(n1,...,nk) is the value of x0 after execution if P terminates. Otherwise f(n1,...,nk) is undefined.

A function f: ℕ^k → ℕ is called While-computable if there exists a While program that computes f as described above.

Extensions

Apart from the LOOP related extensions - since the While language has no LOOP construct - the Loop extensions are all valid While extensions. Additionally, the head of a WHILE loop can have an arbitrary boolean expression, e.g. WHILE xyz != 9 || u = 8 DO P END is a valid (extended) While program.

About

Interpreters for three simple programming languages created for self-educational purposes in Haskell

Resources

License

Stars

Watchers

Forks

Packages

No packages published